Kampala’s Transit Network – Nangosyah Tom Wellard

This analysis is based on a publicly released dataset of Kampala’s Greater Metropolitan Area (GKMA) paratransit network, collected by a consortium of urban transport researchers between 2019 and 2020. It explores the scale, frequency, shape, and economics of a transit system that was never formally planned, yet moves millions of people every day with a consistency that rivals many networks that were.

Background

Kampala, Uganda’s capital, is home to more than three million people and is one of the fastest-growing cities in sub-Saharan Africa. Its roads were not built for the city it has become, and it has no metro, tram, or light-rail system of any kind. What it has instead is a dense, city-wide fleet of shared 14-seat minibuses, known locally as taxis (or matatus in the broader East African context). These are supplemented by intercity buses, also called coasters, that connect Kampala to surrounding districts, and by boda bodas (motorcycle taxis) that fill the gaps everywhere else.

None of these vehicles operate with published timetables, central dispatch, or government subsidy. Drivers fill their seats and go. Passengers flag them down at recognised stops, some formally marked, many more known only by shared habit and neighbourhood memory. A good example is the Kasasiro stage in Kamwokya. The last time I used a taxi in Kampala, it was not formally marked; it was simply a spot where the community dumped rubbish (kasasiro), hence the stage name.

Despite all of this, the system moved an estimated two million passenger trips per day across a metropolitan area of more than 1,500 km². Understanding how it actually works, how large it is, how frequently it runs, where it concentrates, and what it costs is what this analysis set out to explore.

The dataset used here covers the period 2019/20 and should not be read as a description of Kampala’s transit system today. Kampala’s routes, fares, and network shape change continuously. What this data provides is a well-documented baseline: a snapshot of what the system looked like before the disruptions of 2020. The broad patterns it reveals are likely to reflect durable structural features of how the city moves.

The Data

This analysis is built on a dataset commissioned by the French Development Agency (AfD) as part of a broader study into paratransit usage and road conditions in Greater Kampala. The data was collected on the ground by a consortium that included Transport for Cairo, MapUganda, and the Swiss transport planning firm Transitec.

Ground-level data collection took place in late 2019. In April 2020 the data was converted into the General Transit Feed Specification (GTFS), the standardised transit data format also used by Google Maps and other navigation apps. The dataset covers a full service year, October 2019 to September 2020, and is published under a Creative Commons open licence, making it freely available for research. It includes both GTFS files and GIS shapefiles, allowing for a detailed analysis of the network’s structure, frequency, and spatial distribution.

Table 1. Dataset contents.
What it contains	How much
Mapped routes	397 routes across the metropolitan area
Scheduled trip instances	4,696 individual trip schedules
Stops and stages	1,242 stop locations with GPS coordinates
Arrival-time records	140,800 individual stop-time entries
GPS route shape points	87,792 points defining exact route paths
Trip geometries (GIS)	1,492 trip lines with fare and length data
Major terminal polygons	110 terminal areas with trip-frequency counts

Loading the Data

Show R code

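# packages used below (helpers such as PAL and theme_kampala() are assumed to come from a setup chunk not shown)
library(tidyverse); library(sf); library(leaflet)
library(scales); library(kableExtra)

# read GTFS files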
routes     <- read_csv(unz("GTFS/GTFS.zip", "routes.txt"),     show_col_types = FALSE)
trips      <- read_csv(unz("GTFS/GTFS.zip", "trips.txt"),      show_col_types = FALSE)
stops      <- read_csv(unz("GTFS/GTFS.zip", "stops.txt"),      show_col_types = FALSE)
freqs      <- read_csv(unz("GTFS/GTFS.zip", "frequencies.txt"),show_col_types = FALSE)
stop_times <- read_csv(unz("GTFS/GTFS.zip", "stop_times.txt"), show_col_types = FALSE)

Show R code

read_shp_from_zip <- function(zip_path) {
  tmp <- tempfile(); dir.create(tmp)
  unzip(zip_path, exdir = tmp)
  shp <- list.files(tmp, pattern = "\\.shp$", full.names = TRUE)
  st_read(shp, quiet = TRUE)
}

sf_use_s2(FALSE)
trips_sf  <- read_shp_from_zip("GIS/Processed_Trips.zip")
stops_sf  <- read_shp_from_zip("GIS/Processed_Stops.zip")
stages_sf <- read_shp_from_zip("GIS/Stages.zip") |> st_make_valid()
unique_sf <- read_shp_from_zip("GIS/Unique_Trips.zip")

The Network at a Glance

Before getting into the detail, it helps to appreciate the sheer scale of what the data captures.

Table 2. GKMA transit network.
Metric	Value
Total routes	397
a. shared-taxi routes	369
b. bus routes	28
Total scheduled trips	4,696
Total stops and stages	1,242
Estimated daily vehicle departures	92,189
Stop-time records	140,800
GPS shape-points	87,792
Service day starts	06:30
Service day ends	23:00
Daily service window	16.5 hours

The figure that stands out most in Table 2 is the 92,189 estimated vehicle departures per day. This is not a count of fixed scheduled services; it is a measure of individual vehicle movements, calculated from how often vehicles cycle along each route throughout the day. It reflects a system where vehicles are almost continuously in motion, rather than completing a small number of fixed daily runs.
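One way a figure of this kind can be derived from a frequency-based GTFS feed is sketched below. It is an illustration under stated assumptions, not necessarily the calculation behind Table 2: the three rows are made-up values, `hms_to_secs()` is a hypothetical helper, and the column names follow the GTFS `frequencies.txt` convention (`start_time`, `end_time`, `headway_secs`).

```r
# Toy frequencies table; values are illustrative, not taken from the dataset.
freqs_toy <- data.frame(
  trip_id      = c("A", "A", "B"),
  start_time   = c("06:30:00", "12:00:00", "06:30:00"),
  end_time     = c("12:00:00", "23:00:00", "23:00:00"),
  headway_secs = c(300, 600, 900),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "HH:MM:SS" to seconds (GTFS allows hours >= 24).
hms_to_secs <- function(x) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 3600 +
    as.numeric(parts[, 2]) * 60 +
    as.numeric(parts[, 3])
}

# Departures in a service window = window length / headway.
window_secs <- hms_to_secs(freqs_toy$end_time) - hms_to_secs(freqs_toy$start_time)
freqs_toy$departures <- window_secs / freqs_toy$headway_secs

# Summing over all windows gives the estimated daily departures.
total_departures <- sum(freqs_toy$departures)
total_departures
```

Applied to the real `freqs` table, the same sum over all service windows would give a network-wide estimate, provided each row describes a distinct route and direction.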

Taxis vs Buses

The taxi network is the city’s backbone. It handles all movement within Kampala itself, with shorter trips and denser stops, at high frequency. Buses and coasters play a supporting role at the edges, connecting the city to surrounding towns like Entebbe, Mukono, and Wakiso. If you are crossing Kampala, you are almost certainly in a taxi.

Show R code

route_summary <- unique_sf |>
  st_drop_geometry() |>
  mutate(
    operator_label = if_else(agency_id == "taxi", "Shared Taxi (14-seat)", "Bus"),
    operator_short = if_else(agency_id == "taxi", "Shared Taxi", "Bus")
  )

op_colours <- c(
  "Shared Taxi (14-seat)" = unname(PAL["taxi"]),
  "Bus"                   = unname(PAL["bus"])
)
op_colours_short <- c(
  "Shared Taxi" = unname(PAL["taxi"]),
  "Bus"         = unname(PAL["bus"])
)

p1 <- route_summary |>
  count(operator_label) |>
  mutate(operator_label = fct_reorder(operator_label, n)) |>
  ggplot(aes(x = operator_label, y = n, fill = operator_label)) +
  geom_col(width = 0.45, show.legend = FALSE) +
  geom_text(aes(label = paste0(n, " routes")),
            hjust = -0.12, size = 4, fontface = "bold", colour = "#2c3e50") +
  scale_fill_manual(values = op_colours) +
  scale_y_continuous(limits = c(0, 720), expand = c(0, 0)) +
  coord_flip() +
  labs(
    title    = "Routes by Operator Type",
    subtitle = "Shared taxis account for 94% of all routes in the GKMA network",
    x        = NULL,
    y        = "Number of routes"
  ) +
  theme_kampala()

p1

Figure 1. Shared taxis account for 94% of all routes in the GKMA network. Buses serve primarily longer intercity corridors.

Taxis charge a higher fare per kilometre than buses, roughly 50% more on average. For most urban trips, though, the most common price point across the network was 1,000 UGX (approximately $0.27 at 2019 exchange rates). Fares broadly follow distance, with longer routes costing more, but prices cluster around round numbers rather than exact per-kilometre calculations. The market sets the price, not a regulator. That may be only part of the story: having used the taxis myself, I have the impression that groups of taxi operators sometimes agree among themselves on prices that are competitive enough to keep them in business.

Show R code

p2 <- route_summary |>
  filter(LEN_KM > 0, FARE_Q120 > 0) |>
  group_by(operator_short) |>
  summarise(
    `Avg Length (km)` = mean(LEN_KM,    na.rm = TRUE),
    `Avg Fare (UGX)`  = mean(FARE_Q120, na.rm = TRUE),
    n_routes          = n(),
    .groups = "drop"
  ) |>
  pivot_longer(cols = c(`Avg Length (km)`, `Avg Fare (UGX)`),
               names_to  = "stat",
               values_to = "value") |>
  mutate(
    label = if_else(
      stat == "Avg Length (km)",
      paste0(round(value, 1), " km"),
      paste0(comma(round(value)), " UGX")
    ),
    operator_short = fct_relevel(operator_short, "Shared Taxi", "Bus")
  ) |>
  ggplot(aes(x = operator_short, y = value, fill = operator_short)) +
  geom_col(width = 0.45, show.legend = FALSE) +
  geom_text(aes(label = label),
            vjust = -0.55, size = 3.8, fontface = "bold", colour = "#2c3e50") +
  scale_fill_manual(values = op_colours_short) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.22))) +
  facet_wrap(~stat, scales = "free_y") +
  labs(
    title    = "Route Characteristics by Operator",
    subtitle = "Average length and fare",
    x        = NULL,
    y        = NULL
  ) +
  theme_kampala()

p2

Figure 2. Average route length (km) and average fare (UGX) by operator in the GKMA network.

Where Does the Network Operate?

Note

The network spans roughly 45 km north–south and 34 km east–west, covering an area of approximately 1,560 km². Stops average about 0.79 per km² across the whole metropolitan area, but this number hides a steep drop-off in coverage as you move away from the city centre. Central Kampala is densely served; the edges of the metropolitan area are much thinner.
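The drop-off that a single average hides can be made visible by computing stop density in concentric rings around the centre. The sketch below uses made-up stop positions expressed in kilometres from a city-centre origin; with the real data, the coordinates would first need to be projected to a metric CRS. The ring boundaries are arbitrary choices for illustration.

```r
# Toy stop positions in km relative to a city-centre origin (made-up values).
stops_toy <- data.frame(
  x_km = c(0.5, -1.0,  1.5, 2.0, -3.0,  6.0, -8.0, 12.0,  15.0, -18.0),
  y_km = c(0.5,  1.0, -1.0, 2.5,  3.0, -4.0,  5.0,  9.0, -10.0,   6.0)
)

# Straight-line distance of each stop from the centre.
stops_toy$dist_km <- sqrt(stops_toy$x_km^2 + stops_toy$y_km^2)

# Assign each stop to a ring: 0-5 km, 5-10 km, 10-25 km.
breaks <- c(0, 5, 10, 25)
stops_toy$ring <- cut(stops_toy$dist_km, breaks = breaks, include.lowest = TRUE)

# Ring area = pi * (outer^2 - inner^2); density = stops per km^2 in each ring.
ring_area <- pi * diff(breaks^2)
ring_n    <- as.vector(table(stops_toy$ring))
density   <- ring_n / ring_area

data.frame(ring = levels(stops_toy$ring), stops = ring_n, density = round(density, 4))
```

In this toy layout the density falls with each ring outward, which is the pattern the metropolitan average conceals.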

Show R code

trips_sf_wgs  <- trips_sf  |> st_transform(4326)
stops_sf_wgs  <- stops_sf  |> st_transform(4326)
stages_sf_wgs <- stages_sf |> st_transform(4326)

taxi_trips <- trips_sf_wgs |> filter(str_detect(tolower(agency_id), "taxi"))
bus_trips  <- trips_sf_wgs |> filter(str_detect(tolower(agency_id), "bus"))

# centroids of stages — project to UTM 36N for accuracy, back to WGS84 for leaflet
stage_centroids <- stages_sf |>
  st_transform(32636) |>
  st_centroid() |>
  st_transform(4326) |>
  mutate(
    lon = st_coordinates(geometry)[, 1],
    lat = st_coordinates(geometry)[, 2]
  )

leaflet() |>
  addProviderTiles(providers$CartoDB.Positron, group = "Map") |>
  addProviderTiles(providers$Esri.WorldImagery, group = "Satellite") |>

  addPolylines(
    data    = bus_trips,
    color   = unname(PAL["bus"]),
    weight  = 2.2,
    opacity = 0.75,
    label   = ~route_long,
    group   = "Bus routes"
  ) |>

  addPolylines(
    data    = taxi_trips,
    color   = unname(PAL["taxi"]),
    weight  = 1.4,
    opacity = 0.35,
    label   = ~route_long,
    group   = "Taxi routes"
  ) |>

  addCircleMarkers(
    data        = stops_sf_wgs,
    radius      = 3,
    color       = unname(PAL["teal"]),
    fillColor   = unname(PAL["teal"]),
    fillOpacity = 0.6,
    stroke      = FALSE,
    label       = ~stop_name,
    group       = "Stops"
  ) |>

  addCircleMarkers(
    data        = stage_centroids,
    radius      = 4,
    color       = "#ffffff",
    weight      = 2,
    fillColor   = unname(PAL["red"]),
    fillOpacity = 0.85,
    label       = ~Name,
    group       = "Major terminals"
  ) |>

  addLayersControl(
    baseGroups    = c("Map", "Satellite"),
    overlayGroups = c("Taxi routes", "Bus routes", "Stops", "Major terminals"),
    options       = layersControlOptions(collapsed = FALSE)
  ) |>
  addLegend(
    position = "bottomright",
    colors   = unname(PAL[c("taxi", "bus", "teal", "red")]),
    labels   = c("Taxi routes", "Bus routes", "Stops", "Major terminals"),
    title    = "Layer",
    opacity  = 0.85
  )

Figure 3. Every route and stop in the GKMA dataset. In this interactive map of the GKMA transit network, navy lines are taxi routes and gold lines are bus routes. Toggle layers using the control in the top right. Zoom and click stops for names.

The visual density of overlapping lines in the map’s centre tells the network’s most important story: many routes converge on a small number of nodes in the CBD before spreading outward. This hub-and-spoke structure is what the next section measures in detail.

The Hub Structure

Which Terminals Are Most Connected?

To understand how the network is organised, we count how many routes begin or end at each named terminal.

Show R code

route_terminals <- routes |>
  mutate(
    origin = str_split_fixed(route_long_name, "-", 2)[, 1] |> str_trim(),
    dest   = str_split_fixed(route_long_name, "-", 2)[, 2] |> str_trim()
  ) |>
  filter(origin != "", dest != "")

hub_scores <- bind_rows(
  route_terminals |> select(terminal = origin),
  route_terminals |> select(terminal = dest)
) |>
  count(terminal, name = "connections") |>
  arrange(desc(connections)) |>
  head(25)

hub_scores |>
  kbl(caption = "Table 3. Top 25 transit hubs by total route connections.",
      col.names = c("Terminal / Stage", "Total route connections")) |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, position = "left") |>
  row_spec(1, bold = TRUE, background = "#eaf0f6") |>
  row_spec(2:3, bold = TRUE)

Table 3. Top 25 transit hubs by total route connections.
Terminal / Stage	Total route connections
Old Taxi Park	77
New Taxi Park	54
City Square	50
Kisenyi	28
Usafi Taxi Park	25
Nateete Taxi Park	23
Busega	19
Bweyogerere	18
Bwaise	17
Kitoro Taxi Park	17
Wakiso	17
Gayaza Taxi Park	15
Ntinda	15
Mukono Taxi Park	14
Nakawa Taxi Park	13
Zana Stage	12
Matugga Stage	11
Kalerwe Taxi Park	10
Kyanja Taxi Stage	10
Kasangati	9
Kirombe Taxi Stage	9
Kobil Kibuye Taxi Park	9
Namugongo	9
Namungoona	9
Bakuli Stage	7

Show R code

hub_top20 <- hub_scores |> head(20)

origins_count <- route_terminals |> count(terminal = origin, name = "as_origin")
dests_count   <- route_terminals |> count(terminal = dest,   name = "as_dest")

hub_directional <- hub_top20 |>
  left_join(origins_count, by = "terminal") |>
  left_join(dests_count,   by = "terminal") |>
  replace_na(list(as_origin = 0, as_dest = 0)) |>
  pivot_longer(cols = c(as_origin, as_dest),
               names_to = "direction", values_to = "count") |>
  mutate(
    direction = if_else(direction == "as_origin", "As origin", "As destination"),
    terminal  = factor(terminal, levels = rev(hub_top20$terminal))
  )

ggplot(hub_directional,
       aes(y = terminal, x = count, fill = direction)) +
  geom_col(position = "stack", width = 0.7) +
  geom_text(
    data = hub_top20 |>
      mutate(terminal = factor(terminal, levels = rev(hub_top20$terminal))),
    aes(y = terminal, x = connections + 0.5, label = connections, fill = NULL),
    hjust = 0, size = 3.2, fontface = "bold", colour = "#2c3e50"
  ) +
  scale_fill_manual(
    values = c("As origin" = unname(PAL["teal"]), "As destination" = unname(PAL["taxi"])),
    name   = "Role in route"
  ) +
  scale_x_continuous(limits = c(0, 90), expand = c(0, 0)) +
  labs(
    title    = "The 20 Most Connected Transit Hubs",
    subtitle = "Routes beginning or ending at each terminal - Greater Kampala, 2019/20",
    x        = "Number of route connections",
    y        = NULL,
    caption  = "Source: AfD / Transport for Cairo / MapUganda GTFS dataset (CC-BY-3.0)"
  ) +
  theme_kampala() +
  theme(legend.position = "bottom")

Figure 4. Top 20 transit hubs by total route connections. Old Taxi Park leads by a clear margin. The gap between the top node and the rest is the hallmark of a strongly centralised, star-topology network.

The data shows an extreme concentration of connectivity at a single point. Old Taxi Park, a large open-air terminal in the heart of Kampala’s CBD, appears at one end of 77 different routes, meaning it features in roughly 19.4% of all routes in the network.

Crucially, 66 of those 77 connections are as a destination: far more routes terminate at Old Taxi Park than originate from it. It is where the network converges. New Taxi Park (54 connections) and City Square (50 connections) form a secondary cluster nearby, with all three terminals within a few hundred metres of each other in the CBD.

Show R code

otp_share <- routes |>
  mutate(
    involves_otp = str_detect(route_long_name, "Old Taxi Park"),
    category = if_else(involves_otp, "Routes through\nOld Taxi Park", "All other routes")
  ) |>
  count(category)

ggplot(otp_share, aes(x = "", y = n, fill = category)) +
  geom_col(width = 0.45) +
  geom_text(aes(label = paste0(n, "\n(", round(n/sum(n)*100, 1), "%)")),
            position = position_stack(vjust = 0.5),
            size = 4, fontface = "bold", colour = "white") +
  scale_fill_manual(values = c(
    "Routes through\nOld Taxi Park" = unname(PAL["taxi"]),
    "All other routes"              = unname(PAL["muted"])
  )) +
  coord_flip() +
  labs(title = "Share of Routes Involving Old Taxi Park",
       x = NULL, y = "Number of routes", fill = NULL) +
  theme(axis.text.y = element_blank(), legend.position = "right")

Figure 5. Nearly one in five routes in the GKMA network passes through Old Taxi Park.

This centralisation has a practical consequence. For most cross-city journeys in 2019/20 (say, from Nansana in the northwest to Mukono in the east), the fastest route almost certainly required passing through central Kampala, which added time and cost and routed passengers through already congested streets. Direct suburb-to-suburb routes that bypass the centre existed in the dataset, but there were fewer of them than demand would suggest is needed.
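The effect of a star-shaped network on cross-city trips can be checked mechanically: given a list of route endpoints, test whether two places share a direct route and, if not, where a passenger could change vehicles. The sketch below uses a made-up route list with generic place names; `has_direct()`, `neighbours()`, and `transfer_points()` are hypothetical helpers written for this illustration.

```r
# Toy route list with generic names (illustrative, not an extract of the dataset).
routes_toy <- data.frame(
  origin = c("Suburb A", "Suburb B", "Suburb C", "Suburb D"),
  dest   = c("Central hub", "Central hub", "Central hub", "Suburb C"),
  stringsAsFactors = FALSE
)

# TRUE if a single route links a and b in either direction.
has_direct <- function(a, b, routes) {
  any((routes$origin == a & routes$dest == b) |
      (routes$origin == b & routes$dest == a))
}

# Terminals reachable from x on one route, ignoring direction.
neighbours <- function(x, routes) {
  unique(c(routes$dest[routes$origin == x], routes$origin[routes$dest == x]))
}

# Terminals where a passenger could change vehicles between a and b.
transfer_points <- function(a, b, routes) {
  intersect(neighbours(a, routes), neighbours(b, routes))
}

has_direct("Suburb A", "Suburb B", routes_toy)       # no direct route
transfer_points("Suburb A", "Suburb B", routes_toy)  # only the hub connects them
```

With the real `route_terminals` table, the same functions could be used to count how many suburb pairs are linked only via the CBD terminals.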

Service Frequency

How Long Does a Passenger Wait?

Show R code

freqs_min <- freqs |>
  mutate(headway_min = as.numeric(headway_secs) / 60)

med_hw  <- median(freqs_min$headway_min)
mean_hw <- mean(freqs_min$headway_min)
pct_10  <- mean(freqs_min$headway_min <= 10) * 100
pct_30  <- mean(freqs_min$headway_min <= 30) * 100

ggplot(freqs_min, aes(x = headway_min)) +
  annotate("rect", xmin = 0, xmax = 10, ymin = 0, ymax = Inf,
           alpha = 0.07, fill = PAL["green"]) +
  geom_histogram(binwidth = 1, fill = PAL["taxi"], colour = "white",
                 linewidth = 0.25, alpha = 0.92) +
  geom_vline(xintercept = med_hw,  colour = PAL["red"],    linewidth = 0.6,
             linetype = "dashed") +
  geom_vline(xintercept = mean_hw, colour = PAL["orange"], linewidth = 0.6,
             linetype = "dotted") +
  annotate("text", x = 0.4, y = 2350,
           label  = paste0(round(pct_10, 1), "% of service\nat ≤10 min"),
           colour = PAL["green"], hjust = 0, vjust = 1,
           size = 3.3, fontface = "bold") +
  annotate("text", x = med_hw + 0.4, y = 2550,
           label  = paste0("Median: ", round(med_hw, 1), " min"),
           colour = PAL["red"], hjust = 0, size = 3.4, fontface = "bold") +
  annotate("text", x = mean_hw + 0.4, y = 2350,
           label  = paste0("Mean: ", round(mean_hw, 1), " min"),
           colour = PAL["orange"], hjust = 0, size = 3.4, fontface = "bold") +
  annotate("text", x = 29.5, y = 50,
           label  = paste0(round(pct_30, 1), "% of service shown"),
           colour = PAL["muted"], hjust = 1, size = 3.0, fontface = "italic") +
  coord_cartesian(xlim = c(0, 30), ylim = c(0, 2650)) +
  scale_x_continuous(breaks = seq(0, 30, 5),
                     labels = \(x) paste0(x, " min")) +
  scale_y_continuous(labels = comma,
                     expand = expansion(mult = c(0, 0.02))) +
  labs(
    title    = "How Long Does a Passenger Wait? Usually Less Than 10 Minutes",
    subtitle = paste0("n = 4,696 service windows  ·  tail beyond 30 min omitted (",
                      round(100 - pct_30, 1), "% of windows)"),
    x        = "Minutes between vehicles",
    y        = "Number of service windows",
    caption  = "Source: GTFS frequencies.txt — AfD / Transport for Cairo / MapUganda"
  ) +
  theme_kampala()

Figure 6. Distribution of scheduled wait times across all 4,696 service windows. The large majority of services cluster around 6–10 minutes. 86.1% of all service runs at 10 minutes or better.

The median headway (the time between vehicles) across the network is 7.8 minutes. At that frequency, a passenger arriving at a random moment at any covered stop can expect to wait about four minutes on average, roughly half the headway, before a vehicle appears. You do not need a timetable. You just go and there will be a taxi to pick you up.
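The "about four minutes" figure follows from a standard simplification: if vehicles are evenly spaced and passengers arrive at random, the average wait is half the headway. The short simulation below checks this for a 7.8-minute headway. Both assumptions are idealisations; fill-and-go vehicles rarely depart at perfectly regular intervals, and irregular spacing pushes the average wait above half the mean headway.

```r
set.seed(42)                      # for reproducibility
headway <- 7.8                    # median minutes between vehicles (from the text)

# Passenger arrival times, spread uniformly within one headway interval.
arrival <- runif(100000, min = 0, max = headway)

# The next vehicle leaves at the end of the interval, so wait = headway - arrival.
wait <- headway - arrival

mean_wait_sim    <- mean(wait)    # simulated average wait, close to 3.9
mean_wait_theory <- headway / 2   # analytic value under the same assumptions
c(simulated = mean_wait_sim, theoretical = mean_wait_theory)
```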

The Same Frequency, All Day Long

This is the most surprising finding in the data. In most formal transit systems, frequency tracks demand: more vehicles during morning and evening rush hours, fewer during quiet afternoon periods. Kampala’s taxis show almost none of this pattern.

Show R code

headway_by_window <- freqs |>
  mutate(
    start_h = gtfs_time_to_h(start_time),
    end_h   = gtfs_time_to_h(end_time),
    window_label = paste0(
      sprintf("%02.0f:00", floor(start_h)), "–",
      sprintf("%02.0f:00", floor(end_h))
    )
  ) |>
  group_by(window_label, start_h) |>
  summarise(
    avg_hw  = mean(headway_secs / 60, na.rm = TRUE),
    p10_hw  = quantile(headway_secs / 60, 0.10, na.rm = TRUE),
    p90_hw  = quantile(headway_secs / 60, 0.90, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(start_h)

ggplot(headway_by_window, aes(x = start_h, y = avg_hw)) +
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = 15, ymax = 20,
           fill = "#e74c3c", alpha = 0.08) +
  annotate("text", x = 22.5, y = 19.2,
           label = "Typical European off-peak bus (15–20 min)",
           colour = PAL["red"], size = 3.0, hjust = 1, fontface = "italic") +
  geom_ribbon(aes(ymin = p10_hw, ymax = p90_hw),
              fill = PAL["taxi"], alpha = 0.15) +
  geom_line(colour = PAL["taxi"], linewidth = 1.5) +
  geom_point(colour = PAL["taxi"], size = 3.5, fill = "white",
             shape = 21, stroke = 2) +
  geom_text(aes(label = round(avg_hw, 1)),
            vjust = -1.1, size = 3.2, fontface = "bold", colour = PAL["taxi"]) +
  scale_x_continuous(
    breaks = headway_by_window$start_h,
    labels = headway_by_window$window_label
  ) +
  scale_y_continuous(limits = c(0, 22), breaks = seq(0, 20, 5),
                     labels = \(x) paste0(x, " min")) +
  labs(
    title    = "Frequency Profile for taxis from Dawn to 11pm",
    subtitle = "Average wait time (± spread across routes) by time of day",
    x        = "Time of day",
    y        = "Average wait between vehicles (minutes)",
    caption  = "Source: GTFS frequencies.txt. European reference from UITP Global Transit Barometer (2022)."
  ) +
  theme_kampala() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

Figure 7. Average wait time by service window across the day. The line is nearly flat, varying only between 7.3 and 8.1 minutes over a 16.5-hour period. The shaded band shows the off-peak headway of a typical European city bus network (15–20 min) for comparison.

The average wait time fluctuates only between 7.3 minutes (evenings) and 8.1 minutes (mid-morning to afternoon) across the entire operating day. There is no peak boost and no off-peak collapse.
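The code for Figure 7 calls a helper, `gtfs_time_to_h()`, whose definition is not shown in this section. A minimal sketch of what such a helper might look like is given below; it is an assumption about its behaviour, not the author's implementation. It converts GTFS `HH:MM:SS` strings to decimal hours and tolerates hours of 24 or more, which GTFS permits for service that runs past midnight.

```r
# Hypothetical helper: convert GTFS "HH:MM:SS" values to decimal hours.
gtfs_time_to_h <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] + p[2] / 60 + p[3] / 3600   # hours + minutes + seconds
  }, numeric(1))
}

# 06:30 and 23:00 are the service-day bounds; 25:15 is a past-midnight time.
hours <- gtfs_time_to_h(c("06:30:00", "23:00:00", "25:15:00"))
hours
```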

The explanation is structural. Each driver earns money by carrying passengers, and as long as people are travelling on any given road at any point in the day, it is worth driving. The market mechanism produces a consistency that no timetable enforces, and the result is a system that, at 22:00 on a weeknight, still runs more frequently than most European urban buses at their peak.

Important

At the system’s median wait time of 7.8 minutes, a passenger arriving at any served stop at any time between 06:30 and 23:00 could expect to board within roughly four minutes on average. This was maintained throughout the entire day, without central coordination, published timetables, or any form of fare subsidy.

The Full Picture of Wait Times

Show R code

bucket_data <- freqs |>
  mutate(
    hw_min = as.numeric(headway_secs) / 60,
    bucket = cut(hw_min,
                 breaks = c(0, 5, 10, 15, 20, 30, 60, Inf),
                 labels = c("≤5 min", "6–10 min", "11–15 min",
                            "16–20 min", "21–30 min", "31–60 min", ">60 min"),
                 right  = TRUE)
  ) |>
  count(bucket) |>
  mutate(pct = n / sum(n) * 100)

ggplot(bucket_data, aes(x = bucket, y = pct, fill = bucket)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  geom_text(aes(label = paste0(round(pct, 1), "%")),
            vjust = -0.4, size = 3.3, fontface = "bold") +
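  # unname() keeps the band labels as the only names, so they match the bucket levels
  scale_fill_manual(values = c(
    "≤5 min"    = unname(PAL["green"]),
    "6–10 min"  = unname(PAL["teal"]),
    "11–15 min" = unname(PAL["taxi"]),
    "16–20 min" = unname(PAL["orange"]),
    "21–30 min" = unname(PAL["red"]),
    "31–60 min" = unname(PAL["muted"]),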
    ">60 min"   = "#bdc3c7"
  )) +
  scale_y_continuous(limits = c(0, 72), labels = \(x) paste0(x, "%")) +
  labs(
    title    = "86% of Service Runs Every 10 Minutes or Better",
    subtitle = "Share of service windows by wait-time band across the whole network",
    x        = "Minutes between vehicles",
    y        = "Share of service windows (%)",
    caption  = "Source: GTFS frequencies.txt — AfD / Transport for Cairo / MapUganda"
  ) +
  theme_kampala()

Figure 8. Share of service windows by wait-time band. The vast majority of service, 86.1%, runs at 10 minutes or better.

Route Characteristics

How Many Stops Does a Journey Cover?

Show R code

stops_per_trip <- stop_times |> count(trip_id, name = "n_stops")
med_spt  <- median(stops_per_trip$n_stops)
mean_spt <- mean(stops_per_trip$n_stops)

ggplot(stops_per_trip, aes(x = n_stops)) +
  geom_histogram(binwidth = 3, fill = PAL["teal"], colour = "white",
                 linewidth = 0.1, alpha = 0.9) +
  geom_vline(xintercept = med_spt,  colour = PAL["red"],    linewidth = 0.6, linetype = "dashed") +
  geom_vline(xintercept = mean_spt, colour = PAL["orange"], linewidth = 0.6, linetype = "dotted") +
  annotate("text", x = med_spt  + 1, y = 360,
           label  = paste0("Median: ", round(med_spt)),
           colour = PAL["red"],    hjust = 0, size = 3.4, fontface = "bold") +
  annotate("text", x = mean_spt + 1, y = 310,
           label  = paste0("Mean: ",   round(mean_spt, 1)),
           colour = PAL["orange"], hjust = 0, size = 3.4, fontface = "bold") +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  scale_y_continuous(labels = comma) +
  labs(
    title    = "A Typical Trip Passes Through 27 Stops",
    subtitle = paste0("n = ", fmt_n(nrow(stops_per_trip)), " trips  ·  min: ",
                      min(stops_per_trip$n_stops), "  ·  max: ", max(stops_per_trip$n_stops)),
    x        = "Number of stops served per trip",
    y        = "Number of trips",
    caption  = "Source: GTFS stop_times.txt — AfD / Transport for Cairo / MapUganda"
  ) +
  theme_kampala()

Figure 9. Distribution of stop counts per trip. The median trip serves 27 stops, reflecting the continuous pick-up and drop-off pattern that characterises informal transit. Long-distance bus routes extend to 92 stops.

The high stop count is a feature of how informal transit works. Taxis pick up and drop off passengers at recognised points continuously along the route, rather than following a fixed set of official stops. This makes them accessible (you rarely need to walk far to catch one), but it also contributes to the slow average speeds that anyone who has been in Kampala traffic will recognise.

Route Length and Fare

Show R code

fare_data <- unique_sf |>
  st_drop_geometry() |>
  filter(LEN_KM > 0, FARE_Q120 > 0) |>
  mutate(
    operator_label = if_else(agency_id == "taxi", "Shared Taxi", "Bus"),
    fare_per_km    = FARE_Q120 / LEN_KM,
    fare_band      = paste0(comma(round(FARE_Q120 / 500) * 500), " UGX"),
    fare_band      = fct_reorder(fare_band, FARE_Q120)
  )

fare_per_km_summary <- fare_data |>
  group_by(operator_label) |>
  summarise(avg_per_km = mean(fare_per_km, na.rm = TRUE), n = n(), .groups = "drop")

p_scatter <- ggplot(fare_data, aes(x = LEN_KM, y = FARE_Q120, colour = operator_label)) +
  geom_jitter(aes(shape = operator_label), width = 0.3, height = 50,
              alpha = 0.45, size = 2.2) +
  geom_smooth(data = fare_data |> filter(operator_label == "Shared Taxi"),
              method = "lm", se = TRUE,
              colour = unname(PAL["taxi"]), fill = unname(PAL["taxi"]),
              alpha = 0.10, linewidth = 0.9, linetype = "dashed", show.legend = FALSE) +
  annotate("segment", x = 0, xend = 44, y = 1000, yend = 1000,
           linetype = "dotted", colour = unname(PAL["green"]), linewidth = 0.4) +
  annotate("text", x = 44, y = 1000,
           label = "1,000 UGX — most common fare",
           hjust = 1, vjust = -0.4, size = 2.9,
           colour = unname(PAL["green"]), fontface = "italic") +
  annotate("segment", x = 0, xend = 44, y = 3000, yend = 3000,
           linetype = "dotted", colour = unname(PAL["muted"]), linewidth = 0.4) +
  annotate("text", x = 44, y = 3000,
           label = "3,000 UGX — long-distance fare",
           hjust = 1, vjust = -0.4, size = 2.9,
           colour = unname(PAL["muted"]), fontface = "italic") +
  scale_colour_manual(values = c("Shared Taxi" = unname(PAL["taxi"]), "Bus" = unname(PAL["red"]))) +
  scale_shape_manual(values = c("Shared Taxi" = 16, "Bus" = 17)) +
  scale_x_continuous(labels = \(x) paste0(x, " km"), limits = c(0, 45), expand = c(0, 0)) +
  scale_y_continuous(labels = \(x) paste0(comma(x), " UGX"), breaks = seq(500, 4000, 500)) +
  labs(
    title    = "Longer Routes Cost More",
    subtitle = "Each dot is one route",
    x        = "Route length (km)",
    y        = "Fare (UGX)",
    colour   = "Operator",
    shape    = "Operator"
  ) +
  theme_kampala() +
  theme(legend.position = "bottom")

p_scatter

Figure 10. Route length versus fare, coloured by operator. Fares rise with distance but cluster around round numbers, a signature of informal price-setting. Bus routes sit in the longer-distance, higher-fare range.
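The clustering around round numbers can be quantified with a simple check: what share of fares are exact multiples of 500 UGX, and which fare occurs most often. The sketch below uses a made-up fare vector; with the real data, the same two lines could be applied to the `FARE_Q120` column of `fare_data`. The 500 UGX step is an assumption chosen to match the fare bands used in the plotting code.

```r
# Illustrative fares in UGX (made-up values, not taken from the dataset).
fares_toy <- c(1000, 1000, 1500, 2000, 1000, 2500, 3000, 1200, 1000, 1500)

# A fare counts as "round" here if it is an exact multiple of 500 UGX.
is_round    <- fares_toy %% 500 == 0
share_round <- mean(is_round) * 100   # percentage of round fares

# Most common fare: the value with the highest count in the frequency table.
modal_fare <- as.numeric(names(which.max(table(fares_toy))))

c(share_round = share_round, modal_fare = modal_fare)
```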

Route Length Distribution

Show R code

length_data <- fare_data |>
  mutate(bucket = cut(LEN_KM,
                      breaks = c(0, 5, 10, 15, 20, 30, Inf),
                      labels = c("0–5 km", "6–10 km", "11–15 km",
                                 "16–20 km", "21–30 km", ">30 km"),
                      right  = TRUE))

bucket_counts <- length_data |>
  count(operator_label, bucket) |>
  group_by(operator_label) |>
  mutate(pct = n / sum(n) * 100) |>
  ungroup() |>
  mutate(operator_label = fct_relevel(operator_label, "Shared Taxi", "Bus"))

length_colours <- c(
  "0–5 km"    = unname(PAL["green"]),
  "6–10 km"   = unname(PAL["teal"]),
  "11–15 km"  = unname(PAL["taxi"]),
  "16–20 km"  = unname(PAL["orange"]),
  "21–30 km"  = unname(PAL["red"]),
  ">30 km"    = unname(PAL["muted"])
)

p_buckets <- ggplot(bucket_counts, aes(x = bucket, y = pct, fill = bucket)) +
  geom_col(width = 0.65, show.legend = TRUE) +
  geom_text(aes(label = paste0(round(pct, 0), "%")),
            vjust = -0.4, size = 3.2, fontface = "bold", colour = "#2c3e50") +
  scale_fill_manual(values = length_colours, name = "Length band") +
  scale_y_continuous(limits = c(0, 75), expand = c(0, 0),
                     labels = \(x) paste0(x, "%")) +
  facet_wrap(~operator_label) +
  labs(
    title    = "Most Taxi Routes Are Under 15 km",
    subtitle = "Share of routes in each distance band, by operator",
    x        = "Route length",
    y        = "Share of routes (%)"
  ) +
  theme_kampala() +
  theme(legend.position = "none", axis.text.x = element_text(angle = 30, hjust = 1))

p_buckets

Figure 11. Route length distributions by operator.

Show R code

length_summary <- fare_data |>
  group_by(operator_label) |>
  summarise(avg_km = mean(LEN_KM), med_km = median(LEN_KM), n = n(), .groups = "drop") |>
  mutate(operator_label = fct_reorder(operator_label, avg_km))

p_avg <- ggplot(length_summary, aes(y = operator_label, x = avg_km, fill = operator_label)) +
  geom_col(width = 0.45, show.legend = FALSE) +
  geom_point(aes(x = med_km), shape = 21, size = 4,
             fill = "white", colour = "#2c3e50", stroke = 1.5) +
  geom_text(aes(label = paste0("Avg: ", round(avg_km, 1), " km")),
            hjust = -0.15, size = 3.8, fontface = "bold", colour = "#2c3e50") +
  geom_text(aes(x = med_km, label = paste0("Med: ", round(med_km, 1))),
            vjust = -1.2, size = 3.0, colour = "#2c3e50") +
  annotate("text", x = 28, y = 0.55, label = "◯ = median",
           size = 3.0, colour = unname(PAL["muted"]), fontface = "italic") +
  scale_fill_manual(values = c("Shared Taxi" = unname(PAL["taxi"]), "Bus" = unname(PAL["bus"]))) +
  scale_x_continuous(limits = c(0, 35), expand = c(0, 0),
                     labels = \(x) paste0(x, " km")) +
  labs(title = "Bus Routes Are Roughly Twice as Long on Average",
       subtitle = "Average (bar) and median (circle) route length per operator",
       x = "Route length (km)", y = NULL) +
  theme_kampala()

p_avg

Figure 12. Most taxi routes are within-city trips of under 15 km. Bus routes extend much further, reflecting intercity corridors.

The Terminal Network in Detail

Show R code

stage_data <- stages_sf |>
  st_transform(32636) |>
  st_centroid() |>
  st_transform(4326) |>
  mutate(
    lon      = st_coordinates(geometry)[, 1],
    lat      = st_coordinates(geometry)[, 2],
    combined = tripspu_o + tripspu_d
  )

leaflet(stage_data) |>
  addProviderTiles(providers$CartoDB.Positron) |>
  addCircleMarkers(
    lng         = ~lon,
    lat         = ~lat,
    radius      = ~pmin(5 + combined * 0.8, 30),
    color       = "#1a3a5c",
    weight      = 1.5,
    fillColor   = ~colorNumeric("YlOrRd", tripspu_d)(tripspu_d),
    fillOpacity = 0.8,
    label       = ~paste0(Name, " — ", combined, " connections"),
    popup       = ~paste0(
      "<strong>", Name, "</strong><br>",
      "Departures/unit: ",   tripspu_o, "<br>",
      "Arrivals/unit: ",     tripspu_d, "<br>",
      "Total: ",             combined
    )
  ) |>
  addLegend(
    position = "bottomright",
    pal      = colorNumeric("YlOrRd", stage_data$tripspu_d),
    values   = ~tripspu_d,
    title    = "Arriving routes<br>per terminal",
    opacity  = 0.9
  )

Figure 13. Major transit terminals across the GKMA, sized by their combined trip activity (arrivals + departures). Old Taxi Park and nearby CBD terminals dominate. Colour shows how many routes terminate at each terminal.

What This Data Can and Cannot Tell Us

Every significant finding in this analysis flows from one basic fact: the Kampala taxi system was not, and has never been, operated by a transit authority, although a case could be made that organisations such as SACCOs, or groups of taxi owners who own and run the taxis, act as a formal authority in the background. No one planned the routes, set the frequencies, or determined the fares. Individual operators made individual decisions, day after day, and the aggregate result of those decisions is the network this data describes.

Measured against the numbers, that network holds up remarkably well:

Frequency: Median 7.8-minute wait, stable across a 16.5-hour day
Coverage: 397 routes serving 1,242 stops across 1,560 km²
Capacity: Approximately 92,000 vehicle departures per day
Operating hours: Earlier start and later finish than most formal systems

But this is where we must be careful about what we conclude from the analysis.

The Informal vs Formal Question Is Not Simple

It is tempting to look at an eight-minute median wait time and declare the informal network a success story that formal planning should leave alone. That conclusion would be too quick.

The frequency numbers are real, but frequency is only one dimension of what makes a transit system work for the people who depend on it. This data cannot tell us how long a journey actually takes door-to-door once loading time, CBD transfers, and traffic are factored in. It cannot tell us how safe those 92,000 daily vehicle movements are, or what the environmental cost of running them is. It cannot tell us whether the 1,000 UGX fare is genuinely affordable for the lowest-income residents of the metropolitan area, or whether those residents are effectively excluded from parts of the network that do not serve their neighbourhoods.

There is also a deeper cultural dimension that raw transit data cannot capture. The way time is perceived and managed in Kampala, as in much of Uganda, is not identical to the clock-driven, schedule-optimised relationship with time that underpins how Western transit systems are designed and evaluated. An eight-minute headway is meaningful in a context where people structure their days around fixed appointments. It means something different in a context where time is more relational and flexible, and where waiting is a normal part of daily life rather than a failure of the system. This does not make frequency irrelevant, but it does mean that borrowing Western planning benchmarks wholesale and applying them to Kampala requires care.

What the Data Could Not Capture

There are specific gaps in this dataset that limit what conclusions can be drawn with confidence.

Weather is one of the most significant gaps. Kampala receives heavy rainfall on many days each year, and anyone familiar with the city knows that rain does not simply slow traffic; it can change how, and whether, people travel at all. Roads flood, journeys become longer or impossible, and demand patterns shift in ways that a single annual average cannot reflect. This dataset cannot show us how the network behaves on a wet Tuesday in April versus a dry one in January, and that distinction matters enormously for understanding how reliable the system actually is in practice.

Similarly, the data captures a formal snapshot of routes and frequencies but cannot fully represent the informal flexibility that is one of the system’s genuine strengths. Drivers deviate, stage locations shift with the seasons, fares are negotiated on certain corridors, and the whole network responds to events such as political gatherings, market days, and school calendars in ways that a GTFS file cannot encode.

Reading This as a Baseline

This dataset captures Kampala’s transit system in the months immediately before COVID-19 disrupted everything. Reduced passenger demand, economic pressure on operators, and the aftermath of lockdowns almost certainly altered the network in ways this data cannot show.

That is precisely why this baseline matters. It documents what the system was capable of producing on its own terms, before external shock, and it does so with enough rigour to serve as a genuine reference point. Any future effort to formalise, expand, or reform Greater Kampala’s transit will need to grapple honestly with what is shown in the data: not as a ceiling to be preserved, but as a floor that any replacement or reform must at minimum match, and ideally exceed, for the people who make an estimated two million trips on it every day.

Notes

Item	Detail
R version	4.5.1 (2025-06-13)
Quarto version	1.5.57
Primary packages	tidyverse, sf, leaflet, kableExtra, patchwork, ggrepel
Data format	GTFS (General Transit Feed Specification)
Spatial CRS	WGS 84 (EPSG:4326)
GTFS version	1.15042020 (April 2020)
Service calendar	10 October 2019 – 29 September 2020
Data licence	Creative Commons Attribution 3.0 (CC-BY-3.0)

Daily departure estimate. The figure of approximately 92,000 daily vehicle departures was calculated by taking each route’s operating window, dividing it by the interval between successive departures on that route (the headway), and then summing across all routes. This gives a conservative lower-bound figure; actual departures may be higher during high-demand periods.
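
The calculation described above can be sketched in a few lines of base R. This is a minimal illustration under assumptions, not the code behind the published figure: the `routes` data frame, its column names (`start_hour`, `end_hour`, `headway_min`), and all of its values are hypothetical, and rounding down with `floor()` is my own choice for keeping the estimate conservative.

```r
# Hypothetical example routes; column names and values are illustrative only
routes <- data.frame(
  route_id    = c("R1", "R2", "R3"),
  start_hour  = c(5.5, 6.0, 5.0),     # start of operating window (decimal hours)
  end_hour    = c(22.0, 21.0, 22.5),  # end of operating window (decimal hours)
  headway_min = c(8, 12, 5)           # minutes between successive departures
)

# Operating window in minutes for each route
routes$window_min <- (routes$end_hour - routes$start_hour) * 60

# Departures per route: window divided by headway, rounded down (assumption)
routes$departures <- floor(routes$window_min / routes$headway_min)

# Network-wide daily estimate: sum across all routes
total_departures <- sum(routes$departures)
total_departures
```

With these made-up inputs the three routes contribute 123, 75, and 210 departures, for a total of 408. The same per-route division, summed over the real route list, is the shape of the estimate described in this note.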

Hub connectivity. Terminal names were extracted from the route name field in the dataset, which follows an Origin–Destination convention separated by a hyphen. Where route names contained multiple hyphens, only the first was used to split origin from destination.
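
The first-hyphen rule can be illustrated with a short base R sketch. The route names below are hypothetical examples rather than rows from the dataset, and the variable names are my own; the actual extraction code may differ.

```r
# Hypothetical route names following the Origin-Destination convention
route_names <- c("Old Taxi Park-Ntinda", "Kamwokya-Bukoto-Kisaasi")

# Position of the FIRST hyphen in each name (regexpr matches only once)
first_hyphen <- as.integer(regexpr("-", route_names, fixed = TRUE))

# Everything before the first hyphen is the origin;
# everything after it, including any later hyphens, is the destination
origin      <- trimws(substr(route_names, 1, first_hyphen - 1))
destination <- trimws(substr(route_names, first_hyphen + 1, nchar(route_names)))

data.frame(route_names, origin, destination)
```

For the second name this yields the origin "Kamwokya" and the destination "Bukoto-Kisaasi", which shows the side effect of the rule: any later hyphen stays inside the destination string. The sketch assumes every name contains at least one hyphen; names without one would need separate handling.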

Limitations. This dataset is a snapshot, not a census. Kampala’s taxi network is dynamic: routes open and close, fares adjust with fuel prices, and new terminals emerge as the city grows. The 2019/20 data should be read as a baseline for the period it covers, not as a fixed description of the network.

Data source: GKMA Transit Dataset, CC-BY-3.0.