Autonomous Vehicle Data: What Cities Must Build Now

Mākslīgais intelekts ražošanā un viedajās rūpnīcās
By 3L3C

Connected and autonomous vehicles depend on AI-ready data. Learn what cities must build now: governance, storage, real-time analytics, and safety controls.

Road traffic crashes still kill around 1.19 million people globally each year, according to World Health Organization estimates. That number is the uncomfortable baseline for every “smart mobility” conversation. If connected and autonomous vehicles (CAVs) reduce collisions even modestly, the payoff is measured in lives saved, not just smoother commutes.

But here’s the stance I’ll take: the hardest part of CAV adoption isn’t the vehicle. It’s the data plumbing and governance that cities and public agencies need to put in place. That was the real “where the rubber hits the road” point in the SmartCitiesWorld podcast discussion with leaders from Cloudera and IBM Storage. CAVs don’t just move people; they generate a continuous stream of sensor, video, and telemetry data that forces cities to think like high-availability, safety-critical operators.

This post sits inside our “Mākslīgais intelekts ražošanā un viedajās rūpnīcās” (AI in manufacturing and smart factories) series for a reason. The same AI patterns that make factories more efficient (real-time analytics, predictive maintenance, quality control, traceability) also apply to urban mobility. A modern city is starting to resemble a distributed factory floor: instrumented, automated, and judged by uptime.

CAVs make cities a data system first, a road system second

Answer first: If your city treats CAVs as a transport project only, you’ll miss the main requirement: an AI-ready data architecture that can ingest, govern, and act on streaming data safely.

CAVs blend two capabilities:

  • Connected vehicles: cars that exchange data with cloud services, other vehicles (V2V), and infrastructure (V2I/V2X)
  • Autonomous vehicles: cars that use AI to perceive the environment and make driving decisions

Put them together and you get an ecosystem where the vehicle is a roaming sensor platform. Every test fleet produces:

  • Camera and lidar/radar observations (often the largest volume)
  • High-frequency telemetry (speed, braking, steering angle)
  • Health signals (battery state, motor temperatures, sensor faults)
  • Event logs (near-misses, harsh braking, disengagements)
  • Map updates and “edge cases” (weird scenarios that break assumptions)

From a public-sector lens, that creates a new operational reality: mobility becomes a continuous feedback loop. Roads, signals, curbs, loading zones, and enforcement policies can’t be static. They’ll be pressured to evolve based on evidence.

Snippet you can quote: “Autonomous vehicles don’t just use data—they force cities to run on data.”

The hidden infrastructure: storage, pipelines, and decision latency

Answer first: CAV programs fail quietly when they can’t manage the “three L’s”: log volume, lineage, and latency.

The podcast guests come from data platform and storage backgrounds, and that’s telling. Autonomous mobility isn’t held back by any single AI model; it’s held back by how reliably you can handle petabytes of multimodal data, trace it, and use it fast enough.

Log volume: the petabyte problem is real

Even conservative CAV data strategies generate massive datasets—especially when video is involved. Cities may not store raw video for every mile, but they will store:

  • Triggered clips around incidents
  • Training datasets for perception and prediction
  • “Ground truth” samples for audits and safety reporting
  • Aggregated traffic patterns for planning

This has a direct smart city implication: your procurement, retention rules, and storage tiers need to be designed for scale. If your data platform struggles with a few months of pilot data, it won’t survive a multi-year roll-out.
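
To make “designed for scale” tangible, here is a back-of-the-envelope sketch of steady-state storage per data class. Every number in it (ingest rates, retention periods, fleet size) is a made-up placeholder, not a figure from the podcast or from any real fleet; swap in your own pilot measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataClass:
    name: str
    gb_per_vehicle_day: float  # hypothetical ingest rate
    retention_days: int        # hypothetical retention rule


def steady_state_tb(classes, vehicles):
    """Storage held once every class has reached its retention limit (decimal TB)."""
    total_gb = sum(
        c.gb_per_vehicle_day * c.retention_days * vehicles for c in classes
    )
    return total_gb / 1000


# Illustrative pilot profile: all values are assumptions for the sketch.
PILOT = [
    DataClass("triggered_clips", 20.0, 365),
    DataClass("telemetry", 2.0, 730),
    DataClass("aggregates", 0.1, 3650),
]

if __name__ == "__main__":
    for fleet_size in (50, 500):
        print(fleet_size, "vehicles:", round(steady_state_tb(PILOT, fleet_size), 2), "TB")
```

The point of the exercise is the shape of the formula: storage grows with rate times retention times fleet size, so retention rules are as much a cost lever as hardware.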

Lineage: trust depends on traceability

Public agencies don’t get to say “the model said so.” When a collision happens, or when residents dispute enforcement or curb changes, you need to answer:

  • What data was collected?
  • Which version of the model used it?
  • Who had access?
  • What transformations were applied?

That’s a manufacturing-grade requirement. Factories call it traceability. Cities should, too.
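
The four questions above map naturally onto a small lineage record. The sketch below is one possible shape, not a standard: the class name `LineageRecord`, its field names, and the example identifiers are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LineageRecord:
    dataset_id: str        # what data was collected
    source: str            # where it came from
    model_version: str     # which model version used it
    transformations: list = field(default_factory=list)  # what was applied
    accessed_by: list = field(default_factory=list)      # who had access

    def add_transformation(self, step: str) -> None:
        self.transformations.append(step)

    def log_access(self, user: str) -> None:
        self.accessed_by.append(user)

    def fingerprint(self) -> str:
        """SHA-256 over the record's current state, usable as a tamper check."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    record = LineageRecord("clip-001", "fleet-a/camera-front", "perception-v1.2")
    record.add_transformation("blur_faces")
    record.log_access("analyst@example.org")
    print(record.fingerprint())
```

Storing the fingerprint alongside each published report gives auditors a cheap way to confirm that the record they are reading is the one that was signed off.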

Latency: real-time isn’t a buzzword, it’s safety

For CAVs, some decisions must happen on the vehicle (edge). But cities still need near-real-time intelligence for:

  • Signal timing changes during incidents
  • Priority routing for emergency response
  • Dynamic speed management in high-risk conditions
  • Coordinating work zones and temporary closures

A practical pattern I’ve seen work: separate “seconds-level” operational data (streaming) from “days-to-months” planning data (batch/warehouse). Mixing them usually creates both cost blowups and missed SLAs.

AI in public transport: CAVs as a blueprint for smarter governance

Answer first: CAVs can improve mobility only if cities build data-driven governance that’s as modern as the vehicles.

Most public-sector CAV discussions get stuck at “when will it be fully autonomous?” That’s not the best question for 2025. The more useful question is: how do we govern mixed traffic—human drivers, automated shuttles, delivery robots, micromobility—without chaos?

Here are three governance shifts that CAVs push cities toward:

1) Evidence-based street operations

CAVs generate fine-grained information about where the city is confusing, dangerous, or poorly signed. That data can drive:

  • Better curb design (pick-up/drop-off zones that reduce double-parking)
  • Work zone standards that are machine-readable and consistent
  • Intersection redesign based on near-miss patterns, not only crash history

This is where AI can be bluntly useful: use anomaly detection on near-miss and disengagement clusters to find “risk hotspots” before someone gets hurt.
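
As a rough illustration of that idea, the sketch below bins near-miss coordinates into grid cells and flags cells whose count sits well above the citywide average. It is a deliberately simple z-score outlier check, not a production anomaly detector; the function name, the cell size, and the threshold are assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def find_hotspots(events, cell_deg=0.001, z=2.0):
    """Return (cell, count) pairs whose count exceeds mean + z * std.

    events: iterable of (lat, lon) near-miss or disengagement locations.
    cell_deg: grid cell size in degrees (assumed value for the sketch).
    """
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in events
    )
    if len(counts) < 2:
        return []  # nothing to compare against
    values = list(counts.values())
    threshold = mean(values) + z * pstdev(values)
    flagged = [(cell, n) for cell, n in counts.items() if n > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

A real deployment would normalize by traffic exposure (events per thousand vehicle passes), since busy intersections will otherwise dominate the list.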

2) Procurement that favors interoperability

Cities that lock into one vendor’s telemetry formats create long-term fragility. Your CAV data ecosystem should be designed around:

  • Open data schemas where possible
  • Clear APIs for data exchange
  • Separation of storage, compute, and analytics layers
  • Contract clauses that preserve data portability

If this sounds like “smart factory” architecture, that’s because it is. The winning factories built platforms, not point solutions.

3) Privacy and security as operating conditions

Connected vehicle data can expose sensitive patterns—home locations, routines, even identities when combined with other datasets. Public trust will collapse if privacy is treated as paperwork.

A workable posture for municipalities:

  • Minimize collection: store raw, identifiable data only when necessary
  • Strong de-identification: aggregate whenever possible
  • Purpose limitation: define what the city will not use data for
  • Security by default: encryption, key management, strict access control
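
Here is a minimal sketch of the “aggregate whenever possible” idea: drop vehicle identifiers, coarsen location and time, and suppress any group with fewer than `k` observations. The function names, the two-decimal rounding, and `k=5` are illustrative assumptions, and small-count suppression alone is not a complete privacy guarantee.

```python
from collections import Counter
from datetime import datetime


def coarsen(lat, lon, ts, decimals=2):
    """Round coordinates and truncate the timestamp to the hour."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    return (round(lat, decimals), round(lon, decimals), hour)


def aggregate_points(points, k=5):
    """Count observations per coarse cell and hour, hiding groups smaller than k.

    points: iterable of (lat, lon, datetime); vehicle IDs are never passed in.
    """
    counts = Counter(coarsen(lat, lon, ts) for lat, lon, ts in points)
    return {cell: n for cell, n in counts.items() if n >= k}


if __name__ == "__main__":
    sample = [(56.9512, 24.1051, datetime(2025, 1, 6, 8, m)) for m in range(6)]
    sample.append((56.80, 24.00, datetime(2025, 1, 6, 8, 0)))  # lone trip, suppressed
    print(aggregate_points(sample))
```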

What smart factories can teach smart cities about CAV readiness

Answer first: The fastest path to CAV readiness is to borrow proven AI patterns from manufacturing: predictive maintenance, quality control, and continuous improvement loops.

Because this post is part of the AI in manufacturing and smart factories series, let’s be concrete about the crossover.

Predictive maintenance → predictive infrastructure

Factories use AI to predict failures in machines. Cities can do the same for infrastructure that CAVs depend on:

  • Traffic signals (controller faults, timing drift)
  • Road markings (visibility degradation)
  • Smart street lighting and cameras
  • EV charging points for fleet operations

If you already run a municipal asset management system, CAV data can add a new layer: real-world performance signals. For example, repeated CAV “uncertainty” at a specific intersection can indicate faded markings or confusing signage.

Quality control → safety validation and auditability

In manufacturing, quality isn’t a slogan—it’s measurable. For CAV operations, cities will need analogous quality gates:

  • Standard definitions for “incident,” “near miss,” and “disengagement”
  • Validation datasets to test changes to routing rules or infrastructure
  • Audit trails for AI-assisted decisions (why a signal plan changed)

That’s how you avoid public backlash. People will accept automation faster when you can show consistent, inspectable controls.

Continuous improvement → policy that evolves monthly, not yearly

Traditional transport planning cycles are slow. CAV-enabled cities will be pushed toward shorter iteration loops:

  • Monthly updates to curb policies in high-demand areas
  • Seasonal signal strategies (winter conditions, holiday surges)
  • Rapid adjustments around events and construction

December is a good example. End-of-year retail traffic, logistics peaks, and weather-related disruptions create a perfect stress test for connected mobility. The cities that do well are the ones that can sense-and-respond, not just publish a plan.

A practical roadmap: what to do in the next 12 months

Answer first: You don’t need full autonomy to start. You need a CAV data strategy that’s safe, scalable, and useful.

If you’re a municipality, transport authority, or public-sector innovation team, these steps are realistic within a year.

Step 1: Define the minimum viable CAV data contract

Write down, in plain language:

  • Which datasets you need (and which you don’t)
  • Retention periods by data type
  • Ownership and access rules
  • Requirements for anonymization/aggregation
  • Incident reporting formats and timelines

Many pilots get stuck later precisely because these terms weren’t agreed upfront.
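
Once the contract exists in plain language, parts of it can be made machine-checkable. The sketch below encodes retention periods and tests whether a record is past its limit; the data type names and day counts are invented for illustration and would come from your own agreement.

```python
from datetime import date, timedelta

# Hypothetical retention rules, in days, keyed by data type.
RETENTION_DAYS = {
    "raw_video_clip": 30,
    "telemetry": 365,
    "aggregate_counts": 1825,
}


def is_expired(data_type, collected_on, today):
    """True if the record has outlived the contract's retention period.

    Raises KeyError for data types the contract does not cover, which is
    intentional: uncovered data should not be silently retained.
    """
    limit = RETENTION_DAYS[data_type]
    return today - collected_on > timedelta(days=limit)


if __name__ == "__main__":
    print(is_expired("raw_video_clip", date(2025, 1, 1), date(2025, 2, 15)))
```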

Step 2: Build a two-speed data platform

Architect for:

  1. Real-time operations (streaming ingestion, dashboards, alerting)
  2. Planning and oversight (analytics, reporting, model governance)

Keep them connected, but don’t let them compete for the same pipelines and budgets.
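
One way to picture the two-speed split is a router that sends each incoming event down exactly one path. This is a toy in-memory sketch: the event type names and the `TwoSpeedRouter` class are hypothetical, and a real platform would put a streaming system and a warehouse behind the two paths.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumed set of event types that need seconds-level handling.
OPERATIONAL_TYPES = {"incident", "signal_fault", "emergency_vehicle"}


@dataclass
class Event:
    event_type: str
    payload: dict = field(default_factory=dict)


class TwoSpeedRouter:
    def __init__(self):
        self.streaming = deque()  # real-time operations path
        self.batch = []           # planning and oversight path

    def route(self, event: Event) -> str:
        """Place the event on one path and report which one was chosen."""
        if event.event_type in OPERATIONAL_TYPES:
            self.streaming.append(event)
            return "streaming"
        self.batch.append(event)
        return "batch"


if __name__ == "__main__":
    router = TwoSpeedRouter()
    print(router.route(Event("incident", {"junction": "J12"})))
    print(router.route(Event("daily_counts", {"vehicles": 1840})))
```

Keeping the routing rule explicit makes it easy to review which data types are allowed to consume the expensive low-latency path.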

Step 3: Pick three “public value” use cases and measure them

Choose outcomes residents will actually notice:

  • Faster emergency response through signal preemption coordination
  • Reduced congestion around schools via curb and routing management
  • Fewer crashes at a known blackspot through redesign triggered by near-miss data

Then set targets. If you can’t measure improvement, you can’t defend the program.
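
Measuring against a target can be as simple as a before-and-after comparison. The helper names and the numbers in the example below are hypothetical; the sketch ignores seasonality and traffic volume, which a real evaluation would need to control for.

```python
def percent_change(baseline, current):
    """Relative change from baseline to current, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100


def target_met(baseline, current, target_reduction_pct):
    """True if the metric fell by at least the targeted percentage."""
    return -percent_change(baseline, current) >= target_reduction_pct


if __name__ == "__main__":
    # Illustrative only: 40 near-misses per month before, 30 after.
    print(percent_change(40, 30))
    print(target_met(40, 30, target_reduction_pct=20))
```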

Step 4: Create a cross-functional CAV governance group

Include:

  • Transport operations
  • IT/data platform owners
  • Legal/privacy
  • Emergency services
  • Public communications

CAV projects fail when they live in a single department. They succeed when operations and data governance share accountability.

What this means for AI-led public services in 2026

Connected and autonomous vehicles are often framed as a futuristic add-on. I see them as a forcing function: they pressure cities to modernize how they manage data, safety, and infrastructure. If that modernization happens, it spills into other services fast—waste collection routing, winter road maintenance, permitting, even public safety analytics.

For readers following our Mākslīgais intelekts ražošanā un viedajās rūpnīcās series, the message is consistent: AI succeeds when it’s treated as an operating system, not a pilot project. CAVs simply make that visible on the street.

If your city wants CAV benefits—safer roads, more reliable transit, cleaner logistics—start by building the boring parts: data standards, storage tiers, audit trails, and clear governance. That’s not glamorous work, but it’s the work that scales.

Where do you think your organization would feel the strain first: real-time operations, data governance, or public trust?
