Connected and autonomous vehicles depend on AI-ready data. Learn what cities must build now: governance, storage, real-time analytics, and safety controls.

Autonomous Vehicle Data: What Cities Must Build Now
As of 2024, road traffic crashes were still killing around 1.19 million people globally each year, according to widely cited public health reporting from the World Health Organization. That number is the uncomfortable baseline for every "smart mobility" conversation. If connected and autonomous vehicles (CAVs) reduce collisions even modestly, the payoff is measured in lives saved, not just smoother commutes.
But here's the stance I'll take: the hardest part of CAV adoption isn't the vehicle. It's the data plumbing and governance that cities and public agencies need to put in place. That's the real "where the rubber hits the road" moment from the SmartCitiesWorld podcast discussion with leaders from Cloudera and IBM Storage. CAVs don't just move people; they generate a continuous stream of sensor, video, and telemetry data that forces cities to think like high-availability, safety-critical operators.
This post sits inside our "Mākslīgais intelekts ražošanā un viedajās rūpnīcās" (AI in manufacturing and smart factories) series for a reason. The same AI patterns that make factories more efficient (real-time analytics, predictive maintenance, quality control, traceability) also apply to urban mobility. A modern city is starting to resemble a distributed factory floor: instrumented, automated, and judged by uptime.
CAVs make cities a data system first, a road system second
Answer first: If your city treats CAVs as a transport project only, you'll miss the main requirement: an AI-ready data architecture that can ingest, govern, and act on streaming data safely.
CAVs blend two capabilities:
- Connected vehicles: cars that exchange data with cloud services, other vehicles (V2V), and infrastructure (V2I/V2X)
- Autonomous vehicles: cars that use AI to perceive the environment and make driving decisions
Put them together and you get an ecosystem where the vehicle is a roaming sensor platform. Every test fleet produces:
- Camera and lidar/radar observations (often the largest volume)
- High-frequency telemetry (speed, braking, steering angle)
- Health signals (battery state, motor temperatures, sensor faults)
- Event logs (near-misses, harsh braking, disengagements)
- Map updates and "edge cases" (weird scenarios that break assumptions)
From a public-sector lens, that creates a new operational reality: mobility becomes a continuous feedback loop. Roads, signals, curbs, loading zones, and enforcement policies can't be static. They'll be pressured to evolve based on evidence.
Snippet you can quote: "Autonomous vehicles don't just use data; they force cities to run on data."
The hidden infrastructure: storage, pipelines, and decision latency
Answer first: CAV programs fail quietly when they can't manage the "three L's": log volume, lineage, and latency.
The podcast guests come from data platform and storage backgrounds, and that's telling. Autonomous mobility isn't blocked by a single AI model; it's blocked by the difficulty of reliably handling petabytes of multimodal data, tracing it, and using it fast enough.
Log volume: the petabyte problem is real
Even conservative CAV data strategies generate massive datasets, especially when video is involved. Cities may not store raw video for every mile, but they will store:
- Triggered clips around incidents
- Training datasets for perception and prediction
- "Ground truth" samples for audits and safety reporting
- Aggregated traffic patterns for planning
This has a direct smart city implication: your procurement, retention rules, and storage tiers need to be designed for scale. If your data platform struggles with a few months of pilot data, it won't survive a multi-year roll-out.
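A quick way to pressure-test storage tiers is to do the sizing arithmetic before procurement. The sketch below is a back-of-envelope estimate only: `FleetProfile`, `retained_terabytes`, and every number in the example are hypothetical planning assumptions, not measurements from any real fleet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetProfile:
    """Hypothetical pilot parameters; every value is a planning assumption."""
    vehicles: int
    hours_per_vehicle_per_day: float
    raw_gb_per_hour: float     # sensor and video output per vehicle-hour
    retained_fraction: float   # share of raw data kept (clips, samples)


def retained_terabytes(profile: FleetProfile, days: int) -> float:
    """Estimate retained storage in TB (decimal, 1 TB = 1000 GB)."""
    if not 0.0 <= profile.retained_fraction <= 1.0:
        raise ValueError("retained_fraction must be between 0 and 1")
    raw_gb = (profile.vehicles * profile.hours_per_vehicle_per_day
              * profile.raw_gb_per_hour * days)
    return raw_gb * profile.retained_fraction / 1000.0


# Illustrative inputs only: a 10-vehicle pilot keeping 5% of raw output.
pilot = FleetProfile(vehicles=10, hours_per_vehicle_per_day=8,
                     raw_gb_per_hour=100, retained_fraction=0.05)
print(f"90-day pilot: {retained_terabytes(pilot, days=90):.1f} TB")
print(f"3-year roll-out: {retained_terabytes(pilot, days=3 * 365):.1f} TB")
```

Swapping in your own fleet size and retention share shows how quickly a pilot-sized platform gets outgrown, which is the point to settle before contracts are signed.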
Lineage: trust depends on traceability
Public agencies don't get to say "the model said so." When a collision happens, or when residents dispute enforcement or curb changes, you need to answer:
- What data was collected?
- Which version of the model used it?
- Who had access?
- What transformations were applied?
That's a manufacturing-grade requirement. Factories call it traceability. Cities should, too.
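The four questions above map naturally onto a lineage record. Here is a minimal sketch, assuming a hypothetical `LineageRecord` structure of my own naming; a real platform would likely rely on its catalog or metadata service rather than hand-rolled records.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record answering the four traceability questions."""
    dataset_id: str          # what data was collected
    model_version: str       # which version of the model used it
    accessed_by: tuple       # who had access (roles or user IDs)
    transformations: tuple   # ordered steps applied to the data

    def fingerprint(self) -> str:
        """SHA-256 over the record; any later edit changes the digest."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


record = LineageRecord(
    dataset_id="near-miss-clips-batch-001",        # illustrative ID
    model_version="perception-v2",                 # illustrative version
    accessed_by=("safety-audit-team",),
    transformations=("blur_faces", "downsample_video"),
)
print(record.fingerprint())
```

Storing the fingerprint alongside the record gives auditors a cheap way to check that the lineage entry they are reading is the one that was originally written.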
Latency: real-time isn't a buzzword, it's safety
For CAVs, some decisions must happen on the vehicle (edge). But cities still need near-real-time intelligence for:
- Signal timing changes during incidents
- Priority routing for emergency response
- Dynamic speed management in high-risk conditions
- Coordinating work zones and temporary closures
A practical split I've seen work: separate "seconds-level" operational data (streaming) from "days-to-months" planning data (batch/warehouse). Mixing them usually creates both cost blowups and missed service-level agreements (SLAs).
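One way to make that separation explicit is to give every data feed a freshness budget and route on it. This is a sketch under stated assumptions: the `Feed` type, the `assign_path` function, and the 5-second cut-off are all hypothetical, and the right threshold depends on your own operations.

```python
from dataclasses import dataclass

# Assumed cut-off for "seconds-level" data; tune to local requirements.
STREAMING_BUDGET_SECONDS = 5.0


@dataclass(frozen=True)
class Feed:
    name: str
    max_latency_seconds: float  # how stale the data may be and still be useful


def assign_path(feed: Feed) -> str:
    """Route a feed to the streaming or the batch side of the platform."""
    if feed.max_latency_seconds <= STREAMING_BUDGET_SECONDS:
        return "streaming"
    return "batch"


feeds = [
    Feed("emergency_vehicle_position", max_latency_seconds=1.0),
    Feed("work_zone_closures", max_latency_seconds=5.0),
    Feed("monthly_traffic_patterns", max_latency_seconds=30 * 24 * 3600),
]
for feed in feeds:
    print(feed.name, "->", assign_path(feed))
```

Writing the budget down per feed also forces the conversation about which data genuinely needs the more expensive streaming path.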
AI in public transport: CAVs as a blueprint for smarter governance
Answer first: CAVs can improve mobility only if cities build data-driven governance that's as modern as the vehicles.
Most public-sector CAV discussions get stuck at "when will it be fully autonomous?" That's not the best question for 2025. The more useful question is: how do we govern mixed traffic (human drivers, automated shuttles, delivery robots, micromobility) without chaos?
Here are three governance shifts that CAVs push cities toward:
1) Evidence-based street operations
CAVs generate fine-grained information about where the city is confusing, dangerous, or poorly signed. That data can drive:
- Better curb design (pick-up/drop-off zones that reduce double-parking)
- Work zone standards that are machine-readable and consistent
- Intersection redesign based on near-miss patterns, not only crash history
This is where AI can be bluntly useful: use anomaly detection on near-miss and disengagement clusters to find "risk hotspots" before someone gets hurt.
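Hotspot detection does not have to start with a sophisticated model. The sketch below bins events into a coarse grid and flags cells whose counts sit well above the citywide average (a simple z-score test). The function name, grid size, and threshold are assumptions for illustration; a production system would more likely snap events to the road network and account for traffic exposure.

```python
from collections import Counter
from statistics import mean, pstdev


def risk_hotspots(events, cell_size=0.001, z_threshold=2.0):
    """Return grid cells whose event count is unusually high.

    events: iterable of (lat, lon) pairs for near-misses or disengagements.
    cell_size: grid resolution in degrees (assumed value).
    """
    counts = Counter(
        (round(lat / cell_size), round(lon / cell_size)) for lat, lon in events
    )
    values = list(counts.values())
    if len(values) < 2:
        return []  # not enough cells to compare against
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every cell looks the same, nothing stands out
    return sorted(
        cell for cell, n in counts.items() if (n - mu) / sigma >= z_threshold
    )


# Synthetic example: ten quiet locations and one busy one.
quiet = [(56.95 + i * 0.01, 24.10) for i in range(10)]
busy = [(56.90, 24.20)] * 20
print(risk_hotspots(quiet + busy))
```

Raw counts favor busy streets, so dividing by traffic volume before scoring is a sensible next refinement.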
2) Procurement that favors interoperability
Cities that lock into one vendor's telemetry formats create long-term fragility. Your CAV data ecosystem should be designed around:
- Open data schemas where possible
- Clear APIs for data exchange
- Separation of storage, compute, and analytics layers
- Contract clauses that preserve data portability
If this sounds like "smart factory" architecture, that's because it is. The winning factories built platforms, not point solutions.
3) Privacy and security as operating conditions
Connected vehicle data can expose sensitive patterns: home locations, routines, even identities when combined with other datasets. Public trust will collapse if privacy is treated as paperwork.
A workable posture for municipalities:
- Minimize collection: store raw, identifiable data only when necessary
- Strong de-identification: aggregate whenever possible
- Purpose limitation: define what the city will not use data for
- Security by default: encryption, key management, strict access control
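To make "aggregate whenever possible" concrete, here is a minimal sketch of small-count suppression: trips are counted per zone and hour, and any cell with too few trips is dropped before publication. The function name and the threshold of 5 are assumptions. This step alone does not guarantee anonymity; it is one control among the several listed above.

```python
from collections import Counter


def aggregate_trips(trips, min_count=5):
    """Count trips per (zone, hour) and suppress sparse cells.

    trips: iterable of (zone_id, hour_of_day) pairs with no vehicle
    or person identifiers attached.
    min_count: assumed publication threshold; set per local policy.
    """
    counts = Counter((zone, hour) for zone, hour in trips)
    return {cell: n for cell, n in counts.items() if n >= min_count}


trips = [("zone-a", 8)] * 12 + [("zone-b", 8)] * 2 + [("zone-a", 23)] * 5
print(aggregate_trips(trips))
```

The two late-evening trips in `zone-b` never reach the published table, which is exactly the kind of sparse record that could point to one household.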
What smart factories can teach smart cities about CAV readiness
Answer first: The fastest path to CAV readiness is to borrow proven AI patterns from manufacturing: predictive maintenance, quality control, and continuous improvement loops.
Because this post is part of the AI in manufacturing and smart factories series, let's be concrete about the crossover.
Predictive maintenance → predictive infrastructure
Factories use AI to predict failures in machines. Cities can do the same for infrastructure that CAVs depend on:
- Traffic signals (controller faults, timing drift)
- Road markings (visibility degradation)
- Smart street lighting and cameras
- EV charging points for fleet operations
If you already run a municipal asset management system, CAV data can add a new layer: real-world performance signals. For example, repeated CAV "uncertainty" at a specific intersection can indicate faded markings or confusing signage.
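A simple rule can turn those uncertainty signals into maintenance candidates. The sketch below assumes vehicles report a hypothetical pair of intersection ID and vehicle ID whenever perception confidence drops. Both thresholds are illustrative. Requiring several distinct vehicles helps separate an infrastructure problem from one faulty sensor.

```python
from collections import defaultdict


def flag_intersections(reports, min_vehicles=3, min_events=5):
    """Flag sites where several distinct vehicles repeatedly report uncertainty.

    reports: iterable of (intersection_id, vehicle_id) pairs.
    """
    events = defaultdict(int)
    vehicles = defaultdict(set)
    for intersection_id, vehicle_id in reports:
        events[intersection_id] += 1
        vehicles[intersection_id].add(vehicle_id)
    return sorted(
        site for site in events
        if events[site] >= min_events and len(vehicles[site]) >= min_vehicles
    )


reports = (
    [("int-12", f"veh-{i % 4}") for i in range(8)]   # 8 events, 4 vehicles
    + [("int-30", "veh-9")] * 10                     # 10 events, 1 vehicle
    + [("int-44", "veh-1"), ("int-44", "veh-2")]     # too few events
)
print(flag_intersections(reports))
```

Here only `int-12` is flagged for inspection, while `int-30` looks more like a single vehicle that needs its own sensors checked.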
Quality control → safety validation and auditability
In manufacturing, quality isn't a slogan; it's measurable. For CAV operations, cities will need analogous quality gates:
- Standard definitions for "incident," "near miss," and "disengagement"
- Validation datasets to test changes to routing rules or infrastructure
- Audit trails for AI-assisted decisions (why a signal plan changed)
That's how you avoid public backlash. People will accept automation faster when you can show consistent, inspectable controls.
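Standard definitions become inspectable once they are written as one shared rule set rather than left to each operator. The sketch below is illustrative only: the `EventType` names follow the three terms in the list above, but the classification rules and the 1.5-second time-to-collision threshold are my assumptions, not an established standard.

```python
from enum import Enum
from typing import Optional


class EventType(Enum):
    INCIDENT = "incident"
    NEAR_MISS = "near_miss"
    DISENGAGEMENT = "disengagement"


# Hypothetical threshold: the minimum time-to-collision below which an
# encounter counts as a near miss. Agree the real value with operators.
NEAR_MISS_TTC_SECONDS = 1.5


def classify_event(contact: bool, min_ttc_seconds: float,
                   human_takeover: bool) -> Optional[EventType]:
    """Apply one shared rule set so every operator reports the same way."""
    if contact:
        return EventType.INCIDENT
    if min_ttc_seconds < NEAR_MISS_TTC_SECONDS:
        return EventType.NEAR_MISS
    if human_takeover:
        return EventType.DISENGAGEMENT
    return None  # nothing reportable under these rules


print(classify_event(contact=False, min_ttc_seconds=0.8, human_takeover=True))
```

The ordering is a design choice: the most severe label wins, so a takeover during a close call is still counted as a near miss.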
Continuous improvement → policy that evolves monthly, not yearly
Traditional transport planning cycles are slow. CAV-enabled cities will be pushed toward shorter iteration loops:
- Monthly updates to curb policies in high-demand areas
- Seasonal signal strategies (winter conditions, holiday surges)
- Rapid adjustments around events and construction
December is a good example. End-of-year retail traffic, logistics peaks, and weather-related disruptions create a perfect stress test for connected mobility. The cities that do well are the ones that can sense-and-respond, not just publish a plan.
A practical roadmap: what to do in the next 12 months
Answer first: You don't need full autonomy to start. You need a CAV data strategy that's safe, scalable, and useful.
If you're a municipality, transport authority, or public-sector innovation team, these steps are realistic within a year.
Step 1: Define the minimum viable CAV data contract
Write down, in plain language:
- Which datasets you need (and which you don't)
- Retention periods by data type
- Ownership and access rules
- Requirements for anonymization/aggregation
- Incident reporting formats and timelines
Many pilots get stuck later because these terms weren't agreed upfront.
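Once the plain-language contract exists, the retention part is easy to make machine-checkable. A minimal sketch follows. The data type names and the day counts in `RETENTION_DAYS` are hypothetical placeholders, and real values should come from local law and policy.

```python
from datetime import date, timedelta

# Hypothetical retention periods by data type, in days.
RETENTION_DAYS = {
    "triggered_clip": 90,
    "telemetry": 365,
    "aggregated_counts": 1825,
}


def is_expired(data_type: str, collected_on: date, today: date) -> bool:
    """True once a record has outlived its agreed retention period."""
    if data_type not in RETENTION_DAYS:
        # Refuse to guess: unknown types mean the contract has a gap.
        raise KeyError(f"no retention rule agreed for {data_type!r}")
    keep_until = collected_on + timedelta(days=RETENTION_DAYS[data_type])
    return today > keep_until


print(is_expired("triggered_clip", date(2025, 1, 1), date(2025, 4, 2)))
```

Raising an error for unlisted data types is deliberate: a dataset that nobody wrote a rule for is the gap you want to find during the pilot, not after it.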
Step 2: Build a two-speed data platform
Architect for:
- Real-time operations (streaming ingestion, dashboards, alerting)
- Planning and oversight (analytics, reporting, model governance)
Keep them connected, but don't let them compete for the same pipelines and budgets.
Step 3: Pick three "public value" use cases and measure them
Choose outcomes residents will actually notice:
- Faster emergency response through signal preemption coordination
- Reduced congestion around schools via curb and routing management
- Fewer crashes at a known blackspot through redesign triggered by near-miss data
Then set targets. If you can't measure improvement, you can't defend the program.
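Measuring against a target can be as plain as comparing a baseline period with the current one. The helper names and the example figures below are invented for illustration. A real evaluation would also need to account for seasonality and traffic volume before claiming an effect.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change from baseline, in percent (negative means a drop)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def target_met(baseline: float, current: float,
               target_reduction_pct: float) -> bool:
    """True if the metric fell by at least the agreed percentage."""
    return percent_change(baseline, current) <= -target_reduction_pct


# Illustrative figures: near-miss reports per quarter at one blackspot.
baseline_near_misses = 40
current_near_misses = 30
print(percent_change(baseline_near_misses, current_near_misses))
print(target_met(baseline_near_misses, current_near_misses,
                 target_reduction_pct=20))
```

Agreeing the baseline period and the target percentage before the intervention starts is what makes the result defensible later.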
Step 4: Create a cross-functional CAV governance group
Include:
- Transport operations
- IT/data platform owners
- Legal/privacy
- Emergency services
- Public communications
CAV projects fail when they live in a single department. They succeed when operations and data governance share accountability.
What this means for AI-led public services in 2026
Connected and autonomous vehicles are often framed as a futuristic add-on. I see them as a forcing function: they pressure cities to modernize how they manage data, safety, and infrastructure. If that modernization happens, it spills into other services fast: waste collection routing, winter road maintenance, permitting, even public safety analytics.
For readers following our "Mākslīgais intelekts ražošanā un viedajās rūpnīcās" series, the message is consistent: AI succeeds when it's treated as an operating system, not a pilot project. CAVs simply make that visible on the street.
If your city wants CAV benefits (safer roads, more reliable transit, cleaner logistics), start by building the boring parts: data standards, storage tiers, audit trails, and clear governance. That's not glamorous work, but it's the work that scales.
Where do you think your organization would feel the strain first: real-time operations, data governance, or public trust?