How Perfect Can Your Autonomous Vehicle Data Annotations Be?

At Kognic, we're obsessed with data quality. It's not just a feature—it's our foundation. When developing advanced driver assistance systems (ADAS) and autonomous driving (AD) technologies, the data we're annotating sometimes presents genuine ambiguities. Camera resolution limitations, environmental factors, and sensor variance all introduce uncertainty (explore more factors in our detailed blog post). A critical question for automotive AI teams is: what annotation quality levels can you reasonably expect? This becomes particularly crucial when evaluating automation tools—how close to "perfect" can annotations realistically be?

Three meticulous annotations (blue boxes) of the same vehicle. What level of agreement is actually achievable?

To provide automotive engineers with concrete benchmarks, we conducted comprehensive in-house experiments using 200 typical ADAS/AD camera images annotated by 14 professional annotators following rigorous Kognic quality standards. The dataset contained approximately 2,500 objects, resulting in 35,000 annotated bounding boxes—giving us high statistical confidence in our findings.

For maximum accuracy, we established ground truth through "wisdom of the crowd," averaging all 14 annotations to create a reference standard for each object (illustrated by the dashed red line above). We then measured pixel deviations between individual annotations and this reference. The results reveal important insights for automotive perception teams:
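To make the procedure concrete, here is a minimal sketch of how a "wisdom of the crowd" reference box and per-edge pixel deviations can be computed. This is illustrative only, not our production pipeline: the [x_min, y_min, x_max, y_max] box format, function names, and simulated annotator noise are all assumptions for the example.

```python
import numpy as np

def consensus_box(annotations: np.ndarray) -> np.ndarray:
    """Average N annotator boxes (shape (N, 4), [x_min, y_min, x_max, y_max])
    into a single 'wisdom of the crowd' reference box."""
    return annotations.mean(axis=0)

def edge_deviations(annotations: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Signed per-edge pixel deviations of each annotation from the reference."""
    return annotations - reference  # shape (N, 4)

# Example: 14 annotators boxing the same vehicle, with simulated annotator noise
boxes = np.array([[102, 215, 348, 410]] * 14, dtype=float)
boxes += np.random.normal(scale=1.5, size=boxes.shape)

ref = consensus_box(boxes)
dev = edge_deviations(boxes, ref)
print("max absolute deviation (px):", np.abs(dev).max())
```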

While professional annotators achieve remarkable consistency most of the time (evidenced by pixel deviations clustered near zero), we discovered a significant—and perhaps surprising—number of larger deviations. The most extreme cases reached approximately 400 pixels from reference, creating what statisticians call a "heavy tail" distribution. Our investigation identified two primary factors driving these annotation variances—critical knowledge for automotive AI development teams.

Truncated objects (like this yellow cab) force annotators to estimate dimensions beyond visible boundaries—considerably more challenging than annotating fully visible vehicles.

First, objects partially outside image boundaries introduce significant ambiguity. When annotators must estimate how far a vehicle extends beyond visible edges, substantial variations naturally emerge. Below are the deviation patterns specifically for these truncated objects:

While truncation issues can be mitigated through annotation guidelines that avoid extrapolation beyond image borders, the second major challenge proves more difficult to overcome: occlusion.

Occluded objects (like this car behind the yellow cab) create uncertainty about their true dimensions—another major challenge compared to fully visible vehicles.

When analyzing only non-truncated but occluded objects, we see this distribution pattern. Occlusion clearly introduces significant annotation variability, though slightly less severe than truncation:

After identifying truncation and occlusion as the primary sources of annotation ambiguity (challenges that cannot be fully resolved without additional sensor data like LiDAR or multi-camera inputs), we have positive news: non-truncated, non-occluded objects can be annotated with exceptional precision:

This clean half-normal distribution without heavy tails reveals an important benchmark: for fully visible objects in 2D images, expert annotators rarely deviate more than 2 pixels from ground truth. However, this precision decreases significantly once objects become partially occluded or truncated—a crucial consideration when developing perception systems for complex urban environments.
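If you want to sanity-check your own 2D annotations against this benchmark, a half-normal fit plus a simple threshold check is easy to reproduce. The sketch below uses simulated deviation magnitudes purely for illustration; substitute the absolute pixel deviations from your own review workflow.

```python
import numpy as np
from scipy.stats import halfnorm

# Hypothetical absolute per-edge deviations (pixels) for fully visible objects
deviations = np.abs(np.random.normal(scale=0.8, size=10_000))

# Fit a half-normal with the location pinned at zero
loc, scale = halfnorm.fit(deviations, floc=0)
within_2px = (deviations <= 2.0).mean()

print(f"fitted half-normal scale: {scale:.2f} px")
print(f"share of annotations within 2 px of the reference: {within_2px:.1%}")
```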

While pixel deviation effectively measures 2D annotation accuracy, automotive AI increasingly relies on 3D annotations. For these, Intersection over Union (IoU) serves as the standard metric—comparing two geometric shapes with a score from 0 (no overlap) to 1 (perfect match). Our parallel experiment with 3D LiDAR point cloud annotations of non-truncated, non-occluded objects yielded these results:
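For readers less familiar with the metric, here is a minimal IoU sketch for axis-aligned boxes, which works in both 2D (images) and 3D (point clouds). Note that production cuboid annotations also carry a heading angle, so real 3D IoU computations handle rotated overlap; this simplified example omits that.

```python
import numpy as np

def axis_aligned_iou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """IoU of two axis-aligned boxes given as [min_corner, max_corner],
    shape (2, D) with D = 2 for images or D = 3 for point clouds."""
    inter_min = np.maximum(box_a[0], box_b[0])
    inter_max = np.minimum(box_a[1], box_b[1])
    inter_dims = np.clip(inter_max - inter_min, 0.0, None)  # no overlap -> 0
    intersection = inter_dims.prod()
    vol_a = (box_a[1] - box_a[0]).prod()
    vol_b = (box_b[1] - box_b[0]).prod()
    return float(intersection / (vol_a + vol_b - intersection))

# Two nearly identical 3D cuboids (metres): IoU comes out just above 0.95
a = np.array([[0.0, 0.0, 0.0], [4.2, 1.8, 1.5]])
b = np.array([[0.1, 0.0, 0.0], [4.3, 1.8, 1.5]])
print(f"IoU = {axis_aligned_iou(a, b):.3f}")
```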

The beta distribution pattern suggests carefully annotated, fully visible objects in 3D point clouds can consistently achieve IoU values above 0.9 compared to ground truth. Given the greater complexity of 3D annotation, this represents an achievable benchmark rather than a hard requirement.

For automotive AI teams developing perception systems, these insights provide valuable benchmarks: expect pixel deviations under 2 pixels for optimal 2D annotations and IoU values exceeding 0.9 for optimal 3D annotations. These metrics should inform your error tolerance specifications and quality assurance protocols for annotation projects powering your autonomous systems.
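One way to turn these benchmarks into a concrete quality gate is to flag annotations for manual review whenever they miss either target. The sketch below is an assumption-laden example, not a prescribed workflow: the thresholds, class names, and fields are illustrative and should be tuned to your own tolerance specification.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative quality gates derived from the benchmarks above; tune per project.
MAX_PIXEL_DEVIATION = 2.0   # fully visible objects, 2D images
MIN_IOU_3D = 0.9            # fully visible objects, 3D point clouds

@dataclass
class AnnotationCheck:
    object_id: str
    pixel_deviation: Optional[float] = None  # 2D review metric, in pixels
    iou_3d: Optional[float] = None           # 3D review metric vs. reference

def needs_review(check: AnnotationCheck) -> bool:
    """Flag an annotation for manual review if it misses either benchmark."""
    if check.pixel_deviation is not None and check.pixel_deviation > MAX_PIXEL_DEVIATION:
        return True
    if check.iou_3d is not None and check.iou_3d < MIN_IOU_3D:
        return True
    return False

print(needs_review(AnnotationCheck("car_042", pixel_deviation=3.4)))  # True
print(needs_review(AnnotationCheck("ped_007", iou_3d=0.94)))          # False
```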

Note: While these findings apply broadly to automotive perception annotation, your specific project requirements may vary. Consult with Kognic's advisory team to discuss optimal quality targets for your particular ADAS/AD development needs.