Kognic Blog

Measuring data quality efficiently

In this blog post, we explore...

  • why high-quality annotated data is foundational to autonomous vehicle safety
  • the challenges and costs of measuring annotation quality at scale
  • how Kognic's statistical approach enables efficient, reliable quality assessment

High-quality data is the foundation of safe and reliable autonomous systems. Modern perception systems in autonomous vehicles rely on sensors—cameras, LiDAR, and radar—to understand their environment in real-time. But raw sensor data alone isn't enough. The algorithms that power these systems must be trained on annotated data—precisely labeled examples that teach them what objects look like and how to interpret complex driving scenarios. At Kognic, we combine human feedback with automation to deliver the high-quality annotated data that makes autonomous perception possible.

 

[Figure: examples of annotation types - a 2D Bounding Box, a 2D Segmentation, and a 3D Cuboid]

Annotated data is paramount when preparing the perception system of a car with some level of autonomy for the streets. It serves two purposes: training the algorithms to interpret collected information, and validating that the system has learned to correctly interpret that information. Since annotated data is used for both these critical purposes, the quality of annotations is of utmost importance. Low quality annotations might, in the end, cause a car to misinterpret what is happening on the road.

The process of annotating data always includes human decisions. The first challenge is having humans agree on what constitutes a correct annotation of the recorded data. Creating annotation guidelines is sometimes not as straightforward as one might think. At Kognic, we're experienced in efficiently designing annotation guidelines that enhance quality, and we'll share some of our insights in a later blog post.

The second challenge is performing annotations at scale, guided by those guidelines. At Kognic, we combine human expertise with machine learning automation in a carefully designed environment to annotate large amounts of data as efficiently as possible. Our goal is simple: help you get the most annotated autonomy data for your budget. We'll share more about our experience on these topics in later blog posts.

We understand that annotations are safety-critical, but also that they must be acquired cost-efficiently. Our top priority is to continuously develop and optimize our annotation tools and processes to maximize productivity. To guide this development, we need tools for measuring annotation quality so we can ensure we always reach the required quality while increasing annotation efficiency. This is an ongoing process, as there are many aspects of data quality and it's not yet fully known what impact different aspects have on modern machine learning methods. Kognic is actively advancing this knowledge.

One way to quantify annotation quality is through the precision and recall of an annotated dataset. Let us explain: consider annotations where an object (like an approaching vehicle) in a camera image is annotated by a bounding box. When reasoning about dataset quality, two important questions are: (i) whether an object of interest has been correctly annotated by a bounding box, and (ii) whether a bounding box actually contains an object of interest.

[Figure: raw data and a proposed annotation, illustrating the all-correct, false-negative, and false-positive cases]

In a perfectly annotated dataset, neither of these mistakes is present. One way to define quality is to compute the extent to which these mistakes exist in an annotated dataset. We could for instance compute:

  1. The ratio of bounding boxes that actually denote an object. This is known as the precision. Ideally the precision is 1.
  2. The ratio of objects that are correctly annotated with a bounding box. This is known as the recall. Ideally the recall is 1.
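Given the outcome of a critical review, these two ratios are simple to compute. The sketch below uses hypothetical review counts (95 correct boxes, 3 boxes containing no object, 2 missed objects) purely for illustration:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of annotated bounding boxes that actually contain an object of interest."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real objects of interest that received a bounding box."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical review outcome: 95 correct boxes, 3 boxes with no object
# behind them (false positives), and 2 objects missed entirely (false negatives).
print(precision(95, 3))  # ~0.969
print(recall(95, 2))     # ~0.979
```

Both metrics reach their ideal value of 1 only when the corresponding error count is zero.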

But computing the precision and recall for a dataset would require a manual critical review of each frame in the entire dataset, which could be as expensive as the annotation process itself! To gain efficiency when we compute precision and recall, we rely on statistics to infer these metrics. We perform a manual critical review only for a statistically well-chosen subset of all annotations, and use probability theory to draw conclusions about the entire dataset.
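As a minimal illustration of the subsampling step, the sketch below draws a uniform random subset of frames for manual review. The function name, the 5% review fraction, and uniform sampling are illustrative assumptions; the actual sampling scheme used in practice is not described here.

```python
import random

def sample_frames_for_review(frame_ids, review_fraction=0.05, seed=0):
    """Draw a simple random subsample of frames for manual critical review.

    Uniform sampling without replacement is an illustrative stand-in for
    whatever statistically motivated sampling design is actually used.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility of the sketch
    k = max(1, round(len(frame_ids) * review_fraction))
    return rng.sample(frame_ids, k)

frames = [f"frame_{i:05d}" for i in range(10_000)]
subset = sample_frames_for_review(frames)
print(len(subset))  # 500, i.e. 5% of the dataset
```

Reviewing 500 frames instead of 10,000 is where the cost saving comes from; the statistics in the next step make up for the information we gave up.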

In more detail, we use a Bayesian approach to compute the posterior distribution for precision and recall for the entire dataset, conditional on the subsample of critically reviewed annotations. This gives us not only an estimate of precision and recall, but also quantifies the uncertainty in these estimates. For example, we can compute the lower 95% credibility bound—a threshold we are 95% sure the precision or recall does not fall below. You can explore how it works in the animation below. The more manual reviews we make, the less uncertainty remains in our estimate, and we can explicitly decide on the trade-off between the cost of manual critical reviews and the acceptable level of uncertainty in our quality measures.
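One standard way to realize this idea (an assumption here, since the post does not spell out the exact model) is a conjugate Beta-Binomial model: with a Beta prior and a number of "successes" observed among the reviewed annotations, the posterior is again a Beta distribution, and the lower credibility bound is just a posterior quantile:

```python
from scipy.stats import beta

def lower_credible_bound(successes, trials, level=0.95, a_prior=1.0, b_prior=1.0):
    """Lower credibility bound for a rate such as precision or recall.

    Assumes a Beta(a_prior, b_prior) prior (uniform by default) and
    `successes` out of `trials` in the reviewed subsample; the posterior
    is then Beta(a_prior + successes, b_prior + trials - successes).
    """
    posterior = beta(a_prior + successes, b_prior + trials - successes)
    # The value the true metric exceeds with probability `level`.
    return posterior.ppf(1.0 - level)

# Hypothetical review: 95 of 98 reviewed boxes contained a real object.
print(round(lower_credible_bound(95, 98), 3))
```

Note how the bound behaves exactly as described in the text: reviewing ten times as many annotations with the same observed rate moves the lower bound closer to the point estimate, making the cost-versus-certainty trade-off explicit.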

All in all, this gives us a cost-efficient tool for measuring the quality of our annotations in terms of precision and recall levels, and how certain we are about those levels. It has become an integral part of our platform that we use routinely. We also compute similar quality measures for other aspects of our annotations, such as overlap ratios of bounding boxes in 2D and 3D. Measuring annotation quality is essential to optimizing our processes and ensuring we deliver the most annotated autonomy data for your budget—helping your autonomous systems learn faster with reliable human feedback.

The above image shows a screenshot from our platform, where this approach is implemented and used as an integral part of our continuous quality assessment.