Fine-tuning through Exploration

Iteration is the way to go...

Machine learning, like most of AI, is about learning patterns from huge amounts of data. In supervised machine learning, the annotated training data, and not only the program code, instructs the computer what to do. Just as program code for solving complex tasks is best developed iteratively, so is the data. The archetypal waterfall model has shown its limitations in software development, and it serves machine learning no better. Indeed, at the very heart of modern machine learning is an iterative algorithm, stochastic gradient descent, but iteration needs to happen on many more levels than that. Some examples:

  • Collect data iteratively. Given a required performance level for the model, it is hard (if not impossible) to know beforehand how much data is needed. You need to learn what is sufficient as you go!
  • Decide what and how to annotate iteratively. It is almost impossible to specify annotation instructions upfront that will last through an entire project. This is partly because it is difficult to foresee everything that could ever appear in your data, and partly because it is not even fully understood how the training data characteristics will translate into model performance. You can start with an educated guess, but in the end you will have to try and see what happens, and most likely go back and improve.
  • Improve the annotation quality iteratively. Annotation mistakes are inevitable, and hunting them all down by quadruple-checking everything is rarely feasible: it is too costly and not guaranteed to work either. However, by working on the annotation quality in parallel with improving your model, you can pinpoint and fix the mistakes that matter.

Introducing Explore

To support the iterative way of working, please say hello to Explore. This new feature of our Dataset exploration capability helps tighten the iterative loop by comparing annotations to predictions, and it allows your team to browse the results efficiently, which is a critical activity. With its advanced sorting, filtering and searching capabilities, users can single out exactly the data points where their attention will have the most impact.

Browsing objects where annotations and predictions agree vs where they don’t agree.

Narrowing the selection of objects by creating filters using the histograms.

The idea of comparing an annotation to a prediction to tell whether they agree or disagree is not very complicated in itself, but it is a very useful tactic for improving performance. With high-quality annotations and a good machine learning model, you’d expect only minor differences in the comparison. With over four years of platform usage and feedback to draw on, we have found that “hunting down” these differences is doubly valuable: sometimes a difference reveals an annotation mistake, and sometimes it shows that the model itself needs improvement.
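To make the comparison idea concrete, here is a minimal sketch in Python. It is not how Explore is implemented; it simply assumes that each object has one annotated and one predicted 2D box, and it uses intersection over union (IoU) as one possible agreement score. The `Box` class, the `rank_by_disagreement` function and the object IDs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D box (hypothetical representation for this sketch)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 means perfect agreement, 0.0 means no overlap."""
    overlap_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    overlap_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    intersection = overlap_w * overlap_h
    union = a.area() + b.area() - intersection
    return intersection / union if union > 0 else 0.0


def rank_by_disagreement(pairs):
    """Score each (annotation, prediction) pair and sort by lowest agreement first.

    `pairs` maps an object ID to an (annotation, prediction) tuple.
    Returns a list of (object_id, iou) tuples.
    """
    scored = [(object_id, iou(ann, pred)) for object_id, (ann, pred) in pairs.items()]
    return sorted(scored, key=lambda item: item[1])


# Made-up example data: one perfect match, one shifted box, one complete miss.
pairs = {
    "car-1": (Box(0, 0, 10, 10), Box(0, 0, 10, 10)),
    "car-2": (Box(0, 0, 10, 10), Box(5, 0, 15, 10)),
    "ped-1": (Box(0, 0, 2, 2), Box(5, 5, 7, 7)),
}
ranking = rank_by_disagreement(pairs)
for object_id, score in ranking:
    print(f"{object_id}: IoU = {score:.2f}")
```

The objects at the top of such a ranking are the ones worth looking at first. The score alone does not say whether the annotation or the model is at fault; that still takes a human look.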

Browsing and sorting Lidar annotations.

Beyond the convenience of having millions of comparisons accessible in your browser, with visualizations, sorting and filtering, the Explore functionality, now integrated into the Kognic platform, also offers an easy way to send annotations back for review and fine-tuning. When your team encounters something that looks even vaguely suspicious, the annotation can be redirected for manual inspection (and correction if needed). Using Explore to narrow down the items sent back for refinement makes QA a more streamlined and cost-efficient process, because the entire dataset isn’t blindly sent through for double checks. And your team will conveniently receive the updated annotations in the same way as all other annotations are delivered.
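The cost argument can be illustrated with a small Python sketch: instead of re-checking everything, only the objects whose agreement score falls below a threshold are queued for review. This is an assumption-laden illustration, not the Kognic platform API; `select_for_review`, the threshold value and the scores are all hypothetical.

```python
def select_for_review(agreement, threshold=0.5, max_items=None):
    """Pick object IDs whose annotation/prediction agreement is below `threshold`.

    `agreement` maps an object ID to a score in [0, 1], where higher means the
    annotation and the prediction agree more. The result is ordered with the
    strongest disagreement first and optionally capped at `max_items`.
    """
    suspicious = [(object_id, score) for object_id, score in agreement.items()
                  if score < threshold]
    suspicious.sort(key=lambda item: item[1])
    object_ids = [object_id for object_id, _ in suspicious]
    return object_ids if max_items is None else object_ids[:max_items]


# Made-up agreement scores for four objects.
agreement = {"car-1": 0.97, "car-2": 0.33, "ped-1": 0.0, "bike-4": 0.62}

review_queue = select_for_review(agreement, threshold=0.5)
print(review_queue)  # only two of the four objects go back for inspection
```

Both the threshold and the cap are knobs you would tune as you iterate: a stricter threshold keeps review cheap, a looser one catches subtler mistakes.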


Selecting objects and sending them back for inspection and correction

We want our customers to succeed. And to succeed in using machine learning for perception, we are convinced that an iterative approach is needed. We're excited that Explore can help make this real. 


Photo by Steve Johnson on Unsplash