Machine learning blog series — Part 1: Get confident, stupid.

Whether you’re a legal information service or an e-commerce retailer, in publishing or consumer products (or some other enterprise entirely), chances are that machine learning (ML) is, or will soon be, an integral part of your business. But machine learning is not a one-shot deal. Rather, you have to make sure your machines keep learning, and that is where pitfalls abound. In this series of blog posts, we will look at critical components of machine learning systems. Today’s topic: confidence metrics.

When we talk about ML, we’re talking about giving computers the ability to learn without being explicitly programmed. This technology is being deployed for everything from facial recognition to helping you pick your next Netflix movie or show. Netflix actually provides a good example for today’s topic: let’s say you told the streaming service that you hated “The Ranch” but loved “The Keepers”. The next time you log in, it may suggest “Making a Murderer”. If you’re sick of true crime, you give the documentary a thumbs down. Netflix would then think twice before showing you more documentaries like that.


In machine learning, this review system is what is known as a “feedback loop” or “retroaction loop”. Feedback loops are critical to ensuring systems provide continuously accurate results. So what do your binge-watching preferences and feedback loops have to do with your business?

Confidence Metrics

Let’s say your business provides legal information services. Reading long legal texts searching for citations, recording those citations and indexing them is time-consuming and mind-numbing for employees. So you make the excellent decision to retain a renowned digital data services company to implement a machine learning solution. The computer model can be trained to sniff out legal citations in documents. Any good machine learning model will have a confidence score built in. That is, each prediction carries a number that tells you how confident the system is about it.
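To make that concrete, here is a minimal sketch of what such an output record might look like. The field names and the citation itself are purely hypothetical; the point is simply that every prediction arrives paired with a confidence score.

```python
# Hypothetical output record from a citation-extraction model:
# the predicted citation span plus the model's confidence in it.
prediction = {
    "text": "See Marbury v. Madison, 5 U.S. 137 (1803).",
    "citation": "5 U.S. 137",
    "confidence": 0.97,  # i.e., the model is 97% confident this is a citation
}

print(prediction["citation"], prediction["confidence"])
```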


Now remember, you want to avoid brain-numbing activities for your employees. If a team is reviewing the output of an ML-enabled legal citation analysis, having operators review every piece of flagged text will numb their brains faster than novocaine. It is also time-consuming and inefficient, defeating the very purpose of using machine learning. This is where confidence estimation plays a key role. Using confidence metrics effectively is the key to striking the right balance between automation and accuracy.

Once a confidence threshold is selected based on an error rate you are comfortable with, all records with a confidence estimate below the threshold are reviewed and corrected, while records above the threshold are only sporadically spot-checked. To avoid numbing brains, simple color coding based on confidence levels, as seen in the example below, lets operators give a cursory scan to green records, pay closer attention to yellow records, and focus intently on red records.
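The triage described above can be sketched in a few lines. The two cutoffs here (0.95 for green, 0.80 for yellow) are illustrative assumptions; in practice you would derive them from the error rate you are willing to tolerate.

```python
# Assumed cutoffs for illustration only -- tune these against your
# acceptable error rate, not the other way around.
GREEN_THRESHOLD = 0.95   # cursory scan
YELLOW_THRESHOLD = 0.80  # closer attention; below this, full review

def triage(record):
    """Bucket a prediction by confidence so operators know how hard to look."""
    confidence = record["confidence"]
    if confidence >= GREEN_THRESHOLD:
        return "green"
    if confidence >= YELLOW_THRESHOLD:
        return "yellow"
    return "red"

# Hypothetical extracted citations with model confidence scores.
records = [
    {"citation": "410 U.S. 113", "confidence": 0.98},
    {"citation": "5 U.S.C. § 552", "confidence": 0.86},
    {"citation": "Art. 12(b)", "confidence": 0.41},
]
for r in records:
    print(r["citation"], "->", triage(r))
```

Operators then spend their attention where it matters: the red bucket gets corrected, the yellow bucket gets checked, and the green bucket mostly flows straight through.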

One more thing

Feedback loops must be created from both the high-confidence records (via sampling and occasional human review) and the low-confidence flow (via 100% human review). Building a feedback loop from the high-confidence records may be unintuitive, but it is crucial in order to avoid skewing the training dataset. Simply put, don’t just correct mistakes. Give your machine learning model a pat on the back for being right too. And when your work is done, kick back with some TV shows and movies that have been machine-selected based on your preferences. Just don’t watch “The Ranch”.
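The two-sided feedback loop above can be sketched as a simple routing function. The threshold and sampling rate are assumptions for illustration; the key idea is that low-confidence records are always reviewed, while high-confidence records are randomly sampled so confirmed-correct predictions also make it back into the training data.

```python
import random

CONFIDENCE_THRESHOLD = 0.90   # assumed; derive from your acceptable error rate
HIGH_CONF_SAMPLE_RATE = 0.05  # assumed; spot-check ~5% of high-confidence records

def build_review_queue(records, rng=random.random):
    """Route records to human review: all low-confidence ones, plus a
    random sample of high-confidence ones, so the feedback loop sees
    both corrected mistakes and confirmed successes."""
    queue = []
    for record in records:
        if record["confidence"] < CONFIDENCE_THRESHOLD:
            queue.append(record)   # 100% human review
        elif rng() < HIGH_CONF_SAMPLE_RATE:
            queue.append(record)   # sporadic spot-check
    return queue
```

Reviewed records, whether corrected or simply confirmed, then flow back into the training set, keeping it representative of both the model's failures and its successes.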