Michael Nguyen | The Inside Scoop on Ground Truth Data for ML

Absolute AI | Conversations With the Humans Behind Artificial Intelligence


    • [1:58] AI Learning Resources and Applications

    • [4:37] Ground Truth Data in Machine Learning

    • [9:09] Securing Needed Ground Truth Data

    • [15:30] Addressing Privacy Concerns with Data Collection

    • [17:46] Capturing the Human Side of AI

    • [24:04] Overcoming Biases in AI

    • [26:51] The Evolution and Future of AI

Michael's Insights

    • [3:50] “Basically anything you can think of, AI can be a part of it.”

    • [11:24] “When it comes to human interaction, it’s really hard to create synthetic data because everyone is different.”

    • [24:53] “I don’t think you can eliminate biases in AI, but what you can do is get as much data as possible.”

    • [29:18] “AI can help us do a lot of things we can’t do ourselves, especially in healthcare.”

Michael's Bio

Michael Nguyen is the VP of Global Data Practice and Partnerships at Innodata, Inc. He is an action-driven, technology-focused business development professional with over 20 years of experience delivering state-of-the-art products and services to global enterprises. For the past five years, Michael has focused on ground truth data for artificial intelligence and machine learning.

Show Notes

In this episode, Melody welcomes Michael Nguyen, VP of Global Data Practice and Partnerships at Innodata, Inc. Michael shares insights into all that AI is (and isn’t), the advances that have been made in ground truth data collection, and what it will take to overcome the biases that are still present in machine learning.


Ground truth data is data that can’t be manufactured; it has to be captured. It is used primarily by companies that develop AI/ML products, and building a ground truth data set involves several critical considerations, including identifying what type of data you actually need. Michael points to data that usually isn’t readily available, such as human interaction data, and explains how his team overcomes the hurdles to secure it.


Some data are very easy to come by, while others are significantly more difficult to secure. Any data collection that involves humans is more sensitive, and Michael highlights the ways his team addresses concerns and protects the privacy of anyone who shares data. Object and environment data collection is fairly straightforward, while speech, simulation, and human data are much harder to capture. In Michael's view, bias in facial recognition can't be eliminated, but it can be mitigated by capturing as much data as possible.


In the wake of the pandemic, a major focus of AI is shifting to healthcare. From robots performing surgery to self-driving cars, Michael expects AI to continue improving quality of life and to take on tasks that humans are not well suited to do.



Want to Join the Conversation?

Are You a Leader in the AI Space? We Would Love to Learn About How You Are Transforming Your Industry.

(NASDAQ: INOD) Innodata is a global data engineering company delivering the promise of AI to many of the world’s most prestigious companies. We provide AI-enabled software platforms and managed services for AI data collection/annotation, AI digital transformation, and industry-specific business processes. Our low-code Innodata AI technology platform is at the core of our offerings. In every relationship, we honor our 30+ year legacy delivering the highest quality data and outstanding service to our customers.

