
Quick Concepts: Data-Centric AI

What is Data-Centric AI?

Data-centric AI is a movement launched by Andrew Ng, a Stanford professor, co-founder of Google Brain and Coursera, former chief scientist at Baidu, and founder of Landing AI. It emphasizes refining data over developing models (the model-centric approach). Its argument is that because AI models have largely been figured out and made publicly available, data science expertise is no longer a prerequisite for implementing AI. The focus should now be on collecting the right kind of data and preparing it correctly.

How does data-centricity affect AI implementation?

Data-centric AI directs resources toward collecting, cleaning, analyzing, and improving data. Traditional, model-centric AI involves amassing huge datasets and then refining the model (or adding more and more data) until it achieves the desired results. Data-centric AI does the opposite: it holds the model fixed and refines the data until the model produces accurate results, prioritizing data quality over quantity.
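As a rough illustration of holding the model fixed while refining the data, the sketch below trains the same simple classifier on two versions of a toy dataset: one with a mislabeled sample and one where that label has been corrected. The nearest-centroid classifier, the function names, and the data values are all hypothetical and chosen only to keep the example small; they do not come from any particular tool or from the approach described by Andrew Ng.

```python
from collections import defaultdict

def train_centroids(samples):
    """Fixed 'model': a nearest-centroid classifier on 1-D features (illustrative only)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, label in samples:
        sums[label] += x
        counts[label] += 1
    # One centroid (mean feature value) per label.
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the given label."""
    correct = sum(predict(centroids, x) == y for x, y in samples)
    return correct / len(samples)

# Hypothetical toy data: the sample at 8.0 is mislabeled in the noisy set.
noisy_train = [(1.0, "small"), (2.0, "small"), (3.0, "small"),
               (8.0, "small"), (9.0, "large"), (10.0, "large")]
# Same samples after the label at 8.0 has been corrected.
clean_train = [(1.0, "small"), (2.0, "small"), (3.0, "small"),
               (8.0, "large"), (9.0, "large"), (10.0, "large")]
validation = [(1.5, "small"), (4.0, "small"), (6.0, "large"), (9.5, "large")]

# The training code is identical in both runs; only the data differs.
noisy_acc = accuracy(train_centroids(noisy_train), validation)
clean_acc = accuracy(train_centroids(clean_train), validation)
print(f"accuracy with noisy labels: {noisy_acc}")
print(f"accuracy with clean labels: {clean_acc}")
```

In this toy case, correcting a single label is enough to change the validation result, with no change to the model code.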

How does data-centric AI affect training data collection and preparation?

Data-centric AI calls for a systematic approach to data handling. Its strategy is to collect data samples that truly represent the real world, with consistent labels and minimal noise, run the model, and then augment or fine-tune the data based on how the model performs. The goal is to identify where the data is inconsistent, ambiguous, or inadequate and to target those specific areas for refinement. This is a quicker and less resource-intensive way to improve model performance than collecting large amounts of additional data or rewriting algorithms, and it can achieve high model accuracy even with very small datasets. Improving the data also requires less ML and data science expertise than improving the model, which enables companies in a range of non-tech industries (such as manufacturing, healthcare, and agriculture) to use AI effectively without in-house data science teams.
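One simple way to find inconsistent labels is to have several annotators label the same samples and flag the samples where they disagree. The sketch below is a minimal version of that idea, assuming annotations arrive as (sample ID, label) pairs; the function name `find_inconsistent_labels`, the `min_agreement` threshold, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def find_inconsistent_labels(annotations, min_agreement=1.0):
    """Flag samples whose annotators agree less often than min_agreement.

    annotations: iterable of (sample_id, label) pairs, one per annotator vote.
    Returns {sample_id: (majority_label, agreement_ratio)} for flagged samples.
    """
    by_sample = defaultdict(list)
    for sample_id, label in annotations:
        by_sample[sample_id].append(label)

    flagged = {}
    for sample_id, labels in by_sample.items():
        # Most frequent label and the share of votes it received.
        top_label, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)
        if agreement < min_agreement:
            flagged[sample_id] = (top_label, agreement)
    return flagged

# Hypothetical annotations for a defect-inspection dataset.
annotations = [
    ("img_001", "scratch"), ("img_001", "scratch"), ("img_001", "scratch"),
    ("img_002", "scratch"), ("img_002", "dent"), ("img_002", "dent"),
    ("img_003", "dent"), ("img_003", "dent"),
]

flagged = find_inconsistent_labels(annotations)
for sample_id, (majority, agreement) in flagged.items():
    print(f"{sample_id}: majority label '{majority}', agreement {agreement:.2f}")
```

Flagged samples can then be sent back for review or used to tighten the labeling guidelines, which targets refinement at the ambiguous cases rather than at the whole dataset.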
