– Case Study –

Web Scraping & Monitoring For News Aggregator

Leading news aggregator requires continuous web scraping and monitoring of over 100,000 websites


A leading news aggregator creates configurable summaries of news articles and other textual information and shares them with its clients. To support its expanding client base, the company needed help collecting news articles from various websites and continuously monitoring the aggregated news feeds. The key challenge was to ensure that scrapers could handle broken links and websites governed by robots.txt so that content could be acquired continuously, then converted and normalized into the client-required format.

Over 100,000 websites monitored




Innodata deployed an automated web data aggregation solution to acquire news articles from a predetermined list of news websites and set agents to continually monitor those websites. In addition, we built and deployed proprietary AI-based technology to identify broken links and websites governed by robots.txt, allowing for continuous data collection. Innodata further helped the company identify new sources of news articles in various domains.
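
For readers who want a concrete picture of the two checks mentioned above, the following is a minimal sketch in Python. It is not Innodata's proprietary technology. It assumes one common approach: consult a site's robots.txt rules before fetching a page, and classify HTTP status codes to flag broken links. The function names, the user agent string "example-news-bot", and the status categories are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str,
                      user_agent: str = "example-news-bot") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    The robots.txt content is passed in as a string so the check can be
    run without network access (hypothetical helper, for illustration).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def classify_status(status_code):
    """Map an HTTP status code (or None for no response) to a coarse label.

    The labels are an assumption of this sketch, not a standard taxonomy.
    """
    if status_code is None:
        return "unreachable"
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        return "redirect"
    if status_code in (404, 410):
        return "broken"
    if 400 <= status_code < 500:
        return "client_error"
    return "server_error"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(robots, "https://example.com/news/story-1"))
    print(allowed_by_robots(robots, "https://example.com/private/draft"))
    print(classify_status(404))
```

In a monitoring loop, a link labeled "broken" or "unreachable" could be queued for a retry or reported for review, while a URL disallowed by robots.txt would be skipped rather than fetched.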


Complete and highly accurate aggregated news articles were automatically fed into the company’s data platform. This enabled real-time availability of news articles for the creation of configurable summaries and on-time delivery of those summaries to the company’s clients.

Meet an Expert

Our Team of Data Experts

Our team comprises data experts with extensive experience developing AI-based data solutions for clients. Book a time that works for you, and let us help develop a custom solution for your unique needs.

(NASDAQ: INOD) Innodata is a global data engineering company delivering the promise of AI to many of the world’s most prestigious companies. We provide AI-enabled software platforms and managed services for AI data annotation, AI digital transformation, and industry-specific business processes. Our low-code Innodata AI technology platform is at the core of our offerings. In every relationship, we honor our 30+ year legacy delivering the highest quality data and outstanding service to our customers.



© 2022 All rights reserved