– Case Study –

Web Scraping & Monitoring For News Aggregator

Leading news aggregator requires continuous web scraping and monitoring of over 100,000 websites

Challenge

A leading news aggregator creates configurable summaries of news articles and other textual information and delivers them to its clients. To support the needs of its expanding client base, the company required help collecting news articles from a wide range of websites and continuously monitoring the aggregated news feeds. The key challenge was ensuring that the scrapers could handle broken links and crawl websites governed by robots.txt for the continuous acquisition of content, which was then converted and normalized into the client-required format.
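To make the challenge concrete, the sketch below shows, in Python, one common way a scraper can consult a site's robots.txt and skip broken links so that acquisition keeps running. It is a minimal illustration under stated assumptions: the user-agent string, function names, and error handling are hypothetical, not the aggregator's or Innodata's actual implementation.

```python
import requests
from urllib import robotparser
from urllib.parse import urlparse

USER_AGENT = "news-aggregator-bot/1.0"  # hypothetical crawler identity

def can_fetch(url: str) -> bool:
    """Check the site's robots.txt before requesting a page."""
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    try:
        rp.read()
    except OSError:
        # robots.txt unreachable: err on the side of not crawling.
        return False
    return rp.can_fetch(USER_AGENT, url)

def fetch_article(url: str) -> str | None:
    """Fetch one article, skipping disallowed or broken links."""
    if not can_fetch(url):
        return None
    try:
        resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        # Broken link (404, timeout, DNS failure, ...): skip it and keep crawling.
        return None
    return resp.text
```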

Results

Over 100,000 websites monitored

Solution

Innodata deployed an automated web data aggregation solution to acquire news articles from a predetermined list of news websites and set agents to continuously monitor those websites. In addition, we built and deployed proprietary AI-based technology to identify broken links and robots.txt-governed websites, allowing for continuous data collection. Innodata further helped the company identify new sources of news articles across a range of domains.
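A minimal sketch of what such a monitoring agent can look like is shown below. The source list, article schema, and the injected scrape/publish callables are illustrative assumptions for this example only; they are not Innodata's proprietary technology.

```python
import time
from dataclasses import dataclass

# Hypothetical source list; the deployed system monitors a far larger, predetermined set.
SOURCES = [
    "https://example-news-site.com/latest",
    "https://another-source.org/feed",
]

@dataclass
class NormalizedArticle:
    """Illustrative stand-in for the client-required delivery format."""
    source: str
    url: str
    title: str
    body: str
    fetched_at: float

def normalize(raw: dict, source: str) -> NormalizedArticle:
    """Convert a site-specific record into the common delivery schema."""
    return NormalizedArticle(
        source=source,
        url=raw.get("url", ""),
        title=raw.get("title", "").strip(),
        body=raw.get("body", "").strip(),
        fetched_at=time.time(),
    )

def monitor_once(scrape, publish) -> None:
    """One monitoring pass: scrape each source, normalize, and push downstream.

    `scrape` and `publish` are injected so the loop stays agnostic of the
    actual acquisition code and of the client's data platform.
    """
    for source in SOURCES:
        for raw in scrape(source):
            publish(normalize(raw, source))

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    fake_scrape = lambda src: [{"url": src, "title": "Example headline", "body": "Example body text."}]
    monitor_once(fake_scrape, print)
```

In production this pass would be scheduled to run continuously against each monitored site, with broken-link and robots.txt checks applied before each fetch.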

Impact

Complete and highly accurate aggregated news articles were fed automatically into the company’s data platform. This enabled real-time availability of news articles for the creation of configurable summaries and on-time delivery of those summaries to the company’s clients.
