– Case Study –
Web Scraping & Monitoring For News Aggregator
Leading news aggregator requires continuous web scraping and monitoring of over 100,000 websites
Challenge
A leading news aggregator creates configurable summaries of news articles and other textual information and shares them with its clients. To support its expanding client base, the company needed help collecting news articles from a wide range of websites and continuously monitoring the aggregated news feeds. The key challenge was to ensure that the scrapers could handle broken links and websites governed by robots.txt rules, so that content acquisition continued without interruption. The acquired content was then converted and normalized into the client-required format.
Results
Over 100,000 websites monitored
SOLUTION
Innodata deployed an automated web data aggregation solution that acquires news articles from a predetermined list of news websites and sets agents to monitor those websites continually. In addition, we built and deployed proprietary AI-based technology that identifies broken links and websites with robots.txt rules, allowing data collection to continue without interruption. Innodata also helped the company identify new sources of news articles in various domains.
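For readers who want a concrete picture of the two checks named above, the following is a minimal Python sketch using only the standard library. It is not Innodata's proprietary technology, and it uses no AI. The user agent string, the function names, and the "ok" / "broken" / "retry" labels are assumptions made for illustration only.

```python
from urllib import robotparser
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical user agent; a real crawler would use its own identifier.
USER_AGENT = "example-news-bot"


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def classify_status(status: int) -> str:
    """Map an HTTP status code to an illustrative link state."""
    if 200 <= status < 400:
        return "ok"
    if status in (404, 410):
        return "broken"  # the page is gone, so flag the link for review
    return "retry"       # other errors may be temporary


def check_link(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and classify the outcome."""
    request = Request(url, method="HEAD", headers={"User-Agent": USER_AGENT})
    try:
        with urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except HTTPError as err:
        # The server answered, but with an error status code.
        return classify_status(err.code)
    except OSError:
        # DNS failures, refused connections, and timeouts end up here.
        return "retry"
```

A monitoring agent built along these lines might call `is_allowed` before each fetch and `check_link` on every stored article URL, then route "broken" links to a review queue while rescheduling "retry" links for a later pass.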
IMPACT
Complete and highly accurate aggregated news articles were automatically fed into the company’s data platform. This enabled real-time availability of news articles for creation of configurable summaries and on-time delivery of the summaries to the company’s clients.