XML Research Platform: Modernizing Information Assets and Enabling New Product Development


Innodata designed and developed an innovative, XML-centric research platform that acquires large volumes of unstructured content, semantically enriches it, and ontologically links its data points to speed the delivery of valuable industry information to researchers.


Our client is a leader in providing market intelligence and insights on biopharmaceutical deals and clinical developments, offered as subscription-based, data-driven solutions. Because its products had historically been built on a technology stack that was difficult to extend and customize, the client faced a number of challenges as it sought to grow and diversify its product portfolio, including:

      • Vendor lock-in, including storage of content in multiple Lotus Notes instances.
      • Slow product development timelines.
      • Inconsistent content schemas, and poor linking and relationships between information assets.
      • Capacity constraints on content throughput, as many tasks were manual.

Removing these roadblocks was essential to achieving the client's business goals, so a new XML-based research platform was conceived to help drive the client's growth ambitions.


The new research platform design was based on a set of principles focused on modernizing information assets, including:

      • Write once, deliver anywhere, using an XML-first approach.
      • A content model that is rich yet highly flexible.
      • Intelligent agents to automate source content discovery and acquisition.
      • Components that automate information discovery, entity extraction, and semantic enrichment using RDF (Resource Description Framework).
      • Integration with existing relational database systems to support ongoing operations.

The solution interconnects a number of technology frameworks. At the core of the platform, we implemented two MarkLogic Server instances: one for bulk data and initial processing, and a second sitting behind a new researcher workbench. In the first instance, GATE and Apache Jena serve as the frameworks for extracting and storing relationships, and a custom Java layer processes content through the workflow. In addition, a REST layer was developed to expose content through APIs, enabling new information products to be delivered as a data service to multiple channels.
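The relationship linking described above can be illustrated with RDF-style triples (subject, predicate, object). The following is a minimal, hypothetical Python sketch, not the platform's code: it does not use the Apache Jena or GATE APIs, and the `TripleStore` class, the `link_document` function, the `ex:` prefix, the `ex:mentions` and `ex:partnerOf` predicates, and the sample company names are all invented for illustration. It assumes an extraction step has already produced entity labels for a new document and shows how that document could be related to entities and relationships that are already tracked.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Triple:
    """One RDF-style statement: subject -> predicate -> object."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Toy in-memory triple store (hypothetical); a stand-in for an RDF store, NOT the Jena API."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Store a statement once; duplicates are ignored, as in an RDF graph."""
        triple = Triple(subject, predicate, obj)
        if triple not in self._triples:
            self._triples.append(triple)

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples matching the pattern; None acts as a wildcard."""
        for t in self._triples:
            if subject is not None and t.subject != subject:
                continue
            if predicate is not None and t.predicate != predicate:
                continue
            if obj is not None and t.obj != obj:
                continue
            yield t


def link_document(store: TripleStore, doc_uri: str,
                  extracted_labels: List[str],
                  known_entities: Dict[str, str]) -> List[str]:
    """Relate a newly acquired document to entities that are already tracked (hypothetical helper).

    extracted_labels: entity names produced by an assumed extraction step.
    known_entities: maps a label to the URI of an entity already in the store.
    Returns the URIs the document was linked to.
    """
    linked = []
    for label in extracted_labels:
        uri = known_entities.get(label)
        if uri is not None:  # skip entities the store does not track yet
            store.add(doc_uri, "ex:mentions", uri)
            linked.append(uri)
    return linked


if __name__ == "__main__":
    store = TripleStore()
    # An existing, tracked business relationship (sample data invented for illustration).
    store.add("ex:AcmeBio", "ex:partnerOf", "ex:GlobalPharma")
    known = {"Acme Bio": "ex:AcmeBio", "Global Pharma": "ex:GlobalPharma"}

    linked = link_document(store, "ex:doc42", ["Acme Bio", "Phase II trial"], known)
    print(linked)  # ['ex:AcmeBio']

    # Connected-data lookup: partners of every entity the new document mentions.
    for entity in linked:
        for t in store.match(subject=entity, predicate="ex:partnerOf"):
            print(f"{t.subject} partnerOf {t.obj}")  # ex:AcmeBio partnerOf ex:GlobalPharma
```

In a production system, statements like these would typically be held in a dedicated RDF store and queried with SPARQL rather than with the simple pattern matching shown here.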

Working closely with the client and its partners, Innodata's technology team not only defined the requirements for the solution but also delivered a technology roadmap with key milestones tied to the client's business goals. This ensured that progress remained closely mapped to longer-term goals and objectives.

In this engagement, Innodata's hybrid development methodology, which combines agile and waterfall frameworks, safeguarded the timeliness of project delivery. The project team was spread across four time zones and worked in a highly collaborative model, so development continued around the clock.


The key benefits of the newly developed eReader and platform include:

      • A tenfold increase in content throughput.
      • Automated information harvesting and enrichment: new content is automatically related to existing, tracked business relationships in the client's target industry.
      • Monetization of existing data sets, leading to revenue growth by supporting:
           o New customer-facing product development.
           o Services based on slicing and dicing data in a highly flexible way.
           o Multichannel content delivery to mobile devices, browsers, and end-customer applications.
           o End-customer data querying through methods including content overlap analysis, connected-data search, semantic analysis, and other means of information discovery.


