Smart content: taking your assets to school

Publishers often struggle to improve, reuse and repurpose their content. At the same time, consumers are increasingly demanding of providers. Simply being able to search for content is no longer enough. Nor are consumers just buying physical books; they want the ebook they bought to be readable on their mobile phone, their Kindle, their iPad or their desktop computer. Developing “smart content” is a two-fold process: publishers need to get smart about their content, and their content needs to get smart. Machine learning can help with the second.

There are two important sub-categories when we talk about developing smart content, and the second builds on the first. The first we’ll call “production”: using machine learning to automate the transformation of raw content assets into smart content. What exactly is smart content? For our purposes, it is content that is machine-addressable and self-describing. To borrow a helpful definition from Steve Odart, it is “content that knows what it is, where and how it has been used, and where and how else it could be used.”
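As a rough illustration of what “machine-addressable and self-describing” could mean in practice, here is a minimal Python sketch. The `SmartAsset` class, its field names and the sample values are all hypothetical, chosen only to mirror the definition quoted above; they are not a schema from any particular publisher or system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SmartAsset:
    """Hypothetical record for one piece of self-describing content."""
    asset_id: str       # stable identifier, so machines can address the asset
    content_type: str   # "what it is", e.g. "journal-article"
    used_in: List[str] = field(default_factory=list)      # where it has been used
    reusable_in: List[str] = field(default_factory=list)  # where else it could be used

    def record_use(self, product: str) -> None:
        """Note a new use and drop it from the list of untapped uses."""
        if product not in self.used_in:
            self.used_in.append(product)
        if product in self.reusable_in:
            self.reusable_in.remove(product)


# Illustrative values only.
asset = SmartAsset(
    asset_id="asset-0001",
    content_type="journal-article",
    used_in=["print-edition"],
    reusable_in=["ebook", "mobile-app"],
)
asset.record_use("ebook")
print(asset.used_in, asset.reusable_in)
```

The point of the sketch is that the usage history and reuse options travel with the asset itself rather than living in someone’s head or in a separate spreadsheet.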

In the “production” case, publishers re-engineer their digital supply chains using AI to reduce the extent to which human validation is required to transform raw material into the kind of normalized, metadata-rich, self-describing content described above. There are three ways to do this: machine-only processing; machine/human hybrid processing; and confidence-guided symbiosis. How well each approach succeeds depends on applying the appropriate implementation model; building the right neural network model and tuning it appropriately; migrating to a service-driven approach that conserves and leverages previously discarded production artifacts; and putting in place the monitoring, maintenance, and performance processes needed to ensure that the system continues to perform without degrading over time or introducing bias. While transforming a human-led digital supply chain into an AI-driven one is not easy, achieving the desired efficiencies makes the effort and investment worthwhile.
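The three processing approaches are not spelled out in detail here, so the following Python sketch rests on an assumed reading: machine-only accepts every model output, hybrid sends every output to a human, and confidence-guided sends only low-confidence outputs to a human. The `Prediction` class, the `route` function, the mode names and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    asset_id: str
    label: str         # e.g. a semantic tag proposed by the model
    confidence: float  # model score in [0, 1]


def route(pred: Prediction, mode: str, threshold: float = 0.9) -> str:
    """Return "auto-accept" or "human-review" for one prediction.

    The mapping from mode to behavior is an assumption made for
    illustration, and the default threshold is arbitrary.
    """
    if mode == "machine-only":
        return "auto-accept"
    if mode == "hybrid":
        return "human-review"
    if mode == "confidence-guided":
        return "auto-accept" if pred.confidence >= threshold else "human-review"
    raise ValueError(f"unknown mode: {mode}")


sure = Prediction("asset-0001", "case-law", 0.97)
unsure = Prediction("asset-0002", "case-law", 0.55)
print(route(sure, "confidence-guided"), route(unsure, "confidence-guided"))
```

Under this reading, the threshold is the main tuning knob: raising it sends more items to human reviewers, lowering it saves more effort at the cost of more unchecked output, which is one reason ongoing monitoring matters.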

The second sub-category is using machine learning to add new analytics functions to information products. We’ll call this the “product” use case. All of the hard work done in the “production” use case paves the way for the product case. In the product use case, publishers harness new AI capabilities that work with those newly smart assets (with their metadata, semantic tagging and normalization) to build new functionality. These new functions give product users the ability to obtain new, actionable insights. Artificial intelligence thrives on the very thing that is often too overwhelming for humans to consume: massive data sets. While humans will miss patterns and overlook relationships in massive data sets, computers can quickly identify these patterns and reduce them to human-consumable insights. To illustrate, we have worked with our clients to enable AI-driven analytics that identify research trends through citation analysis and that predict how particular judges are likely to decide motions, based on the legal precedent cited and the facts and circumstances of cases.
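To give a flavor of the citation-analysis example, here is a deliberately simple Python sketch that counts citations per year for a topic and flags a strictly rising trend. It is plain counting rather than machine learning, the records are made-up toy data, and the function names are hypothetical; it is not the analytics built for any client.

```python
from collections import Counter

# Toy, made-up records: (year of the citing paper, topic of the cited work).
citations = [
    (2019, "topic-a"), (2019, "topic-b"),
    (2020, "topic-a"), (2020, "topic-a"), (2020, "topic-b"),
    (2021, "topic-a"), (2021, "topic-a"), (2021, "topic-a"),
]


def citations_per_year(records, topic):
    """Count citations to one topic, keyed by year in ascending order."""
    counts = Counter(year for year, t in records if t == topic)
    return dict(sorted(counts.items()))


def is_rising(series):
    """True if the yearly counts strictly increase across the series."""
    values = list(series.values())
    return len(values) > 1 and all(a < b for a, b in zip(values, values[1:]))


trend_a = citations_per_year(citations, "topic-a")
trend_b = citations_per_year(citations, "topic-b")
print(trend_a, is_rising(trend_a))
print(trend_b, is_rising(trend_b))
```

A query like this is only possible when each citation already carries normalized metadata such as year and topic, which is exactly what the production use case is meant to supply.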

This is all to say that smart content is here, and the publishers, information service providers and other organizations that embrace it stand to gain new opportunities for both customer satisfaction and revenue.