Innodata — Semantic Versus Instance Segmentation

Why the Difference is Crucial for Image Annotation

For any business trying to build a distinct competitive edge, computer vision has become a go-to enabler. From improving customer experience to reducing costs, computer vision is being applied across a diverse set of industries to accurately identify and classify objects. As one of the most important branches of artificial intelligence, computer vision makes sense of the visual world. Recent advances in the field have given rise to many new economies and market categories. For example, autonomous drones, which once seemed like a scene out of a Hollywood movie, are already enabling deep surveillance at land borders.

But how is it possible to replicate human vision in real time when we still have only a limited understanding of how differently people perceive the visual world? Fortunately, there is common ground: we call the sun the sun, a human a human, a car a car, a plant a plant, and so on.

From the moment we are born and open our eyes, we start learning about our surroundings. Among many other perceptions and emotions, we instinctively detect and label the people and objects around us. More importantly, this happens in a split second, without our even realizing it.

Machines, by contrast, need to learn from hundreds to thousands of labeled, or annotated, images. This is the basic premise of computer vision applications: before an AI system can predict objects of interest, we must first show it examples of what those objects are. For example, if we are building an AI system that detects cars on a street, we would feed it thousands of examples of all sorts of cars, in a variety of colors and from many angles.

The ultimate goal of a computer vision project is to develop a deep learning algorithm capable of detecting objects in real time with high accuracy. A typical implementation relies on image segmentation techniques powered by convolutional neural networks (CNNs). Segmentation, in essence, involves drawing pixel-level boundaries around the objects in an image.
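In annotation tools, those boundaries are often drawn as polygons and then converted into a pixel mask that a model can train on. The following is a minimal Python sketch of that conversion step, assuming a hypothetical annotation format (a list of (x, y) vertices in pixel coordinates) and using the standard even-odd ray-casting test on each pixel's center. It is an illustration only; real annotation tools may rasterize polygons differently.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd rule: cast a ray to the right of (x, y) and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon: List[Point], width: int, height: int) -> List[List[int]]:
    """Return a height x width mask: 1 where the pixel center lies inside the polygon."""
    return [
        [1 if point_in_polygon(c + 0.5, r + 0.5, polygon) else 0 for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    # Hypothetical outline drawn around an object covering pixel columns 1-3, rows 1-2.
    car_outline = [(1, 1), (4, 1), (4, 3), (1, 3)]
    for row in rasterize(car_outline, width=6, height=4):
        print(row)
```

Running it prints a 4x6 grid in which the 1s mark the pixels inside the drawn outline; that grid is the pixel-level label a segmentation model learns from.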

There are two powerful, yet distinct, image segmentation techniques that help computer vision projects:

Semantic Segmentation – This involves assigning every pixel in an image to one of a set of defined categories. For example, in a street scene you would draw boundaries and label regions as Humans, Automobiles, Bikes, Traffic Lights, Walkway, Crossing, Lanes, etc. All objects in the same category share one label, so two people standing side by side become a single Humans region.
Instance Segmentation – This takes semantic segmentation one step further by distinguishing the individual objects within each defined category. For example, in the same street scene you would draw a separate boundary around each individual object and give it a unique label: each adult and each kid becomes its own Human instance, and each car, bus, or motorbike its own Automobile instance (Human 1, Human 2, Car 1, and so on).
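To make the distinction concrete, here is a minimal Python sketch on a hypothetical 4x6 label grid. The semantic mask stores only a category per pixel, so both people share the same id. The helper `to_instances` (a hypothetical name, not from any library) then gives every separate object its own id. It uses connected-component labeling as a stand-in, which only works here because the objects do not touch; in real annotation, adjacent or overlapping objects of the same class must be separated by the annotator or by an instance segmentation model.

```python
from collections import deque
from typing import Dict, List, Tuple

BACKGROUND = 0
CLASS_NAMES = {1: "Human", 2: "Car"}  # hypothetical category ids

# Semantic mask: each pixel holds only its category; both people share id 1.
semantic = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [2, 2, 0, 2, 2, 2],
]


def to_instances(mask: List[List[int]]) -> Tuple[List[List[int]], Dict[int, str]]:
    """Split each category into 4-connected regions and give each region a unique id.

    Returns an instance-id map (0 = background) and a lookup such as {1: "Human 1"}.
    """
    h, w = len(mask), len(mask[0])
    instance_map = [[0] * w for _ in range(h)]
    labels: Dict[int, str] = {}
    per_class: Dict[int, int] = {}
    next_id = 1
    for r in range(h):
        for c in range(w):
            cls = mask[r][c]
            if cls == BACKGROUND or instance_map[r][c]:
                continue
            # Unvisited object pixel: start a new instance and flood-fill it.
            per_class[cls] = per_class.get(cls, 0) + 1
            labels[next_id] = f"{CLASS_NAMES[cls]} {per_class[cls]}"
            instance_map[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == cls and not instance_map[ny][nx]):
                        instance_map[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return instance_map, labels


if __name__ == "__main__":
    instance_map, labels = to_instances(semantic)
    for row in instance_map:
        print(row)
    print(labels)  # {1: 'Human 1', 2: 'Human 2', 3: 'Car 1', 4: 'Car 2'}
```

The semantic grid contains two categories; the instance map contains four distinct objects, which is exactly the extra information instance annotation captures.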

Although instance segmentation labeling is expensive, it is one of the most robust and comprehensive ways to achieve object detection in image analysis. Its value is amplified in video, where each frame can be analyzed individually. Real intelligence can be tapped once every instance of an object is uniquely identified within its defined category.
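Because every object carries its own id, per-frame questions such as "how many people are in view?" reduce to simple counts, which a semantic mask alone cannot answer reliably once objects of the same class touch. Here is a minimal Python sketch, assuming a hypothetical per-frame annotation format that maps instance ids to category names; `category_counts` and `count_changes` are illustrative names only.

```python
from collections import Counter
from typing import Dict, List

Frame = Dict[int, str]  # hypothetical format: instance id -> category name

# Hypothetical instance annotations for three consecutive video frames.
frames: List[Frame] = [
    {1: "Human", 2: "Human", 3: "Car"},
    {1: "Human", 2: "Human", 3: "Car", 4: "Bus"},
    {1: "Human", 3: "Car", 4: "Bus"},
]


def category_counts(frame: Frame) -> Counter:
    """Number of distinct objects per category in a single frame."""
    return Counter(frame.values())


def count_changes(frames: List[Frame]) -> List[Dict[str, int]]:
    """Per-category change in object count between each pair of consecutive frames."""
    changes = []
    for prev, curr in zip(frames, frames[1:]):
        before, after = category_counts(prev), category_counts(curr)
        # Counter returns 0 for missing categories, so appearances and exits both show up.
        changes.append({
            cat: after[cat] - before[cat]
            for cat in sorted(set(before) | set(after))
            if after[cat] != before[cat]
        })
    return changes


if __name__ == "__main__":
    print(count_changes(frames))  # [{'Bus': 1}, {'Human': -1}]
```

Here a bus enters between the first and second frames and one person leaves between the second and third, events that only become visible once each object is labeled as a separate instance.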

Many vendors excel in computer vision and routinely help companies interpret images for their machine learning models. But it's important to pick a partner that understands the nuances of these image segmentation techniques. Do your research, and distinguish vendors that merely use computer vision from those that can drill down and help build a robust image dataset that will take your projects to the next level.

Check out how we’re helping companies use diverse segmentation methods to extract data from satellite images and other image sources to meet their business needs.

Will Fisher - VP, Data Solutions
