Often in video and other imagery, we want to know how much of something is present. For example, marine biologists are often interested in assessing the health of a coral reef: healthy coral reefs grow, while unhealthy ones can contract. Biologists also place video cameras underwater to assess the density of fish over time, or review aerial imagery to understand the health of a forest.
In medical applications, such as an in-body surgery assessor, we might wish to know the relative volume of fat present in a surgical scene to understand how difficult a surgery might be. In all of these cases, quantifying how much of something is present in imagery is essential. For example, we might want to know how much of a fixed camera’s frame shows coral reef over time, which allows us to estimate the growth or contraction of the coral. In another case, we might fly a drone or plane over a specific plot of land to quantify how much tree foliage was impacted by a forest fire, or how fast a farmer’s plants are growing.
So how do we do it? Sometimes we need to count distinct objects. Other times we have ample budget for human labeling and can employ advanced techniques like semantic segmentation. And other times, we want to quantify something with fairly distinctive textures and colors but are limited in computation and human labeling resources. Each of these cases likely involves different techniques. In this last case, one approach is to rapidly label positive exemplars of what we wish to quantify in the form of quick polygons. Whenever labeling data for an AI system, it's always essential to be mindful of how long the human annotators will spend labeling each item. We use a tool we built called Vannot (short for Video Annotator) to illustrate.
In the above screenshot, we see that the human annotator has labeled this particular frame with nice (and quick) large swaths of coral and water, and a smaller swath of fish.
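Before polygons like these can be used to train a model, they have to be turned into per-pixel labels. The sketch below shows one minimal way to do that in plain Python, using a ray-casting point-in-polygon test. It is an illustration only, not a description of how Vannot or our pipeline works internally; the function names `point_in_polygon` and `polygon_to_mask`, the tiny frame size, and the rectangle coordinates are all made up for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count the polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon: List[Point], width: int, height: int) -> List[List[int]]:
    """Return a height x width grid with 1 where the pixel centre lies inside the polygon."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Hypothetical annotation: a rectangular "coral" polygon on a tiny 6x4 frame.
coral_polygon = [(1.0, 1.0), (5.0, 1.0), (5.0, 3.0), (1.0, 3.0)]
coral_mask = polygon_to_mask(coral_polygon, width=6, height=4)
labeled_pixels = sum(sum(row) for row in coral_mask)
```

On real frames, a library routine such as a polygon fill from an image-processing package would be far faster than this per-pixel loop; the pure-Python version is only meant to make the idea visible.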
The Convolutional Neural Network (CNN) model we construct can now be trained on these labels to recognize which regions of a given frame contain coral, for example. Next, we can apply the model to the video, overlay a mask for coral, and quantify it. We can see the output of this simpler technique in the above video, where fewer than 2 hours of human labeling time in total were used. Notice the quantification bar on the far right. We can even generate time series data, as in the plot below, showing the quantity of coral over time; this time series can in turn be used in other models, for example to predict the overall health of the local ecosystem.
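The quantification step itself can be quite simple. As a rough sketch, assume the model's output for each frame is a grid of per-pixel class ids. That output format is an assumption made for illustration, and the `CORAL` id, the function names, and the tiny example frames are hypothetical. The fraction of each frame covered by coral, and the resulting time series, might then be computed like this:

```python
from typing import List

Mask = List[List[int]]  # one class id per pixel, row by row

CORAL = 1  # hypothetical class id for coral; 0 stands for everything else


def class_coverage(mask: Mask, class_id: int) -> float:
    """Fraction of the frame's pixels that are assigned to class_id."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    hits = sum(1 for row in mask for value in row if value == class_id)
    return hits / total


def coverage_time_series(masks: List[Mask], class_id: int) -> List[float]:
    """Coverage of class_id for each frame, in frame order."""
    return [class_coverage(mask, class_id) for mask in masks]


# Three made-up 2x4 predicted masks standing in for consecutive video frames.
frames = [
    [[0, 1, 1, 0], [0, 1, 1, 0]],  # 4 of 8 pixels are coral
    [[0, 1, 1, 1], [0, 1, 1, 1]],  # 6 of 8 pixels are coral
    [[1, 1, 1, 1], [0, 1, 1, 1]],  # 7 of 8 pixels are coral
]
series = coverage_time_series(frames, CORAL)
print(series)  # [0.5, 0.75, 0.875]
```

A per-frame fraction like this is only meaningful for comparison over time when the camera is fixed, as in the coral example; with a moving camera, the values would also reflect changes in viewpoint.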
At Xyonix, we regularly employ techniques like this to help our customers quantify what is present in their images. Your problem is not just a data science problem; it is also, and perhaps more importantly, a business problem. This is why we do not apply a one-size-fits-all approach.