Often in video and other imagery, we wish to know how much of something is present. For example, marine biologists are often interested in assessing the health of a coral reef; healthy coral reefs grow, while unhealthy coral reefs can contract. Biologists also place video cameras underwater to assess the density of fish over time, or review aerial imagery to understand the health of a forest. In medical applications, such as an in-body surgery assessor, we might wish to know the relative volume of fat present in a body scene to understand how difficult a surgery might be. In all of these cases, quantifying how much of something is present in imagery is essential. For example, we might want to know how much of a fixed camera’s frame shows coral reef over time, which allows us to estimate the growth or contraction of the coral. In another case, we might fly a drone or plane over a specific plot of land to quantify how much tree foliage was impacted by a forest fire, or how fast a farmer’s plants are growing.
So how do we do it? One approach is to rapidly label positive exemplars of what we wish to quantify in the form of quick polygons. Whenever labeling data for an AI system, it’s always essential to be mindful of how long the human annotators will spend labeling each item. We use a tool we built called Vannot (short for Video Annotator) to illustrate.
In the above screenshot, we see the human annotator has labeled this particular frame with nice (and quick) large swaths of coral and water, and a smaller swath of fish. The models we construct can now be defined at a patch level, where a patch is a small area of pixels, say a 20x20 pixel square. We can then, for example, train a patch classification model using a Convolutional Neural Network (CNN). This patch classifier can then tell us whether a given 20x20 pixel area is coral, fish, or water. We can then apply the model to the video, overlay a mask for coral, water, and fish, and quantify each.
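To make the quantification step concrete, here is a minimal sketch in Python with NumPy of tiling a frame into 20x20 patches, labeling each patch, and reporting the share of the frame each class covers. It is an illustration under stated assumptions, not the actual pipeline: `quantify_coverage` and `toy_patch_classifier` are hypothetical names, and the toy classifier is a simple mean-color rule standing in for a trained CNN patch classifier.

```python
import numpy as np

CLASSES = ("coral", "fish", "water")  # class names from the example above


def toy_patch_classifier(patch):
    """Hypothetical stand-in for a trained CNN patch classifier.

    Returns an index into CLASSES based only on the patch's mean color.
    """
    r, g, b = patch.reshape(-1, 3).mean(axis=0)
    if b > r and b > g:
        return CLASSES.index("water")
    if r > g:
        return CLASSES.index("coral")
    return CLASSES.index("fish")


def quantify_coverage(frame, classify_patch, patch_size=20):
    """Tile an H x W x 3 frame into square patches and label each one.

    Returns (label_grid, fractions): label_grid holds one class index per
    patch, and fractions maps each class name to its share of all patches.
    Partial patches at the right and bottom edges are ignored.
    """
    height, width = frame.shape[:2]
    rows, cols = height // patch_size, width // patch_size
    if rows == 0 or cols == 0:
        raise ValueError("frame is smaller than a single patch")

    label_grid = np.zeros((rows, cols), dtype=np.int64)
    for i in range(rows):
        for j in range(cols):
            patch = frame[i * patch_size:(i + 1) * patch_size,
                          j * patch_size:(j + 1) * patch_size]
            label_grid[i, j] = classify_patch(patch)

    # Count patches per class and convert the counts to fractions.
    counts = np.bincount(label_grid.ravel(), minlength=len(CLASSES))
    fractions = {name: counts[k] / label_grid.size
                 for k, name in enumerate(CLASSES)}
    return label_grid, fractions


if __name__ == "__main__":
    # Synthetic 40x60 frame: blue "water" on top, red "coral" and
    # green "fish" regions on the bottom.
    frame = np.zeros((40, 60, 3), dtype=np.uint8)
    frame[:20] = (0, 0, 255)
    frame[20:, :40] = (255, 0, 0)
    frame[20:, 40:] = (0, 255, 0)
    _, fractions = quantify_coverage(frame, toy_patch_classifier)
    print(fractions)
```

Running the same function on every frame of a fixed camera's video gives a per-class coverage series over time, which is the quantity one would track to estimate growth or contraction. The label grid can also be scaled back up to frame size to draw the overlay mask.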
At Xyonix, we regularly employ techniques like this to help our customers quantify something present in their images.