efficient, iterative annotation gathering

For many problems, we need to formulate a ground truth, or dataset which tells what the right answers are. This ground truth is in turn used to train your models, much the same way a toddler may be given examples of fruit in order for the child to better understand fruit.


Our Mayetrix platform allows us to easily import your data and efficiently allow our annotators to label your data. For example, suppose you have a repository of images and their metadata of bottles moving along a conveyor belt line, and you want to predict whether a given bottle has a crack in them based on the photo. Our system will import the repository including all of the metadata which might contain information like which truck dropped off the bottles. Next our system will import them, and we will put our humans to work at looking at the images and labeling whether they are cracked or not. The annotators might find that a string of errors came from the same truck, in which case they will label more of these bottle images.

Once the annotated repository gets of a decent size (say around 1000 to start), we will build a model, and pass it through all of the images. Our humans are then carefully instructed to spend their time on likely error cases in a process known as active learning. We repeat this process and continuously leverage machine learning model output as a way to minimize the volume of annotated data we require.

Off the shelf crowd sourcing solutions are good at achieving scale once such problems are well defined, and a gold standard is available against which untrusted crowdsourced work can be reliably assembled. We regularly use these services once we reach that state, but often times in the early stages of a machine learning problem, we need trusted humans to help us make progress quickly using often very sensitive customer data.

In addition, we regularly encounter problems where the annotation jobs themselves are fairly complex. For example, we might need a document to be annotated with 0 or more labels of a hierarchical nature in a set of a hundred or more labels. This type of problem often requires detailed instructions to our trained annotators and a sustained learning process for the humans to understand the knowledge domain well. We also have tools like an autosuggest service which recommend labels to help make our annotators much more efficient and reduce costs. Please see the Mayetrix Data Annotation platform for a more system oriented discussion.