To ensure that our performance scores are as objective as possible, our examples are drawn from a statistically determined sample across our coverage regions, weighted towards populated areas. We also deliberately include a significant portion of challenging cases where our models are least certain.
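As a rough illustration of how such a sample could be assembled, here is a minimal sketch that draws a population-weighted sample and reserves a fixed share for the tiles where the model is least certain. The function name, the per-tile population and uncertainty arrays, and the 30% hard-case share are illustrative assumptions, not a description of our actual pipeline.

```python
import numpy as np

def build_eval_sample(tiles, population, uncertainty,
                      n_total, hard_fraction=0.3, seed=0):
    """Draw an evaluation sample weighted towards populated areas,
    reserving a share of it for tiles where the model is least certain.

    tiles       : identifiers covering the survey regions
    population  : per-tile population weight (same length as tiles)
    uncertainty : per-tile model uncertainty (higher = less certain)
    """
    rng = np.random.default_rng(seed)
    tiles = np.asarray(tiles)
    population = np.asarray(population, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)

    # Hard cases: the tiles where the model is least certain.
    n_hard = int(round(n_total * hard_fraction))
    hard_idx = np.argsort(uncertainty)[-n_hard:]

    # Remaining examples: a population-weighted draw from the rest of coverage.
    remaining = np.setdiff1d(np.arange(len(tiles)), hard_idx)
    weights = population[remaining] / population[remaining].sum()
    rest_idx = rng.choice(remaining, size=n_total - n_hard,
                          replace=False, p=weights)

    return tiles[np.concatenate([hard_idx, rest_idx])]
```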
Our team of highly trained expert labellers uses a custom version of MapBrowser to check multiple dates, multiple viewing angles, and even our 3D models to determine whether they believe an object is present. For example, a swimming pool missed on a leaf-on survey will still be scored as a model error if the labeller can see the pool earlier or later in our capture history.
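The snippet below is a simplified sketch of that scoring rule: a missed detection counts against the model whenever the labeller can see the object at any other point in the capture history. The Capture record and the helper name are hypothetical, and real adjudication also draws on the multiple viewing angles and 3D context described above.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Capture:
    survey_date: date
    object_visible: bool   # labeller's judgement for this survey

def adjudicate_missed_detection(captures: list[Capture], missed_on: date) -> str:
    """Score a detection the model missed on the survey dated `missed_on`.

    If the labeller can see the object on an earlier or later capture,
    the object is treated as genuinely present and the miss counts
    against the model; otherwise the example is scored in its favour.
    """
    seen_elsewhere = any(
        c.object_visible and c.survey_date != missed_on for c in captures
    )
    return "model_error" if seen_elsewhere else "correct"
```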