Using SpeciesNet (Google AI) to sort through hundreds of thousands of wildlife pictures from motion detection cameras (camera traps).

Skip to the good part:

Preview page created using MegaDetector dev-tools on a subset of predictions.
Urban Rivers - CamTrap AI Observations

A good example of successful Great Blue Heron classification

Project Workflow

  1. Process Images through the SpeciesNet Deep Learning Computer Vision Model.
  2. Associate inferences back to images via md5 hash - Data Cleaning.
  3. Explore Dataset - EDA.
  4. Plan for model fine-tuning & next steps.

Notebook for processing images on Kaggle:

The notebook took 10 hours to run - 7 of which were spent just downloading and resizing images. That's why I used batching and multi-threading for the images, yet it still takes a while to work through 100k S3 links. Surely there's a better way.
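For reference, the download-and-resize stage looked roughly like this. A minimal sketch, assuming a urls list of S3 links and a hypothetical download_and_resize helper - the real notebook adds batching and error handling:

import io
from concurrent.futures import ThreadPoolExecutor

import requests
from PIL import Image

def download_and_resize(url, out_dir, max_side=1024):
    # Fetch one image over HTTP and shrink its longest side to max_side
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    img = Image.open(io.BytesIO(resp.content))
    img.thumbnail((max_side, max_side))  # in place, preserves aspect ratio
    out_path = f"{out_dir}/{url.rsplit('/', 1)[-1]}"
    img.save(out_path)
    return out_path

# Network I/O dominates, so a thread pool (rather than processes) keeps the pipe full
with ThreadPoolExecutor(max_workers=16) as pool:
    paths = list(pool.map(lambda u: download_and_resize(u, "images/batch_5"), urls))  # urls: the S3 links (assumed defined earlier)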

Data Cleaning and EDA

Our primary goal with image classification - before releasing a model - is to confirm that predictions are "accurate enough". That usually means keeping a holdout set of images with known (human-graded) classes that the AI never sees. We don't quite have this yet - images are being labeled and classifications live on the production server - and the SpeciesNet ensemble doesn't yet document a fine-tuning or ground-truth workflow.

Ultimately this would let us “vote” on identifying species in photos using AI.

Since this is the first time we're trying SpeciesNet, we're going to do a few spot checks and some EDA on the generated inferences (the predicted animals in each photo) to understand how it performs.

Cleaning

After running the model we are left with a very long JSON file of 100k+ classifications and a pre-assembled site of [html pages][1] generated with megadetector-utils. Here's an example of the structure for one image where a great blue heron was classified:

{
    "filepath": "images/batch_5/bada276600ce2790b1d2877e7d6e7a25.jpg",
    "classifications": {
        "classes": [
            "96fe1a07-7ef1-4a2f-99e1-ec2c9a78b532;aves;pelecaniformes;ardeidae;ardea;herodias;great blue heron",
            "bfa75aeb-3187-48fe-95b9-f171465cc984;aves;pelecaniformes;ardeidae;ardea;cocoi;cocoi heron",
            "b1352069-a39c-4a84-a949-60044271c0c1;aves;;;;;bird",
            "f8db21f0-6b79-4444-8be4-b87906d56e6a;aves;pelecaniformes;ardeidae;ardea;albus;great egret",
            "1110460b-7f99-405b-a9b0-65a09ecccca1;aves;pelecaniformes;ardeidae;tigrisoma;lineatum;rufescent tiger-heron"
        ],
        "scores": [
            0.7359567880630493,
            0.07253273576498032,
            0.04953608289361,
            0.03795544430613518,
            0.008134812116622925
        ]
    },
    "detections": [
        {
            "category": "1",
            "label": "animal",
            "conf": 0.8977401256561279,
            "bbox": [0.0, 0.0, 0.3460410535335541, 0.9453125]
        }
    ],
    "prediction": "96fe1a07-7ef1-4a2f-99e1-ec2c9a78b532;aves;pelecaniformes;ardeidae;ardea;herodias;great blue heron",
    "prediction_score": 0.7359567880630493,
    "prediction_source": "classifier",
    "model_version": "4.0.1a"
}

First, we need to extract the data from the JSON file into a dataframe so we can work through some questions more easily. Looking at the JSON structure, we need the filepath, a paired list of classifications and their scores, and a paired list of any detections and their scores. SpeciesNet does ultimately give us a single prediction based on a taxonomic-hierarchy decision set, but to learn more we want the full dataset, minus some very low-confidence predictions. My preference for this step is to stick with pure Python for parsing - aiming to build a list of dicts to feed into a dataframe. In my opinion, this is a lot easier to read than the equivalent pandas extraction incantations.

import json

_CONF = 0.3  # Master setting for the confidence/score threshold

# Pair each classification with its score, keeping only those above threshold
def get_high_score_classes(scores, classes, threshold=_CONF):
    return [{"score": f'{score:.2f}', "class": cls}
            for score, cls in zip(scores, classes) if score > threshold]

# Pair each detection with its score, filtered by label and threshold
def get_high_conf_detections(detections, target_label, threshold=_CONF):
    return [(f'{det["conf"]:.2f}', det["label"])
            for det in detections
            if det["label"] == target_label and det["conf"] > threshold]

# Accumulator: one flat record per image
master_data = []

# Open and loop over the json file
with open(master_json, "r") as f:
    data = json.load(f)
    for image in data["predictions"]:
        high_score_classes = get_high_score_classes(
            image["classifications"]["scores"],
            image["classifications"]["classes"],
            _CONF
        )
        master_data.append({
            "imageName": image["filepath"].split("/")[-1],  # just the filename
            "nDetections_animal": len(get_high_conf_detections(image["detections"], "animal", _CONF)),
            "nDetections_human": len(get_high_conf_detections(image["detections"], "human", _CONF)),
            "nDetections_vehicle": len(get_high_conf_detections(image["detections"], "vehicle", _CONF)),
            "nClassifications": len(high_score_classes),
            "highScoreClasses": high_score_classes
        })

# Previous work identified 55363 as our heron example index
display(master_data[55363])

This script processes all 100k+ predictions in about 7 seconds, giving us a clean starting point for EDA.
The heron example:

{'imageName': 'bada276600ce2790b1d2877e7d6e7a25.jpg',
 'nDetections_animal': 1,
 'nDetections_human': 0,
 'nDetections_vehicle': 0,
 'nClassifications': 1,
 'highScoreClasses': [{
    'score': '0.74',
    'class': '96fe1a07-7ef1-4a2f-99e1-ec2c9a78b532;aves;pelecaniformes;ardeidae;ardea;herodias;great blue heron'
 }]
}

From here it's as simple as defining the dataframe: df = pd.DataFrame(master_data)

Next, we want each row in our dataframe to contain exactly one class prediction. We're fine with an image having more than one classification above our threshold (sometimes there's more than one species in a photo!), but we want each high-scoring classification on its own row so we can work with them individually.

For example:

imageName: '269ed4dcbaf7419a1d00e6957ff2beff.jpg'
highScoreClasses:
  [
    {'score': '0.57', 'class': 'f1856211-cfb7-4a5b-9158-c0f72fd09ee6;;;;;;blank'},
    {'score': '0.37', 'class': 'd372cda5-a8ca-4b7b-97ed-4e4fab9c9b4b;mammalia;cetartiodactyla;suidae;sus;scrofa;wild boar'}
  ]

With df2 = df.explode("highScoreClasses", ignore_index=True), this turns into:

imageName: '269ed4dcbaf7419a1d00e6957ff2beff.jpg'
highScoreClasses:
  {'score': '0.57', 'class': 'f1856211-cfb7-4a5b-9158-c0f72fd09ee6;;;;;;blank'}

imageName: '269ed4dcbaf7419a1d00e6957ff2beff.jpg'
highScoreClasses:
  {'score': '0.37', 'class': 'd372cda5-a8ca-4b7b-97ed-4e4fab9c9b4b;mammalia;cetartiodactyla;suidae;sus;scrofa;wild boar'}

This increased our row count from 104,937 to 107,535.

Now we're going to parse that very long taxonomic string down to the common name - the most specific label, which occupies the last item in the semicolon-separated list. It's a mouthful of a statement, but ChatGPT helped puzzle through it and we're going to use it.

# Pull the 'class' string out of each dict, split by ';', take the last (most specific) part
df2['class'] = (
    df2['highScoreClasses']
    .apply(lambda d: d.get('class') if isinstance(d, dict) else None)
    .str.split(";").str[-1]
)

Last, we want to re-associate some of the data originally pulled for the predictions - namely the datetime and the publicURL for each image. To do that, we merge the predictions with the metadata; both files were downloaded as CSVs earlier in the process and can be linked via the unique mediaID (a simple parse of the imageName).
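The parse itself is a one-liner. As a sketch, I'm assuming here that the mediaID is simply the image filename with its extension dropped:

# Assumed convention: mediaID == imageName minus its file extension
predicts["mediaID"] = predicts["imageName"].str.rsplit(".", n=1).str[0]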

merged_df = pd.merge(metadata, predicts, on="mediaID", how="inner")

After running this the first time, merged_df had a few hundred more rows than the predictions dataset - so the metadata probably contains duplicated mediaIDs. This is pretty normal because the image-uploading process doesn't deduplicate. So, before the merge I added metadata_unique = metadata.drop_duplicates(subset=['mediaID'])

merged_df = pd.merge(metadata_unique, predicts, on="mediaID", how="inner")

Selecting our needed columns gives us:

Cleaned Data

| mediaID | timestamp | publicURL | folderName | animal | human | vehicle | class |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 113f9bb5…39b0 | 2024-12-25 15:29:24 | https://urbanriverrangers…SYFW01868.jpg | 2025-01-04_UR027 | 0 | 1 | 0 | human |
| 871315ba…d6de | 2024-12-25 15:29:24 | https://urbanriverrangers…SYFW01869.jpg | 2025-01-04_UR027 | 0 | 1 | 0 | human |
| 991baede…a9db | 2024-12-25 15:29:24 | https://urbanriverrangers…SYFW01870.jpg | 2025-01-04_UR027 | 0 | 1 | 0 | human |
| 32c8912b…f68c | 2024-12-25 15:29:26 | https://urbanriverrangers…SYFW01871.jpg | 2025-01-04_UR027 | 0 | 0 | 0 | blank |
| f4ba4a3d…8d0a | 2024-12-25 15:29:26 | https://urbanriverrangers…SYFW01872.jpg | 2025-01-04_UR027 | 0 | 0 | 0 | blank |

Exploratory Analysis

First, let's get a list of the top 20 classifications and graph it. There is an overwhelming number of blanks, so we're going to use a log scale.
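A minimal sketch of the counting and plotting, assuming the merged_df built above:

import matplotlib.pyplot as plt

# Twenty most common classes by image count
top20 = merged_df["class"].value_counts().head(20)

# Blanks dwarf everything else, so a log scale keeps the rest visible
ax = top20.plot(kind="bar", logy=True, figsize=(10, 4))
ax.set_ylabel("images (log scale)")
plt.tight_layout()
plt.show()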
…and by the numbers:

| # | Class | Count | % |
| --- | --- | --- | --- |
| 1 | blank | 90475 | 84.1 |
| 2 | bird | 6122 | 5.7 |
| 3 | human | 4499 | 4.2 |
| 4 | western pond turtle | 1886 | 1.8 |
| 5 | mallard | 818 | 0.76 |
| 6 | anseriformes order | 588 | 0.55 |
| 7 | american coot | 299 | 0.28 |
| 8 | domestic dog | 161 | 0.15 |
| 9 | northern raccoon | 151 | 0.14 |
| 10 | reptile | 150 | 0.14 |
| 11 | great blue heron | 128 | 0.12 |
| 12 | vehicle | 109 | 0.10 |
| 13 | eastern cottontail | 82 | 0.076 |
| 14 | wild turkey | 68 | 0.063 |
| 15 | domestic cattle | 66 | 0.061 |
| 16 | wood duck | 53 | 0.049 |
| 17 | brown rat | 50 | 0.046 |
| 18 | white-tailed deer | 43 | 0.040 |
| 19 | central american agouti | 41 | 0.038 |
| 20 | domestic cat | 38 | 0.035 |

An overwhelming 84% of the images are blanks according to SpeciesNet. More than a few of these are probably false negatives, but still - it mirrors what we've been experiencing in labeling activities: a whole lot of boring.

For completeness, let's also look at the 20 least frequent. Most of these are ties at n=1:

| # | Class | Count | % |
| --- | --- | --- | --- |
| 65 | rattus species | 1 | 0.001 |
| 66 | greater roadrunner | 1 | 0.001 |
| 67 | scaly ground-roller | 1 | 0.001 |
| 68 | merle dyal malgache | 1 | 0.001 |
| 69 | red acouchi | 1 | 0.001 |
| 70 | blood pheasant | 1 | 0.001 |
| 71 | white-lipped peccary | 1 | 0.001 |
| 72 | red fox | 1 | 0.001 |
| 73 | desert cottontail | 1 | 0.001 |
| 74 | rufescent tiger-heron | 1 | 0.001 |
| 75 | common wombat | 1 | 0.001 |
| 76 | african elephant | 1 | 0.001 |
| 77 | bearded pig | 1 | 0.001 |
| 78 | eastern chipmunk | 1 | 0.001 |
| 79 | fossa | 1 | 0.001 |
| 80 | l’hoest’s monkey | 1 | 0.001 |
| 81 | house rat | 1 | 0.001 |
| 82 | bobcat | 1 | 0.001 |
| 83 | nine-banded armadillo | 1 | 0.001 |
| 84 | guenther’s dik-dik | 1 | 0.001 |

There are clearly some errors in classification. This run of the model was made without any geofencing, to avoid collapsing low-confidence classifications into the higher "animal" taxonomy level; with geofencing on, a large portion of these would probably not appear. Increasing our _CONF threshold to >0.80, for example, reduced our species count from 84 to 27. Unfortunately, raising this value also drops some true detections in favor of having none at all - we will explore this trade-off more once we have an opportunity for fine-tuning.
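One quick way to see that sensitivity is to re-count distinct classes at a few thresholds. A sketch - noting that the scores were stored as formatted strings above, so they need a float cast first:

# Scores were stored as strings ('0.74'), so cast before comparing
df2["score_f"] = df2["highScoreClasses"].apply(
    lambda d: float(d["score"]) if isinstance(d, dict) else None
)

for threshold in (0.3, 0.5, 0.8):
    n_classes = df2.loc[df2["score_f"] > threshold, "class"].nunique()
    print(f"score > {threshold}: {n_classes} distinct classes")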

Another popular angle in conservation is looking for patterns in activity by time of day. Here's a chart for the top 10 classes across all of our pictures.
This could be further subset by season, but for this first pass we're just getting an overall picture.
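Roughly how the chart was built - a sketch assuming merged_df and that its timestamp column parses cleanly:

import pandas as pd

# Bucket each image into its hour of day, then count sightings per class
merged_df["hour"] = pd.to_datetime(merged_df["timestamp"]).dt.hour
top10 = merged_df["class"].value_counts().head(10).index

activity = (
    merged_df[merged_df["class"].isin(top10)]
    .groupby(["class", "hour"])
    .size()
    .unstack(fill_value=0)
)

# One small panel per class, hours 0-23 on the x-axis
activity.T.plot(subplots=True, layout=(5, 2), figsize=(10, 8), sharex=True)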

There are definitely nocturnal patterns already emerging, even in a "we aren't totally confident it got it right all that often" dataset.
For example, humans follow a bell-curve pattern of visitation, dogs seem to spike in the morning, at noon, and in the evening (for walkies, I hope), and species like cottontails and raccoons show up when the lights go out and nobody is around.

Same plot but with a shared y-axis:

Here we get a sense of who we're sharing the Wild Mile with: birds and turtles (not western pond turtles, but we'll fix that soon).

The full cleaning and EDA notebook is embedded in all of its chaotic glory here:

Obligatory example of good classification


[Gallery: nine camera-trap images classified as great blue heron, from deployments 2024-06-27_UR011, 2024-09-08_UR011, and 2024-10-12_UR005]

Next Steps

Overall, we succeeded in using the SpeciesNet ensemble released by Google. We generated over 100k predictions for our camera trap images and parsed those predictions into a usable structure for analysis and validation.

The first attempts at this used a GPS geofence parameter and full-resolution images. We seemed to have better luck resizing the images before inference - something SpeciesNet also does internally - perhaps because the two-stage downsampling produced images closer in resolution to those used in training.

The next steps I'm excited about involve learning how to deploy this model to Hugging Face through Gradio and then, after collecting human-graded image IDs, working on fine-tuning SpeciesNet for our image types.
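For the deployment piece, the skeleton of a Gradio app is small. A sketch only - classify_image and run_speciesnet are hypothetical wrappers, not a real SpeciesNet API:

import gradio as gr

def classify_image(image_path):
    # Hypothetical wrapper around the SpeciesNet ensemble; returns
    # {label: score} pairs in the shape gr.Label expects
    preds = run_speciesnet(image_path)  # placeholder, not a real API
    return {p["class"]: float(p["score"]) for p in preds}

demo = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="filepath"),
    outputs=gr.Label(num_top_classes=5),
    title="Urban Rivers CamTrap Classifier",
)
demo.launch()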

The goal here is to reduce the boring parts of image labeling - something MegaDetector and SpeciesNet did excellently.

To get started on weeding out the false positives (species that don't belong here), I built a web app to interact with GBIF data. It's something you can both use here
and read about here.

SpeciesNet was described by Gadot et al.1

  1. Gadot, T., Istrate, Ș., Kim, H., Morris, D., Beery, S., Birch, T., & Ahumada, J. (2024). To crop or not to crop: Comparing whole-image and cropped classification on a large dataset of camera trap images. IET Computer Vision. Wiley Online Library.