If you use Amazon Photos to back up your pictures, you might have noticed something frustrating over time: duplicate photos. It happens to everyone — you upload a batch of photos twice, you edit a picture and keep both versions, or your phone syncs a photo you already had. Before you know it, your library is cluttered with copies you don't even remember making.
I recently built a command-line tool that connects to your Amazon Photos library and automatically finds those duplicates for you. Let me walk you through how it works.
The problem with finding duplicates the old way
The naive approach to finding duplicate files is to compare them byte by byte: if two files are identical down to the last bit, they're duplicates. But that doesn't work well for photos. The same picture can exist in your library in slightly different forms: different compression, a small crop, a brightness tweak, or even a different file format (JPEG vs. PNG). Byte-by-byte comparison would miss all of those.
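To make the limitation concrete, here is a small Python sketch of the naive approach. It is an illustration only and not code from the tool: it compares SHA-256 digests of the raw bytes (a common shortcut that behaves like a byte-by-byte comparison), and the tiny byte strings stand in for real image files.

```python
import hashlib

def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()

def is_exact_duplicate(a: bytes, b: bytes) -> bool:
    """True only when both files contain exactly the same bytes."""
    return file_digest(a) == file_digest(b)

# Stand-ins for image files (hypothetical data, not real photos).
original = bytes([10, 20, 30, 40])
copy = bytes([10, 20, 30, 40])
tweaked = bytes([10, 20, 31, 40])  # a single byte differs, as after re-compression

print(is_exact_duplicate(original, copy))     # True
print(is_exact_duplicate(original, tweaked))  # False
```

A single changed byte is enough to make the files "different", even though the picture would look the same to you.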
How the tool actually detects duplicates
The tool uses a technique called perceptual hashing. Instead of comparing raw file data, it considers how the image appears to the human eye. It converts each photo into a compact "fingerprint" that captures its visual content, then measures how similar two fingerprints are, giving a percentage from 0% (completely different) to 100% (visually identical).
This means it can detect duplicates even when the files are technically different, as long as the pictures look the same.
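The idea can be sketched in a few lines of Python. This is a minimal illustration that assumes an "average hash", one of the simplest perceptual hashes; the article does not say which algorithm the tool uses, so treat the function names and the 8x8 grid as assumptions. A real pipeline would first decode the image and shrink it to a small grayscale grid with an imaging library; here the grid is built by hand.

```python
def average_hash(pixels):
    """Fingerprint a small grayscale grid (values 0-255).

    Each bit is 1 if the pixel is brighter than the grid's mean, else 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]

def similarity_percent(hash_a, hash_b):
    """Share of matching bits between two fingerprints, from 0.0 to 100.0."""
    if len(hash_a) != len(hash_b):
        raise ValueError("fingerprints must have the same length")
    differing = sum(a != b for a, b in zip(hash_a, hash_b))
    return 100.0 * (1 - differing / len(hash_a))

# An 8x8 gradient standing in for a downscaled photo (hypothetical data).
original = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
# The same picture with a brightness tweak, clipped at 255.
brighter = [[min(255, value + 10) for value in row] for row in original]
# A visually opposite picture: the gradient inverted.
inverted = [[252 - value for value in row] for row in original]

print(similarity_percent(average_hash(original), average_hash(brighter)))  # 100.0
print(similarity_percent(average_hash(original), average_hash(inverted)))  # 0.0
```

The brightened copy has different pixel values everywhere, yet its fingerprint is unchanged, because each pixel keeps the same relationship to the image's average brightness.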
A two-phase process to avoid downloading everything
Downloading and visually comparing thousands of photos one by one would take forever. The tool is smarter than that: it first does a quick metadata pass to narrow down candidates.
Photos with the same filename, or taken at the exact same second, are very likely to be duplicates. The tool groups those photos together first — without downloading any images. Only then does it download and visually compare the photos within each group. This makes the whole process much faster.
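The metadata pass described above could look roughly like the following Python sketch. It is not the tool's code: the record fields (`id`, `name`, `taken_at`) are hypothetical, `"name"` is a placeholder for the default filename mode, and treating "both at once" as "same filename and same timestamp" is an assumption.

```python
from collections import defaultdict

def group_candidates(photos, group_by="name"):
    """Group photo IDs by metadata only; no image is downloaded here.

    Returns only groups with two or more photos, since a photo alone in
    its group has nothing to be compared against.
    """
    key_funcs = {
        "name": lambda p: (p["name"],),
        "taken-at": lambda p: (p["taken_at"],),
        "name-and-taken-at": lambda p: (p["name"], p["taken_at"]),
    }
    key_of = key_funcs[group_by]
    groups = defaultdict(list)
    for photo in photos:
        groups[key_of(photo)].append(photo["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical metadata records, as a listing call might return them.
photos = [
    {"id": "a1", "name": "IMG_0001.jpg", "taken_at": "2024-05-01T10:00:00"},
    {"id": "b2", "name": "IMG_0001.jpg", "taken_at": "2024-05-01T10:00:00"},
    {"id": "c3", "name": "IMG_0002.jpg", "taken_at": "2024-05-01T10:00:00"},
    {"id": "d4", "name": "IMG_0003.jpg", "taken_at": "2024-06-09T18:30:00"},
]

print(group_candidates(photos, "name"))      # [['a1', 'b2']]
print(group_candidates(photos, "taken-at"))  # [['a1', 'b2', 'c3']]
```

Only the photos inside these groups would need to be downloaded and compared visually, so photo d4 is never fetched at all.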
Running the command
The command is photos:find-duplicates. In its simplest form, you just run it and let it scan your entire library:
php artisan photos:find-duplicates
You can also narrow things down. For example, to only look at photos uploaded in the last 30 days:
php artisan photos:find-duplicates --uploaded-last-days=30
Or to compare photos taken within a specific date range (dates are written day/month/year):
php artisan photos:find-duplicates --taken-between=01/01/2024,31/12/2024
By default, it groups candidate duplicates by filename. You can also group by the timestamp when the photo was taken, or both at once:
php artisan photos:find-duplicates --group-by=taken-at
php artisan photos:find-duplicates --group-by=name-and-taken-at
The similarity threshold defaults to 90% — photos that look at least 90% alike are flagged as duplicates. You can make it stricter or more lenient:
php artisan photos:find-duplicates --similarity=95
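To get a feel for what the threshold means, here is a short Python sketch. It assumes the similarity is the share of matching bits in a 64-bit fingerprint; the article does not state the fingerprint length or the exact formula, so the numbers are illustrative only.

```python
def max_differing_bits(threshold_percent: int, hash_bits: int = 64) -> int:
    """Largest number of differing bits that still meets the threshold.

    Derived from: 100 * (1 - d / hash_bits) >= threshold_percent.
    """
    return (hash_bits * (100 - threshold_percent)) // 100

print(max_differing_bits(90))   # 6
print(max_differing_bits(95))   # 3
print(max_differing_bits(100))  # 0
```

Under these assumptions, raising the threshold from 90 to 95 halves the number of bits that are allowed to differ, which is why a stricter value flags fewer photos.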
Comparing two specific photos
Sometimes you already suspect two specific photos are duplicates. Instead of scanning the whole library, you can compare just those two by passing their IDs directly:
php artisan photos:find-duplicates photo-id-1 photo-id-2
The tool will tell you the similarity percentage and whether they're considered duplicates.
More info: https://github.com/icsbcn/awsphotosapi