How to Build a Price Monitoring Tool with Product Scrapes
A practical walkthrough of building a price tracker that monitors products across Amazon and other stores, with working code and the edge cases we learned about the hard way.
Price tracking is probably the most common thing people build with our API. The basic idea is simple - check a product page on a schedule, compare the price to what you saw last time, and send an alert if it drops. The devil is in the details.
This post walks through a working implementation. I'll point out the things that tripped us up when we built our own internal version.
The overall shape
You need four things:
- A list of products you're tracking (just URLs, really)
- Something that calls the API on a schedule to check current prices
- A database to store price history
- Alerts when prices change meaningfully
```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Product   │────→│  Scheduled   │────→│   Product   │
│  Registry   │     │   Fetcher    │     │ Scrapes API │
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                    ┌──────▼───────┐
                    │ Price Store  │
                    │  (Database)  │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │    Alerts    │
                    └──────────────┘
```
Database schema
Nothing clever here. Two tables - one for what you're tracking, one for the price log:
```sql
CREATE TABLE tracked_products (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  url VARCHAR(2048) NOT NULL,
  name VARCHAR(500),
  current_price DECIMAL(10,2),
  currency VARCHAR(3) DEFAULT 'USD',
  last_checked_at TIMESTAMP,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE price_history (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  product_id BIGINT NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  currency VARCHAR(3) DEFAULT 'USD',
  in_stock BOOLEAN,
  recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (product_id) REFERENCES tracked_products(id)
);
```
You might be tempted to add more fields. Resist that urge at the start. You can always add columns later, and if you're storing the raw API response (you should), you can backfill anything you need.
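As a concrete sketch of that advice: one small helper that turns an API response into a `price_history` row, stashing the raw payload alongside the parsed fields. This assumes you've added a `raw_response TEXT` column to the schema above; the response shape matches the fetcher example later in this post.

```javascript
// Build a price_history row from an API response body.
// Assumes a raw_response TEXT column was added to the schema above.
function toHistoryRow(productId, apiResponse) {
  const product = apiResponse.data.product;
  return {
    product_id: productId,
    price: parseFloat(product.price),
    currency: product.currency || 'USD',
    in_stock: Boolean(product.in_stock),
    // Keep the full payload so future columns can be backfilled from it.
    raw_response: JSON.stringify(apiResponse)
  };
}
```

If you decide three months in that you also care about, say, seller name or review count, you can replay the stored payloads instead of starting your history from zero.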
The fetcher
Here's a basic Node.js version:
```javascript
const axios = require('axios');

const API_KEY = process.env.PRODUCT_SCRAPES_API_KEY;
const API_URL = 'https://productscrapes.com/api/fetch';

async function fetchProductPrice(url) {
  const response = await axios.post(API_URL, { url }, {
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    }
  });

  const product = response.data.data.product;
  return {
    title: product.title,
    price: parseFloat(product.price),
    currency: product.currency,
    inStock: product.in_stock
  };
}
```
In production you'll want retries and error handling around this, obviously. Some requests will fail - the product got delisted, the store is down, whatever. Log the failure and move on to the next product. Don't let one bad URL block the whole queue.
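A minimal sketch of that retry behavior, with exponential backoff. The attempt count and base delay are illustrative defaults, not anything the API mandates; the key design choice is returning `null` on final failure instead of throwing, so the caller logs it and moves on.

```javascript
// Retry a fetch with exponential backoff. On final failure, log and
// return null so one bad URL can't stall the rest of the queue.
async function checkWithRetries(fn, maxAttempts = 3, baseDelayMs = 1000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts) {
        console.error(`giving up after ${maxAttempts} attempts:`, err.message);
        return null; // caller skips this product and moves on
      }
      // 1s, 2s, 4s, ... between attempts
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Usage looks like `const result = await checkWithRetries(() => fetchProductPrice(url));` followed by a null check before writing to the database.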
When to alert (and when not to)
This is where most people overcomplicate things. Early on, we sent an alert on every single price change and it was just noise. A $0.03 drop on a $50 item isn't worth an email.
What actually works: alert immediately on drops over 20% or back-in-stock events. Send a daily digest email for smaller changes. Ignore price increases entirely - nobody wants to hear about those.
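That policy fits in one small function. The 20% threshold is straight from above; the 1% floor for the digest is an assumption we're adding here to filter out $0.03-on-$50 noise, so tune it to taste.

```javascript
const DIGEST_MIN_DROP = 0.01; // 1% floor for the daily digest -- an assumed default

// Returns 'immediate', 'digest', or null (no alert).
function classifyChange(oldPrice, newPrice, wasInStock, isInStock) {
  if (!wasInStock && isInStock) return 'immediate'; // back-in-stock event
  if (newPrice >= oldPrice) return null;            // ignore increases entirely
  const dropPct = (oldPrice - newPrice) / oldPrice;
  if (dropPct > 0.20) return 'immediate';           // big drop: alert now
  if (dropPct >= DIGEST_MIN_DROP) return 'digest';  // small drop: batch it
  return null;                                      // $0.03-on-$50 territory
}
```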
One thing to watch out for: temporary price spikes. Some retailers briefly show a higher price during inventory updates or A/B testing. If you see a price jump and then it goes back to normal within an hour, it wasn't a real change. We added a 2-hour confirmation window before recording any price increase, which cut out almost all the false positives.
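A sketch of that confirmation window as a pure decision function. Drops are recorded immediately; an increase only gets recorded once it has held for the full window. `pendingSince` is the timestamp when you first saw the higher price (the caller stores it, and clears it if the price returns to normal).

```javascript
const CONFIRM_WINDOW_MS = 2 * 60 * 60 * 1000; // the 2-hour window from above

// Should this observation be written to price_history?
function shouldRecord(lastPrice, newPrice, pendingSince, now) {
  if (newPrice <= lastPrice) return true;         // drops/no change: record now
  if (pendingSince === null) return false;        // first sighting: start the clock
  return now - pendingSince >= CONFIRM_WINDOW_MS; // increase held long enough?
}
```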
How often to check
This depends on what you're tracking and how many API calls you want to burn through. We settled on these rough intervals after some trial and error:
- Electronics and anything with flash sales: every 1-2 hours
- Fashion and seasonal goods: twice a day is plenty
- Books, home goods, stable-price items: once a day
If you're monitoring flash deals or lightning sales specifically, you'll want to check more frequently - every 15 minutes or so. But that burns through API quota fast, so be selective about which products get that treatment.
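The tiers above as a config map. The category names, and the idea of tagging products with a category at all, are assumptions about how you'd organize things; the intervals are the ones from the list (with 90 minutes splitting the 1-2 hour range).

```javascript
// Check intervals per product category -- hypothetical tagging scheme.
const CHECK_INTERVALS_MS = {
  flash_deal:  15 * 60 * 1000,       // lightning sales: every 15 min
  electronics: 90 * 60 * 1000,       // 1-2 hours; 90 min splits the difference
  fashion:     12 * 60 * 60 * 1000,  // twice a day
  stable:      24 * 60 * 60 * 1000   // books, home goods: daily
};
```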
One thing that bit us: don't run all your checks at the same time. If you have 5,000 products, don't fire 5,000 API calls at midnight. Spread them out. It's better for rate limits and you get more evenly distributed data.
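One simple way to spread them out: give each product a stable offset within its check interval, derived from its id, so the schedule is staggered but deterministic. The hash here (FNV-1a) is just an illustrative choice; anything stable and roughly uniform works.

```javascript
// Stable per-product offset in [0, intervalMs), so checks are spread
// across the interval instead of all firing at once.
function checkOffsetMs(productId, intervalMs) {
  let h = 2166136261; // FNV-1a 32-bit hash of the id
  for (const ch of String(productId)) {
    h = Math.imul(h ^ ch.charCodeAt(0), 16777619) >>> 0;
  }
  return h % intervalMs;
}
```

A product due "hourly" then gets checked at its own fixed minute within each hour, which is kinder to rate limits and gives you evenly spaced data points.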
Edge cases that will bite you
Currency. If you're tracking products across different Amazon regions, store prices in their original currency. Don't convert to USD at fetch time - exchange rates change and you'll end up with phantom price changes that are really just currency fluctuations.
Variants. The same Amazon URL can show different prices depending on the size or colour selected. Our API returns the default variant's price, which is usually the cheapest one. If you need a specific variant, you'll need the variant-specific URL.
Out-of-stock products. When a product goes out of stock, the price field might come back empty. Decide upfront whether you want to keep checking it or pause monitoring. We keep checking - that way we catch back-in-stock events.
Running it
The whole thing can run on a single $5/month VPS if you're tracking a few thousand products. A cron job that runs every 15 minutes, works through the products that are due for a check, and fires off alerts. Nothing fancy.
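The core of that cron job is just a due-check. A sketch, assuming each product carries an interval (for instance from a category map like the one earlier):

```javascript
// Is this product due for a check? New products (never checked) are
// always due.
function isDue(lastCheckedAt, intervalMs, now = Date.now()) {
  return lastCheckedAt === null || now - lastCheckedAt >= intervalMs;
}
```

Each run, select the products where this holds (in SQL, roughly `WHERE last_checked_at IS NULL OR last_checked_at < NOW() - INTERVAL ...`), fetch them, record prices, and update `last_checked_at`.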
For the alerts themselves, we just use email via Postmark for most things and a Slack webhook for the "holy crap, 40% off" moments. You could also do push notifications if you're building a consumer-facing app.
The Product Scrapes API handles the hard part - rotating proxies, dealing with anti-bot systems, parsing different site structures. Your code just needs to compare two numbers and decide whether to send an email.