
Web Agents for E-commerce Price Monitoring: MAP Enforcement, Competitive Intelligence, and Catalog Sync

TinyFishie

The scale problem: why traditional scrapers can't keep up

You have 1,000 SKUs. Sold across 20 retailers. You need daily pricing data.

That's 20,000 data points to collect every morning — before your pricing team starts work.

Traditional scrapers were built for a simpler web. Static HTML, predictable selectors, no JavaScript. Today's retail sites are the opposite: prices rendered client-side, layout A/B tests that silently break your CSS selectors every few weeks, bot detection that blocks entire IP ranges. Maintaining that infrastructure isn't a one-time project. It becomes a full-time engineering responsibility.

The failure mode is quiet and expensive. Your scraper returns stale data with no error. A competitor drops below your Minimum Advertised Price (MAP). You find out three weeks later when a distributor emails to complain. By then, other retailers have matched the low price, and you're managing a channel conflict instead of preventing one.

Web agents change the architecture of the problem. Instead of maintaining brittle selectors across 20 retailer codebases, you describe the goal in plain English ("find the current price of this product"), and the agent navigates, renders, and extracts by reading page content directly. There are no selectors to maintain, so a layout change no longer silently breaks extraction.

This article walks through three production-ready use cases: competitor price monitoring, MAP violation detection, and dynamic pricing intelligence — with working code for each.

When does a web agent actually beat a scraper for price monitoring?

Before the use cases: if you're still deciding whether this applies to you, here's the honest decision framework.

Quick reference: when a web agent beats a scraper for price monitoring

  1. Target sites render prices via JavaScript, so a traditional scraper can't see them after the initial page load
  2. Volume × frequency exceeds your scraper maintenance capacity: roughly 50+ SKUs daily across 5+ retailers
  3. You need schema-consistent output — downstream pricing systems consuming structured JSON
  4. Your existing tooling stops working on sites with strict access requirements — behavioral fingerprinting, session management, IP variability
  5. You need an audit trail — each agent run produces a timestamped record, not just a number
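If you assess many product lines, the checklist can be encoded as a small helper so the decision is repeatable. This is a hypothetical sketch: the `TargetProfile` fields are illustrative names, and the volume thresholds simply mirror item 2 above.

```python
from dataclasses import dataclass


@dataclass
class TargetProfile:
    """Hypothetical description of a monitoring workload; field names are illustrative."""
    skus_per_day: int
    retailers: int
    js_rendered_prices: bool
    needs_structured_json: bool
    strict_access_controls: bool
    needs_audit_trail: bool


def agent_fit_signals(p: TargetProfile) -> list[str]:
    """Return the checklist items (1-5 above) that this workload matches."""
    signals = []
    if p.js_rendered_prices:
        signals.append("js_rendering")
    # Rough volume threshold from item 2: 50+ SKUs daily across 5+ retailers
    if p.skus_per_day >= 50 and p.retailers >= 5:
        signals.append("volume")
    if p.needs_structured_json:
        signals.append("schema_output")
    if p.strict_access_controls:
        signals.append("access_requirements")
    if p.needs_audit_trail:
        signals.append("audit_trail")
    return signals


profile = TargetProfile(skus_per_day=1_000, retailers=20, js_rendered_prices=True,
                        needs_structured_json=True, strict_access_controls=False,
                        needs_audit_trail=True)
print(agent_fit_signals(profile))
# → ['js_rendering', 'volume', 'schema_output', 'audit_trail']
```

How many matched signals justify switching is a judgment call; the build-vs-buy section below covers the crossover in more detail.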

Use case 1: Competitor price monitoring at scale

The pattern: give the agent a list of product URLs and a structured output schema in the goal prompt, then run the tasks concurrently. As long as the batch fits within your plan's concurrency limit, total time approaches the slowest single task rather than the sum of all tasks.

Installation

pip install tinyfish
export TINYFISH_API_KEY=sk-tinyfish-*****

The code

import asyncio
import json
from datetime import datetime, timezone
from tinyfish import AsyncTinyFish, BrowserProfile

client = AsyncTinyFish()  # Reads TINYFISH_API_KEY from environment

PRODUCTS = [
    {"product_id": "airpods-pro-3", "url": "https://www.bestbuy.com/site/..."},
    {"product_id": "airpods-pro-3", "url": "https://www.amazon.com/dp/..."},
    {"product_id": "sony-wh1000xm6", "url": "https://www.target.com/p/..."},
    # Add up to 1,000 product URLs
]

async def extract_price(product_id: str, url: str) -> dict:
    goal = (
        "Find the current listed price of the product on this page. "
        "Return a JSON object with these fields only: "
        "price (number, no currency symbol), currency (ISO 4217 code), in_stock (boolean). "
        "If no price is visible, set price to null."
    )

    response = await client.agent.run(
        url=url,
        goal=goal,
        browser_profile=BrowserProfile.STEALTH,
    )

    # For debugging: response.streaming_url contains a live browser replay of this run (valid 24h)
    # response.result is a dict shaped by your goal prompt — or None if the run failed
    # at the infrastructure level (browser crash, timeout, etc.)
    result = response.result or {}

    # Two distinct failure modes to handle:
    # 1. Infrastructure failure: response.result is None → caught by `or {}`
    # 2. Goal failure: run completed but agent couldn't achieve the task →
    #    result contains {"status": "failure", "reason": "..."} instead of your data
    #    This is the most common production surprise — COMPLETED status ≠ goal achieved
    if result.get("status") == "failure":
        return {
            "product_id": product_id,
            "price": None,
            "error": result.get("reason", "goal_failed"),
            "scraped_at": datetime.now(timezone.utc).isoformat(),
            "source_url": url,
        }

    # Happy path: result contains the fields you specified in the goal prompt
    return {
        "product_id": product_id,
        "price": result.get("price"),        # number, as specified in goal
        "currency": result.get("currency", "USD"),
        "in_stock": result.get("in_stock"),  # boolean, as specified in goal
        "scraped_at": datetime.now(timezone.utc).isoformat(),
        "source_url": url,
    }

async def main():
    # All requests fire concurrently — total time = slowest single task
    tasks = [extract_price(p["product_id"], p["url"]) for p in PRODUCTS]
    results = await asyncio.gather(*tasks)
    print(json.dumps(results, indent=2))

asyncio.run(main())
Note on concurrency: When you send more requests than your plan's concurrent session limit, TinyFish queues the excess runs automatically (status: PENDING) — they start as sessions free up. Size your batches to your plan's concurrency limit for predictable run times.
Note on result handling: status: "COMPLETED" means the browser ran successfully — not that your goal succeeded. A run that hit an Access Denied page will also return COMPLETED, but result will contain {"status": "failure", "reason": "..."} instead of your data. The code above handles both cases explicitly. This is the most common source of silent failures in production price monitors.
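To control batch size on the client instead of relying only on server-side queuing, an `asyncio.Semaphore` caps how many runs are in flight at once. A minimal sketch, assuming a hypothetical stand-in coroutine (`fake_extract`) in place of `extract_price()` so it runs without an API key; `MAX_CONCURRENT` is a placeholder for your plan's actual limit.

```python
import asyncio

MAX_CONCURRENT = 5  # Placeholder: set this to your plan's concurrent session limit


async def fake_extract(product_id: str, url: str) -> dict:
    """Hypothetical stand-in for extract_price(); sleeps briefly instead of calling the API."""
    await asyncio.sleep(0.01)
    return {"product_id": product_id, "source_url": url}


async def run_bounded(products: list[dict], limit: int = MAX_CONCURRENT) -> tuple[list[dict], int]:
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # Highest number of simultaneous runs observed, for verification

    async def guarded(p: dict) -> dict:
        nonlocal in_flight, peak
        async with sem:  # At most `limit` coroutines get past this point at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_extract(p["product_id"], p["url"])
            finally:
                in_flight -= 1

    # gather() returns results in the same order as its inputs
    results = await asyncio.gather(*(guarded(p) for p in products))
    return list(results), peak


products = [{"product_id": f"sku-{i}", "url": f"https://example.com/p/{i}"} for i in range(20)]
results, peak = asyncio.run(run_bounded(products))
print(len(results), peak)
```

In real use, swap `fake_extract` for `extract_price`. Runs beyond the semaphore wait locally, so the service sees a steady number of sessions and fewer runs sit in PENDING.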

Output schema

[
  {
    "product_id": "airpods-pro-3",
    "price": 249.99,
    "currency": "USD",
    "in_stock": true,
    "scraped_at": "2026-03-27T14:32:01Z",
    "source_url": "https://www.bestbuy.com/site/..."
  },
  {
    "product_id": "airpods-pro-3",
    "price": 239.00,
    "currency": "USD",
    "in_stock": true,
    "scraped_at": "2026-03-27T14:32:04Z",
    "source_url": "https://www.amazon.com/dp/..."
  }
]
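Because downstream pricing systems consume this output directly, it can pay to validate each record before it lands in your database. A minimal standard-library sketch; the rules (positive numeric price or null, a three-letter uppercase currency code, boolean `in_stock`) are assumptions derived from the goal prompt above, not anything the SDK enforces, and `validate_record` is a hypothetical helper.

```python
import re

# Shape of an ISO 4217 alphabetic code; this does not check the code against the registry.
CURRENCY_RE = re.compile(r"^[A-Z]{3}$")


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems with a price record; an empty list means it looks valid."""
    problems = []
    if rec.get("error"):
        problems.append(f"goal_failed: {rec['error']}")
    price = rec.get("price")
    # bool is a subclass of int in Python, so reject it explicitly
    if price is not None and (isinstance(price, bool)
                              or not isinstance(price, (int, float))
                              or price <= 0):
        problems.append(f"bad price: {price!r}")
    currency = rec.get("currency")
    if currency is not None and not CURRENCY_RE.match(str(currency)):
        problems.append(f"bad currency: {currency!r}")
    in_stock = rec.get("in_stock")
    if in_stock is not None and not isinstance(in_stock, bool):
        problems.append(f"bad in_stock: {in_stock!r}")
    return problems


good = {"product_id": "airpods-pro-3", "price": 249.99, "currency": "USD", "in_stock": True}
bad = {"product_id": "airpods-pro-3", "price": "$249.99", "currency": "usd", "in_stock": "yes"}
print(validate_record(good))  # → []
print(validate_record(bad))
```

Records with problems can go to the same retry or review path as goal failures instead of flowing into pricing decisions.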

The parallel execution advantage

Approach | 1,000 products @ 3 s each | Actual wall-clock time
Sequential scraping | 1,000 × 3 s, one at a time | ~50 minutes
TinyFish concurrent agents | 1,000 × 3 s, in parallel | ~3–5 minutes

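The math: total time is bounded by the slowest single task, not the sum. With 1,000 concurrent agents, a batch that takes 50 minutes sequentially would, in the ideal case, finish in roughly the time of one page; in practice, session startup, queuing beyond your plan's concurrency limit, and slow pages push that to the few minutes shown in the table.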
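The same math extends to a concurrency cap: with N tasks, a limit of C concurrent sessions, and roughly equal per-task time t, runs proceed in about ⌈N / C⌉ waves. A back-of-the-envelope sketch; it ignores session startup, retries, and variance between pages, so treat its output as a lower bound. The 50-session limit in the last call is a hypothetical value.

```python
import math


def estimate_wall_clock_seconds(n_tasks: int, per_task_seconds: float, concurrency: int) -> float:
    """Lower-bound estimate: tasks run in ceil(n / concurrency) waves of equal length."""
    if n_tasks <= 0:
        return 0.0
    waves = math.ceil(n_tasks / max(1, concurrency))
    return waves * per_task_seconds


# Sequential (concurrency 1): 1,000 x 3 s = 3,000 s, i.e. 50 minutes
print(estimate_wall_clock_seconds(1_000, 3.0, 1) / 60)  # → 50.0
# Fully parallel (concurrency 1,000): a single wave
print(estimate_wall_clock_seconds(1_000, 3.0, 1_000))   # → 3.0
# Hypothetical plan limit of 50 sessions: 20 waves
print(estimate_wall_clock_seconds(1_000, 3.0, 50))      # → 60.0
```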

Success rates by retailer category

Retailer category | Success rate
Major global e-commerce platforms (including those with strict automation requirements) | Up to 85%
Sites running enterprise behavioral analysis systems | Lower; varies by site configuration

Use case 2: MAP violation detection with evidence reports

MAP enforcement has a discovery problem. Your team can't manually check every authorized retailer every day — and by the time a violation surfaces through a distributor complaint, the channel damage is done.

A scheduled agent run closes the gap: extract prices, compare against your MAP database, generate a timestamped evidence record per violation. The whole workflow runs while your team sleeps.

The code

import asyncio
import json
from datetime import datetime, timezone
from tinyfish import AsyncTinyFish, BrowserProfile

client = AsyncTinyFish()

# Your MAP pricing database
MAP_PRICES = {
    "airpods-pro-3": 249.00,
    "sony-wh1000xm6": 299.00,
}

# Authorized retailers to monitor per product
RETAILERS = {
    "airpods-pro-3": [
        {"name": "BestBuy",  "url": "https://www.bestbuy.com/site/..."},
        {"name": "Target",   "url": "https://www.target.com/p/..."},
        {"name": "Walmart",  "url": "https://www.walmart.com/ip/..."},
    ],
}

async def check_retailer(product_id: str, map_price: float, retailer: dict) -> dict | None:
    response = await client.agent.run(
        url=retailer["url"],
        goal=(
            "Return a JSON object with: price (number, no symbol), currency (ISO 4217). "
            "If no price is found, set price to null."
        ),
        browser_profile=BrowserProfile.STEALTH,
    )

    result = response.result or {}
    advertised = result.get("price")

    if advertised is None or result.get("status") == "failure":
        return None  # Could not extract price — log separately

    if advertised < map_price:
        return {
            "product_id": product_id,
            "retailer": retailer["name"],
            "map_price": map_price,
            "advertised_price": advertised,
            "violation_amount": round(map_price - advertised, 2),
            "evidence_url": retailer["url"],
            "detected_at": datetime.now(timezone.utc).isoformat(),
        }

    return None  # Compliant

async def run_map_check():
    all_tasks = []
    for product_id, map_price in MAP_PRICES.items():
        for retailer in RETAILERS.get(product_id, []):
            all_tasks.append(check_retailer(product_id, map_price, retailer))

    results = await asyncio.gather(*all_tasks)
    violations = [r for r in results if r is not None]

    print(json.dumps(violations, indent=2))
    return violations

asyncio.run(run_map_check())
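One gap in `check_retailer()`: it returns `None` both for compliant listings and for failed extractions, so a retailer that blocks the agent looks compliant. A sketch of a three-way classification that keeps those apart; `classify_map_result` and its `"unknown"` bucket are hypothetical additions, and its input is the `response.result` value the example above already reads.

```python
from datetime import datetime, timezone
from typing import Optional


def classify_map_result(product_id: str, map_price: float, retailer: dict,
                        result: Optional[dict]) -> dict:
    """Classify one agent result as 'violation', 'compliant', or 'unknown' (extraction failed)."""
    result = result or {}  # Infrastructure failure: no result at all
    base = {
        "product_id": product_id,
        "retailer": retailer["name"],
        "evidence_url": retailer["url"],
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    price = result.get("price")
    if (result.get("status") == "failure"
            or not isinstance(price, (int, float)) or isinstance(price, bool)):
        # Goal failure or missing/non-numeric price: a coverage gap, not a compliant listing
        return {**base, "outcome": "unknown", "reason": result.get("reason", "no_price")}
    if price < map_price:
        return {**base, "outcome": "violation", "map_price": map_price,
                "advertised_price": price,
                "violation_amount": round(map_price - price, 2)}
    return {**base, "outcome": "compliant", "advertised_price": price}


retailer = {"name": "Target", "url": "https://www.target.com/p/..."}
print(classify_map_result("airpods-pro-3", 249.00, retailer, {"price": 219.99})["violation_amount"])  # → 29.01
print(classify_map_result("airpods-pro-3", 249.00, retailer,
                          {"status": "failure", "reason": "Access Denied"})["outcome"])  # → unknown
print(classify_map_result("airpods-pro-3", 249.00, retailer, None)["outcome"])  # → unknown
```

Routing the `unknown` bucket to a retry queue or an alert makes a blocked retailer show up as missing coverage rather than silent compliance.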

Evidence report output

{
  "product_id": "airpods-pro-3",
  "retailer": "Target",
  "map_price": 249.00,
  "advertised_price": 219.99,
  "violation_amount": 29.01,
  "evidence_url": "https://www.target.com/p/...",
  "detected_at": "2026-03-27T14:35:22Z"
}

This structure pipes directly into your brand protection workflow. Route violations to Slack for same-day retailer outreach, to your ERP for automated distributor notification, or to a compliance dashboard. The timestamp and URL record what was observed and when; because the live page will change, save the run's replay (response.streaming_url, valid 24 hours) or a page snapshot promptly if you expect a dispute.
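For the audit trail itself, an append-only JSON Lines file is a simple starting point: one violation per line, never rewritten, easy to grep or load later. A minimal sketch; `append_evidence` and `load_evidence` are hypothetical helpers, the file name is a placeholder, and a production setup would more likely write to a database or object store.

```python
import json
import tempfile
from pathlib import Path


def append_evidence(violations: list[dict], path: Path) -> int:
    """Append each violation record as one JSON line; returns the number written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        for v in violations:
            # sort_keys keeps the line layout stable across runs, which helps diffs and audits
            f.write(json.dumps(v, sort_keys=True) + "\n")
    return len(violations)


def load_evidence(path: Path) -> list[dict]:
    """Read every record back, skipping blank lines; a missing file means no evidence yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


violation = {
    "product_id": "airpods-pro-3",
    "retailer": "Target",
    "map_price": 249.00,
    "advertised_price": 219.99,
    "violation_amount": 29.01,
    "evidence_url": "https://www.target.com/p/...",
    "detected_at": "2026-03-27T14:35:22Z",
}

with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "map_evidence.jsonl"  # Placeholder location
    append_evidence([violation], log)
    print(load_evidence(log)[0]["retailer"])  # → Target
```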

Use case 3: Dynamic pricing intelligence

Static MAP enforcement catches violations after the fact. The harder problem is building a pricing system that reacts to market changes before they compound.

The structural issue: prices on food delivery, travel, and marketplace platforms load dynamically per user session, vary by location, and change multiple times per day. A scraper built on CSS selectors breaks every time a vendor updates its layout, and in high-velocity markets that happens often. An agent built on a goal ("find the current price of this item at this location") adapts to layout changes because it reads the page content and layout directly instead of pattern-matching against a selector you wrote last quarter.

The output that makes this useful downstream isn't a report — it's a structured feed with enough granularity to drive pricing decisions:

{
  "market_id": "sf-94102",
  "restaurant_id": "R_8821",
  "item_id": "burger-classic",
  "base_price": 12.99,
  "delivery_fee": 1.99,
  "promo_price": null,
  "scraped_at": "2026-03-27T18:00:00Z"
}

The code structure to produce this is identical to Use case 1 — AsyncTinyFish + asyncio.gather() across a list of URLs, with a goal prompt that specifies the schema above. The only difference is the schema itself. If you've already built Use case 1, this is a goal prompt change, not an architecture change.
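Since only the schema changes between use cases, you can generate the goal prompt from a field spec instead of hand-editing prose. A sketch, assuming you keep the same wording style as the use case 1 goal; `build_goal` and the field descriptions are hypothetical, and the resulting string is what you would pass as `goal=`. Fields such as `market_id`, `restaurant_id`, and `scraped_at` are better attached by your own code, the way `product_id` is in use case 1.

```python
from typing import Optional


def build_goal(task: str, fields: dict[str, str], null_rule: Optional[str] = None) -> str:
    """Assemble a goal prompt that names every output field and its expected type."""
    spec = ", ".join(f"{name} ({desc})" for name, desc in fields.items())
    parts = [task.rstrip(".") + ".", f"Return a JSON object with these fields only: {spec}."]
    if null_rule:
        parts.append(null_rule)
    return " ".join(parts)


DELIVERY_FIELDS = {
    "item_id": "string",
    "base_price": "number, no currency symbol",
    "delivery_fee": "number, no currency symbol",
    "promo_price": "number, or null if no promotion is shown",
}

goal = build_goal(
    "Find the current price of this menu item at this location",
    DELIVERY_FIELDS,
    "If no price is visible, set base_price to null.",
)
print(goal)
```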

Note: This represents a generalized deployment pattern, not a published case study. Specific customers are not identified.

A major food delivery platform uses this pattern to track millions of pricing variables per month across thousands of markets. The scale is unusual; the architecture isn't.
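Reacting to market changes means comparing each run against the previous one. A sketch that keys records by `(market_id, restaurant_id, item_id)` from the feed above and reports every tracked field that moved; `diff_snapshots`, the key choice, and the tracked fields are assumptions to adapt to your own schema, and the second snapshot's values are made up for illustration.

```python
Key = tuple[str, str, str]
TRACKED = ("base_price", "delivery_fee", "promo_price")


def index_by_key(records: list[dict]) -> dict[Key, dict]:
    return {(r["market_id"], r["restaurant_id"], r["item_id"]): r for r in records}


def diff_snapshots(previous: list[dict], current: list[dict]) -> list[dict]:
    """Return one change record per tracked field that differs between two snapshots."""
    prev, curr = index_by_key(previous), index_by_key(current)
    changes = []
    for key, rec in curr.items():
        old = prev.get(key)
        if old is None:
            continue  # New item: nothing to compare against yet
        for field in TRACKED:
            if old.get(field) != rec.get(field):
                changes.append({"key": key, "field": field,
                                "old": old.get(field), "new": rec.get(field),
                                "scraped_at": rec.get("scraped_at")})
    return changes


before = [{"market_id": "sf-94102", "restaurant_id": "R_8821", "item_id": "burger-classic",
           "base_price": 12.99, "delivery_fee": 1.99, "promo_price": None,
           "scraped_at": "2026-03-27T18:00:00Z"}]
# Made-up later snapshot: a promotion appeared
after = [{**before[0], "promo_price": 10.99, "scraped_at": "2026-03-27T22:00:00Z"}]

for change in diff_snapshots(before, after):
    print(change["field"], change["old"], "->", change["new"])
# → promo_price None -> 10.99
```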

Handling access requirements on retail sites: an honest assessment

Success rates on most major consumer retail platforms, including those with strict automation requirements, run up to 85%. For most teams, those platforms account for 80%+ of the retailers they care about.

The exception is sites running enterprise behavioral analysis systems. These systems don't look for a missing header — they model whether the entire session pattern looks human. Success rates are lower and inconsistent across all automation tools. No vendor publishes reliable numbers for the hardest-protected sites, and you should be skeptical of any that do.

Practical approach for protected retailers:

from tinyfish import TinyFish, BrowserProfile, ProxyConfig, ProxyCountryCode

client = TinyFish()

result = None
with client.agent.stream(
    url="https://protected-retailer.com/product/...",
    goal="Extract the current price and stock status",
    browser_profile=BrowserProfile.STEALTH,
    proxy_config=ProxyConfig(enabled=True, country_code=ProxyCountryCode.US),
) as stream:
    for event in stream:
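        # SDK CompleteEvent: the final result lives in event.result_json
        if getattr(event, "type", None) == "COMPLETE":
            result = event.result_json
            break

# Layer 1: infrastructure failure. No COMPLETE event arrived or result_json was empty,
# so result is still None here.
# Layer 2: goal failure. The run completed, but the agent reports it could not do the task.
if result and isinstance(result, dict) and result.get("status") == "failure":
    result = None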

TinyFish handles bot detection at the browser-infrastructure level rather than through JavaScript injected after the browser starts. Adding a proxy with a matching country code addresses geo-specific access requirements. For sites that still block reliably, weigh whether the data justifies ongoing engineering time, or whether an alternative data source exists.
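One way to apply that trade-off in code is an escalation ladder: try the cheapest configuration first and move to a heavier one (for example, adding a country-matched proxy) only when the previous attempt fails. The sketch below is SDK-agnostic: each rung is a plain callable you would implement with the client calls shown above, and the rung names and stub results are hypothetical.

```python
from typing import Callable, Optional

# Each rung takes a URL and returns the agent's result dict, or None on infrastructure failure.
Attempt = Callable[[str], Optional[dict]]


def run_with_escalation(url: str, ladder: list[tuple[str, Attempt]]) -> tuple[Optional[dict], list[str]]:
    """Try each (name, attempt) in order and stop at the first usable result."""
    tried: list[str] = []
    for name, attempt in ladder:
        tried.append(name)
        result = attempt(url)
        # Same two layers as above: None is an infrastructure failure,
        # {"status": "failure"} is a goal failure; anything else counts as usable.
        if isinstance(result, dict) and result.get("status") != "failure":
            return result, tried
    # Every rung failed: record the gap rather than retrying forever.
    return None, tried


# Hypothetical rungs: in practice each would wrap a client.agent call with different settings.
def stealth_only(url: str) -> Optional[dict]:
    return {"status": "failure", "reason": "Access Denied"}


def stealth_with_us_proxy(url: str) -> Optional[dict]:
    return {"price": 239.0, "currency": "USD", "in_stock": True}


result, tried = run_with_escalation(
    "https://protected-retailer.com/product/...",
    [("stealth", stealth_only), ("stealth+us_proxy", stealth_with_us_proxy)],
)
print(tried)            # → ['stealth', 'stealth+us_proxy']
print(result["price"])  # → 239.0
```

Logging `tried` per site over time shows which retailers keep needing the top rung, which is the data you need for the "is it worth it" decision.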

Cost model: what does daily monitoring actually cost?

TinyFish pricing by plan: Pay-as-you-go $0.015/credit · Starter $15/mo (1,650 included credits, $0.014/credit overage) · Pro $150/mo (16,500 included credits, $0.012/credit overage) · Enterprise custom. The estimates below assume one price check = one step = one credit, and a 30-day month.

Scale | Daily steps | Monthly steps | Pay-as-you-go | Pro plan
100 products × 5 retailers | 500 | ~15,000 | ~$225/mo | $150/mo (within 16,500 included)
500 products × 10 retailers | 5,000 | ~150,000 | ~$2,250/mo | $150 + ~$1,602 overage ≈ $1,752/mo
1,000 products × 20 retailers | 20,000 | ~600,000 | ~$9,000/mo | $150 + ~$7,002 overage ≈ $7,152/mo
Enterprise scale | | millions/month | | Enterprise (custom pricing)
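To re-run the table for your own catalog or schedule, the arithmetic fits in a few lines. A sketch that hard-codes the rates quoted above (pricing can change, so verify before budgeting) and keeps the table's assumptions: one price check = one step = one credit, 30-day months. `Decimal` avoids floating-point drift in currency math; the function names are illustrative.

```python
from decimal import Decimal

# Rates copied from the pricing line above
PAYG_RATE = Decimal("0.015")
PRO_BASE = Decimal("150")
PRO_INCLUDED_CREDITS = 16_500
PRO_OVERAGE_RATE = Decimal("0.012")


def monthly_steps(products: int, retailers: int, runs_per_day: int = 1, days: int = 30) -> int:
    """One price check per product per retailer per run; assumes 1 step = 1 credit."""
    return products * retailers * runs_per_day * days


def payg_cost(steps: int) -> Decimal:
    return steps * PAYG_RATE


def pro_cost(steps: int) -> Decimal:
    """Pro base fee plus overage on credits beyond the included allowance."""
    overage = max(0, steps - PRO_INCLUDED_CREDITS)
    return PRO_BASE + overage * PRO_OVERAGE_RATE


for products, retailers in [(100, 5), (500, 10), (1_000, 20)]:
    steps = monthly_steps(products, retailers)
    print(f"{products} x {retailers}: {steps:,} steps/mo, "
          f"pay-as-you-go ${float(payg_cost(steps)):,.0f}, Pro ${float(pro_cost(steps)):,.0f}")
```

Frequency matters as much as catalog size: at 4× daily, the 100 × 5 scenario becomes 60,000 steps a month, or about $672/mo on Pro under these rates.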

The free tier — 500 credits, no credit card — covers a complete scan of up to 500 product URLs. Enough to test your real target retailer list and validate extraction quality before committing to a production schedule.

For context on scraper maintenance costs: a junior engineer spending 25% of their time keeping selectors current across 20 retailer sites is a real line item. That's not in the table above.

Build vs. buy: when your existing scraper is still the right answer

The honest answer is that many teams don't need a web agent for price monitoring. If you track fewer than 50 products on static pages across a handful of stable, low-protection retailers, Scrapy or a basic Playwright script is cheaper. Build it yourself.

The crossover point isn't a product decision but a maintenance-math question: when does the engineering time spent keeping selectors current across 20 retailer sites exceed the cost of outsourcing that infrastructure? That typically happens around 100+ products or 5+ retailers, or the first time a retailer redesign takes down your monitoring for a week without anyone noticing.

Web agents also make sense when extraction accuracy has downstream consequences. MAP violation reports submitted with incorrect prices are worse than no report: they expose you to retailer disputes you can't back up. A scraper that silently returns the wrong price is a liability. An agent run with a failure status is just a gap in your data, which is recoverable.

Get started

The free tier gives you 500 credits with no credit card required — enough to run a complete price scan across 500 products and validate results against your real target retailers.

For teams monitoring 10,000+ products or needing SLA guarantees, contact our enterprise team for volume pricing and dedicated support.

FAQ

Can web agents handle price monitoring on major e-commerce platforms?

Yes. Success rates on major e-commerce platforms run up to 85%, including platforms with strict automation requirements. The standard browser profile covers most product pages; switch to the managed browser profile for high-volume or well-protected retailers.

How fresh is the price data?

Agents run on demand, so freshness is whatever schedule you set. Daily is the most common pattern. For flash-sale categories, teams commonly run 4× daily or trigger runs on inventory alerts.
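If you trigger runs from a long-lived process rather than cron, the core scheduling piece is computing how long to sleep until the next slot. A sketch assuming fixed UTC run times; the four placeholder slots match the 4× daily pattern, and `run_forever` is a hypothetical wrapper you would pass a coroutine function such as `run_map_check` from use case 2.

```python
import asyncio
from datetime import datetime, time, timedelta, timezone

RUN_SLOTS_UTC = [time(0, 0), time(6, 0), time(12, 0), time(18, 0)]  # Placeholder 4x-daily schedule


def next_run(now: datetime, slots: list[time] = RUN_SLOTS_UTC) -> datetime:
    """Return the first slot strictly after `now` (UTC), rolling over to tomorrow if needed."""
    now = now.astimezone(timezone.utc)
    for day_offset in (0, 1):
        day = (now + timedelta(days=day_offset)).date()
        for slot in sorted(slots):
            candidate = datetime.combine(day, slot, tzinfo=timezone.utc)
            if candidate > now:
                return candidate
    raise ValueError("slots must not be empty")


async def run_forever(job) -> None:
    """Sleep until each slot, then await job(); not called in this snippet."""
    while True:
        now = datetime.now(timezone.utc)
        await asyncio.sleep((next_run(now) - now).total_seconds())
        await job()


print(next_run(datetime(2026, 3, 27, 14, 32, tzinfo=timezone.utc)))  # → 2026-03-27 18:00:00+00:00
```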

Does this work for international retailers?

Yes. Include a currency field (ISO 4217) in your goal schema. For geo-restricted content, add proxy_config with the matching country code (US, GB, CA, DE, FR, JP, AU supported).

What does `COMPLETED` status actually mean?

Infrastructure success only — the browser launched and finished. It does not mean your goal succeeded. Always check the result field: if it contains {"status": "failure"}, the agent ran but couldn't extract the data. This is the most common production gotcha.

What if one agent in a batch fails?

Each run is independent — one failure doesn't affect others. Every run includes a streaming_url for debugging (valid 24 hours). Failed runs where infrastructure succeeded are not billed.
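On the client side, independence also depends on how you collect results. With plain `asyncio.gather()`, the first exception propagates to your code and you lose access to the results of the runs that succeeded; passing `return_exceptions=True` keeps every outcome. A sketch with a hypothetical stub (`stub_extract`) standing in for `extract_price()`; the failing URL is contrived.

```python
import asyncio


async def stub_extract(url: str) -> dict:
    """Stand-in for extract_price(): raises for one URL to simulate a client-side error."""
    await asyncio.sleep(0)
    if "broken" in url:
        raise TimeoutError(f"timed out: {url}")
    return {"source_url": url, "price": 10.0}


async def run_batch(urls: list[str]) -> tuple[list[dict], list[str]]:
    outcomes = await asyncio.gather(*(stub_extract(u) for u in urls), return_exceptions=True)
    ok, retry = [], []
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, BaseException):
            retry.append(url)  # Queue for a later retry instead of failing the whole batch
        else:
            ok.append(outcome)
    return ok, retry


ok, retry = asyncio.run(run_batch(["https://a.example/1", "https://broken.example/2", "https://a.example/3"]))
print(len(ok), retry)  # → 2 ['https://broken.example/2']
```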

Is there a concurrency limit?

Yes. Exceeding it triggers automatic queuing — no 429 errors, but later requests take longer. Check your plan's limit in the dashboard.
