Crawlcrawl — the credit-based crawler API at $0.001 per page, with 8 specialty actors

Published 2026-06-28 Updated 2026-06-28 Read 22 min Words ~5,160 Crawlcrawl · crawlcrawl.com

tl;dr — the whole post in six bullets

Crawlcrawl is a cloud-operated crawler API with one unit of pricing: the credit. 1 credit equals 1 chargeable page (a scan, a scraped page, a crawled page, or one actor call). The credit cost is flat $0.001 across every monthly plan — bigger plans never cost less per credit, so the customer always knows what next month costs.
Eight specialty actors are included on every paid tier and on the free tier: render-diff (static fetch vs full browser render with ai_bot_blind_pct), sitemap-audit, internal-link-graph (PageRank + orphan detection), structured-data (six syntaxes parsed into one JSON), on-page audit (~30 technical SEO rules), article extraction, broken-link check, and proxy-fetch (anti-bot across 190+ countries).
Drop-in compatible with Firecrawl. The `/v1/scrape`, `/v1/map`, and `/v1/crawl` endpoints return the same envelope shape your existing parser already handles. Change the base URL, swap the API key prefix, redeploy. No client-side changes.
Free tier ships 1,000 credits per month with no card. Paid plans run from Hobby $5 / 5K credits, Starter $10 / 10K, Pro $20 / 20K, Growth $50 / 50K, Scale $100 / 100K, and Enterprise contracts at higher volume.
JavaScript rendering on real Chrome with auto-fallback on direct-fetch failures. Anti-bot routing across Cloudflare, Akamai, PerimeterX, Datadome, and CAPTCHAs in 190+ countries. HMAC-signed webhooks on every async crawl. Permanent HTML archive — every fetched page stored long-term, re-pull historical datasets years later at zero credit cost.
Built and operated by Ollasoftware out of HSR Layout, Bengaluru. Production substrate that powers Aeoniti, Ollagraph, and the broader Ollasoftware portfolio's web-data needs.

#The setup: every team building on the web ends up needing a crawler

There is a routine moment in the evolution of almost every team building modern software where someone asks the question "how do we get the contents of these URLs into our system, reliably, at scale, in a shape we can actually work with." The team that is building a RAG pipeline against customer documentation needs it. The team that is building an AEO audit workflow needs it. The team that is building a competitive-intelligence dashboard needs it. The team that is building an LLM-driven extraction pipeline against partner-published content needs it. The team that is building any product with "go read this and tell me what it says" as an operating primitive eventually arrives at the same crawler-shaped need.

The historical default answer was to write the crawler from scratch. Pick a request library, write the retry logic, handle the redirect chain, parse the HTML, deal with the JavaScript-rendered pages, write the anti-bot bypass for the sites that block automated traffic, deal with the storage layer for the pages the team wants to keep, write the diff logic for the recurring monitors. The crawler ends up consuming an outsized share of the engineering team's attention relative to its visibility in the product, and at some point — usually after a third or fourth incident where the crawler quietly broke against a single high-value target — the team accepts that the crawler is its own product and probably should not be built in-house.

The market response was a category of hosted crawler APIs that ship the working substrate behind a clean API. Firecrawl is the most visible name in the category as of 2026; ScrapingBee, Bright Data, Apify, Spider, and several smaller vendors round out the choice set. They are all reasonable products. They mostly converge on the same shape: a single-URL scrape endpoint, a multi-page crawl endpoint, a structured-data endpoint, some kind of anti-bot capability, and a billing meter that prices on the per-call or per-token rate the vendor has settled on.

The dimensions that have not converged are the ones that matter most to teams building production workloads on top of the crawler. Pricing is one of them. Most established crawlers use tiered pricing where the per-page rate falls as the plan size grows. That sounds customer-friendly, but in practice the customer either commits to a larger plan than they need in order to get the unit cost they want, or stays on a smaller plan and accepts a higher unit cost. Specialty actors are another: the technical-SEO audit, the render-diff against AI-bot blindness, the sitemap-audit, and the internal-link-graph are capabilities the established crawlers either do not ship or leave to separate vendors the customer has to integrate against. Storage permanence is a third: most established crawlers either expire historical datasets after a window or charge separately for long-term retention.

Crawlcrawl exists because the founders watched their own portfolio companies and a growing crowd of teams building production AI workloads run into all three dimensions at once and conclude that the established crawlers were rentable substrates rather than purchasable infrastructure. The bet was simple: ship the crawler API on one credit unit at a flat price across every plan, with the specialty actors included on every tier, and the historical-dataset access permanent rather than expiring.

#What Crawlcrawl actually is, in one paragraph and then in detail

Crawlcrawl is a cloud-operated web-crawler API that runs as a managed SaaS service. The capability surface is three integration paths behind one API: a single-URL scrape (POST a URL, get clean markdown plus structured signals back in one round-trip), a multi-page crawl (queue a crawl, get a webhook when it finishes with a paginated dataset of clean markdown), and eight specialty actors (each one POST, each one credit) for the work most teams used to glue together from smaller open-source projects or smaller vendors. The single API serves the developer building a one-off integration, the platform building a RAG pipeline that ingests at scale, and the agency running AEO audits across many client domains.

The pricing unit is the credit. One credit equals one chargeable page, whether the call is a single-URL scan, one page of a multi-page crawl, or one invocation of one of the eight specialty actors. The credit cost is flat $0.001 across every monthly plan; bigger plans never cost less per credit. The customer pays exactly the same per-page rate on the Hobby tier as on the Scale tier. The plan size controls the included credit pool and the operational ergonomics (higher rate limits, longer queues, larger dataset retention windows); it does not control the unit price. This decision is the one customers point to most often when explaining why they picked the platform: the bill is fully predictable from the very first month.

Free is 1,000 credits per month with no card. Hobby is $5 for 5,000 credits per month. Starter is $10 for 10,000. Pro is $20 for 20,000. Growth is $50 for 50,000. Scale is $100 for 100,000. Enterprise is a custom contract for higher volume with the same per-credit rate and additional operational controls. Every tier includes every endpoint and every actor. There is no "intelligence tier" that gates the technical-SEO audit behind a higher plan, no Pro-only render-diff, no Growth-only structured-data extraction.

Operationally, the platform sits in a specific place. It is not trying to displace Bright Data at the enterprise web-data tier (Bright Data's install base, residential proxy depth, and procurement gravity are real). It is not trying to displace Apify's Actors marketplace (Apify's community of actor authors is a different shape of bet). It is trying to be the right answer for the team that has chosen the focused commercial crawler category — the Firecrawl / ScrapingBee / Spider buyer profile — and wants the flat-credit pricing model, the included specialty actors, and the permanent dataset storage.

#The credit unit and the flat-price discipline

The credit is the only unit of pricing the customer needs to understand to predict their bill. One credit equals one chargeable page. The chargeable page is the API call's output, not the API call itself — a single-URL scan that resolves through three redirects is one credit because it produced one final page; a multi-page crawl that visits 487 pages is 487 credits because it produced 487 final pages; an actor call against a single URL is one credit because the actor produced one finding. There is no separate per-actor surcharge, no per-render-mode multiplier, no per-region pricing layer, no premium-feature gate.
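The counting rule above reduces to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it deliberately leaves out the two cases the post prices differently (the bytes-priced proxy-fetch endpoint and the free sitemap `dry_run`).

```python
CREDIT_PRICE_USD = 0.001  # flat per-credit rate quoted in the post


def credits_used(scans: int = 0, crawled_pages: int = 0, actor_calls: int = 0) -> int:
    """One credit per chargeable page. Redirect hops are not counted separately."""
    return scans + crawled_pages + actor_calls


def cost_usd(credits: int) -> float:
    """Dollar value of a number of credits at the flat rate."""
    return round(credits * CREDIT_PRICE_USD, 3)


# One scan (even through three redirects), one 487-page crawl, one actor call.
total = credits_used(scans=1, crawled_pages=487, actor_calls=1)
print(total, cost_usd(total))
```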

The flat-price discipline is the part that distinguishes the model from every other commercial crawler. Bigger plans cost more in absolute terms because they include more credits, but every plan costs the same per credit. The customer who runs 5,000 credits a month on the Hobby plan pays $0.001 per page. The customer who runs 100,000 credits a month on the Scale plan pays $0.001 per page. The Enterprise customer running ten million credits a month pays the same $0.001 per page. The principle is that the customer should never be paying a penalty for being on the wrong plan size for their volume, and should never be paying a premium for early-stage usage that the established vendors use to subsidise their enterprise tier.

For teams that have run on volume-priced crawlers and seen the per-call rate move as the bill grew, the flat-credit model is a meaningful change in operational behaviour. The team no longer has to negotiate a new contract every time the workload doubles; the bill scales linearly with the workload, so twice the pages means twice the bill. The team no longer has to plan migrations between vendor plans to stay on the optimal per-call rate; the credit rate is the credit rate at every scale. The cognitive load of pricing-aware capacity planning falls to zero.

For the operations side of the customer team — the SRE who is responsible for the monthly bill, the finance team that has to forecast vendor spend — the predictability is the headline. The bill for next month is the credit allowance of the current plan multiplied by $0.001, regardless of which endpoints the customer used or which actors the customer ran. There are no variable overage charges that surprise the finance forecast. There are no separate retainer fees for premium features. The bill is the credit pool, paid up-front.

#The eight specialty actors

The actors surface is the part of the platform that distinguishes it most clearly from the focused single-purpose crawlers. Eight endpoints today, each one POST, each one credit (proxy-fetch, described at the end of this section, is the exception and is priced by bytes), each one available on every paid tier and on the free tier.

On-page audit (`/v1/actors/audit-onpage`) runs about thirty technical SEO rules per page and surfaces flags for missing H1, meta-description length, robots conflicts, canonical loops, hreflang mismatches, and the rest of the rule set that an in-house technical-SEO checklist usually covers. Output is a structured findings array the customer can render directly into a dashboard or fan into a Slack channel for the publishing team.

Article extraction (`/v1/actors/extract-article`) returns the clean article body with author attribution and publication date. Navigation, footer, comment threads, and ad slots are stripped automatically. For teams building RAG over partner-published content or competitive-content monitoring against editorial publishers, this actor produces the article-shape the downstream pipeline needs without the customer writing the cleanup logic themselves.

Broken-link check (`/v1/actors/check-links`) validates every link on a page. The optional anti-bot retry rescues LinkedIn 999 responses and Cloudflare false positives — two specific failure modes that the standard link-checker libraries do not handle gracefully and that produce false positives in the default broken-link report on most established alternatives.

Structured data (`/v1/actors/structured-data`) parses six syntaxes — JSON-LD, Microdata, RDFa, OpenGraph, Dublin Core, Microformats — and merges them into one JSON output. For teams building deterministic extraction pipelines that need the structured metadata the page actually publishes (rather than LLM-extracted "what does this page mean"), this actor is the canonical first call.

Render-diff (`/v1/actors/render-diff`) compares the static fetch with the full browser render. The output includes the word count and link count under each render mode plus the `ai_bot_blind_pct` — the share of the page invisible to GPT-class crawlers that do not execute JavaScript. For brands evaluating their AEO posture, this number is the diagnostic that distinguishes "the page works for humans" from "the page works for AI assistants." A reading of 99.5% ai-bot-blind means the page is structurally invisible to most assistants and the team has a content-architecture problem to solve.
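The post does not say how `ai_bot_blind_pct` is computed. As a rough intuition only, one plausible definition is the share of rendered words that are missing from the static fetch. The formula and the function below are assumptions for illustration, not the actor's documented method.

```python
def ai_bot_blind_pct(static_words: int, rendered_words: int) -> float:
    """Assumed definition: percentage of rendered words absent from the static fetch."""
    if rendered_words <= 0:
        return 0.0  # nothing rendered, so nothing can be hidden
    hidden = max(rendered_words - static_words, 0)
    return round(100.0 * hidden / rendered_words, 1)


# A hydration-heavy page: 10 words in the static HTML, 2,000 after rendering.
print(ai_bot_blind_pct(10, 2000))
```

Under this assumed definition, a page that ships almost nothing in its static HTML lands near 100, which is the situation the paragraph above describes as structurally invisible.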

Internal-link graph (`/v1/actors/internal-link-graph`) runs PageRank, weakly-connected components, and orphan detection over an existing crawl ID. The actor surfaces the pages that need internal-linking work, the orphan pages no internal link reaches, and the components that are silently disconnected from the main site. For technical SEO at scale, this is the visualisation the team usually pays a separate tool to produce.
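To make the three analyses concrete, here is a minimal standard-library sketch of PageRank, orphan detection, and weakly-connected components over a small link map. It is not the actor's implementation: the input shape (a dict of page to outbound internal links), the damping factor, and the iteration count are all assumptions.

```python
from collections import defaultdict


def all_pages(links):
    return set(links) | {t for targets in links.values() for t in targets}


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; pages with no outlinks spread their rank evenly."""
    pages = all_pages(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


def orphans(links, root):
    """Pages that no other page links to, excluding the entry point."""
    inbound = {t for src, targets in links.items() for t in targets if t != src}
    return sorted(p for p in all_pages(links) if p not in inbound and p != root)


def weak_components(links):
    """Connected components when link direction is ignored."""
    undirected = defaultdict(set)
    for src, targets in links.items():
        undirected[src]  # make sure pages without outlinks still appear
        for t in targets:
            undirected[src].add(t)
            undirected[t].add(src)
    seen, components = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(undirected[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical six-page site: "/old" has no inbound links, "/x" and "/y" only link to each other.
SITE = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/a"], "/old": ["/"], "/x": ["/y"], "/y": ["/x"]}
print(orphans(SITE, root="/"))
print(weak_components(SITE))
```

In this toy graph `/old` is an orphan, while `/x` and `/y` are not orphans but form a component disconnected from the main site, which is the distinction the paragraph above draws.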

Sitemap audit (`/v1/actors/sitemap-audit`) classifies every URL in the customer's sitemap into seven buckets: ok, redirect, 4xx, 5xx, noindex, canonicalised away, and network error. The `dry_run` variant is free — the customer can see the shape of their sitemap health without spending credits, and only commit to the full audit when the dry run shows there is work to do.
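The seven buckets can be pictured as a small decision function. This is a sketch under stated assumptions: the input fields (`status`, `noindex`, `canonical`), the bucket labels, and the order in which the checks run are hypothetical and are not taken from the actor's documented output.

```python
from collections import Counter


def classify(url, status, noindex=False, canonical=None):
    """Assign one sitemap URL to one of the seven buckets (precedence is assumed)."""
    if status is None:
        return "network_error"  # no HTTP response at all
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "4xx"
    if 500 <= status < 600:
        return "5xx"
    if noindex:
        return "noindex"
    if canonical and canonical != url:
        return "canonicalised_away"
    return "ok"


def summarise(results):
    """Count how many URLs fall into each bucket."""
    return Counter(classify(**r) for r in results)


SAMPLE = [
    {"url": "https://example.com/", "status": 200},
    {"url": "https://example.com/old", "status": 301},
    {"url": "https://example.com/gone", "status": 404},
    {"url": "https://example.com/dup", "status": 200, "canonical": "https://example.com/"},
    {"url": "https://example.com/down", "status": None},
]
print(summarise(SAMPLE))
```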

Proxy-fetch (`/v1/cloud/proxy-fetch`) is the anti-bot endpoint. Four pools across 190+ countries handle the routing through residential, datacenter, ISP, and mobile-carrier proxies as needed. Bytes-priced rather than per-page — cheaper than a full Chrome render when the customer does not need JavaScript execution but does need to get past the destination's anti-bot layer. For destinations with sophisticated bot detection (LinkedIn, large e-commerce platforms, certain newspaper sites), this endpoint is the one that turns "this URL is unreachable from your crawler" into "this URL is reachable from your crawler."

Eight specialty endpoints. Each one POST, each one credit, included on every paid tier and the free tier. No add-ons.

#Drop-in compatibility with Firecrawl: three alias endpoints

The platform ships three alias endpoints — `/v1/scrape`, `/v1/map`, and `/v1/crawl` — that return the same envelope shapes the corresponding Firecrawl endpoints return. The aliases exist because Firecrawl is the established incumbent in the focused-commercial-crawler category and the customers most likely to evaluate the platform are the ones currently coding against Firecrawl. The drop-in shape compatibility means the migration cost from Firecrawl to the platform is approximately two strings: the base URL and the API key prefix.

`POST /v1/scrape` returns the shape `{ success, data: { markdown, html, metadata } }`. This is the shape every Firecrawl client library already parses. A customer running Firecrawl's Python client points it at the platform's base URL, swaps the key, and the code keeps working unchanged. The same applies to the Node, the Go, and the Ruby client libraries Firecrawl ships.
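A minimal sketch of what a call and its envelope handling could look like from Python's standard library. The host `api.crawlcrawl.com` and the `crk_` key prefix come from later in this post; the `https` scheme, the `url` request field, and the Bearer authorization header are assumptions modelled on common API conventions, so check the API reference before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://api.crawlcrawl.com"  # host from the post; https scheme assumed


def build_scrape_request(api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scrape request."""
    body = json.dumps({"url": target_url}).encode("utf-8")  # "url" field is assumed
    return urllib.request.Request(
        BASE_URL + "/v1/scrape",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is assumed
            "Content-Type": "application/json",
        },
    )


def parse_scrape_envelope(envelope: dict):
    """Unpack the { success, data: { markdown, html, metadata } } shape."""
    if not envelope.get("success"):
        raise ValueError("scrape did not succeed")
    data = envelope["data"]
    return data["markdown"], data.get("metadata", {})


# Parsing a hand-written envelope of the documented shape.
sample = {"success": True, "data": {"markdown": "# Hello", "html": "<h1>Hello</h1>", "metadata": {}}}
markdown, metadata = parse_scrape_envelope(sample)
```

To actually send the request, pass the result of `build_scrape_request` to `urllib.request.urlopen` and feed the decoded JSON body to `parse_scrape_envelope`.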

`POST /v1/map` returns the shape `{ success, links: [...] }`. Every URL discoverable from one entry point. This is the call most customers use to seed a recursive-crawl workflow. The output shape mirrors Firecrawl exactly so downstream code can treat the platform's response and Firecrawl's response as interchangeable.

`POST /v1/crawl` is the async crawl primitive. It returns a job ID, fires the same webhook payload format when the dataset is ready, and exposes the same dataset-pagination shape Firecrawl ships. The customer's webhook receiver does not need modification.

There is one piece of the Firecrawl API the platform does not yet match: `/v1/extract` (LLM-driven extraction against a schema). The platform's endpoint returns 501 with a breadcrumb pointing at the structured-data actor for deterministic schema.org extraction, and the team has been transparent that the LLM-extract surface is on the roadmap but not shipped. For teams whose only Firecrawl dependency is the schema.org-shaped extraction, the actor surface is the alternative; for teams whose dependency is LLM-driven extraction against an arbitrary schema, the migration remains partial until that endpoint ships.

For the broader set of teams whose Firecrawl usage is `/v1/scrape`, `/v1/map`, and `/v1/crawl`, the migration is the two-string change. The published migration notes walk through the specific changes and the gotchas (rate-limit headers are the same shape but the platform has a per-credit cap rather than per-call; webhook signatures are HMAC-SHA256 in both cases but the secret rotation flow differs). The full migration usually takes less than an hour of engineering time.

#JavaScript rendering, anti-bot, and the production-fetch substrate

Underneath every endpoint is the fetch substrate the platform has invested most heavily in. JavaScript rendering runs on real Chrome (not a stripped-down browser engine that misses the rendering details that matter on hydration-heavy single-page apps). The platform auto-falls-back from the cheap direct-fetch path to the Chrome-render path when the direct fetch produces a thin response (low word count, low link count, signature heuristics indicating client-side hydration). The customer does not have to know in advance whether a URL is SPA-rendered; the platform figures it out.
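The fallback decision can be pictured as a cheap check on the direct-fetch response. The sketch below is an illustration, not the platform's logic: the thresholds, the hydration markers, the "two of three signals" rule, and the `direct_fetch` / `chrome_render` callables are all hypothetical.

```python
import re

MIN_WORDS = 50  # assumed threshold
MIN_LINKS = 3   # assumed threshold
HYDRATION_MARKERS = ('id="root"', 'id="__next"', "data-reactroot")  # assumed examples


def looks_thin(html: str) -> bool:
    """True when at least two of the three thin-response signals are present."""
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag stripping, good enough for a sketch
    words = len(text.split())
    links = len(re.findall(r"<a\s[^>]*href=", html, flags=re.I))
    hydrated = any(marker in html for marker in HYDRATION_MARKERS)
    signals = [words < MIN_WORDS, links < MIN_LINKS, hydrated]
    return sum(signals) >= 2


def fetch_with_fallback(url, direct_fetch, chrome_render):
    """Try the cheap path first; promote to a browser render on a thin response."""
    html = direct_fetch(url)
    if html is None or looks_thin(html):
        return chrome_render(url), "chrome"
    return html, "direct"


SPA_SHELL = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
RICH_PAGE = "<html><body><p>" + "word " * 100 + "</p>" + '<a href="/x">x</a>' * 5 + "</body></html>"
```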

The same auto-fallback applies in the opposite direction. URLs that the platform has previously seen render successfully through the direct-fetch path are not unnecessarily promoted to the Chrome-render path. The per-page price is the same regardless of which path the request resolves through, which removes the customer-side incentive to manually optimise the routing — a category of premature optimisation that crawler customers historically spend more time on than the savings warrant.

Anti-bot routing handles the destinations whose owners actively block automated traffic. The platform routes through four proxy pools across 190+ countries, with destination-fingerprint-aware logic that picks the right pool for the specific anti-bot system the destination is running. Cloudflare's defences, Akamai's defences, PerimeterX, Datadome, and the various CAPTCHA implementations all have specific bypass paths the platform has tuned. The customer calls one endpoint; the platform handles the routing decision.

The anti-bot surface is bytes-priced rather than per-page, which is the right model for the use case. A customer crawling a small page through Cloudflare's anti-bot layer should not pay the same as a customer crawling a large page through the same layer. The bytes-priced model handles this naturally and ends up cheaper than a Chrome-render call for the use cases where the customer needs the proxy routing but does not need JavaScript execution.

For the use cases where neither the direct path nor the standard proxy path resolves — destinations with the most sophisticated anti-bot detection that has fingerprinted the platform's standard infrastructure — the platform offers an opt-in escalation path through the most expensive proxy pool. This is the path of last resort and the team is transparent that it should not be a default; the cost is high enough that most customers should accept "this URL is not reachable from this crawler" rather than pay the escalation cost.

#Scheduled crawls, HMAC-signed webhooks, and the production substrate

For workloads that recur — content-monitoring against a partner publisher, scheduled AEO audits against client domains, periodic competitive scans — the platform exposes scheduled crawls with cron-syntax configuration. The customer authors the cron expression in the dashboard or through the API, points the schedule at a base URL or a crawl recipe, and the platform fires the recurring runs.

Webhook delivery is HMAC-SHA256-signed and retried with exponential backoff on customer-side failure. The signature scheme is the conventional one: the secret is set per-webhook, the signature is the SHA-256 HMAC of the request body, and the verification step on the customer side is one library call in any standard language. The retry layer handles transient customer-side failures (a deploy that briefly returned 500, a brief network blip between cloud regions) without firing duplicate events when the customer side recovers.
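A receiver-side check along these lines uses Python's standard `hmac` and `hashlib` modules. The post confirms HMAC-SHA256 over the request body with a per-webhook secret; the hex encoding of the signature and how it reaches the receiver (for example in a request header) are assumptions here.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 of the raw request body, hex-encoded (encoding is assumed)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))


# Round trip with a made-up secret and payload.
secret = b"per-webhook-secret"
payload = b'{"job_id": "abc", "status": "completed"}'
signature = sign(secret, payload)
print(verify(secret, payload, signature))
```

Verify against the raw bytes exactly as received, before any JSON parsing or re-serialisation, since a change in whitespace or key order would change the digest.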

For the recurring crawls that only want to fire on real content change (rather than on every cron tick regardless of whether the destination changed), the built-in diff endpoint compares the new fetch against the previous stored version. The customer can configure the webhook to fire only when the diff exceeds a threshold, such as a five-percent content change, a structural change in the link graph, or a specific keyword appearing for the first time. The diff-only delivery saves the customer the downstream-pipeline cost of processing identical content over and over.
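The threshold idea can be sketched with Python's `difflib`. How the platform's diff endpoint actually measures change is not described in this post, so the word-level similarity ratio below is an assumed stand-in, and the function names are hypothetical.

```python
import difflib


def change_fraction(previous: str, current: str) -> float:
    """Share of content that changed, measured as 1 minus word-level similarity."""
    matcher = difflib.SequenceMatcher(None, previous.split(), current.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def should_fire(previous: str, current: str, threshold: float = 0.05) -> bool:
    """Fire the webhook only when the change exceeds the threshold (5% by default)."""
    return change_fraction(previous, current) > threshold


old_page = " ".join(f"w{i}" for i in range(100))
small_edit = old_page.replace("w50 ", "changed ", 1)                  # one word of 100 differs
big_edit = " ".join([f"x{i}" for i in range(20)] + old_page.split()[20:])  # twenty words differ
```

With these inputs the one-word edit stays under the five-percent threshold and the twenty-word edit clears it, so only the second would trigger a delivery.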

Dataset storage is permanent. Every crawled page's content (clean markdown plus the raw HTML) is stored long-term, paginable through the dataset endpoint, and re-pullable at zero credit cost. For teams running longitudinal analysis against content that has changed over time — measuring how a competitor's pricing page has evolved across quarters, archiving a documentation set for compliance, training an offline model against a corpus the team curated months ago — this matters meaningfully. The customer can come back two years later and pull the historical dataset without re-crawling.

The HTML archive is the under-discussed part of this. Many crawler vendors store the parsed markdown or the structured extraction but discard the raw HTML once the parse is complete. The platform stores both, which means the customer can re-parse historical pages with new logic — a new structured-data parser, a new article-cleanup heuristic, a different LLM-extraction prompt — without re-crawling. The cost asymmetry between "we have the bytes, re-parse them" and "we don't, we have to re-fetch" is large enough that the storage decision pays off across most production workloads.

#The pricing tiers in detail

The pricing surface has six published tiers plus an Enterprise contract. Every tier has the same per-credit rate of $0.001, the same eight specialty actors included, the same JavaScript rendering, the same anti-bot routing, the same scheduled crawls and webhooks, the same dataset storage. The tiers control the included credit pool, the rate-limit ceilings, and the operational ergonomics like dataset retention window and concurrent crawl capacity.

Free is $0 for 1,000 credits per month, no credit card required. This is the tier where most customers run their first evaluation — 1,000 pages is enough to scan a small site end-to-end, run an AEO audit across a representative content set, or build a working prototype against the customer's own workload.

Hobby is $5 for 5,000 credits per month. The tier for the indie hacker, the side project, the small agency running one or two client domains. The unit cost is the same $0.001; the larger included pool removes the friction of repeatedly upgrading from the free tier.

Starter is $10 for 10,000 credits. The tier for the small team building production workloads — a series-A SaaS RAG pipeline ingesting partner documentation, an early-stage AEO consultancy running monthly audits across a handful of clients.

Pro is $20 for 20,000 credits. Growth is $50 for 50,000. Scale is $100 for 100,000. Each step doubles or more-than-doubles the included pool at a corresponding step in the absolute price; the per-credit rate stays flat at $0.001 throughout.
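The tier table can be checked mechanically. The sketch below uses the published prices and credit pools from this section; the helper names and the rule of picking the smallest plan whose pool covers the monthly volume are illustrative assumptions.

```python
# (name, monthly price in USD, included credits) as listed in this section
PLANS = [
    ("Free", 0, 1_000),
    ("Hobby", 5, 5_000),
    ("Starter", 10, 10_000),
    ("Pro", 20, 20_000),
    ("Growth", 50, 50_000),
    ("Scale", 100, 100_000),
]


def per_credit_rate(price_usd: float, credits: int) -> float:
    return price_usd / credits


def smallest_plan_for(monthly_pages: int) -> str:
    """Smallest published plan whose credit pool covers the volume."""
    for name, _price, credits in PLANS:
        if credits >= monthly_pages:
            return name
    return "Enterprise"  # above the largest published pool


for name, price, credits in PLANS[1:]:
    print(name, per_credit_rate(price, credits))
print(smallest_plan_for(12_000))
```

Every paid tier works out to the same rate, so the only decision left is which pool size covers the expected volume.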

Enterprise is a custom contract for higher-volume customers. The per-credit rate stays at $0.001 (or lower for very-high-volume contracts); the contract adds operational primitives the smaller tiers don't need (SCIM for user provisioning, SAML for SSO, dedicated regional deployment for compliance environments, an SLA with named availability commitments). The principle that the per-credit rate stays flat at the enterprise tier is the part that distinguishes the platform most clearly from the established alternatives whose enterprise pricing typically involves a premium per-call surcharge.

#How Crawlcrawl compares to the alternatives

The focused-commercial-crawler category has a clear set of names and it is worth being direct about how the platform sits against each.

Firecrawl is the closest peer and the vendor most prospective customers are evaluating the platform against. Firecrawl ships a competent scrape-and-crawl surface, has a strong developer reputation, and integrates well with the popular open-source AI frameworks. The platform extends past Firecrawl on three axes: the flat-credit pricing model, the eight specialty actors included on every tier, and the permanent dataset storage. The drop-in shape compatibility means the migration cost is two strings. For teams choosing between the two today, the platform is the alternative that compares directly on every dimension that matters to a production deployment.

ScrapingBee is the established mid-market scraping vendor. Reliable, well-documented, focused on the scrape endpoint specifically. The platform's extension over ScrapingBee is the actor surface (ScrapingBee does not ship the technical-SEO actors, the render-diff, or the structured-data extraction at the same depth) and the credit-per-page pricing rather than ScrapingBee's API-call-per-page pricing. For teams whose workload is purely single-URL scraping, ScrapingBee may be sufficient; for teams whose workload includes any of the actor surface, the platform consolidates the spend.

Apify ships a marketplace of community-built actors on top of its own crawler substrate. The marketplace model is a different shape of bet — Apify lets the community ship the workflows, the platform ships the workflows itself as first-party endpoints. For the teams that want a marketplace of community-built actors and the operational flexibility of running arbitrary code at the platform level, Apify is the right answer; for teams that want vendor-built actors with the consistency and SLA that a first-party endpoint provides, the platform is the alternative.

Bright Data is the heavyweight at the enterprise web-data tier. Large procurement footprint, residential proxy depth, broad workflow capability. The platform sits below Bright Data on raw volume capability and above it on developer ergonomics — Bright Data's enterprise contracts and procurement gravity are real, and the team is direct that Bright Data is the right answer for very-large enterprise workloads. For teams in the small-to-mid-market range, the platform consolidates more capability per dollar.

Spider, Crawlee, Cheerio-based open-source projects, and the build-your-own crawler path are the lower-cost alternatives. The platform's value over the open-source path is the integration work — the same engineering hours that would be spent wiring the JS rendering, the anti-bot routing, the actor implementations, and the operational substrate are spent shipping product instead. For teams whose engineering capacity is bounded, the platform is the alternative; for teams whose preference is "build it ourselves," the open-source path is the alternative.

Across all of these, the question is rarely "is the platform cheapest per page." It is "for the workload my team is actually running, what is the total cost of ownership — including the engineering cost of integration, the operational cost of running multi-vendor stacks, and the unit cost of the bill itself — compared to a single platform that handles the full surface at flat-credit pricing." For most teams in the series-A-to-mid-market range, the answer points clearly at the platform.

#The team and the operational substrate

Crawlcrawl is built and operated by Ollasoftware, the AI software development company headquartered in Bengaluru that has shipped more than forty AI brands in production over the last four years. The platform is one of the largest backend products in the portfolio by traffic — handling billions of pages per month across customer workloads — and the operational substrate that powers it is shared with several of the other Ollasoftware products. Aeoniti's Web Authority surface runs on the same Common Crawl infrastructure the platform uses for its own deep crawl analysis. Ollagraph's web-intelligence platform inherits the same fetch substrate. The AI-bot policy resolution surface that appears on every scrape response is shared with the broader AEO portfolio.

The Rust engineering group inside Ollasoftware operates the platform alongside the broader infrastructure portfolio (OllaDNS, 24observe, Qcrawl). The shared operational substrate — async-Rust services, Postgres for the relational store, ClickHouse for the analytics-shaped queries that the actor surface depends on, Caddy for the public-facing edge — is the reason the team can ship the breadth the platform covers without an enterprise-vendor engineering headcount. The architectural patterns and the operational tooling carry across.

The parent group, Networkers Home, is the cybersecurity and networking training institute that has placed more than forty-five thousand alumni across eight hundred hiring partners since 2007. The platform inherits the parent group's disciplinary heritage in network-level engineering — the proxy routing, the anti-bot fingerprinting, the multi-region deployment — and the operational maturity that comes from a parent organisation that has been operating at scale for nearly two decades.

#What is on the roadmap

The team publishes the roadmap and the changelog at the brand site and updates them as work ships. The visible near-term threads are concrete: shipping the `/v1/extract` LLM-driven extraction surface that currently returns 501, expanding the actor catalogue to cover additional technical-SEO and AEO patterns that customer requests have surfaced, and deeper integration with the agent runtimes that increasingly drive the platform's usage profile.

Underneath those visible features is steady investment in the fetch substrate. The anti-bot routing layer is in continuous tuning against destinations whose detection improves; the JavaScript-rendering path is in continuous tuning against the frameworks that dominate modern hydration patterns; the proxy-pool selection logic is in continuous tuning against the cost-vs-success-rate trade-offs that determine the unit economics of the bytes-priced path.

On the dataset side, the team is investing in the analytics surface that operates over the permanent HTML archive. The current dataset endpoint exposes the page-level surface; the roadmap extends the analytics surface to enable cross-page queries (find every page in this corpus that mentions this term, find every page whose word count crossed a threshold this quarter) without the customer having to pull and re-parse the dataset themselves.

Pricing during the current phase is the published 6-tier model with the Enterprise contract above it. The flat-credit principle is non-negotiable in the roadmap; the team has been explicit that the per-credit rate will not increase, and that the included credit pools at each tier are likely to grow over time as the underlying infrastructure costs compound down.

#How to start

If you are building any product that needs to read URLs at scale — a RAG pipeline, an AEO audit workflow, a competitive-intelligence dashboard, a content-monitoring service, an LLM extraction pipeline — the right next move takes about five minutes. Sign up at crawlcrawl.com, claim the free 1,000 credits, copy the API key (it begins with `crk_`), and run a single-URL scan against a representative page in your workload.

The first scan response shows the clean markdown, the structured metadata, the AI-bot policy resolution, and the cost in credits. Most teams learn from the first scan whether the platform's output is the shape their downstream pipeline expects; for the small number of cases where the shape needs to change, the actor surface provides the deterministic alternatives.

For teams currently running on Firecrawl, the migration is the two-string change documented in the migration notes. The `/v1/scrape`, `/v1/map`, and `/v1/crawl` aliases return the same envelope shape your existing parser already handles. Change the base URL to `api.crawlcrawl.com`, replace the `fc-` key with your `crk_` key, redeploy. The dataset storage, the webhook delivery, and the rate-limit handling all behave the way your existing integration expects.
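A minimal sketch of that two-string change, assuming the client keeps its base URL and API key in a small config object. `CrawlerConfig` and `migrate_to_crawlcrawl` are hypothetical names for illustration; only the two values themselves come from the post.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CrawlerConfig:
    """Hypothetical client config: the two strings the migration touches."""
    base_url: str
    api_key: str

    def endpoint(self, path: str) -> str:
        return f"{self.base_url.rstrip('/')}{path}"


def migrate_to_crawlcrawl(old: CrawlerConfig, new_api_key: str) -> CrawlerConfig:
    """Return a copy of the config pointed at Crawlcrawl with a crk_ key."""
    if not new_api_key.startswith("crk_"):
        raise ValueError("expected a Crawlcrawl key beginning with 'crk_'")
    return replace(old, base_url="https://api.crawlcrawl.com", api_key=new_api_key)


# Placeholder keys; the endpoint paths stay the same on both sides.
firecrawl_cfg = CrawlerConfig("https://api.firecrawl.dev", "fc-old-key")
crawlcrawl_cfg = migrate_to_crawlcrawl(firecrawl_cfg, "crk_new-key")
print(crawlcrawl_cfg.endpoint("/v1/scrape"))
```

Keeping both values in one place is the design choice that makes the swap a configuration change and not a code change.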

For teams running production AEO workflows or technical-SEO audits at scale, the actor surface is the part of the platform that justifies the migration most clearly. The render-diff actor in particular — the one that surfaces the AI-bot-blindness percentage — has no equivalent in most of the established commercial crawlers; the visibility it produces into a brand's structural AEO posture is meaningfully different.

If you would like the team to walk you through a specific deployment — particularly the AEO consulting workflow that agencies run on top of the platform, the high-volume RAG ingestion patterns that platforms build against the multi-page crawl, or the enterprise contract for a large-scale workload — the Ollasoftware contact page reaches the engineers who built the platform directly.

#FAQs about Crawlcrawl

1. What is Crawlcrawl?

Crawlcrawl is a credit-based web-crawler API operated as a managed SaaS by Ollasoftware. Three integration paths behind one API: single-URL scrape (POST a URL, get clean markdown + structured signals), multi-page crawl (queue, webhook on completion), and eight specialty actors for technical-SEO and AEO workloads. Flat $0.001 per credit on every monthly plan. 1 credit = 1 chargeable page.

2. How does Crawlcrawl pricing work?

One unit of pricing: the credit. 1 credit = 1 chargeable page. The credit cost is flat $0.001 across every monthly plan — bigger plans never cost less per credit. Free 1,000 credits/mo no card. Hobby $5 / 5K. Starter $10 / 10K. Pro $20 / 20K. Growth $50 / 50K. Scale $100 / 100K. Enterprise contract above. Every feature on every tier; tiers control the credit pool, not the capability surface.
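The arithmetic behind the tier list can be checked in a few lines. The sketch below encodes the published plans and picks the smallest one whose pool covers a monthly page count; it assumes every page costs exactly one credit and ignores the bytes-priced proxy-fetch path. The function name is illustrative.

```python
# (name, monthly price in USD, included credits), as listed in the post.
PLANS = [
    ("Free", 0, 1_000),
    ("Hobby", 5, 5_000),
    ("Starter", 10, 10_000),
    ("Pro", 20, 20_000),
    ("Growth", 50, 50_000),
    ("Scale", 100, 100_000),
]


def smallest_plan_for(pages_per_month: int):
    """Return (plan name, monthly price) for the smallest pool that fits.

    Assumes 1 page = 1 credit. Returns ("Enterprise", None) above Scale,
    because Enterprise pricing is contract-based.
    """
    for name, price, credits in PLANS:
        if pages_per_month <= credits:
            return name, price
    return "Enterprise", None


# Every paid plan works out to the same $0.001 per credit:
# price in dollars * 1000 equals the credit pool.
flat_rate_holds = all(price * 1000 == credits for _, price, credits in PLANS[1:])

print(smallest_plan_for(12_000))  # ('Pro', 20)
print(flat_rate_holds)            # True
```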

3. What are the 8 specialty actors?

On-page audit (~30 technical SEO rules per page), article extraction (clean body with author/date), broken-link check (with anti-bot retry for LinkedIn 999 and Cloudflare false positives), structured data (6 syntaxes parsed into one JSON), render-diff (static vs full render with ai_bot_blind_pct), internal-link graph (PageRank + orphans), sitemap audit (7-bucket health classification, dry_run is free), proxy-fetch (anti-bot across 190+ countries, bytes-priced).
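To make the "PageRank + orphans" idea concrete, here is a textbook power-iteration PageRank with a simple orphan check over a tiny in-memory link map. This is a generic sketch, not the actor's implementation: the damping factor, the iteration count, the input shape, and the definition of an orphan (a crawled page with no inbound internal links other than self-links, excluding the entry page) are all assumptions.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration over an internal-link map."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank


def orphans(links: dict[str, list[str]], entry: str = "/") -> list[str]:
    """Crawled pages that no other page links to, excluding the entry page."""
    linked = {t for src, targets in links.items() for t in targets if t != src}
    return sorted(p for p in links if p not in linked and p != entry)


# Hypothetical four-page site; "/old-landing" is known (say, from the sitemap)
# but nothing links to it.
site = {
    "/": ["/pricing", "/docs"],
    "/pricing": ["/"],
    "/docs": ["/", "/pricing"],
    "/old-landing": ["/"],
}
ranks = pagerank(site)
print(max(ranks, key=ranks.get))  # "/"
print(orphans(site))              # ['/old-landing']
```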

4. Is Crawlcrawl compatible with Firecrawl code?

Yes. Three drop-in alias endpoints (/v1/scrape, /v1/map, /v1/crawl) return the same envelope shape your existing Firecrawl parser already handles. Migration is two strings: change the base URL to api.crawlcrawl.com, swap the API key prefix. No other client-side changes are required. The /v1/extract LLM-driven extraction is on the roadmap but not shipped today (the endpoint currently returns 501); the structured-data actor is the alternative for deterministic schema.org extraction.

5. How does JavaScript rendering and anti-bot work?

Real Chrome render for SPAs and hydration-heavy pages, with auto-fallback from the cheap direct-fetch path to the Chrome-render path when the direct fetch comes back thin. Same per-page price regardless of which path resolves. Anti-bot routing across Cloudflare, Akamai, PerimeterX, Datadome, and CAPTCHAs in 190+ countries through four proxy pools. The proxy-fetch endpoint is bytes-priced — cheaper than a full Chrome render when you need proxy routing but not JavaScript execution.
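The auto-fallback can be pictured as a small decision around two fetchers. The sketch below is client-side pseudologic under stated assumptions, not the platform's routing code: "thin" is approximated as fewer than `MIN_TEXT_CHARS` characters of visible text, a hypothetical threshold, and both fetchers are passed in as plain callables so the example runs without a browser or network.

```python
from html.parser import HTMLParser
from typing import Callable

MIN_TEXT_CHARS = 200  # hypothetical cut-off for a "thin" response


class _TextCounter(HTMLParser):
    """Counts visible text characters, ignoring <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.chars = 0
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chars += len(data.strip())


def visible_text_length(html: str) -> int:
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.chars


def fetch_with_fallback(url: str,
                        direct_fetch: Callable[[str], str],
                        chrome_render: Callable[[str], str]) -> tuple[str, str]:
    """Try the cheap path first; render only if it fails or comes back thin."""
    try:
        html = direct_fetch(url)
    except Exception:
        html = ""
    if visible_text_length(html) >= MIN_TEXT_CHARS:
        return "direct", html
    return "chrome", chrome_render(url)


# Stand-ins for the two paths: an empty SPA shell versus a rendered page.
spa_shell = '<html><body><div id="root"></div><script>window.app = {};</script></body></html>'
rendered = "<html><body><main>" + "Rendered article text. " * 20 + "</main></body></html>"

path, _ = fetch_with_fallback("https://example.com", lambda u: spa_shell, lambda u: rendered)
print(path)  # chrome
```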

6. How does dataset storage work?

Permanent. Every crawled page (clean markdown + raw HTML) is stored long-term, paginable through the dataset endpoint, and re-pullable at zero credit cost. The HTML archive lets you re-parse historical pages with new logic (new structured-data parser, new article-cleanup heuristic, different LLM-extraction prompt) without re-crawling. Re-pull historical datasets years later for compliance, longitudinal analysis, or offline training corpora.
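As an example of re-parsing archived pages with new logic, the sketch below runs a small JSON-LD extractor over stored raw HTML without fetching anything. The record shape (`url` and `raw_html` keys) is a hypothetical stand-in for whatever the dataset endpoint returns, and the extractor is a simple illustration, not the structured-data actor.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list = []
        self._inside = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def reparse(archived_pages: list[dict]) -> dict[str, list]:
    """Apply the new parser to already-stored HTML; no re-crawl involved."""
    results = {}
    for page in archived_pages:
        parser = JsonLdExtractor()
        parser.feed(page["raw_html"])
        parser.close()
        results[page["url"]] = parser.items
    return results


# Hypothetical archived record pulled earlier from the dataset.
archived = [{
    "url": "https://example.com/a",
    "raw_html": '<html><head><script type="application/ld+json">'
                '{"@type": "Article", "headline": "Hello"}'
                "</script></head><body>Hi</body></html>",
}]
print(reparse(archived))
```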

7. How does Crawlcrawl compare to Firecrawl, ScrapingBee, Apify and Bright Data?

Firecrawl is the closest peer: Crawlcrawl extends past it with flat-credit pricing (no rate-tier games), 8 specialty actors included on every tier, and permanent dataset storage. Drop-in shape compatibility means the migration is two strings. ScrapingBee focuses on single-URL scraping; Crawlcrawl adds the multi-page crawl and the actor surface behind the same API. Apify is a marketplace model; Crawlcrawl ships first-party endpoints with vendor-built consistency. Bright Data is enterprise-tier with residential proxy depth at higher cost; Crawlcrawl targets the small-to-mid-market customer.

8. Who is behind Crawlcrawl?

Crawlcrawl is built and operated by Ollasoftware, the Bengaluru-headquartered AI software development company. The platform handles billions of pages per month across customer workloads. The operational substrate is shared with Aeoniti, Ollagraph, OllaDNS, 24observe and Qcrawl: async-Rust services, with Postgres and ClickHouse backing the analytics that the actor surface depends on. The parent group is Networkers Home, the cybersecurity and networking training institute founded in 2007 with 45,000+ alumni placed across 800+ hiring partners.