#The setup: the page you wish you could find again
There is a small, recurring moment that almost every knowledge worker experiences several times a week: the failed recall of a page they read days or weeks ago and now need again. The page contained the answer to a specific question (the documentation for an unusual API, the blog post that articulated a particular technical position, the long-form analysis that shaped an opinion that became a decision), and now the worker needs it back. Browser history surfaces the URL the user landed on but not the content of the page. A conventional web search doesn't reliably find the same page either, because the URL has changed, or the page has been updated, or the search engine has decided a different page is now the better answer to the same query.
The historical workaround for this problem has been to save pages by URL — Pocket, Instapaper, the various read-it-later services — or by clipping content into a notes app. Both approaches have the same structural weakness: the saved content lives in someone else's system, indexed by their search, surfaced through their algorithms, monetised through their advertising relationships, and accessible to whoever has access to their servers. For users whose saved content is professionally sensitive (legal research, medical research, financial analysis, the work-in-progress thinking that has not yet been published), the conventional services are the wrong shape of trust commitment.
The other workaround is to do nothing — to accept that pages will be lost and to rely on Google to find them again later. This works well for the small subset of pages that are well-indexed by mainstream search engines and works poorly for everything else. The proliferation of dynamic pages, the rise of paywalled and login-walled content, and the increasing instability of URLs (the page that moved, the site that was bought and reorganised, the article that was depublished) have all eroded the reliability of "Google it again later" as a strategy.
And the AI era has added a new dimension to the problem. The user who is increasingly asking ChatGPT or Claude or Gemini for the answer needs to feed the assistant context — the specific pages they want the assistant to reason about. Without a private memory of the pages they have already curated, the user is dependent on the assistant's opinion of which pages are relevant, which is not the same as the pages the user has decided are relevant. The right shape of memory is one the user controls, accessible to the assistants the user works with, but never exposed to a third party.
browserfog exists because the founders watched their own teams and a growing crowd of knowledge workers run into this gap and conclude that the established read-it-later services were the wrong shape for sensitive professional content. The bet was simple: ship a private web memory where the server structurally cannot read what the user saves, where search runs on the user's device against their own corpus, and where the entire system is open source so the privacy claims can be verified rather than trusted.
#What browserfog actually is, in one paragraph and then in detail
browserfog is a privacy-first web-memory tool composed of three parts: a browser extension that captures pages on explicit user action, a small cloud sync service that stores the captured records as ciphertext for sharing across the user's paired browsers, and a search surface that runs entirely on the user's device against the user's decrypted corpus. The user clicks the extension popup or right-click → Save on a page they want to remember; the page content is encrypted locally with a vault key the user controls; the ciphertext is uploaded to the sync service; the same record is available on the user's other paired browsers (Chrome, Edge, Brave) after decryption with the same vault key. Search runs locally against the decrypted index; queries never leave the device.
Inside the platform there are three architectural layers. The capture layer is the browser extension. It uses the activeTab permission only — it cannot read tabs the user hasn't explicitly opted in on. The user invokes capture by clicking the popup, the right-click context menu, or the keyboard shortcut. The extension reads the page content, derives the vault key from the user's passphrase via argon2id, encrypts the content with AES-GCM, and uploads the ciphertext to the sync service along with an HMAC of the canonical URL (for dedup) and a small set of unencrypted metadata (timestamp, size in bytes, a risk score).
The sync layer is the small cloud service. It stores the user's email, an Argon2id hash of the user's account password, per-device API key hashes, the ciphertext records, the HMAC blind index for URL dedup, the risk score per record, and timestamps. It does not store the URL, the page title, the page content, the user's passphrase, the vault key, the search queries, the decrypted record contents, or any record of which pages the user has or hasn't captured beyond what the ciphertext blob tells it (which is nothing semantically). The list of what is never stored is the architectural contract the platform commits to.
The search layer is the on-device retrieval surface. When the user runs a query, the device decrypts the records, embeds the query with the same 384-dimensional MiniLM model the capture pipeline uses, computes vector similarity against the decrypted corpus, and surfaces the relevant passages. The chunking discipline is 256-token sliding windows with overlap, so the search returns the specific passage that matches the query rather than averaging the entire page. The queries never leave the device because they don't need to — the corpus they're searching is local.
Operationally, the platform is free forever with no paid tier. The team is explicit about this: the business model is not subscription revenue, not advertising, not data resale. The model is that the platform exists because the team needed it themselves and decided to make it usable by others. The structural posture (the server holds only ciphertext) means the team could not resell user data even if they wanted to; the architectural commitment removes the temptation by removing the capability.
#End-to-end encryption: AES-GCM, argon2id, 24-word recovery
The encryption story is the architectural property that every other claim about the platform rests on. Without it, the privacy claims would be policy claims ("we promise not to read your data"), and the customer would be relying on the team's trustworthiness rather than on the structure of the system. With it, the claims are structural: the team cannot read the data because the data is encrypted with a key the team does not have.
The encryption primitive is AES-GCM, the conventional authenticated symmetric cipher for at-rest and in-transit encryption. Every record is encrypted with the user's vault key before it leaves the browser. The ciphertext plus the initialisation vector is what gets uploaded to the sync service. The server stores the ciphertext-plus-IV blob and never sees the plaintext.
The key-derivation primitive is argon2id, the modern memory-hard password-hashing function. The user's 24-word recovery passphrase is the input; the argon2id derivation produces the vault key on the user's device. The passphrase itself is never uploaded; the vault key is never uploaded; both live only in the browser's memory while the user is signed in. The 24-word format is the standard mnemonic shape (BIP39 word list) so the user can write it down or store it in a password manager using conventional patterns.
The HMAC of the canonical URL is the one piece of structured metadata the server holds that is derived from the user's content. The HMAC is a one-way hash with a key the user controls; it lets the server dedup records (so the same URL captured twice doesn't produce two records) without revealing what the URL is. The server cannot reverse the HMAC back to the URL; the dedup query happens by the client computing the HMAC of a URL it wants to check and asking the server "do you have this hash." The query reveals only the hash, not the URL.
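The dedup exchange described above can be sketched in a few lines. This is a minimal illustration rather than the platform's actual code: the canonicalisation rules, the choice of HMAC-SHA256, and the names `canonicalize`, `blind_index`, and `SyncServer` are all assumptions made for the example.

```python
import hashlib
import hmac
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Hypothetical canonical form: lowercase scheme and host, no fragment, no trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def blind_index(index_key: bytes, url: str) -> str:
    """HMAC-SHA256 of the canonical URL (assumed digest); only this hex string is uploaded."""
    message = canonicalize(url).encode("utf-8")
    return hmac.new(index_key, message, hashlib.sha256).hexdigest()


class SyncServer:
    """Stand-in for the sync service: it only ever sees opaque digests."""

    def __init__(self):
        self._seen = set()

    def has(self, digest: str) -> bool:
        return digest in self._seen

    def put(self, digest: str) -> None:
        self._seen.add(digest)


# Client side: the key stays on the device, the server answers "do you have this hash".
key = b"\x01" * 32  # placeholder; a real key would be derived on the device
server = SyncServer()
digest = blind_index(key, "https://Example.com/docs/")
if not server.has(digest):
    server.put(digest)
```

Because the digest depends on a key the server never holds, the server cannot test guesses about which URLs a user has saved; two users saving the same page also produce unrelated digests.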
The risk score is the other piece of metadata. It is an integer between 0 and 100 plus a small array of reasons, computed by the client at capture time, indicating things like "this page appeared to contain credentials" or "this page had unusual traffic patterns suggesting it may be malicious." The score lives unencrypted on the server so the user can see it across paired browsers without decrypting the record. It does not reveal the page content; it reveals only the client's opinion about the page's safety profile.
For the user who loses access to their device, the 24-word recovery passphrase is the mechanism for restoring the corpus on a new device. The user enters the passphrase on the new device, the argon2id derivation reproduces the vault key, the new device fetches the ciphertext records from the sync service, and the corpus is decrypted locally. The recovery flow respects the architectural contract — at no point does the passphrase or the vault key reach the server. Lose the passphrase and the corpus is unrecoverable; this is the explicit trade-off the architecture makes in favour of structural privacy.
“The server cannot read your pages. Not because we promise not to, but because we structurally cannot. The data is encrypted before it leaves your browser.”
#On-device semantic search: 384-dim, sliding-window, sub-second
The search surface is the part of the platform the user interacts with most. The substrate is the 384-dimensional sentence-transformer family — specifically the MiniLM family of models that has emerged as the canonical balance between embedding quality and on-device performance for the modern browser-shaped runtime. The choice to run embeddings on-device rather than on the server is the consequence of the architectural commitment: the corpus is decrypted only on the user's device, so semantic search has to run there too.
The chunking discipline is 256-token sliding windows with overlap. Pages are split into overlapping 256-token chunks; each chunk gets its own embedding; the search retrieves at the chunk level rather than at the page level. The reason this matters is the failure mode of page-level semantic search: a page with one paragraph relevant to a query and forty paragraphs of unrelated content gets a page-average embedding that doesn't reflect the one relevant paragraph well. Chunk-level search returns the specific paragraph that matches; the user sees the passage they want directly rather than the page that contains it somewhere.
The overlap between chunks is configured to avoid the boundary problem that simple non-overlapping chunking has — a query that matches a sentence spanning two chunks would not surface either chunk well without overlap. The standard overlap configuration the platform uses balances retrieval quality against storage size; the resulting index is small enough to fit in browser-shaped memory budgets even for large corpora.
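A sliding-window chunker of the kind described here might look like the sketch below. The 256-token window comes from the text; the 64-token overlap is an assumed value chosen only for illustration, since the platform's actual overlap setting is not stated. The function works on any list of tokens, so the model's own tokenizer output could be passed in directly.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence, size: int = 256, overlap: int = 64) -> List[list]:
    """Split tokens into windows of `size` that share `overlap` tokens with their neighbour."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the page
        start += step
    return chunks
```

With these assumed settings, a sentence that straddles the end of one window is also fully contained near the start of the next, as long as it is shorter than the overlap.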
The performance target is sub-second retrieval on a corpus of thousands of saved pages, and the platform meets that target on the typical user workload. The computation budget per query is dominated by the embedding step (one forward pass through MiniLM, which runs in a few tens of milliseconds on a modern browser) plus the vector similarity step (a dot product against each chunk embedding in the corpus, which scales linearly with the number of chunks but stays fast for the size range the platform serves).
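The similarity step can be illustrated with a plain linear scan. This is a sketch under simplifying assumptions: the three-dimensional vectors stand in for the 384-dimensional MiniLM embeddings, the stored vectors are assumed to be unit-length already, and `normalize` and `top_k` are names invented for the example.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def top_k(
    query_vec: Sequence[float],
    index: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Linear scan: one dot product per stored chunk, best scores first."""
    q = normalize(query_vec)
    scored = [(chunk_id, sum(a * b for a, b in zip(q, vec))) for chunk_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index of (chunk id, unit vector) pairs.
index = [
    ("page1#chunk0", normalize([1.0, 0.0, 0.0])),
    ("page2#chunk3", normalize([0.0, 1.0, 0.0])),
    ("page3#chunk1", normalize([1.0, 1.0, 0.0])),
]
best = top_k([1.0, 0.1, 0.0], index, k=2)
```

The cost is one multiply-add per dimension per chunk, which is why the work grows linearly with the number of stored chunks.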
The query interface is plain language. The user types what they remember or what they want — "the article about post-quantum cryptography I read last month," "the documentation page about the rate-limit headers" — and the platform returns the matching chunks with the source URL, the date the user captured it, and the surrounding context so the user can verify the match before clicking through. For users who want to refine the query, the platform supports the conventional filter syntax (date ranges, source-domain filters) layered over the semantic-search ranking.
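Layering filters over the semantic ranking could work roughly as follows. The platform's real filter syntax is not documented here, so this sketch skips parsing entirely and shows only the filtering step; the `Hit` type and the `apply_filters` parameters are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Hit:
    url: str           # decrypted on the device, never sent to the server
    captured_on: date  # the date the user saved the page
    score: float       # semantic similarity from the ranking step


def apply_filters(
    hits: Iterable[Hit],
    domain: Optional[str] = None,
    after: Optional[date] = None,
    before: Optional[date] = None,
) -> List[Hit]:
    """Drop hits outside the filters, then return the rest best score first."""
    wanted = domain.lower() if domain else None
    kept = []
    for hit in hits:
        host = (urlsplit(hit.url).hostname or "").lower()
        if wanted and not (host == wanted or host.endswith("." + wanted)):
            continue  # wrong source domain (subdomains of the wanted domain pass)
        if after and hit.captured_on < after:
            continue
        if before and hit.captured_on > before:
            continue
        kept.append(hit)
    return sorted(kept, key=lambda h: h.score, reverse=True)
```

The filters only narrow the candidate set; the order of what remains is still decided by the semantic score.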
Queries never leave the device, by structural design. The platform never sees what the user searched for. The server side does not have a search-query log because the search doesn't happen on the server side. For the customer whose search history would be sensitive — the lawyer researching a case, the doctor investigating a diagnosis, the journalist tracking a source — this property is the difference between a tool that protects the customer and a tool that creates a new vector of exposure.
#The privacy contract: what the server sees, and what it doesn't
The brand site publishes a privacy contract that is unusually direct for the category, and it is worth reading carefully because it is the structural commitment that every other property of the platform depends on. The contract has two columns: what the server stores, and what the server never stores.
On the stored side, six items. The user's email and an Argon2id password hash (for the account-login flow that is separate from the vault-encryption flow). A per-device API key hash (so the server can recognise authorised paired browsers without seeing the unhashed keys). The ciphertext plus IV of each captured record (the encrypted page content the user saved). An HMAC of the canonical URL (the blind index for dedup, useless without the HMAC key the user holds). A risk score per record (the integer plus reasons array, useful across paired browsers without revealing the page content). Timestamps and the size in bytes per record (operational metadata for sync ordering and storage accounting).
On the never-stored side, five items, each one the explicit complement of something the established services do store. The URL, the title, or the text of any captured page: none of these reach the server in any readable form. The user's passphrase or vault key: these live only on the device. The user's search queries: these never reach the server because the search runs on the device. The contents of any decrypted record: the decryption happens on the device. Which pages the user has or hasn't captured, beyond what the count of records and the HMAC blind indexes reveal: there is no "list of URLs you have saved" surface on the server side.
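One way to picture the contract is as the shape of a single server-side record. The sketch below is an assumption about layout, not the service's real schema: the field names and the `StoredRecord` type are invented, and the account-level items (email, password hash, per-device key hashes) are left out because they belong to the account rather than to a record.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class StoredRecord:
    ciphertext: bytes               # encrypted page content
    iv: bytes                       # initialisation vector for that ciphertext
    url_hmac: str                   # blind index used for dedup
    risk_score: int                 # client-computed, 0 to 100
    risk_reasons: Tuple[str, ...]   # short reason labels from the client
    captured_at: int                # timestamp (Unix seconds assumed here)
    size_bytes: int                 # storage accounting

    def __post_init__(self):
        if not 0 <= self.risk_score <= 100:
            raise ValueError("risk_score must be between 0 and 100")


# Fields the contract says must never exist server-side (illustrative names).
NEVER_STORED = {"url", "title", "text", "passphrase", "vault_key", "query"}


def server_visible_fields() -> set:
    """Names of everything the server holds for a record."""
    return {f.name for f in fields(StoredRecord)}
```

A check such as `server_visible_fields().isdisjoint(NEVER_STORED)` is the kind of assertion an auditor reading the open-source server code would effectively be making by hand.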
The combination is structural rather than promised. The platform does not say "we promise not to read your data" — it says "the data is encrypted before it leaves your browser; we structurally cannot read it." The difference matters because policy commitments degrade under business pressure (the company that promises today may be acquired tomorrow by a company that does not honour the promise), while structural commitments persist regardless of the company's future behaviour.
For the customer evaluating the trust commitment, the Apache-licensed source code is the verification mechanism. The customer can read the extension's code to verify that capture really is opt-in and limited to activeTab. The customer can read the cloud-service code to verify that the server really does store only the items in the "stored" list. The customer can read the encryption code to verify that the AES-GCM and argon2id implementations are the canonical ones rather than rolled-by-vendor variants. The customer who does not want to read the code themselves can hire an auditor to do it; the open-source posture lets the audit happen at all.
#Opt-in capture: activeTab, no background reading
The capture model is the operational property that makes the privacy story credible in practice. The browser extension uses the `activeTab` permission only. This is the most restrictive permission shape Chrome's extension API offers for content access — the extension can read the contents of a tab only when the user has explicitly granted access for that single tab in that single moment.
The user grants the capture by performing an explicit action: clicking the extension's popup, selecting "Save" from the right-click context menu, or pressing the configured keyboard shortcut. Each of these counts as an `activeTab` grant for the current tab. The extension reads the page content at that moment, performs the encryption, and uploads the ciphertext. The grant expires when the tab is closed or when the user navigates away; the extension cannot retroactively read other tabs or pages the user did not save.
The model is the structural opposite of the auto-import patterns that the conventional read-it-later services often default to. There is no "save my browsing history automatically" feature. There is no "import everything from Pocket" surface that imports content the user did not actively choose to save. There is no background reading of pages the user is viewing. The only pages that enter the user's memory are pages the user has explicitly chosen to add.
The trade-off the user makes is that capture is a deliberate act rather than a passive accumulation. The user has to remember to save the page they want to remember. The platform mitigates this with a small UI affordance — a subtle indicator on the extension icon when the user is on a page that is "save-worthy" by a configurable heuristic — but the actual save remains a user action. The alternative architecture (background reading of every page) is the architecture the conventional surveillance-shaped tools ship; the platform refuses it as a foundational decision.
For the customer who wants the capture-affordance to be more aggressive — the lawyer who wants every page in a research session captured automatically, the academic who wants every paper they open captured — the platform supports a "session capture" mode where the user opts in for a defined window (the next hour, the next day) and the extension captures every page the user visits during that window. The session-capture mode requires explicit opt-in, surfaces a persistent indicator while it is active, and ends automatically at the configured expiry. Even in this mode, the architectural contract holds — every captured record is encrypted before upload, the server sees only ciphertext.
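The session-capture window reduces to a start time plus an expiry check. The following is a minimal sketch of that logic only; `CaptureSession` and `should_auto_capture` are hypothetical names, and how the extension actually obtains page access during a session is not covered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class CaptureSession:
    started_at: datetime   # when the user explicitly opted in
    duration: timedelta    # the window the user chose, e.g. one hour

    @property
    def expires_at(self) -> datetime:
        return self.started_at + self.duration

    def is_active(self, now: datetime) -> bool:
        """True only inside the opted-in window; the session ends on its own at expiry."""
        return self.started_at <= now < self.expires_at


def should_auto_capture(session: Optional[CaptureSession], now: datetime) -> bool:
    """Default is manual capture: with no session, nothing is captured automatically."""
    return session is not None and session.is_active(now)


start = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
session = CaptureSession(started_at=start, duration=timedelta(hours=1))
```

The same `is_active` check could drive the persistent indicator, so the indicator and the capture behaviour cannot drift apart.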
#Pricing: free, forever, no paid tier
The pricing story is one sentence: the platform is free, forever, with no paid tier. The brand site is explicit about this because the absence of a pricing tier is the operational property the customer needs to understand before adopting a trust-sensitive tool.
There are no ads. The platform does not run advertising, partner-deal advertising, or any other revenue model that depends on user behaviour or content. The server structurally cannot use the user's content for advertising because the server cannot read the user's content; the architectural commitment removes the option even if the team were tempted by it.
There is no data resale. The platform does not sell user data, share user data with partners, or pursue any of the other monetisation models that the surveillance economy has normalised in the consumer-software category. Here again the structural posture (the server holds only ciphertext) removes the capability.
There is no upsell. The platform does not have a free tier that is bounded by capacity caps to push users into a paid tier. All features are unlocked for everyone: capture, search, paired browsers across Chrome and Edge and Brave, the recovery mnemonic, the AI-chat extractors. The team is direct that the pricing model may evolve over time if the platform's scale outgrows the team's capacity to fund the operational cost — they reserve the right to introduce reasonable constraints — but the principle of "the privacy commitments are unconditional" will not change.
There is no trial expiry. The platform does not have a thirty-day window after which the customer has to commit to a subscription to retain access. The customer who signs up today and uses the platform for years pays the same as the customer who signs up today and uses it for a week: nothing.
On per-account storage, the platform commits to "reasonable headroom" rather than a specific number. The team's position is that the storage cost per user is low enough at the scale the platform serves that imposing tight caps creates the wrong shape of friction; if a user ever hits the soft ceiling, the team commits to working it out rather than imposing a paywall surprise. For the heavy-use customer whose corpus has grown unusually large, this commitment is the operational signal that the team takes the "free forever" claim seriously rather than treating it as a marketing line.
#How browserfog compares to the alternatives
The read-it-later and web-memory category has a clear set of established options. It is worth being direct about how the platform sits against each.
Pocket (Mozilla) is the established mainstream read-it-later service. Pocket is well-built, has a large user base, and is the canonical first answer for the casual save-pages use case. Pocket's privacy posture is the conventional one — saved content lives on Pocket's servers in readable form, search runs server-side, and the service's monetisation depends in part on the recommendation surface that requires reading user content. browserfog extends past Pocket on the privacy axis specifically: end-to-end encryption, on-device search, opt-in capture only, open-source codebase. For casual users whose saved content is low-sensitivity, Pocket is sufficient; for users whose saved content is professionally sensitive, the platform is the better fit.
Instapaper is the closer competitor on the "save for later" pattern. Instapaper has a long track record, a polished reading experience, and a reasonable privacy posture. The platform's advantage over Instapaper lies on the same axis as its advantage over Pocket: the structural-not-policy privacy commitment via end-to-end encryption. For users whose saved-page volume is large and whose content sensitivity is high, the platform is the alternative; for users whose use case is "Instapaper but with E2E encryption," the platform is the direct replacement.
Notion and Obsidian are the broader knowledge-management alternatives that many users repurpose as page-save tools. Notion stores content on Notion's servers; Obsidian stores content locally with optional sync through Obsidian Sync (which is end-to-end encrypted when the user enables the option). The platform's advantage over Notion is on the privacy axis; its advantage over Obsidian is the web-focused capture and the semantic search optimised for saved-page content (Obsidian is better at hand-authored notes; the platform is better at captured-web content). For users whose primary workflow is web-page capture rather than hand-authored notes, the platform is the focused alternative.
The AI-era alternatives (Perplexity's save-as-page feature, ChatGPT's memory, the various conversational-search products that surface "your saved pages" as a feature) are the closest peers on the AI-integration story. Each of them ships the save-and-search pattern competently within its own surface. The platform's advantage is twofold: the privacy axis (none of them ship structural E2E encryption with on-device search) and the multi-tool integration (the platform is built to be the memory surface across the user's AI assistants rather than tied to one of them). For users committed to a single AI assistant's ecosystem, the assistant's native save feature may be sufficient; for users working across multiple assistants and tools, the platform is the substrate that works under all of them.
#The team behind the product
browserfog is built by the same engineering team behind the Ollasoftware portfolio, operating out of L-149 Sector 6, HSR Layout, Bengaluru — the same office address shared with Ollasoftware's commercial portfolio. The legal entity on the brand site is Victor Chasex Pvt Ltd, the operating company the team uses for the consumer-privacy product specifically; the engineering team and the operational substrate are shared with the broader portfolio.
The choice to incorporate the consumer-privacy product under a separate legal entity is deliberate and reflects the platform's posture toward data-protection regulation. India's Digital Personal Data Protection Act (DPDP 2023) imposes specific obligations on entities that handle personal data; the platform's structural commitment to E2E encryption is what reduces the data-handling surface to ciphertext-only, but operating the product under a focused legal entity makes the compliance posture cleaner for the customer who needs to evaluate it formally. The Grievance Officer designation on the brand site (the legally-required contact for DPDP grievance handling) is one of the small operational signals that the team takes the compliance posture seriously.
The engineering team behind the platform inherits the Rust-and-React substrate that the broader Ollasoftware portfolio uses. The extension is built as a conventional Chrome-shaped extension that targets the Chromium-derived browsers (Chrome, Edge, Brave); the cloud-sync service runs on the same async-Rust + Postgres substrate that the rest of the portfolio uses; the on-device search runs through the same MiniLM family the AI portfolio standardises on. The advantage of being built inside the broader engineering organisation is that every primitive the privacy-first product needs already exists somewhere in the portfolio.
The broader institutional context is Networkers Home, the cybersecurity and networking training institute that has placed more than forty-five thousand alumni across eight hundred hiring partners since 2007. That history is what makes the platform's commitments credible: the encryption story is anchored on the cryptographic discipline the institute has been teaching for nearly two decades, and the privacy story is anchored on the data-handling discipline a long-tenured organisation has accumulated.
#What is on the roadmap
The team publishes the roadmap and the changelog at the brand site and updates them as work ships. The visible near-term threads are concrete: expanded browser support beyond the current Chrome / Edge / Brave set (Firefox is the most-requested addition), a mobile companion app for iOS and Android that respects the same architectural contract (E2E encryption with on-device search), and richer AI-chat extractors for the customer who wants to query their captured corpus through their preferred AI assistant while maintaining the E2E posture.
Underneath those visible features is steady investment in the on-device search substrate. The current 384-dim MiniLM family is the right balance for the typical workload; the team is working on a 768-dim option for users whose corpus is large enough and whose retrieval-quality requirements are high enough to justify the bigger compute budget per query. The choice will be opt-in per user rather than imposed.
On the integration side, the team is investing in the substrate that lets the user's preferred AI assistant query the corpus without breaking the E2E contract. The architecture under development uses the assistant's local agent surface (the MCP server the assistant runs locally) to query the user's decrypted corpus directly on the device — the assistant gets the relevant context, the corpus never leaves the device, and the assistant's answer surfaces through the assistant's own UI. The pattern is technically demanding but the architectural fit with the platform's E2E posture is clean.
Pricing remains free, with no paid tier on the roadmap. The team has been explicit that the privacy commitments are unconditional, with the caveat that reasonable constraints (the storage soft cap, fair-use limits on capture rate) may be introduced if the platform's scale demands them. The team's posture is that the customer should never face a pricing surprise; if a constraint becomes necessary, it will be introduced with notice and with a clear rationale.
#How to start
If you regularly read web content that matters professionally and that you want to be able to find again later — research, documentation, long-form analysis, primary sources, the work-in-progress thinking that shapes decisions — the right next move takes about three minutes. Go to browserfog.com, create an account (email plus password, no card), and install the extension for the browser you use most.
The first capture is the canonical "is this working" verification. Open a page you want to remember, click the extension popup, click Save. Watch the success indicator confirm the capture. Open the search surface, type a few words from the page, watch the chunk-level result return with the source URL and the surrounding context. The whole loop takes under a minute and confirms that the capture-and-recall flow does what the platform claims it does.
For users whose work is high-sensitivity enough that even the cloud-sync deployment is the wrong posture, the local-only mode is available — the extension and the on-device search work without uploading any ciphertext to the cloud, at the cost of the per-machine isolation that comes with that choice. The customer who needs the strictest posture has it; the customer who values the cross-browser sync over the slight reduction in privacy assurance has that too.
For users who want to pair multiple browsers — Chrome on the desktop, Edge on the work laptop, Brave on the personal machine — the per-device API key surface handles the multi-browser case. The user pairs each browser independently, the same vault key decrypts the same corpus across all of them, and the user can revoke a single device's key from the dashboard without affecting the others.
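Per-device pairing and revocation can be sketched as a registry that keeps only key hashes. This is an illustration under assumptions: the source says the server stores "per-device API key hashes" but does not name the hash, so SHA-256 is used here as a stand-in, and `DeviceRegistry` is a hypothetical name.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class DeviceRegistry:
    """Server-side view: one stored hash per paired browser, never the raw key."""

    def __init__(self) -> None:
        self._hashes: Dict[str, str] = {}  # device label -> hex digest of its API key

    @staticmethod
    def _digest(api_key: str) -> str:
        return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

    def pair(self, device: str) -> str:
        """Issue a fresh random key for one browser; only its hash is retained."""
        api_key = secrets.token_urlsafe(32)
        self._hashes[device] = self._digest(api_key)
        return api_key  # shown to the device once

    def authenticate(self, presented_key: str) -> Optional[str]:
        """Return the device label for a valid key, or None."""
        digest = self._digest(presented_key)
        for device, stored in self._hashes.items():
            if hmac.compare_digest(stored, digest):
                return device
        return None

    def revoke(self, device: str) -> None:
        """Remove one device's hash; other devices are unaffected."""
        self._hashes.pop(device, None)
```

Note that these API keys only identify a browser to the sync service. They play no part in decryption, which still depends on the vault key derived from the passphrase on each device.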
The codebase is open source. Customers who want to read the code before trusting the privacy commitments can do so at the platform's GitHub. Customers who want to commission a security audit can do so with confidence that the code reflects the brand site's claims. Customers who want to fork the code and run a private build for an environment where running against the public cloud is the wrong choice can do that too.
#FAQs about browserfog
1. What is browserfog?
browserfog is a privacy-first web-memory tool. Save the pages that matter with one click, search them later by meaning rather than keywords, and the server stores only ciphertext — your URLs, titles, and page contents stay readable in your browser only. Three components: a browser extension for opt-in capture, a small cloud sync service that stores ciphertext only, and an on-device semantic search surface. Free forever, no ads, no telemetry.
2. How does the encryption work?
AES-GCM symmetric encryption with a vault key derived from your 24-word recovery passphrase via argon2id. Every record is encrypted in your browser before upload. The server stores the ciphertext-plus-IV blob plus an HMAC of the canonical URL (used only for dedup). Your passphrase and your vault key live only in your browser. Lose the passphrase and the corpus is unrecoverable — explicit trade-off in favour of structural privacy.
3. Where does the search run?
On your device. The platform uses 384-dimensional MiniLM embeddings computed locally in your browser; queries never leave your machine. Long pages are chunked with a 256-token sliding window so search returns the relevant passage rather than averaging the page. Sub-second retrieval on a corpus of thousands of saved pages on typical browser-shaped hardware.
4. What does the server store, and what doesn't it store?
STORED: your email + Argon2id password hash, per-device API key hash, ciphertext + IV of each captured record, an HMAC of the canonical URL (blind index for dedup), a risk score per record, timestamps and size in bytes. NEVER STORED: the URL, title, or text of any captured page; your passphrase or vault key; your search queries; the contents of any decrypted record; which pages you have or haven't captured beyond what the count of ciphertext records and the HMAC blind indexes reveal.
5. Is capture automatic or opt-in?
Strictly opt-in. The extension uses activeTab permission only — it cannot read pages you haven't explicitly asked it to. Nothing is saved automatically. You click the popup, the right-click context menu, or use the keyboard shortcut. A session-capture mode is available for users who want capture-while-browsing for a defined window, but requires explicit opt-in with a persistent indicator and an automatic expiry.
6. How much does browserfog cost?
Free forever. No paid tier, no upsell, no trial expiry, no ads, no data resale, no telemetry. All features unlocked for everyone: capture, search, paired browsers (Chrome, Edge, Brave), recovery mnemonic, AI-chat extractors. The team commits to reasonable per-account storage headroom; if you ever hit a soft ceiling the team has stated they will work it out rather than impose a paywall surprise.
7. How does browserfog compare to Pocket, Instapaper, Notion and Obsidian?
Pocket and Instapaper store readable content on their servers and run server-side search; browserfog ships structural E2E encryption with on-device search. Notion stores readable content; browserfog is the privacy-first alternative for users whose saved content is professionally sensitive. Obsidian is better at hand-authored notes; browserfog is better at captured-web content with semantic search optimised for that workload. Both Obsidian Sync (when enabled) and browserfog ship E2E; browserfog is the focused web-capture choice.
8. Is browserfog open source?
Yes. The extension and the API are open source. Customers can read the codebase, verify what the extension actually reads and what the server actually stores, run a security audit against the implementation, or fork the code and deploy a private build for environments where running against the public cloud is the wrong choice. Built by the same engineering team behind the Ollasoftware portfolio. Operating company: Victor Chasex Pvt Ltd, L-149 Sector 6, HSR Layout, Bengaluru.