Beta-service disclaimer
Pengon is currently in private, invitation-only beta. We’re proud of what it does, but we’re also being honest about what that means while we’re still growing:
- No uptime SLA. We aim for the same edge-deployed reliability you’d expect from any Cloudflare-hosted service, but if Pengon ever becomes unreachable, your contact form keeps working — submissions fail open and reach you the same way they did before you installed Pengon.
- Classifier mistakes happen. AI spam detection is a probability game, not a perfect science. We catch the vast majority of AI-written spam, but the occasional real message will land in the Spam tab. Use Recover (not spam) to put it back in your inbox; it takes one click.
- The product is still evolving. Features may change, ship, or be removed based on what beta customers actually need. We’ll keep you in the loop by email before any change that affects you.
What we collect
When a visitor submits a form on your Squarespace site, our worker receives the same data Squarespace already forwards to your inbox: the visitor’s name, email, phone (if present), the message text, and the page they submitted from. We classify it and store it so you can view it in your dashboard.
We also store the minimum needed to run your account: your Clerk user ID, your sign-up email, and the domain(s) of the site(s) you’ve connected. No payment information — the beta is free.
How we use it
Your submission data is yours. We use it for exactly three things:
- Showing you your dashboard. Submissions you receive, their classification, the AI’s reasoning, the score.
- Sending recovery emails. When you click Recover (not spam), we send the original message as an email to the address on your Pengon account. This is a separate email from us, not a message put back into your Squarespace form inbox.
- Anonymised training of our spam classifier: opt-in, on Recoveries only. See the section below.
We never sell, share, or transfer your submission data to third parties. We don’t use it for advertising. We don’t train other people’s models on it.
The training corpus (the bit you specifically asked about)
Pengon’s spam detection runs on Cloudflare Workers AI using Meta’s Llama 4 Scout model. The model itself is frozen and doesn’t learn from your submissions on its own. What we do do is build an anonymised training corpus that we’ll eventually use to fine-tune a specialised version of the classifier — but only with very specific, consensual data.
When you click Recover (not spam) on a flagged message, that (and only that) triggers a training-corpus entry. The corpus row contains:
- The message text, with email addresses, URLs and phone-number patterns automatically replaced with [email], [url] and [phone] tokens before storage;
- The AI’s original classification and reasoning;
- Your verdict (in this case: “clean”);
- A timestamp.
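For the technically curious, the token replacement described in the first bullet can be sketched roughly as follows. This is an illustrative TypeScript sketch, not our production code: the function name `redact` and all three regular expressions are assumptions made up for this example, and simple patterns like these will miss some formats and occasionally over-match (a long run of digits can look like a phone number).

```typescript
// Hypothetical patterns for illustration only; the real rules may differ.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const URL_PATTERN = /\b(?:https?:\/\/|www\.)[^\s<>"']+/gi;
// Starts and ends with a digit, at least 8 characters of digits and separators.
const PHONE_PATTERN = /\+?\d[\d\s().-]{6,}\d/g;

function redact(text: string): string {
  // Emails go first so the URL pattern never sees their domain part;
  // phones go last so digits inside URLs are already gone.
  return text
    .replace(EMAIL_PATTERN, "[email]")
    .replace(URL_PATTERN, "[url]")
    .replace(PHONE_PATTERN, "[phone]");
}

console.log(
  redact("Mail jane@example.com or call +41 44 123 45 67, see https://example.com/offer now"),
);
```

The order of the three replacements matters in this sketch: running the email pattern first keeps an address from being partly swallowed by the URL or phone patterns.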
What the corpus does not contain: no name, no email, no phone number, no site domain, no user ID, no IP, no country, no link back to your account or anyone else’s. Once a row is in the corpus it’s genuinely de-linked — even we can’t tell which Pengon customer it came from.
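One way to picture the de-linking is an allow-list: the corpus row is built from a short list of permitted fields instead of copying the submission and stripping identifiers afterwards. The TypeScript below is a sketch under that assumption. The `Submission` and `CorpusRow` shapes, their field names, and `toCorpusRow` are hypothetical and do not describe our actual schema.

```typescript
// Hypothetical shapes for illustration; not Pengon's real schema.
interface Submission {
  name: string;
  email: string;
  phone?: string;
  message: string;
  pageUrl: string;
  siteDomain: string;
  userId: string;
  classification: "spam" | "clean";
  reasoning: string;
}

interface CorpusRow {
  text: string; // message text, already redacted
  classification: "spam" | "clean"; // the AI's original classification
  reasoning: string; // the AI's original reasoning
  verdict: "clean"; // the customer's verdict on a recovery
  recordedAt: string; // ISO 8601 timestamp
}

function toCorpusRow(sub: Submission, redactedText: string, now: Date): CorpusRow {
  // Copy only the permitted fields. Name, email, phone, page, domain and
  // user ID are never read here, so they cannot end up in the row.
  return {
    text: redactedText,
    classification: sub.classification,
    reasoning: sub.reasoning,
    verdict: "clean",
    recordedAt: now.toISOString(),
  };
}
```

Building the row field by field means that a new identifying field added to a submission later stays out of the corpus unless someone deliberately adds it to the row.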
Why we do this. A spam classifier is only as good as the examples it’s seen. The recoveries you give us are the highest-quality signal we have — cases where the AI got it wrong — and they’re what will eventually make Pengon catch more spam without misfiring on real clients. It’s the single biggest long-term investment we’re making in classifier quality.
Don’t want to contribute? Don’t click Recover. Use the Delete button on the row instead — the submission is removed from your dashboard with no training-corpus entry created. You can also delete submissions in bulk via the checkboxes.
How long we keep things
- Submissions in your dashboard: stored in our database until you delete them. Bulk deletion is one click away.
- Account info: kept while your account is active. On request we delete everything.
- Training-corpus rows (anonymised): retained indefinitely once anonymised. Because they no longer reference any individual, we treat them as fully de-identified data per Recital 26 of the EU GDPR.
Third-party services we use
Pengon runs on Cloudflare (Workers, D1, Workers AI, Pages). Auth is handled by Clerk. Transactional emails (recovery notices) are sent by Resend. Form classification runs on Cloudflare Workers AI using Meta’s Llama 4 Scout. Each of these processes data on our behalf under their own data-processing terms; we don’t share data with anyone else.
Your rights
You can ask us at any time to:
- Export everything we have on you (submissions, account info);
- Delete your account and all associated submissions;
- Correct anything we’ve recorded inaccurately.
Just email hello@pengon.dev. We answer same-day during the work week. We can’t un-mix training-corpus rows from the corpus — they’re anonymised and don’t reference you anymore — but everything else we’ll delete on request.
Plain-language disclaimer
This page is a friendly explanation of what we actually do, not a legal contract. It doesn’t replace the formal Terms of Service we’ll publish before opening public signup. If something here is unclear or you’d like a specific question answered, please reach out.
Contact
Pengon is built by Quad Studio, Zürich. Reach us at hello@pengon.dev for product questions or info@quadstudio.ch for anything else.