ParanoAI Labs

paranoai.org · lab output, not sales

CTF · Bug bounty · Pentest research · Own infrastructure · Autonomous AI pentesters

What changed and why it matters

Major providers (OpenAI, Anthropic, and to a lesser extent Google) have implemented more aggressive filtering on security-adjacent requests. The change hits hardest for people running autonomous AI pentest agents (tools like PentestGPT, Shannon, PentAGI) and for anyone using LLMs for vulnerability discovery or defense bypass in authorized contexts.

The second constraint is hardware. Running unconstrained open-source models locally requires serious VRAM — 30B+ parameter models need multi-GPU setups or heavy quantization that degrades quality. That leaves most practitioners fully dependent on API providers and their restrictions.

Bottom line: prompt engineering is no longer the lever. Official access programs are. Approval unlocks noticeably different behavior — not a complete removal of guardrails, but a meaningful reduction in refusals on legitimate security tasks.

Provider overview

OpenAI · Program exists
Two-tier access: basic cyber portal (individual) + advanced tier with manual review. Approval notification may not come by email; check the portal directly.
openai.com → researcher access

Anthropic · Program exists
Single application form with use-case description. Results by email. Clearer notification than OpenAI, similar effect on filtering.
anthropic.com → research

Google Gemini · No equivalent yet
No established trusted researcher program as of mid-2025. Enterprise agreements exist but don't directly unlock security research access in the same way.

OpenAI — detailed walkthrough

Basic tier (ChatGPT cyber portal)

Submit an individual application through OpenAI's cyber portal. Identity verification is required — typically a passport or equivalent government ID. Organizations apply separately with a description of their security testing use case.

Basic approval reduces refusals on standard security tasks and is enough for straightforward scenarios. Complex exploitation chains and some offensive tooling are still blocked.

Known gotcha: approval notifications often don't arrive by email. Many people miss their approval entirely. After submitting, revisit the portal periodically to check your actual access status rather than waiting for a confirmation email.

Advanced tier

A separate supplementary form opens the path to significantly deeper capability. This tier involves manual review of your actual work: active projects, legal compliance documentation, and demonstrated prior experience in security. The review is slower than for the basic tier, but the difference in capability is real once you are approved.

Anthropic — detailed walkthrough

Anthropic's process is more straightforward. A single application form asks for your use-case description and relevant context. Results are communicated by email (unlike OpenAI, the notification reliably arrives). The effect of approval parallels OpenAI's: a meaningful reduction in filtering, not a full bypass.

Maximizing your approval chances

The difference between approved and rejected applications is almost entirely in how you frame the request. Vague "I'm interested in security" language fails. Specific, legal, experience-backed framing works.

What to include

  • Specific technical terminology: "automated vulnerability discovery", "LLM-assisted pentesting", "exploit chain analysis"
  • Explicit legal parameters: authorized testing scope, named bug bounty programs (HackerOne, Bugcrowd), internal/proprietary infrastructure
  • Evidence of existing experience: GitHub repos, CVEs, HackerOne profile, conference talks, prior research
  • Concrete inadequacy statement — show current access is genuinely blocking legitimate work, not that you're curious
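
One way to make the inadequacy statement concrete is to report how often in-scope prompts are refused. The sketch below is hypothetical: the marker phrases and function names are assumptions for illustration, and keyword matching is only a crude proxy for a real refusal, so treat the resulting number as a rough indicator rather than a measurement.

```python
# Hypothetical refusal markers; adjust to the phrasing you actually observe.
REFUSAL_MARKERS = (
    "i can't help",
    "i cannot help",
    "i can't assist",
    "i cannot assist",
    "i'm unable to",
)


def is_refusal(response: str) -> bool:
    """Return True if the response contains any of the assumed refusal phrases."""
    # Normalize curly apostrophes so "can’t" matches "can't".
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)


if __name__ == "__main__":
    # Responses you logged yourself while working on authorized, in-scope tasks.
    logged = [
        "I can't help with that request.",
        "Here is the analysis of the test target you described.",
    ]
    print(f"Refused {refusal_rate(logged):.0%} of in-scope prompts")
```

A figure like "N of M clearly in-scope prompts refused" in the application is more persuasive than a general complaint about friction.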

What to avoid

  • Generic framing ("cybersecurity professional", "interested in AI security")
  • Mentioning offensive capability without authorization context
  • CTF-only justification without broader research or professional context

Sample application language

Application excerpt · adapt to your context
I use LLMs for security testing in authorized environments, including internal infrastructure owned by my organization and bug bounty programs on HackerOne [your handle / scope]. Current use cases include: automated vulnerability discovery in web and API targets, LLM-assisted exploit chain reasoning, and integrating models into pentest reporting workflows. Standard access produces excessive refusals on tasks that are clearly within scope — for example, analyzing known CVEs or generating proof-of-concept code against test environments I control. Trusted access would enable these workflows without constant friction.

What about local models?

Running open-source models locally removes provider restrictions entirely — but the hardware bar is real. Models at 30B+ parameters require significant VRAM:

  • Qwen 30B–36B (Q4): ~20–24 GB VRAM; a single 24 GB RTX 4090 is marginal, while a 48 GB A6000 leaves headroom
  • Qwen 32B (full precision): roughly 64 GB for the weights alone at FP16, so it requires multi-GPU or 80 GB A100-class hardware
  • Aggressive quantization (Q2–Q3) fits smaller setups but degrades reasoning quality on complex security tasks
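
The figures above follow from simple arithmetic: weight memory is parameter count times bits per weight, divided by eight. The sketch below is a rough estimator under stated assumptions; the function names, the flat 20% overhead for KV cache and activations, and the 4.5 bits per weight used for Q4 are illustrative choices, not measured values.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_fraction: float = 0.2) -> float:
    """Rough VRAM need in GB: weight memory plus a flat overhead (assumed)."""
    # 1e9 params * (bits / 8) bytes per param = params_billion * bits / 8 GB
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * (1 + overhead_fraction)


def fits(params_billion: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the rough estimate fits within the given VRAM budget."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    # Assumed effective bit widths: ~4.5 for a Q4 format, 16 for FP16.
    for label, bits in [("Q4", 4.5), ("FP16", 16)]:
        need = estimate_vram_gb(32, bits)
        print(f"32B at {label}: ~{need:.0f} GB, fits in 24 GB: {fits(32, bits, 24)}")
```

Under these assumptions a 32B model at Q4 lands a little over 20 GB, which is why a 24 GB card is marginal. Actual usage varies with context length and inference runtime.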

For most practitioners without dedicated ML infrastructure, local models are a partial answer at best. The official access programs are the realistic path.

Cloud GPU providers (RunPod, Lambda Labs, Vast.ai) can bridge the gap for occasional heavy tasks — spin up a node, run the model, tear it down. Not suitable for interactive pentest workflows, but works for batch analysis jobs.