What It Does
Redwood Research is a Berkeley-based 501(c)(3) nonprofit AI safety organization whose primary mission is aligning superhuman AI systems. Led by CEO Buck Shlegeris and Chief Scientist Ryan Greenblatt, the organization focuses on three research threads: AI control (techniques for keeping AI systems safe even if they are misaligned), AI scheming detection (evaluating whether frontier models might strategically deceive evaluators), and empirical alignment research (interpretability and evaluation methodology).
Unlike METR, which focuses on capability assessment, Redwood’s work is more directly oriented toward solutions: developing control protocols that assume misalignment and ask whether we can still catch misbehavior in time. The organization also runs Constellation, a ~30,000 sq ft coworking space in Berkeley hosting staff from Open Philanthropy, ARC, Atlas Fellowship, and CEA, making it a physical node in the EA-aligned AI safety ecosystem.
Key Features
- AI Control research: formal threat models for misaligned AI, protocols for catching scheming behavior before it causes harm
- Triframe and related agent architectures for AI evaluation scaffolding
- “Making Deals with Early Schemers” and related work on AI strategic deception
- Pre-deployment evaluation consulting for frontier labs (collaborative with METR)
- Ryan Greenblatt’s publicly tracked personal AI timeline estimates, cited widely in the AI safety community
- Constellation office space: hub for EA-adjacent AI safety organizations in Berkeley
- Public Substack/blog (blog.redwoodresearch.org) publishing research and opinion from staff
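To make the control-protocol idea concrete, here is a minimal, hypothetical sketch of a trusted-monitoring setup in the spirit of Redwood's AI control research: a trusted (weaker) monitor assigns a suspicion score to each action an untrusted model proposes, the most suspicious actions within a small audit budget are escalated to humans, and moderately suspicious ones are deferred to a trusted model. All names, thresholds, and routing logic below are illustrative assumptions, not Redwood's actual protocol.

```python
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    suspicion: float  # score from the trusted monitor, 0.0 (benign) to 1.0

def trusted_monitoring(
    actions: list[Action],
    audit_budget: float = 0.02,    # fraction of actions humans can review
    defer_threshold: float = 0.5,  # above this, substitute a trusted model's action
) -> list[str]:
    """Route each untrusted-model action to execute / defer / audit."""
    n_audits = max(1, int(len(actions) * audit_budget))
    # Spend the scarce human-audit budget on the most suspicious actions.
    by_suspicion = sorted(actions, key=lambda a: a.suspicion, reverse=True)
    audited = {id(a) for a in by_suspicion[:n_audits]}
    decisions = []
    for a in actions:
        if id(a) in audited:
            decisions.append("audit")    # escalate to human review
        elif a.suspicion > defer_threshold:
            decisions.append("defer")    # replace with trusted model's output
        else:
            decisions.append("execute")  # run the untrusted action as-is
    return decisions

actions = [Action("write unit test", 0.1),
           Action("curl unknown host", 0.9),
           Action("refactor module", 0.3)]
print(trusted_monitoring(actions, audit_budget=0.34))
# → ['execute', 'audit', 'execute']
```

The point of the sketch is the safety argument's shape: even if the untrusted model is scheming, the protocol bounds how much unaudited, high-suspicion work it can execute.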
Use Cases
- AI safety policy: Governments and researchers using Redwood’s control protocol work to understand what procedural safeguards could reduce risk from advanced AI systems
- Evaluator methodology: Labs and evaluation bodies drawing on Redwood’s scheming detection research to design evaluations robust to strategic deception
- Research collaboration: AI safety-focused organizations sharing the Constellation space and collaborating on alignment work
- Timeline calibration: technical directors and strategy teams tracking Greenblatt’s published timeline estimates as one data point in scenario planning for AI capability timelines
Adoption Level Analysis
Small teams (<20 engineers): Not directly applicable as a product or toolset. Relevant as a source of public research on AI safety methodology and evaluation design; blog posts and arXiv papers are freely accessible.
Medium orgs (20–200 engineers): Relevant if building AI agents and concerned about safety properties. Redwood’s control protocol research and scheming detection work can inform evaluation design. Not a vendor relationship.
Enterprise (200+ engineers): Primary audience for indirect influence. Frontier AI labs, government bodies, and large-scale AI deployers engage with Redwood’s research for safety policy development. The control protocol work is especially relevant for organizations deploying autonomous agents in high-stakes settings.
Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| METR | Focuses on capability benchmarking and pre-deployment evaluation, less on control protocols | You need formal third-party evaluation of whether a model has dangerous autonomous capabilities |
| Apollo Research | Specializes in AI scheming and deception evaluation specifically | You need targeted evaluation of strategic deception and goal-directed misalignment |
| UK AISI / DSIT | Government-backed evaluation body; produces the Inspect framework | You need a government-backed evaluation framework or compliance evidence |
| ARC (Alignment Research Center) | Theory-focused alignment organization (e.g., eliciting latent knowledge); its evals arm spun out to become METR | You need theoretical alignment research rather than empirical control work |
Evidence & Sources
- Redwood Research official site
- Critiques of prominent AI safety labs: Redwood Research (EA Forum)
- Ryan Greenblatt on AI Control, Timelines, and Slowing Down (80,000 Hours Podcast)
- Redwood Research Funding Analysis (Extruct AI)
- AIs can now often do massive easy-to-verify SWE tasks (Ryan Greenblatt, LessWrong)
Notes & Caveats
- Funding concentration: Redwood has received approximately $21M in total funding, with $20M from Open Philanthropy (OP). This creates a single-donor dependency risk. If OP reallocated priorities, Redwood would face significant financial pressure.
- Community selection bias: As a prominent institution in the EA/AI safety community, Redwood’s public communications (including staff blog posts and timeline estimates) may be subject to community norms that reward doom-adjacent updates. This is a soft but real publication bias to account for when consuming timeline estimates.
- Small team, outsized influence: The organization is small (~30–50 people) but its work is referenced by major AI labs and governments. This creates a bottleneck and makes the work potentially fragile to key-person departures.
- Not a product or service: Redwood does not sell tools, APIs, or SaaS. Catalog value is as a reference organization for understanding the AI safety research landscape and tracking influential AI timeline estimates.
- Relationship with Anthropic: Redwood’s leadership and staff have close working ties to Anthropic (e.g., the 2024 alignment-faking study was a joint Anthropic–Redwood project). This creates collaborative advantages but also epistemic proximity that could affect how Anthropic’s model capabilities are perceived and reported.