Technical companion
Methodology
This page documents how the simulation works, what is actually measured, and where the current boundaries of the system are. It is written for researchers, grant reviewers, and developers.
Simulation design overview
Abiogenics is a continuous evolutionary loop running on a Node.js / TypeScript worker process connected to a Supabase (Postgres) database and one or more LLM providers via API.
Two independent processes run simultaneously:
- Evolution worker — drives generations, calls LLMs, manages memory, proposes tool actions, writes results to the database.
- Action runner — polls the action queue, enforces policy, and executes approved tools inside isolated Docker containers.
The two processes communicate only through the shared database.
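The hand-off between the two processes can be sketched as follows. This is a minimal illustration, not the project's code: an in-memory class stands in for the shared database table, and every name in it (QueuedAction, InMemoryQueue, ALLOWED_TOOLS, pollOnce) as well as the allow-list policy is an assumption made for the example. The execute callback stands in for launching an isolated container.

```typescript
// All names below are hypothetical; the real schema and policy are not shown in this page.
type ActionStatus = "pending" | "approved" | "rejected" | "done";

interface QueuedAction {
  id: number;
  tool: string;
  status: ActionStatus;
}

// Stand-in for the shared database table that both processes read and write.
class InMemoryQueue {
  private rows: QueuedAction[] = [];

  private find(id: number): QueuedAction | undefined {
    return this.rows.filter((r) => r.id === id)[0];
  }

  // Evolution worker side: propose an action, never execute it.
  propose(tool: string): number {
    const id = this.rows.length + 1;
    this.rows.push({ id, tool, status: "pending" });
    return id;
  }

  // Reviewer side: manual approval, needed while auto-approval is off.
  approve(id: number): void {
    const row = this.find(id);
    if (row && row.status === "pending") row.status = "approved";
  }

  all(): QueuedAction[] {
    return this.rows;
  }

  statusOf(id: number): ActionStatus | undefined {
    const row = this.find(id);
    return row ? row.status : undefined;
  }
}

// Assumed policy: only allow-listed tools may run.
const ALLOWED_TOOLS: string[] = ["web_search", "run_code"];

function passesPolicy(action: QueuedAction): boolean {
  return ALLOWED_TOOLS.indexOf(action.tool) !== -1;
}

// One polling pass of the action runner. Returns how many actions were executed.
function pollOnce(
  queue: InMemoryQueue,
  autoApprove: boolean,
  execute: (action: QueuedAction) => void
): number {
  let executed = 0;
  for (const action of queue.all()) {
    // With auto-approval off, pending actions wait for a reviewer.
    if (action.status === "pending" && autoApprove) {
      action.status = "approved";
    }
    if (action.status !== "approved") continue;
    // Policy is enforced even for approved actions.
    if (!passesPolicy(action)) {
      action.status = "rejected";
      continue;
    }
    execute(action);
    action.status = "done";
    executed++;
  }
  return executed;
}
```

In this sketch the worker only ever calls propose and the runner only ever calls pollOnce, so the queue is the single point of contact between them, mirroring the database-only communication described above.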
Current batch: 120 runs × 40 cycles = 4,800 target successful cycles · Model: google/gemini-2.5-flash · Auto-approval disabled for sandboxed actions.
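As a small worked example, the batch figures above can be written as a configuration object. The BatchConfig shape and its field names are assumptions for illustration, as is the reading that only successful cycles count toward the target.

```typescript
// Hypothetical config shape; field names are illustrative assumptions.
interface BatchConfig {
  runs: number;
  cyclesPerRun: number;
  model: string;
  autoApproveSandboxed: boolean;
}

const currentBatch: BatchConfig = {
  runs: 120,
  cyclesPerRun: 40,
  model: "google/gemini-2.5-flash",
  autoApproveSandboxed: false,
};

// Target number of successful cycles for the whole batch.
function targetCycles(cfg: BatchConfig): number {
  return cfg.runs * cfg.cyclesPerRun;
}

// Fraction of the target reached, counting successful cycles only (assumed semantics).
function batchProgress(cfg: BatchConfig, successfulCycles: number): number {
  return Math.min(1, successfulCycles / targetCycles(cfg));
}
```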
How a generation works
Each generation runs the following steps: perception (Tier 3 only), metacognitive reflection (Tier 2 and above), stimulus selection, an LLM evolution call, response parsing, fitness scoring, memory writes, tool action processing, optional post-mortem analysis, genome persistence, and periodic capability assessment.
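One way to picture the tier gating and the optional and periodic steps is a step table filtered per generation. This is a sketch under stated assumptions: the step order follows the sentence above, every step without a stated tier is assumed to be available from Tier 1, and the runPostMortem flag and assessEvery period are hypothetical stand-ins for whatever actually triggers those steps.

```typescript
type Tier = 1 | 2 | 3;

interface GenerationContext {
  tier: Tier;
  generation: number;     // 1-based generation counter
  runPostMortem: boolean; // hypothetical trigger for the optional step
  assessEvery: number;    // hypothetical period for capability assessment
}

interface Step {
  name: string;
  minTier: Tier;
  enabled?: (ctx: GenerationContext) => boolean;
}

// minTier is taken from the text for the first two steps and assumed to be 1 elsewhere.
const GENERATION_STEPS: Step[] = [
  { name: "perception", minTier: 3 },
  { name: "metacognitive reflection", minTier: 2 },
  { name: "stimulus selection", minTier: 1 },
  { name: "LLM evolution call", minTier: 1 },
  { name: "response parsing", minTier: 1 },
  { name: "fitness scoring", minTier: 1 },
  { name: "memory writes", minTier: 1 },
  { name: "tool action processing", minTier: 1 },
  { name: "post-mortem analysis", minTier: 1, enabled: (ctx) => ctx.runPostMortem },
  { name: "genome persistence", minTier: 1 },
  {
    name: "capability assessment",
    minTier: 1,
    enabled: (ctx) => ctx.generation % ctx.assessEvery === 0,
  },
];

// Names of the steps that would run for this generation, in order.
function stepsFor(ctx: GenerationContext): string[] {
  return GENERATION_STEPS
    .filter((s) => ctx.tier >= s.minTier && (s.enabled ? s.enabled(ctx) : true))
    .map((s) => s.name);
}
```

Under these assumptions a Tier 1 agient would skip the first two steps entirely, while a Tier 3 agent on an assessment generation with a post-mortem would run all eleven.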
How fitness is scored
Fitness is stored as a float in the range 0.0–1.0 in evolution_logs.fitness_score. It is a composite score over four dimensions: coherence, adaptation quality, novelty, and depth.
Known limitation: fitness currently depends heavily on LLM interpretation of narrative quality. Independent evaluation and outcome-based scoring are on the roadmap.
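A composite over the four dimensions could be computed as a weighted average, as in the sketch below. The equal weights, the clamping, and all names (FitnessComponents, compositeFitness) are assumptions for illustration; this page does not specify how the components are actually combined, and in the current system the component values themselves come from LLM judgment.

```typescript
interface FitnessComponents {
  coherence: number;
  adaptationQuality: number;
  novelty: number;
  depth: number;
}

// Assumed weighting: all four dimensions count equally.
const EQUAL_WEIGHTS: FitnessComponents = {
  coherence: 0.25,
  adaptationQuality: 0.25,
  novelty: 0.25,
  depth: 0.25,
};

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Weighted average of the clamped components, itself clamped to [0, 1].
function compositeFitness(
  c: FitnessComponents,
  w: FitnessComponents = EQUAL_WEIGHTS
): number {
  const keys: (keyof FitnessComponents)[] = [
    "coherence",
    "adaptationQuality",
    "novelty",
    "depth",
  ];
  let total = 0;
  let weightSum = 0;
  for (const k of keys) {
    total += w[k] * clamp01(c[k]);
    weightSum += w[k];
  }
  if (weightSum <= 0) throw new Error("weights must sum to a positive value");
  return clamp01(total / weightSum);
}
```

Dividing by the weight sum keeps the result in the stored 0.0–1.0 range even if the weights are later changed and no longer sum to 1.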
The tier system
| Tier | Label | Capabilities |
|---|---|---|
| 1 | Foundation | Basic genome evolution, external stimuli, system challenges |
| 2 | Emergence | Metacognition, The Hearth (social learning), self-generated challenges |
| 3 | Autonomy | Real-world perception feeds, research hypotheses, genome extension proposals, propagation detection |
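The table can also be expressed as a lookup. The sketch below assumes tiers are cumulative, so that a Tier 3 agent keeps every Tier 1 and Tier 2 capability; the "Tier 2+" wording elsewhere on this page suggests this but the table does not state it. The names TIERS and capabilitiesUpTo are hypothetical.

```typescript
type Tier = 1 | 2 | 3;

interface TierInfo {
  label: string;
  capabilities: string[];
}

const TIERS: Record<Tier, TierInfo> = {
  1: {
    label: "Foundation",
    capabilities: ["basic genome evolution", "external stimuli", "system challenges"],
  },
  2: {
    label: "Emergence",
    capabilities: ["metacognition", "The Hearth (social learning)", "self-generated challenges"],
  },
  3: {
    label: "Autonomy",
    capabilities: [
      "real-world perception feeds",
      "research hypotheses",
      "genome extension proposals",
      "propagation detection",
    ],
  },
};

// Assumption: capabilities accumulate, so a tier includes everything below it.
function capabilitiesUpTo(tier: Tier): string[] {
  const order: Tier[] = [1, 2, 3];
  const result: string[] = [];
  for (const t of order) {
    if (t > tier) break;
    for (const cap of TIERS[t].capabilities) {
      result.push(cap);
    }
  }
  return result;
}
```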
Experimental conditions
- cold — blank genome, no prior context.
- warm — researcher-guided stimuli and intervention points.
- lineage — initialized from a LoRA adapter trained on a predecessor's history.
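The three conditions map naturally onto a discriminated union, which forces each run to declare exactly the inputs its condition needs. The field names (guidedStimuli, adapterPath, loraAdapter) and the InitialState shape are assumptions for this sketch, as is the choice to start warm and lineage runs from an empty genome object.

```typescript
// Hypothetical types; the real run configuration is not shown on this page.
type Condition =
  | { kind: "cold" }
  | { kind: "warm"; guidedStimuli: string[] }
  | { kind: "lineage"; adapterPath: string };

interface InitialState {
  genome: Record<string, unknown>;
  guidedStimuli: string[];
  loraAdapter: string | null;
}

function initialize(condition: Condition): InitialState {
  switch (condition.kind) {
    case "cold":
      // Blank genome, no prior context.
      return { genome: {}, guidedStimuli: [], loraAdapter: null };
    case "warm":
      // Researcher-guided stimuli are supplied up front (copied defensively).
      return { genome: {}, guidedStimuli: condition.guidedStimuli.slice(), loraAdapter: null };
    case "lineage":
      // Starts from an adapter trained on a predecessor's history.
      return { genome: {}, guidedStimuli: [], loraAdapter: condition.adapterPath };
  }
}
```

Because the switch covers every member of the union, the compiler will flag this function if a fourth condition is added without handling it.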
Limitations and caveats
- Non-reproducibility across runs (sampling, temperature, world events).
- Fitness subjectivity until independent evaluators land.
- Some tools are still simulated (flavor text) while real sandboxes roll out.
- Single-model batch today; multi-model comparison is planned.
- Memory decay and consolidation are documented but not yet fully implemented in the database.
- One-person research operation — proof-of-concept scale, not a funded lab.