Technical companion

Methodology

This page documents how the simulation works, what is actually measured, and where the current boundaries of the system are. It is written for researchers, grant reviewers, and developers.

Simulation design overview

Abiogenics is a continuous evolutionary loop running on a Node.js / TypeScript worker process connected to a Supabase (Postgres) database and one or more LLM providers via API.

Two independent processes run simultaneously:

Evolution worker — drives generations, calls LLMs, manages memory, proposes tool actions, writes results to the database.
Action runner — polls the action queue, enforces policy, and executes approved tools inside isolated Docker containers.

The two processes communicate only through the shared database.
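The hand-off between the two processes can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the shared action queue; the names ActionQueue, QueuedAction, pollOnce, and the status values are hypothetical and are not taken from the Abiogenics schema. The policy check is reduced to a single callback, whereas the real runner may involve manual approval.

```typescript
// Hypothetical shapes: the real table and column names are not given in the source.
type ActionStatus = "pending" | "approved" | "rejected" | "done";

interface QueuedAction {
  id: number;
  tool: string;
  status: ActionStatus;
}

// In-memory stand-in for the shared database table both processes use.
class ActionQueue {
  private rows: QueuedAction[] = [];
  private nextId = 1;

  // Evolution worker side: propose a tool action.
  propose(tool: string): QueuedAction {
    const row: QueuedAction = { id: this.nextId++, tool, status: "pending" };
    this.rows.push(row);
    return row;
  }

  // Action runner side: fetch the oldest pending action, if any.
  nextPending(): QueuedAction | undefined {
    return this.rows.find((r) => r.status === "pending");
  }
}

// One polling step of the action runner: apply policy, then execute if allowed.
function pollOnce(
  queue: ActionQueue,
  isAllowed: (tool: string) => boolean,
  execute: (action: QueuedAction) => void,
): ActionStatus | "idle" {
  const action = queue.nextPending();
  if (!action) return "idle";
  if (!isAllowed(action.tool)) {
    action.status = "rejected";
    return action.status;
  }
  action.status = "approved";
  execute(action); // the real runner would start an isolated Docker container here
  action.status = "done";
  return action.status;
}
```

Because the worker only calls propose and the runner only calls pollOnce, neither side needs a direct reference to the other, which mirrors the database-only coupling described above.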

Current batch: 120 runs × 40 cycles = 4,800 target successful cycles · Model: google/gemini-2.5-flash · Auto-approval disabled for sandboxed actions.

How a generation works

Each generation runs perception (Tier 3), metacognitive reflection (Tier 2+), stimulus selection, an LLM evolution call, response parsing, fitness scoring, memory writes, tool action processing, optional post-mortem analysis, genome persistence, and periodic capability assessment.
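As a rough sketch, the step list above can be written as data with a minimum tier per step. Only the two tier gates stated in the text (perception at Tier 3, metacognitive reflection at Tier 2+) come from the source; every other minTier value is assumed to be 1, the execution order is assumed to match the listing order, and the names Step, GENERATION_STEPS, and stepsForTier are illustrative.

```typescript
type Tier = 1 | 2 | 3;

interface Step {
  name: string;
  minTier: Tier; // lowest tier at which the step runs
}

// Listing order follows the description above. minTier = 1 is an assumption
// for every step except perception and metacognitive reflection.
const GENERATION_STEPS: Step[] = [
  { name: "perception", minTier: 3 },
  { name: "metacognitive reflection", minTier: 2 },
  { name: "stimulus selection", minTier: 1 },
  { name: "LLM evolution call", minTier: 1 },
  { name: "response parsing", minTier: 1 },
  { name: "fitness scoring", minTier: 1 },
  { name: "memory writes", minTier: 1 },
  { name: "tool action processing", minTier: 1 },
  { name: "post-mortem analysis", minTier: 1 }, // optional in the real loop
  { name: "genome persistence", minTier: 1 },
  { name: "capability assessment", minTier: 1 }, // periodic in the real loop
];

// Return the names of the steps that would run for an agent at the given tier.
function stepsForTier(tier: Tier): string[] {
  return GENERATION_STEPS.filter((s) => tier >= s.minTier).map((s) => s.name);
}
```

Under these assumptions a Tier 1 agent skips the first two steps and starts at stimulus selection.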

How fitness is scored

Fitness is stored as a float (0.0–1.0) in evolution_logs.fitness_score. It is a composite score evaluated across four dimensions: coherence, adaptation quality, novelty, and depth.
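One way to picture such a composite is a weighted average of the four components, clamped to the stored range. This is a sketch under stated assumptions, not the scoring code itself: the equal weights, the per-component 0 to 1 range, and the names FitnessComponents, WEIGHTS, and compositeFitness are all hypothetical.

```typescript
interface FitnessComponents {
  coherence: number;
  adaptationQuality: number;
  novelty: number;
  depth: number;
}

// Assumed equal weights; the source does not state the real weighting.
const WEIGHTS: FitnessComponents = {
  coherence: 0.25,
  adaptationQuality: 0.25,
  novelty: 0.25,
  depth: 0.25,
};

// Keep a value inside the 0.0 to 1.0 range used by fitness_score.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Weighted average of the clamped components, clamped again for safety.
function compositeFitness(c: FitnessComponents): number {
  const score =
    WEIGHTS.coherence * clamp01(c.coherence) +
    WEIGHTS.adaptationQuality * clamp01(c.adaptationQuality) +
    WEIGHTS.novelty * clamp01(c.novelty) +
    WEIGHTS.depth * clamp01(c.depth);
  return clamp01(score);
}
```

Clamping each component first means a single out-of-range value returned by an evaluator cannot push the composite outside the stored range.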

Known limitation: fitness currently depends heavily on LLM interpretation of narrative quality. Independent evaluation and outcome-based scoring are on the roadmap.

The tier system

Tier	Label	Capabilities
1	Foundation	Basic genome evolution, external stimuli, system challenges
2	Emergence	Metacognition, The Hearth (social learning), self-generated challenges
3	Autonomy	Real-world perception feeds, research hypotheses, genome extension proposals, propagation detection
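The table can also be expressed as a lookup. The sketch below assumes tiers are cumulative, so a Tier 3 agent keeps Tier 1 and Tier 2 capabilities; the "Tier 2+" wording elsewhere on this page suggests this, but it is not stated outright. TierSpec, TIERS, and capabilitiesAt are illustrative names.

```typescript
interface TierSpec {
  level: 1 | 2 | 3;
  label: string;
  capabilities: string[];
}

// Labels and capabilities copied from the table above.
const TIERS: TierSpec[] = [
  {
    level: 1,
    label: "Foundation",
    capabilities: ["Basic genome evolution", "external stimuli", "system challenges"],
  },
  {
    level: 2,
    label: "Emergence",
    capabilities: ["Metacognition", "The Hearth (social learning)", "self-generated challenges"],
  },
  {
    level: 3,
    label: "Autonomy",
    capabilities: [
      "Real-world perception feeds",
      "research hypotheses",
      "genome extension proposals",
      "propagation detection",
    ],
  },
];

// Assumption: each tier inherits everything from the tiers below it.
function capabilitiesAt(level: 1 | 2 | 3): string[] {
  return TIERS.filter((t) => t.level <= level).reduce<string[]>(
    (acc, t) => acc.concat(t.capabilities),
    [],
  );
}
```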

Experimental conditions

cold — blank genome, no prior context.
warm — researcher-guided stimuli and intervention points.
lineage — initialized from a LoRA adapter trained on a predecessor's history.
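For readers who think in types, the three conditions could be modeled as a discriminated union. This is only an illustrative sketch: the field names guidedStimuli and adapterId are hypothetical, and the source does not say how conditions are represented in the database or in code.

```typescript
// Hypothetical run-condition type; the field names are assumptions.
type RunCondition =
  | { kind: "cold" }
  | { kind: "warm"; guidedStimuli: string[] }
  | { kind: "lineage"; adapterId: string };

// Produce a one-line summary of how a run under this condition starts.
function describeCondition(c: RunCondition): string {
  switch (c.kind) {
    case "cold":
      return "blank genome, no prior context";
    case "warm":
      return "researcher-guided run with " + c.guidedStimuli.length + " stimuli";
    case "lineage":
      return "initialized from LoRA adapter " + c.adapterId;
  }
}
```

Making the starting state explicit in the type means a lineage run cannot be constructed without naming its predecessor's adapter.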

Limitations and caveats

Runs are not reproducible: results vary with sampling, temperature, and world events.
Fitness scoring remains subjective until independent evaluators are in place.
Some tools are still simulated (flavor text) while real sandboxes roll out.
The current batch uses a single model; multi-model comparison is planned.
Memory decay and consolidation are documented but not yet fully implemented in the database.
One-person research operation — proof-of-concept scale, not a funded lab.
