Abiogenics research

Technical companion

Methodology

This page documents how the simulation works, what is actually measured, and where the current boundaries of the system are. It is written for researchers, grant reviewers, and developers.

Simulation design overview

Abiogenics is a continuous evolutionary loop running on a Node.js / TypeScript worker process connected to a Supabase (Postgres) database and one or more LLM providers via API.

Two independent processes run simultaneously:

  1. Evolution worker — drives generations, calls LLMs, manages memory, proposes tool actions, writes results to the database.
  2. Action runner — polls the action queue, enforces policy, and executes approved tools inside isolated Docker containers.

The two processes communicate only through the shared database.
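The action runner's poll-and-enforce step could be sketched roughly as below. This is an in-memory illustration only: the type names, the status values, the `allowedTools` policy field, and the `pollOnce` function are all assumptions made for this example, not the project's actual schema or code, and the Docker execution is reduced to a status change.

```typescript
// Hypothetical shapes for illustration; none of these names come from the Abiogenics codebase.
type ActionStatus = "proposed" | "approved" | "rejected" | "executed";

interface QueuedAction {
  id: number;
  tool: string;
  sandboxed: boolean;
  status: ActionStatus;
}

interface Policy {
  allowedTools: readonly string[];
  autoApproveSandboxed: boolean; // the current batch runs with this set to false
}

type Verdict = "execute" | "reject" | "wait";

// Decide what the runner may do with one pending action.
function decide(action: QueuedAction, policy: Policy): Verdict {
  if (!policy.allowedTools.includes(action.tool)) return "reject";
  if (action.status === "approved") return "execute";
  if (action.sandboxed && policy.autoApproveSandboxed) return "execute";
  return "wait"; // stays queued until it is approved
}

// One polling pass: returns the queue with updated statuses.
// A real runner would launch an isolated container where this marks "executed".
function pollOnce(queue: readonly QueuedAction[], policy: Policy): QueuedAction[] {
  return queue.map((action): QueuedAction => {
    const pending = action.status === "proposed" || action.status === "approved";
    if (!pending) return action;
    const verdict = decide(action, policy);
    if (verdict === "execute") return { ...action, status: "executed" };
    if (verdict === "reject") return { ...action, status: "rejected" };
    return action;
  });
}
```

With `autoApproveSandboxed` set to false, as in the current batch, a proposed sandboxed action stays in the queue until it is explicitly approved. Keeping the decision in a pure function like `decide` makes the policy testable without a database or a container runtime.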

Current batch: 120 runs × 40 cycles = 4,800 target successful cycles · Model: google/gemini-2.5-flash · Auto-approval disabled for sandboxed actions.

How a generation works

Each generation runs perception (Tier 3), metacognitive reflection (Tier 2+), stimulus selection, an LLM evolution call, response parsing, fitness scoring, memory writes, tool action processing, optional post-mortem analysis, genome persistence, and periodic capability assessment.
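The step sequence above can be pictured as a tier-gated pipeline. The sketch below only plans which steps would run; it assumes the steps execute in the order listed, that post-mortem analysis is toggled by a flag, and that capability assessment runs every N generations. The names `GENERATION_STEPS`, `plannedSteps`, `postMortem`, and `assessEvery` are hypothetical and chosen for this example.

```typescript
type Tier = 1 | 2 | 3;
type Schedule = "always" | "optional" | "periodic";

interface Step {
  name: string;
  minTier: Tier;      // lowest tier at which the step runs
  schedule: Schedule; // whether the step runs every generation
}

// Step names and tier gates follow the description in the text; the rest is assumed.
const GENERATION_STEPS: readonly Step[] = [
  { name: "perception", minTier: 3, schedule: "always" },
  { name: "metacognitive reflection", minTier: 2, schedule: "always" },
  { name: "stimulus selection", minTier: 1, schedule: "always" },
  { name: "LLM evolution call", minTier: 1, schedule: "always" },
  { name: "response parsing", minTier: 1, schedule: "always" },
  { name: "fitness scoring", minTier: 1, schedule: "always" },
  { name: "memory writes", minTier: 1, schedule: "always" },
  { name: "tool action processing", minTier: 1, schedule: "always" },
  { name: "post-mortem analysis", minTier: 1, schedule: "optional" },
  { name: "genome persistence", minTier: 1, schedule: "always" },
  { name: "capability assessment", minTier: 1, schedule: "periodic" },
];

interface RunOptions {
  postMortem: boolean; // assumed trigger for the optional step
  assessEvery: number; // assumed period, in generations, for capability assessment
}

// List the steps that would run for a given tier and generation number.
function plannedSteps(tier: Tier, generation: number, opts: RunOptions): string[] {
  return GENERATION_STEPS.filter((step) => {
    if (tier < step.minTier) return false;
    if (step.schedule === "optional") return opts.postMortem;
    if (step.schedule === "periodic") return generation % opts.assessEvery === 0;
    return true;
  }).map((step) => step.name);
}
```

For example, `plannedSteps(1, 1, { postMortem: false, assessEvery: 10 })` starts at stimulus selection, because perception and metacognitive reflection are gated to higher tiers.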

How fitness is scored

Fitness is stored as a float (0.0–1.0) in evolution_logs.fitness_score. It is a composite score over four dimensions: coherence, adaptation quality, novelty, and depth.
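One way such a composite could be computed is a weighted mean of the four dimensions, clamped to the stored 0.0–1.0 range. The equal weights and the function name `compositeFitness` are assumptions for this sketch; the page does not state how the dimensions are actually combined.

```typescript
interface FitnessDimensions {
  coherence: number;
  adaptationQuality: number;
  novelty: number;
  depth: number;
}

// Assumed weighting: the source does not specify one, so equal weights are used here.
const EQUAL_WEIGHTS: FitnessDimensions = {
  coherence: 0.25,
  adaptationQuality: 0.25,
  novelty: 0.25,
  depth: 0.25,
};

// Clamp a score into [0, 1]; non-finite values (NaN, Infinity) count as 0.
function clamp01(x: number): number {
  if (!Number.isFinite(x)) return 0;
  return Math.min(1, Math.max(0, x));
}

// Weighted mean of the clamped dimension scores, itself clamped to [0, 1].
function compositeFitness(
  scores: FitnessDimensions,
  weights: FitnessDimensions = EQUAL_WEIGHTS,
): number {
  const keys: (keyof FitnessDimensions)[] = [
    "coherence",
    "adaptationQuality",
    "novelty",
    "depth",
  ];
  let weighted = 0;
  let totalWeight = 0;
  for (const key of keys) {
    weighted += weights[key] * clamp01(scores[key]);
    totalWeight += weights[key];
  }
  return totalWeight > 0 ? clamp01(weighted / totalWeight) : 0;
}
```

Clamping each dimension before combining keeps a single malformed LLM-derived score from pushing the stored value outside the column's expected range.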

Known limitation: fitness currently depends heavily on LLM interpretation of narrative quality. Independent evaluation and outcome-based scoring are on the roadmap.

The tier system

Tier  Label       Capabilities
1     Foundation  Basic genome evolution, external stimuli, system challenges
2     Emergence   Metacognition, The Hearth (social learning), self-generated challenges
3     Autonomy    Real-world perception feeds, research hypotheses, genome extension proposals, propagation detection
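The tier table maps naturally onto a lookup structure. The sketch below assumes tiers are cumulative, so that a Tier 3 organism keeps its Tier 1 and Tier 2 capabilities; the "Tier 2+" wording used elsewhere on this page suggests this, but it is not stated outright. The names `TIERS` and `capabilitiesUpTo` are hypothetical.

```typescript
type Tier = 1 | 2 | 3;

interface TierInfo {
  label: string;
  capabilities: readonly string[];
}

// Labels and capabilities are copied from the tier table.
const TIERS: Record<Tier, TierInfo> = {
  1: {
    label: "Foundation",
    capabilities: ["basic genome evolution", "external stimuli", "system challenges"],
  },
  2: {
    label: "Emergence",
    capabilities: ["metacognition", "The Hearth (social learning)", "self-generated challenges"],
  },
  3: {
    label: "Autonomy",
    capabilities: [
      "real-world perception feeds",
      "research hypotheses",
      "genome extension proposals",
      "propagation detection",
    ],
  },
};

// Assumption: each tier inherits everything from the tiers below it.
function capabilitiesUpTo(tier: Tier): string[] {
  const order: Tier[] = [1, 2, 3];
  const result: string[] = [];
  for (const t of order) {
    if (t <= tier) result.push(...TIERS[t].capabilities);
  }
  return result;
}
```

If tiers turn out not to be cumulative, the same table still works with a direct `TIERS[tier].capabilities` lookup.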

Experimental conditions

  • cold — blank genome, no prior context.
  • warm — researcher-guided stimuli and intervention points.
  • lineage — initialized from a LoRA adapter trained on a predecessor's history.
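The three conditions could be modelled as a discriminated union, so that each run carries only the fields its condition needs. The field names (`interventionPoints`, `predecessorRunId`, `adapterPath`) and the `InitPlan` shape are hypothetical, and the starting genome for warm runs is marked as unspecified because this page does not describe it.

```typescript
// Hypothetical run configuration; only the three condition names come from the text.
type Condition =
  | { kind: "cold" }
  | { kind: "warm"; interventionPoints: number[] }
  | { kind: "lineage"; predecessorRunId: string; adapterPath: string };

interface InitPlan {
  genomeSource: "blank" | "lora-adapter" | "unspecified";
  priorContext: boolean;
  researcherGuided: boolean;
}

// Translate a condition into the way a run would be initialized.
function initPlan(condition: Condition): InitPlan {
  switch (condition.kind) {
    case "cold":
      // Blank genome, no prior context.
      return { genomeSource: "blank", priorContext: false, researcherGuided: false };
    case "warm":
      // Researcher-guided stimuli; the starting genome is not described in the text.
      return { genomeSource: "unspecified", priorContext: false, researcherGuided: true };
    case "lineage":
      // Initialized from an adapter trained on a predecessor's history.
      return { genomeSource: "lora-adapter", priorContext: true, researcherGuided: false };
  }
}
```

Because the switch covers every `kind`, the compiler flags any condition added later that lacks an initialization rule.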

Limitations and caveats

  • Non-reproducibility across runs (sampling, temperature, world events).
  • Fitness subjectivity until independent evaluators land.
  • Some tools are still simulated (flavor text) while real sandboxes roll out.
  • Single-model batch today; multi-model comparison is planned.
  • Memory decay / consolidation documented but not fully implemented in DB.
  • One-person research operation — proof-of-concept scale, not a funded lab.