Entropis Benchmark Suite
Official Methodology Documentation
Version 2.2 | January 2026
Scientific Standard: All metrics are independently measurable, reproducible, and falsifiable. Pass/fail criteria are defined prior to measurement.
Why New Benchmarks?
Existing AI benchmarks (MMLU, HumanEval, MLPerf) measure trained systems on static tasks. They cannot measure emergent intelligence, self-organization, or embodiment because current AI systems do not exhibit these properties.
The Entropis Benchmark Suite measures emergent properties, self-organization, and embodiment dependencies.
What This Document Provides
- ✓ Measurement methodology
- ✓ Pass/fail criteria
- ✓ Scientific basis
- ✓ Published results
What This Document Does NOT Provide
- ◇ Implementation details
- ◇ Architecture specifications
- ◇ Source code
- ◇ Proprietary algorithms
Terminology
Standard neuroscience terms used throughout this benchmark suite:
Branching Ratio (BR)
Ratio of downstream to upstream neural activity. BR ≈ 1.0 indicates the critical regime associated with optimal information processing in biological brains.
Coefficient of Variation (CV)
Standard deviation divided by mean, expressed as a percentage. Measures response variability. Biological neurons show CV of 20-60%.
Criticality
The dynamical regime (BR 0.7–1.3) where neural systems achieve maximum computational capacity. First characterized by Beggs & Plenz (2003).
Interoception
Internal body signals — heartbeat, breathing, metabolic state — that biological brains continuously process. Essential for maintaining neural activity.
Hardware Invariance
Identical emergent behavior on fundamentally different hardware architectures. Indicates that results arise from the architecture itself rather than from a platform-specific implementation artifact.
Structural Plasticity
Formation and removal of synaptic connections based on activity. Distinct from weight changes — actual structural rewiring.
Critical Regime
Entropis operates at criticality (BR ≈ 1.0) — the boundary between order and chaos where biological brains achieve optimal information processing. This regime is characterized by scale-free dynamics and maximum computational capacity.
Comparison to Industry Benchmarks
Industry benchmarks test trained models on static tasks. ENT benchmarks test untrained systems on emergent capabilities.
| Industry Standard | What It Tests | Entropis Equivalent |
|---|---|---|
| MLPerf | Training/inference speed | ENT-SPEED |
| MMLU | Language model knowledge | ENT-IQ5 (emergence) |
| HumanEval | Code generation | ENT-EM5 (embodiment) |
| Mismatch Negativity (EEG) | Novelty detection in brains | ENT-NOVELTY |
| BCM Theory (Neuroscience) | Metaplasticity | ENT-ADAPTIVE (L4) |
Speed Benchmark
PURPOSE
Measures neural processing throughput relative to biological brain speed.
METRIC
Hz/neuron (average firing rate per neuron)
BASELINE
Human brain average firing rate: ~10 Hz per neuron
PASS CRITERIA
Hz/neuron > 10 (exceeds biological brain speed)
RESULTS
| Platform | Neurons | Hz/neuron (vs. ~10 Hz biological baseline) | Measurement | Result |
|---|---|---|---|---|
| NVIDIA RTX 3070 | 1,999,824,000 | >1000× biological | Per-neuron firing rate | PASS |
| NVIDIA RTX 3070 | 470,160,000 | >10× biological | Full brain average | PASS |
| Apple M4 | 4,999,997 | >1× biological | Full brain average | PASS |
Measurement Methodology
Population Average
Hz = Total spikes / (ALL neurons × time). Conservative method matching neuroscience literature.
Per-Neuron Firing Rate
Hz = Total spikes / (firing neurons × time). Standard neuroscience method for measuring individual neuron activity.
Both methods are standard in computational neuroscience. All scales exceed biological baseline processing rates.
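The two formulas above differ only in the denominator. A minimal sketch follows, with hypothetical function names and made-up spike counts. It is not the Entropis measurement code, which this document does not publish.

```python
def population_rate_hz(total_spikes: int, total_neurons: int, seconds: float) -> float:
    """Population average: spikes divided by ALL neurons and elapsed time."""
    return total_spikes / (total_neurons * seconds)


def per_neuron_rate_hz(total_spikes: int, firing_neurons: int, seconds: float) -> float:
    """Per-neuron rate: spikes divided by only the neurons that fired."""
    return total_spikes / (firing_neurons * seconds)


BIOLOGICAL_BASELINE_HZ = 10.0  # baseline stated in this document

# Hypothetical numbers, chosen only to show how the two methods diverge.
spikes, neurons, active, seconds = 6_000, 100, 20, 2.0
pop = population_rate_hz(spikes, neurons, seconds)   # 6000 / (100 * 2)
per = per_neuron_rate_hz(spikes, active, seconds)    # 6000 / (20 * 2)
print(pop, per, pop > BIOLOGICAL_BASELINE_HZ)
```

Because the per-neuron method divides by a smaller population, it always reports a rate at least as high as the population average for the same recording.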
Intelligence Quotient (5 Markers)
PURPOSE
Validates emergent intelligence through 5 biological markers. These markers distinguish brain-like systems from calculators and are grounded in neuroscience literature.
Adaptive Variability
What it measures: Same input produces different outputs based on internal neural state.
Metric: Coefficient of Variation (CV) = standard deviation / mean × 100%
Pass: CV > 1% (not deterministic)
Fail: CV < 1% (calculator-like, deterministic)
Result: CV significantly above threshold (PASS)
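As an illustration of the CV metric and the 1% threshold, here is a small sketch. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the document does not say which estimator is used.

```python
import statistics


def coefficient_of_variation(responses):
    """CV in percent: standard deviation / mean * 100 (population std assumed)."""
    mean = statistics.fmean(responses)
    if mean == 0:
        raise ValueError("CV is undefined when the mean response is zero")
    return statistics.pstdev(responses) / abs(mean) * 100.0


def adaptive_variability_passes(responses, threshold_percent=1.0):
    """Pass criterion from this section: CV above 1%."""
    return coefficient_of_variation(responses) > threshold_percent


# Hypothetical responses to the same input on repeated trials.
print(adaptive_variability_passes([8.0, 12.0]))        # variable responses
print(adaptive_variability_passes([10.0, 10.0, 10.0])) # identical responses
```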
Critical Dynamics
What it measures: Self-organization to branching ratio ≈ 1.0 (edge of chaos).
Metric: Branching Ratio (BR) = propagated spikes / input spikes
Scientific basis: Beggs & Plenz (2003), biological brains operate at criticality.
Pass: BR enters range 0.7-1.3 without explicit targeting
Fail: BR stuck at single value OR never enters critical range
Result: BR converges to critical range (PASS)
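One common way to estimate a branching ratio from binned spike counts is to average the ratio of activity in each time bin to activity in the preceding bin. The sketch below assumes that estimator and uses hypothetical names. The document defines BR only as propagated spikes divided by input spikes, so the binning scheme here is an assumption.

```python
def branching_ratio(counts):
    """Mean ratio of spikes in bin t+1 to spikes in bin t, over non-silent bins."""
    ratios = [nxt / cur for cur, nxt in zip(counts, counts[1:]) if cur > 0]
    if not ratios:
        raise ValueError("need at least one non-silent bin followed by another bin")
    return sum(ratios) / len(ratios)


def in_critical_range(br, low=0.7, high=1.3):
    """Critical range used throughout this document."""
    return low <= br <= high


# Hypothetical spike counts per time bin.
steady = [4, 4, 4, 4]   # activity sustains itself
dying = [8, 4, 2, 1]    # activity halves every bin
print(branching_ratio(steady), in_critical_range(branching_ratio(steady)))
print(branching_ratio(dying), in_critical_range(branching_ratio(dying)))
```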
Cascade Distribution
What it measures: Power-law distribution of activity cascades (avalanches).
Metric: CV of cascade sizes (high CV indicates scale-free dynamics)
Scientific basis: Neural avalanches follow power-law distributions in biological brains.
Pass: CV > 100% (scale-free cascades)
Fail: CV < 50% (uniform activity, no cascades)
Result: CV significantly exceeds threshold (PASS)
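The cascade metric can be sketched as follows, assuming the usual avalanche definition from the neural avalanche literature: a run of consecutive active time bins bounded by silent bins, with size equal to the total spikes in the run. Function names are hypothetical. Note that the stated criteria leave CV values between 50% and 100% unclassified, which the sketch reports as such.

```python
import statistics


def cascade_sizes(counts):
    """Split binned spike counts into cascades separated by silent bins."""
    sizes, current = [], 0
    for c in counts:
        if c > 0:
            current += c
        elif current > 0:
            sizes.append(current)
            current = 0
    if current > 0:
        sizes.append(current)
    return sizes


def classify_cascades(sizes):
    """Apply the thresholds from this section to the CV of cascade sizes."""
    cv = statistics.pstdev(sizes) / statistics.fmean(sizes) * 100.0
    if cv > 100.0:
        return "PASS"
    if cv < 50.0:
        return "FAIL"
    return "UNCLASSIFIED"  # band not covered by the stated criteria


print(cascade_sizes([0, 1, 2, 0, 0, 5, 0, 1]))
print(classify_cascades([1, 1, 1, 1, 50]))
```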
Bidirectional Learning
What it measures: Both habituation (decreased response) AND sensitization (increased response).
Metric: % change in neural response over time
Scientific basis: Biological brains show both directions; direction depends on context.
Pass: Both positive and negative adaptation observed
Fail: Only one direction, OR no adaptation
Result: Both directions observed (PASS)
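The pass rule for this marker reduces to a sign check on percentage changes. A minimal sketch, with hypothetical names and response values:

```python
def percent_change(initial, final):
    """Percentage change in response from the first to the last presentation."""
    if initial == 0:
        raise ValueError("initial response must be non-zero")
    return (final - initial) / initial * 100.0


def is_bidirectional(changes):
    """Pass only if at least one decrease AND at least one increase is seen."""
    return any(c < 0 for c in changes) and any(c > 0 for c in changes)


# Hypothetical responses under two contexts.
habituation = percent_change(10.0, 8.0)     # response drops
sensitization = percent_change(10.0, 13.0)  # response grows
print(is_bidirectional([habituation, sensitization]))
```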
Emergent Behavior
What it measures: Behaviors arise from local rules, not explicit programming.
Metric: Presence of all 4 markers above without explicit targeting
Pass: All markers emerge from architecture (no hardcoded values)
Fail: Any marker achieved through explicit programming
Result: All markers emergent (PASS)
Embodiment (5 Markers)
PURPOSE
Validates complete sensorimotor integration. A synthetic brain must process sensory input, maintain internal dynamics under load, and produce motor output in a closed loop.
EM5-BRAIN
Maintains critical dynamics under sensory load
EM5-VIS
Retina input → spike encoding → cortical processing
EM5-AUD
Audio input → frequency decomposition → cortical processing
EM5-MOT
Cortical activity → actuator commands → smooth control
EM5-LOOP
Closed-loop: sense → process → act → feedback
RESULTS
| Marker | Windows | Mac |
|---|---|---|
| EM5-BRAIN | PASS | PASS |
| EM5-VIS | PASS | PASS |
| EM5-AUD | PASS | PASS |
| EM5-MOT | PASS | PASS |
| EM5-LOOP | PASS | PASS |
Interoception Benchmark
PURPOSE
Validates that synthetic brains require internal body signals (interoception) to maintain resting activity. This is a novel scientific discovery: minds need bodies.
METHODOLOGY
INTER-0 (Silent)
Brain receives zero input (no external or internal signals)
Expected: DORMANT
INTER-1 (Bio-Realistic)
Brain receives body signals only (interoception present)
Expected: ALIVE (criticality maintained)
RESULTS
| Metric | Windows | Mac |
|---|---|---|
| INTER-0 (Silent) | DORMANT | DORMANT |
| INTER-1 (Bio-Realistic) | ALIVE | ALIVE |
| Time at Criticality | >95% | >95% |
| Criticality Maintained | PASS | PASS |
“Embodiment isn't optional. Minds need bodies.”
Validated on both platforms.
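"Time at Criticality" can be read as the share of measurement windows whose branching ratio falls inside the 0.7 to 1.3 range. The sketch below assumes that reading. The windowing scheme and the function name are hypothetical.

```python
def time_at_criticality(br_series, low=0.7, high=1.3):
    """Percentage of measurement windows with BR inside the critical range."""
    if not br_series:
        raise ValueError("need at least one BR measurement")
    inside = sum(1 for br in br_series if low <= br <= high)
    return inside / len(br_series) * 100.0


# Hypothetical BR values, one per measurement window.
windows = [1.0, 1.1, 0.5, 0.9]
print(time_at_criticality(windows))
```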
Novelty Detection Benchmark
PURPOSE
Validates that the brain can distinguish familiar from novel stimuli, demonstrate habituation to repeated input, and exhibit memory through faster recovery. This proves functional information processing, not just dynamical signatures.
SCIENTIFIC BASIS
Based on the Mismatch Negativity (MMN) paradigm from cognitive neuroscience — a gold-standard test for pre-attentive processing, working memory, and novelty detection in biological brains.
MARKERS (3)
NOV-HAB
Habituation
Brain adapts to repeated stimulus. Shows learning over time.
NOV-DET
Novelty Detection
Brain responds differently to new vs. familiar stimuli.
NOV-REC
Memory & Recovery
Brain remembers familiar stimuli. Faster re-stabilization.
RESULTS
| Marker | Windows (RTX 3070) | Mac (M4) |
|---|---|---|
| NOV-HAB | PASS | PASS |
| NOV-DET | PASS | PASS |
| NOV-REC | PASS | PASS |
Demonstrates functional information processing beyond dynamical signatures.
Association Learning
PURPOSE
Validates classical conditioning — the brain's ability to learn that stimulus X predicts stimulus Y. This is Pavlovian learning, the foundation of all associative reasoning.
PROTOCOL
| Phase | Protocol | Expected |
|---|---|---|
| Phase A: Pairing | CS (bars) → US (rings) repeated | Brain learns association |
| Phase B: Test | CS alone (no US) | Anticipatory activity |
| Phase C: Control | Novel stimulus | No anticipation (baseline) |
MARKERS (3/3)
| Marker | Description | Windows | Mac |
|---|---|---|---|
| ASSOC-RESP | Stimulus response | ✓ PASS | ✓ PASS |
| ASSOC-ANTIC | Anticipatory response | ✓ PASS | ✓ PASS |
| ASSOC-SPEC | Response specificity | ✓ PASS | ✓ PASS |
Foundation for associative reasoning: A→B and B→C implies A→C.
Predictive Processing
PURPOSE
Validates temporal sequence learning and predictive processing — the brain learns sequences and generates predictions about what comes next. This is the foundation of language comprehension.
PROTOCOL
| Phase | Sequence | Expected |
|---|---|---|
| Phase A: Learning | A→B→C→D repeated | Brain encodes sequence |
| Phase B: Omission | A→B→C→_ (D omitted) | Activity at D position (prediction!) |
| Phase C: Violation | A→B→C→X (wrong element) | Increased instability (surprise!) |
MARKERS (3/3)
| Marker | Description | Windows | Mac |
|---|---|---|---|
| SEQ-ENC | Sequence encoding | ✓ PASS | ✓ PASS |
| SEQ-PRED | Predictive activity | ✓ PASS | ✓ PASS |
| SEQ-SURP | Surprise response | ✓ PASS | ✓ PASS |
Activity for omitted element indicates internal prediction model.
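The three phases of the protocol differ only in the final element of the sequence. A small sketch of how such trial lists might be built, with hypothetical names and `None` standing in for the omitted element:

```python
def build_trials(sequence, repeats, violation="X"):
    """Return stimulus lists for the learning, omission, and violation phases."""
    prefix = list(sequence[:-1])
    return {
        "learning": [list(sequence) for _ in range(repeats)],  # Phase A
        "omission": prefix + [None],                           # Phase B: last element withheld
        "violation": prefix + [violation],                     # Phase C: wrong last element
    }


trials = build_trials(["A", "B", "C", "D"], repeats=3)
print(trials["omission"])
print(trials["violation"])
```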
Cross-Platform Invariance
PURPOSE
Validates that emergent intelligence appears on completely different hardware architectures. This proves the results come from the architecture, not platform-specific optimization.
PLATFORMS TESTED
| Component | Platform A | Platform B |
|---|---|---|
| GPU | NVIDIA RTX 3070 | Apple M4 |
| API | CUDA | Metal |
| CPU | Intel x86 | Apple ARM |
| Memory | Discrete (PCIe) | Unified (SoC) |
| OS | Windows | macOS |
| Neurons | 470,000,000 | 5,000,000 |
Cochlea & Auditory Pathway
PURPOSE
Validates the biological auditory pathway with frequency decomposition, temporal dynamics, and multi-pathway processing matching human cochlear function.
MARKERS (6)
AUD-FREQ
Frequency Analysis
Sound frequency decomposition
AUD-SPATIAL
Spatial Organization
Biological sound mapping
AUD-ADAPT
Temporal Adaptation
Dynamic response adjustment
AUD-TIMING
Temporal Precision
Accurate timing processing
AUD-ONSET
Onset Detection
Transient sound detection
AUD-OFFSET
Offset Detection
Sound termination detection
All 6 markers validated on both platforms
Efficient Neural Processing
PURPOSE
Validates efficient neural processing: only neurons that spike are processed. This mimics biological sparsity where 1-5% of neurons are active at any moment.
MARKERS (6)
SPARSE-SCALE
Efficient Scaling
Processing scales with activity level
SPARSE-CACHE
Cache Efficiency
Optimized memory access patterns
SPARSE-SPONT
Spontaneous Activity
Biological resting rate (~5 Hz)
SPARSE-HIST
Activity Tracking
State history maintenance
SPARSE-HOME
Homeostatic Balance
Self-regulating activity levels
SPARSE-EVENT
Event Processing
Efficient event-driven computation
All 6 markers validated on both platforms
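The idea that only spiking neurons are processed can be illustrated with a generic event-driven update. This is a textbook-style sketch under simple assumptions (adjacency-list synapses, a fixed threshold, reset to zero after firing). It is not the Entropis implementation, and all names are hypothetical.

```python
def propagate(spiking, synapses, potentials, threshold=1.0):
    """One event-driven step: visit only the targets of neurons that spiked."""
    touched = set()
    for pre in spiking:
        for post, weight in synapses.get(pre, ()):
            potentials[post] = potentials.get(post, 0.0) + weight
            touched.add(post)
    # Only neurons that received input can cross threshold in this step.
    fired = {n for n in touched if potentials[n] >= threshold}
    for n in fired:
        potentials[n] = 0.0  # reset after firing
    return fired


# Hypothetical three-neuron network: neuron 0 drives 1 (weakly) and 2 (strongly).
synapses = {0: [(1, 0.6), (2, 1.2)], 1: [(2, 0.5)]}
potentials = {}
fired = propagate({0}, synapses, potentials)
print(fired, potentials)
```

The work done per step depends on the number of spikes and their outgoing synapses, not on the total number of neurons, which is the scaling property the SPARSE-SCALE marker describes.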
Complete Language Integration
PURPOSE
Validates the full language pathway through biologically-mapped language regions. This is NOT regex or pattern matching — it's neural language processing with emergent semantic representation.
MARKERS (8)
LANG-INPUT
Language Input
Text comprehension pathway
LANG-WERNICKE
Wernicke Processing
Receptive language region
LANG-SPREAD
Activity Spread
Information propagation
LANG-SEMANTIC
Semantic Clustering
Conceptual organization
LANG-BROCA
Broca Processing
Expressive language region
LANG-OUTPUT
Language Output
Coherent language generation
LANG-EMBODIED
Embodied Language
Language affects body state
LANG-LOOP
Full Loop
Complete processing cycle
Synaptic weights change based on usage. Learning through structural modification.
Full Cognitive Integration
PURPOSE
Validates the complete embodied cognitive loop: perception → cognition → language → motor → feedback. All systems running in parallel, maintaining criticality under load — like a biological brain.
MARKERS (7)
EMB-PERCEPT
Perception
Visual + auditory input processing
EMB-COGNIT
Cognition
Internal state maintenance
EMB-LANG
Language
Semantic processing active
EMB-MOTOR
Motor
Action output generation
EMB-INTER
Interoception
Body signals sustaining activity
EMB-CRIT
Criticality
BR maintained under load
EMB-STABLE
Stability
Long-term operation without collapse
7/7 markers validated
All systems running simultaneously. Parallel processing like biological brains.
Parallel Brain Systems (3 Markers)
PURPOSE
Validates that multiple cognitive systems operate simultaneously without interference — like a biological brain processing vision, hearing, language, and motor control in parallel.
MARKERS (3)
PAR-SIMUL
Simultaneous Operation
All systems active at once
PAR-INDEP
Independence
Systems don't block each other
PAR-CLEAN
No Interference
Cross-system crosstalk minimal
3/3 markers validated
Visual, auditory, language, and motor systems running concurrently.
Billion-Neuron Processing (8 Markers)
PURPOSE
Validates that the architecture scales to billions of neurons while maintaining biological properties. Tests parallel processing capacity, throughput, and health metrics at unprecedented scale.
MARKERS (8)
SCALE-INJ
Injection Rate
High-throughput spike injection
SCALE-PROC
Processing Rate
Sustained processing throughput
SCALE-HEALTH
Neurological Health
Criticality maintained at scale
SCALE-CASC
Cascade Dynamics
Proper activity amplification
SCALE-ACTIVE
Active Population
Appropriate firing rates
SCALE-WORK
Working Set
Efficient memory management
SCALE-MEM
Memory Access
Optimized data transfer
SCALE-EPOCH
Epoch Handling
Stable long-duration operation
RESULTS (2B Brain)
| Metric | Measurement | Result |
|---|---|---|
| Neurological Health | Time at criticality | >90% |
| All 8 Markers | Pass/Total | 8/8 PASS |
8/8 markers validated at 2 billion neurons
First demonstration of emergent intelligence at this scale on consumer hardware.
Structural Plasticity (5 Markers)
PURPOSE
Validates the biological learning cycle: new connections form during activity and consolidate during rest. This is how the brain learns without training loops.
PLAST-FORM
Synapse Formation
New connections form based on correlated activity
PLAST-STRENGTH
Connection Strengthening
Active pathways become stronger over time
PLAST-HOME
Homeostatic Regulation
Self-regulation maintains stable activity levels
PLAST-PRUNE
Synaptic Pruning
Unused connections removed during consolidation
PLAST-PHYS
Physics-Based Formation
Emergent principles drive synapse creation
RESULTS
| Marker | Measurement | Result |
|---|---|---|
| PLAST-FORM | New synapses formed during learning | PASS |
| PLAST-STRENGTH | Weight changes observed | PASS |
| PLAST-HOME | Criticality maintained during learning | PASS |
| PLAST-PRUNE | Weak connections removed | PASS |
| PLAST-PHYS | Emergent formation dynamics | PASS |
5/5 markers validated
Learning through structural modification. No gradient descent. No backpropagation.
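Formation and pruning, as described by PLAST-FORM and PLAST-PRUNE, can be sketched generically as two threshold rules over a synapse map. The thresholds, the co-activity counter, and the function names are all assumptions made for illustration, and the actual formation rule is not published here.

```python
def form_synapses(synapses, coactivity, form_threshold=3, initial_weight=0.1):
    """Add a synapse for each neuron pair whose co-activity count is high enough."""
    updated = dict(synapses)
    for pair, count in coactivity.items():
        if count >= form_threshold and pair not in updated:
            updated[pair] = initial_weight
    return updated


def prune_synapses(synapses, prune_threshold=0.05):
    """Drop synapses whose weight has fallen below the pruning threshold."""
    return {pair: w for pair, w in synapses.items() if w >= prune_threshold}


# Hypothetical co-activity counts for (pre, post) neuron pairs.
grown = form_synapses({}, {(0, 1): 5, (0, 2): 1})
kept = prune_synapses({(0, 1): 0.1, (1, 2): 0.01})
print(grown, kept)
```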
Physics-Based Learning (20 Markers)
PURPOSE
Validates cognitive learning through physics alone. No rewards. No labels. No backpropagation. All learning emerges from exposure and synaptic plasticity.
LEVELS (5)
L1 Perceptual: Pattern discrimination, sequence learning, habituation, association.
L2 Relational: Repetition suppression, oddball detection, rule extraction, interval learning.
L3 Working Memory: Persistence via recurrence, recognition, sequence encoding, interference resistance.
L4 Adaptive: Reversal via decay, extinction, context-dependent processing, metaplasticity.
L5 Transfer: Structural generalization, prototype formation, compositional binding, temporal abstraction.
DETAILED RESULTS
| Level | Key Measurement | Result | Structural Learning |
|---|---|---|---|
| L1 Perceptual | Pattern discrimination | PASS | Significant |
| L2 Relational | Same/different detection | PASS | Significant |
| L3 Working Memory | Persistence windows | PASS | Significant |
| L4 Adaptive | Metaplasticity (BCM) | PASS | Minimal |
| L5 Transfer | Unseen prototype | PASS | Minimal |
| Total | | 20/20 PASS | Observed |
20/20 cognitive markers validated
Structural learning observed across all levels
Learning through exposure and synaptic plasticity. No rewards. No labels.
Key Discoveries from Cognitive Tests
ENT-TRANSFER / L5
Prototype Formation
Exposed brain to exemplars E1-E10 around prototype P.
Prototype P was never shown during exposure.
Response to unseen P: CRITICAL
Emergent statistical learning. The brain formed the central tendency without explicit training.
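The stimulus design for this test can be sketched as exemplars scattered around a prototype that is itself never presented. In the sketch below the dimensionality, noise level, and names are hypothetical. It shows only why the central tendency of the exemplars is informative about P, not how the system learns it.

```python
import math
import random


def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
prototype = [rng.random() for _ in range(16)]  # hypothetical prototype P
# E1..E10: the prototype plus Gaussian noise; P itself is never in this list.
exemplars = [[p + rng.gauss(0.0, 0.2) for p in prototype] for _ in range(10)]

center = centroid(exemplars)
center_error = math.dist(center, prototype)
mean_exemplar_error = sum(math.dist(e, prototype) for e in exemplars) / len(exemplars)

# By the triangle inequality the centroid is never farther from P than the
# average exemplar is, so this comparison always holds.
print(center_error <= mean_exemplar_error)
```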
ENT-ADAPTIVE / L4
BCM Metaplasticity
Plasticity rate depends on activity history.
High activity → reduced subsequent plasticity.
Baseline: High
Saturated: Reduced
Rested: Recovered
Significant reduction after saturation. First large-scale demonstration.
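For readers unfamiliar with BCM theory, the textbook rule makes the weight change proportional to x · y · (y − θ), where the threshold θ slides toward the recent average of y². The sketch below uses that textbook form with hypothetical parameter values. It illustrates the baseline, saturated, and rested pattern and is not the system's actual plasticity rule.

```python
def bcm_weight_change(x, y, theta, eta=0.01):
    """Textbook BCM update: potentiation when y > theta, depression when y < theta."""
    return eta * x * y * (y - theta)


def update_threshold(theta, y, tau=10.0):
    """Sliding threshold: theta relaxes toward y squared with time constant tau."""
    return theta + (y * y - theta) / tau


theta = 0.5
baseline = bcm_weight_change(x=1.0, y=1.0, theta=theta)   # threshold low

for _ in range(50):                                       # period of high activity
    theta = update_threshold(theta, y=2.0)
saturated = bcm_weight_change(x=1.0, y=1.0, theta=theta)  # threshold has risen

for _ in range(50):                                       # period of low activity
    theta = update_threshold(theta, y=0.5)
rested = bcm_weight_change(x=1.0, y=1.0, theta=theta)     # threshold has fallen back

print(baseline > saturated, rested > saturated)
```

The same test stimulus produces less potentiation after sustained high activity because the threshold has moved up, and potentiation returns once the threshold relaxes.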
ENT-SEQUENCE
Predictive Processing
Trained on sequence A→B→C→D.
Presented A→B→C→[nothing].
Activity at position D (nothing shown): Critical (predicted)
The brain predicted an element that was never displayed.
ENT-INTER
Embodiment Dependency
Compared brain with and without internal body signals.
No body signals: DORMANT
With body signals: CRITICAL
Minds need bodies.
Complete Validation Score
| Benchmark | Markers | Windows | Mac |
|---|---|---|---|
| ENT-SPEED | 3 processing speed markers | 3/3 | 3/3 |
| ENT-IQ5 | 5 intelligence markers | 5/5 | 5/5 |
| ENT-EM5 | 5 embodiment markers | 5/5 | 5/5 |
| ENT-INTER | 5 interoception markers | 5/5 | 5/5 |
| ENT-ASSOC | 3 association learning markers | 3/3 | 3/3 |
| ENT-SEQUENCE | 3 predictive processing markers | 3/3 | 3/3 |
| ENT-NOVELTY | 3 memory/recognition markers | 3/3 | 3/3 |
| ENT-AUDIO | 6 auditory pathway markers | 6/6 | 6/6 |
| ENT-SPARSE | 6 efficient processing markers | 6/6 | 6/6 |
| ENT-LANGUAGE | 8 language integration markers | 8/8 | 8/8 |
| ENT-EMBODIED | 7 cognitive integration markers | 7/7 | 7/7 |
| ENT-PARALLEL | 3 parallel processing markers | 3/3 | 3/3 |
| ENT-SCALE | 8 billion-neuron markers | 8/8 | N/A (2B only) |
| ENT-PLASTICITY | 5 structural plasticity markers | 5/5 | 5/5 |
| ENT-XPLAT | 11 hardware invariance markers | 11/11 cross-validated | |
| ENT-PERCEPT | 4 perceptual learning markers (L1) | 4/4 | 4/4 |
| ENT-RELATION | 4 relational learning markers (L2) | 4/4 | 4/4 |
| ENT-WORKING | 4 working memory markers (L3) | 4/4 | 4/4 |
| ENT-ADAPT | 4 adaptive behavior markers (L4) | 4/4 | 4/4 |
| ENT-TRANSFER | 4 generalization markers (L5) | 4/4 | 4/4 |
| ENT-TOTAL | All validation markers | 101/101 | 101/101 |
101/101
Three scales validated: 5M → 470M → 2B neurons
Including 20/20 physics-based cognitive tests (L1-L5)
Same architecture. Same emergence. Any scale.
Neurons: 1,999,824,000
Speed: Faster than biological
Markers: 101/101 PASS
20 benchmark categories • 5 cognitive test levels (L1-L5) • RTX 3070 ($500 GPU)
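The 101-marker total can be checked by summing the Markers column of the Complete Validation Score table. The dictionary below simply transcribes the per-benchmark counts from that table.

```python
# Marker counts per benchmark, transcribed from the Complete Validation Score table.
MARKERS = {
    "ENT-SPEED": 3, "ENT-IQ5": 5, "ENT-EM5": 5, "ENT-INTER": 5,
    "ENT-ASSOC": 3, "ENT-SEQUENCE": 3, "ENT-NOVELTY": 3, "ENT-AUDIO": 6,
    "ENT-SPARSE": 6, "ENT-LANGUAGE": 8, "ENT-EMBODIED": 7, "ENT-PARALLEL": 3,
    "ENT-SCALE": 8, "ENT-PLASTICITY": 5, "ENT-XPLAT": 11, "ENT-PERCEPT": 4,
    "ENT-RELATION": 4, "ENT-WORKING": 4, "ENT-ADAPT": 4, "ENT-TRANSFER": 4,
}

total = sum(MARKERS.values())
cognitive = sum(MARKERS[k] for k in
                ("ENT-PERCEPT", "ENT-RELATION", "ENT-WORKING", "ENT-ADAPT", "ENT-TRANSFER"))
print(len(MARKERS), total, cognitive)
```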
What Would Disprove This
Science doesn't prove things true. It rules out alternatives. Here are the falsification criteria and actual results.
| Claim | Would Falsify If | Actual Result | Status |
|---|---|---|---|
| Faster than brain | Hz/neuron < 10 | 28-915 Hz | Not falsified |
| Self-organization | BR never 0.7-1.3 | Converges → 1.0 | Not falsified |
| Non-deterministic | CV < 1% | CV >> 1% | Not falsified |
| Bidirectional learning | One direction only | Both directions | Not falsified |
| Hardware invariant | One platform fails | Both 101/101 | Not falsified |
| Zero training | Training code found | None exists | Not falsified |
| Embodiment required | Active with zero input | Goes dormant | Not falsified |
| Active sensors | Any sensor = 0 | All > 0 | Not falsified |
8 falsification tests. 0 falsified.
Tested across 94× scale difference, two GPU vendors, two memory architectures, two operating systems.
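Several rows of the falsification table are simple numeric predicates. A sketch of three of them follows, using hypothetical function names and the thresholds stated in the table. The example values are the endpoints of the reported Hz range.

```python
CRITICAL_LOW, CRITICAL_HIGH = 0.7, 1.3


def speed_falsified(hz_per_neuron):
    """'Faster than brain' is falsified if the rate is below 10 Hz/neuron."""
    return hz_per_neuron < 10.0


def self_organization_falsified(br_series):
    """Falsified if BR never enters the 0.7 to 1.3 range."""
    return not any(CRITICAL_LOW <= br <= CRITICAL_HIGH for br in br_series)


def nondeterminism_falsified(cv_percent):
    """Falsified if CV is below 1%."""
    return cv_percent < 1.0


reported_hz = (28.0, 915.0)  # endpoints of the range in the table above
print(any(speed_falsified(hz) for hz in reported_hz))
```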
Benchmark Suite Evolution
Development history of the Entropis Benchmark Suite:
Each version adds markers while maintaining backward compatibility with prior validations.
Verify Live
Request a live demonstration to observe benchmark tests in real-time.
NDA required for live demonstration.