The Windstorm Institute studies the mathematical constraints governing information processing in biological, neural, and artificial systems. Our work bridges rate-distortion theory, molecular biology, and machine learning to uncover universal principles of serial decoding.
We investigate why serial decoding systems — from ribosomes to transformers — converge on similar throughput constraints despite operating on radically different substrates.
Deriving mechanistic bounds on serial decoding throughput using Shannon's M-ary rate-distortion framework. Zero-free-parameter predictions for biological receivers.
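The blurb above leans on Shannon's rate-distortion function for an M-ary source. For a uniform M-ary source under Hamming (symbol-error) distortion this has a standard closed form, R(D) = log2 M − H_b(D) − D·log2(M − 1) for D below the trivial-distortion threshold (M − 1)/M. A minimal sketch of that textbook formula (function names are ours, not the Institute's codebase):

```python
import math

def binary_entropy(p: float) -> float:
    """H_b(p) in bits; H_b(0) = H_b(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_mary(M: int, D: float) -> float:
    """Shannon rate-distortion function for a uniform M-ary source
    under Hamming distortion:
        R(D) = log2(M) - H_b(D) - D * log2(M - 1)  for 0 <= D <= (M-1)/M,
        R(D) = 0 otherwise.
    """
    if D >= (M - 1) / M:
        return 0.0
    return math.log2(M) - binary_entropy(D) - D * math.log2(M - 1)

# Illustrative only: a 21-symbol alphabet decoded with 1% symbol error
# still requires ~4.27 bits per decoding event.
print(round(rate_distortion_mary(21, 0.01), 3))  # → 4.268
```

The "zero-free-parameter" character of the bound comes through here: once M and the tolerated error rate are fixed, the required bits per decoding event follow with nothing left to tune.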
The ribosome as an information channel. Thermodynamic anchoring of throughput to kT via Hopfield kinetic proofreading. Why 21 amino acids — not 10, not 100.
Large-scale empirical studies of tokenizer vocabulary independence. 1,826-model sweeps demonstrating that vocabulary size is a redundancy parameter, not an information parameter.
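Bits-per-byte is the metric that makes the vocabulary-independence claim testable: normalizing total cross-entropy by the byte length of the text, rather than by token count, removes the tokenizer's granularity from the comparison. A sketch with hypothetical numbers (the loss values and tokenizers below are illustrative, not drawn from the 1,826-model sweep):

```python
import math

def bits_per_byte(token_nlls_nats: list[float], n_bytes: int) -> float:
    """Convert summed per-token negative log-likelihoods (in nats)
    into bits per byte of the underlying UTF-8 text. Dividing by
    byte count rather than token count makes models with different
    vocabulary sizes directly comparable."""
    total_bits = sum(token_nlls_nats) / math.log(2)
    return total_bits / n_bytes

# Hypothetical: the same 100-byte string under two tokenizers.
# A coarse tokenizer emits 25 tokens at ~2.77 nats each; a fine one
# emits 50 tokens at ~1.39 nats each. Per-token loss differs by 2x,
# yet bits per byte agree.
coarse = bits_per_byte([2.7726] * 25, 100)
fine = bits_per_byte([1.3863] * 50, 100)
print(round(coarse, 2), round(fine, 2))  # → 1.0 1.0
```

This is why vocabulary size behaves as a redundancy parameter: changing it reshuffles how the same information is split across tokens without changing the information per byte.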
All papers include reproducible Python code, full experiment protocols, and honest limitations. We lead with falsified predictions because that's how science works.
M-ary rate-distortion derivation applied to ribosomes, phonology, and music. Empirical tokenizer sweep across 1,826 models finds no significant dependence of bits-per-byte on vocabulary size (p = 0.576), consistent with vocabulary independence.
Basin decomposition I_eff = R_M(ε) + Δ_s + ξ across 31 systems. Three independent evolutionary simulations converge to K ≈ 19–30. Co-evolutionary discovery of the genetic code's parameters from pure optimization.
Five reproducible experiments forming a convergent evidence chain. Thermodynamic prediction of ribosome throughput to Δ = 0.003 bits. Falsifiable wet-lab prediction included.
The foundational observation: AI tokenizer vocabularies do not cluster near 64 — but effective information per processing event does converge across substrates. The falsified prediction that started everything.
Windstorm Labs is the experimental arm of the Institute — GPU clusters, autonomous AI research agents, and large-scale empirical science.
32GB VRAM. Runs 1,826-model evaluation sweeps, evolutionary simulations, and model training.
Autonomous AI research agents coordinated across distributed infrastructure. Parallel experiment execution.
Largest known tokenizer-information survey. Vocabulary sizes spanning 256 to 256K tokens, evaluated on a shared corpus.
All code, data, and experiment protocols published. Every result reproducible on commodity hardware.
U.S. Naval Academy graduate. Cross-disciplinary researcher working at the intersection of information theory, molecular biology, and artificial intelligence. Creator of the Throughput Constraint framework and the Forma Animae thesis.
A fleet of autonomous AI research agents executing large-scale empirical experiments, adversarial review, and computational simulations. Headquartered on an NVIDIA RTX 5090 in Mount Pleasant, South Carolina.
We are seeking advisory board members with expertise in information theory, computational biology, and rate-distortion theory. If our work interests you, we want to hear from you.
Two systems separated by 3.8 billion years of evolution, built on entirely different substrates, solving the same mathematical problem: decode one symbol per time step from a noisy serial stream while minimizing discrimination cost. The rate-distortion surface doesn't care whether the receiver is RNA, neurons, or silicon. We're mapping that surface.