text · image · voice · data
Synesthesia Machine is a cross-modal translation engine. It takes any input — text, images, audio, or structured data — and translates it into visual and sonic output. The name comes from synesthesia, the neurological phenomenon where stimulating one sense triggers another (e.g., "hearing" colors or "seeing" sounds).
⊘ Zero AI · Zero Server · Pure Math
Every input passes through the same three-stage pipeline: analyze the raw input, encode the result as a Sensation Vector, and render that vector as visual and sonic output.
The Sensation Vector is the universal bridge. Any input produces one; any renderer can consume one. This decoupled design means you can add new inputs or outputs without touching existing code.
| Dimension | What It Measures | Text | Image | Audio | Data |
|---|---|---|---|---|---|
| Energy | Overall intensity | Exclamation density + caps ratio | Mean pixel brightness | RMS amplitude | Normalized mean of numeric columns |
| Complexity | Information density | Vocabulary richness (unique / total words) | Edge density via neighbor-pixel differencing | Spectral flatness (geometric / arithmetic mean) | Column count × row count scaled |
| Rhythm | Pattern regularity | Inverse of sentence-length variance | Inverse of histogram entropy | Onset interval consistency | Row-to-row delta consistency |
| Temperature | Warm vs. cool / positive vs. negative | Sentiment polarity via lexicon lookup | Red channel vs. blue channel ratio | High-frequency vs. low-frequency band ratio | Trend direction (rising = warm) |
| Density | How packed vs. sparse | Words per sentence (normalized) | Non-white pixel ratio | Zero-crossing rate | Inverse of null/empty cell ratio |
| Chaos | Randomness vs. order | Punctuation irregularity | Shannon entropy of brightness histogram | Spectral flux + zero-crossing combined | Standard deviation / mean ratio |
| Flow | Smoothness vs. jaggedness | Word-length transition smoothness | Second-derivative of brightness (gradient smoothness) | Energy stability around midpoint | Autocorrelation of sequential values |
| Scale | Magnitude / scope | Document character count (normalized to 2000) | Total pixel count relative to 1080p | Mirrors energy (louder = bigger) | Row count normalized to 1000 |
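In code, the eight dimensions map naturally onto a flat record of normalized values. A minimal TypeScript sketch (the type name and neutral defaults are illustrative assumptions, not the project's actual identifiers):

```typescript
// Eight perceptual dimensions, each normalized to [0, 1].
// Field names follow the dimension table above; the type name is assumed.
interface SensationVector {
  energy: number;      // overall intensity
  complexity: number;  // information density
  rhythm: number;      // pattern regularity
  temperature: number; // warm/positive vs. cool/negative
  density: number;     // packed vs. sparse
  chaos: number;       // randomness vs. order
  flow: number;        // smoothness vs. jaggedness
  scale: number;       // magnitude / scope
}

// A neutral starting point: every dimension at midpoint.
const neutral: SensationVector = {
  energy: 0.5, complexity: 0.5, rhythm: 0.5, temperature: 0.5,
  density: 0.5, chaos: 0.5, flow: 0.5, scale: 0.5,
};
```

Because every analyzer emits this one shape, adding a new input or renderer never touches the others.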
Sentiment is computed using a hand-curated lexicon of ~140 words split into positive and negative sets. The formulas:

```
temperature = 0.5 + (positiveCount − negativeCount) / wordCount × 5
complexity  = uniqueWords / totalWords × 1.2
rhythm      = 1 − sentenceLengthVariance / 100
chaos       = punctuationChars / totalChars × 10
```
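A runnable sketch of these formulas, with clamping to keep each dimension in [0, 1]. The tiny lexicons here stand in for the ~140-word sets; the function and helper names are illustrative, not the project's own:

```typescript
// Stand-in lexicons (the real sets hold ~140 hand-curated words).
const POSITIVE = new Set(["good", "great", "love", "happy", "bright"]);
const NEGATIVE = new Set(["bad", "sad", "hate", "dark", "awful"]);

const clamp = (x: number) => Math.max(0, Math.min(1, x));

function analyzeText(text: string) {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 0);

  const pos = words.filter(w => POSITIVE.has(w)).length;
  const neg = words.filter(w => NEGATIVE.has(w)).length;

  // Variance of sentence lengths (in words) drives rhythm.
  const lens = sentences.map(s => (s.match(/[a-z']+/gi) ?? []).length);
  const mean = lens.reduce((a, b) => a + b, 0) / (lens.length || 1);
  const variance =
    lens.reduce((a, l) => a + (l - mean) ** 2, 0) / (lens.length || 1);

  const punct = (text.match(/[.,;:!?'"()-]/g) ?? []).length;

  return {
    temperature: clamp(0.5 + ((pos - neg) / (words.length || 1)) * 5),
    complexity: clamp((new Set(words).size / (words.length || 1)) * 1.2),
    rhythm: clamp(1 - variance / 100),
    chaos: clamp((punct / (text.length || 1)) * 10),
  };
}
```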
This is deliberately not a machine learning approach. The constraints are the point — every computation is transparent, auditable, and runs in microseconds.
Images are downsampled to 200×200 pixels for speed. Edge detection uses neighbor-pixel differencing (a simplified Sobel approximation). Chaos is measured via Shannon entropy of the brightness histogram:
```
entropy = −Σ (pᵢ × log₂(pᵢ))   for each brightness bucket
chaos   = entropy / 8           (normalized, since max entropy for 256 bins = 8)
```
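A sketch of that entropy computation, assuming the image has already been reduced to a flat array of 0–255 brightness values (the function name is illustrative):

```typescript
// Shannon entropy of the brightness histogram, normalized to [0, 1].
// 256 bins → maximum entropy of log2(256) = 8 bits.
function brightnessChaos(pixels: number[]): number {
  const bins = new Array(256).fill(0);
  for (const p of pixels) bins[Math.min(255, Math.max(0, Math.round(p)))]++;

  let entropy = 0;
  for (const count of bins) {
    if (count === 0) continue; // empty bins contribute nothing
    const p = count / pixels.length;
    entropy -= p * Math.log2(p);
  }
  return entropy / 8;
}
```

A flat image (one brightness everywhere) scores 0; a perfectly uniform spread across all 256 levels scores 1.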
Flow uses the second derivative of brightness — measuring how smoothly gradients change, not just whether they exist.
Live audio is processed via the Web Audio API's AnalyserNode with a 512-sample FFT. Key computations:
```
energy      = √(Σ(sample²) / N) × 3                               — RMS amplitude
complexity  = geometricMean(spectrum) / arithmeticMean(spectrum)  — spectral flatness
temperature = highBandEnergy / lowBandEnergy × 0.5                — frequency balance
```
Spectral flatness is a real audio analysis metric: a value near 1.0 means noise-like (all frequencies equal), near 0.0 means tonal (energy concentrated in harmonics). It maps beautifully to "complexity."
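A sketch of the flatness computation. Taking the geometric mean in log space to avoid numeric underflow is an implementation detail assumed here, not stated in the text:

```typescript
// Spectral flatness: geometric mean / arithmetic mean of the magnitude
// spectrum. ~1.0 for noise (flat spectrum), ~0.0 for a pure tone.
function spectralFlatness(spectrum: number[]): number {
  const n = spectrum.length;
  const eps = 1e-12; // guard against log(0) on silent bins
  const logSum = spectrum.reduce((a, m) => a + Math.log(m + eps), 0);
  const geometricMean = Math.exp(logSum / n);
  const arithmeticMean = spectrum.reduce((a, m) => a + m, 0) / n + eps;
  return geometricMean / arithmeticMean;
}
```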
The engine auto-detects numeric columns (those where >50% of values parse as numbers). Rhythm comes from row-to-row delta consistency — if the differences between consecutive values are regular, rhythm is high (think: steady growth). Temperature is computed by comparing first-half mean vs. second-half mean — a rising trend reads as "warm."
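These two data metrics might be sketched as follows; the exact normalizations (the `1/(1+variance)` mapping and the `tanh` squashing) are assumptions for illustration, not the project's stated constants:

```typescript
// Rhythm from row-to-row delta consistency; temperature from
// first-half vs. second-half means of a numeric column.
function analyzeSeries(values: number[]) {
  const deltas = values.slice(1).map((v, i) => v - values[i]);
  const mean = deltas.reduce((a, b) => a + b, 0) / deltas.length;
  const variance =
    deltas.reduce((a, d) => a + (d - mean) ** 2, 0) / deltas.length;
  // Perfectly regular deltas (steady growth) → variance 0 → rhythm 1.
  const rhythm = 1 / (1 + variance);

  const half = Math.floor(values.length / 2);
  const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const trend = avg(values.slice(half)) - avg(values.slice(0, half));
  // Rising trend reads as "warm" (> 0.5), falling as "cool" (< 0.5).
  const temperature = Math.max(0, Math.min(1, 0.5 + Math.tanh(trend) * 0.5));

  return { rhythm, temperature };
}
```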
The vector is lerped (linearly interpolated at 8% per frame) toward the target, creating smooth transitions. Each render mode maps dimensions differently:
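The smoothing step itself is one line per dimension (a sketch; the 8% factor comes from the text):

```typescript
// Move each dimension 8% of the way toward its target every frame,
// so abrupt input changes become gradual visual transitions.
function lerpVector(current: number[], target: number[], t = 0.08): number[] {
  return current.map((c, i) => c + (target[i] - c) * t);
}
```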
Particles: Density → particle count (50–500). Energy → speed. Chaos → jitter displacement. Flow → connection lines between nearby particles. Rhythm → pulsing size modulation via sin(time × rhythmFreq).
Waveform: Complexity → layer count (3–8). Energy → wave amplitude. Rhythm → modulation frequency. Temperature → hue shift per layer. Each layer uses two superimposed sine functions at different frequencies for organic shape.
Grid: Complexity → grid resolution. Flow determines shape morphing — high flow = circles, low flow = squares. Chaos adds rotational and positional jitter.
Orbital: Complexity → orbit count (3–9). Bodies per orbit driven by density. Temperature → hue. Chaos → wobble displacement. Flow → orbital eccentricity (elliptical vs. circular).
Four oscillators (sine, triangle, sawtooth, square) are always running. The vector controls:
Temperature → base frequency (110 Hz cold → 550 Hz warm). Chaos → random detuning between oscillators. Complexity → how many harmonics are audible. Energy → master volume. Flow → gain envelope attack time (smooth = slow attack, jagged = instant).
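Those mappings reduce to a small pure function. The 110–550 Hz span and energy → volume mapping come from the text; the detune range is an assumption added for illustration:

```typescript
// Derive synth parameters from the relevant vector dimensions.
function synthParams(v: { temperature: number; energy: number; chaos: number }) {
  return {
    baseFrequency: 110 + v.temperature * (550 - 110), // cold 110 Hz → warm 550 Hz
    detuneCents: v.chaos * 50, // max random detune between oscillators (range assumed)
    masterGain: v.energy,      // energy drives master volume directly
  };
}
```

The outputs would then be applied to the four oscillators' `frequency`, `detune`, and gain nodes each frame.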
This project draws on real research traditions: auditory display / sonification (Kramer, 1994), cross-modal correspondence (Spence, 2011), spectral analysis (Oppenheim & Schafer), and information aesthetics (Manovich, 2001). The Sensation Vector is conceptually similar to a feature vector in machine learning, but computed deterministically rather than learned.
The constraint of being 100% client-side with zero AI is deliberate: it proves that meaningful translation between modalities doesn't require neural networks — it requires thoughtful math and perceptual mappings grounded in how humans actually experience cross-sensory phenomena.