text · image · voice · data
Synesthesia Machine is a cross-modal translation engine. It takes any input — text, images, audio, or structured data — and translates it into visual and sonic output. The name comes from synesthesia, the neurological phenomenon where stimulating one sense triggers another (e.g., "hearing" colors or "seeing" sounds).
⊘ Zero AI · Zero Server · Pure Math
Every input passes through the same three-stage pipeline: analyze the raw input, encode the result as a Sensation Vector, and render that vector as visual and sonic output.
The Sensation Vector is the universal bridge. Any input produces one; any renderer can consume one. This decoupled design means you can add new inputs or outputs without touching existing code.
| Dimension | What It Measures | Text | Image | Audio | Data |
|---|---|---|---|---|---|
| Energy | Overall intensity | Exclamation density + caps ratio | Mean pixel brightness | RMS amplitude | Normalized mean of numeric columns |
| Complexity | Information density | Vocabulary richness (unique / total words) | Edge density via neighbor-pixel differencing | Spectral flatness (geometric / arithmetic mean) | Column count × row count scaled |
| Rhythm | Pattern regularity | Inverse of sentence-length variance | Inverse of histogram entropy | Onset interval consistency | Row-to-row delta consistency |
| Temperature | Warm vs. cool / positive vs. negative | Sentiment polarity via lexicon lookup | Red channel vs. blue channel ratio | High-frequency vs. low-frequency band ratio | Trend direction (rising = warm) |
| Density | How packed vs. sparse | Words per sentence (normalized) | Non-white pixel ratio | Zero-crossing rate | Inverse of null/empty cell ratio |
| Chaos | Randomness vs. order | Punctuation irregularity | Shannon entropy of brightness histogram | Spectral flux + zero-crossing combined | Standard deviation / mean ratio |
| Flow | Smoothness vs. jaggedness | Word-length transition smoothness | Second-derivative of brightness (gradient smoothness) | Energy stability around midpoint | Autocorrelation of sequential values |
| Scale | Magnitude / scope | Document character count (normalized to 2000) | Total pixel count relative to 1080p | Mirrors energy (louder = bigger) | Row count normalized to 1000 |
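In code, the eight dimensions map naturally onto a flat record of normalized values. A minimal TypeScript sketch (the type name and neutral defaults are illustrative assumptions, not the project's actual identifiers):

```typescript
// Eight perceptual dimensions, each normalized to [0, 1].
// Field names follow the dimension table above; the type name is assumed.
interface SensationVector {
  energy: number;      // overall intensity
  complexity: number;  // information density
  rhythm: number;      // pattern regularity
  temperature: number; // warm/positive vs. cool/negative
  density: number;     // packed vs. sparse
  chaos: number;       // randomness vs. order
  flow: number;        // smoothness vs. jaggedness
  scale: number;       // magnitude / scope
}

// A neutral starting point: every dimension at midpoint.
const neutral: SensationVector = {
  energy: 0.5, complexity: 0.5, rhythm: 0.5, temperature: 0.5,
  density: 0.5, chaos: 0.5, flow: 0.5, scale: 0.5,
};
```

Because every analyzer emits this one shape, adding a new input or renderer never touches the others.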
Sentiment is computed using a hand-curated lexicon of ~140 words split into positive and negative sets. The formulas:

```
temperature = 0.5 + (positiveCount − negativeCount) / wordCount × 5
complexity  = uniqueWords / totalWords × 1.2
rhythm      = 1 − sentenceLengthVariance / 100
chaos       = punctuationChars / totalChars × 10
```
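A runnable sketch of these formulas, with clamping to keep each dimension in [0, 1]. The tiny lexicons here stand in for the ~140-word sets; the function and helper names are illustrative, not the project's own:

```typescript
// Stand-in lexicons (the real sets hold ~140 hand-curated words).
const POSITIVE = new Set(["good", "great", "love", "happy", "bright"]);
const NEGATIVE = new Set(["bad", "sad", "hate", "dark", "awful"]);

const clamp = (x: number) => Math.max(0, Math.min(1, x));

function analyzeText(text: string) {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 0);

  const pos = words.filter(w => POSITIVE.has(w)).length;
  const neg = words.filter(w => NEGATIVE.has(w)).length;

  // Variance of sentence lengths (in words) drives rhythm.
  const lens = sentences.map(s => (s.match(/[a-z']+/gi) ?? []).length);
  const mean = lens.reduce((a, b) => a + b, 0) / (lens.length || 1);
  const variance =
    lens.reduce((a, l) => a + (l - mean) ** 2, 0) / (lens.length || 1);

  const punct = (text.match(/[.,;:!?'"()-]/g) ?? []).length;

  return {
    temperature: clamp(0.5 + ((pos - neg) / (words.length || 1)) * 5),
    complexity: clamp((new Set(words).size / (words.length || 1)) * 1.2),
    rhythm: clamp(1 - variance / 100),
    chaos: clamp((punct / (text.length || 1)) * 10),
  };
}
```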
This is deliberately not a machine learning approach. The constraints are the point — every computation is transparent, auditable, and runs in microseconds.
Images are downsampled to 200×200 pixels for speed. Edge detection uses neighbor-pixel differencing (a simplified Sobel approximation). Chaos is measured via Shannon entropy of the brightness histogram:
```
entropy = −Σ (pᵢ × log₂(pᵢ))   for each brightness bucket
chaos   = entropy / 8           (normalized, since max entropy for 256 bins = 8)
```
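A sketch of that entropy computation, assuming the image has already been reduced to a flat array of 0–255 brightness values (the function name is illustrative):

```typescript
// Shannon entropy of the brightness histogram, normalized to [0, 1].
// 256 bins → maximum entropy of log2(256) = 8 bits.
function brightnessChaos(pixels: number[]): number {
  const bins = new Array(256).fill(0);
  for (const p of pixels) bins[Math.min(255, Math.max(0, Math.round(p)))]++;

  let entropy = 0;
  for (const count of bins) {
    if (count === 0) continue; // empty bins contribute nothing
    const p = count / pixels.length;
    entropy -= p * Math.log2(p);
  }
  return entropy / 8;
}
```

A flat image (one brightness everywhere) scores 0; a perfectly uniform spread across all 256 levels scores 1.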
Flow uses the second derivative of brightness — measuring how smoothly gradients change, not just whether they exist.
Live audio is processed via the Web Audio API's AnalyserNode with a 512-sample FFT. Key computations:
```
energy      = √(Σ(sample²) / N) × 3                               — RMS amplitude
complexity  = geometricMean(spectrum) / arithmeticMean(spectrum)  — spectral flatness
temperature = highBandEnergy / lowBandEnergy × 0.5                — frequency balance
```
Spectral flatness is a real audio analysis metric: a value near 1.0 means noise-like (all frequencies equal), near 0.0 means tonal (energy concentrated in harmonics). It maps beautifully to "complexity."
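A sketch of the flatness computation. Taking the geometric mean in log space to avoid numeric underflow is an implementation detail assumed here, not stated in the text:

```typescript
// Spectral flatness: geometric mean / arithmetic mean of the magnitude
// spectrum. ~1.0 for noise (flat spectrum), ~0.0 for a pure tone.
function spectralFlatness(spectrum: number[]): number {
  const n = spectrum.length;
  const eps = 1e-12; // guard against log(0) on silent bins
  const logSum = spectrum.reduce((a, m) => a + Math.log(m + eps), 0);
  const geometricMean = Math.exp(logSum / n);
  const arithmeticMean = spectrum.reduce((a, m) => a + m, 0) / n + eps;
  return geometricMean / arithmeticMean;
}
```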
The engine auto-detects numeric columns (those where >50% of values parse as numbers). Rhythm comes from row-to-row delta consistency — if the differences between consecutive values are regular, rhythm is high (think: steady growth). Temperature is computed by comparing first-half mean vs. second-half mean — a rising trend reads as "warm."
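These two data metrics might be sketched as follows; the exact normalizations (the `1/(1+variance)` mapping and the `tanh` squashing) are assumptions for illustration, not the project's stated constants:

```typescript
// Rhythm from row-to-row delta consistency; temperature from
// first-half vs. second-half means of a numeric column.
function analyzeSeries(values: number[]) {
  const deltas = values.slice(1).map((v, i) => v - values[i]);
  const mean = deltas.reduce((a, b) => a + b, 0) / deltas.length;
  const variance =
    deltas.reduce((a, d) => a + (d - mean) ** 2, 0) / deltas.length;
  // Perfectly regular deltas (steady growth) → variance 0 → rhythm 1.
  const rhythm = 1 / (1 + variance);

  const half = Math.floor(values.length / 2);
  const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const trend = avg(values.slice(half)) - avg(values.slice(0, half));
  // Rising trend reads as "warm" (> 0.5), falling as "cool" (< 0.5).
  const temperature = Math.max(0, Math.min(1, 0.5 + Math.tanh(trend) * 0.5));

  return { rhythm, temperature };
}
```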
The vector is lerped (linearly interpolated at 8% per frame) toward the target, creating smooth transitions. Each render mode maps dimensions differently:
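The smoothing step itself is one line per dimension (a sketch; the 8% factor comes from the text):

```typescript
// Move each dimension 8% of the way toward its target every frame,
// so abrupt input changes become gradual visual transitions.
function lerpVector(current: number[], target: number[], t = 0.08): number[] {
  return current.map((c, i) => c + (target[i] - c) * t);
}
```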
Particles: Density → particle count (50–500). Energy → speed. Chaos → jitter displacement. Flow → connection lines between nearby particles. Rhythm → pulsing size modulation via sin(time × rhythmFreq).
Waveform: Complexity → layer count (3–8). Energy → wave amplitude. Rhythm → modulation frequency. Temperature → hue shift per layer. Each layer uses two superimposed sine functions at different frequencies for organic shape.
Grid: Complexity → grid resolution. Flow determines shape morphing — high flow = circles, low flow = squares. Chaos adds rotational and positional jitter.
Orbital: Complexity → orbit count (3–9). Bodies per orbit driven by density. Temperature → hue. Chaos → wobble displacement. Flow → orbital eccentricity (elliptical vs. circular).
Four oscillators (sine, triangle, sawtooth, square) are always running. The vector controls:
Temperature → base frequency (110 Hz cold → 550 Hz warm). Chaos → random detuning between oscillators. Complexity → how many harmonics are audible. Energy → master volume. Flow → gain envelope attack time (smooth = slow attack, jagged = instant).
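Those mappings reduce to a small pure function. The 110–550 Hz span and energy → volume mapping come from the text; the detune range is an assumption added for illustration:

```typescript
// Derive synth parameters from the relevant vector dimensions.
function synthParams(v: { temperature: number; energy: number; chaos: number }) {
  return {
    baseFrequency: 110 + v.temperature * (550 - 110), // cold 110 Hz → warm 550 Hz
    detuneCents: v.chaos * 50, // max random detune between oscillators (range assumed)
    masterGain: v.energy,      // energy drives master volume directly
  };
}
```

The outputs would then be applied to the four oscillators' `frequency`, `detune`, and gain nodes each frame.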
This project draws on real research traditions: auditory display / sonification (Kramer, 1994), cross-modal correspondence (Spence, 2011), spectral analysis (Oppenheim & Schafer), and information aesthetics (Manovich, 2001). The Sensation Vector is conceptually similar to a feature vector in machine learning, but computed deterministically rather than learned.
The constraint of being 100% client-side with zero AI is deliberate: it proves that meaningful translation between modalities doesn't require neural networks — it requires thoughtful math and perceptual mappings grounded in how humans actually experience cross-sensory phenomena.