Hall XII · Unsolved · Beinecke MS 408 · Early 15th c. · Carbon-dated 1404–1438 · Unsolved for 600 years

The Voynich Manuscript

240 vellum pages. ~38,000 words in an unknown script. Six centuries. Zero plaintext.

Catalog: Beinecke MS 408 (Yale University)
Author: Unknown
Carbon Date: 1404–1438 CE (vellum, University of Arizona, 2009)
Script: "Voynichese", ~25–30 base glyphs (EVA transcription)
Broken By: No one. Every published "solution" has been refuted.
Modern Lesson: Statistical signatures alone cannot tell cipher from constructed language from hoax

Why This Matters

The Voynich Manuscript is the cryptographic equivalent of an unconquered Everest. Carbon dating places its vellum firmly in the early 15th century. Its 240 surviving pages, illustrated with unknown plants, naked women bathing in green pools connected by tubes, astronomical charts, and what appear to be recipes, are written in a flowing, confident script that no one in 600 years has been able to read. Many of the leading code-breakers of the modern era, from William Friedman onward, have tried. None has succeeded. It is the only document in this museum where the curator must say, honestly: we do not know what this is.

📜Historical Context

The manuscript is first reported at the Prague court of Holy Roman Emperor Rudolf II (reigned 1576–1612), who is said to have paid 600 gold ducats for it, a fortune at the time. Its provenance trail then runs through the alchemist Georg Baresch, the polymath Athanasius Kircher (who failed to decipher it), and a Jesuit collection near Rome, where the rare-book dealer Wilfrid Voynich purchased it in 1912. It has carried his name ever since. Yale's Beinecke Library has held it since 1969 as MS 408.

The vellum itself was radiocarbon-dated in 2009 by the University of Arizona to 1404–1438 with 95% confidence. The iron-gall ink is consistent with the same period. So whatever the manuscript is, it was most likely made in the early 1400s, roughly five centuries before Voynich, which undercuts the modern hoax theories that he forged it himself. The illustrations show plants that botanists cannot identify, drawings of women in elaborate systems of pools and tubes, and circular astronomical diagrams with twelve segments, often labeled with what appear to be zodiacal personifications.

William Friedman — the cryptanalyst who led the U.S. break of the Japanese PURPLE cipher — spent decades on the Voynich on and off, eventually concluding (in a 1959 anagram) that it was probably an early attempt at a constructed a priori philosophical language, not a cipher. His wife Elizebeth Friedman concurred. NSA cryptanalyst Mary D'Imperio wrote the seminal 1978 study The Voynich Manuscript: An Elegant Enigma, which remains the field's reference work. Statistical analyses by Reddy & Knight (2011) found Voynichese has word-length distributions and conditional letter entropy unlike any natural language, but also unlike known historical ciphers — it sits in its own corner of the statistical space.

💡

"Whatever it is, it isn't pretending to be a cipher of Latin or German or any other plain language we know." — Mary D'Imperio, An Elegant Enigma (NSA Center for Cryptologic History, 1978; declassified 1976).

⚙️What We Actually Know

Six centuries of failure have at least produced a clear inventory of statistical facts about the script:

  • Glyph alphabet: ~25–30 base glyphs depending on whose transcription you use. The modern standard is EVA (European Voynich Alphabet) developed by René Zandbergen and Gabriel Landini in 1998.
  • Word length: mean ~5 characters; the distribution is unusually narrow and binomial-shaped — natural languages have a wider, longer-tailed distribution.
  • Conditional entropy: very low. If you know one letter, you can predict the next much more accurately than in any natural language. Voynichese is more "constrained" than even simple substitution ciphers of plaintext languages.
  • Section dialects: the manuscript has at least two scribal "languages" (Currier A and Currier B, identified 1976) with measurably different statistics, suggesting either two scribes, two source texts, or two encryption keys.
  • Word repetition: the same word often appears two or three times in a row — a pattern that is rare in natural language and absent from most known historical ciphers.
  • Page-position effects: certain glyphs appear preferentially at the start of lines or words, suggesting a structural rule (like Hebrew final letters or Arabic positional forms).
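
Two of the measurements in this list are easy to sketch in Python. The block below is a minimal illustration, not the method of any published study: `conditional_entropy` and `repeated_word_runs` are hypothetical helper names, words are assumed to be whitespace-separated, and each EVA letter is treated as one glyph. The sample string is a made-up sequence in EVA-style letters, not a quotation from the manuscript.

```python
import math
from collections import Counter


def conditional_entropy(text: str) -> float:
    """Estimate H(next glyph | current glyph) in bits from within-word glyph pairs."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in text.split():
        for current, following in zip(word, word[1:]):
            pair_counts[(current, following)] += 1
            first_counts[current] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for (current, _), n in pair_counts.items():
        # Weight log2 p(following | current) by the probability of the pair.
        entropy -= (n / total) * math.log2(n / first_counts[current])
    return entropy


def repeated_word_runs(text: str) -> int:
    """Count positions where a word is immediately followed by an identical word."""
    words = text.split()
    return sum(1 for a, b in zip(words, words[1:]) if a == b)


# Illustrative toy input only (assumption: not real manuscript text).
sample = "daiin daiin chol chol chol shedy qokedy"
print(f"conditional entropy: {conditional_entropy(sample):.2f} bits")
print(f"adjacent repeats:    {repeated_word_runs(sample)}")
```

Run on a full transcription and on a natural-language text of similar length, a lower entropy figure for the transcription would correspond to the "more constrained" behaviour described in the list. Estimates from small samples like the one here are not reliable.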

[Figure: EVA glyph frequencies (approx., from Currier A pages). Glyphs shown: o e a i y h d l k s r n t m]
Approximate EVA-glyph frequencies (Currier A pages). The distribution is roughly Zipfian, like natural language, but the joint distributions of glyph pairs are far more constrained than any known historical plaintext.

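
A chart like this one can be recomputed from any EVA transcription with a few lines of Python. This is a rough sketch under stated assumptions: `glyph_frequencies` and `text_chart` are hypothetical names, every alphabetic character is counted as one glyph (so EVA letter pairs that some transcribers read as a single glyph are counted separately), and the sample is an invented EVA-style string rather than manuscript text.

```python
from collections import Counter


def glyph_frequencies(eva_text: str) -> list[tuple[str, float]]:
    """Return (glyph, relative frequency) pairs, most frequent first."""
    counts = Counter(ch for ch in eva_text if ch.isalpha())
    total = sum(counts.values())
    return [(glyph, n / total) for glyph, n in counts.most_common()]


def text_chart(freqs: list[tuple[str, float]], width: int = 40) -> str:
    """Render the frequencies as a plain-text bar chart scaled to the top glyph."""
    if not freqs:
        return ""
    top = freqs[0][1]
    lines = []
    for glyph, share in freqs:
        bar = "#" * max(1, round(width * share / top))
        lines.append(f"{glyph} {bar} {share:.1%}")
    return "\n".join(lines)


# Illustrative toy input only (assumption: not real manuscript text).
sample = "daiin daiin chol chol chol shedy qokedy"
print(text_chart(glyph_frequencies(sample)))
```

Single-glyph counts of this kind only show the marginal distribution. The constraint on glyph pairs mentioned in the caption needs pair statistics, which this sketch does not compute.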
💀Decipherment Attempts (none successful)
William Newbold (1921)
Outcome: Refuted within years

University of Pennsylvania professor who claimed the script concealed a Latin shorthand written by Roger Bacon. John M. Manly demolished the claim in Speculum (1931) by showing that Newbold's method was so flexible it could "decode" any text into anything.

Joseph Martin Feely (1943)
Outcome: Refuted

Claimed a substitution cipher over Latin with the vowels deleted. Follow-up analysis showed the proposed mapping yielded ungrammatical Latin throughout.

William & Elizebeth Friedman (1944–1959)
Outcome: Concluded "constructed language, probably not cipher"

The Friedmans assembled a study group of WWII-era cryptanalysts. After 15 years they concluded Voynichese was most likely an early attempt at an a priori philosophical language, a category otherwise known from the 17th century, most famously from the work of John Wilkins. They published the conclusion as an anagram, whose solution was revealed only after William Friedman's death.

Stephen Bax (2014)
Outcome: Partial (roughly 10 words); widely disputed

Linguist who proposed readings for roughly 10 words, chiefly plant names, by matching glyphs to Arabic and other medieval botanical labels. Other Voynich scholars consider the matches statistically indistinguishable from chance.

Gerard Cheshire (2019)
Outcome: Refuted

Claimed the manuscript was written in "proto-Romance," a previously unknown medieval language. The University of Bristol withdrew its press release after medievalists and Romance linguists pointed out that the proposed language and its grammar are not attested anywhere.

⚠️

Why every attempt has failed: A claimed decipherment must (a) work consistently across the whole manuscript, not just cherry-picked words; (b) produce grammatically coherent text in the claimed language; (c) match the unique statistical fingerprint of Voynichese (binomial word lengths, low conditional entropy, repeated-word runs). No proposed solution has met all three. Until one does, the responsible answer is: we do not yet know.
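
Criterion (a) is the easiest of the three to put a number on. The sketch below is a crude illustration, not an accepted evaluation protocol: `decode` and `coverage` are hypothetical names, and the glyph mapping, word list, and two-word "lexicon" are all invented for the example. It compares how a proposed substitution scores on hand-picked words with how it scores on the whole word list.

```python
def decode(word: str, mapping: dict[str, str]) -> str:
    """Apply a glyph-to-letter substitution; unmapped glyphs become '?'."""
    return "".join(mapping.get(glyph, "?") for glyph in word)


def coverage(words: list[str], mapping: dict[str, str], lexicon: set[str]) -> float:
    """Fraction of words whose decoding appears in the claimed language's lexicon."""
    if not words:
        return 0.0
    hits = sum(1 for word in words if decode(word, mapping) in lexicon)
    return hits / len(words)


# Invented toy data (assumption): a "solution" tuned to two hand-picked words.
mapping = {"d": "r", "a": "o", "i": "s", "n": "a",
           "c": "s", "h": "a", "o": "l", "l": "e"}
lexicon = {"rosa", "sale"}

cherry_picked = ["dain", "chol"]
whole_text = ["dain", "chol", "shedy", "qokedy", "dain", "otol"]

print(f"hand-picked words: {coverage(cherry_picked, mapping, lexicon):.0%}")
print(f"whole word list:   {coverage(whole_text, mapping, lexicon):.0%}")
```

A high score here would still say nothing about criteria (b) and (c): isolated dictionary hits are not grammatical text, and a flexible enough mapping can produce them from almost any input.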

🎓What It Teaches Modern Cryptography
Voynichese Phenomenon | Modern Crypto Parallel
600 years undecipherable | Some plaintexts may be permanently lost: archaeology of digital media, lost private keys, dead-drop one-time-pad messages whose pads are gone
Looks like language but isn't quite | Distinguishing cipher / code / constructed language / random / hoax requires multiple statistical tests, not just one
Word repetitions and binomial length | Ciphertext indistinguishability (IND-CPA): an attacker should not be able to tell which of two chosen messages a ciphertext encrypts, so in practice modern ciphertext looks like uniform noise with no exploitable structure
Multiple "scribal dialects" (Currier A/B) | Mixed encryption modes / multiple keys in the same corpus, a forensic problem that recurs in ransomware analysis
Every published solution has been wrong | Claimed cryptanalytic breaks must be verified by independent reproduction, not press releases
🎓

Direct connection: The reason modern cryptographers prefer ciphers with provable properties (IND-CPA, IND-CCA2) over ciphers that merely "look secure" is the Voynich problem in reverse. Voynichese looks like language but probably isn't; modern ciphertext should look like noise but actually carries information. Both cases prove the same point: statistical appearance is not proof of structure (or its absence).
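
The "looks like noise" half of this comparison can be illustrated with a single statistic. The sketch below computes byte-level Shannon entropy for English text and for random bytes. It is only an illustration: `byte_entropy` is a hypothetical helper, `os.urandom` output stands in for well-encrypted ciphertext (an assumption, since no cipher is actually run), and IND-CPA itself is a game-based security definition, not a statistical test.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


english = b"we do not know what this is. " * 2000
noise = os.urandom(len(english))  # stand-in for ciphertext (assumption)
counter = bytes(range(256)) * 100  # perfectly flat histogram, trivially predictable

print(f"English text:      {byte_entropy(english):.2f} bits/byte")
print(f"Random bytes:      {byte_entropy(noise):.2f} bits/byte")
print(f"Repeating counter: {byte_entropy(counter):.2f} bits/byte")
```

The repeating counter is the cautionary case: its byte histogram is as flat as that of the random data, yet every byte is predictable from the one before. One flat statistic does not establish the absence of structure, which is why security definitions are stated in terms of what an adversary can distinguish.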

Quick Facts
Exhibit: 40 of 40
Pages: 240 surviving (of est. ~272 original)
Words: ~38,000
Glyph alphabet: ~25–30 (EVA standard)
Vellum date: 1404–1438 (carbon, 95% CI)
Held at: Beinecke Rare Book Library, Yale
Catalog: MS 408
Status: Unsolved
🎴About this Demo

The interactive demo above is not a decipherment of Voynichese — that remains unsolved. It is a visualization tool: it lets you type English and see what your text would look like rendered in EVA glyph letters (the modern romanized transcription of the script), and decode back. Use it to get a feel for the visual rhythm and length of Voynichese "words" — try a long passage and notice how the result looks Voynich-like even though it carries trivial English plaintext.
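
For readers curious how a reversible rendering of this kind can work, here is a minimal sketch. It is not the demo's actual code, which this page does not show. It assumes a plain one-to-one letter substitution, with a hypothetical table that pairs common English letters with the EVA letters listed in the frequency figure and fills the rest with the remaining Latin letters.

```python
# Hypothetical substitution table (assumption): English letters in rough
# frequency order, paired with EVA-style letters. Both strings hold 26 letters.
ENGLISH = "etaoinshrdlcumwfgypbvkjxqz"
EVA_LIKE = "oeaiyhdlksrntmcqpfgbjuvwxz"

ENCODE = str.maketrans(ENGLISH, EVA_LIKE)
DECODE = str.maketrans(EVA_LIKE, ENGLISH)


def to_eva(text: str) -> str:
    """Render lowercase English in EVA-style letters; other characters pass through."""
    return text.lower().translate(ENCODE)


def from_eva(text: str) -> str:
    """Invert to_eva by applying the reverse substitution."""
    return text.translate(DECODE)


message = "we do not know what this is"
rendered = to_eva(message)
print(rendered)
print(from_eva(rendered))
```

Because the table is a bijection, decoding always recovers the lowercased input. It also means the output keeps every statistical property of English, which is exactly the kind of simple substitution that Voynichese does not behave like.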

For images of the actual manuscript, visit the Beinecke MS 408 digitization ↗ — every page is high-resolution and freely downloadable.

📜If You Want to Try

The Voynich attracts thousands of would-be solvers a year. If you want a real shot, the entry barriers are:

  • Read D'Imperio's An Elegant Enigma (free online from NSA).
  • Use the EVA transcription, not your own glyph identification.
  • Test your hypothesis on the whole manuscript, not 10 cherry-picked words.
  • Reproduce the binomial word-length distribution and the low conditional entropy in your proposed plaintext language.
  • Submit through the voynich.nu mailing list ↗ for peer review before any press release.
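
As a starting point for the word-length check in this list, the sketch below compares an observed word-length distribution with a fitted binomial. The modelling choices are assumptions made for illustration, not an established procedure: word length minus one is modelled as Binomial(n, p), n is taken from the longest word, p is set by matching the mean, and `binomial_fit_distance` is a hypothetical name.

```python
import math
from collections import Counter


def binomial_fit_distance(words: list[str]) -> float:
    """Total variation distance between observed word lengths and a fitted binomial.

    Models (length - 1) ~ Binomial(n, p), with n = longest length - 1 and p chosen
    so the model mean equals the observed mean. Returns a value in [0, 1], where
    0 means the observed distribution matches the fitted binomial exactly.
    """
    lengths = [len(w) for w in words if w]
    if not lengths:
        raise ValueError("need at least one non-empty word")
    total = len(lengths)
    n = max(lengths) - 1
    if n == 0:
        return 0.0  # every word has length 1; nothing to fit
    p = (sum(lengths) / total - 1) / n
    observed = Counter(lengths)
    distance = 0.0
    for k in range(n + 1):
        expected = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        distance += abs(observed.get(k + 1, 0) / total - expected)
    return distance / 2


# Toy word lists (assumption: invented, chosen to make the arithmetic visible).
binomial_like = ["a", "ab", "ab", "abc"]   # lengths 1, 2, 2, 3
two_peaked = ["a", "abc"]                  # lengths 1 and 3, nothing in between
print(binomial_fit_distance(binomial_like))
print(binomial_fit_distance(two_peaked))
```

A proposed plaintext language whose word lengths sit far from the fitted binomial while the manuscript's sit close to it would be a warning sign. The reverse does not hold: a good fit on this one measure is only one of the tests a hypothesis has to pass.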