Cryptanalysis Techniques
"To know how to defend, you must first know how to attack."
Ten techniques that break almost every classical cipher in this museum, from Al-Kindi's frequency tables (850 AD) to modern hill-climbing algorithms.
10 Techniques That Break Classical Ciphers
Languages have predictable letter frequencies. In English, E=12.7%, T=9.1%, A=8.2%. Any cipher that maps one letter to one symbol preserves these frequencies. Count the symbols, compare to known frequencies, recover the key.
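The counting step can be sketched in a few lines of Python. This is a minimal illustration rather than a full solver: the function name `letter_frequencies` is hypothetical, and the sketch assumes the ciphertext uses only the letters A-Z.

```python
from collections import Counter


def letter_frequencies(text):
    """Return {letter: relative frequency}, most frequent first (A-Z only)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return {}
    counts = Counter(letters)
    return {letter: n / len(letters) for letter, n in counts.most_common()}


# Rank the ciphertext symbols; the top few are candidates for E, T, A.
ciphertext = "GHIHQG WKH HDVW ZDOO RI WKH FDVWOH"
for letter, freq in letter_frequencies(ciphertext).items():
    print(f"{letter}: {freq:.1%}")
```

In this sample the most frequent symbol is H, which suggests H stands for E. Short texts can mislead, so a ranking like this is a starting hypothesis rather than a recovered key.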
In a Vigenère cipher, the same plaintext + same key position = same ciphertext. Identical repeated strings in the ciphertext reveal probable key length. Their spacing is likely a multiple of the key length.
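A rough sketch of this search for repeats, assuming an uppercase ciphertext with spaces removed. The helper names are hypothetical. Taking the GCD of every spacing is a simplification: one accidental repeat can spoil it, so practical tools usually tally common factors instead.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeat_spacings(ciphertext, length=3):
    """Distances between successive occurrences of each repeated substring."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return [b - a for pos in positions.values() for a, b in zip(pos, pos[1:])]


def candidate_key_lengths(ciphertext, length=3):
    """Divisors (>= 2) of the GCD of all repeat spacings."""
    spacings = repeat_spacings(ciphertext, length)
    if not spacings:
        return []
    g = reduce(gcd, spacings)
    return [d for d in range(2, g + 1) if g % d == 0]


# "CRYPTOISSHORTFORCRYPTOGRAPHY" enciphered with the key ABCD.
print(candidate_key_lengths("CSASTPKVSIQUTGQUCSASTPIUAQJB"))
```

Here the repeated fragment CSASTP recurs 16 positions apart, so the candidates are 2, 4, 8, and 16. The true key length (4) is among them, and a follow-up test such as the index of coincidence can pick between candidates.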
The index of coincidence (IC) measures how closely a text's letter distribution resembles natural language. English text has an IC of ~0.066. Random text has ~0.038. A polyalphabetic cipher produces values between these, and the IC can reveal the key length without finding repeated strings.
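As a minimal sketch, the IC can be computed directly from letter counts. The second helper, `average_column_ic`, is a hypothetical name for one common way to test a guessed key length: split the text into that many columns and average their ICs. A guess near the English value of ~0.066 suggests each column is a simple shift of English.

```python
from collections import Counter


def index_of_coincidence(text):
    """IC = sum of n_i * (n_i - 1) over letters, divided by N * (N - 1)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def average_column_ic(text, key_length):
    """Mean IC of the columns formed by taking every key_length-th letter."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    columns = ["".join(letters[i::key_length]) for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length


# Try several key lengths and look for the one whose columns look like English.
ciphertext = "LXFOPVEFRNHR"
for k in range(1, 6):
    print(k, round(average_column_ic(ciphertext, k), 3))
```

The sample ciphertext is far too short to give a meaningful reading; the statistic only becomes reliable with at least a few hundred letters.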
Guess probable plaintext words, called "cribs": military messages often contain standard phrases. Enigma was broken partly because operators routinely began or ended messages with stock words such as WETTER (weather), HEIL HITLER, or ANX ("to", followed by X as a spacer). Known structure is a fatal weakness.
When some plaintext is known, the key can often be derived directly. A 2×2 Hill cipher's matrix key is recoverable from just two known plaintext-ciphertext digraph pairs by solving a system of linear equations modulo 26, provided the matrix formed by the plaintext digraphs is invertible. Against Enigma, codebreakers used weather forecasts as cribs.
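The Hill cipher case can be sketched for a 2×2 key. This example assumes the convention C = K·P (mod 26) with digraphs as column vectors and A=0 ... Z=25; other write-ups use row vectors, which transposes the result. The function names are hypothetical, and `pow(x, -1, 26)` needs Python 3.8 or later.

```python
def inverse_2x2_mod26(m):
    """Inverse of a 2x2 matrix modulo 26 (ValueError if none exists)."""
    (a, b), (c, d) = m
    det_inv = pow((a * d - b * c) % 26, -1, 26)  # modular inverse of the determinant
    return [[d * det_inv % 26, -b * det_inv % 26],
            [-c * det_inv % 26, a * det_inv % 26]]


def matmul_mod26(x, y):
    """Product of two 2x2 matrices, reduced modulo 26."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) % 26 for j in range(2)]
            for i in range(2)]


def recover_hill_key(plain_pairs, cipher_pairs):
    """Solve C = K P (mod 26), where the columns of P and C are digraphs."""
    p = [[plain_pairs[0][0], plain_pairs[1][0]],
         [plain_pairs[0][1], plain_pairs[1][1]]]
    c = [[cipher_pairs[0][0], cipher_pairs[1][0]],
         [cipher_pairs[0][1], cipher_pairs[1][1]]]
    return matmul_mod26(c, inverse_2x2_mod26(p))


# Known plaintext HELP -> ciphertext HIAT, i.e. HE -> HI and LP -> AT.
key = recover_hill_key([(7, 4), (11, 15)], [(7, 8), (0, 19)])
print(key)
```

For this pair the recovered key is [[3, 3], [2, 5]]. If the two plaintext digraphs happen to form a non-invertible matrix, the sketch raises `ValueError` and a different pair of digraphs is needed.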
Start with a random key. Decrypt. Score the result using English language statistics, such as common digrams like TH, HE, IN. Make random changes to the key. Keep improvements, discard downgrades. Repeat millions of times. Works against substitution, Playfair, transposition.
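The loop described above can be sketched for a simple substitution cipher. This is a toy under stated assumptions: the scorer just counts ten common digrams (a hypothetical stand-in; practical attacks usually score with log-probabilities of quadgrams), so it shows the shape of the search rather than reliably recovering plaintext.

```python
import random
import string

# Toy scoring table: an assumption for illustration, not a real language model.
COMMON_DIGRAMS = ("TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND")


def decrypt(ciphertext, key):
    """key[i] is the plaintext letter for ciphertext letter chr(65 + i)."""
    return ciphertext.translate(str.maketrans(string.ascii_uppercase, key))


def score(text):
    """Higher is better: number of common English digrams found."""
    return sum(text.count(d) for d in COMMON_DIGRAMS)


def hill_climb(ciphertext, steps=2000, seed=0):
    rng = random.Random(seed)
    key = list(string.ascii_uppercase)
    rng.shuffle(key)                        # start with a random key
    best = score(decrypt(ciphertext, "".join(key)))
    for _ in range(steps):
        i, j = rng.sample(range(26), 2)     # random change: swap two letters
        key[i], key[j] = key[j], key[i]
        candidate = score(decrypt(ciphertext, "".join(key)))
        if candidate > best:
            best = candidate                # keep the improvement
        else:
            key[i], key[j] = key[j], key[i]  # discard the downgrade
    return "".join(key), best


key, best = hill_climb("WKH KHDW LQ WKH QRUWK")
print(key, best, decrypt("WKH KHDW LQ WKH QRUWK", key))
```

Because only strict improvements are kept, the search can stall on a key that no single swap improves; restarting from several random keys is the usual remedy.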
Advanced optimization heuristics that explore key space more broadly than pure hill climbing. Genetic algorithms evolve populations of candidate keys. Simulated annealing occasionally accepts worse solutions to escape local optima. Given enough ciphertext, these methods can break double transposition, Playfair, and Hill ciphers, often in seconds.
Japan's Purple machine routed plaintext through banks of telephone-style stepping switches rather than rotors. The US Signals Intelligence Service had no machine to study, so they hunted statistical regularities in intercept traffic, looking for cycles in how the switches advanced. On September 20, 1940, Genevieve Grotjan spotted the alignment that revealed the wiring of the consonant bank, letting Rowlett's team build an analog replica from inference alone. The result was MAGIC, the intelligence stream that read Japanese diplomatic traffic before Pearl Harbor.
When a 250-year-old homophonic cipher resists every manual attack, treat it as a translation problem. Kevin Knight and his collaborators modeled the Copiale manuscript's symbol stream with a hidden Markov model trained on German n-grams, then applied expectation-maximization to align symbols to phonemes. After several false starts (including the wrong source language), the EM algorithm converged on German, and the Copiale Order's initiation ritual emerged. The first major historical cipher broken by computational linguistics.
John F. Byrne's Chaocipher (1918) survived 92 years because its dynamic-permutation rule was kept secret. When the family donated his papers to the National Cryptologic Museum in 2010, Moshe Rubin reconstructed the algorithm from the worked examples. George Lasry later confirmed that with sufficient ciphertext (a few hundred characters of crib), simulated annealing on the two starting alphabets recovers the key, proving the cipher is not unbreakable, only secret.
Speed comparison: A Vigenère with a 5-letter key that took weeks in the 1800s is cracked in under one second today. Monoalphabetic substitution falls in milliseconds.
Try the Techniques
Apply cryptanalysis tools to real ciphertext.
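$\mathrm{IC} = \frac{\sum_i n_i (n_i - 1)}{N (N - 1)}$, where $n_i$ is the count of the $i$-th letter and $N$ is the total number of letters.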
Letter Frequencies (gold = input, outline = English)
Frequency analysis is the oldest known cryptanalytic technique, formally described by the 9th-century Arab polymath Al-Kindi in his Manuscript on Deciphering Cryptographic Messages (c. 850 AD). It exploits the fact that monoalphabetic substitution ciphers preserve letter frequency: the cipher just relabels each letter, so a frequent letter in the plaintext stays frequent in the ciphertext. This is why later designs, starting with polyalphabetic ciphers such as Vigenère, set out to break this property: a polyalphabetic key flattens the frequency distribution, denying the analyst the very pattern Al-Kindi discovered.
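Drag a probable plaintext word ("crib") across the ciphertext. Where the XOR or subtraction produces readable text, you've found the key position. Bletchley Park attacked Enigma in a similar spirit with guessed words like WETTER and HEILHITLER, ruling out any position where a crib letter matched the ciphertext letter, because Enigma never enciphered a letter as itself.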
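For an additive cipher such as Vigenère, crib dragging by subtraction can be sketched as follows. The function name `drag_crib` is hypothetical, and the sketch assumes uppercase A-Z text with no spaces; for an XOR-based cipher the same loop applies with XOR in place of subtraction.

```python
def drag_crib(ciphertext, crib):
    """Subtract the crib at every offset; return (offset, key fragment) pairs."""
    results = []
    for offset in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[offset:offset + len(crib)]
        fragment = "".join(
            chr((ord(c) - ord(p)) % 26 + ord("A")) for c, p in zip(window, crib)
        )
        results.append((offset, fragment))
    return results


# "ATTACKATDAWN" enciphered with the key LEMON; the guessed crib is ATTACK.
for offset, fragment in drag_crib("LXFOPVEFRNHR", "ATTACK"):
    print(offset, fragment)
```

At offset 0 the fragment reads LEMONL, which looks like a word starting to repeat; the other offsets produce noise. Spotting the readable fragment is the human (or scored) step.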
If you know some plaintext and its corresponding ciphertext, you can derive the key directly. For a Vigenère cipher: Key[i] = (Cipher[i] - Plain[i]) mod 26. Enter a matched pair to recover the key.
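A minimal sketch of that formula, assuming aligned uppercase A-Z strings. The second helper, `shortest_period`, is a hypothetical addition that trims the recovered key stream down to the repeating key word.

```python
def recover_key_stream(plaintext, ciphertext):
    """Key[i] = (Cipher[i] - Plain[i]) mod 26 for aligned A-Z strings."""
    return "".join(
        chr((ord(c) - ord(p)) % 26 + ord("A")) for p, c in zip(plaintext, ciphertext)
    )


def shortest_period(stream):
    """Smallest p such that stream[i] == stream[i % p] for every i."""
    for p in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % p] for i in range(len(stream))):
            return p
    return 0  # only reached for an empty stream


stream = recover_key_stream("ATTACKATDAWN", "LXFOPVEFRNHR")
key = stream[:shortest_period(stream)]
print(stream, key)
```

For this pair the key stream is LEMONLEMONLE and the key word is LEMON. If the known plaintext is shorter than the key, only a partial key comes back.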
Watch a hill-climbing algorithm break a Caesar cipher by scoring each shift against English letter frequencies. The algorithm starts at shift 0, tests neighbors, and keeps the best score β climbing toward the correct key.
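Scoring a Caesar shift against English letter frequencies might look like the sketch below. Two assumptions are worth flagging: the score is a chi-squared statistic (lower is better), which is one common choice and not necessarily what the demo uses, and because there are only 26 keys the sketch scores every shift instead of stepping between neighbors. The frequency table holds commonly cited approximate percentages.

```python
from collections import Counter

# Approximate English letter frequencies in percent (commonly cited values).
ENGLISH_FREQ = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702, "F": 2.228,
    "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153, "K": 0.772, "L": 4.025,
    "M": 2.406, "N": 6.749, "O": 7.507, "P": 1.929, "Q": 0.095, "R": 5.987,
    "S": 6.327, "T": 9.056, "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150,
    "Y": 1.974, "Z": 0.074,
}


def shift_text(text, shift):
    """Move each A-Z letter back by `shift`; leave other characters alone."""
    return "".join(
        chr((ord(c) - 65 - shift) % 26 + 65) if "A" <= c <= "Z" else c
        for c in text.upper()
    )


def chi_squared(text):
    """Distance between the text's letter counts and English expectations."""
    letters = [c for c in text if "A" <= c <= "Z"]
    n = len(letters)
    if n == 0:
        return float("inf")
    counts = Counter(letters)
    return sum(
        (counts[letter] - n * pct / 100) ** 2 / (n * pct / 100)
        for letter, pct in ENGLISH_FREQ.items()
    )


def best_caesar_shift(ciphertext):
    """Return the shift whose decryption looks most like English."""
    return min(range(26), key=lambda s: chi_squared(shift_text(ciphertext, s)))


ciphertext = "GHIHQG WKH HDVW ZDOO RI WKH FDVWOH"
shift = best_caesar_shift(ciphertext)
print(shift, shift_text(ciphertext, shift))
```

On very short or unusual texts the lowest score may not be the true key, so it is worth inspecting the top few candidates.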
Simulated annealing improves on hill climbing by occasionally accepting worse solutions to escape local optima. Watch the temperature cool as the algorithm converges on the answer. Higher temperature = more exploration; lower = more exploitation.
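The two ingredients that set simulated annealing apart from hill climbing, the acceptance rule and the cooling schedule, can be sketched on their own. The exponential (Metropolis-style) rule and geometric cooling shown here are common choices, assumed for illustration; the parameter values are arbitrary and the function names are hypothetical.

```python
import math
import random


def acceptance_probability(delta, temperature):
    """delta = new_score - current_score, where a higher score is better."""
    if delta >= 0:
        return 1.0                           # always keep an improvement
    return math.exp(delta / temperature)     # sometimes keep a worse key


def cooling_schedule(start=10.0, end=0.1, factor=0.9):
    """Yield temperatures that shrink geometrically from start toward end."""
    temperature = start
    while temperature > end:
        yield temperature
        temperature *= factor


rng = random.Random(0)
for temperature in cooling_schedule():
    delta = -1.0  # pretend every proposed key scores one point worse
    accepted = rng.random() < acceptance_probability(delta, temperature)
    print(f"T={temperature:5.2f}  p={acceptance_probability(delta, temperature):.3f}  {accepted}")
```

At high temperature a one-point loss is accepted most of the time (exploration); as the temperature falls the same loss is almost always rejected (exploitation), and the search behaves like plain hill climbing.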
12 Famous Codebreaks in History
The moments that changed wars, toppled spies, and birthed the computer.
First documented scientific cryptanalysis. Introduced statistical analysis to codebreaking. Every cipher for the next 400 years was vulnerable.
Ended the myth of the "indecipherable cipher." Babbage kept his method secret; Kasiski published it in 1863 and received the credit.
First widely published method for breaking polyalphabetic ciphers. European diplomatic Vigenère systems collapsed.
Created the first Enigma-breaking machines. Passed their work to Britain and France just before WWII began, giving Bletchley Park a head start.
The US could read Japanese diplomatic traffic before Pearl Harbor. The diplomatic warning was there, but the military intelligence chain failed to act on it.
Showed theoretical weaknesses in DES block cipher design. Revolutionized how cryptographers design and evaluate cipher strength.
Shortened WWII by an estimated 2 to 4 years. The Bombe machine tested thousands of possible Enigma settings per minute, exploiting known plaintext cribs.
Led to the creation of Colossus, the world's first programmable electronic computer. The direct ancestor of modern computing was built to break a cipher.
Soviet operators reused one-time pad key material under wartime pressure. VENONA decoded thousands of messages and exposed Julius Rosenberg and other Soviet spies in the US.
Found linear approximations of DES S-box operations, reducing the work to break DES from 2⁵⁶ to 2⁴³. Accelerated the case for replacing DES with AES.
Broke RSA implementations by measuring how long decryption took. The math was fine; the implementation leaked secrets through time. Side-channel security became a new discipline.
Produced two different PDF files with the same SHA-1 hash. Forced the entire internet to migrate from SHA-1 to SHA-256 and SHA-3. Cryptographic hash functions are not forever.
The Big Pattern: Most famous codebreaks succeeded not from pure mathematics, but from human mistakes (reused OTP keys, predictable message headers), protocol flaws (Enigma operators sending the same message twice), and implementation errors (RSA timing leaks). The math is often the last thing that fails. This is as true today as in Caesar's time.