Cipher Corpus
100,026 verified ciphertexts for learning, testing, and benchmarking classical cryptanalysis.
Each record includes its cipher type, key, plaintext, difficulty, language, and expected attacks. All records are machine-readable and openly licensed.
Challenge Mode hides the solution so you can attempt a ciphertext-only solve. Switch to Known Answers to see the plaintext, key, and expected attacks.
Download Corpus
All records are openly licensed. Synthetic records are CC0.
For Learners & Builders
Learners & Educators
Practice identifying cipher types, applying attack methods, and measuring how text length, spacing, language, and key choice affect solvability.
- Break a Caesar cipher by brute force
- Estimate Vigenère key length by index of coincidence
- Use Kasiski examination on repeated patterns
- Compare transposition with substitution
- Measure how spacing removal changes difficulty
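As a starting point for the first two exercises, here is a minimal sketch in Python. It is not part of the corpus tooling: the function names are made up for illustration, and it assumes ciphertexts over the 26-letter Latin alphabet. The first pair of functions lists every Caesar shift so the readable candidate can be picked out. The second pair computes the index of coincidence (IoC) and averages it over columns for each trial Vigenère key length. English plaintext has an IoC of roughly 0.067, against about 0.038 for uniformly random letters, so trial lengths whose average stands out toward the higher value are likely key lengths (or multiples of one).

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(text: str, shift: int) -> str:
    """Shift each A-Z letter back by `shift`; leave other characters unchanged."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def caesar_candidates(ciphertext: str) -> list[tuple[int, str]]:
    """Return all 26 (shift, candidate plaintext) pairs for inspection."""
    return [(s, caesar_shift(ciphertext, s)) for s in range(26)]


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are equal."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_ioc_by_key_length(ciphertext: str, max_len: int = 10) -> dict[int, float]:
    """For each trial key length k, split letters into k columns and average their IoC."""
    letters = [ch for ch in ciphertext.upper() if ch in ALPHABET]
    result = {}
    for k in range(1, max_len + 1):
        columns = ["".join(letters[i::k]) for i in range(k)]
        result[k] = sum(index_of_coincidence(col) for col in columns) / k
    return result
```

For example, `caesar_candidates("KHOOR ZRUOG")[3]` is `(3, "HELLO WORLD")`. Short ciphertexts give noisy IoC estimates, which is one way to observe how text length affects solvability.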
Tool Builders & Researchers
Machine-readable known-plaintext/ciphertext pairs for testing solvers, cryptanalysis tools, and LLM evaluation workflows.
import json
with open("all.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        print(rec["id"], rec["cipher_type"])
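Building on the loading loop, a solver can be scored per cipher type. The sketch below is illustrative rather than an official harness. The `cipher_type` field appears in the snippet above, but the field names `ciphertext` and `plaintext` are assumptions about the record schema and are exposed as parameters so they can be adjusted. `score_solver` and the `solver` callable are hypothetical names.

```python
import json
from collections import defaultdict
from typing import Callable, Iterable


def score_solver(
    lines: Iterable[str],
    solver: Callable[[str], str],
    ciphertext_field: str = "ciphertext",  # assumed field name, check the schema
    plaintext_field: str = "plaintext",    # assumed field name, check the schema
) -> dict[str, float]:
    """Return exact-match accuracy per cipher_type for a ciphertext-only solver."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL input
        rec = json.loads(line)
        cipher_type = rec["cipher_type"]
        total[cipher_type] += 1
        # The solver sees only the ciphertext, never the key or plaintext.
        if solver(rec[ciphertext_field]) == rec[plaintext_field]:
            correct[cipher_type] += 1
    return {ct: correct[ct] / total[ct] for ct in total}
```

An open file handle can be passed directly as `lines`, for example `score_solver(open("all.jsonl", encoding="utf-8"), my_solver)`. Exact-match scoring is strict; a real evaluation might normalize case and spacing first.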
Do not train and evaluate on the same split when measuring model performance.
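If you need your own partition rather than a split shipped with the corpus, one common approach is to derive the split from a hash of the record id, so the assignment is stable across runs and machines. The sketch below assumes that `id` values are unique and can be converted to strings. The function name and the 20% evaluation fraction are illustrative choices, not part of the corpus.

```python
import hashlib


def assign_split(record_id: str, eval_fraction: float = 0.2) -> str:
    """Map a record id to 'train' or 'eval' deterministically via SHA-256."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "eval" if bucket < eval_fraction else "train"
```

Calling `assign_split(str(rec["id"]))` while streaming the file places each record in exactly one split, so no record is used for both training and evaluation.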
Attribution & Citation
Cipher Corpus builds on CipherBank by Li et al. (2025) — the first systematic benchmark for LLM cipher-breaking on classical ciphers. CipherBank demonstrated that even advanced models achieve only ~45% accuracy, establishing the need for comprehensive evaluation infrastructure.
Cipher Corpus extends this with 100,026 test cases across 82 algorithms (vs. CipherBank's 2,358 across 9), 55 historical records with verified provenance, 9 languages, noisy variants, and blind benchmark splits.
@misc{lester2026cipherCorpus,
title={Cipher Corpus: Comprehensive Classical Cryptanalysis Benchmark},
author={Lester, Paul},
year={2026},
url={https://ciphermuseum.com/cipher-corpus.html},
note={100,026 test cases across 82 cipher algorithms}
}