Known-Ciphertext Benchmark Library

Cipher Corpus

100,026 verified ciphertexts for learning, testing, and benchmarking classical cryptanalysis.

Each record includes cipher type, key, plaintext, difficulty, language, and expected attacks — fully machine-readable and openly licensed. Browse in Challenge Mode to attempt a solve, or switch to Known Answers to study the metadata.

82 cipher types · 84 engines · 9 languages · 55 historical records · Beginner → Expert tiers · Noisy & multilingual variants · JSONL / CSV / JSON · LLM 3-shot eval export

Challenge Mode hides the solution — attempt a ciphertext-only solve. Switch to Known Answers to see plaintext, key, and expected attacks.

Open Dataset

Download Corpus

All records are openly licensed. Synthetic records are CC0.

Complete Dataset

By Difficulty

Specialty Sets

Working With the Corpus

For Learners & Builders

Learners & Educators

Practice identifying cipher types, applying attack methods, and measuring how text length, spacing, language, and key choice affect solvability.

  • Break a Caesar cipher by brute force
  • Estimate Vigenère key length by index of coincidence
  • Use Kasiski examination on repeated patterns
  • Compare transposition with substitution
  • Measure how spacing removal changes difficulty
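As a starting point for the first two exercises, here is a minimal Python sketch. It assumes text where only the letters A-Z are enciphered, and the function names (`shift_text`, `caesar_candidates`, `index_of_coincidence`, `rank_key_lengths`) are illustrative and not part of any corpus tooling. Ranking key lengths by average column index of coincidence is one common heuristic; multiples of the true key length tend to score well too, so treat the ranking as a shortlist and not as an answer.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

def shift_text(text, shift):
    """Shift every A-Z letter back by `shift` positions; keep other characters."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

def caesar_candidates(ciphertext):
    """Brute force: return all 26 (shift, candidate plaintext) pairs."""
    return [(s, shift_text(ciphertext, s)) for s in range(26)]

def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are equal."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def average_column_ic(ciphertext, key_len):
    """Split the letters into `key_len` columns and average their IC values."""
    letters = [ch for ch in ciphertext.upper() if ch in ALPHABET]
    columns = ["".join(letters[i::key_len]) for i in range(key_len)]
    return sum(index_of_coincidence(col) for col in columns) / key_len

def rank_key_lengths(ciphertext, max_len=12):
    """Rank candidate key lengths by average column IC, highest first."""
    scores = [(k, average_column_ic(ciphertext, k)) for k in range(1, max_len + 1)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For a Caesar ciphertext, scan the 26 outputs of `caesar_candidates` by eye or with a word list. For a suspected Vigenère ciphertext, look at the first few entries of `rank_key_lengths` and test each candidate length.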

Tool Builders & Researchers

Machine-readable known-plaintext/ciphertext pairs for testing solvers, cryptanalysis tools, and LLM evaluation workflows.

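import json

# Stream the JSONL export: one JSON record per line.
with open("all.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        rec = json.loads(line)
        print(rec["id"], rec["cipher_type"])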
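To test a solver against the records, one option is an exact-match accuracy loop like the sketch below. Only `id` and `cipher_type` appear in the snippet above. The `ciphertext` and `plaintext` field names, the `normalize` comparison rule, and the `solver` callable are assumptions made for illustration, so check them against the actual export before relying on the numbers.

```python
import json
from collections import defaultdict

def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def normalize(text):
    """Assumed scoring rule: compare uppercase letters and digits only."""
    return "".join(ch for ch in text.upper() if ch.isalnum())

def score_solver(records, solver):
    """Return {cipher_type: (correct, total)} for a solver(ciphertext) -> str."""
    tally = defaultdict(lambda: [0, 0])
    for rec in records:
        guess = solver(rec["ciphertext"])  # assumed field name
        hit = normalize(guess) == normalize(rec["plaintext"])  # assumed field name
        tally[rec["cipher_type"]][0] += int(hit)
        tally[rec["cipher_type"]][1] += 1
    return {ctype: tuple(counts) for ctype, counts in tally.items()}
```

Reporting results per cipher type, as here, makes it easier to see which algorithms a solver handles and which it does not.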

Do not train and evaluate on the same split when measuring model performance.
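One way to keep splits disjoint is to assign each record to a split deterministically from its `id`, as sketched below. This is an illustrative workflow and not the corpus's own blind-split mechanism; the function name `assign_split` and the default 20% evaluation share are assumptions.

```python
import hashlib

def assign_split(record_id, eval_fraction=0.2):
    """Map a record id to "train" or "eval" using a stable hash of the id."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # stable value in 0..9999
    return "eval" if bucket < eval_fraction * 10_000 else "train"
```

Because the assignment depends only on the `id`, re-running the script or reordering the file never moves a record from one split to the other.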

Research Context

Attribution & Citation

Cipher Corpus builds on CipherBank by Li et al. (2025) — the first systematic benchmark for LLM cipher-breaking on classical ciphers. CipherBank demonstrated that even advanced models achieve only ~45% accuracy, establishing the need for comprehensive evaluation infrastructure.

Cipher Corpus extends this with 100,026 test cases across 82 algorithms (vs. CipherBank's 2,358 across 9), 55 historical records with verified provenance, 9 languages, noisy variants, and blind benchmark splits.

Cite Cipher Corpus
@misc{lester2026cipherCorpus,
  title={Cipher Corpus: Comprehensive Classical Cryptanalysis Benchmark},
  author={Lester, Paul},
  year={2026},
  url={https://ciphermuseum.com/cipher-corpus.html},
  note={100,026 test cases across 82 cipher algorithms}
}