Wiggles, part 16


Just a little ongoing story to give you something to play with until the next blog post.

RSV LZYVH LA RSV MCH OCTV XV C QNRRQV YLU CYU ZVYR MCPW RL XCWNYO UHNYWD ALH C UNAAVHVYR RCMQV LA LAANPV ZLHWVHD. YLYV LA RSV LRSVH PEDRLXVHD ZVHV QLLWNYO CR XV CD N VUOVU XJ ZCJ RL RSV DRCOV, MER N WYVZ ALH C ACPR RSCR VTVHJ LYV LA RSVX SCU MVVY DRCHNYO CR RSV RZL LA ED CYU AVCHNYO C MNO PLYAHLYRCRNLY LA DLXV WNYU. C AVZ ZVHV VXMCHHCDDVU ALH XV RSCR N’U MVVY VBKLDVU RL RSND RJKV LA MVSCTNLH (XJ EDNYO VYOQNDS XCHWVU XV CD C KLRVYRNCQ ALHVNOYVH) CYU RSVHV ZCD DLHR LA C PLQQVPRNTV DNOS LA HVQNVA RSCR RSV UHCXC ZCD YLZ LTVH. HVCPSNYO ULZY, N KNPWVU EK XJ OENRCH AHLX ZSVHV N’U DVR NR YVBR RL XJ DRLLQ, KHLKKVU XJDVQA LY RSV VUOV LA RSV DRLLQ CYU DCNU “DSNRDEHV DSNXCDSNRC” (“DLHHJ ALH MVNYO NXKLQNRV”). VTVHJLYV XEHXEHVU MCPW “NV, UL XCNY, UL XCNY” (NR DLEYUD QNWV “ULEOS XNYU”, MER ND RSV ICKCYVDV KHLYEYPNCRNLY LA “ULY’R XNYU.” NY LRSVH ZLHUD, “ULY’R QVR NR MLRSVH JLE.”) RSV PLYTVHDCRNLYD MVRZVVY RSV PEDRLXVHD OLR QLEUVH, CYU N RNUVU XJDVQA LTVH MJ KQCJNYO C PLEKQV CPLEDRNP PLTVHD LA DLERSVHY CQQ-DRCHD CYU VBNQV SNRD. N EYUVHDRCYU RSCR CDPCK OVRD NY JLEH OHNQQ NA JLE PLTVH LRSVH KVLKQV’D XEDNP NY RSV DRCRVD. ZV ULY’R SCTV CDPCK SVHV.

Pointy


I was trying to solve a substitution cipher, and the clue was that the word “point” was in the message. I wrote a little VBScript for rolling “point” across the text to see if something readable might pop out, but I got this, instead. It looks cool, but I’ve given up trying to crack this puzzle for the time being. Looks kind of fractal…

Thinking About Encryption, Part 19


This entry is going to be a bit more of a grab bag than usual.
Some time earlier, I mentioned that there are three reasons for encrypting text or files. Technically, there is a fourth – and that is simply to partake in a hobby along the lines of Sudoku. This is where the American Cryptogram Association comes in. The ACA is an NPO dedicated to promoting cryptography as a hobby. In this sense, the ciphers they create for each other are intended to be broken, although some are harder than others. With these ciphers, especially the ones related to Vigenere, you want to recover the key when possible, because that makes solving the cipher as a whole that much easier (and may be required to claim credit for cracking the puzzle).

Before I get too much farther, I want to introduce the Index of Coincidence (IC). William Friedman (mentioned last time) developed IC to measure how similar two separate pieces of text are, or how likely it is that a cipher was created using more than one alphabet (that is, whether it is monoalphabetic or polyalphabetic). IC is used by the ACA to determine the likelihood that a cipher may be of the Vigenere variety. As described in the wiki article, IC can also be used for determining the Vigenere key length, but the ACA introductory material doesn’t get into that.

For our purposes, IC = (the sum, over all 26 letters, of each letter’s count times that count minus 1), divided by (the total letter count times the total minus 1).

Say we have the text:
PODF VQPO B UJNF, UIFSF XFSF UISFF CFBST. B NPNNB CFBS, B QPQQB CFBS BOE B CBCZ CFBS. POF EBZ, UIF UISFF CFBST XFOU GPS B XBML JO UIF XPPET.

A = 0
B = 16
C = 7
D = 1
E = 3
F = 19
G = 1
H = 0
I = 5
J = 2
K = 0
L = 1
M = 1
N = 4
O = 6
P = 8
Q = 4
R = 0
S = 10
T = 3
U = 7
V = 1
W = 0
X = 4
Y = 0
Z = 2

For the example A = 0, B = 16, C = 7, etc. There are 105 letters total.
IC = ((0 * -1) + (16 * 15) + (7 * 6) +…) / (105 * 104)
IC = 0.0837
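
As a rough Python sketch of the calculation above (the function name index_of_coincidence is my own choice, not something from the ACA material), this counts the letters in the Caesar-shifted example and applies the same formula. Punctuation and spaces are ignored, which is an assumption that matches the 105-letter count given here.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Unnormalized IC: sum of f*(f-1) over letters, divided by N*(N-1)."""
    # Keep only the letters A-Z; spaces and punctuation are not counted.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


# The Caesar-shifted example text from this entry.
caesar_text = (
    "PODF VQPO B UJNF, UIFSF XFSF UISFF CFBST. B NPNNB CFBS, B QPQQB CFBS "
    "BOE B CBCZ CFBS. POF EBZ, UIF UISFF CFBST XFOU GPS B XBML JO UIF XPPET."
)
print(round(index_of_coincidence(caesar_text), 4))  # 0.0837
```

Because a Caesar shift only renames the letters, the same function gives the same value for the unshifted plaintext.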

Now, for:
FNPHI BFNNW WYVTU HFQNE EHHTI ERESM
ISNPC YDAOH ODRPB SDMSE NUOZU AODPK
SENUC ZVDNB HTVTU USQSE NUGIV NGICD
RWNOY UETUH KAFDF

A = 3
B = 3
C = 3
D = 7
E = 8
F = 5
G = 2
H = 7
I = 5
J = 0
K = 2
L = 0
M = 2
N = 11
O = 5
P = 4
Q = 2
R = 3
S = 7
T = 5
U = 9
V = 4
W = 3
X = 0
Y = 3
Z = 2

IC = ((3*2) + (3*2) + (3*2) + …)/(105 * 104)
IC = 0.047

The plaintext is the same for both examples, but the IC values are seriously different. The first one is for a simple Caesar shift, and the second is a Vigenere cipher. The ACA’s suggestion is that anything over 0.055 is most likely from a single alphabet (or a monoalphabet cipher), and 0.0385 is closer to random text, or what to expect for a polyalphabetic cipher. So, the Caesar shift cipher IC of 0.0837 is well above the 0.055 threshold, while the Vigenere IC of 0.047 falls below it, although it is still a bit on the high side compared to the 0.0385 for random text. But still, this illustrates the concept.

My plaintext was:
“Once upon a time, there were three bears. A momma bear, a poppa bear and a baby bear. One day, the three bears went for a walk in the woods.”
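
For anyone who wants to reproduce the two ciphertexts, here is a minimal Python sketch. It assumes the standard Vigenere rule (ciphertext letter = plaintext letter + key letter, mod 26, with A = 0) and treats the Caesar shift as a Vigenere with the one-letter key “B”. The function name vigenere_encrypt is just illustrative.

```python
def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Encrypt with a repeating key; non-letters are dropped."""
    letters = [ch for ch in plaintext.upper() if "A" <= ch <= "Z"]
    out = []
    for i, ch in enumerate(letters):
        # The key letter gives the shift for this position (A=0 ... Z=25).
        shift = ord(key[i % len(key)].upper()) - ord("A")
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)


plaintext = (
    "Once upon a time, there were three bears. A momma bear, a poppa bear "
    "and a baby bear. One day, the three bears went for a walk in the woods."
)
caesar = vigenere_encrypt(plaintext, "B")         # shift of one
vigenere = vigenere_encrypt(plaintext, "RANDOM")  # the key used in this entry
print(caesar[:13])    # PODFVQPOBUJNF
print(vigenere[:10])  # FNPHIBFNNW
```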

The point is that IC is a good starting point when trying to decide if you’re working with a Vigenere-type or not. Note that in the wiki article, IC is normalized by multiplying it by the number of letters in the alphabet (26 for uppercase English), while the ACA version doesn’t bother with normalizing.

—-

The next application for IC is to identify the key length for Vigenere-type ciphers. The idea here is that if you pick the wrong key length, the IC for individual letter groups will be closer to 0.0385 (i.e. – random text). The process is to pick a key length of 2 to start, separate the letters into groups as normal when deciphering Vigenere text, then calculate the Index separately for each group, add the Index values together and divide by the number of groups (to get Delta IC).

Key Length = 2
Group1 = FPIFNWVUFNEHIRSINCDOORBDSNOUOPSNCVNHVUSSNGVGCRNYEUKFF
Group2 = NHBNWYTHQEHTEEMSPYAHDPSMEUZADKEUZDBTTUQEUINIDWOUTHAD
IC1 = 0.058
IC2 = 0.053
Delta IC = 0.055

Then, repeat the process with progressively longer key lengths. When you reach the correct length, each of the groups will consist of simple Caesar-shifted ciphers, and the IC will maximize as a result. Stop when you think you’ve reached the largest possible key length (or, when you get to cipher length/2). (KL = key length)

KL = 2, Delta IC = 0.055
KL = 3, Delta IC = 0.059
KL = 4, Delta IC = 0.057
KL = 5, Delta IC = 0.044
KL = 6, Delta IC = 0.096 <- Max.
KL = 7, Delta IC = 0.048
KL = 8, Delta IC = 0.055
KL = 9, Delta IC = 0.050
KL = 10, Delta IC = 0.060

The theory works pretty well. My key was 6 letters long. (Note that the max Delta IC will approach 0.1.)
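
The key length search above can be sketched in Python along these lines. The helper names ic and average_ic are my own, and the 2 to 10 range simply mirrors the table; this is one way to do it, not the only one.

```python
from collections import Counter


def ic(text: str) -> float:
    """Unnormalized Index of Coincidence for a string of letters."""
    n = len(text)
    counts = Counter(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def average_ic(cipher: str, key_length: int) -> float:
    """Split the cipher into key_length groups and average their ICs."""
    groups = [cipher[i::key_length] for i in range(key_length)]
    return sum(ic(g) for g in groups) / key_length


# The Vigenere ciphertext from this entry, with the spaces removed.
CIPHER = "".join("""
FNPHI BFNNW WYVTU HFQNE EHHTI ERESM
ISNPC YDAOH ODRPB SDMSE NUOZU AODPK
SENUC ZVDNB HTVTU USQSE NUGIV NGICD
RWNOY UETUH KAFDF
""".split())

scores = {kl: average_ic(CIPHER, kl) for kl in range(2, 11)}
best = max(scores, key=scores.get)
for kl, score in scores.items():
    print(f"KL = {kl}, Delta IC = {score:.3f}")
print("Best guess:", best)
```

With a message this short the averages are noisy, so treat the winner as a suggestion to test rather than a proof.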

Finally, we can use a variation of IC to determine the Vigenere key itself, by summing the products of the individual letter frequencies for a specific group against the relative letter frequencies for English. The largest result indicates the best suggestion for the Caesar shift for that grouping. I used the percentages given in the wiki article.

The pseudocode looks like:

Load English ratios into aryRatios[]
Load Vigenere strings to aryGroup[] for keylength = 6
for each str in aryGroup[]
  count the letters of str into aryCount[0..25]
  maxChi = 0
  maxAlphaShift = 0
  for alphaShift = 0 to 25
    chi = 0
    for i = 0 to 25
      chi += aryCount[(i + alphaShift) % 26] * aryRatios[i]
    next
    if chi > maxChi then
      maxChi = chi
      maxAlphaShift = alphaShift
    end if
  next
  print maxChi
  print maxAlphaShift
next
print the maxAlphaShift letters for all groups together as the potential key

I wrote a “quick” program in VBScript, which gave me:

Group1 = FFVNIIDRSUSVVSVREF
Max Shift = R Max Chi = 1.21881

Group2 = NNTEESAPEAEDTENWTD
Max Shift = A Max Chi = 1.46381

Group3 = PNUERNOBNONNUNGNUF
Max Shift = J Max Chi = 1.24241

Group4 = HWHHEPHSUDUBUUIOH
Max Shift = D Max Chi = 1.18735

Group5 = IWFHSCODOPCHSGCYK
Max Shift = O Max Chi = 1.10962

Group6 = BYQTMYDMZKZTQIDUA
Max Shift = M Max Chi = 1.04946

Potential key: |RAJDOM|

Oh, so close. But no cigar. My keyword was “RANDOM”. The problem was that the plaintext message was too short for the letter frequencies to really approximate English, and the “J” shift for group 3 came out just SLIGHTLY bigger than for the “N” shift (chi = 1.227).

But, hey, plugging in “RA_DOM” for the key lets me pull out enough of the plaintext as to let me guess what the key should be, anyway, even if I couldn’t figure it out from the incorrect key on my own.
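
Here is a rough Python version of the shift search, for comparison with the pseudocode. The frequency table is the commonly quoted English percentages (I am assuming these are the same figures as the wiki article’s), and the names ENGLISH_PERCENT, best_shift and GROUPS are my own. The score is the sum, over the letters of a group, of the English ratio of the letter each one decrypts to.

```python
from collections import Counter

# English letter frequencies in percent, A through Z (assumed wiki values).
ENGLISH_PERCENT = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))


def best_shift(group: str):
    """Return (key letter, chi) for the Caesar shift that scores highest."""
    counts = Counter(group)
    best_letter, best_chi = "A", -1.0
    for shift in range(26):
        chi = 0.0
        for ch, n in counts.items():
            # Undo the shift to get the candidate plaintext letter.
            plain = chr((ord(ch) - ord("A") - shift) % 26 + ord("A"))
            chi += n * ENGLISH_PERCENT[plain] / 100
        if chi > best_chi:
            best_letter, best_chi = chr(shift + ord("A")), chi
    return best_letter, best_chi


# The six letter groups listed above for key length 6.
GROUPS = [
    "FFVNIIDRSUSVVSVREF", "NNTEESAPEAEDTENWTD", "PNUERNOBNONNUNGNUF",
    "HWHHEPHSUDUBUUIOH", "IWFHSCODOPCHSGCYK", "BYQTMYDMZKZTQIDUA",
]
key = "".join(best_shift(g)[0] for g in GROUPS)
print("Potential key:", key)
```

Like the VBScript run, this picks the single best shift per group, so a short group can still land on the wrong letter.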

One last topic for this entry. Autokey.
The idea behind the Autokey cipher is that the plaintext is actually part of the key for Vigenere-type ciphers. Note that the wiki article gives the method used by the ACA.

First, we take the plaintext message, and what’s called the “primer”. The key is the primer + plaintext.

Plaintext: “Once upon a time, there were three bears.”
Primer: “RANDOM”
Key: “RANDOMONCEUPONATIMETHEREWERETHREEBEARS”

The advantage of autokey is that it generates a key that’s longer than the plaintext. This offers the same security to Vigenere that running key does. Therefore, you can’t attack the key length, and the Index of Coincidence isn’t going to help you here.

The weakness of autokey is the same as for running key – it’s especially vulnerable to “probable word” attacks.

Say we have the full plaintext used above, and we use Vigenere to encrypt it with autokey with the primer “RANDOM”. The ciphertext becomes:

FNPHI BCACX CBSGH XZQAX YIKLN IVFXH
IWENS MDSBQ ODMPP TPRBT OGPNE EBRBL
EEBRP LFHAP HUIWH PXLFX HIWAF RTWGN
EJTQY ZNPHP GWBWZ

To decipher this message normally, we start with the primer.
As we decrypt each letter of the ciphertext, we add the plaintext to the end of the primer to build up the key as we go.


R + F -> O - RANDOMO
A + N -> N - RANDOMON
N + P -> C - RANDOMONC
D + H -> E - RANDOMONCE
O + I -> U - RANDOMONCEU
M + B -> P - RANDOMONCEUP
O + C -> O - RANDOMONCEUPO
N + A -> N - RANDOMONCEUPON
C + C -> A - RANDOMONCEUPONA
E + X -> T - RANDOMONCEUPONAT

To get the finished plaintext, just remove the primer from the fully assembled key and add word breaks.

“ONCE UPON A TIME THERE WERE THREE BEARS…”
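
The encrypt and decrypt steps can be sketched in Python as below. This assumes the same Vigenere arithmetic as before (cipher = plain + key, mod 26), and the function names are hypothetical. The decrypt loop extends the key with each recovered plaintext letter, just like the table above.

```python
A = ord("A")


def autokey_encrypt(plaintext: str, primer: str) -> str:
    """Key is the primer followed by the plaintext itself."""
    plain = [c for c in plaintext.upper() if "A" <= c <= "Z"]
    key = list(primer.upper()) + plain
    return "".join(
        chr((ord(p) + ord(k) - 2 * A) % 26 + A) for p, k in zip(plain, key)
    )


def autokey_decrypt(ciphertext: str, primer: str) -> str:
    """Start with the primer and grow the key as plaintext is recovered."""
    cipher = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    key = list(primer.upper())
    plain = []
    for i, c in enumerate(cipher):
        p = chr((ord(c) - ord(key[i])) % 26 + A)
        plain.append(p)
        key.append(p)  # the new plaintext letter becomes part of the key
    return "".join(plain)


cipher = autokey_encrypt("Once upon a time, there were three bea", "RANDOM")
print(cipher)                             # FNPHIBCACXCBSGHXZQAXYIKLNIVFXH
print(autokey_decrypt(cipher, "RANDOM"))  # ONCEUPONATIMETHEREWERETHREEBEA
```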

To crack autokey, pick a common English word you’d expect to see in the plaintext. Write this word over the cipher as a temporary key and try to decipher the text as normal. Then, shift this temp word by one letter to the right and repeat the process. Do this for all positions of the word (i.e. – “THERE”, “ETHER”, “RETHE”, “ERETH” and “HERET”). The reason for doing this is that we don’t know the length of the primer, and there’s no reason to expect that our temp word is going to line up perfectly with the target text in the cipher on the first try.


THERE THERE THERE THERE THERE THERE (Temp key)
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH (Cipher)
MGLQE IVELT JUOPD ESOJT FZGUJ POBQD (Result)

ETHER ETHER ETHER ETHER ETHER ETHER
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH
BUIDR XJTYG YILCQ TGJEH UPDHW ECYTQ

RETHE RETHE RETHE RETHE RETHE RETHE
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH
OJWAE KYHVT LXZZD GVXTT HEREJ RRMQD <- “THERE”

ERETH ERETH ERETH ERETH ERETH ERETH
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH
BWLOB XLWJQ YKONA TIOTQ URGSG EEBEA

HERET HERET HERET HERET HERET HERET
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH
YJYDL UYJYE VXBCO QVZWE RETHU BROTO <- “WERETH”

We have two promising options here, with “THERE” showing up in test case 3, and “WERETH” in test case 5. The first option confirms that our temp word may be in the key. The fact that “WERETH” shows up in close proximity to “THERE” indicates that we should try attacking that section by completing that part of the key with “THEREWERE”. (I added “TH” at the same time because “WERE” is going to give us “WERETHREEB”, and with the extra two letters “WERE THREE BEA” is pretty good confirmation that we’ve cracked the key.)


----- ----- ----- ---ET HEREW ERETH (Key)
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH (Cipher)
----- ----- ----- ---WE RETHR EEBEA (Plain)

What remains now is to work left to extract the beginning of the plaintext, along with the primer. We do this by plugging “THERE” into the deciphered text line ahead of “WERE”, and then check the Vigenere table to get the matching cipher text. To make things a bit easier, I’m going to flip the lines vertically.


----- ----- ---TH EREWE RETHR EEBEA (Plain)
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH (Cipher)
----- ----- ----- ---ET HEREW ERETH (Key)

Gives:


----- ----- ---TH EREWE RETHR EEBEA (Plain)
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH (Cipher)
----- ----- ---NA TIMET HEREW ERETH (Key)

Going through the motions completes the full puzzle.


ONCEU PONAT IMETH EREWE RETHR EEBEA (Plain)
FNPHI BCACX CBSGH XZQAX YIKLN IVFXH (Cipher)
RANDO MONCE UPONA TIMET HEREW ERETH (Key)

Summary:
1) The Index of Coincidence (IC) is good for determining whether a cipher uses one alphabet or more than one.
2) Normalizing IC by multiplying it by the number of letters in the alphabet isn’t necessary as long as you’re consistent.
3) IC can also be used to obtain the key length of Vigenere-type ciphers.
4) A version of IC can be used for (possibly) obtaining the shift values for Caesar shifted ciphers.
5) The Auto-key cipher is just a Vigenere variant where the plaintext is used for the key.
6) The key doesn’t cycle for Auto-key, so it is impervious to key length attacks.
7) Auto-key is very susceptible to “probable word” attacks.

Wiggles, part 15


Just a little ongoing story to give you something to play with until the next blog post.

G COKN RVAVFNVZ BIKVSY GP VPXSGKL, FPZ BJDVZ GP F SGNNSV QSJKVR. NLV XOI HFK JUDGJOKSI NRIGPX NJ ZVQGZV HLVNLVR NJ UV QJPYOKVZ JR FPXRI, FPZ LV SVFPVZ GP F UGN NJ YJQOK UVNNVR JP BV. HLVP LV RVFSGEVZ NLFN LV HFK KNFRGPX GPNJ BI QLVKN, LV NJJT F KNVA UFQT NJ NRI NJ AVVR OA FN BI YFQV. UVQFOKV G HFK NJHVRGPX FSBJKN F NLGRZ JY F BVNVR JDVR LGB (12”), LV LFZ NJ NFTV F KVQJPZ KNVA UFQT. G NJJT F KNVA YJRHFRZ, FPZ NLGK NORPVZ GPNJ F TGPZ JY YOPPI SGNNSV ZFPQV, OPNGS G XJN LGB NJ NLV ZJJR. NLFN HFK JAVP NJ SVN JON KJBV JY NLV RVVT, FPZ UI NLV NGBV BI ZFPQV AFRNPVR RVFSGEVZ LV HFKP’N HGNLGP FRB’K ZGKNFPQV JY LGK UVVR, LV YGXORVZ NLFN LV’Z UV QJPYOKVZ. FPZ, HLVP IJO’RV QJPYOKVZ, NLV UVKN ASFQV NJ UV GK NLV NRFGP KNFNGJP, UVQFOKV IJO QFP UV QJPYOKVZ JP F PGQV, QJJS AFNQL JY YSJJR NGSV, FPZ AFKK JON OPNGS NLV PVWN NRFGP FRRGDVZ. LV KAOP FRJOPZ, KNOBUSVZ FHFI F YVH KNVAK, NLVP AOTVZ FSS JDVR NLV KGZVHFST. COKN FPJNLVR YRGZFI PGXLN.

Thinking About Encryption, Part 18


Just when I think I’m out, they drag me right back in…

The Gronsfeld cipher is named after a Count Gronsfeld (I can’t find anything for specifically which Count), and is a variant of the Vigenere cipher. The only real difference is that the Gronsfeld key is a string of single digits between 0 and 9. The advantage that the key is not a human-readable word (or string of words) is completely wiped out by the fact that it only uses 10 alphabets (0-9) (compared to Vigenere’s 26 (A-Z)). You attack Gronsfeld the same way you would Vigenere, either by determining the key length, or by “key elimination” (AKA: a “probable word” approach against the plaintext).

I encountered Gronsfeld accidentally while reading a Signal Corps Bulletin story written by the famed William Frederick Friedman (1891-1969), first head of the American Signal Intelligence Service (SIS). He began the division with three “junior cryptanalysts” in April 1930 – Frank Rowlett, Abraham Sinkov, and Solomon Kullback. Although the SIS was a secret branch of the U.S. Army Signal Corps, Friedman published Edgar Allan Poe, Cryptographer in American Literature, vol. VIII, no. 3, November 1936, and it was reprinted in issue no. 97 of the Signal Corps Bulletin (for July to September 1937). He followed this up with Jules Verne as Cryptographer, in issue no. 108 (April-June 1940). (These bulletins have been declassified.)

I discovered all this as I was writing up the blog entry on famous substitution ciphers in fiction. Specifically, I was researching Jules Verne’s Journey to the Center of the Earth cipher, and had found a link mentioning that Verne had used ciphers in other works as well. In trying to follow this up, I reached Cipher Mysteries, which had a link to a section of Bulletin no. 108, with Friedman’s essay on Verne. This essay covered Journey (1864), The Giant Raft (1881) and Mathias Sandorf (1885). This got me interested in the Signal Corps bulletins, and I pretty quickly found the essay on Poe in SCB 97.

The Signal Corps Bulletins as a whole are absolutely fascinating, historically-speaking. They include technical information, training manuals, casual information, people movements, fiction, and articles showing a growing understanding of the importance of ciphers in the Corps. No. 97 also has an article on “Cipher Busting in the Seventh Corps Area” by Col. Stanley L. James, and on “Analysis Versus the Probable Word” by Howell C. Brown. No. 108 has a piece on “Transpositions” by W.C. Babcock, but that’s missing from the excerpt PDF I’ve found. The problem is being able to find more than 2-3 bulletins online that have articles on ciphers. Fortunately, I did locate the nsa.gov site for declassified papers, specifically the William Friedman Collection. This got me to Articles on Cryptography and Cryptanalysis from the Signal Corps Bulletin. This is a 300-page PDF that collects close to 30 articles, both written by Friedman and others, including the Poe and Verne articles, as well as an addendum on the Poe article that appeared in Bulletin 94.

I’ve already covered the runic cipher in Journey to the Center of the Earth, but in his article Friedman goes into much greater detail on the accuracy of Verne’s descriptions of the attack on the cipher, and whether Verne himself really understood what he was talking about (general consensus – Verne was a half-decent amateur).

In Mathias Sandorf, the cipher uses what Friedman calls a rotating grille. This is a square ruled into a grid, with a few holes punched out. If the cipher text is grouped into 6 rows of 6 letters each, the grid needs to be sized so that it covers exactly the 6×6 text. First, you place the grille over the text with the “up” (north) edge pointing up, and you write out the letters that appear in the holes, left to right top to bottom. You then rotate the grille 90 degrees (right or left, depending on the algorithm), and write out the next set of letters. Do two more rotations, and you have the full message (you encrypt the message the same way). Any sized grids can be used as long as they’re square; even-numbered sides will use all the cells of the grid; odd-numbered sides will leave one cell covered (you can use rectangular grids, but the ways you can rotate them are more limited). With Verne’s cipher, the grid is 6×6, for 36 cells. One-quarter of the cells (9) have holes specifically chosen to expose different letters with each rotation.
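
To make the rotation idea concrete, here is a small Python sketch that turns a set of hole positions through the four orientations and checks that every cell gets exposed exactly once. The 4×4 hole pattern is a made-up example, not Verne’s grille, and the function names are my own.

```python
def rotate_cw(holes, n):
    """Rotate (row, col) hole positions 90 degrees clockwise on an n x n grid."""
    return {(c, n - 1 - r) for r, c in holes}


def orientations(holes, n):
    """Hole positions for each of the four orientations, in reading order."""
    turns = []
    current = set(holes)
    for _ in range(4):
        turns.append(sorted(current))  # left to right, top to bottom
        current = rotate_cw(current, n)
    return turns


def is_valid_grille(holes, n):
    """True if the four turns expose every cell once (minus the center when n is odd)."""
    seen = [cell for turn in orientations(holes, n) for cell in turn]
    expected = n * n - (n % 2)
    return len(seen) == len(set(seen)) == expected


# Hypothetical 4x4 grille with one-quarter of the cells (4) punched out.
example_holes = {(0, 0), (0, 1), (0, 2), (1, 1)}
print(is_valid_grille(example_holes, 4))     # True
print(is_valid_grille({(0, 0), (0, 3)}, 4))  # False: both holes hit the same cells
```

To read or write a message, you would take the letters at the positions in each turn, one turn after another.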

The ciphertext, taken from wikisource.org is:

ihnalz zaemen ruiopn
arnuro trvree mtqssl
odxhnp estlev eeuart
aeeeil ennios noupvg
spesdr erssur ouitse
eedgnc toeedt artuee

You’ll have to massage it a bit to make the text square. OR, just write it out on graph paper, one 6×6 block at a time.

The text as written out from the grille is reversed, and needs to be re-reversed to read: “Tout est prêt. Au premier signal que vous nous enverrez de Trieste, tous se lèveront en masse pour l’indépendance de la Hongrie. Xrzah.” Verne has his main character claim that the last 5 characters are a “conventional signature,” when they’re actually nulls to pad out the block. According to Friedman, Verne makes a few assumptions in the deciphering of this message that a professional cryptographer would have never made. But, still, Verne’s doing better than Poe.

Ok, this gets me to The Giant Raft. In this story, the hero, Joam Dacosta, AKA Joam Garral, is accused of committing a murder. A letter is sent proving that Joam is innocent, but it’s encrypted and no one has the key. The Judge on the case, Jarriquez, sets out to break the cipher in 8 days before Joam is to be executed. A large part of the story consists of Jarriquez’ (mostly failed) attempts at this. The last paragraph of the letter reads:

Phyjslyddqfdzxgasgzzqqehxgkfndrxujugiocytdxvksbxhhuypohdvyrymhuhpuydkjoxphetozsletnpmvffovpdpajxhyynojyggaymeqynfuqlnmvlyfgsuzmqiztlbqgyugsqeubvnreredgruzblrmxyuhqhpzdrrgcrohepqxufivvrplphonthvddqfhqsntzhhhnfepmqkyuuexktogzgkyuumfvijdqdpzjqsykrplxhxqrymvklohhhotozvdksppsuvjhd.

The Judge eventually decides that this is a Gronsfeld cipher. Verne himself was convinced that Gronsfeld was impossible to decipher without the key, or at least without a significant amount of intelligence and luck. What’s funny is that in the same issue of the Signal Corps bulletin as Friedman’s article, is Howell Brown’s “Analysis Versus the Probable Word,” which specifically demonstrates how to attack a short Vigenere cipher (when attacking the key won’t work) using a “probable word.”

Howell’s example uses the ciphertext:
“YGFAT NZAQS CAAAX QSGGO EZAGP RYAXX”

His approach is to write the word he thinks may be in the plaintext (the “probable word”) vertically at the left of the message, and apply each of its letters to the whole cipher to see what key pops out. If the probable word really is in the plaintext, its first letter lines up with one ciphertext letter, its second letter with the next ciphertext letter, etc., making for a keyword that reads diagonally top left to bottom right. Howell uses “BEARER” as his probable word.

- YGFAT NZAQS CAAAX QSGGO EZAGP RYAXX
B ZHGBV OABRT DBBBY RTHHP FABHQ SZBYY
E -KJEX RDEUW GEEEB UWKKS IDEKT VCEAA
A --FAT NZAQS CAAAX QSGGO EZAGP RYAXX
R ---RK EQRHJ TRRRS HJXXF VQRXG IPROO
E ----X RDEUW GEEEB UWKKS IDEKT VCEAA
R ----- EQRHJ TRRRS HJXXF VQRXG IPROO

The advantage of this approach is that it’s easier than writing “BEARER” on a separate piece of paper and sliding it under the ciphertext and checking the key values individually. It’s a bit difficult to read, but starting at the 8th letter, “BUSTER” is spelled out diagonally, and this is the most promising key to try applying to the rest of the ciphertext, because it is the only thing in English here.

To make the key easier to read, delete the leading nulls for lines 2-6, and reformat:

ZHGBV OA B RT DBBBY RTHHP FABHQ SZBYY
KJEXR DE U WG EEEBU WKKSI DEKTV CEAA
FATNZ AQ S CA AAXQS GGOEZ AGPRY AXX
RKEQR HJ T RR RSHJX XFVQR XGIPR OO
XRDEU WG E EE BUWKK SIDEK TVCEA A
EQRHJ TR R RS HJXXF VQRXG IPROO

Reversing the approach, Howell writes out the ciphertext in rows of 7 letters each:

BUSTER?
-------
YGFATNZ
AQSCAAA
XQSGGOE
ZAGPRYA

And gets

BUSTER?
-------
YGFATNZ
dontle?
AQSCAAA
bearer?
XQSGGOE
eeanyd?
ZAGPRYA
cument?

Assuming the first line is “don’t let,” the key becomes “BUSTERS” and the message is: “don’t let bearer see any documents.”
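
A Python sketch of the same search is below. Judging from the rows in the table above, each row is the ciphertext advanced by the probable word’s letter (key = cipher + plain, mod 26), which in turn means the message decrypts as plain = key - cipher. That reading of the table is my assumption, and the function names are hypothetical.

```python
A = ord("A")


def key_fragments(ciphertext: str, crib: str):
    """Key fragment produced by placing the crib at each offset (the diagonals)."""
    cipher = "".join(ciphertext.upper().split())
    frags = []
    for offset in range(len(cipher) - len(crib) + 1):
        frags.append("".join(
            chr((ord(c) + ord(p) - 2 * A) % 26 + A)
            for c, p in zip(cipher[offset:], crib.upper())
        ))
    return frags


def decrypt(ciphertext: str, key: str) -> str:
    """Assumed rule for this example: plain = key - cipher (mod 26)."""
    cipher = "".join(ciphertext.upper().split())
    return "".join(
        chr((ord(key[i % len(key)]) - ord(c)) % 26 + A)
        for i, c in enumerate(cipher)
    )


CIPHER = "YGFAT NZAQS CAAAX QSGGO EZAGP RYAXX"
print(key_fragments(CIPHER, "BEARER")[7])  # BUSTER (starting at the 8th letter)
print(decrypt(CIPHER, "BUSTERS"))
```

Scanning the fragments for something that looks like a word replaces the job of reading the diagonals by eye.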

Why is this useful? Well, with Gronsfeld, the key is made up of the digits 0-9, and the Judge in the story thinks that the message contains either the name of the suspect, or the name of the sender of the message. Verne makes the wrong assumption that the suspect’s name (Dacosta) is only allowed to appear at the very beginning, or the very end of the text, and the author’s name (Ortega) is not revealed to the readers until near the end of the story. If he had used Howell’s method on the probable word “Dacosta”, the idea would be to accept only diagonal numbers where the difference between the word and the cipher text is between 0 and 9. (To get started, P – D = 16 – 4 = 12. This is too big and gets ignored. Also, because the alphabet wraps at “Z” – Z + 3 = C – a negative difference has the alphabet length added back before it is checked against the 0 to 9 range.)

- PHYJSLYDDQFDZXG
D -4-6-8-00-20--3
A -------44-54--7
C ---7-8-11-31--4
O ----4----2---9-
S ------6-----75-
T ------5-----64-
A -------44-54--7

Pulling out the leading nulls for lines 2-6 shows that none of the columns consist of only 0-9.

-4-6-8-00-20--3
------44-54--7
-7-8-11-31--4
-4----2---9-
--6-----75-
-5-----64-
-44-54--7

There are two things to note here – first is that there is no diagonal line that is made up only of the digits 0-9. Second, we’re only trying this example on a very small part of the beginning of the cipher. If we (and Verne) were to continue in this way through the entire cipher message, we’d crack the key a little more than halfway in. Using a computer, this would be trivial, but it is kind of easy to make a manual mistake here, or to overlook the correct diagonal string.

This is where Friedman’s attack on the key length makes more sense (and in fact, Howell also advocates attacking the key length on longer ciphertexts). We look for 2-, 3- and/or 4-character groupings that appear more than once in the message.

Phyjslyddqfdzxgasgzzqqehxgkfndrxujugiocytdxvksbxhhuypohdvyrymhuhpuydkjoxphetozsletnpmvffovpdpajxhyynojyggaymeqynfuqlnmvlyfgsuzmqiztlbqgyugsqeubvnreredgruzblrmxyuhqhpzdrrgcrohepqxufivvrplphonthvddqfhqsntzhhhnfepmqkyuuexktogzgkyuumfvijdqdpzjqsykrplxhxqrymvklohhhotozvdksppsuvjhd

Friedman focuses only on 3- and 4-letter groupings:
DDQF (twice, 186 letters apart)
KYUU (twice, 12 letters apart)
HHH (twice, 54 letters apart)
RYM (twice, 192 letters apart)
RPL (twice, 60 letters apart)
TOZ (twice, 186 letters apart)

To be pedantic, what we’re looking for are the factors of these spacings that are common between all groupings. We have to keep in mind, though, that there may be “accidental hits” that are purely coincidental and must be ignored. Note that DDQF and TOZ are both 186 letters apart, so we only need to consider 186 once.

186 – 2, 3, 6, 31, 62, 93
12 – 2, 3, 4, 6
54 – 2, 3, 6, 9, 18, 27
192 – 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96
60 – 2, 3, 4, 5, 6, 10, 12, 15, 20, 30

Of all these factors, only 2, 3 and 6 are common across all groupings (i.e. – there are no accidental hits). But, keys of lengths 2 or 3 are too short to be secure, so we can assume that the key length must be 6 digits long.
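
The factor bookkeeping is easy to hand to a computer. This Python sketch has two hypothetical helpers: repeat_spacings finds repeated letter groups and the distances between them, and common_factors lists the factors shared by a set of distances. The short string at the end is a toy example, not the Verne cipher.

```python
from functools import reduce
from math import gcd


def repeat_spacings(text: str, size: int):
    """Map each repeated group of `size` letters to the gaps between its occurrences."""
    last_seen = {}
    spacings = {}
    for i in range(len(text) - size + 1):
        gram = text[i:i + size]
        if gram in last_seen:
            spacings.setdefault(gram, []).append(i - last_seen[gram])
        last_seen[gram] = i
    return spacings


def common_factors(distances):
    """Factors (2 and up) shared by every distance, via the greatest common divisor."""
    g = reduce(gcd, distances)
    return [f for f in range(2, g + 1) if g % f == 0]


# The spacings Friedman lists for the Verne cipher.
print(common_factors([186, 12, 54, 192, 60]))  # [2, 3, 6]

# Toy example: "ABC" repeats six letters apart.
print(repeat_spacings("ABCXYZABC", 3))  # {'ABC': [6]}
```

An accidental repeat would drag the common factors down, so in practice you may need to drop a suspicious spacing and try again.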

The next step then is to group the cipher text, to effectively create 6 alphabets. That is, we put letters 1, 7, 13, 19… together for the first row. 2, 8, 14, 20… for the second row. 3, 9, 15, 21… for the third row, etc.

PYZZXRIXHHMYPSMPHYEQYMBSNGRQREIPVQHMEZMQSXMHVS
HDXZGXOVHDHDHLVDYGQLFQQQRRMHGPVHDSHQXGFDYHNHDU
YDGQKUCKUVUKEEFPYGYNGIGEEUXPCQVODNNKKKVPKXKOKV
JQAQFJYSYYHJTTFANANMSZYURZYZRXRNQTFYTYIZRQLTSJ
SFSENUTBPRPOONOJOYFVUTUBEBUDOUPTFZEUOUJJPROOPH
LDGHDGDXOYUXZPVXJMULZLGVDLHRHFLHHHPUGUDQLYHZPD

The purpose of this step is to simply get the letter frequencies of each grouping together. But, rather than sorting the letters from most frequent to least, we want to keep them in alphabetical order for the French alphabet. The reason for this is that each alphabet of a Vigenere or Gronsfeld cipher is a simple Caesar shift cipher. That is, each grouping is just slid a fixed number of letters to the left. So, if we match up the distributions for each group one at a time against the normal plaintext distributions, we can immediately determine how much each group was shifted, which gives us the key. (Keep in mind there’s no “W” in the French alphabet.)

This blog entry is getting too long, so I’ll just show the example for group 2, using the plaintext distribution that Friedman provides for a typical 50-letter message.

It’s pretty clear that group 2 is shifted three positions to the right, giving us a key of _3____.

Doing the same thing for the other groupings, if you do check out Friedman’s article, gives you the key 432513.
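
With the key in hand, decrypting is just a matter of sliding each letter back by its digit. This Python sketch assumes a 25-letter alphabet with no “W”, following the note above; the names FRENCH_25 and gronsfeld_decrypt are my own.

```python
# Assumed cipher alphabet: the usual 26 letters minus "W".
FRENCH_25 = "ABCDEFGHIJKLMNOPQRSTUVXYZ"


def gronsfeld_decrypt(ciphertext: str, key_digits: str, alphabet: str = FRENCH_25) -> str:
    """Shift each letter back by the matching key digit, cycling the key."""
    out = []
    for i, ch in enumerate(ciphertext.upper()):
        shift = int(key_digits[i % len(key_digits)])
        out.append(alphabet[(alphabet.index(ch) - shift) % len(alphabet)])
    return "".join(out)


# First twelve and last six letters of the Verne ciphertext.
print(gronsfeld_decrypt("PHYJSLYDDQFD", "432513"))  # LEVERITABLEA
print(gronsfeld_decrypt("SUVJHD", "432513"))        # ORTEGA
```

The last six letters line up with the key only because the cipher length happens to be a multiple of 6.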

Going back to Verne’s story, once the Judge learns that the name of the author of the letter is “Ortega,” he tries applying it to the first 6 letters of the ciphertext, then the last 6.

PHYJSL
ORTEGA
1-45--

SUVJHD
ORTEGA
432513

Presuming that Ortega signed the letter at the end, and that he now has the key, the Judge proceeds to decipher the entire letter, and Dacosta is saved at the final minute. Friedman’s contention is that if Jules Verne really did understand the cipher he’d written his story around, the story would have been at least 50% shorter.
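
The Judge’s test with “Ortega” can be sketched in Python as well. For each letter pair it takes cipher minus probable word and keeps the result only if it is a digit from 0 to 9, otherwise it marks the position as a miss. The 25-letter alphabet without “W” is an assumption based on the note earlier, and crib_digits is a hypothetical name.

```python
# Assumed cipher alphabet: the usual 26 letters minus "W".
FRENCH_25 = "ABCDEFGHIJKLMNOPQRSTUVXYZ"


def crib_digits(cipher_fragment: str, crib: str, alphabet: str = FRENCH_25):
    """Key digit for each position, or None where the shift is not 0-9."""
    digits = []
    for c, p in zip(cipher_fragment.upper(), crib.upper()):
        # The modulo handles the wrap past the end of the alphabet.
        d = (alphabet.index(c) - alphabet.index(p)) % len(alphabet)
        digits.append(d if d <= 9 else None)
    return digits


print(crib_digits("PHYJSL", "ORTEGA"))  # [1, None, 4, 5, None, None]
print(crib_digits("SUVJHD", "ORTEGA"))  # [4, 3, 2, 5, 1, 3]
```

A position that comes back as None rules out that alignment, which is the same rejection rule as the dashes in the tables above.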

To be honest, I have difficulty in remembering that Vigenere (and the Gronsfeld variant) are just simple shifts of the entire alphabet when you’re dealing with short keys. Friedman’s demonstration of how to attack the key and then obtain the amount of shift for each alphabet makes things a lot easier for me to understand. But, the point of Howell’s article is that if the ciphertext is short, then you have no choice but to use the probable word approach.

Summary:
1) Gronsfeld uses the digits 0-9 for the key.
2) It’s considered a bit more difficult because the key is not a human-readable word.
3) Because it only uses 10 alphabets, Gronsfeld is actually easier to break if you find the key length.
4) The “probable word” approach demonstrated by Howell works for both Vigenere and Gronsfeld when you have shorter ciphertext messages.
5) Once you have the key length, comparing the letter frequencies of each alphabet group to the letter distributions of the plaintext language will help in giving you the shift value for each group.
6) In cases where a given group doesn’t have a clear letter distribution, you can try applying as much of the key as you already have to the ciphertext in order to guess at different words in the plaintext, and get the rest of the key that way.
7) Signal Corps bulletins rule!

Wiggles, part 14


Just a little ongoing story to give you something to play with until the next blog post.

“C’M KXQQW, KCQ. WXB’SF YPD AXX MBOY AX DQCNG. WXB IFAAFQ IF RXCNR NXV.” AYF DQBNG HBKA KNPQZFD PA MF PND DBR CN YCK YFFZK. CA VPK P LQCDPW NCRYA, PND AYF IPQ C’D LXBND VPKN’A PZZ AYPA UPOGFD, IBA CA VPK RFAACNR CNAX MCD-KBMMFQ PND AYF PCQ OXNDCACXNCNR DCDN’A VXQG. PZKX, AYF UZPOF VPK APOYC-NXMC (KAPND-PND-DQCNG), VYCOY YPD UFXUZF KAPNDCNR P ZCAAZF OZXKFQ AX FPOY XAYFQ AYPN BKBPZ, PND AYF YFPA, YBMCDCAW PND QFFG OXMICNFD AX OXMUFZ UFXUZF CNAX XQDFQCNR ZXAK XL IFFQ, PND AYFQFLXQF RFAACNR P ZXA DQBNGFQ AYPN VPK RXXD LXQ AYFM. XNF XL AYF OZCFNAFZF AYCK FSFNCNR VPK P KPZPQWMPN (P DFKG-IXBND VYCAF OXZZPQ UPUFQ UBKYFQ) CN YCK LXQACFK, AQWCNR AX IZXV XLL P ZXA XL KAFPM PLAFQ P IPD DPW XL VXQG. CN YCK OPKF, AYCK AXXG AYF LXQM XL P UCAOYFQ XL IFFQ XN PN FMUAW KAXMPOY, PND AYFN AQWCNR AX RQXUF XNF XL AYF WXBNRFQ XZK (XLLCOF ZPDCFK, IPKCOPZZW P MCNCMBM VPRF-ZFSFZ KFOQFAPQW KXMFVYFQF). “NPNC KXQF, IPGP WPQXB. GXGX VP NCYXN, NCYXN-RX VX CB!” (“VYPA VPK AYPA, WXB IPKAPQD. AYCK CK HPUPN, KUFPG HPUPNFKF!”) YF KNPQZFD, VYCZF AQWCNR AX GFFU YCK YFPD BU.

Board Cat Kit


I was on a business trip to Osaka recently, and when I arrived at the airport on the way back home, I had a few hours to kill. The gift shop in the main lobby had a number of little laser-cut plywood kits for sale, and I figured I might as well buy one and see what it’s like to build. There’s a large variety to the kits, from animals to musical instruments (a piano, cello and guitar) to big $60 units for making Himeji Castle and a 2′-tall Ferris wheel. While I was tempted to go big, I have no place to keep finished kits like that, so I settled on the Sitting Cat for 1,000 yen ($9 USD). (I also had a bowl of ice cream for dinner.)

The kit comes in a flat envelope, which includes 2 sheets of thin, pre-cut plywood, and the instruction sheet. The instructions are pictorial only, but still pretty easy to follow. The pieces have to be punched out, and that was probably the most time-consuming part. They do stick in the main form, and you have to be careful because they will break. What may have helped the most might have been if I’d had a cutter knife, and just removed bits of the main form to make taking the pieces out easier.

The pieces are interlocking and force-fit, so you don’t need glue. This is also good since I had to take everything apart a few times because I got the pieces in the wrong sequence. I didn’t see a suggested assembly time, and I wasn’t really paying attention to when I started. I did have a specific deadline, in that I wanted to get past the security checkpoint shortly after check-in opened up (the airline here didn’t allow check-in until 90 minutes before boarding). Either way, I think I took 90 minutes total. If I ever make another one of these, I know I’ll be a lot faster now that I understand what I’m doing.

There’s a 1″ wide strip along the length of one of the sheets that contains an entire backup collection of small pieces that are most likely to break. This was a lifesaver, because one piece shattered as I was trying to punch it out, and a second piece broke as I was trying to push it into place on the main assembly. I needed those backups.

I waited until I got home to take the last two photos of the completed cat, because the lighting is better here than in the restaurant. Overall, it was fun, although a bit frustrating, to build. Next time, I’d want sandpaper to open up some of the notches to make the pieces fit together a little more smoothly.

Most cats are board. This one just doesn’t bother to hide it.

Thinking About Encryption, Part 17


I keep coming back to Vigenere, but the reason for it this time is that I was finally struck by a weakness in using the Running Key cipher. This is related to the use of a word or shorter phrase as the key for a plain Vigenere cipher. Generally, when experts say that Vigenere is unbreakable, they’re talking about using a random string of letters (or numbers 1-26 to represent letters) to create a key longer than the plaintext. But, this is impossible to remember. You need to record the random key on paper somewhere, and use it as a one-time pad.

The older approach to Vigenere was to use one long word, or a short phrase, for the key, but this is vulnerable to cycle counting (looking for repeated strings of ciphertext, where the distances between repeats are multiples of the key length). Instead, we can use the Running Key, which, as mentioned in the last entry, is a text string taken from a book used as the source text. The advantage of this approach is that if your book is longer than your plaintext message, and you use a given key string once and only once, it acts like a one-time pad that is easier to remember and/or generate.

But, by using a key string made up of human-readable text, you add a level of predictability to the key that you don’t have with a random collection of characters, allowing someone else to attack the key instead of the ciphertext.

Say you have the following cipher, and you have reason to believe it’s a Vigenere. You run a frequency count on it and there’s no obvious substitution letter distribution (it doesn’t follow the ETAOIN SHRDLU distribution) and you’re sure the plaintext was in English. There are also no repeating collections of cipher characters you could use to determine key length. So, you guess that either the plaintext is too short for repeated character collections to occur, or that the sender used a running key. But, there’s one more thing that you think may be likely – and that is that the plaintext might contain the words “mountain,” “submarine,” or “ocean.”

IOAKASTDERGPQQXHVQOLGAABAGRKWAFLEEKSMBOCFUCEYQJH
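
The two checks described above (a frequency count, and a hunt for repeated groups of cipher letters) are easy to automate. Here is a minimal sketch in Python; the language and the function names are my own choices for illustration, and it only looks at groups of three letters.

```python
from collections import Counter

CIPHERTEXT = "IOAKASTDERGPQQXHVQOLGAABAGRKWAFLEEKSMBOCFUCEYQJH"

def letter_frequencies(text):
    """Return (letter, count) pairs, most common first."""
    return Counter(text).most_common()

def repeated_groups(text, size=3):
    """Return every group of `size` letters that occurs more than once,
    mapped to the 1-based positions where it starts."""
    positions = {}
    for i in range(len(text) - size + 1):
        positions.setdefault(text[i:i + size], []).append(i + 1)
    return {group: where for group, where in positions.items() if len(where) > 1}

print(letter_frequencies(CIPHERTEXT)[:6])
print(repeated_groups(CIPHERTEXT))
```

If repeated groups do show up, the distances between their starting positions are the numbers to factor when guessing at a key length. An empty result points toward a short message or a running key.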

The idea now is that we apply each of these words, one at a time, across the full length of the ciphertext to see what we get out. And, just to play it safe, we try to decipher the text with these words as a partial key.

To save myself some work, I’ll just say up front that none of the three words appears in the key string, so trying to decrypt the ciphertext with them as a partial key produces nothing but garbage.

For example, applying the key “MOUNTAIN” to the first 8 letters gives:
IOAKASTD – message
WAGXHSLQ – proposed plaintext

Running “MOUNTAIN,” “SUBMARINE,” and “OCEAN” as partial keys along the entire ciphertext gives us equally unreadable “plaintexts”.

Then, say we run “MOUNTAIN” along the ciphertext again, this time treating it as a piece of the plaintext, to see what key fragments could have generated this cipher.

Ciphertext – Reversed key
IOAKASTD – WAGXHSLQ
OAKASTDE – CMQNZTVR
AKASTDER – OWGFADWE
KASTDERG – YMYGKEJT
ASTDERGP – OEZQLRYC
STDERGPQ – GFJRYGHD
TDERGPQQ – HPKENPID
DERGPQQX – RQXTWQIK
ERGPQQXH – SDMCXQPU
RGPQQXHV – FSVDXXZI
GPQQXHVQ – UBWDEHND
PQQXHVQO – DCWKOVIB
QQXHVQOL – ECDUCQGY
QXHVQOLG – EJNIXODT
XHVQOLGA – LTBDVLYN
HVQOLGAA – VHWBSGSN
VQOLGAAB – JCUYNASO
QOLGAABA – EARTHATN

This last bit is kind of promising, in that it looks more like English than any of the other strings do. So, maybe “mountain” does exist in the plaintext, 17 characters in (positions 18 to 25). If so, those positions can be skipped in future checks. We go to “ocean,” and character positions 1-17 result in garbage. Starting at position 26:

Ciphertext – Reversed key
GRKWA – SPGWN
RKWAF – DISAS

Everything else is garbage. Note that the hit here starts at position 27, leaving one letter of plaintext between “mountain” and “ocean.” We can reasonably guess that maybe the plaintext is actually “mountains”, so let’s test that again:

Ciphertext – Reversed key
QOLGAABAG – EARTHATNO

QOLGAABAGRKWAF – EARTHATNODISAS
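
The sliding step used in the tables above can be sketched in a few lines of Python. This is just an illustration, assuming the usual A=0 letter arithmetic; `drag_crib` is a name I made up, not a standard routine.

```python
A = ord("A")

def drag_crib(ciphertext, crib):
    """Slide `crib` along `ciphertext`. For each 1-based starting position,
    return the window of ciphertext and the key fragment that would be
    needed for the crib to be the plaintext at that spot."""
    results = []
    for i in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[i:i + len(crib)]
        key = "".join(
            chr((ord(c) - ord(p)) % 26 + A) for c, p in zip(window, crib)
        )
        results.append((i + 1, window, key))
    return results

CIPHERTEXT = "IOAKASTDERGPQQXHVQOLGAABAGRKWAFLEEKSMBOCFUCEYQJH"

for position, window, key in drag_crib(CIPHERTEXT, "MOUNTAIN"):
    print(position, window, key)
```

Scanning the printed list for anything that looks like English is still a job for the eyes. The script only saves the subtraction work, turning up EARTHATN for the window QOLGAABA and DISAS when the crib is changed to “OCEAN”.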

We could stop here, but let’s push a bit farther for the sake of experiment. Switching to “submarine”, positions 1-17 still just give us garbage. Starting at 32,

Ciphertext – Reversed key
LEEKSMBOC – TKDYSVTBY
EEKSMBOCF – MKJGMKGPB
EKSMBOCFU – MQRABXUSQ
KSMBOCFUC – SYLPOLXHY
SMBOCFUCE – ASACCOMPA

If we put the pieces together again, we get:

QOLGAABAGRKWAFLEEKSMBOCFUCE – EARTHATNODISAS____ASACCOMPA

Proposed plaintext (positions 18-44):
mountainsoceanXXXXsubmarine

Making kind of a leap, let’s guess that “XXXX” is “s and”:

LEEK – TERH

Gives us “EARTHATNODISASTERHASACCOMPA”.

Doing a Google search on “that no disaster has” gives us the first paragraph of Mary Shelley’s “Frankenstein” – “You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings.”

Using this as the running key, we get the plaintext:
“KAGOSHIMA IS HOME TO MOUNTAINS, OCEANS, AND SUBMARINE LIFE.”
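
For anyone who wants to check that last step, here is a short Python sketch of the decryption, assuming standard Vigenere subtraction with A=0 and a key made from the quoted sentence with spaces and punctuation stripped. The helper names are mine.

```python
import re

def letters_only(text):
    """Uppercase the text and drop everything except A-Z."""
    return re.sub(r"[^A-Z]", "", text.upper())

def vigenere_decrypt(ciphertext, key_text):
    """Subtract the running key from the ciphertext, letter by letter."""
    key = letters_only(key_text)
    if len(key) < len(ciphertext):
        raise ValueError("running key is shorter than the ciphertext")
    return "".join(
        chr((ord(c) - ord(k)) % 26 + ord("A")) for c, k in zip(ciphertext, key)
    )

CIPHERTEXT = "IOAKASTDERGPQQXHVQOLGAABAGRKWAFLEEKSMBOCFUCEYQJH"
KEY_TEXT = ("You will rejoice to hear that no disaster has accompanied "
            "the commencement of an enterprise")

print(vigenere_decrypt(CIPHERTEXT, KEY_TEXT))
# KAGOSHIMAISHOMETOMOUNTAINSOCEANSANDSUBMARINELIFE
```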

This may be kind of contrived, but it does show how vulnerable the running key cipher is to attacks on the key. If the cipher book used for the key is online, and you choose the right plaintext words, the book will eventually show up in an internet search and the cipher will fall. And, the longer the plaintext, the greater the vulnerability.

The easiest way to harden this cipher is to do a simple transposition. Break up the word patterns with Scytale or rail fence, or reverse every other line of the main text. You can use the word lengths of the first three or four words of the running key text for the transposition keys, if you like.
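
As a rough illustration of that hardening step, here is one simple reading of a Scytale-style transposition in Python: write the text in rows of a fixed width, then read down the columns. The width of 3 (the length of “You,” the first word of the key text) and the function names are assumptions for the sake of the example.

```python
def scytale_encrypt(text, width):
    """Write the text in rows of `width` letters, then read down each column."""
    return "".join(text[col::width] for col in range(width))

def scytale_decrypt(cipher, width):
    """Rebuild the columns (the first few may be one letter longer), then
    read across the rows again."""
    rows, extra = divmod(len(cipher), width)
    columns, start = [], 0
    for col in range(width):
        length = rows + (1 if col < extra else 0)
        columns.append(cipher[start:start + length])
        start += length
    return "".join(
        columns[col][row]
        for row in range(rows + 1)
        for col in range(width)
        if row < len(columns[col])
    )

VIGENERE_OUTPUT = "IOAKASTDERGPQQXHVQOLGAABAGRKWAFLEEKSMBOCFUCEYQJH"
WIDTH = 3  # assumed: length of the first word of the running key text

mixed = scytale_encrypt(VIGENERE_OUTPUT, WIDTH)
print(mixed)
print(scytale_decrypt(mixed, WIDTH) == VIGENERE_OUTPUT)  # True
```

After this step, a word slid along the ciphertext no longer lines up with consecutive key letters, so the attacker has to undo the transposition before the attack on the key can start.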

What’s interesting here is that the strength of Running Key is that it’s easy to remember and implement, yet its weakness is that because it’s easy to remember, it has a predictability and set of rules that allows it to be exploited. Which means, we either employ a convenient, broken system, or an inconvenient, impossible-to-break algorithm.

Which brings me to the concept of “randomness.” One of the most common themes that I’ve encountered regarding ciphers, electronics and software in general is that there’s really no such thing as “a truly random number generator,” except for maybe pure noise (e.g., a floating, detuned antenna), or a cosmic ray detector. This is particularly important for ciphers when we talk about one-time pads. If we look at the running key, we can see patterns: grammatical patterns, word-level patterns, and letter-level patterns (repeating letter combinations, and the absence of combinations that never occur in normal English words, like “qurzctl”). To make Vigenere truly unbreakable we need a truly random key. However, software random number generators are actually “pseudo-random.” In the old days, they always started with the same numerical sequence, which would repeat whenever you turned the computer on. You could shake things up a bit by using a “seed,” which was a selectable value for creating a sequence with a different starting point. But, unless the seed was something like the system time and date, the sequence could be replicated across PCs. Modern programming languages do use more “random” sequences for their generators, but there is still a possibility that the generator (a mathematical algorithm) can get into a predictable sequence.
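
The seed behavior is easy to see in Python, which I’m using here purely as an example. The sketch below builds a key of capital letters two ways: from the standard `random` module with a fixed seed, and from the `secrets` module, which draws on the operating system’s randomness source. The function names and the seed value are made up for the demonstration.

```python
import random
import secrets
import string

def seeded_key(seed, length):
    """Pseudo-random key: the same seed always yields the same letters."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))

def os_random_key(length):
    """Key drawn from the operating system's entropy source via `secrets`."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))

print(seeded_key(2024, 20))
print(seeded_key(2024, 20) == seeded_key(2024, 20))  # True
print(os_random_key(20))
```

Anyone who knows (or guesses) the seed and the generator can rebuild the first kind of key exactly, which is the replication problem described above. The second kind can’t be replayed from a seed, although whether that counts as “truly random” is a separate argument.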

Where this matters to us, is when we have a long plaintext message, and we’re generating a random Vigenere key string. If the generator has a repeatable pattern that takes the form of ASCII characters “A-Z,” then theoretically, we can use that pattern sequence in the same way we did with “submarine” and “mountain” in breaking the running key cipher, by sliding the sequence (say, “ABBQRT”) across the ciphertext to attack the key. If human-readable text appears in our predicted plaintext, such as “defenestr,” we can try guessing at the rest of the plain text (“defenestration”) and work back and forth between attacking the plaintext and attacking the key.

Does this really matter? I’m not sure. My gut reaction is that software random number generators are random enough that even if you encounter a predictable character string, chaos theory will bite you eventually, in that if you don’t know the exact algorithm and the exact starting seed, the sequence will veer off into the unknown very quickly and you’ll be back where you started, unable to tell if you really did crack part of the key, or if you’re lying to yourself. On the other hand, I’m not an expert in random generator algorithms, and I don’t know what is, or isn’t, bad about them. When in doubt, randomize, randomize, randomize.

Finally, one other method for hardening Vigenere is to create a random distribution of letters for the first line of the Vigenere table (say, “BADCFEHGJILKNMPORQTSVUXWZY”) and Caesar-shift that by one letter to the left for each subsequent row. Example:

BADCFEHGJILKNMPORQTSVUXWZY
ADCFEHGJILKNMPORQTSVUXWZYB
DCFEHGJILKNMPORQTSVUXWZYBA
CFEHGJILKNMPORQTSVUXWZYBAD etc.

If you want to make this easier to remember, we have the old “pseudo-random” Caesar shift idea, where we take a keyword (such as JULY-AUGUST), remove duplicated letters (JULYAGST) and fill in the rest of the string starting from the last letter of the keyword, wrapping when we hit “Z”. Like:

JULYAGSTVWXZBCDEFHIKMNOPQR
ULYAGSTVWXZBCDEFHIKMNOPQRJ
LYAGSTVWXZBCDEFHIKMNOPQRJU
YAGSTVWXZBCDEFHIKMNOPQRJUL etc.

One other change we need to implement here is that the key no longer refers to the first letter of the line in the table, but rather to the line number (A=line 1, B=line 2, C=line 3) etc.
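
Here is how the keyword table might be generated in Python. The table construction follows the description above; the `encrypt` lookup (key letter picks the line, plaintext letter picks the column) is my assumption about how the modified table would be used, and all of the names are made up.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def keyed_alphabet(keyword):
    """Keyword with duplicates removed, then the unused letters, continuing
    from the keyword's last letter and wrapping around after Z."""
    head = "".join(dict.fromkeys(c for c in keyword.upper() if c in ALPHABET))
    start = ALPHABET.index(head[-1])
    tail = "".join(
        ALPHABET[(start + i) % 26]
        for i in range(1, 26)
        if ALPHABET[(start + i) % 26] not in head
    )
    return head + tail

def build_table(first_row):
    """One row per letter, each the previous row shifted one place left."""
    return [first_row[i:] + first_row[:i] for i in range(len(first_row))]

def encrypt(plaintext, key, table):
    """Assumed lookup: key letter picks the line (A = line 1), plaintext
    letter picks the column (A = column 1). Key must cover the plaintext."""
    return "".join(
        table[ALPHABET.index(k)][ALPHABET.index(p)]
        for p, k in zip(plaintext, key)
    )

table = build_table(keyed_alphabet("JULY-AUGUST"))
print(table[0])  # JULYAGSTVWXZBCDEFHIKMNOPQR
print(table[1])  # ULYAGSTVWXZBCDEFHIKMNOPQRJ
print(encrypt("CAT", "ABC", table))
```

The same `build_table` call works for a fully random first line such as “BADCFEHGJILKNMPORQTSVUXWZY”. Only the way the first row is produced changes.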

That’s how I understand it, anyway. If I’m wrong or off-base, someone please correct me.

Wiggles, part 13


Just a little ongoing story to give you something to play with until the next blog post.

DYFWU LN JYP HXODZN. WTWPUYFW L QFYM HXGPAWN RXWLP NODGZXY MXWFWTWP RXW HPWBLRN AWR KYM, GFB RXWF ONWN RXWLP ZXYFWN RY COU WTWPURXLFA GR RXW NXYZN. L DWGF, MW ONW YOP WKWHRPYFLH MGKKWRN JYP WTWF RXW DYNR RPLTLGK NROJJ, KLQW COULFA G 100 UWF ZGHQ YJ JOONWF AOD (COCCKW AOD). RXW NWHYFB L AWR NYDW HGNX, L POF RY RXW FWGPWNR QYFCLFL (HYFTWFLWFHW NRYPW) GFB RPGFNJWP LR RY RXW ZXYFW. L BYF’R RXLFQ RXWPW’N G ZKGHW LF IGZGF RXGR NWKKN NROJJ RY HONRYDWPN – G PWNRGOPGFR, G CGP, G QYFCLFL YP G BWZGPRDWFR NRYPW – RXGR LNF’R WSOLZZWB MLRX G NODGZXY PWGBWP. NY, FY, FY KOHQU CLKKN YP HYLFN NGJWKU XLBBWF LF DU NXYW. LJ L FWWB RY HGKK DYD, L’TW AYR DU ZXYFW, GFB NXW HYOKB ZGUZGK DW DYFWU LJ L FWWBWB LR. GFB LJ DU ZXYFW WTWP NRYZZWB MYPQLFA? XWU, RXLN LN IGZGF. FY YFW QWWZN WKWHRPYFLHN KYFA WFYOAX JYP LR RY NRYZ MYPQLFA. NXLF-XGRNOCGL, CGCU!

Thinking About Encryption, Part 16


In the last post I mentioned the Beale Ciphers (AKA: the Beale Papers). So, I was planning on getting into book ciphers this time. But, the more I thought about them, the more they bothered me.

By definition, a book cipher is one in which the “key” is some section of text from an agreed upon book. Now, I have suggested that the keys for Vigenere ciphers could come from book or movie titles, or specific paragraphs from the book, but this is more in keeping with running key ciphers (more about those below). Instead, book ciphers generally involve counting words or letters in the text of the book, and can be thought of as being one of two types.

In the first type, you count the words in the book, and use the words corresponding to the numbers to build the ciphertext. For example, in this blog entry, word 1 is “In,” 2 is “the”, 3 is “last,” etc. Therefore, if the ciphertext is

8 16 17 23 21

Then the message is “Beale was planning this book.”
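
The word lookup is simple enough to script. A minimal Python sketch, using the first paragraph of this entry as the “book” and treating anything that isn’t a letter as a word break (the function names are mine):

```python
import re

TEXT = ("In the last post I mentioned the Beale Ciphers (AKA: the Beale "
        "Papers). So, I was planning on getting into book ciphers this time. "
        "But, the more I thought about them, the more they bothered me.")

def book_words(text):
    """Split the text into words, ignoring punctuation."""
    return re.findall(r"[A-Za-z]+", text)

def decode_book_code(numbers, text):
    """Replace each 1-based word number with the word at that position."""
    words = book_words(text)
    return " ".join(words[n - 1] for n in numbers)

print(decode_book_code([8, 16, 17, 23, 21], TEXT))
# Beale was planning this book
```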

This method is more accurately a code rather than a cipher, and the text becomes a code book. The greatest weakness here is that your book needs to contain the words you plan on using. That is, if you are going to write home about troop movements and your code book is “Healthy Southern Cooking,” you’re in trouble. The thing is, you have to select a book that you would reasonably be expected to carry, and if your cover is that you’re a financial planner, having “Jane’s Book of Tanks” might draw unwanted suspicion towards you.

This is where the second type of book cipher comes in. It is a true substitution cipher, and because it allows for multiple values of all letters in the alphabet, it is homophonic. This time, we either use the first letter of each word, or we count all the letters in the words (starting from some page and/or paragraph either from the front or back of the book). Using just the first letters of each word from this blog, starting from the top, the message

13 10 3 6 11 18 4

would read “palmtop”.

Interestingly, I don’t have any words in the first paragraph that start with “e,” “d” or “n,” so my initial plan to spell out “cipher,” and “plaid plan” didn’t work out. I would need a much larger block of working text to get 10-15 words that start with “e”; and “x” and “z” might never show up in the book at all. One work around for “x” would be to treat “ex” (expect, exact) as substitutes.
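
To see which letters a given stretch of text can and can’t supply, it helps to build the whole first-letter table at once. The Python sketch below does that for the first paragraph of this entry, then enciphers a word by picking one of the available numbers at random for each letter. The names are invented for the example.

```python
import random
import re
from collections import defaultdict

TEXT = ("In the last post I mentioned the Beale Ciphers (AKA: the Beale "
        "Papers). So, I was planning on getting into book ciphers this time. "
        "But, the more I thought about them, the more they bothered me.")

def first_letter_table(text):
    """Map each letter to the 1-based numbers of the words starting with it."""
    table = defaultdict(list)
    for number, word in enumerate(re.findall(r"[A-Za-z]+", text), start=1):
        table[word[0].lower()].append(number)
    return dict(table)

def encode(message, table):
    """Pick one of the available word numbers at random for each letter."""
    return [random.choice(table[letter]) for letter in message.lower()]

def decode(numbers, text):
    """Read off the first letter of each numbered word."""
    words = re.findall(r"[A-Za-z]+", text)
    return "".join(words[n - 1][0].lower() for n in numbers)

table = first_letter_table(TEXT)
missing = [c for c in "abcdefghijklmnopqrstuvwxyz" if c not in table]
print("letters with no words:", "".join(missing))

numbers = encode("palmtop", table)
print(numbers, "->", decode(numbers, TEXT))
```

Because `encode` chooses at random among the homophones, running it twice will usually give two different number strings for the same word, and both decode correctly.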

We can get around this last problem by counting all the letters (I = 1, n = 2, t = 3, h = 4, e = 5), but this gets ugly fast, as we get into the 10,000’s before reaching chapter 2 of the book. That’s really not a problem, though, as we could be fine just using the first 3-4 pages of our book. In Arthur Conan Doyle’s The Valley of Fear, Doyle has his cipher message start with the page number and column of the cipher book where you’re supposed to start counting. I view this as a weakness in this kind of cipher. It’s going to be a gimme for any authority member that has a copy of the message, and a list of all of the books you’re carrying. It also illustrates part of why I dislike book ciphers as a practical way to exchange messages on a regular basis.

37 103 2 7 59 39 139 = run away

They’re slow! They’re bulky! They’re error-prone!
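
Counting letters by hand is exactly the kind of job a script is good at. A small Python sketch of the letter-count version, again using the first paragraph of this entry as the source text and counting only letters (spaces and punctuation are skipped); the function name is mine:

```python
TEXT = ("In the last post I mentioned the Beale Ciphers (AKA: the Beale "
        "Papers). So, I was planning on getting into book ciphers this time. "
        "But, the more I thought about them, the more they bothered me.")

def decode_letter_numbers(numbers, text):
    """Replace each 1-based letter number with the letter at that position."""
    letters = [c for c in text if c.isalpha()]
    return "".join(letters[n - 1].lower() for n in numbers)

print(decode_letter_numbers([37, 103, 2, 7, 59, 39, 139], TEXT))
# runaway
```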

You really need to double-check the letter distributions of the book first. In my example from this blog entry, “u” only shows up three times, and “y” doesn’t appear until near the end of the text. If the plan is to have multiple instances of the most common letters, your book needs to allow for that. But, that can be someone else’s problem since HQ or your handler may be the ones picking the book for you.

Let’s say that you’re a spy, and you’re in some other country. If you get searched, you don’t want to have a blatant code book on you, and whatever book you do have should be in keeping with your persona. But, what if the authorities look in the book? If there are numbers over each of the letters, or at the beginning of each word, that’s going to look suspicious. And, if you wrote up a table with the alphabet and strings of numbers representing each letter to save you the effort of numbering all the words every time you want to send a message, that’s going to be equally suspicious. This means that whenever you want to send an encrypted message back home, you’re going to have to recount all the letters in your book over and over again. It’s going to be easy to make a mistake, and you’ll spend a lot of time recounting the letters to make sure you didn’t slip up. You don’t want to do this all the time, and you certainly wouldn’t want to rely on a book cipher in an emergency when you only have a few minutes to write up your ciphertext before making an escape.

So, book ciphers for sending messages back home are a pain. But, it wouldn’t be so bad if HQ used the book cipher for sending instructions TO you. If you only receive messages occasionally, deciphering them wouldn’t be so bad and HQ could use software to do the letter counts to ensure that the numbers are correct. And, in this situation, HQ could assign different books to different agents, and keep the titles for each book in a file somewhere. In fact, it would be trivial to index each book once, build up a homophonic table for the entire alphabet for each book, and just use the correct table when encrypting messages to send to any given agent. Life would be easy for HQ, just not for the people in the field.

There’s another question here, though. If the idea is to avoid arousing the suspicions of the authorities by carrying an innocuous book with you, why are you expecting to get searched? Yes, there may be routine checkpoints where you may have to hand over your passport, or your papers, but having all of your belongings checked probably means that they suspect you already. In which case, you can expect that they’re going to record the titles of the books in your possession. But, that’s really not going to help anyone very much unless they’ve managed to intercept an encrypted message or five. Meaning that if they have a message, and they have you, any book you’re carrying will be used to try cracking said message. And, if that works, and they still let you go, you know that they’re planning on encrypting their own messages to send to your HQ, too.

If we look at the wiki entry on book codes, we can see that they’re very popular in fiction. In real life, there’s a mention of Benedict Arnold’s Arnold cipher, and there’s something called Cicada 3301, which I’ve never heard of before. And, of course, there’s the Beale Papers.

One of the more common examples for books for use with the book cipher is the Bible, because it’s easily found anywhere, and Christians can be expected to carry a copy with them. Naturally, if the enemy intercepts a ciphertext that has page and paragraph numbers on it that kind of look like Bible passages, the first book they’re going to check is the Bible. But, this does suggest a solution to the above issue of the authorities looking at your belongings. If you don’t need the book for enciphering or deciphering messages every day, then don’t keep a copy with you. Pick a book that you can buy at any bookstore when you need it. Or, use the Gideon’s Bible left in the desk of your hotel room.

With a Bible cipher, you can use passage numbers to narrow down the text to a specific passage, and just count the letters in that passage. Another option is to use the page number, the line number on that page, and then the letter count value (i.e. – 3/14/20 = page 3, line 14, 20th letter in from the left). But, this weakens the cipher somewhat by making it look more like a book cipher.

If you haven’t heard of the Beale Papers, you can check the Wiki entry. Letter #2 is the only one that’s ever been publicly decrypted, and that used the Declaration of Independence. Even so, there are errors, and certain adjustments that need to be made to do the decryption. Can you imagine the amount of work necessary for counting three separate documents, up to 1000 words each, if letters #1 and #3 really do use different source books? Why was the Declaration so easily obtained by Beale, when the other two letters apparently use cipher keys that don’t involve any other book or document that could reasonably be expected at the time to be found at a bookstore, library, or in the possession of the innkeeper where Beale stayed? I’m leaning toward the theory that the Beale Papers are a hoax perpetrated to sell the pamphlets claiming to contain instructions for where to find the “Beale treasure.”

Additionally, if you look at the treasure with a critical eye, do you really think anyone in Beale’s party would have allowed him to leave by himself with entire wagons filled with gold and jewels? Through native American territory? I wouldn’t have. I would have stuck to those wagons like glue all the way back to Tennessee, probably along with everyone else in that mining party. And then I would have cashed in my share the second I got to a big enough city.

Anyway, book ciphers. If I were to implement a book cipher today, I’d make the book something available for download online. I’d put 30-40 ebooks on a tablet, and have a simple app to do the word counts for me and generate the letter table automatically. If necessary, I’d delete the table when I was done, and I’d have a system where different books would be used as the key based on the day of the month, or have the book title somehow incorporated in the ciphertext so that the message would tell you which book to use. Alternately, selecting books from copyright-free hosting sites would also work. But, maybe I wouldn’t pick “The Dancing Men,” “The Gold-Bug” or Doyle’s “The Valley of Fear” (which revolves around a book cipher using Whitaker’s Almanac).

I’ll mention here that book ciphers are often referred to on the net, and in fiction, as Ottendorf ciphers. The wiki talk page for book ciphers suggests that the name may come from Major Nicholas Dietrich, Baron de Ottendorf, “a German mercenary at the time of the American Revolution.” He worked for Major Andre, when Andre was negotiating with Benedict Arnold (using the Arnold cipher) in the failed attempt to surrender West Point to the British in 1780. Andre, and his superior, General Henry Clinton, were supposedly known to use book ciphers, but there’s no clear explanation for why they’re named after Ottendorf.

Specifically, Ottendorf ciphers consist of number triplets separated by hyphens (i.e. – 5-12-3). The first number is the line number for the page used for the cipher. The second value is the word in that line, and the third is the number of the desired letter in that word. The first number in the cipher can be the page used for that message.
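
Here is a rough Python sketch of the triplet lookup. The three-line “page” is something I made up for the example, words are split on spaces, and all counts start at 1.

```python
# Hypothetical page text, one string per printed line.
PAGE = [
    "the quick brown fox jumps",
    "over the lazy dog while",
    "seven wizards quietly juggle",
]

def decode_triplets(message, page_lines):
    """Decode line-word-letter triplets such as '1-4-1' against one page."""
    letters = []
    for triplet in message.split():
        line, word, letter = (int(n) for n in triplet.split("-"))
        letters.append(page_lines[line - 1].split()[word - 1][letter - 1])
    return "".join(letters)

print(decode_triplets("1-4-1 2-1-1 1-4-3", PAGE))
# fox
```

Note that the lookup depends entirely on where the lines break, so both sides need the same printing of the page.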

Ottendorf is used in the film National Treasure, while the keys are Benjamin Franklin’s Silence Dogood essays. I have to admit that I haven’t seen the movie, and all I can find online is a clip on youtube showing the numbers being transcribed from the back of the Declaration of Independence, and a reference to the Dogood essays. The problem is in finding out which essay was used, and getting a photocopy of it (the online transcriptions of the letters are not formatted properly, and they don’t use Franklin’s original spellings.) At the moment, I don’t know if the movie shows a real cipher, or just fakes it.

The first few triplets are:
10-11-8 10-4-7 9-2-2 14-8-2 18-7-4 13-10-4 1-5-1 5-8-1

This supposedly corresponds to “The vision to see the treasured past, comes as the timely shadow crosses in front of the house of Pass and Stow”, but I can’t recreate that using what I have. The problem with Ottendorf in the computer age is that you need to use a source text where the line breaks don’t change when you resize the browser. That is, you really need a paper copy of the book or source text, or a scan thereof.

Now, for the Running Key Cipher. In fact this IS the Vigenere cipher, just using text from some desired book to create really long keys. And, this is just an extension of what I was alluding to in my paint mixer algorithm. (Note that by using a book key, RK is actually much, much easier to break than the Vigenere cipher is. See the next blog entry.)

To use the Running Key Cipher, pick a book, any book. You then want to tell your recipient what page and line number to start from. This way, you can reuse this book for as many messages as you want, and just change the starting page and line numbers. In the key, you need to remove all spaces and punctuation, but retain letter repetitions.

I’m going to use Verne’s Journey to the Center of the Earth as my book, but because I’m using the online version at etc.usf.edu, I’m saying that Chapter 1 starts with page 1. I’ll then start with line 4, including the blank line in the line count. My cipher key will start with “Marthamusthaveconcludedthatshewasverymuchbehind”, and can be made as long as my plaintext. And my plain text is, “Cheese burgers are the cheesiest burgers I know.”

Marthamusthaveconcludedthatshewasverymuc
CheeseburgersarethecheesiestburgersIknow
———————————
OHVXZENOJZLRNETSGJPWKIHLPELLIYNGWMWZIZIY

We break this string up into groups of 5 (or whatever), adding padding as necessary:

OHVXZ ENOJZ LRNET SGJPW KIHLP ELLIY NGWMW ZIZIY
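
The encryption and grouping steps above can be reproduced with a short Python sketch. It assumes plain Vigenere addition with A=0, strips spaces and punctuation from both key and plaintext, and pads the last group with “X” if needed (the padding letter is my assumption, since this particular message divides evenly into fives).

```python
import re

def letters_only(text):
    """Uppercase the text and drop everything except A-Z."""
    return re.sub(r"[^A-Z]", "", text.upper())

def running_key_encrypt(plaintext, key_text):
    """Add each key letter to the matching plaintext letter (A=0, mod 26)."""
    plain = letters_only(plaintext)
    key = letters_only(key_text)
    if len(key) < len(plain):
        raise ValueError("running key is shorter than the plaintext")
    return "".join(
        chr((ord(p) + ord(k) - 2 * ord("A")) % 26 + ord("A"))
        for p, k in zip(plain, key)
    )

def group(text, size=5, pad="X"):
    """Split into blocks of `size`, padding the last block if needed."""
    text += pad * (-len(text) % size)
    return " ".join(text[i:i + size] for i in range(0, len(text), size))

KEY = "Marthamusthaveconcludedthatshewasverymuchbehind"
PLAIN = "Cheese burgers are the cheesiest burgers I know."

print(group(running_key_encrypt(PLAIN, KEY)))
# OHVXZ ENOJZ LRNET SGJPW KIHLP ELLIY NGWMW ZIZIY
```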

Finally, starting with A=0, we encode the 3-digit page number and the 2-digit line number. For page 1, line 4 (001 and 04), this becomes AABAE. The wiki entry suggests putting our encoded number in the second-to-last grouping of the ciphertext, like:

OHVXZ ENOJZ LRNET SGJPW KIHLP ELLIY NGWMW AABAE ZIZIY

but we can actually use any method we like, as long as both sender and recipient agree to it beforehand. Personally, I think reversing the encoded number (EABAA) and putting one of its letters at the beginning of each of the first five groups, as follows, seems a little less obvious, assuming whoever intercepts our message has also read the wiki article.

EOHVX AZENO BJZLR ANETS AGJPW KIHLP ELLIY NGWMW ZIZIY

Other variants of the Running Key cipher have already been covered in previous blog entries, such as using the running key for a binary substitution, XORing the individual bits of the plaintext with the individual bits of the key.

The wiki article also talks about “Ciphertext appearing to be plaintext.” I like this concept, because it makes the cipher look less like a cipher. The idea here is the classic military “Alpha Bravo Charlie” substitution of words for individual letters. If you have a dictionary and a random number generator, the above text could turn into something like:

“Evidently other humans viewed Xerox as zero-entried neutral objects.” etc.

At a cursory glance, this “cipher as plaintext” just looks like a badly-written technical paper. It does NOT look like a secret message, and this makes the message somewhat less likely to be confiscated during a casual body search.
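
A bare-bones version of this word substitution can be sketched in Python. The word list below is a tiny stand-in I made up (a real attempt would use a proper dictionary and some effort at grammar), and the function names are invented.

```python
import random

# Hypothetical word list: at least one word for every letter of the alphabet.
WORD_LIST = (
    "alpha apple bravo bird charlie cat delta dog echo egg foxtrot fish "
    "golf gate hotel hat india ink juliett jam kilo key lima lamp mike moon "
    "november nest oscar owl papa pen quebec quilt romeo rain sierra sun "
    "tango tree uniform urn victor van whiskey wall xray xerox yankee yard "
    "zulu zero"
).split()

BY_LETTER = {}
for word in WORD_LIST:
    BY_LETTER.setdefault(word[0].upper(), []).append(word)

def disguise(ciphertext):
    """Replace each cipher letter with a randomly chosen word starting with it."""
    return " ".join(random.choice(BY_LETTER[c]) for c in ciphertext if c.isalpha())

def recover(cover_text):
    """Read the cipher letters back off the first letter of each word."""
    return "".join(word[0].upper() for word in cover_text.split())

cover = disguise("OHVXZ ENOJZ")
print(cover)
print(recover(cover))  # OHVXZENOJZ
```

The output of this sketch is word salad rather than a plausible sentence, so it wouldn’t survive more than a cursory glance. Making the cover text read naturally is the hard part, and the script doesn’t attempt it.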

Summary:

1) Book codes use books as the code book, substituting location numbers for full words.
2) Book codes require the book to contain the words you plan on including in your plaintext, which restricts the kinds of books you can use.
3) Book ciphers substitute word or letter location numbers for the plaintext on a letter-by-letter basis.
4) Numbering can start from the beginning or end of the book, from any given page or line.
5) If your job includes the occasional body search, you can’t have the numbers written in the book, or tables of numbers on a sheet of paper. So, you have to recount all the numbers every time you need to encipher a message.
6) Book ciphers are time-consuming, and prone to miscounts.
7) They’re better for messages from HQ to operative, rather than the other way around.
8) They’re popular in fiction.
9) Pick ebooks, and have multiple texts on a tablet, plus indexing software.
10) Book ciphers are very hard to break if you use obscure books that can be easily downloaded from the net when needed.

11) Running Key Ciphers can use a Vigenere table, and then really long keys from a book.
12) RK ciphers are NOT book ciphers.
13) Unlike Vigenere, RK can be easy to break (see next blog entry).
14) The page and line numbers are encoded into the ciphertext.
15) All cipher types that consist of letter substitutions (“A-Z”, “a-z”) can employ “ciphertext as plaintext,” where individual cipher letters are replaced by randomly chosen whole words starting with that letter (i.e. – “a” = “alpha,” “a,” “at,” “Arnold.”)