The Code Book – Cipher 7, part 4

The computer fan is noisy, so I break the program, make note of the last permutation calculated, turn the laptop off and go to bed. The next morning, I get ready for work, restart the script where I left off and leave. I return in the afternoon, and I have a few more files stored with 5 hits on 3-letter strings, but the distributions are all wrong. That is, each file has one string that appears 5 times in the permuted ciphertext, but that’s it. There should be several strings in the list at 5 hits, not just the one. A couple of the permutations have one string that appears 3 times, and between 5 and 10 strings that occur twice each. And none of the strings with 5 or 3 hits are bracketed by anything that could be treated as a space character.

The script still isn’t finished, and I’m falling back on the one remaining possibility – I’m doing something wrong with the ciphertext and corrupting the text I’m trying to permute. I’m thinking harder about printing out the script and the ciphertext and sitting down and working everything out by hand, from scratch.

But I still have two things working for me. First, I’ve saved all my permutation frequency counting results files, so I don’t need to redo them on the off chance that I’ve done it right. Second, I’ve had a sudden flash of insight on a weakness in this particular cipher, something that I really should have seen before.

100 years ago, when this cipher was first used, the idea of running all 40,000+ permutations (8 columns, so 8! = 40,320 orderings) of a long ciphertext by hand would have been unthinkable. With a modern computer, depending on exactly what it is you’re doing, it may take as little as 2 hours to complete a moderately complex task.
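As a rough illustration of stopping the run overnight and picking it up again, here is a minimal sketch of a resumable permutation generator. The column labels and the name `orders_from` are made up for this example; it only assumes the orders are visited in the fixed order that `itertools.permutations` produces, so a saved index is enough to resume.

```python
from itertools import islice, permutations
from math import factorial

COLUMNS = "12345678"  # assumed labels for the 8 ciphertext columns


def orders_from(start_index=0):
    """Yield (index, column order) pairs, skipping the first start_index orders."""
    remaining = islice(permutations(COLUMNS), start_index, None)
    for index, order in enumerate(remaining, start_index):
        yield index, "".join(order)


total = factorial(len(COLUMNS))     # number of column orders to try
first = next(orders_from())         # where a fresh run begins
resumed = next(orders_from(40319))  # resuming at the very last order
```

Writing the last index to a file before shutting down, then passing it back in as `start_index`, avoids recomputing anything already stored.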

So… I talked about character frequency counts before, and the German alphabet frequency table. When I first ran my frequency counts, I was just looking at one pair of columns at a time to see which individual pairs to pick before matching them all together. That is, do columns 1 and 2 potentially look better together than columns 1 and 3, or 2 and 3, etc.?

However, say I just pick the column order as 12345678, pairing up 1 and 2, 3 and 4, 5 and 6, and 7 and 8, build up my cipher string this way and count the frequencies of each pairing (i.e. “MU”, “UP”, “CU” and so on). Then I do the same thing for 21345678 (for “UM”, “UP” and “CU”), all the way up to 87654321. You may say, that’s a lot of work that just duplicates what I’ve done before without adding anything new.
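The pairing-and-counting step might look something like the sketch below. The single table row and the function names are invented for illustration; the real ciphertext table and my actual script are not shown here.

```python
from collections import Counter


def pairs_for_order(rows, order):
    """Reorder each row's columns by `order`, then pair them up 2 at a time."""
    pairs = []
    for row in rows:
        cols = [row[int(label) - 1] for label in order]  # labels are 1-based
        pairs.extend(a + b for a, b in zip(cols[0::2], cols[1::2]))
    return pairs


def pair_counts(rows, order):
    """Frequency count of every 2-character pairing for one column order."""
    return Counter(pairs_for_order(rows, order))


rows = ["MUPCTEOR"]  # made-up row of 8 ciphertext characters
straight = pairs_for_order(rows, "12345678")
swapped = pairs_for_order(rows, "21345678")
```

Swapping columns 1 and 2 only changes the first pairing in each row, which is why so much of the counting work overlaps from one order to the next.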

And you would be right. But let’s pretend that the true pairings are 25, 16, 74 and 83, and that I don’t know this yet. And let’s say that the plaintext is really “I eat codes.” Remembering that the keyword is 8 ciphertext characters long, the scrambling is actually going to be for every 4 plaintext characters. Even though the words I make will be all jumbled up if I get the order wrong (“I ea”, “t co”, “des.”; or “aI e”, “ot c”, “.des”; or “eaI ”, “cot ”, “”), what WON’T change are the frequency counts of those letters.

In other words, if I run all 40,000+ permutations, and store the frequency counts to a file, somewhere in there will be 24 permutations of 25, 16, 74 and 83 (e.g. 16257483, 74258316 and 83162574) which will ALL HAVE THE EXACT SAME FREQUENCY COUNTS.
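A quick way to convince yourself of this is to count pairs for all 24 arrangements of the same four pairings and check that only one distinct set of counts comes out. The rows below are made up, and `pair_counts` is a hypothetical helper, not my original script.

```python
from collections import Counter
from itertools import permutations


def pair_counts(rows, order):
    """Count the 2-character pairings produced by one column order."""
    counts = Counter()
    for row in rows:
        cols = [row[int(label) - 1] for label in order]
        counts.update(a + b for a, b in zip(cols[0::2], cols[1::2]))
    return counts


rows = ["MUPCTEOR", "UPCUMTRE", "CUMUPOET"]  # made-up table rows
true_pairs = ["25", "16", "74", "83"]        # the pretend pairings from the text

# Every arrangement of the same four pairings, e.g. "16257483".
orders = ["".join(p) for p in permutations(true_pairs)]

# Reduce each Counter to a hashable signature and collect the distinct ones.
signatures = {frozenset(pair_counts(rows, o).items()) for o in orders}
```

The order of the pairs within a row changes the text you read, but not which pairs exist, so `signatures` collapses to a single entry.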

All I need to do is write one more script to find the best matches (technically, there should be two sets of 24 permutations, because the flipped pairs 52, 61, 47 and 38 will also qualify). I do this thing. I run my script. And I get 1000+ output files. Why? Because I overlooked the fact that each batch of “wrong” permutations will also be the result of 24 pairings of the characters that show up on each set of rows. All I’ve done is race down another dead end. (Although, I could have actually done the frequency counting and kept only the permutations with the best distribution patterns.)
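The size of that pile of output files follows from simple counting, no ciphertext needed. This sketch (an after-the-fact illustration, not the script I ran) groups all column orders by the set of pairings they contain.

```python
from collections import defaultdict
from itertools import permutations

# Group every column order by its set of four ordered pairings.
classes = defaultdict(list)
for perm in permutations("12345678"):
    order = "".join(perm)
    key = frozenset(order[i:i + 2] for i in range(0, 8, 2))
    classes[key].append(order)

group_sizes = {len(members) for members in classes.values()}
```

Every order, right or wrong, lands in a group of 24 with identical pair counts, so "has 23 twins" says nothing about which group is the true one.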

Sob. Sob. Yeah, cry me a river. I’m ready to ditch it all and go buy another Zelda game for my 3DS, but there remains only the last possibility that I’ve been putting off facing. That I did something wrong in putting the ciphertext in the table format. But… There’s one last last-ditch move I can make first. I’ve got access to the Swedish team’s notes, and they’ve got the deciphered final paragraph. That gives me the spacing between words. Assuming that the substitution table includes (and uses) spaces between words consistently, then I can do a character count for the first 10 words or so of the final plaintext paragraph and then use that to go through my ciphertext to search for any character that occurs at identical intervals. So, I write up another script specifically for this purpose, embed the permutation generator, and let it run all 40,000+ variants of the ciphertext.
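The space-interval check can be sketched roughly as follows. Everything here is an assumption for illustration: the toy substitution table, the sample text and the function names are invented, and the only idea carried over is "find a symbol that sits at exactly the gaps the known word lengths predict".

```python
def space_offsets(words):
    """Offsets of the spaces between words, measured from the first letter."""
    offsets, pos = [], 0
    for word in words[:-1]:
        pos += len(word)
        offsets.append(pos)  # a space follows this word
        pos += 1
    return offsets


def find_space_symbol(symbols, words):
    """Return (start, symbol) wherever one symbol sits at every expected space."""
    offsets = space_offsets(words)
    span = sum(len(w) for w in words) + len(words) - 1
    hits = []
    for start in range(len(symbols) - span + 1):
        candidate = symbols[start + offsets[0]]
        window = symbols[start:start + span]
        at_spaces = all(symbols[start + o] == candidate for o in offsets)
        # The candidate must not also appear inside the words themselves.
        if at_spaces and window.count(candidate) == len(offsets):
            hits.append((start, candidate))
    return hits


text = "zzI eat codes"  # toy plaintext with two junk characters in front
toy_table = {ch: f"{i:02d}" for i, ch in enumerate(sorted(set(text)))}
symbols = [toy_table[ch] for ch in text]  # stand-in for ciphertext pairs
hits = find_space_symbol(symbols, ["I", "eat", "codes"])
```

With only a few words the pattern is weak, which is why using the first 10 words or so makes a false hit much less likely.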

No hits. Final failure of the sanity checks.

So, I go back to the Singh book and read the appendix. I pull out a pencil and notebook, and I very carefully follow each step. Drumroll – yeah, I’m kicking myself in the head now.

But… you know that old adage from Thomas Edison? “I didn’t fail. I just proved 10,000 ways that something doesn’t work.” (paraphrased.) Yes, I now know what a bad decrypting approach looks like. All the curves, all the frequency counts come out more or less linear. That’s good to know. Plus, I now have all my scripts pre-written, for frequency counting, 3-character word counting, finding bracketing characters, and the permutator.

All I need to do is change the section where I build up the transformation table, and that’s really easy. But now, I’m tired and grouchy, so I skip character counting and just jump right in on looking for three-character groupings and bracketing, brute-force. This takes about 5 seconds, so I know something’s wrong. That is, one of the very first permutations, 23456781, comes back with 65 hits on one three-character group, with brackets (i.e. – “pCy9p”) in the first 25% of the ciphertext string (I’m checking only the first 881 characters again, for speed). I look in the output file, and there’s lots of bracketing, and the distribution on the three-character strings is looking more reasonable. But still, the transposition key is too simple. So I run the script again and let it finish a few hours later. And sure enough, there are only two column permutations with really high 3-character counts, and they’re mirror reflections of each other.
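For what the bracket search is doing, here is a minimal sketch. It assumes the permuted ciphertext has already been turned into one placeholder character per pair; the sample string and the name `bracketed_trigrams` are made up.

```python
from collections import Counter


def bracketed_trigrams(text):
    """Count 3-character groups that have the same character on both sides."""
    counts = Counter()
    for i in range(len(text) - 4):
        bracket, group = text[i], text[i + 1:i + 4]
        # The bracket must close the group and must not occur inside it.
        if text[i + 4] == bracket and bracket not in group:
            counts[(bracket, group)] += 1
    return counts


sample = "pCy9pXXpCy9p"  # made-up text containing "pCy9p" twice
counts = bracketed_trigrams(sample)
```

A bracket character that racks up many hits across many different groups is a good candidate for the space.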

I take this as a sign, but just to be sure, I run the space interval checker again. It hits on 23456781 and 15 other variants (“23457618”, “23457681”, etc.). I plug a couple bracket characters into the substitution table, and can’t really tell if the word breaks make sense or not in German. But, there’s one thing I do know, and that is German has the words “ein”, “eine” and “einer”. And I can do search and replaces on the ciphertext to run another sanity check. At this point, I’m using the key “23457618” with a semi-randomly generated look-up table (I just wrote out the characters A-Z, a-z, 0-9, . and *). All that remains is to start putting the correct characters in the correct places in the table.
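The throwaway look-up table can be built in a couple of lines. This is a sketch under assumptions: the eight cipher letters "ABCDEFGH" are placeholders rather than the real ones, and the helper names are invented. The 64 placeholder characters are the ones listed above.

```python
import string
from itertools import product

# A-Z, a-z, 0-9, "." and "*": 64 placeholder characters in all.
PLACEHOLDERS = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + ".*")


def placeholder_table(cipher_letters):
    """Map every 2-letter pair to one placeholder character."""
    pairs = ["".join(p) for p in product(cipher_letters, repeat=2)]
    if len(pairs) > len(PLACEHOLDERS):
        raise ValueError("more pairs than placeholder characters")
    return dict(zip(pairs, PLACEHOLDERS))


def substitute(pairs, table):
    """Turn a list of pairs into one placeholder character each."""
    return "".join(table[p] for p in pairs)


table = placeholder_table("ABCDEFGH")  # assumed letters, for illustration only
preview = substitute(["AA", "AB", "HH"], table)
```

The placeholders mean nothing by themselves. They just give each pair a single stable character, so that search-and-replace in a text editor works as real letters get identified.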

It takes me a while to realize that certain words are coming out wrong (I’m using an online German dictionary at this point). A couple of the letters are doubling up (e.g. – I=n and o=n). That’s when I realize that, yeah, it’s still possible to have the 2-column pairings made right, and still have the pairs themselves out of order (“12345678” versus “12563478”). There’s one word I know is in the plaintext – “maschin”. I compile all 16 permutations of the ciphertext key, and then do a find in notepad on “maschin”. That locates the one column permutation that I want, which again turns out to be “23456781”. And then it’s just a matter of interactively displaying the text and plugging in values for the substitution table.
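One way to line up the 16 candidates and hunt for a known word is sketched below. It assumes the 16 variants are the ones made by flipping the two columns inside each pair (which matches “23457618” and “23457681”), and the decoded texts in the example are invented stand-ins.

```python
from itertools import product


def flip_variants(order):
    """All column orders made by flipping, or not, each 2-column pair."""
    pairs = [order[i:i + 2] for i in range(0, len(order), 2)]
    variants = []
    for flips in product((False, True), repeat=len(pairs)):
        variants.append("".join(p[::-1] if flip else p
                                for p, flip in zip(pairs, flips)))
    return variants


def orders_containing(crib, decoded_by_order):
    """Column orders whose decoded text contains the known word."""
    return [order for order, text in decoded_by_order.items() if crib in text]


variants = flip_variants("23456781")

# Made-up decoded texts standing in for real output files.
decoded = {"23456781": "die maschine", "23457618": "dni masihcne"}
matches = orders_containing("maschin", decoded)
```

This is the scripted version of doing a find in Notepad on each file: only the order that keeps the letters of the known word together survives.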

However, I don’t know German, and there are still a number of incomplete words that aren’t included in the Swedish team’s notes. I copy a large enough section of the plaintext that I do have done, and do a Yahoo search. That’s when I come up with 20+ PowerPoint presentations on the net from people who have also cracked this code, and have much more complete explanations for their work. Sigh. Anyway, I’m keeping my scripts. Who knows. They may turn out to be useful someday. In the meantime, there are only 3 ciphers from the Code Book contest that I haven’t tackled – one on the Enigma machine, and two for RSA encryption. I have no interest in RSA, and I’m too tired to tackle Enigma at this time. Maybe later in the Fall, or something.
