Thinking About Encryption, Part 9


There’s a concept in mathematics that revolves around changing domains, or changing systems. By this, I mean that you can have an equation that you map in a Cartesian coordinate system, which is essentially the x-y grid, and the equation can be in the form of y = a*x + b. Quite often, though, the problem you want to solve may not be that easy in a Cartesian system, but it could be trivial when using polar coordinates, which are represented by a radius and an angle. Or in vector space, with a magnitude and a direction. To go from one system to the other, you need a conversion formula. An alternative example could be using the Fourier transform to switch between the time and frequency domains.

The question then becomes, are there “systems” in cryptography? Is there a moral equivalent to Cartesian and polar coordinates? In a gross sense, I think we could say that are two types. The first is on the message level, and the other is on the media level.

For anyone new to ciphers, it would seem that enciphering is either a matter of moving the plaintext letters around (transposition), or changing them to something else (substitution). I’ve only really addressed rail fence, Scytale, Caesar and Vigenere, and if those were all the algorithms anyone had ever come up with, we could be forgiven in thinking that the point is to just manipulate individual letters.

But, look at the English language. We have phonemes, letters, syllables, words, sentences and paragraphs. And, on a meta level, we have nouns, verbs, adjectives, adverbs and everything else. On a message level, we could (and many other people already have) develop algorithms that encrypt plaintexts on a non-letter basis. It’s a bit tricky trying to define phonemes for individual words if the sender and the receiver have different accents. You’d need to go with the dictionary definitions or fixed 2- and 3-letter combinations, but it is still possible to develop a random substitution table for phonemes, either to other phonemes, or to numbers. As an example, “cram” would turn into “‘kram”, and “through” would become “thru” (where “u” has the umlauts). The Dyslexia Reading Well lists 44 phonemes in the English language, so we could either code them as upper or lower letters (the 26 upper plus 26 lower letters, which would allow for some duplications), or simply as the numbers 1-46 (if you don’t use delimiters, then 1-9 would need leading “0” padding). If you want to get nasty, use hexadecimal pairs (00-FF), which would give you 0-255 potential assignments.

Syllable substitutions would work pretty much the same way. The Quora page answering the question “How many syllables are there in English” gives two different answers: over 10,000, and exactly 15,831. Given that, to do the encryption, you’d need an agreed upon substitution table, you can simply pick the syllables, or letter groupings, you care about, and use those. In this case, it would probably  be easier to use number values. That is, “gi” = 1, “ven” = 2, “tt” = 3, “le” = 4, “ers” = 5. “Given letters” would then be 1 2 4 3 5. Any text not encrypted at the syllable level could be treated as individual letters and given numeric values like 100-200, if you like.

Ok, when we come to words, or word groupings, it might seem that we should give up. Merriam-Webster had 470,000 entries in their 1993 Addenda edition, and there’s the question of what’s an English word? Do we include “teriyaki,” “sushi,” and “cannoli”? But, what if we mix systems? Pick the 100 most common words and give them number values. (I.e. – “the” = 1, 6 and 9; “if” = 2, 10 and 20; “may not” = 7 and 8). Everything else can be represented as letters, phonemes and syllables.

Technically, there’s no point in trying to substitute entire sentences or paragraphs, but there’s no reason we can’t transpose them. We could resort to Scytale and rail fence on a sentence-by-sentence, or paragraph-by-paragrah basis. Alternatively, the substitution table we use above could change based on a key from one sentence or paragraph to the next (hello again, Vigenere).

For the second system, the “media,” we have the computer. As mentioned above, it can be easy for a beginner to get trapped in the idea that a plaintext message is simply whatever you have written down on paper, or typed with a typewriter, as a string of letters, digits and symbols. Even if we make the transition to a computer, we see these characters as represented by the ASCII table. “A” = decimal 65, hex 41; “Z” = 90 decimal, hex 5A. In fact we are performing a random substitution all the time when we type messages up on a computer, because 54 68 65 = “The”. Applying a simple Caesar cipher with a shift of one gives use 55 69 66, or “Uif”. Cracking a random substitution cipher can be done as easily for “Uif” as it is for 55 69 66, through the application of frequency analysis.

The mistake is to not change number systems. ASCII is based on 8-bit binary. 00000000 = 0, and 11111111 = 255 (if we use hex, then 00-FF). General ASCII just uses 0-127 (00-7F), while the upper half of the range (80-FF) can be any graphics characters you like.  This is fine for the English alphabet, but languages like Chinese and Japanese use non-phonetic pictogram systems that can easily require 2,000-5,000 characters for high school-level literacy. Which is where Unicode comes in. One of the most commonly-used Unicode versions is UTF-16, which is a 16-bit variable-width encoding system. Unicode is important to understand if you’re dealing with “letter level” substitutions as described above. Otherwise, at the bit level, we don’t care.

This next part is a bit off-topic, but the second we go binary, we can start talking about filename extensions. .doc indicates that the file is a Word document; .jpg is a JPEG image file, and .mov is a video file that includes audio. Up to this point, I’ve only been concerned with .txt, or ASCII textfiles. But these extensions just represent format conventions for data files used by specific applications. Ignore the application that created the file and now you just have an arbitrarily long string of 1’s and 0’s in sequence. Who cares what the “plaintext” file was? Just move the bits around and the application (or the person trying to access the data in the encrypted file through that application) can’t read it.

To do this, we still have substitution and transposition. The most obvious choices would be Scytale or rail fence, but we can also use swapped pairs.

“The” was 65 41 90, which is also
0110 0101 0100 0001 1001 0000

Swapping pairs of bits, we get
1001 1010 1000 0010 0110 0000 (9A 82 60)

Both 9A and 82 are outside the regular ASCII character range, and will appear as strange graphics shapes. Unfortunately, this is no different than a regular random substitution, and will fall to frequency analysis.

Scytale with a width of 5 gives us,
01100
10101
00000
11001
00000 (and adding 0 for padding)

Or,
01010 10010 11000 00000 01010

Going back to hex pairs, with more 0 padding:
01010100 54
10110000 B0
00000101 03
00000000 00

But, this is where things get a tricky. By adding padding, we’re kind of giving away the width value. Because computers use widths of 2, 4, 8, 16, 32, 64, etc., if the ciphertext is not a multiple of one of those numbers, we can assume padding was added to bring the number of bits up to a multiple of the cipher width. The ciphertext is 25 bits long, and that’s kind of an obvious clue. Next, though, we need to put our ciphertext back into a format the computer can recognize, and that means padding up to 32 bits. We can do that easily by appending 0’s at the beginning or end of the string, but how is our recipient going to know how much padding we’ve added?

Actually, having the key width will help. If the ciphertext is 32 bits long, but the key width is 5, then our recipient can argue as follows: The plaintext length is going to be a multiple of 8 bits, giving me 8, 16, 24 and 32 bits as options. If the plaintext was 8 bits long, then the Scytale portion needs to be padded to 10 bits. For 16, padded to 20. For 24, padded to 25, and for 32, padded to 35. 35 bits is right out, and the closest number to the current 32 bits of the ciphertext is 25. Obviously, there’s 7 bits of padding, either at the front or the back, and then 1 more bit for the Scytale portion. Now, anyone trying to crack this cipher can use the same logic, but they have to apply a bruteforce approach to the ciphertext for every value from 2 to length/2. And, naturally we can choose to reverse some of the columns of the Scytale following whatever rules we want, as well as doing that old “keyword” -> “dekorwy” column shuffling I’ve talked about before for Scytale.

That was transposition. For substitution, we can flip bits:

“The” was 65 41 90, which is also
0110 0101 0100 0001 1001 0000

Flipping 1 to 0, and 0 to 1:

1001 1010 1011 1110 0110 1111 (9A BE 6F)

Again, two of the characters will look like random graphics. And again, this will fall to frequency analysis because it still looks like a random substitution.

Instead, we can exclusive or (XOR) the plaintext against a key. If the key is the same length as the text, say CAT (43 41 54), and we know that the rule for XOR is that 0+1 and 1+0 = 1, otherwise 0+0 and 1+1 = 0, then

0110 0101 0100 0001 1001 0000 (The)
0100 0011 0100 0001 0101 0100 (CAT)
0010 0110 0000 0000 1100 0100 (36 00 B4)

A weakness here is that if we use only human-readable keywords (anything using A-Z, a-z, 0-9 and punctuation), then our keys won’t change the high bit (bit 8), and won’t include values 0-31, which are “non-printable” characters. We can overcome this by using simple hex values for the keys (00-FF) and not worry about trying to make words out of the keys.

And of course, we can combine Scytale with XOR as we like. This brings me back to the idea of making the length of the XOR keyword into the width of the Scytale or rail fence transposition, and reversing the order of the transposition columns if the length is even or odd.

One more complication can be the decision to make the ciphertext human readable. In other words, we can pretend that the resulting output is either ASCII or Unicode. It’s easier to pretend we’re dealing with ASCII. To do this we group the bits in bunches of 7 and prepend a 0 at the front to make 8-bit groupings. For the example, let’s go back to the bit flip,

1001 1010 1011 1110 0110 1111 (9A BE 6F)

We need groups of 7 bits,
1001101 0101111 1001101 1110000

Then we prepend the 0’s to get
01001101 00101111 01001101 01110000
4D 2F 4D 70 = “M/Mp”

The real drawback to this type of algorithm of making 7-bit groupings and prepending the 0 is just that we’re making the file roughly 12.5% larger, which is going to affect file transfer times. But, if the time for sending the file is not an issue, then we’re still good to go. The real reason for doing any of this is that it makes the bruteforce approach to cracking the cipher that much more time consuming. If our opponent has no idea exactly which algorithm we’re using, they have to test every single one, in all conceivable combinations.

Theoretically speaking, of course.

I’m not really sure how well my original point holds up that encryption/decryption has an analogue to coordinate switching in math. But it serves as an introduction to this blog entry well enough.

Advertisements

Wiggles, part 6


Just a little ongoing story to give you something to play with until the next blog post.

E VTX T NOD OP PWYY DERY EZ DVY KZTMB ATW, DVOJFV, TK GYNN TK OZ-ATKY. T MOJCNY OP RH XTX’K PWEYZXK VTX T ATZX, TZX DVYH DTJFVD RY DO CNTH FJEDTW. TZX, RH ROR GTZDYX RY DO NYTWZ YZFNEKV, KO KVY KYZD RY DO T MOJCNY MWTR KMVOONK. DVY POWYEFZ DYTMVYWK TD DVOKY KMVOONK JKJTNNH UJKD KDTHYX EZ UTCTZ POW 6 ROZDVK DO T HYTW TZX T VTNP, TZX DVYH MOJNXZ’D AY AODVYWYX DO AWEZF TNN DVYEW AYNOZFK GEDV DVYR GVYZ DVYH GYZD ATMB VORY TD DVY YZXK OP DVYEW KDTHK VYWY. DVTD RYTZD DVTD DVYH’X UJKD XJRC GVTDYQYW DVYH XEXZ’D GTZD DO BYYC, GVEMV EK VOG E FOD T CTEW OP PWYY TMOJKDEM FJEDTWK PWOR DVYR. E’X BYYC OZY OZ ATKY, TZX DVY ODVYW TD DVY KZTMB ATW, TZX E’X CNTH EP ZODVEZF YNKY GTK FOEZF OZ. GVYZ E PEZEKVYX KMVOON, E MOJNX VTQY FOZY OZ DO JZEQYWKEDH, OW E MOJNX VTQY PONNOGYX RH XTX DO DVY J.K. EZKDYTX, E JKYX RH MOZZYMDEOZK GEDV DVY ODVYW ATWK EZ DVY ZYEFVAOWVOOX DO CEMB JC AOJZMYW GOWB. GEDV DVTD, E’X DTNB DVY OGZYWK EZDO NYDDEZF RY CNTH RJKEM T NEDDNY, TZX RTBY KORY ROWY ROZYH DVTD GTH. YQYZDJTNNH, E FOD FOOX YZOJFV DO KJWQEQY OZ AJKBEZF, AOJFVD RHKYNP T RODOWMHMNY, TZX AYFTZ GTZXYWEZF DVY MOJZDWH TK T PONB KEZFYW.

Thinking About Encryption, Part 8


Building on the algorithm from the last post, because I still haven’t had time to add more ciphers to En/De yet, say we just want to use Vigenere, Scytale and Rail Fence. For the moment, let’s just stick with upper case letters in the plaintext, because that’s what Vigenere was designed around. But, there’s nothing that says we have to. It’s trivial to treat the source file as binary, and build up a Vigenere table for the entire 256-character ASCII set. Our key could then be converted to 2-byte hex, and each number (0-255) would point to a row of the table where the ASCII table is shifted left that many positions. Then, for enciphering, we could use straight code instead of a look-up table, as in:

(pseudocode)

for i :=0 to plainText.length – 1 do
chPlain := plainText[i];
chKey := keyText[mod(i, length(keyText))];
chCipher[i] := mod(chPlain + chKey, 255);
next;

Instead, I’ll just use the normal Vigenere table, which is the upper case alphabet Caesar-shifted one position to the left for each row.

ABCDEFGHIJKLMNOPQRSTUVWXYZ
BCDEFGHIJKLMNOPQRSTUVWXYZA
CDEFGHIJKLMNOPQRSTUVWXYZAB
DEFGHIJKLMNOPQRSTUVWXYZABC
… etc.

As a reminder, if the keyword is JACK, then we use the row that starts with “J” to look up the first letter of the plaintext, the row that starts with “A” for the second letter, the “C” row for the third letter, the “K” row for the fourth letter, then cycle back to “J” for the 5th letter, and so on.

To build on this, let’s allow for a key phrase, which can be as many words long as we like. If the word is “A”, or when we hit a space, we just skip it. Otherwise, we can apply the following rules:

1) If the length of the phrase is even, apply the Vigenere cipher before applying transposition; otherwise, apply Vigenere after transposition.
2) Strip out the spaces from the phrase and use the resulting string as the Vigenere key.
3) Step through the key phrase one word at a time.
4) If the word length is even, use the Scytale cipher to transpose the working text; otherwise, use rail fence.
5) Strip out any recurring letters from the current word, and use the resulting length as the transposition key.
6) Apply the transposition to the working text, but only for making the rows/columns.
7) Stepping through the stripped keyword, if the letter is even (b, d, f…) reverse the column; otherwise, leave the order alone (columns for Scytale, rows for rail fence).
8) After step 7 is done, rearrange the stripped keyword so the columns are in alphabetic order.
9) Complete the transposition algorithm now, pulling the columns out and putting the working text into one long string.
10) Go back to step 3 as long as there are words left the key phrase.

This looks kind of wordy and unwieldy. So, we need an example.
Say the plaintext is:
“This is a plaintext message.”
-> “THISISAPLAINTEXTESSAGE”

And the key phrase is “Wish you were here.”
The length is 19 characters, which is odd, so step one says the order will be Vigenere then transpose.

The stripped key is:
“WISHYOUWEREHERE”
which becomes the Vigenere key.

Applying the Vigenere to the plaintext, we get for the working text:
PPAZGGUTCWQFACLNQVOASNC

The first keyphrase word is “WISH”, which is four letters long (even), so we use the Scytale transposition. Let’s use padding (x’s) on the last row for simplicity, only.

WISH
—-
PPAZ
GGUT
CWQF
ACLN
QVOA
SNCX

W: PGCAQS
I: PGWCVN
S: AUQLOC
H: ZTFNAX

Even letters: BDFHJLNPRTVXZ
“H” gets reversed

H: XANFTZ

Sorting the letter order,

HISW: XANFTZPGWCVNAUQLOCPGCAQS
—-

Apply the Vigenere again with the “WISHYOUWEREHERE” key:

TIFMRNJKNYDFHSEFSTLOUHOG

“YOU” is an odd length, so we use rail fence. Again, padding the end of the string isn’t really necessary, but I’ll do it for simplicity.

T—R—N—H—S—U—X
-I-M-N-K-Y-F-S-F-T-O-H-G
–F—J—D—E—L—O

Y:TRNHSUX
O:IMNKYFSFTOHG
U:FJDELO

“YOU” -> “OUY”, and all the letters are odd, so none of the rows are reversed.

OUY: IMNKYFSFTOHGFJDELOTRNHSUX

Apply Vigenere again,

EUFRWTMJKKPYMHRYPFPZFOQIR

Rinse and repeat. “were” and “here” turn into “WER” and “HER”, for two more rail fences. “R” and “H” are both even letters, so those rows get reversed in both cases.

This post is getting long enough, so I’ll stop here. The point is that someone who has the key phrase can just reverse the process and get the plaintext back out (minus spaces and punctuation). Anyone else can never be sure that what they’ve obtained was really the same message.

That is, when you attack this cipher, you have to brute force test everything against the following possible cases:
Vigenere is applied before the transposition.
Vigenere is applied after the transposition.
The key length is unknown and there are no reliable repetitions.
The transposition can be either Scytale or Rail Fence.
The transposition keys can be anything between 2 and cipher length/2.
Any given row of the transposition can be reversed, or not.
Any of the above combinations for the Vigenere and transposition ciphers could be applied anywhere between 1 and 1 million times, depending on the length of the key phrase. (More practically, the phrase might be between 2 and 30 words, the longer the better.)

In a way, this situation calls up the “many worlds interpretation” from quantum mechanics, in that not knowing my original key phrase, someone trying to break my message could get back:

“THISISAPLAINTEXTESSAGE”
“THEEYEOFTHETIGERISBLUE”
“INAREDGALAXYFARFARAWAY”

And any other of an infinite combination of readable English words of the same length, even if they get the number of padding characters right. And in at least one of the many worlds, every single possible output would be correct.

This also relates to one strength of the Vigenere – for very long keys, it’s pretty much unbreakable. If you use a one-time pad with keys made of 1000-2000 randomly selected letters, you only use any given key once before throwing it away, and you keep your plaintext shorter than the key length.

That is, say your key is 13 characters long, and the plaintext is “I like cheese”.
If the key is “BDFHJLNPRTVXZ”, you get:

JONRNNUTVLZ

The key doesn’t cycle, and there’s no combination of cipher characters that repeat. Someone not knowing the key could obtain “theoretical” (also 11 letters), because they can’t be sure that the key WASN’T “QHJDWJBLTLO” (because, maybe you didn’t strip out the duplicated letters from the key – and you wouldn’t if you required a 1,000 character-long key).

Speaking strictly theoretically, of course. As always, the weakness in the above algorithm is in how you get the key phrase to the recipient of your message, and in how you ensure the same key is never used twice.

Wiggles, part 5


JI OMJB PY CPJUCVI IMJMFWGVP. P FQBZ WK UO MPQ SUQGBY XMYBY MQUWOA NMKMO WOCPD P CWQOBA 17. JI AMA ZMY M JBGVMOPG, MOA JI JUJ YCPDD QWOY M YJMDD XMQ OBMQ CVB XMYB PO IURUYWRM. P YKBMR BOFDPYV XBCCBQ CVMO P AU NMKMOBYB, XWC P GMO JMRB JIYBDS WOABQYCUUA PO BPCVBQ DMOFWMFB. P’J 6 SUUC, MOA 220 KUWOAY, MOA P YKBOA M GUWKDB VUWQY M ZBBR PO CVB XUEPOF FIJY US ZVMCBLBQ GPCI P’J PO. ZVBO P CBDD YUJBUOB CU KPYY USS, CVBI WYWMDDI FBC JI AQPSC. CVPY VMKKBOY JUQB USCBO CVMO P’A DPRB, XBGMWYB P ZUQR KMQC CPJB MY M XUWOGBQ. JI JUJ’Y KDMGB ZMY M YOMGR XMQ, ZVBQB YMDMQIJBO ZUWDA GUJB PO MC OPFVC MSCBQ ZUQR, MOA CVBO FBC AQWOR CU XDUZ USS YCBMJ. YVB VMA CZU “VUYCBYYBY” ZUQRPOF SUQ VBQ, MOA YUJB US CVB GDPBOCBDB ZUWDA YUJBCPJBY FBCCPOF M DPCCDB CUU CUWGVI-SBBDI. ZVBO P ZMY LPYPCPOF VBQ UO CVB ZBBRBOAY, P’A UGGMYPUOMDDI FBC GMWFVC WK PO CVB POBLPCMXDB YRPQJPYVBY, MOA P DBMQOBA SMYC CVMC NWYC YMIPOF “KPYY USS” ZMYO’C BOUWFV. IUW VMLB CU XB ZPDDPOF CU CMRB M KWOGV UQ CZU CU CVB SMGB XBSUQB IUWQ YOBBQ, KQBSBQMXDI M XDUUAPBA UOB, QBMDDI FBCY CVQUWFV CU CVB UCVBQ FWI. PS PC AUBYO’C, ZBDD… CVMC’Y ZVI P XUE.

Dreaming – papercraft


In the land of Godzilla and giant robots…
Dreaming mechanical monster dreams.

Thinking About Encryption, Part 7


I want to build on the point made at the end of the last blog entry.
There are two separate processes for creating new ciphers, and for attacking them. For developing a new cipher, traditionally the factors have been: ease of use, ease of getting the key to the recipient, ease of reversing the cipher without making mistakes, and the difficulty in breaking it. The Caesar and Scytale ciphers were fine for their times because the “enemy” often didn’t know the language being used all that well, and writing out all of the different possible combinations on parchment or animal skin brute-force was too much work. Additionally, it would be hundreds of years before the idea of frequency analysis would occur to anyone.

When Caesar and Scytale were no longer viable sources of security, the idea became one of identifying weaknesses in existing ciphers and then eliminating them. With simple substitution ciphers, we can check letter frequencies, so one obvious change is to flatten the letter counts out, either by adding more choices for the more common letters, or by using polyalphabetic tables as with Vigenere. Now, though, with Vigenere, and anything that uses repeating keywords, we get recognizable cycles, leading to cycle counting. The Enigma machine was a brilliant invention in that it essentially acted like a never-ending Vigenere table generator.

The Enigma machine might have remained a worthy cipher option, except that banks and companies started sending messages back and forth at the rate of hundreds a day, with even greater sums expected in the future. Private keys were no longer feasible. That led to the development of RSA and public keys in the late 70’s.

Conversely, when attacking a cipher, you want to look for weaknesses, repetitions, the appearances of certain words in the plaintext that you can rely on (such as “submarine,” “troops,” and “commander”), and even quirks introduced by the operators sending the messages. If it’s a brand new cipher, you need to amass a mountain of ciphertext that hopefully uses the same key. If the message uses an existing, known cipher, you can sometimes just apply brute force, trying all possible combinations of the key with the known algorithm until the plaintext comes back out.

One of the most common analogies used when discussing physics and ciphers is that of mixing paint. If you have a can of red, and a can of green paint and you pour them together, you can still tell at the least that you have red and green in the same tray. But then, you start stirring, and you keep stirring for an hour. When you’re done, the “enemy” is not going to know exactly what proportions of red and green you started with, and they definitely aren’t going to be able to pull all the red paint back out of the tray, leaving only the green behind. But, the recipient of a cipher message that has the key is doing exactly that, which is possible specifically because they know the algorithm that was used to do the “stirring” and can then “unstir it.”

With private key ciphers, a lot of the initial security came by keeping the algorithm secret. Even the Engima machine would have been impregnable, were it not for the fact that Polish agents were able to get their hands on a working machine, with user instructions.

So, if the algorithm is known, what next? The answer is to make the brute force approach too time consuming to be practical. In the case of the Engima machine, there were 4-5 wiring wheels that acted like hardwired random substitution ciphers, and the first wheel would advance one position as the operator pressed a key. When the first wheel made one full rotation, the second wheel would advance one position; same for the third wheel. Three wheels were used of the 5, and they could be in any order, and start on any letter. That is, you could use wheels starting at 3=Z, 1=K, 5=M for one message, and 3=B, 1=Y, 5=B for the next message, and so on. Later, there were up to 8 wheels to choose from. That’s like 26!*26!*26!*26!*26!*26!*26!*26! combinations to work through, even if you do know the wiring of the individual wheels and the machine itself. The hope is that the universe would end before the enemy landed on the right combination of wheels for one message.

But, by knowing the algorithm, you can find weaknesses that the designers may have missed. Such as the rule that the wheels can not code a letter to itself (A!=A), and that wheels give 1-to-1 correspondence to the wiring (if A=R on wheel 1, then R=A on the same wheel). Or, you look at the people sending the messages. People get lazy. They don’t change their keys regularly, so that multiple messages have the same key, making them more vulnerable to frequency analysis. At the beginning of the use of Engima, the operators would send the key twice (e.g. – “KLAKLA”) to avoid mistakes in receiving the key. The cipher text could then start with JMYUVT, and we’d now know that J and U are the same letter three letters apart, even if you don’t know what J is. And, that would point a finger at which wheel was in the first slot.

We’re very slowly working backwards on that paint tray, getting closer to our original values for the red and green paints. The recipient of the message is doing this exact unmixing process which is easy for them because they do have the key. Knowing the algorithm lets you attack it. For private key ciphers, this is bad,

Enter public keys. It doesn’t matter if the enemy knows the algorithm for mixing the paints, or even that they know the recipient’s public key. The idea here is the recipient is given two very large prime numbers, which they use as their keys for deciphering whatever message they are sent. They then multiply these two primes together and make that public to anyone that wants to send them messages. The sender uses a different algorithm to encipher their plaintext message using that public key. An “enemy” intercepting the message now needs to factor the public key to get the recipient’s original two prime numbers back out. And this step just takes a REALLY, REALLY long time if the key is 1024 bits long or more. So, you’re not attacking the algorithm now, you’re attacking the key. If you have a server farm, you can do this, but maybe not for more than one or two keys at a time.

One idea I’ve been playing with is applying a multi-word key phrase on a Scytale cipher. Say you use the phrase “July August”, on the plain text “This is a plaintext message”. In the first pass, using “July” as the key, you get (adding padding for simplicity):

This
isap
lain
text
mess
agex

Then, sorting “july” to get “jluy”

Tihs
iasp
lian
txet
mses
aegx

For the string
Tiltmaiaixmsehsaeegspntsx

Now, for “augst” (removing the duplicated “u”):
Tiltm
aiaix
msehs
aeegs
pntsx

“agstu”:

Tltmi
aaixi
mehss
aegse
ptsxn

We get,
Tamaplaeettihgsmxssxiisen

If we allow for key phrases of any length, then the first paragraph of a Dan Brown novel would stir that paint can pretty well. Someone intercepting the ciphertext would never know when to stop unmixing. Was the key phrase something like words of 5, 7, 3, and 4 letters, or of 24 and 17 letters? What orders were the columns each time? And were any of the columns reversed? We could apply reversing rules based on even and odd word lengths plus whether the letter in the key word was a vowel or consonant. Finally, we could use the key phrase, minus spaces, to apply a Vigenere substitution either to the plaintext before transposition, or after the transposition is finished, again based on whether the number of key words in the phrase is even or odd.

This is one way to remove obvious letter or word repetitions, cycle counts, etc., while mixing the paints up really good. Knowing the algorithm won’t make a brute force approach any faster. This just leaves the question of how to distribute the key, and that’s an exercise left to the student.

Wiggles, part 4


Just a little ongoing story to give you something to play with until the next blog post.

GSY AZIPGEC LGXYBM QP QEASLFYBQRZ AZPXLXGLPR ZM ZJYE 6,800 LXBQPTX, GSY MZIE BQERYXG ZM DSLAS QEY SZPXSI, SZKKQLTZ, XSLKZKI QPT KCIXSI. GZKCZ, ZXQKQ QPT KCZGZ QBB BLY ZP SZPXSI. XQFFZEZ LX LP SZKKQLTZ, DSLAS LX QAGIQBBC NZXGBC MQENX, NZIPGQLPX QPT FQEKBQPT. XSLKZKI SQX Q AZIFBY XNQBBYE ALGLYX, HIG LX NZXGBC NZIPGQLPX QPT GYNFBYX. KCIXSI LX Q NLO ZM MQENX, MZEYXGX, NZIPGQLPX, QPT NYTLIN-HLR ALGLYX. GSY XZIGSYEP SQBM ZM GSY LXBQPT SQX Q BZG ZM JZBAQPLA QAGLJLGC, QPT LX SZNY GZ NQPC FZFIBQE SZG XFELPR XFQX (AQBBYT “ZPXYP” LP WQFQP). ZPY ZGSYE GSLPR KCIXSI SQX LX GIPPYBX. YLGSYE HYAQIXY ZM FBQCYT ZIG XLBJYE QPT AZFFYE NLPYX, ZE GSQG GSYEY DYEY NQPC QLE EQLT XSYBGYEX TIR LPGZ GSY SLBBX TIELPR GSY HLR DQE GSQG DYEY PYJYE MLBBYT HQAK LP. NQPC ZM GSYXY GIPPYBX QPT AQJYX QEY LRPZEYT, HIG PZG PYAYXXQELBC MZERZGGYP, QPT QEYP’G EYQBBC GSY HYXG FBQAYX GZ JLXLG HC CZIEXYBM LM CZI’EY PZG FEZFYEBC YUILFFYT. QPT, GSYC QEY DLGSZIG UIYXGLZP LPSQHLGYT HC REIYX.

The Last Mechanical Monster


Brian Fies is a physics major that went on to become a webcomic artist. His two big titles are Mom’s Cancer and A Fire Story  (which documents his losing his house in the California fires last year). But, he also wrote and drew The Last Mechanical Monster, which takes the evil genius from the original Fleischer Brothers’ Superman cartoon, The Mechanical Monsters and shows what happens to him after he gets out of jail 65 years later. In the story, the genius (later named “Sparky”) returns to his lab and assembles one last monster from the scrap left over from the others. At the end of the story, Brian added a link to a Mechanical Monster papercraft. I like making papercraft models, and I had extra cardstock left over, so, this is what it looks like.

It took me at least 10 hours to cut out all the pieces, form them and glue them together, but I wasn’t really timing myself. The finished model is a huge 16″ tall. It would be even taller if I had the arms raised in rampage mode.

Brian suggested using brads for the hip-leg, neck, and shoulder joints if you wanted them to move, but I didn’t feel like buying a package of brads just for this. So, I made paper rivets. I cut out disks from the torso joint locations for the neck, shoulders and hips, then I trimmed the disks down a bit smaller than the diameter of the holes. I glued one side of the disk to the glue point for the appendage, positioned the appendage so that the disk was inside the matching hole on the body, then glued a larger piece of paper to the disk on the inside of the body. The limbs and head swivel just fine, but yeah, the rivets loosen up over time and the monster falls over under its own weight. If I were to do this again, I’d consider using small pieces of rubberized magnetic sheets.

I also didn’t glue down the tabs holding the neck collar and hips to the torso, because it’s easier to carry the monster if it’s disassembled. And I left the chest flap and back of the hips free so I could get to the paper rivets from the inside if I needed to. But, otherwise, yeah, LMM is fully ready to go out and monsterize people.

As Brian says, it’s not bullet-proof, and it won’t fly unless you throw it really hard. But, it can dream. In the land of giant mobile suits, Godzilla, and Super Saiya-jin, it can dream.

The world runs on Dunkin’.

Thinking About Encryption, Part 6


I want to tackle a couple transposition ciphers in En/De next, but for various real-life reasons I’m probably not going to be able to get the time for that for a while. I expect it’d only take a few hours each, but I have other stuff of higher priority to get out of the way first. On the other hand, I can scrounge out a few minutes per day to write up this blog entry. So I’ll do this now, and then get to actually writing code later.

There are two very simple transposition ciphers – the Rail Fence, and the Scytale. I’ll start with the Scytale. This was originally a cylinder of wood of a particular diameter, around which the sender would wrap a length of parchment or animal skin. They’d write the message along the length of the cylinder in rows, rotating the cylinder as they went to the next row. When they unwrapped the parchment from the Scytale, the letters were all jumbled. The recipient, though, just needed to wrap the parchment around a second cylinder of the same diameter to read the message back off.

In essence, this cipher can be treated like an array, with a width matching the length of the Scytale (or, the number of wrappings around the cylinder), and a depth corresponding to the length of the message divided by the width. If we pick a width of 4, and use the message “This is a Scytale cipher”, the array looks like:

This
isaS
cyta
leci
pher

To build up the finished cipher message, we just read down the columns from left to right.

TiclphsyehiatcesSair

The first indication that this is a transposition, rather than a substitution, cipher is that a frequency analysis will show that the most common letters are still ETAOIN SHRDLCU. We can break this either through brute force, trying all possible widths up to length/2 (after that we start getting the plain text back out again), or we can look at the spacings between letter combinations for 3- and 4- letter words. For example, the spacings between “t” and “h” are 5 and 9; between “h” and “i” are 1, 5, 9 and 13; between “h” and “e” are 3, 5 and 9; and between “i” and “s” are 5, 5, 6, 14, 15. 5 and 9 show up a lot, so we then go to “the” and “this”. The first “t” is followed by “h”s 5 and 9 letters later, but with a spacing of 9 we get “thi” before running out of letters. For the same “t”, a spacing of 5 gives us “this”, and by taking the length of the message = 20/5, we get a width of 4, which then cracks the cipher.

For the rail fence (or zigzag), we again pick a width up to 1/2 the length of the message, but now we alternate directions, kind of like we’re drawing w’s. For a width of 4, and a message of “This is a rail fence cipher”:

T—–a—–e—–p
-h—s-r—f-n—i-h
–i-i—a-l—c-c—e
—s—–i—–e—–r

and reading off the lines left to right, top to bottom, we get:
Taephsrfnihiialccesier

I’d say that knowing the algorithm, it’s much simpler to brute force the Rail Fence, and pick the width that provides you the greatest number of 3- and 4-letter words when you do a frequency count. The other approach is to look at repeating cycles. From left to right, top to bottom and then back up, “Thisis” represents one full cycle of the zigzag (a length of 6), which gives us the formula: cycle = (# of rails * 2) – 2. (i.e. – (4 * 2) – 2 = 6)

The length of the message becomes important now. At 22 characters, we could have 2 rows of 11 characters each; 3 of 7 each; 4 with 4-7-7-4 characters; 5 with 3-5-5-6-3; etc. We could pad the message so that it always ends on the bottom rail, but it’s not necessary, and actually padding would make it easier to guess the rail width.

If we look at letter spacing, the “t” is followed by “h”s 4 and 10 characters later. The first “h” is followed by “i”s 5, 7, 8 and 15 characters later; and second “h” is preceded by a “i” by one character, and followed by “i”s at 1, 2 and 9 characters. The first “i” is preceded by “s” 4 characters earlier, and followed by “s” 9 characters later. The point of this exercise is that if the message is too short, or lacks common 3- and 4- letter words, there’s no obvious pattern to the letter spacing. Compare this to a width of 5:

T——-a——-c
-h—–r-i—–e-i
–i—a—l—c—p
—s-s—–f-n—–h-r
—-i——-e——-e

Tachrieiialcpssfnhriee

The spacing for “t” and “h” is 3 and 17. For the first “h” and “i” is 2, 4, 5 and 16. For the “i”s and “s”s, it’s between 5 and 8. At least this time, none of the numbers are negative.

If we know that the first word is “this” and we count the spacings between consecutive letters, for the 4-rail, we get 4-7-7 (we have to ignore the first “i” and the first “s”). For the 5-rail, it’s 3-5-5 (we need to ignore the first two “i”s). Of course, if “this” occurred a little later in the plain text, the spacings between the letters would change.

A——-r——-e
-t—–a-a—–c-c
–h—s—i—n—i—r
—i-i—–l-e—–p-e
—-s——-f——-h

Aretaacchsiniriilepesfh

“this” = 5-6-6″.

Meh. With a message this short, just try every value for width from 2 to 11 and do a visual check for the width that gives you a readable result.

Again, both of these ciphers are easy to crack as-is. But, there are a few ways to harden them. First, we could reverse the letters on alternating rows. Or we could use a keyword to set the width of the cipher (i.e. – “maple” (12345), and then rearrange the associated rows so the keyword letters are in alphabetic order (“aelmp” (25413)), Or, we could do both.

These two variants alone would throw off the letter spacings, and if you didn’t know this was an option, brute-forcing the ciphers would be a waste of time, because you’d only be expecting the letters to be going one way, and you’d miss all the other combinations the algorithm knows about.

Why go through this? Well, if we look at the Vigenere cipher, it is attackable because the keyword used creates a predictable cyclic pattern. So, if we break up that cyclic pattern by applying a rail fence where alternating rows are in reverse order, and the rows themselves are rearranged like “maple” (12345) -> “aelmp” (25413), the Vigenere keyword cycles get disguised.

If we were to try doing something like this by hand to encipher and decipher our messages, it would be time-consuming and prone to error. But, it’s trivial in software. We just need to let our recipient know what algorithm we’re using, and what the keywords are.

And this is where the use of private key ciphers falls apart.

As long as I don’t tell anyone what my algorithm is, I’m fine. If I just use it without publicizing the details (“Hey, recipient, I’m giving you a message using Cipher-1 from the program menu. Use keyword “xyz” to read it.”) and I change the the keyword all the time, I’m going to be pretty safe. But if I try to patent or trademark my method, I have to release the details of the algorithm. Then, if I say, my method combines Vigenere with a rail fence, and the keyword allows me to choose whether to reverse row directions and row orders, then I’m back to square one again, because now someone intercepting my messages knows which methods to brute force prior to checking letter frequencies.

In other words, most algorithms (especially private key ciphers) can be broken if you know the approach, even if you don’t know the key. The only option is to make brute forcing or frequency analysis so time-consuming that the universe would suffer from heat death before a third party could make the cipher text readable. Or, keep the messages short and never repeat your keys.

Wiggles, part 3


Just a little ongoing story to give you something to play with until the next blog post.

OPP YX YWJI YWS YSMPSMEA XD NXPSUM COUZI OMP XBYPXXU EXNNBMJYA ICOESI YX NOZS BIS XD OTT OLOJTOVTS TOMP. J’TT VS HOTZJMQ OTXMQ O COLSP COYW JM O WBQS COUZ HJYWJM YWS TJNJYI XD O EJYA TJZS YXZAX, OMP HJYW O TJYYTS EBULS JM YWS COYW J’TT DJMP NAISTD JM O YJMA, POUZ, POMZ YUSS-TJMSP EBT-PS-IOE TJYYSUSP HJYW IJQMI IOAJMQ “VSHOUS XD YWS HJTP VXOU,” OMP “TXXZ XBY DXU CJY LJCSUI.” YWS SMYJUS EXBMYUA JI TJZS XMS WBQS 1980’I YSGY OPLSMYBUS QONS. AXB MSLSU ZMXH HWOY AXB’TT DJMP MSGY HWSM AXB UXBMP O COUYJEBTOU EXUMSU, XU SMYSU O MSH MSJQWVXUWXXP. JM YWS VJQ EJYJSI, SMYJUS VBJTPJMQI, DJTTSP HJYW HSJUP XDDJESI, IWXCI OMP VOUI, EXBTP VS BISP OI YWS CTOAQUXBMP DXU O DBTT OPLSMYBUS QONS KBIY VA YWSNISTLSI. OMP, SLSUA IJMQTS CTOES AXB QX YX, YWSUS JI O VSYYSU YWOM FSUX EWOMES XD VSJMQ SOYSM VA O QUBS.