There’s a concept in mathematics that revolves around changing domains, or changing systems. By this, I mean that you can have an equation that you map in a Cartesian coordinate system, which is essentially the x-y grid, and the equation can be in the form of y = a*x + b. Quite often, though, the problem you want to solve may not be that easy in a Cartesian system, but it could be trivial when using polar coordinates, which are represented by a radius and an angle. Or in vector space, with a magnitude and a direction. To go from one system to the other, you need a conversion formula. An alternative example could be using the Fourier transform to switch between the time and frequency domains.

The question then becomes, are there “systems” in cryptography? Is there a moral equivalent to Cartesian and polar coordinates? In a gross sense, I think we could say that there are two types. The first is on the message level, and the other is on the media level.

For anyone new to ciphers, it would seem that enciphering is either a matter of moving the plaintext letters around (transposition), or changing them to something else (substitution). I’ve only really addressed rail fence, Scytale, Caesar and Vigenere, and if those were all the algorithms anyone had ever come up with, we could be forgiven for thinking that the point is just to manipulate individual letters.

But, look at the English language. We have phonemes, letters, syllables, words, sentences and paragraphs. And, on a meta level, we have nouns, verbs, adjectives, adverbs and everything else. On a message level, we could (and many other people already have) develop algorithms that encrypt plaintexts on a non-letter basis. It’s a bit tricky trying to define phonemes for individual words if the sender and the receiver have different accents. You’d need to go with the dictionary pronunciations or fixed 2- and 3-letter combinations, but it is still possible to develop a random substitution table for phonemes, either to other phonemes, or to numbers. As an example, “cram” would turn into “‘kram”, and “through” would become “thru” (where the “u” has an umlaut). The Dyslexia Reading Well lists 44 phonemes in the English language, so we could either code them as upper and lower case letters (the 26 upper plus 26 lower letters, which would allow for some duplications), or simply as the numbers 1-44 (if you don’t use delimiters, then 1-9 would need leading “0” padding). If you want to get nasty, use hexadecimal pairs (00-FF), which would give you 256 potential assignments.
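To make this concrete, here’s a sketch of what a phoneme-to-number substitution might look like in Python. The phoneme symbols and the two-digit code assignments are invented for illustration; a real table would cover all 44 phonemes, and splitting a word into phonemes is the hard part (the accent problem above), so the list is supplied by hand here.

```python
# A sketch of phoneme-to-number substitution. The phoneme symbols and
# code assignments below are hypothetical; a full table would cover
# all 44 English phonemes.
PHONEME_TO_CODE = {
    "k": "01", "r": "02", "a": "03", "m": "04",
}

def encode_phonemes(phonemes):
    """Encode a pre-split list of phonemes as a single digit string.
    A fixed two digits per phoneme means no delimiters are needed."""
    return "".join(PHONEME_TO_CODE[p] for p in phonemes)

# "cram", pronounced as the phonemes k-r-a-m:
print(encode_phonemes(["k", "r", "a", "m"]))  # -> 01020304
```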

Syllable substitutions would work pretty much the same way. The Quora page answering the question “How many syllables are there in English” gives two different answers: over 10,000, and exactly 15,831. Given that you’d need an agreed upon substitution table to do the encryption anyway, you can simply pick the syllables, or letter groupings, you care about, and use those. In this case, it would probably be easier to use number values. That is, “gi” = 1, “ven” = 2, “tt” = 3, “le” = 4, “ers” = 5. “Given letters” would then be 1 2 4 3 5. Any text not encrypted at the syllable level could be treated as individual letters and given numeric values like 100-200, if you like.
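Using the little table above (“gi” = 1, “ven” = 2, and so on), a minimal sketch in Python:

```python
# The syllable table from the text: "gi"=1, "ven"=2, "tt"=3, "le"=4, "ers"=5.
SYLLABLE_TO_NUM = {"gi": 1, "ven": 2, "tt": 3, "le": 4, "ers": 5}

def encode_syllables(syllables):
    """Substitute each syllable (pre-split by hand) with its number."""
    return [SYLLABLE_TO_NUM[s] for s in syllables]

# "Given letters", split as gi-ven le-tt-ers:
print(encode_syllables(["gi", "ven", "le", "tt", "ers"]))  # -> [1, 2, 4, 3, 5]
```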

Ok, when we come to words, or word groupings, it might seem that we should give up. Merriam-Webster had 470,000 entries in their 1993 Addenda edition, and there’s the question of what counts as an English word. Do we include “teriyaki,” “sushi,” and “cannoli”? But, what if we mix systems? Pick the 100 most common words and give each of them several number values (e.g. – “the” = 1, 6 or 9; “if” = 2, 10 or 20; “may not” = 7 or 8). Everything else can be represented as letters, phonemes and syllables.
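Giving each common word several interchangeable code numbers is what’s traditionally called a homophonic substitution, and the point is that repeated words don’t produce repeated ciphertext, which flattens out the frequency counts. A sketch, using the toy assignments above:

```python
import random

# Homophonic table from the text: each common word gets several
# interchangeable code numbers.
WORD_TO_CODES = {"the": [1, 6, 9], "if": [2, 10, 20], "may not": [7, 8]}

def encode_word(word):
    """Pick one of the word's code numbers at random, so two
    occurrences of "the" usually encrypt differently."""
    return random.choice(WORD_TO_CODES[word])

print(encode_word("the"))  # one of 1, 6 or 9, chosen at random
```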

Technically, there’s no point in trying to substitute entire sentences or paragraphs, but there’s no reason we can’t transpose them. We could resort to Scytale and rail fence on a sentence-by-sentence, or paragraph-by-paragraph basis. Alternatively, the substitution table we use above could change based on a key from one sentence or paragraph to the next (hello again, Vigenere).

For the second system, the “media,” we have the computer. As mentioned above, it can be easy for a beginner to get trapped in the idea that a plaintext message is simply whatever you have written down on paper, or typed with a typewriter, as a string of letters, digits and symbols. Even if we make the transition to a computer, we see these characters as represented by the ASCII table. “A” = decimal 65, hex 41; “Z” = decimal 90, hex 5A. In fact, we are performing a substitution all the time when we type messages up on a computer, because 54 68 65 = “The”. Applying a simple Caesar cipher with a shift of one gives us 55 69 66, or “Uif”. Cracking a random substitution cipher can be done as easily for “Uif” as it is for 55 69 66, through the application of frequency analysis.
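In Python, the shift-by-one can be applied directly to the ASCII codes:

```python
def caesar_ascii(text, shift):
    """Shift each character's ASCII value by `shift`, wrapping mod 256."""
    return "".join(chr((ord(c) + shift) % 256) for c in text)

print(caesar_ascii("The", 1))   # -> Uif
print(caesar_ascii("Uif", -1))  # -> The
```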

The mistake is to not change number systems. ASCII is based on 8-bit binary. 00000000 = 0, and 11111111 = 255 (if we use hex, then 00-FF). Standard ASCII just uses 0-127 (00-7F), while the upper half of the range (80-FF) can be any graphics characters you like. This is fine for the English alphabet, but languages like Chinese and Japanese use non-phonetic, character-based writing systems that can easily require 2,000-5,000 characters for high school-level literacy. Which is where Unicode comes in. One of the most commonly used Unicode encodings is UTF-16, a variable-width encoding built on 16-bit units. Unicode is important to understand if you’re dealing with “letter level” substitutions as described above. Otherwise, at the bit level, we don’t care.

This next part is a bit off-topic, but the second we go binary, we can start talking about filename extensions. .doc indicates that the file is a Word document; .jpg is a JPEG image file, and .mov is a video file that includes audio. Up to this point, I’ve only been concerned with .txt, or ASCII textfiles. But these extensions just represent format conventions for data files used by specific applications. Ignore the application that created the file and now you just have an arbitrarily long string of 1’s and 0’s in sequence. Who cares what the “plaintext” file was? Just move the bits around and the application (or the person trying to access the data in the encrypted file through that application) can’t read it.

To do this, we still have substitution and transposition. The most obvious choices would be Scytale or rail fence, but we can also use swapped pairs.

“The” was 54 68 65, which is also

0101 0100 0110 1000 0110 0101

Swapping pairs of bits, we get

1010 1000 1001 0100 1001 1010 (A8 94 9A)

All three bytes are outside the regular ASCII character range, and will appear as strange graphics shapes. Unfortunately, this is no different than a regular random substitution, and will fall to frequency analysis.
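A sketch of the pair swap in Python. The two masks pick out the odd and even bit positions of each byte and shift them past each other, and applying the function twice gets the original bytes back:

```python
def swap_bit_pairs(data):
    """Swap each adjacent pair of bits within every byte:
    ab cd ef gh -> ba dc fe hg."""
    return bytes(((b & 0b10101010) >> 1) | ((b & 0b01010101) << 1)
                 for b in data)

print(swap_bit_pairs(b"The").hex())            # -> a8949a
print(swap_bit_pairs(swap_bit_pairs(b"The")))  # -> b'The'
```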

Scytale with a width of 5 on those same bits (0101 0100 0110 1000 0110 0101) gives us,

01010

10001

10100

00110

01010 (the last bit added as 0 padding)

Or, reading down the columns,

01100 10001 00110 10011 01000

Going back to hex pairs, with more 0 padding:

01100100 64

01001101 4D

00110100 34

00000000 00
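As a sketch, the bit-level Scytale (write the bits into rows of the key width, read them back out by columns) might look like this; the function names are my own:

```python
def scytale_bits(bits, width):
    """Write the bit string into rows of `width` (0-padding the last
    row), then read it off column by column."""
    bits += "0" * (-len(bits) % width)
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(row[c] for c in range(width) for row in rows)

# "The" as 24 bits:
the_bits = format(int.from_bytes(b"The", "big"), "024b")
print(scytale_bits(the_bits, 5))  # -> 0110010001001101001101000
```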

But, this is where things get a bit tricky. By adding padding, we’re kind of giving away the width value. Because computers use widths of 2, 4, 8, 16, 32, 64, etc., if the ciphertext is not a multiple of one of those numbers, we can assume padding was added to bring the number of bits up to a multiple of the cipher width. The ciphertext is 25 bits long, and that’s kind of an obvious clue. Next, though, we need to put our ciphertext back into a format the computer can recognize, and that means padding up to 32 bits. We can do that easily by appending 0’s at the beginning or end of the string, but how is our recipient going to know how much padding we’ve added?

Actually, having the key width will help. If the ciphertext is 32 bits long, but the key width is 5, then our recipient can argue as follows: The plaintext length is going to be a multiple of 8 bits, giving me 8, 16, 24 and 32 bits as options. If the plaintext was 8 bits long, then the Scytale portion needs to be padded to 10 bits. For 16, padded to 20. For 24, padded to 25, and for 32, padded to 35. 35 bits is right out, and the closest number to the current 32 bits of the ciphertext is 25. Obviously, there’s 7 bits of padding, either at the front or the back, and then 1 more bit for the Scytale portion. Now, anyone trying to crack this cipher can use the same logic, but they have to apply a bruteforce approach to the ciphertext for every value from 2 to length/2. And, naturally we can choose to reverse some of the columns of the Scytale following whatever rules we want, as well as doing that old “keyword” -> “dekorwy” column shuffling I’ve talked about before for Scytale.
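That chain of reasoning is easy to sketch in code. Given the padded ciphertext length and the key width, enumerate the plaintext lengths (whole bytes) that would fit, and take the largest; the function name is mine:

```python
def infer_lengths(cipher_bits, width):
    """Recover (plaintext bits, Scytale bits) from the padded ciphertext
    length, assuming the plaintext was a whole number of bytes."""
    candidates = []
    for plain in range(8, cipher_bits + 1, 8):
        scytale = plain + (-plain) % width  # padded up to the key width
        if scytale <= cipher_bits:
            candidates.append((plain, scytale))
    return max(candidates)  # the longest plaintext that still fits

print(infer_lengths(32, 5))  # -> (24, 25)
```

For a 32-bit ciphertext and a key width of 5, that gives a 24-bit plaintext padded to 25 bits for the Scytale, plus 7 more bits to reach 32, matching the reasoning above.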

That was transposition. For substitution, we can flip bits:

“The” was 54 68 65, which is also

0101 0100 0110 1000 0110 0101

Flipping 1 to 0, and 0 to 1:

1010 1011 1001 0111 1001 1010 (AB 97 9A)

Again, all three characters will look like random graphics. And again, this will fall to frequency analysis because it still looks like a random substitution.

Instead, we can exclusive or (XOR) the plaintext against a key. If the key is the same length as the text, say CAT (43 41 54), and we know that the rule for XOR is that 0+1 and 1+0 = 1, while 0+0 and 1+1 = 0, then

0101 0100 0110 1000 0110 0101 (The)

0100 0011 0100 0001 0101 0100 (CAT)

0001 0111 0010 1001 0011 0001 (17 29 31)
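A sketch of the XOR step. Since XOR is its own inverse, decryption is the same operation with the same key:

```python
def xor_bytes(data, key):
    """XOR each data byte with the corresponding key byte, repeating
    the key if it's shorter than the data."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

cipher = xor_bytes(b"The", b"CAT")
print(cipher.hex())               # -> 172931
print(xor_bytes(cipher, b"CAT"))  # -> b'The'
```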

A weakness here is that if we use only human-readable keywords (anything using A-Z, a-z, 0-9 and punctuation), then our keys won’t change the high bit (bit 8), and won’t include values 0-31, which are “non-printable” characters. We can overcome this by using simple hex values for the keys (00-FF) and not worry about trying to make words out of the keys.

And of course, we can combine Scytale with XOR as we like. This brings me back to the idea of making the length of the XOR keyword into the width of the Scytale or rail fence transposition, and reversing the order of the transposition columns depending on whether the length is even or odd.

One more complication can be the decision to make the ciphertext human readable. In other words, we can pretend that the resulting output is either ASCII or Unicode. It’s easier to pretend we’re dealing with ASCII. To do this we group the bits in bunches of 7 and prepend a 0 at the front to make 8-bit groupings. For the example, let’s go back to the bit flip,

1010 1011 1001 0111 1001 1010 (AB 97 9A)

We need groups of 7 bits,

1010101 1100101 1110011 0100000 (padding the last group with 0’s)

Then we prepend the 0’s to get

01010101 01100101 01110011 00100000

55 65 73 20 = “Ues ” (the last character is a space)
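The regrouping can be sketched as follows, with the bit flip done as an XOR against FF:

```python
def to_printable_7bit(data):
    """Regroup the input bits into 7-bit chunks (0-padding the last one)
    and prepend a 0 to each, so every output byte lands in 00-7F."""
    bits = "".join(format(b, "08b") for b in data)
    bits += "0" * (-len(bits) % 7)
    return bytes(int(bits[i:i + 7], 2) for i in range(0, len(bits), 7))

flipped = bytes(b ^ 0xFF for b in b"The")  # the bit flip: AB 97 9A
print(to_printable_7bit(flipped))          # -> b'Ues '
```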

The main drawback to this type of algorithm of making 7-bit groupings and prepending the 0 is just that we’re making the file roughly 14% larger (one extra bit for every 7), which is going to affect file transfer times. But, if the time for sending the file is not an issue, then we’re still good to go. The real reason for doing any of this is that it makes the bruteforce approach to cracking the cipher that much more time consuming. If our opponent has no idea exactly which algorithm we’re using, they have to test every single one, in all conceivable combinations.

Theoretically speaking, of course.

I’m not really sure how well my original point holds up that encryption/decryption has an analogue to coordinate switching in math. But it serves as an introduction to this blog entry well enough.