The Cryptosystem ME6 Encryption Process

 


Introduction

Plaintext is readable data (for example, a spreadsheet file), and ciphertext is the result of encrypting plaintext. A cryptosystem is a set of procedures and conventions for hiding and revealing information in a controlled manner. A cryptosystem generally has two distinct components: (a) the processes used to encipher and decipher data and (b) the set of keys used to influence the operation of these processes so that the ciphertext is dependent on the key used for encryption. The security of a cryptosystem lies not in the secrecy of the methods used to encipher and decipher the data but rather in the difficulty of decrypting ciphertext in the absence of knowledge of the key used to produce it.

In order to allow evaluation of the security of Cryptosystem ME6 an explanation will now be given of the encryption method. In the section below entitled Cryptographic Strength the security of the method against possible attacks will be discussed.

Cryptosystem ME6 encrypts data in files stored on disk. A file may be considered as a sequence of at least one byte and perhaps many millions of bytes. ME6 reads in plaintext from a file in blocks whose size is between 6 Kb and 10 Kb (the exact size of each block depends on the encryption key), encrypts each block and writes the resulting ciphertext to disk. This is done for each of the blocks making up the file. Each block is first compressed, if possible, before being encrypted, so normally the ciphertext blocks are smaller than the plaintext blocks, with the result that the file containing the encrypted data is usually smaller than the input file.

The Encryption Method

A file to be encrypted or decrypted is treated as a series of blocks (6 Kb to 10 Kb in size) and the bytes within each block are treated as an input stream of bytes. The algorithm is an autokey stream cipher with the keytext generation dependent on the foregoing plaintext (making use of the MD5 message digest algorithm).

The keytext stream is made up of segments. The first, the "priming key", is formed from the encryption key. The length of the priming key and the lengths of the subsequent keytext segments are not constant but depend on the encryption key.

Two pseudo random number generators are used (a linear congruential generator and a Park-Miller generator). These are seeded from the MD5 message digest of the user-specified encryption key.

The plaintext stream (the contents of the file being encrypted) is divided into segments of lengths equal to the keytext segments. A ciphertext stream is produced which consists of segments of lengths equal to the plaintext and keytext segments. The encrypted file consists of the entire ciphertext stream.

The input stream is read in a series of blocks which can be any size in the range 6 Kb to 10 Kb, the exact sizes being dependent on the encryption key.

Two encryption processes are used, an "inner" and an "outer" process. The outer process is applied to the blocks. Within each block the inner process is applied to sub-blocks which are 200 to 400 bytes in size (the plaintext segments).

Each block is subject to the outer encryption process, as described below.

The outer encryption process

The block is divided at a random point and the two halves are interchanged.

The block is then compressed in a way dependent on the encryption key.

The bytes in the compressed block are combined with the output of the random number generators.

The block is then treated as an input stream for the inner encryption process, as described below.

A random number of random bytes are prepended to the block before it is written.

The inner encryption process

For i >= 0 each ciphertext segment c(i) = k(i) XOR p(i), where: p(i) is the i-th plaintext segment within the block, k(0) is a priming key (produced anew for each block based on the encryption key and making use of the MD5 message digest algorithm), and k(i) for i > 0 is G(k(i-1),p(i-1)), where G() is a complex function which makes use of the MD5 algorithm and which acts on the previous keytext segment and the plaintext segment which has just been encrypted (or decrypted in the case of the inner decryption process described below).

The bytes in each ciphertext segment are shuffled randomly.

The size of each keytext segment (and thus of each ciphertext segment) is variable and depends on the encryption key.

The Decryption Method

The decryption process undoes what the encryption process does.

A priming key is formed in the same way as in the encryption process and a keytext stream is generated which is identical to the keytext stream used during encryption.

Each block of the ciphertext stream written during encryption is read and its constituent ciphertext segments are subjected to the inner decryption process, as described below. The outer decryption process is then applied to the block.

This is repeated for each block of ciphertext to recover the original file.

The inner decryption process

First the bytes in each ciphertext segment are unshuffled.

For i >= 0 each plaintext segment p(i) = k(i) XOR c(i), where c(i) is the i-th ciphertext segment within the block and k(0) and k(i) for i > 0 are as defined above for the inner encryption process.

The outer decryption process

The combination with the output of the random number generators is undone.

The block is decompressed and the two halves are interchanged to recover the original plaintext block.

Cryptographic Strength

We shall now consider how an attacker might attempt to break this system, that is, how he might attempt to recover plaintext from ciphertext without knowing the encryption key which was used.

Suppose that an attacker has some ME6-encrypted ciphertext. First he has to identify the ciphertext blocks and sub-blocks (the sub-blocks are the ciphertext segments) produced by the encryption process, but the sizes of these blocks are obtained from the random numbers generated by the two pseudo random number generators, which are seeded from the encryption key, so the segments vary considerably in size in a way dependent on the encryption key (which he does not know). Knowing the length of a particular plaintext or ciphertext segment gives no information about the length of the next segment (other than that it is from 200 to 400 bytes). If there are just seven segments of ciphertext in a block whose size is about 2 Kb then there are something like 2017, or more than 1016, possible combinations of segment sizes.

But let's suppose that the attacker can correctly divide the bytes in the ciphertext into the blocks, and the blocks into the segments (corresponding to segments of keytext and plaintext).

The last step of the inner encryption process is to shuffle the bytes of the ciphertext segment, so the attacker must unshuffle them. But they were shuffled in a key-dependent manner (using the pseudo random numbers), and he does not know the key, so does not know how to unshuffle them.

But suppose he unshuffles them correctly. Then he has ciphertext segments c(i) which were formed by XORing the corresponding keytext segments and the corresponding plaintext segments, i.e., c(i) = k(i) XOR p(i).

If the attacker knows any two of c(i), k(i) and p(i) then he can get the third. But even knowing all three does not allow him to recover k(i-1) or p(i-1). To get p(i-1) the attacker must know k(i-1). For i > 0 k(i) = G(k(i-1),p(i-1)) and G() is a one-way function, since it makes use of the MD5 message digest algorithm to form components of k(i) from k(i-1) and p(i-1). k(0) is formed from the encryption key (again using MD5) and the attacker does not know the encryption key, and so does not know k(0).

Ignoring the complications introduced by shuffling the bytes in each ciphertext segment in a random (key-dependent) way, we can say that: (i) If the attacker knows the priming key, k(0), for any given block then he can recover not only p(0) but all the plaintext segments, since each subsequent key segment is formed on the basis of the preceding key segment and the preceding plaintext segment. (ii) If the attacker does not know the priming key then he cannot recover any of the plaintext segments from the ciphertext. (iii) If the attacker does not know the encryption key then there is no way that he can discover the priming key.

Brute Force Attack

As noted above, the security of ME6 depends (in part) on an attacker not being able to discover the initial keytext segment, the priming key, used to encrypt the initial segment of each plaintext block. Could he somehow recover the priming key by means of a brute force attack on the initial ciphertext segment?

The attacker knows the initial ciphertext segment is 200 to 400 bytes in size, but since he does not know the encryption key he does not know the exact size, so he must try all possible sizes n with 200 <= n <= 400. The bytes in the ciphertext have been shuffled, and since he does not know the encryption key (knowledge of which is needed to unshuffle them), he must consider all possible permutations of the n bytes, of which there are n! (which is already an impossibly large number).

The inner decryption process is essentially XORing the plaintext with the ciphertext, so the attacker has to XOR the (unshuffled) initial ciphertext segment of n bytes with all possible putative n-byte priming keys to see what results. Even with the correct priming key the result may not be identifiable as the correct plaintext since ME6 compresses the plaintext block (if possible) before encryption.

Furthermore, there are a huge number of n-byte segments resulting from XORing with putative n-byte priming keys which are candidates for the correct plaintext, since any piece of n-byte natural language text (or n bytes of compressed natural language text) when XORed with the ciphertext produces a key segment which, when XORed with the ciphertext, results in that natural language text (or part of a compression of natural language text).

So if some result looks as if it might be correct, it has to be checked by using that priming key in an attempt to decrypt the following segments of the ciphertext block (an incorrect priming key will produce garbage when used to generate subsequent segments of keytext). But the location of the subsequent ciphertext segments is not known.

So a brute force attack must consider all possible putative n-byte priming keys and use them to attempt to decrypt the entire ciphertext block. And what are the possible putative priming keys? These are the priming keys derivable from the possible encryption keys in the way that ME6 does it.

To form the priming key ME6 takes the MD5 digest of the encryption key (expanded to 64 bytes), seeds the two pseudo random number generators using all 16 bytes of the digest, selects a random segment length between 200 and 400, expands the encryption key to this length and XORs it with output from the random number generators. It is reasonable to suppose that the number of priming keys which can be generated in this way is at least the number of random keys, i.e., c. 10150. (Thus Cryptosystem ME6 has a 500-bit keyspace. In comparison, DES has a 56-bit keyspace.)

So how long would it take an attacker to test all 10150 priming keys?

In the early 1990s Thinking Machines in Massachussets unveiled the fastest supercomputer on the market at that time, capable of 1,000,000,000,000 (1012) operations per second. Assume that with a supercomputer one billion times faster 1021 priming keys can be tested per second. Even assuming that the attacker gets lucky and finds the key after searching only 0.01% of the keyspace, it would still require about 10125 seconds to find the right key. However since the number of seconds which has elapsed (allegedly) since the origin of the universe is approximately 1017, the time required to find the correct key by this means would be about 10108 times the age of the universe.

Since this is just for one possible permutation of the bytes in a segment of one particular length, and the attacker would have to consider all permutations of segments from 200 to 400 bytes in length, it is clear that a brute force attack cannot succeed.