Development
and Analysis
of Block
Ciphers
and
the DES System

Dave Rudolf 2000
Defenses Against Attacks
1. Use of Key Space
2. Polyalphabetic Trnsformations
3. Diffusion and Confusion
4. One-Time Pad
5. Absolute SecuritySpecific Attacks
1. Meet in the Middle Attack
2. Aggregate Ciphers
3. Differential Cryptanalysis
4. Linear CryptanalysisThe Data Encryption Standard (DES)
1. Introduction to DES
2. DES Algorithm
3. Initial Permutation
4. Key Permutation and Subkey Selection
5. Cipher FunctionAnalysis of DES
1. Important Properties of DES
1.1 Bit Diffusion and Confusion
1.2 Weak Keys
1.3 Complimentation2. Attacks on DES
2.1 Differential Cryptanalysis
2.2 Linear Cryptanalysis
There is no such thing as a perfect encryption system. All the algorithms that have been developed to date can be broken, given enough time and/or space. Algorithms are considered strong if there are no known methods of simplifying an attack. Complex algorithms are generally considered stronger, but be aware that this is because they are harder to to find specific attacks for, not because they are truly stronger.
This brings up another dimension of security, namely that a system's security also depends on its application. For example, a system is thought to be appropriate for the particular application if:
These are rather important issues when designing a new system, and they are closely related to each other. If a system is to be generally useful, then it's cost and analysis time should be as high as possible.
There are several types of cipher systems currently in use. Block ciphers are the most basic, as they use a single key to transform a block of text with a specific length. For example, in the DES system, blocks are always 64 bits. A longer message is encoded by breaking the message up into many blocks and applying the cipher to each block. Block ciphers can be adapted to work as stream ciphers, which ensure that each block is dependant on previous blocks in the ciphertext (see Section 3.2 for more details on stream ciphers).
The most recent addition to the field of Cryptography is the notion of public-key systems. Such systems involve two keys: one that is used to encrypt (called the public key), and another to decrypt (the private key). Such systems distribute the public key freely (as it is generally useless to attackers). This way, clients can encrypt messages, but only the holder can decrypt them. Public key systems can be used as a block cipher, but they are very expensive to compute. As such, they are generally used to transmit keys for conventional block ciphers.
This document is meant to explain the fundamentals of block ciphers. As such, deals exclusively with these systems, except where issues cross over to other systems.
There are five main types of attacks that are considered when designing an encryption system. The types are listed below, in order of most time consuming to least.
| Ciphertext Only | The attacker has only the ciphertext, which he may want to decode, or he may want to determine the key that was used |
| Known Plaintext | The attacker has a plaintext and its corresponding ciphertext. The goal of this attack is purely to determine the key. |
| Chosen Plaintext | The attacker gets to choose a plaintext and can obtain the corresponding ciphertext. The goal is to determine the key. If the attacker has the opportunity to iteratively try different plaintexts, then he can choose each plain text, based on information that he received from the previous iteration. This is often referred to as an Adaptive Chosen Plaintext attack |
| Chosen Ciphertext | The attacker chooses a ciphertext and can obtain it's corresponding plaintext. An attacker may also use an adaptive version of this attack, as with Chosen Plaintext |
| Chosen Text | An attacker can choose both the ciphertext and the plaintext, and can get the corresponding texts to both. |
The above definitions also show another dimension. It relatively easy to get the ciphertext, as an attacker can just tap in on the communication line between the sender and receiver. It is more difficult to tap into the sender's system to get the plaintext. Still, this is easier than trying to "trick" the sender into encrypting a message that is chosen by the attacker, and then send it out. In general, a system is considered safe if it can survive a known plaintext attack.
It is assumed that an attacker always has knowledge of the cipher algorithm. A scheme should be able to thwart off an attack with other means, and any secrecy about the internal workings of the algorithm can be considered a bonus. One should never rely on the secrecy of the encryption scheme to be a main security factor. Besides, a scheme can be more thoroughly tested once it is in the public domain. Another assumption is that an attacker can easily identify the original plaintext when he sees it. Often, a message may be in some strange format, or might have been subjected to a Steganographic transformation, or other data hiding techniques. In terms of pure encryption, we ignore this.
To simplify our discussion, we can use the following variables:
| S | is the alphabet that we are using, with S * is our language | |
| ei | is a character in the alphabet | S = { e0, e1,
e2, ... en-1 }
|
| T | is our encryption transformation (algorithm) | |
| K | is the set of all possible keys (key space) | |
| || K || | is the size of the keyspace K | |
| k | is the key we choose to use | k Î
K
|
| M | is our plaintext message | M Î S* |
| mi | a character (sometimes referred to as a block) in the plaintext M | M = { m0, m1, ... } |
| C | is the resulting ciphertext | C = Tk ( M ) |
| ci | a character (also referred to as a block) in the cipher C | C = { c0, c1, ... } |
Any cipher can expressed as the following function C = Tk ( M ) and alternatively M = Tk-1 ( C ). Also, we can simplify our discussion by assuming that each character ei in the alphabet E corresponds to the number i. So for an alphabet of size n, we have S = { 0, 1, 2, ... , (n-1) }.
There are several basic concepts that we can take advantage of to increase the security of our system.
The first, and most obvious thing we can do to make our encryption stronger is to increase the key-space. Our key k belongs to a set of possible keys K. If the size of K is small enough, then an attacker can simply try all keys. So the solution is to increase the size of key-space K, in order to increase the time required to try all possible keys. Table 1 illustrates how different key lengths, and different classes of hardware affect an attacker's ability to find the key.
| Bits per Key |
Average Time required |
||
| LSI-Chip (103 keys/s) | VLSI-Chip (106 keys/s) | 106 Parallel VLSI-Chips (1012 keys/s) | |
| 24 | 2.3 hours | 8.4 sec. | negligible time. |
| 32 | 25 days | 36 min. | 2.2 ms. |
| 48 | 4463 years | 4.5 years | 2.4 min. |
| 56 | 1.1 x 106 years | 1.1 x 103 years | 10 hours |
| 64 | 2.9 x 108 years | 2.9 x 105 years | 107 days |
| 96 | 1.3 x 1021 years | 1.3 x 1018 years | 1.3 x 1012 years |
| 128 | 5.4x1027 years | 5.4 x 1024 years | 5.4 x 1018 years |
Table 1. Time required to search half of the key-space for various key sizes, and for different classes of hardware. [VDL97]
Table 1 is an expanded version of a table taken from J.C.A Van der Lubbe's Basic Methods of Cryptography [VDL97]. In 1997, Van der Lubbe believed that the parallel machine mentioned above would cost tens of millions of dollars to build. In 1998, a distributed system was implemented that comes close to this class of computer, for a cost of about $250,000 US [EFF99]. In other words, these machines are expensive, but not impractical for high-security purposes. Schemes like DES that use 56-bit keys are still thought of as secure for civilian applications, but it may not be long before workstation-class machines can break it in a reasonable amount of time.
The good news is that, as technology increases, so does our ability to create efficient systems with larger key spaces, but key space grows faster than technology. For example, if we double the number of bits in our key, we have squared the number of possible keys. As long as we keep introducing new systems that make use of this concept, then we should continually be able to create better ciphers, simply by expanding the key space (and their block size to match the key size) of the algorithms.
Another important note is that each key has an equal probability of being used. If the key space is not uniformly distributed, then attackers can use this information and guess the key faster than if they simply had to search for it. This means that keys should not be based on natural languages or other statistically rich alphabets, but generated randomly.
Up until the 20th century, all encryption systems were substitution ciphers (also called monoalphabetic), meaning that each letter in the alphabet always substituted with another. There are two reasons why this technique is weak. First, the the size of the key space is dependant on the size of the alphabet. If we have an alphabet S, then the size of the key space K can be no larger than the factorial of the size of the alphabet.
|| K || £ ( || S || ) !
The obvious solution to limitation is to increase the size of the alphabet, which can be done by treating groups of characters as individuals. In practice, the key space is often the same size as the alphabet, but both are so large that a brute-force attack is infeasible.
The second problem with substitution ciphers is that attackers can use information about the particular language to guess the key. For instance, in any natural language, the letter 'e' has a certain frequency of appearance (about 12.5% in the English language). An attacker can analyze the entire message and see which characters have the approximately the same frequencies as 'e' (and other characters in the alphabet) and can guess the key based on this information. The same concept holds for the formats of data files. In order to fend off such attacks, a cipher must be polyalphabetic, meaning that each character of our alphabet will map to one of many characters.
Simply mapping a character to a set of other characters is not enough, though. A given character should have an equal probability of being mapped to each character in the alphabet (including itself). In other words, if probability distribution is non-uniform, then the cipher allows an attacker to estimate a set of keys which would produce the given distribution. One simple way to do this is to make each character of the ciphertext ci dependant on the character before it ci-1, or on some arbitrary character before it ci-q , so that ci = G [ Tk ( M ), ci-q ] for some binary function G. Such systems are referred to as stream ciphers, since each character is dependant on all previous characters in the message stream. Another way to accomplish this is to make each cipher character dependant on some pseudorandom number. This is only partially effective, since the receiver must be able to generate the same sequence of random numbers to decode the message, it must also be assumed that the attacker could do so as well. Of course, the seed of the pseudorandom generator can be a function of the key, which is supposedly secret.
In terms of binary block ciphers, Claude Shannon [SHA49] introduced the concepts of bit diffusion and confusion. The amount of diffusion is rated by the average number of plaintext bits that each ciphertext bit is dependant on. Alternatively, confusion refers to the number of ciphertext bits that are dependant on a single plaintext bit. The two measures are highly correlated. The goal of diffusing and confusing systems is to make the statistical properties of the plaintext more uniform, and thus maximize the cost of a probabilistic attack on the system. The essential goal if these concepts is the same as with polyalphabetic ciphers, but applies to block ciphers as well as stream ciphers.
A certain class of polyalphabetic systems is known as a one-time pad. It gets its name because it uses a key (also called a pad) that is only ever used once. Suppose our message M is composed of characters (m1, m2, ... mi) . We could generate a random sequence of numbers (k1, k2, ... ki), then we can encipher each character of the message by adding it to its corresponding key character, and take the modulus by the size of the alphabet. Of course, we could also use other binary functions besides addition.
ci = ( ki + mi ) mod n
The there are two drawbacks to this system. First, its security diminishes fast if the same key is used multiple times. A new key needs to be generated for every message, and must be generated by a true random number generator. Conventional pseudorandom number generators generally repeat too often to be useful for this system. Since the sequence cannot be repeated, the key must be as long as the message. The true randomness of the key also implies that it must be transmitted to the receiver, since they generally cannot generate the same key on their own. Despite these setbacks, the one-time pad is still widely used.
Measures of absolute security are a more formalized analysis of the strength of a polyalphabetic system. Van der Lubbe [VDL97] talks about a concept that he calls the measure of uncertainty (also known as the measure of information) that a system affords to attackers. He defines such measures for both plain and ciphertexts, and for the key. For example, the measure of information in the plaintext is expressed as follows:
| n | |||
| H (M) = - | S | p (mi) log p (mi) | |
| i = 1 |
where p (mi) represents the uncertainty of occurrence of possible plaintexts. Van der Lubbe's model assumes that all mi are independent. This may not always be true, but it his concept does generalize to dependant cases. He defines the measure of information for the ciphertext H (C) and for the key H (K) in the same manner.
| l | n | |||||
| H (K) = - | S | p (kh) log p (kh) | H (C) = - | S | p (cj) log p (cj) | |
| h = 1 | j = 1 |
From these, we can derive conditional dependencies between measures of information. First, we derive what Van der Lubbe called the key equivocation, H (K | C) , as follows:
| l | n | |||
| H (K | C) = - | S | S | p (kh, cj) log p (kh | cj) | |
| h = 1 | j = 1 |
The above is a measure of uncertainty with respect to the key, for a given ciphertext C. In other words, this is an expression of the feasibility (or rather, lack of feasibility) of a ciphertext-only attack to determine the key. The lower this value is, the more trivial it is to determine the key. Similarly, the message equivocation, H (M | C), is a measure of uncertainty with respect to the message M, for a ciphertext C. This quantity is inversely related to the feasibility of an attack to determine the message, rather than the key. Message equivocation is defined as follows:
| n | n | |||
| H (M | C) = - | S | S | p (mi, cj) log p (mi | cj) | |
| i = 1 | j = 1 |
Finally, we define the key appearance equivocation as a measure of uncertainty with respect to the key, given both a plaintext and its corresponding ciphertext. It portrays the feasibility of a known-plaintext attack. We express it as:
| l | n | n | |||
| H (K | M, C) = - | S | S | S | p (kh, mi, cj) log p (kh | mi, cj) | |
| h = 1 | i = 1 | j = 1 |
All of the above expressions deal with the case when an attacker has only one plaintext and/or ciphertext. Van der Lubbe [VDL97] shows that the uncertainties of these quantities decrease geometrically as the attacker gets more texts. As the length of the message L increases, the attacker essentially has more text to work with. As Figure 1 shows, all aspects of uncertainty dissipate as the message gets longer. The only exception is for H (ML | CL), which approaches 0 as L approaches 0. This makes sense, since there are fewer possible messages if the length is small, so there is less uncertainty about what the message is. Notice in Figure 1 that the maximum uncertainty is at H (K), which is purely the uncertainty of the key, without any knowledge of either plaintext or ciphertext.
Figure 1. Uncertainty VS. Message Length for various equivocations.
This type of attack only relates to systems that use the same algorithm twice with two different keys.
C = Tk2 ( Tk1 ( M ) )
If our transformation T uses keyspace K, then a brute-force search on both keys would cost on the order of ||K||2 time. An attack is mentioned in Wagstaff's work [WAG99] reduces this to ||K|| time, though it also requires ||K|| space. This method also needs at least two plaintexts and their corresponding ciphertexts, one which we use to test, the others to verify a possible set of keys. The basic idea of this attack is to compute all Mi' = Tki(M), and store the pair (Mi', ki) in a large array, indexed by the value of Mi'. Then compute all Cj' = Tkj-1(C) and compare to the stored value of Mi'. If they are equal, then ki and kj are a possible key, and we can test them with other known text.
This is not so much an attack as it is a property of block ciphers in general. It is a common fallacy to assume that aggregating two cipher functions, each using different keys, will make the result stronger than the individual ciphers. For instance, one might think that C = Fk1 ( Gk2 ( Hk3 ( M ) ) ) would create a stronger encryption. The flaw in this concept is that one could find a single transformation C = Tk4 ( M ) that serves the same purpose as the aggregate. In practice, aggregating functions may make development of attacks more complicated, but once an attack is formalized, the aggregate function is no better than the strongest of its parts.
The details of differential cryptanalysis varies for different ciphers, as does the effectiveness of this type of attack. The basic idea is the same for all algorithms. It is a chosen plaintext attack which uses bit-wise differences between different texts to gain some information about the key. Suppose we have plaintexts M1 and M2, and we define M' = M1 XOR M2. We encrypt our texts and define C' = C1 XOR C2. The difference between M' and C' suggests a set of possible keys. With enough text, an attacker can narrow down the precise key.
This type of attack has the same underlying structure as differential cryptanalysis [BIH99]. It involves making linear approximations for non-linear functions. It is concentrated mostly on the DES system, and other non-linear systems. As such, I will reserve discussion of it until I present attacks on DES.
In 1974, the US Federal Department of Commerce called for a standard in general-purpose cryptographic systems. IBM responded with an algorithm that they had been working on since 1968. Their new system was a variation of the Lucifer system, which enciphers 128 bit blocks with a key of equal length. The new system used 64 bit blocks, and a 56 bit key (the actual transmitted key size is 64 bits, but 8 bits are used for parity). The reduction of key space does reduce the analysis time for brute-force attacks, but the complexity of the algorithm serves well against other types of attacks, often making them nearly as expensive as a brute-force attack. Much of the strength of DES comes from its use of unusual bit-displacement functions. As such, the DES lends itself nicely to hardware implementation, but makes software implementation rather tedious and inefficient.
IBM has decided not to release it's exact implementation details in an attempt to keep the system secure. The cryptographic community considers this decision rather devious, and generally wonders if there is a simple loophole that would allow the US Government to trivially crack the systeml. Despite this speculation, DES is widely used in North America.
Many algorithms, including IDEA and FEAL, have been developed on the principles introduced by DES. Also, there are several versions of DES that are simplified, with less iterative rounds and more simplistic permutations. Although they may parallel DES in their strength against brute force attacks, they are much more susceptible to statistical attacks like differential cryptanalysis.
The DES system is essentially a loop with 16 iterations, called rounds. There is also an initial permutation IP which the entire block is subjected to before entering the first round. Similarly, there is an inverse permutation IP-1 that is performed on the block after the last round.
Each round breaks the message block into two halves, L and R, and concentrates on only one half of the message block (say, the right half R). Each round also generates a 48 bit subkey Ki from the original 56-bit key. To do so, it uses a subkey function SK. The round will subject R to a transformation, which is a function F of the subkey and R. It then sets the left side L to XOR of itself and the result of F(Ki, R). Finally, the two halves are swapped so that the other half can be processed in the next round. Below is a block diagram of the algorithm, and shows how it is usually represented in terms of hardware.
Figure 2. Block diagram of the DES algorithm
Alternatively, the following pseudo-code shows how it DES is often implemented in software.
|
function DES_Encrypt (M, K) where M = (L, R)
swap(L, R) |
Notice that the left and right halves are swapped again after the for-loop and before the final permutation. Many explanations of DES omit this fact, and I only discovered it while testing my own code. In most hardware implementations (as in Figure 2), this swap, and the one after the last round are left out, since they cancel each other out. In software, this either increases the code size (if the final round is placed outside the loop, without the swap), or it increases the run-time (if an if-statement is used to check for the round number). One can easily add the final swap by reversing the order of the inputs to IP-1. For instance:
M ¬ IP-1({R, L})
rather than
swap(L, R)
M ¬ IP-1({L, R})
The algorithm for decrypting is similar. The only difference is the order in which the keys are used.
|
function DES_Decrypt (C, K) where
C = (L, R)
swap(L, R) |
To complete the algorithm, we have to define a few extra functions. First is the initial permutation IP, and its inverse. As well, we have to define a cipher transformation F to be used by the rounds. Lastly, we need to select a subkey, with the aid of the function SK. Some of these permutations have been reduced to simple functions, but some have only been explained in terms tables and matrices.
The initial permutation adds no strength to DES. It is a constant function, so undoing the permutation and its inverse is trivial for an attacker. The initial permutation can be described in a couple of ways. The first is in matrix form:
|
IP |
IP-1 |
|||||||||||||||
| 58 | 50 | 42 | 34 | 26 | 18 | 10 | 2 | 40 | 8 | 48 | 16 | 56 | 24 | 64 | 32 | |
| 60 | 52 | 44 | 36 | 28 | 20 | 12 | 4 | 39 | 7 | 47 | 15 | 55 | 23 | 63 | 31 | |
| 62 | 54 | 46 | 38 | 30 | 22 | 14 | 6 | 38 | 6 | 46 | 14 | 54 | 22 | 62 | 30 | |
| 64 | 56 | 48 | 40 | 32 | 24 | 16 | 8 | 37 | 5 | 45 | 13 | 53 | 21 | 61 | 29 | |
| 57 | 49 | 41 | 33 | 25 | 17 | 9 | 1 | 36 | 4 | 44 | 12 | 52 | 20 | 60 | 28 | |
| 59 | 51 | 43 | 35 | 27 | 19 | 11 | 3 | 35 | 3 | 43 | 11 | 51 | 19 | 59 | 27 | |
| 61 | 53 | 45 | 37 | 29 | 21 | 13 | 5 | 34 | 2 | 42 | 10 | 50 | 18 | 58 | 26 | |
| 63 | 55 | 47 | 39 | 31 | 23 | 15 | 7 | 33 | 1 | 41 | 9 | 49 | 17 | 57 | 25 | |
Table 2. Initial and final permutation matrices for DES.
The values in each matrix identify where each bit of the input message is mapped to in the output message. For example, The matrix for IP shows that the 58th bit from the input gets mapped to the first bit of the output; the 50th of the input maps to the second of the output, and so on. These matrices can be generated by simple algorithms, or alternatively the permutation can be done with simple routines, rather than with a table lookup algorithm.
|
IP |
|
|
|
|
IP-1 |
|
These algorithms are very easy to implement with hardware, as they are simply re-routed pairs of inputs and outputs. The software equivalent on a standard microcomputer takes many cycles to compute the tables (or to apply the permutation).
The first operation on the key is to reduce it from 64-bits to 56. Every eighth bit of the key is used for parity, so they are removed before we use the key for encryption. Then the key is subjected to a permutation similar to the initial permutation that is applied to the message. Since the removal of parity bits is trivial, it is often done in the same operation as the permutation. Table 3 describes the matrix used in this permutation. Again, this is easy to do in hardware, but not so efficient in software.
|
Initial Key Permutation |
||||||
| 57 | 49 | 41 | 33 | 25 | 17 | 9 |
| 1 | 58 | 50 | 42 | 34 | 26 | 18 |
| 10 | 2 | 59 | 51 | 43 | 35 | 27 |
| 19 | 11 | 3 | 60 | 52 | 44 | 36 |
| 63 | 55 | 47 | 39 | 31 | 23 | 15 |
| 7 | 62 | 54 | 46 | 38 | 30 | 22 |
| 14 | 6 | 61 | 53 | 45 | 37 | 29 |
| 21 | 13 | 5 | 28 | 20 | 12 | 4 |
Table 3. Initial key permutation matrix and removal or parity bits.
After this permutation, the key is split into two halves, C and D. After each round, each half is independently shifted to the left by either one or two bits, depending on which round is executing (see Table 4). The shift is rotational, so that bits that get shifted off of one end get placed back on the other end.
| Round | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| Shifts | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 |
Table 4. Key Shifting Factors for each round of DES.
Finally, the subkey function is used to convert the key into a 48 bit block, to be used in the actual encryption. Again, this is expressed in matrix form, as shown in Table 5. Note that the subkey is 48 bits, but is used to transform a 32 bit block of text. This will be explained further in the next section.
|
Subkey Permutation |
|||||
| 14 | 17 | 11 | 24 | 1 | 5 |
| 3 | 28 | 15 | 6 | 21 | 10 |
| 23 | 19 | 12 | 4 | 26 | 8 |
| 16 | 7 | 27 | 20 | 13 | 2 |
| 41 | 52 | 31 | 37 | 47 | 55 |
| 30 | 40 | 51 | 45 | 33 | 48 |
| 44 | 49 | 39 | 56 | 34 | 53 |
| 46 | 42 | 50 | 36 | 29 | 32 |
Table 5. Subkey permutation matrix.
As mentioned earlier, each round works on only the right half of the block, R. First, this 32 bit field is expanded to 48 bits to meet the size of the subkey. Again, there is a matrix to describe the expansion, but the following diagram is probably easier to understand.
| Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ... | 29 | 30 | 31 | 32 | ||||||||
| Output | 32 | 1 | 2 | 3 | 4 | 5 | 4 | 5 | 6 | 7 | 8 | 9 | 8 | ... | 29 | 28 | 29 | 30 | 31 | 32 | 1 |
Figure 3. Cipher expansion.
The output of the expansion is then XORed with the subkey for this round, the result of which we shall denote as R'. This 48 bit field is then broken up into eight groups of 6 bits each, which we will denote R1, ..., R8. Each group is an input for a unique substitution box, often abbreviated as S-box, which maps the 6-bit input to a 4-bit output. The exact derivation of these boxes are unknown, but tables have been used to qualitatively explain their behavior. Table 6, shows one of these tables, specifically for the first S-box. It's interpretation is rather simple. The first and last bits of Ri are used to determine a row number, and the four bits in the middle determine the column. The output of the S-box is then the value at that cell in the table. Van der Lubbe [VDL97] provides all eight tables.
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
| 0 | 14 | 4 | 13 | 1 | 2 | 15 | 11 | 8 | 3 | 10 | 5 | 12 | 5 | 9 | 0 | 7 |
| 1 | 0 | 15 | 7 | 4 | 14 | 2 | 13 | 1 | 10 | 6 | 12 | 11 | 9 | 5 | 3 | 8 |
| 2 | 4 | 1 | 14 | 8 | 13 | 6 | 2 | 11 | 15 | 12 | 9 | 7 | 3 | 10 | 5 | 0 |
| 3 | 15 | 12 | 8 | 2 | 4 | 9 | 1 | 7 | 5 | 11 | 3 | 14 | 10 | 0 | 6 | 13 |
Table 6. Table representation of S-box 1.
The 4-bit outputs of all the S-boxes are concatenated and then subjected to a final permutation P, described by Table 7. This is the final step in the cipher function. The final output is then XORed with the left half of the text block L, and then the two halves are swapped for the next round.
| 16 | 7 | 20 | 21 |
| 29 | 12 | 28 | 17 |
| 1 | 15 | 23 | 26 |
| 5 | 18 | 31 | 10 |
| 2 | 8 | 24 | 14 |
| 32 | 27 | 3 | 9 |
| 19 | 13 | 30 | 6 |
| 22 | 11 | 4 | 25 |
Table 7. Cipher permutation
There is some speculation that DES has some crippling flaws in it, but any serious weaknesses have yet to be uncovered. Since DES has not yet been described by a system of linear functions, analysis of it is harder that other with other algorithms, especially for known-plaintext attacks. There are some attacks that can be done rather quickly on certain keys of DES, but any general attack is currently considered to be impractical. This will change in the near future.
There is one particular characteristic that gives DES its strength, but in return there are a few that weaken the system. This was a decided design trade-off, and seems to have paid off, as its weaknesses are either small or avoidable. These properties are explained below.
In DES, every bit of every block is dependant on every other bit of the block, as well as every bit of the key. It has been shown after the fourth round, this interdependence is complete [VDL97]. This effect serves two purposes. First, it raises uncertainty of the key, given a known plaintext. In other words, it increases the cost of a known plaintext attack. Second, one bit changes in the plaintext or key will potentially cause the entire ciphertext block to change, which.adds complexity to an attack based on differential analysis. This scattering effect is what gives DES its strength.
There is an interesting phenomena of most block ciphers, which allows an attacker to obtain the plaintext by repeatedly enciphering the ciphertext with the same key.
M = Tkn ( M ) for a specific value of n > 0
Such methods are generally as costly, or more so than brute-force attacks. The necessary value of n is dependent on the particular key used. For DES, most keys require a large value of n, making this type of attack expensive. However, two keys have been identified as being exceptionally weak, to the point where n = 2. These two weak keys are shown below, in hexidecimal and in their 64-bit form:
| 0101 0101 0101 0101 | 0101 0101 FEFE FEFE |
| FEFE FEFE FEFE FEFE | FEFE FEFE 0101 0101 |
DES also has a similar characteristic, in which two encryptions with two different keys yields the plaintext.
M = Tk1 ( Tk2 ( M ) ) for specific keys k1 and k2
These keys are referred to as semi-weak keys, and Bruce Schneier [SCH96] discusses six pairs of keys.
01FE 01FE 01FE 01FE and FE01 FE01 FE01 FE01
1FE0 1FE0 0EF1 0EF1 and E01F E01F F10E F10E
01E0 01E0 01F1 01F1 and E001 E001 F101 F101
1FFE 1FFE 0EFE 0EFE and FE1F FE1F FE0E FE0E
011F 011F 010E 010E and 1F01 1F01 0E01 0E01
E0FE E0FE F1FE F1FE and FEE0 FEE0 FEF1 FEF1
There are other possible weak keys that only produce four different subkeys, rather than the intended sixteen. To date, no specific attack has been found for these keys, but they are suspected to be weak. Schneier mentions 48 such keys in his work. Luckily, there are so few vulnerable keys that they can be avoided during key generation.
An interesting characteristic of DES is that if the plaintext and the key are negated before the encryption, then the ciphertext is the negation of what it would have been.
Tk ( M ) = C implies Tk ( M ) = C
Once an attackers has tested a key, he can trivially test its compliment. This, in effect, reduces the key space to half if its size.
This attack was the first to break DES faster than a brute-force attack. It was perfected by Biham and Shamir in 1991 [B&S91]. It concentrates on the S-box transformations in the last round to deduce groups of bits from the entire key. For the sake of brevity, the initial and final permutations of DES are ignored, as they have no effect on this analysis.
After the last round, we get the ciphertext C which is equal to the two output halves of the 16th round ( L16, R16 ). Also, the left side of the 16th round is equal to the right half of the 15th: L16 = R15. This gives us the plaintext for the last round, so an attacker can choose pairs of plaintext that have a fixed XOR difference, and compute their ciphertexts. If the output X of the S-box and the input E to the expansion function are known, then the key fragment for that S-box can be computed by K = E XOR X . The values of E and X become less uncertain as more pairs are tested. With the key fragments from each S-box, an attacker has 48 of the 56 key bits. The key can be completed by a brute-force attack on the last 8 bits.
A complication with this approach is that the attacker has to supply plaintexts to the first round that will eventually give the desired input to the last round. Since the key is unknown, this has to be done by trial and error. The attacker can initially choose random pairs of plaintexts to input to the first round. Some of these, referred to as right pairs, will suggest certain key bits to be present. Other wrong pairs will suggest random bits. With a sufficient number of trials, the true bits (signal) will begin to overcome the random bits (noise), and the attacker can effectively start choosing plaintexts that will yield the desired difference. Biham and Shamir found that, on average, 247 trials were needed for such a chosen plaintext attack.
Their contribution to the attack analyzes the way that certain blocks will travel through the first 15 rounds of DES. If the two halves of the input to any round is (Li, 0), then the output will always be (0, Li) after the two halves are swapped. Similarly, when input (0, 0x19 60 00 00) is subjected to a round, it will yield (0x19 60 00 00, 0) with probability 1/234, which has been found by trial encryptions with single round models of DES [B&S90]. Biham and Shamir referred to these as single round characteristics if DES rounds. If an attacker alternates between these two characteristics 15 rounds, probability of the texts remaining unchanged will be
(1/234)7(1)8 = 2.6031x10-17 ≈ 2-57
In other words, we will need an average of 257 chosen plaintexts to find the key this way. This is more expensive that an brute-force attack on 16 rounds, but DES-like algorithms with fewer rounds fall under such an attack. With some optimizations on the rounds, the attack could be reduced to about 255 chosen plaintexts.
A year later, Biham and Shamir found that they could look at the output of the first three S-boxes. The S-boxes would generate a total of 212 different outputs v0 ... v4095, so one could choose an arbitrary plaintext M and define cliques of plaintext (referred to as structures) which would satisfy the following:
| Mi = M XOR (vi,
0) Mi = M XOR (vi, 0) XOR (0, Y ) for some valueY 0 £ i £ 212 |
We are then interested in the pairs Mi and Mj, of which there are 224, and each pair has a difference of (vk, Y ). The left half of M XORed with Y will always produce the desired text for the next round of analysis, thus we choose Y to be whatever input we wish to give to the second round. The problem is that to analyze these pairs would tale too long. We can simply find Ci = Tk ( Mi ) and Ci = Tk ( Mi ) then sort each by the remaining 20 bits (that were not output from the first three S-boxes). We can then detect and discard all values that are not repeated for each of the pairs (Ci, Ci). Since they have a non-zero XOR, they cannot be right pairs. If we repeat this step for other structures (involving permutations of other S-boxes) then we can eliminate over 90% of the wrong pairs without removing any right pairs . This, in effect, raises the signal-to-noise ratio of the attack, and reduces the time requirement to 247 [B&S91] .
Another consideration is that Biham and Shamir's original attack consumed a huge amount of space, as it needed counters for each possible key (but only for 48 bits, as the rest are determined by brute force) in order to determine the most frequently suggested values. The new attack can be extended to use negligible space, as it determines sets of entire keys. These can be tested and discarded by trial encryption as they are generated. This method will also ensure that the true key is reported when the first right pair is found. Each structure can be analyzed independently, so the attack can be performed on up to 233 independent processors with linear speedup [B&S91]. Also, the probability of finding the key grows with more text, so structures can be analyzed as text is being gathered. The crippling factor of this attack is that it needs a huge amount of chosen plaintext.
This attack on DES was formalized by Matshuru Matshiu in 1992. Although this type of analysis is closely related to differential analysis, [BIH99]. Matshui described it as building up a system of linear equations and then solving the equations to find the key.
Suppose we have n plaintext pairs. Given 232 √(2n) random plaintexts and their ciphertexts, we then have (232 n))2 = 264m unordered pairs of plaintext. Since the block size is 64 bits, there are 264 different plaintext XORs. There are 264n / 264 = n pairs for each XOR value, so we need n pairs for each chosen plaintext. The XOR values are used to to form a linear approximation of the S-box output. The approximation is correct with some probability p. Irregularity in the S-boxes allows an attacker to choose bit positions from the text and key that give a probability of p ≠ 1/2. Whatever bias can be amplified by comparing XOR pairs of many known texts. Once the approximated system of equations is accurate enough, the key is found by solving this system.
Matshiu found that the key could be recovered with 243 known plaintexts. Although this is much better than differential analysis, the amount of text needed is still considered impractical.
DES is becoming an extinct encryption scheme. The technology that is currently available is rendering DES to be trivial to crack. In 1998, RSA Laboratories held their third contest to see how fast a ciphertext-only attack could be performed on DES. The Electronic Frontier Foundation and Distrubited.Net were able to produce a control computer which distributed operations to tens of thousands of machines across the Internet. The machine cost less than $250,000 US and took about a year to develop. The entire distributed system could perform 240 billion encryptions per second, and found the correct key in 20 hours and 15 minutes [EFF99]. The purpose of RSA's contest was to show that it won't be long before workstation-class computers will be able to crack DES.
There is need to develop new encryption systems. DES does have many strengths, but its small key space is inhibiting its security. It should not be difficult to expand DES to use a larger key. It would be simple to redefine the permutations and expansions of the DES algorithm. The challenge lies in developing new S-box mappings for larger key and block sizes. Also, increasing the number of rounds would hamper differential analysis. There has also been discussion of using new subkey algorithms that would use each key bit less often, though this would have no effect on Biham and Shamir's attack.
Another improvement is that the initial and final permutations could be replaced with a function that is co-dependant on the key and the text. This is similar to adding more rounds to the algorithm, and would add uncertainty to differential and linear analysis.
Here is my current implementation of DES. It's in Java, so as long as you have the Java Runtime Environment, you can run this. This implementation is being used in my current research on differential attacks to DES-like systems. I have tested it against Douglas Stinson's [STI95] example trials, and with NBF's test vectors, though I do not promise that it is bug free, or that it is very efficient. It's free, and you can only assume that it's worth as least as much as you paid for it. Sun Microsystems is distributing their own Java packages for a handful of mainstream encryption systems, including DES, all of which is probably more efficient and more thoroughly tested than my work.
In accordance with Canadian export regulations on cryptographic systems, the use of this code is not restricted in any way, provided that it is distributed freely and with the original source code with it.
With that, here is my DES source. Enjoy.
| Analysis Time | The expected time required for an attacker to break a cryptographic system |
| Brute-Force Attack | An attack on a system that involves trying every possible key until the correct one is found. This is the most expensive type of attack. |
| Round | A single iteration of an iterative cryptographic algorithm. |
| Steganography | The science of hiding messages so that an
outside person would not even know that the message exists. In contrast,
Cryptography is the science of making a message obscure or illegible. Some
simple examples of Steganography are:
|
| Substitution Box (S-Box) | A component of the DES system that maps 6 input bits to 4 output bits. In DES, there are eight such boxes, and they have only been explained in terms of look-up tables. |
| [B&S90] | Biham, Eli and Shamir, Adi; "Differential
Cryptanalysis of DES-Like Cryptosystems", Wiezmann Institute of
Science, 1990 http://www.cs.technion.ac.il/~biham/Reports/Weizmann/cs90-16.ps.gz |
| [B&S91] | Biham, Eli and Shamir, Adi; "Differential
Cryptanalysis of the Full 16-Round DES", Technion Department of
Computer Science, 1991 http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/1991/CS/CS0708.ps.gz |
| [BIH99] | Biham, Eli; "On Matshiu's Linear
Cryptanalysis", Technion - Israel Institute of Technology http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/1994/CS/CS0813.ps.gz |
| [EFF99] | "DES Challenge-III Broken In Record 22 Hours",
Electronic Frontier Foundation. http://www.eff.org/pub/Privacy/Crypto_misc/DESCracker/HTML/19990119_deschallenge3.html |
| [SCH96] | Schneier, Bruce; Applied Cryptography; John Wiley & Sons, 1996 |
| [SHA49] | Shannon, Claude; "Communication Theory of Secrecy Systems", Bell Systems Technical Journal, Issue 4, 1949 |
| [STI95] | Stinson, Douglas R.; Cryptography: Theory and Practice, CRC Press, London, 1995 |
[VDL97] |
van der Lubbe, J.C.A; Basic Methods of Cryptography; Cambridge University Press, New York, 1997 |
| [WAG99] | Wagstaff, Samuel S.; "Cryptanalysis", Algorithms and Theory of Computation Handbook, CRC Press, 1999 |
Schneier, Bruce; "Self-Study Course in Block Cipher Cryptanalysis",
Cryptologia, v.24, n.1, Jan 2000, pp. 18-34,
http://www.counterpane.com/self-study.html
Anonymous; "Cryptography FAQ", news.sci.crypt.org,
http://www.faqs.org/faqs/cryptography-faq/
Burbridge, Mark; "An introduction to Cryptography",
http://www.ftech.net/~monark/crypto/