Encryption and Cryptography

Just prior to the industrial/scientific revolution, there was another breakthrough and advancement in knowledge. [This citation needs to be added.] This breakthrough is cited as a necessary prerequisite for the industrial/scientific revolution. That breakthrough was the deciphering and translation of ancient languages. Calling such a breakthrough a major advancement in human intelligence may seem a bit much, if not absurd. However, many if not all of the techniques of decryption were used to achieve these feats of translation and decipherment. This knowledge raises our ability to understand and communicate. Language does have a logical component to it. Things need to make sense. And the goal of language is to explain or convey a message to others. There is a lot of math and science involved in language, linguistics and encryption. One must have this level of knowledge before further knowledge is possible.

Linguistics and encryption are intertwined. It is wise to have some knowledge of linguistics if one is going to pursue the study of cryptography.

I am not going to go in depth into encryption or cryptography. These are disciplines unto themselves. You can gain the basics and some intermediate/advanced theory/knowledge from very good existing texts. Phil Zimmermann’s web site is an excellent resource. This web page [A direct link needs to be added] is an easy read and covers all the basics of encryption. I will mention some very significant topics that you should know. While you should know something about encryption and cryptography, you will probably not be doing this on your own. You can if you want to. Most likely, you will be using a tool that will do this for you. If you are so inclined, be my guest.

This subject does come with some recommended readings. You should read the book, The Codebreakers, by David Kahn. This is an excellent, authoritative history of encryption and cryptography. You will get the basics. You will learn about frequency analysis. The letter ‘e’ is the most common letter in English. Every language has its own frequency. You look for the frequency of symbols to crack the code. You will learn the fundamentals of code breaking. You will learn progressively about simple codes and then more complex codes, all the way along learning how those codes are cracked. Always looking for the frequency. Then, you will learn about “One Time Pad”, the code that cannot be cracked. I will explain that soon.
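To make frequency analysis concrete, here is a minimal Python sketch that counts how often each letter appears in a piece of text. The function name `letter_frequencies` and the sample string are my own choices for illustration. A sample this short is far too small for reliable statistics; real code breaking needs much more text.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, count) pairs for the letters A-Z, most common first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(letters).most_common()

# A short coded sample; the most frequent symbols are the first
# candidates for standing in for common letters such as 'e' or 't'.
sample = "Uijt jt b tfoufodf."
for letter, count in letter_frequencies(sample):
    print(letter, count)
```

In a long English message, whichever symbol comes out on top is a reasonable first guess for the letter ‘e’.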

Caesar Code. This is a very simple and ancient code. I am going to describe it because of the fundamental knowledge involved. More advanced readers may skip this sub-section. The code substitutes one letter with another letter in the alphabet, a fixed number of letters before or after the original letter. An example:

“This is a sentence.” becomes: “Uijt jt b tfoufodf.”

Let us look at this from a different perspective.

If all the letters in the alphabet are assigned a number, so that A-Z corresponds to 1-26, then the code is to add one (+1) to the letter. In math terms, N+1.

In tabular form:

A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

To crystallize your understanding, let us put the example of the Caesar code into a table:

Plaintext   T   h   i   s   i   s   a   s   e   n   t   e   n   c   e   .
Number      20  8   9   19  9   19  1   19  5   14  20  5   14  3   5   92
Ciphertext  U   i   j   t   j   t   b   t   f   o   u   f   o   d   f
Number      21  9   10  20  10  20  2   20  6   15  21  6   15  4   6   93

You can see the basic pattern and get the idea. Most codes operate with some similar methodology. A mathematical equation alters the designated number for a symbol to another number. The math can get very complex but the basic idea remains the same. There are more intricacies than discussed here. Capitals, in computer codes, often have different numerical values than minuscule letters. Why is the period a 92? Because, in order to code any letter or symbol, every letter and symbol must be given a number. I will explain this more in depth later, when I discuss the ASCII code.
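The N+1 rule can be sketched in a few lines of Python. This is only an illustration: the function name `caesar_shift` is mine, it uses Python's built-in `ord` and `chr` to turn letters into numbers and back, and it assumes that spaces and punctuation are left untouched (unlike the table, which also gives the period a number).

```python
def caesar_shift(text, shift=1):
    """Shift each letter by `shift` places, wrapping past Z back to A."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # assumption: spaces and punctuation stay as they are
    return "".join(out)

print(caesar_shift("This is a sentence."))      # Uijt jt b tfoufodf.
print(caesar_shift("Uijt jt b tfoufodf.", -1))  # This is a sentence.
```

Shifting by the negative of the original amount undoes the code, which is all that decryption is here.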

Another recommended reading is Applied Cryptography by Bruce Schneier. This is an excellent book that covers many types of encryption from a technical perspective. Program listings in C are included in the book. The book explains that the programs are printed on paper but not available on CD in order to comply with U.S. export laws. For the same reason, the book is not available digitally.

Another recommended reading is Phil Zimmermann’s book about PGP. [This section incomplete.] PGP stands for Pretty Good Privacy. It is an encryption program originally developed for email, which can also be used for other purposes such as archival files. It is a very good encryption method. It is worth studying. You can get the source legally and for free, online. [This section is incomplete.]

For this discussion, you need to know that plaintext means ordinary, unencrypted text, not encoded writing. Ciphertext is encrypted, encoded text. In the clear means not encrypted or encoded.

Some more notation. Encryption and decryption are sometimes considered as mathematical functions. So, the mathematical notation of functions is used to describe the process. E(P) = C means the function “E”ncrypting the “P”laintext equals [or produces] “C”iphertext. Conversely, D(C)=P means the function “D”ecrypting the “C”iphertext equals “P”laintext. “M” is often used for message. So, E(M)=C and D(C)=M.

As a hobby and for fun, you might want to do some cryptograms or program some encryption or decryption routines. I would say that such exercises are not a necessary ongoing thing. Once you have the hang of it, that’s enough.

As you will learn, coming up with your own code is really, really hard. Unless you’re a math whiz, leave it to the professionals. There is however, a bigger reason to use existing encryption software and not to write your own private encryption programs. Encryption software is best crowd sourced.

In cryptography, there is a rule called “Kerckhoffs’ Principle”. This is also known as Kerckhoffs’ desiderata or assumption or axiom or law. Auguste Kerckhoffs was a 19th Century Dutch linguist and cryptographer. Linguistics and cryptography go together. More about that later. He was a professor, taught German and in 1883 published 2 articles about cryptography. His article “Military Cryptography” [in the original French, “la Cryptographie Militaire”] appeared in the Journal of Military Science [“le Journal des Sciences Militaires”].

Kerckhoffs discussed all the encryption schemes and cryptography equipment of his day. He listed 6 things that a good cryptographic system should have:

  1. The system should be theoretically or practically unbreakable.

  2. The design of the system should not require secrecy. Meaning, if someone finds out how the system works, the messages should still be secret.

  3. The key should be memorable, without notes and should be easily changeable.

  4. Encrypted messages should be able to be sent by telegraph.

  5. The cryptographic device should be portable and require only 1 person to operate it.

  6. The encoding method should be easy to use. There should not be a long list of rules or a lot of mental energy required.  [This section requires a citation.]

Rule #1 seems like a bit of an oxymoron. It’s code. No one should be able to break it. That’s the whole idea. But there is theoretically unbreakable and there is practically unbreakable. As knowledge and machinery, especially computing machinery, advance, “practically unbreakable” can, and often does, become obsolete. However, for the time being, or for long enough, the code should be unbreakable.

For example, in the field, a military commander may need to pass on instructions, to his soldiers. This requires secrecy. But, the instructions only need to be secret until the instructions are executed. Once done, they are not a secret any more. So, the code need not necessarily be that difficult to crack. So long as, the time required to decode gives privacy for enough time. Archives, on the other hand, will require much more difficult codes.

As for theoretically unbreakable, there is a code, one code, that is theoretically unbreakable. David Kahn tells this story, in detail, in his book. Around the end of World War I, it was discovered that if the key is as long as the message, the key is picked randomly so that it has no repeating sequences, the key is unique, and the key is never reused, then the message is theoretically not decryptable. This is called “One Time Pad” or OTP.
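As a rough sketch of the idea, assuming the message is handled as bytes and the key is combined with it using XOR (a bit operation discussed later in this chapter), a one time pad might look like this in Python. The function names are mine, and `secrets.token_bytes` stands in for whatever truly random source a real system would use.

```python
import secrets

def otp_encrypt(plaintext: bytes):
    """Return (key, ciphertext). The key is random and as long as the message."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Combining the ciphertext with the same key restores the message."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key, ciphertext = otp_encrypt(b"This is a sentence.")
print(ciphertext)                    # different on every run
print(otp_decrypt(key, ciphertext))  # b'This is a sentence.'
```

The method is no secret. Everything depends on the key being random, as long as the message, kept secret, and thrown away after one use.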

David Kahn relates that Nazi spies, in the USA, prior to WWII used OTP, correctly and successfully. None of their messages were ever decoded.

The Soviets also employed this technique of OTP. However, in one famous incident, they reused pads. Twice is not once! With much effort, American agents decrypted some of the messages. This project was referred to as Venona. People often refer to this incident and then say OTP can be broken. Twice is not once. It was not a one time pad if it was used twice. Get it?
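Why is reuse so damaging? Here is a toy Python illustration, assuming a byte-wise XOR pad and two made-up messages. It is not a reconstruction of the actual Venona ciphers; it only shows the general weakness: when the same pad is used twice, the pad cancels out.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

pad = secrets.token_bytes(16)
message_1 = b"ATTACK AT DAWN.."    # hypothetical messages, 16 bytes each
message_2 = b"RETREAT AT DUSK."

cipher_1 = xor_bytes(message_1, pad)
cipher_2 = xor_bytes(message_2, pad)   # same pad used twice: the mistake

# Combining the two ciphertexts removes the pad entirely.
leak = xor_bytes(cipher_1, cipher_2)
assert leak == xor_bytes(message_1, message_2)

# If the attacker can guess one message, the other falls out.
print(xor_bytes(leak, message_1))      # b'RETREAT AT DUSK.'
```

The attacker never learns the pad and does not need to.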

[For a detailed history of One Time Pad, I refer you to either “The Codebreakers” by David Kahn or the Wikipedia entry for “One Time Pad”.]

OTP does have drawbacks. Making sure you never reuse a key for one. Distribution of keys. Coming up with unique keys.

There are alternatives to OTP. These are cryptographic schemes that take so long to decrypt, even with computers, as to be impractical to decrypt.

What Kerckhoffs stated as Rule #2 in his famous cryptographic article is commonly referred to as Kerckhoffs’ Law. Everybody can know the cryptographic scheme, because even if people know how the code works, the code should be good enough to still keep things secret.

Look at OTP above. I just told you how it works. If you use a unique key, which is random, keep the key secret and use it only once, the message can not be decrypted. Even though I know the method, this does not help me decrypt the message. Without the key, I can’t decrypt the message.

There is another advantage to everyone knowing the method. If the method is not secure, if the method can be cracked, someone will figure it out—eventually. If you haven’t figured out how to crack your method, someone else will. This is a good thing. Because, then we all know the method is not a good one.

Also, it is quite common, for people to write their own cryptographic routines; think those routines are strong and secure; only to have someone else, a security researcher, figure out, that that routine is not secure.

Obscurity Security. To demonstrate the weakness of relying on secrecy of the method, instead of the strength of the key, I will now discuss, obscurity security. Obscurity security relies on little known or private facts or secrets. For example, medieval locks did not rely on strong locks as we now have. Medieval locks were often easy to pick. What was relied on was, ornate designs that hid the keyhole. But, if you looked long and hard enough, you could find the keyhole and pick the lock.

Another example of obscurity security, is relying on internal information such as a company’s phone directory, to verify that someone works for the company. Kevin Mitnick discusses this at length, in his book, Ghost In The Wires. One can tell by caller id, where someone is calling from. If the call appears to be coming from within the company, then the caller should be an employee and entitled to confidential information.

This is like someone telling you, “Joe sent me. I need the … for Sam’s surprise birthday party tomorrow. I have to decorate the room at … with …” That someone certainly appears to be hooked in to the party, doesn’t he? Or, is he a party pooper or practical joker who happened to have found out about the party from someone else?

What Kerckhoffs stated as Rule #3 is wise to remember. I will discuss this more when I discuss the subject of passwords. There is a big problem with people not being able to remember their passwords. So, they write their passwords down on Post-Its and tape them to their computer screens. This has even been done by generals in the Pentagon. [This section is missing a citation.] This is like leaving the key under your doormat with a big sign on the front door, “Key Under Mat”. 🙂

What Kerckhoffs stated as Rule #4 is no longer relevant.

What Kerckhoffs stated as Rule #5 still applies. But, with computers and smartphones, it is commonly satisfied.

What Kerckhoffs stated as Rule #6 still applies. Again, with current technology, this is commonly satisfied. The writing of the encryption software may be quite complicated. But once in production, once it is being used, it is easy to operate. You just type in your password, hit enter and the computer or smartphone does the rest.

* * * * *

If you want to try writing your own cryptography programs, using either known or common methods, be my guest. It is a very good learning experience. Just remember, as a rule, security professionals usually leave the writing of cryptography programs to specialists, cryptographers. It is best to rely on those who specialize in the writing of cryptography programs, rather than doing it yourself. If you are seeking security certification, you will need to study cryptography in depth.

I can recommend an online course, for as long as it runs: Cryptography I & II by Dan Boneh, on http://www.Coursera.Org. You will get the basics. You need to know a bit of math. There are references to online books for number theory. This is college level. The course is repeated often. This course gives a very good foundation in cryptography. There are programming assignments.

You can get a certificate with or without taking the programming assignments.

There are no age requirements. In one of the discussions on the classes forum, a young teenage girl said she had taken the course 13 times! She was going to learn this stuff! Go for it girl! If you want to become a cryptographer, all the power to you.

What is important is that you know about the different types of security algorithms (methods) available. Have weaknesses been found in these methods? You do not necessarily need to know in depth the internals of how the methods work. If there is a way to crack the code, that will probably be public. For example, there is software, “John the Ripper”, that cracks passwords protected by many different hashing and encryption methods.

You should know what the different security settings, encryption choices stand for. You should pay attention to what is being done in the field of cryptography; what are the current issues researchers are looking into. You should follow current events in the encryption field.

If new weaknesses have been found; if new methods of cracking codes have been found; you should know. [References to publications and other sources need to be added.]

What you need to know is what security settings are available from which products. For example, there are usually 3 kinds of encryption for Wi-Fi: WEP, WPA, or WPA2. Do any of these have known weaknesses? [Yes.] Is 2 greater than 1? [A fair assumption.] If you find a network or software using weak encryption, you should change that setting to something stronger.

You will work with encryption. You will need to know about encryption. But, you probably will not be doing cryptographic or encryption work yourself.

You should get up to speed on the subject of privacy. You should read and listen to both sides of the story, from different perspectives, from different individuals and different types of individuals. Personally, I feel “Data and Goliath” by Bruce Schneier is a good place to start.

Letter   Morse Code
E        .
T        –
O        – – –
A        . –
N        – .
I        . .
S        . . .
Q        – – . –

Huffman coding. The idea or basic concept of Huffman coding is ubiquitous in encryption and applied in many ways in computing. You should know what Huffman coding is. The idea is that when you code, the most frequent or common thing is encoded with the smallest replacement symbol. The next most common thing to encode uses a slightly bigger replacement symbol. And, so on.

Morse code is a good example of this. The letter ‘e’ is the most frequent letter in English. It is replaced with a single dot in Morse code. The letter ‘t’ is the 2nd most frequent letter. It is replaced with a single dash. Notice how the more frequent letters have smaller and shorter codes, while uncommon letters have longer codes.
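For readers who want to see the idea in code, here is a compact Python sketch of Huffman's method: keep merging the two least frequent groups of symbols, prefixing a 0 to the codes in one group and a 1 to the codes in the other. The function name and the sample text are my own. This is a teaching sketch, not how any particular file format implements it.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code where frequent symbols get shorter bit strings."""
    counts = Counter(text)
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # only one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two least frequent groups
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_codes("this is a sentence")
for symbol, code in sorted(codes.items(), key=lambda item: len(item[1])):
    print(repr(symbol), code)
```

Symbols that occur often, such as ‘e’ and ‘s’ in this sample, never receive a longer code than symbols that occur only once, such as ‘h’.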

This concept is applied often in encryption. It is used in the common picture encryption format JPEG.

Funny I should call JPEG encryption. It is compression. You are making things smaller. The file size shrinks. But, inherently, compression is a kind of encryption. Does a JPEG file look like a picture? No. You need a computer, with the right software, graphic software, to read a JPEG and render it, turn it, into a picture. So, it is encryption. It is a code. It just isn’t a secret code.

This idea of codes that are not secret is a very fundamental part of computers, email, Internet, and networking.

Codes Are an Essential Part of Computer Engineering

My pet introduction to computer science is, that computers are stupid. All they can do is tell you one thing: If a switch is on. They can’t even tell you if a switch is off. You just know it’s off because the computer can’t tell you it’s on.

So, we can get only 2 types of information from a computer. Is a switch on or off? This is commonly referred to as 1 for on and 0 for off. Off is 0 because there is no electricity, no light bulb on. In the very first electronic computers, literally, there were light bulbs that were turned on. The pattern of off and on lights represented a code of 0’s and 1’s that was translated into letters and numbers.

This is why computers are binary. Computers handle only 2 things. Computers speak a language of only 2 symbols or “sounds”. For us humans, this is easily represented as binary and base 2. However, base 2 gets unwieldy for us humans. Too many details for most of us to figure out. Base 2 is easily converted to base 16 or hexadecimal, because each hexadecimal digit stands for exactly 4 bits. So, we convert base 2 to base 16 so we humans can read “computerese” more easily. Two hexadecimal digits cover 8 bits. Hence, a byte, the minimum recognizable computer data to humans, is 8 bits.
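A quick way to see the relationship between base 2 and base 16 is to print the same number both ways. This small Python sketch uses the built-in `format` function; the value 202 is an arbitrary choice for illustration.

```python
value = 202
bits = format(value, "08b")        # base 2, padded to 8 bits
hex_digits = format(value, "02X")  # base 16, two digits

print(value, bits, hex_digits)
# Each hexadecimal digit corresponds to one group of four bits.
print(bits[:4], "->", hex_digits[0])
print(bits[4:], "->", hex_digits[1])
```

Eight switches to keep track of in binary become two symbols in hexadecimal.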

Codes are fundamental to the functioning of computers. The translation of all those on and off switches into letters and numbers, is a code. Converting all those programs from a bunch of symbols—letters, numbers and other signs—into a bunch of on/off switches that the computer understands as instructions, is a code. Grouping all these on/off switches into files and directories, the grouping of on/off switches to make signals that appear as images on your screen or printouts with text or pictures—is a huge set of codes. Without codes, computers can’t work.

However, a computer can tell if those switches are on or off really, really fast and with lots and lots of switches. We take combinations of those switches and use them to refer to letters. The code table for the combination of switches to the alphabet and other symbols, such as numerals, punctuation marks and some control codes for the computer, is known as the ASCII table.

The ASCII Table – There are 256 symbols in the extended ASCII tables used on most computers. Every letter, numeral and punctuation symbol is assigned a number. Some symbols are reserved for computer operations, such as ringing a bell and advancing to the next line [on a screen or page]. Only the first half of the table, 0-127, is standard ASCII and is used for common written language. The other half is reserved for special characters [trademark, degree, mathematical symbols], graphic characters [bars and dots] and optional extra language alphabets [Hebrew, Greek, Cyrillic, etc.]

The capital letters A-Z are 65-90, respectively. The minuscule letters a-z are 97-122, respectively. Numerals are 48-57, respectively. Punctuation marks are scattered throughout. (The reason for this is a bit advanced data processing and not in scope of this discussion.)
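You can check these values yourself. In Python, the built-in `ord` gives the number for a character and `chr` gives the character for a number. The characters below are just a sample.

```python
for ch in ["A", "Z", "a", "z", "0", "9", " ", "."]:
    print(repr(ch), ord(ch))

# And back again, from numbers to characters:
print(chr(84), chr(104), chr(105), chr(115))   # T h i s
```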

Google “ASCII table” for a sample ASCII table. Most books on programming contain an ASCII table as a reference. There’s no need to memorize the ASCII table. But, you will be working with it. Recognition of common characters, letters, numbers, punctuation marks and certain common control codes will come to you as you work with computer data.

It is important that you understand what the ASCII table is. You will be working with it often.

Bits – Computers are built on bits. Bits are single units of information. 0 or 1. The “state” of a switch. If a switch is off or on.

Bit manipulation – is essential to encoding and decoding in computers. That should be intuitive. You have to be able to play with and swap the bits [or letters, or numbers, or symbols] in order to take intelligent information and make it unintelligible.

Let us return to the previous example of a Caesar code.

Plaintext   T   h   i   s   i   s   a   s   e   n   t   e   n   c   e   .
ASCII       84  104 105 115 105 115 97  115 101 110 116 101 110 99  101 46
Ciphertext  U   i   j   t   j   t   b   t   f   o   u   f   o   d   f   /
ASCII       85  105 106 116 106 116 98  116 102 111 117 102 111 100 102 47

Notice that ‘T’ and ‘t’ have different numbers. Now, it should be clear why the period needs a number of its own. The 92 used earlier was only a stand-in. In the ASCII table, the numerical value of a ‘.’ is 46.

Using a 20 digit prime number, 48112959837082048697, as a unique, non-repetitive key, and adding the value of each digit to the number representing each character, we produce an encrypted message:

Character  ASCII  Key  Sum  Symbol
T          84     +4   88   X
h          104    +8   112  p
i          105    +1   106  j
s          115    +1   116  t
(space)    32     +2   34   "
i          105    +9   114  r
s          115    +5   120  x
(space)    32     +9   41   )
a          97     +8   105  i
(space)    32     +3   35   #
s          115    +7   122  z
e          101    +0   101  e
n          110    +8   118  v
t          116    +2   118  v
e          101    +0   101  e
n          110    +4   114  r
c          99     +8   107  k
e          101    +6   107  k
.          46     +9   55   7
(space)    32     +7   39   '

We learn from this a few things:

For one, a space is a thing in computers and has its own symbol. It is a common symbol as you can see.

Also, we see the value of using a random key.

Notice how common symbols are translated differently at different positions. No two spaces are translated to the same symbol. Sometimes the letter ‘e’ remains an ‘e’. Sometimes, it becomes a ‘k’. One time adding a zero. The other time adding a six.

Notice that the encrypted letter ‘k’ does not necessarily revert to the same letter. Sometimes it reverts to an ‘e’ and other times to a ‘c’. Also, notice the key is as long as the message. Just as OTP requires.
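The digit-by-digit addition can be reproduced in a few lines of Python. This is a sketch under two assumptions: the message has a trailing space, so that it is 20 characters long like the key, and the character numbers are the standard ASCII values returned by Python's `ord`. The variable names are mine.

```python
message = "This is a sentence. "      # 20 characters, counting the trailing space
key_digits = "48112959837082048697"   # the 20 digit key from the text

codes = [ord(ch) for ch in message]
shifted = [code + int(d) for code, d in zip(codes, key_digits)]
ciphertext = "".join(chr(n) for n in shifted)

print(shifted)
print(ciphertext)

# Subtracting the same digits, position by position, restores the message.
recovered = "".join(chr(n - int(d)) for n, d in zip(shifted, key_digits))
print(recovered == message)   # True
```

Without the key digits, the same ciphertext symbol gives no fixed clue about the letter behind it.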

Hopefully, the value of using prime numbers in encryption is starting to congeal for you.

In Caesar code, what happens with the ‘Z’? What is the next letter? The letter after ‘Z’? Is an ‘A’. We cycle around.

Let us return to bits and see how we would use a Caesar code with bits.

A bit has only 2 possible values: 0 & 1. This gives us the following possibilities:

  0   0   1   1
 +0  +1  +0  +1
 --  --  --  --
  0   1   1   0

XOR (exclusive or) is a logical or truth function. In binary math, base 2, it is addition without a carry: 0+0=0, 1+0=1, 0+1=1 and 1+1=0. As you can see, this achieves the same result as a Caesar code that cycles around: after 1, we are back at 0. XOR is used for the mathematical explanation of encryption and the study of the science of cryptography. Using base 2 simplifies the math from base 26. Encrypting the English alphabet uses base 26 because…there are 26 letters in the alphabet. The fundamental concepts are the same no matter what the base.
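In Python, XOR is written with the `^` operator. The short sketch below prints the truth table and then applies XOR to a whole byte. The key byte is an arbitrary value picked for illustration.

```python
for a in (0, 1):
    for b in (0, 1):
        print(a, "XOR", b, "=", a ^ b)

plain = 0b01010100    # the bits of 'T' (84)
key = 0b00110101      # an arbitrary key byte, chosen for illustration
cipher = plain ^ key

print(format(cipher, "08b"))
print(cipher ^ key == plain)   # True: applying the same key again undoes it
```

The same operation both encrypts and decrypts, which is one reason XOR shows up so often in cryptography.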

This is very rudimentary stuff, just to have some basic idea about encryption. You need to do follow-up reading to have an in-depth knowledge of the subject.

Steganography

There is another kind of encryption that is worth mentioning and used in computers. It is hiding a message or picture within a picture. This is often achieved with placing dots in the right places or coloring parts of objects so a pattern will emerge. “Reading” the message is often done with a “mask” that blacks out the rest of the image.

One example is a book code. A certain book and page is selected. A card or piece of paper—as a mask, is laid over the page. Holes are cut into the mask so that the words of the message appear through the holes.

A computer example is to hide a JPEG picture within another JPEG picture. A computer screen is made up of pixels. Each pixel is a dot on a screen with a specific location and color. Each color is represented by a number. A mask will change all the colors in the correct locations to produce the “secret” picture. Often, the secret picture is embedded as only one or two bits of the color of each pixel.
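Here is a minimal Python sketch of the “one bit per pixel” idea. It assumes each pixel colour is a plain number from 0 to 255 held in a list. It does not read or write real JPEG files, and the function names and sample values are made up for illustration.

```python
def hide_bits(pixels, secret_bits):
    """Overwrite the lowest bit of each pixel value with one secret bit."""
    hidden = list(pixels)
    for i, bit in enumerate(secret_bits):
        hidden[i] = (hidden[i] & ~1) | bit   # clear the low bit, then set it
    return hidden

def reveal_bits(pixels, count):
    """Read the lowest bit back out of the first `count` pixels."""
    return [p & 1 for p in pixels[:count]]

cover = [200, 13, 77, 254, 128, 64, 91, 30]   # made-up 8-bit colour values
secret = [1, 0, 1, 1, 0, 0, 1, 0]

stego = hide_bits(cover, secret)
print(stego)
print(reveal_bits(stego, len(secret)) == secret)   # True
```

Each colour value changes by at most 1, which the eye cannot see. That is why the “public” image looks untouched.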

Commonly, the “public” image will be a mandala or some other geometric “art” image.

I am not going to go into steganography. It is not a common method of encryption and not relevant to the bulk of cyber security. Its use is commonly for the illegal transmission of child pornography. It could be used to transmit embedded secret documents or messages. However, as other methods are much more effective, such use is not commonly heard of.

Watermarking. Steganography does have an application for the authentication of documents.

Hashing. Hashing is another way of providing authentication. Hashing sums up all the bytes, based upon some method. The end result is unique or, unique enough to identify an electronic document. Dan Boneh, in his course, gives a very good explanation of hashing. In “Applied Cryptography”, Bruce Schneier goes into hashing in depth.

As a cyber security professional or digital forensic analyst, a hash is something you would apply or verify in order to authenticate a digital document. You would use some hash method made by a cryptographer to make sure that the document is legitimate. An example of this is registration forms on web pages with banks. Also, many online applications use this process to acknowledge “signing” the terms or agreement. You type a signature and enter the date. Based upon some algorithm (method), usually using your name and date and perhaps some information from the form itself, a unique number, a hash, is generated and used as a “digital signature”.
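To get a feel for what a hash looks like, here is a small Python sketch using the standard `hashlib` module. SHA-256 is my choice of hash method for the example; the text above does not name one. The “document” strings, the name and the dates are all made up.

```python
import hashlib

document = b"I agree to the terms. Jane Doe, 2024-01-01"
altered = b"I agree to the terms. Jane Doe, 2024-01-02"

digest = hashlib.sha256(document).hexdigest()
altered_digest = hashlib.sha256(altered).hexdigest()

print(digest)
print(altered_digest)
print(digest == altered_digest)   # False: one changed character, a different hash
```

To verify a document, you compute its hash again and compare it with the hash that was recorded earlier. If they differ, the document has changed.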

Man in The Middle (MTM)

The “Man in the Middle” attack is very important. It is often used. It is so common, that any kind of computer or network system with passwords, should be tested for and be impervious to this kind of attack. You will use it! You will want to know this!

Think of MTM as an evil mail carrier imposter. Can you imagine what would happen, if someone, working at the post office would empty out packages and send on an empty box? What if they took out the checks from envelopes and replaced those checks with checks for less money? What if, instead of delivering Valentine’s to your desired sweetheart, they swapped the romantic invitation to another person? And, they keep on doing the same with the returns.

When keys are made, they need to be distributed. If an attacker can intercept a key in transit, before the key reaches the receiver, and substitute the attacker’s own key for the sender’s key, then the attacker is the one distributing the keys, is in control of the keys, and can read all the messages.

This very much applies to “Public Key Encryption” and other asymmetric key encryption methods. I will discuss more in the section on Passwords.
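The key substitution described above can be acted out in a toy Python simulation. Everything here is a stand-in: `toy_cipher` is a simple repeating-key XOR that is NOT secure, the messages are made up, and real attacks target real key exchange protocols. The point is only to show who ends up holding which key.

```python
import secrets

def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with a repeating key. A stand-in for a real cipher, NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# The sender makes a key and sends it toward the receiver.
sender_key = secrets.token_bytes(8)

# The attacker sits in the middle, keeps the real key and forwards a substitute.
attacker_key = secrets.token_bytes(8)
key_seen_by_receiver = attacker_key

# The sender encrypts with the key it believes the receiver has.
wire_message = toy_cipher(b"Pay Bob 100", sender_key)

# The attacker decrypts, reads (or alters) and re-encrypts for the receiver.
stolen = toy_cipher(wire_message, sender_key)
forwarded = toy_cipher(b"Pay Eve 100", attacker_key)

# The receiver decrypts with the substituted key and notices nothing wrong.
received = toy_cipher(forwarded, key_seen_by_receiver)
print(stolen, received)
```

Neither side sees an error. That is what makes this attack dangerous, and why the receiver needs some independent way to verify whose key it really has.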

Brute Force. This means to guess every possible key until one works. Not a very sophisticated or a scientific approach. But, with the advent of computers, this technique works for previous manual forms of encryption like Vigenère encryption. The defense, of course, is to use encryption that a computer can’t guess either.   [This section is incomplete.]
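As a small illustration of brute force, here is a Python sketch that tries every possible key. It uses the Caesar code rather than Vigenère because the Caesar code has only 25 usable keys, which keeps the example short. The function names are mine.

```python
import string

def caesar_shift(text, shift):
    """Shift letters by `shift` places, wrapping around the alphabet."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    k = shift % 26
    table = str.maketrans(lower + upper,
                          lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return text.translate(table)

def brute_force(ciphertext):
    """Try every possible shift and return (shift, candidate plaintext) pairs."""
    return [(shift, caesar_shift(ciphertext, -shift)) for shift in range(1, 26)]

for shift, guess in brute_force("Uijt jt b tfoufodf."):
    print(shift, guess)
```

A person, or a dictionary check, then picks out the one candidate that reads as language. With only 25 keys that takes seconds, which is exactly why such codes no longer protect anything.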

Noise.   [This section is unwritten.]

“If there’s a backdoor for you, there’s a backdoor for me.”

Givon Zirkind

Back doors.   [This section is unwritten.]