Matasano Solutions: Set 1

Original Challenge Description from matasano

Break repeating-key XOR

There's a file here. It's been base64'd after being encrypted with repeating-key XOR.

Decrypt it.

Here's how:

Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between:
```
this is a test
```
and
```
wokka wokka!!!
```
is 37. Make sure your code agrees before you proceed.
For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
Solve each block as if it was single-character XOR. You already have code to do this.
For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.

This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.

My Solution

I created a small program with the help of wiki and it calculated the 37 correctly. Below are the program snippets.

Looking at the problem, we need to create hamming distance between different size blocks, so the above functions would need to be modified to accept the length as a parameter. Also, we can not rely on null byte at the end.

Here is the updated version

After getting the file 6.txt, I converted to rawbytes with below command

Now we are ready to run our hamming program to guess the key size. I fed the raw bytes to hamming program and it looked like this

Based on this I thought the key length is 3 or 5. I ruled out 2 because 2 was the minimum even when a test file was encrypted with "ICE". Anyways assuming the key length is 3 we need to get every fourth byte in a stream and try our "bruteforce" program used earlier.
We can write in C but this can be easily done in shell as well. Our friend "xxd"can used together with "cut" and "tr"

-p produces plain hex and -c indicates the bytes/octets

To get the transpose , we need to get only first bytes from every line and then the subsequent one.
We will start with just first one and check the "English Score" So "xxd", "cut" and "tr" combined like above give us what we wanted. We can change the "cut" parameter to 3-4, 5-6 etc for the remainder streams. Lets first brute-force this.

This doesn't look much like real message so lets run again with key size of 5

This is not a whole lot different. lets try another key size. Upon closer look at out hamming program above key size 29 is next, so lets try that.

All of them are valid English. I think we got it! btw, I modified the brute-force program to indicate the actual ascii character which is used to brute-force. This will help us reveal the key directly without any look ups.
Now lets try the second block

Based on the above two key starts with "Te". Now its time to write another shell script to do this 29 times and extract the key

The variable is the parameter to "cut" which needs to be repeated 29 times. The above script does its job and now we just need to extract the key which is at the end following 2 or 3 numbers.

This makes it easy to copy the key ( although its mixed with the shell prompt). Now that we have the key lets decrypt the file

7 comments:

HanFebruary 27, 2017 at 8:03 AM
Hi, Your solution is very clear and useful. But I wonder why the Normalized was computed by (j1+j2+j3 + **j4/4**)? Is it a mistake?
cryptovillaMarch 1, 2017 at 1:06 PM
Glad you found it useful.I think that was a typo which I will try to correct. Thanks for pointing that out.
UnknownMarch 8, 2017 at 12:42 AM
This comment has been removed by the author.
UnknownMarch 8, 2017 at 12:42 AM
Hi, I'm a newbie. I wonder why we can find the repeating-key by
"6.Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
7.Solve each block as if it was single-character XOR. You already have code to do this.
8.For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key."
So would you mind to help me know clearly?

-Haniz
MatthewJuly 31, 2020 at 9:33 PM
This comment has been removed by the author.

Matasano Solutions

Thursday, February 4, 2016

Set 1 - Challenge 6

Original Challenge Description from matasano

Break repeating-key XOR

My Solution

7 comments: