Thursday, February 4, 2016

Set 1 - Challenge 6

Original Challenge Description from matasano

Break repeating-key XOR

There's a file here. It's been base64'd after being encrypted with repeating-key XOR.
Decrypt it.
Here's how:
  1. Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
  2. Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between:
    this is a test
    and
    wokka wokka!!!
    is 37. Make sure your code agrees before you proceed.
  3. For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
  4. The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
  5. Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
  6. Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
  7. Solve each block as if it was single-character XOR. You already have code to do this.
  8. For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.
This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.

My Solution


  • I created a small program with the help of wiki  and it calculated the 37 correctly. Below are the program snippets.



  • Looking at the problem, we need to create hamming distance between different size blocks, so the above functions would need to be modified to accept the length as a parameter. Also, we can not rely on null byte at the end.
  • Here is the updated version

  • After  getting the file 6.txt, I converted to rawbytes  with below  command




  • Now we are ready to run our hamming program to guess the key size. I fed the raw bytes to hamming program and it looked like this


  • Based on this I thought the key length is 3 or 5.  I ruled out 2 because 2 was the minimum even when a test file was encrypted with "ICE". Anyways assuming the key length is 3 we need to get   every fourth byte in a stream and try our "bruteforce" program used earlier. 
  • We can write in C but this can be easily done in shell as well. Our friend "xxd"can used together with "cut" and "tr"
-p produces plain hex and -c indicates the bytes/octets


  • To get the transpose , we need to get only first bytes from every line and then the subsequent one.
  • We will start with just first one and check the "English Score"  So "xxd", "cut" and "tr"   combined like above give us what we wanted. We can change the "cut" parameter to 3-4, 5-6 etc  for the remainder streams.  Lets first  brute-force  this.



  • This doesn't look much like real message so lets run again with key size of 5


  • This is not a whole lot different.  lets try another key size. Upon closer look at out hamming program above key size 29 is next, so lets try that.


  • All  of them are valid English.  I think we got it!  btw, I modified the brute-force program to indicate the actual ascii character which is used to brute-force. This will help us reveal the key directly without any look ups.
  • Now lets try the second block


  • Based on the above two key starts with "Te". Now its time to write another shell script to do this 29 times and extract the key


  • The variable is the parameter to "cut"  which needs to be repeated 29 times. The above script does its job and now we just need to extract the key which is at the end following 2 or 3 numbers.


  • This makes it easy to copy the key ( although its mixed with the shell prompt). Now that we have the key lets decrypt the file 




7 comments:

  1. Hi, Your solution is very clear and useful. But I wonder why the Normalized was computed by (j1+j2+j3 + **j4/4**)? Is it a mistake?

    ReplyDelete
  2. Glad you found it useful.I think that was a typo which I will try to correct. Thanks for pointing that out.

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Hi, I'm a newbie. I wonder why we can find the repeating-key by
    "6.Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
    7.Solve each block as if it was single-character XOR. You already have code to do this.
    8.For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key."
    So would you mind to help me know clearly?

    -Haniz

    ReplyDelete
  5. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete