intelliproject logo

Location: Computer science - Algorithms    License: The Creative Commons Attribution-ShareAlike 2.5 License

Information security, privacy and secrecy

Posted by Seb Renauld

After reading this article, I hope that you will know why it is common practice to salt hash functions, which hash functions to use, and how to prevent data leaks from databases.

Skill: Advanced

Posted: 19/11/2008

Views: 597

Rating: 5.00 /5

Popularity: 0.00

Sign Up to vote for this article

Introduction

You would be amazed at the number of times someone asked me this question. I even asked myself this very question the first time, too: why, when comparing passwords in a database, are the passwords always hashed? Wouldn't it be so much better to just plaintext-compare them?

This question's answers are rooted both in complexity and security theories. This article should teach you why passwords – and other dangerous data – are hashed, which types of hashes exist, how to use hashes (intelligently), and more importantly when to use hashes.

What are hash algorithms?

A hash, or message digest, converts a text of variable length to a single, fixed-size “signature” of the message. This signature is unique for each file, and ideally should be hard enough to find so that no-one would be able to forge it. This hash function also has an added property: it is not a bijection – in other words, you cannot go from one digest to a unique “source” text.

This gives the function both incredible properties, and daunting consequences:

- The good bits:

First of all, since a hash algorithm has no parameters other than a message (no keys), the same algorithm will output the same value everywhere regardless of where, how, or when the algorithm is called. This is, I reckon the whole point: in cryptography, one uses a hash algorithm to digitally sign, or account for, a message.

Secondly, as a performance bonus, you can store long messages (or at least, their digest) in less database space!
And finally, it might not seem like much of a bonus, but the hash value can be an easy way to seed another algorithm. For example, algorithms such as WAKE require a 256bit key. Guess what? The length of a MD5 hash is of 256 bits. This saves us remembering a very long key.

- The bad bits:

Anyone who has done maths and set theory will know at least one of the bad bits. Imagine the perfect hashing system: a message of infinite length will be broken down to 2n  bits. Obviously, one cannot get a one-to-one mapping between an infinite and a finite set; ultimately, there will be two values for which the computed hash will be the same. This poses us a problem, especially when dealing with passwords. A user might “stumble” on a false password but still be allowed in!

A second problem is that, since a hash has no key, one can compute an endless stream of hashes on their own computer, and if they know the final hash in the database, they can find back which message was hashed. This is known as a known-ciphertext attack, and is probably the deadliest when dealing with popular hashes. Rainbow tables exist; these list dictionary hashes for most popular algorithms (MD5, SHA-1, Kerberos/WINDOWS encryption, etc...)

The last (and probably least) bad bit is that, usually, hash computations are quite intensive for large message lengths.

If there is one thing to remember from all this, is that hashes are not flawless, but they are still incredibly useful in that they are not directly decryptable.

How to use a hash intelligently when watching over sensitive data

A few rules:

  • ALWAYS seed the message. Seeding literally replaces the key, and consists in appending a unique string before, after, or even in between the message you're hashing. This causes the final hash to be totally different, and if the key is well chosen, provides an extra security against dictionary and rainbow table attacks. Rule of thumb: pick at least four character classes in the seed, if not more, and make it secret.
  • ALWAYS use secure algorithms. You'd be amazed at how many people still use MD5. This is not a good thing, as MD5 is flawed. SHA-0 and SHA-1 are not an option, either, I'm afraid. What is left? Depending on which programming language you are using, some algorithms might be avaiable. SHA-256 (Also known as SHA-2) is currently considered secure, making it a good choice. However, I personally tend to double the hash size and go for SHA-512. This effectively squares the number of possible message digests.
  • NEVER boast about how you encrypt stuff. If anything, it gives your attackers a new weapon. Imagine if someone got hold of your database; they could see a list of hashes, and guess from the size that it's this algorithm. This is very unlikely, as usually, there are more than one algorithm for a given digest size. Popular algorithms are usually tried first by hackers, too. Now, take the same scenario. Someone performed a physical attack on your server, and got hold of your database. The problem is, two weeks ago, you said on a forum that you used an unseeded MD5 algorithm. The said hacker would immediately fish out a MD5 rainbow table and make short work of your database.
  • ALWAYS (and I really mean always) use irreversible hash algorithms. I've seen some people use reversible encryption, even sometimes XORing the value with a key and then truncating the final value. This is really bad practice, for the reason given below:

bits.jpg

As, I hope, you can see, what you are effectively doing is creating what is called a non-avalanching cipher.  Each “letter” is encrypted and remains in the same place. So, if you truncate this, you'll end up in a lot of trouble. Suppose I truncate this final value to the first 16 bits. If you submit ABD, it'll still match.

Do the verification yourself if you do not believe me.

Using the article

This article is meant to be read as a guide rather than a proper tutorial. No code in there! Just explanations of various concepts that people often do not know about. I might cover this with a more detailed explanation some day.

In the meantime, my advice is to print it and scribble on it as much as possible. Make it your own. Add details that I wrote in undertones.

Conclusion

I hope this cryptographic teaser was not too maths-intensive. I did suppose that you knew how to XOR, perform basic operations, and knew about binary algebra. No details of ciphers were given, however, I am pretty sure you will find them on the internet if you look hard enough.

The sole point of this article was to cover the one thing that books never mention – why hashes are used. They're so often used that they almost became second-nature, and this is why people still use MD5: they think it's a true legacy.

Always remember: seed, use known stuff, and don't mention it.

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License

About the author

Seb Renauld

If you're reading this, you're probably wondering who the hell I am. I'll try to answer this one in as few words as possible.
I'm a 19-year-old student of the university of Bristol, studying physics and philosophy. Being natively french, I speak fluent french, english and german. My hobbies include programming, messing around with IT, building robots (true story), and cryptography/cryptanalysis.
Programming-wise, I have an extensive knowledge of VB, PHP, Perl, XHTML/CSS, JavaScript and a few other less-known languages. I am also proficient in most other "mainstream" languages: java, C/C++, python, ASP, ColdFusion, ActionScript etc.

However, I'm much more of a theoretical person than a programmer. I usually spend a relatively big amount of time in thinking of the best way to solve a problem, rather than hit-and-miss approaches.

Location: United Kingdom
Ocupation: Student / Web developper
Home page: http://To come

Sign up to post message on the article message board!