|
Location: Computer science - Algorithms License: The Creative Commons Attribution-ShareAlike 2.5 License Information security, privacy and secrecyPosted by Seb RenauldAfter reading this article, I hope that you will know why it is common practice to salt hash functions, which hash functions to use, and how to prevent data leaks from databases. |
Skill: AdvancedPosted: 19/11/2008Views: 597Rating: 5.00 /5Popularity: 0.00 |
| Sign Up to vote for this article |
You would be amazed at the number of times someone asked me this question. I even asked myself this very question the first time, too: why, when comparing passwords in a database, are the passwords always hashed? Wouldn't it be so much better to just plaintext-compare them?
This question's answers are rooted both in complexity and security theories. This article should teach you why passwords – and other dangerous data – are hashed, which types of hashes exist, how to use hashes (intelligently), and more importantly when to use hashes.
A hash, or message digest, converts a text of variable length to a single, fixed-size “signature” of the message. This signature is unique for each file, and ideally should be hard enough to find so that no-one would be able to forge it. This hash function also has an added property: it is not a bijection – in other words, you cannot go from one digest to a unique “source” text.
This gives the function both incredible properties, and daunting consequences:
- The good bits:
First of all, since a hash algorithm has no parameters other than a message (no keys), the same algorithm will output the same value everywhere regardless of where, how, or when the algorithm is called. This is, I reckon the whole point: in cryptography, one uses a hash algorithm to digitally sign, or account for, a message.
Secondly, as a performance bonus, you can store long messages (or at least, their digest) in less database space!
And finally, it might not seem like much of a bonus, but the hash value can be an easy way to seed another algorithm. For example, algorithms such as WAKE require a 256bit key. Guess what? The length of a MD5 hash is of 256 bits. This saves us remembering a very long key.
- The bad bits:
Anyone who has done maths and set theory will know at least one of the bad bits. Imagine the perfect hashing system: a message of infinite length will be broken down to 2n bits. Obviously, one cannot get a one-to-one mapping between an infinite and a finite set; ultimately, there will be two values for which the computed hash will be the same. This poses us a problem, especially when dealing with passwords. A user might “stumble” on a false password but still be allowed in!
A second problem is that, since a hash has no key, one can compute an endless stream of hashes on their own computer, and if they know the final hash in the database, they can find back which message was hashed. This is known as a known-ciphertext attack, and is probably the deadliest when dealing with popular hashes. Rainbow tables exist; these list dictionary hashes for most popular algorithms (MD5, SHA-1, Kerberos/WINDOWS encryption, etc...)
The last (and probably least) bad bit is that, usually, hash computations are quite intensive for large message lengths.
If there is one thing to remember from all this, is that hashes are not flawless, but they are still incredibly useful in that they are not directly decryptable.
A few rules:

As, I hope, you can see, what you are effectively doing is creating what is called a non-avalanching cipher. Each “letter” is encrypted and remains in the same place. So, if you truncate this, you'll end up in a lot of trouble. Suppose I truncate this final value to the first 16 bits. If you submit ABD, it'll still match.
Do the verification yourself if you do not believe me.
This article is meant to be read as a guide rather than a proper tutorial. No code in there! Just explanations of various concepts that people often do not know about. I might cover this with a more detailed explanation some day.
In the meantime, my advice is to print it and scribble on it as much as possible. Make it your own. Add details that I wrote in undertones.
I hope this cryptographic teaser was not too maths-intensive. I did suppose that you knew how to XOR, perform basic operations, and knew about binary algebra. No details of ciphers were given, however, I am pretty sure you will find them on the internet if you look hard enough.
The sole point of this article was to cover the one thing that books never mention – why hashes are used. They're so often used that they almost became second-nature, and this is why people still use MD5: they think it's a true legacy.
Always remember: seed, use known stuff, and don't mention it.
This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-ShareAlike 2.5 License
| Seb Renauld
| If you're reading this, you're probably wondering who the hell I am. I'll try to answer this one in as few words as possible. I'm a 19-year-old student of the university of Bristol, studying physics and philosophy. Being natively french, I speak fluent french, english and german. My hobbies include programming, messing around with IT, building robots (true story), and cryptography/cryptanalysis. Programming-wise, I have an extensive knowledge of VB, PHP, Perl, XHTML/CSS, JavaScript and a few other less-known languages. I am also proficient in most other "mainstream" languages: java, C/C++, python, ASP, ColdFusion, ActionScript etc. However, I'm much more of a theoretical person than a programmer. I usually spend a relatively big amount of time in thinking of the best way to solve a problem, rather than hit-and-miss approaches. Location: |
Sign up to post message on the article message board!