Hashing Algorithms and Security – Computerphile


Let’s say you want to transfer a file from one computer to another, and it is really important to know that it’s got there intact, in one piece. You could send it multiple times and then compare them all, but what generally gets used is something called a hash algorithm. A hash algorithm is kind of like the check digit in a bar code or on a credit card. I think James Grime talked about this a long, long time ago on Numberphile. The last digit in a bar code or on a credit card is determined by all the other digits on it, and if you change one of those digits, the last one changes as well. So as you type it into a computer, you can know instantly if you’ve missed a key somewhere. So a hash algorithm is kind of like that.
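
The check-digit idea can be sketched in a few lines. This is a minimal sketch, assuming the Luhn scheme that payment card numbers use; the video does not name a scheme, and bar codes use a different weighting. The function names are illustrative, not taken from any library.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk the payload right to left, doubling every second digit,
    # starting with the rightmost one.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """True if the last digit matches the check digit of the rest."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


print(luhn_check_digit("7992739871"))   # 3
print(luhn_is_valid("79927398713"))     # True
print(luhn_is_valid("79927398703"))     # False: one digit was mistyped
```

Changing any single digit changes the expected check digit, which is how a typing mistake is caught straight away.
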
But for an entire file that might be megabytes or gigabytes in size, what it gives you is a code of 16 or 32 or 64 characters, generally hexadecimal (basically just one long number expressed in that way), that is a summing-up of everything that’s in that file. You do all these manipulations to it and crush it down, and crush it down, and crush it down, and what comes out is this thing that says “this is a summary of that file”. You can never make it work backwards; you can’t pull that data back out. But it’s like a signature: it’s like a confirmation that this file is really what it says it is.

The simplest hash algorithm I can think of would just be something like this: add up all the digits in the file, which is 4, 9, 14, 23. That’s not a good hash algorithm, for a few reasons. Hash algorithms have three main requirements. The first one is speed. It’s got to be reasonably fast: it should be able to churn through a big file in a second or two at most. But it also shouldn’t be too quick. If it’s too quick, it’s easy to break, and I’ll explain that later.

The second requirement is that if you change one byte, one bit, anywhere in the file (at the start, in the middle, at the end), then the whole hash should be completely different. This is something called the avalanche effect. If you’re interested in how this is achieved, do look up the actual algorithms themselves. It would take me an hour to explain even vaguely how they work in a friendly way, but if it’s your kind of thing, do look it up. Suffice it to say: if one bit gets flipped anywhere in the message, then the whole hash is completely and utterly different.

The third requirement is that you’ve got to be able to avoid what are called hash collisions. This is where you have two documents which have the same hash. Obviously, there is a mathematical principle called the pigeonhole principle: if you have 50 pigeons and 25 pigeonholes, you have to stuff at least two pigeons into one of the pigeonholes. That’s a terrible analogy when you say it like this, but if I could explain it: there are incredible numbers of documents out there that are possible, while the hash, meanwhile, is just one fairly long number. There will be files out there which naturally have the same hash, and that’s okay, because it is so unlikely that we can deal with that. It’s never going to happen naturally.

But if you can artificially create a hash collision, if you can, say, create a file and change your name, then we have a problem, and that’s where security comes into these. Because if I can make a file that sums to a certain hash, then I can fake documents; I can send different things and have this signature match.

So let’s say I have an important document, something that’s, I don’t know, the “permission to go to the moon”. I don’t know why I said that. Let’s say that, and it’s got someone’s name on it, and that file is sent, and along with it, through other channels, comes this hash to verify that this is actually the document. Now let’s say I can intercept that file and I can change it. Because the hash algorithm is broken, I can change the name and change the data and change whatever: I can send someone else to the moon, because I can make this hash the same through carefully tweaking the bytes. Now, it’s incredibly difficult to do that.
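
The collision and avalanche requirements described above can be made concrete with a small sketch. It assumes a byte-sum version of the "add everything up" toy hash and uses SHA-256 (one member of the SHA-2 family) as the real algorithm. The document text, the name Alice and the helper names are all made up for illustration.

```python
import hashlib


def digit_sum_hash(data: bytes) -> int:
    """Toy hash: add up every byte value. For illustration only."""
    return sum(data)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"Permission to go to the moon: Alice"
reordered = b"Permission to go to the moon: Aliec"        # same bytes, two swapped
one_bit_off = bytes([original[0] ^ 0x01]) + original[1:]   # flip a single bit

# The toy hash cannot see reordering, so a colliding document is trivial to make.
print(digit_sum_hash(original) == digit_sum_hash(reordered))   # True

# A one-bit change moves the toy hash by exactly 1: no avalanche at all.
print(abs(digit_sum_hash(original) - digit_sum_hash(one_bit_off)))   # 1

# With SHA-256, a one-bit change typically flips about half of the 256 output bits.
h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(one_bit_off).digest()
print(differing_bits(h1, h2), "of 256 bits differ")
```
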
In practice, you’d want a massive file and a lot of computer code. But there are old hash algorithms like MD5, which was used for many, many years, which now have these collisions out in the wild and are considered broken, because you can get a file (not a document with text in, but computer code, anything like that) where it’s possible to send something malicious and have it come out with the same hash.

So this is important; this is where speed comes in. If the hash is too slow, no one will want to use it. But if the hash is too fast, if you can create new ones in a few processor cycles, then you can fairly easily create documents that match a particular hash. It is, in a very real sense, an arms race. As I said, for many years MD5 was the accepted algorithm, and it’s still used for a few things, but MD5 is now thoroughly broken, because computers are fast enough and there are a few sort-of interesting tricks you can use to try and create hash collisions deliberately.

The other problem with MD5 is that, because it was used so much and it was used everywhere on the web, Google has become an exceptionally good resource for breaking them. You wouldn’t want to store a password this way (I’ll talk about that in a later video; don’t use something like this for storing passwords), but people did. For many years people did, and in a lot of cases a word will be stored next to its MD5 hash for some reason. If you type an MD5 hash into Google, frequently the word it was hashing comes out. Which means that for pretty much every word in the English language, and a lot of other passwords besides, the MD5 can be solved by typing it into Google. So MD5 is comprehensively, constantly broken.

So everyone moved to something called SHA-1, and now there are rumors that that might start to be broken soon, if it hasn’t already. Because computers keep getting faster, hash collisions are easier to generate, so everyone is moving to SHA-2, which for the time being is secure. SHA-3 is going through the process of being ratified by all the agencies now, and in a few years that’ll be the standard. Ultimately, I should really emphasize this: **Don’t use this for storing passwords**. I’ll talk about that in a later video. These are used for verifying files, for verifying transmission, and that’s all they should be used for.

There is one last thing, which is that occasionally you will see download sites offering software that say: here’s the file we’re going to send you, click here to download it, and if you want to be safe, here’s the hash of the file so you can be sure it’s the right one. That’s a terrible idea. I mean, it will verify you’ve got the download intact, but they’re selling this as “we guarantee that this software is safe and you can check it against that hash”, which is a bad idea, because if someone has been able to get into their website and change the software they’re sending, it’s pretty trivial to change that hash as well.
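
Checking a download against a published hash can be sketched as follows. This is a minimal sketch, assuming the site publishes a SHA-256 hex digest (use whichever algorithm the site actually names). The function names and the chunk size are illustrative choices.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file chunk by chunk so a large download need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """True if the file's SHA-256 equals the published hex digest."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A match here shows that the file arrived intact relative to the published value. For the reason given above, it says nothing about whether the published value itself can be trusted.
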
That is hash algorithms: that is taking a big chunk of data and turning it into a small amount to verify it. And in a later video I will talk about how that’s used, and how that shouldn’t be used, for actually keeping things secure.

This episode of Computerphile was brought to you by audible.com, and you can go to audible.com/computerphile and download a free book. They’ve got a huge range that you can listen to on all kinds of devices: your phone, or in the car, things like that. I was thinking about a book to recommend, and it made me think about the first audiobook I ever listened to, and that was Treasure Island. I listened to it on a cassette next to my bed as I was going to sleep each night. I checked the Audible website; they do have Treasure Island, so that’s my recommendation today. Why don’t you check it out: audible.com/computerphile, free book, and thanks to them for supporting our videos.

100 thoughts on “Hashing Algorithms and Security – Computerphile”

  1. You can use hashes when you're distributing a file through various mirrors – you have some confidence that the version on your site is clean, but to ensure that the version on others' sites matches your own, you use the hash to verify.

  2. I've been watching all the videos. Hope you talk about salting and hashing in the future, thanks. Really loving these videos.

  3. You know, I've been thinking: Why haven't we switched to gallium phosphide for our CPU and GPU yet? I'm well aware that the material is inappropriate for FET's at the moment, that it is expensive, but I don't get why we can't just switch back to BJT's in order to accommodate the new material- surely TTL is adequate for the modern integrated circuit!
    PS: I just looked at the pertinent Wikipedia pages more closely, and it turns out they're experimenting with aluminium oxide for this stuff.

  4. Please let me explain something: Download websites don't use MD5 hash verification to guarantee that the file hasn't been changed on their server by a hacker; they know that if a hacker could do that, he could change the hash as well! The hash is only used to check whether a network error while downloading flipped or deleted some bits of the file, essentially corrupting it. This is especially important and widely used when downloading OS images (those things are large and take time to download, so they are vulnerable to network corruption) or when downloading files using the Torrent protocol, which downloads the file as chunks that the client then glues together again, so it's a check that the network or the client didn't miss any piece of the file.

  5. Re: MD5
    I recently got taught how to change my Admin password for WordPress in phpMyAdmin  and when I paste in the new password I select MD5 to encrypt it.
    Is that a waste of time?
    Should I choose a different option than MD5 in the drop down list?

  6. Good video, but he seems to misunderstand that last bit regarding websites providing checksums. They're not providing a checksum as a guarantee that the file is what it claims to be; instead, it's for verifying that the file was downloaded correctly, as downloading 100 MB+ files in the browser is quite unreliable.

  7. I was always told that hashing passwords was the safest way to do it, because you never actually store the original password in the server, so it can't be stolen.

  8. It depends on what you're trying to guard against.  If all you are worried about is that a few bits may be garbled in transmission, then a simple, fast algorithm will work great.  If you are worried that an attacker might deliberately modify your file en route, then collision resistance becomes imperative.

  9. Question…

    Suppose my system stored both the MD5 and SHA1 hashes of an input X.  Individually, MD5 and SHA1 are broken.  But is it possible to construct a separate input Y which matches both the md5 and sha1 hashes of input X?

  10. Sometimes websites deny a password reset since the new password is "too similar" to the old one. How do they know this if all they have is a hash?

  11. I thought hashes for files on websites (like Microsoft Windows ISO images) are used for you to verify that your download did not corrupt the file.

  12. Just watched "Youtube doesn't know your password" on Tom's Channel… Now it's the same guy talking about similar stuff on another channel… I'm confused.

  13. If I make a hash algorithm in PHP or JS, how do I hide that algorithm securely from users? I could make a kind of secure hash algorithm, but that is useless if everyone can just read the instructions

  14. Can you just use multiple quick-cycle hashes, or is that just a really stupid, poorly thought out idea some runon-sentence-using, highly-allergic teenager types out on a poorly-constructed desktop computer in their bedroom at an hour far beyond his or her bedtime while under the influence of one of many mind-altering substances that exist in the world today?

  15. Wow, good explanation, but I have this question one of my boys asked me:
    *. How can I write a program that inserts the integers 1 to 20 into a binary search tree? Assume the root node is created with value 10.
    **  Assume the data structure:
    struct Node {
        int value;
        Node* next;
    };
    Node* head = NULL;
     Assume also that there is a value 10 in the linked list. Write code that deletes the node with this value. Consider all the following cases:
    a. The node is at the head
    b. The node is at the middle
    c. The node is at the end

  16. Writing hashes next to download buttons has never been intended to ensure that the software isn't maliciously altered. It's for people with crappy connections who want to make sure everything got through as intended.

  17. 4:10 Could this be used, say, not to change the name of the next lunar astronaut, which MAY get you on the fast track but probably won't (after all, they are bound to notice you are grossly unqualified for such a mission), but instead to manipulate troop movement orders in Pakistan? If I could get six or seven armor battalions suddenly ordered to the India-Pakistan border, well, that's bound to get India to respond, which could begin a chain of events that ends in nuclear war.
    Even if it is discovered that it was fake orders that started it, it might go out of control before it could be stopped.

  18. The software or file download that has the hash along with it is actually secure. Provided they sign the hash. That is they run RSA on the hash using the Private key of the company. So, nobody can change the hash. If they should change the hash, they need the private key of the company.

  19. How do you verify the hash of a file on Windows?  It's not very easy is it?
    GCHQ in the UK routinely intercept people downloading files and send ones that have been tampered with.  They did this in 2013 to people using the Tor Project site.  When people requested the Tor Browser Bundle they sent their own modified version hoping to monitor people using that network.  It was only ever picked up by McAfee as it did something to trigger it.  They do it on other sites like BoingBoing and target people using LiveLeak.  Nothing is safe any more now we are all spied on!

  20. The hash for file downloads is usually used by open source projects, where the executable may be mirrored by countless universities which the software author doesn't have control over. In such a case, it certainly is not trivial to compromise both locations.

  21. Damn Tom, I'm amazed by your knowledge in every video of yours I watch here and on your personal channel. Would love it if you could recommend some good books/resources other than this and your personal channel.

  22. If my MD5 key is like a randomly generated string of 3000 characters and numbers, will that greatly decrease the chance of something else colliding with it?

  23. When using hashes for file or packet verification, wouldn't using multiple hash types on the same file/packet and comparing all the hash types applied provide much greater reliability? The chances of multiple hash types having overlapping collisions is infinitesimally small with just 2 hash types let alone more.

    Thanks for the great videos!

  24. Giving the hash for a file is not intended to look "safe", at least I've never seen a site like that. Mostly when it's used it's to verify that your file didn't corrupt while downloading, which could be problematic if it's, say, a bootable disk file.

  25. Well I always thought that hash was there on those download sites for protection against network glitches rather than hacker attacks…

  26. I thought the verification hash offered by those websites was just to check that you got a complete successful download.

  27. I know this is super old but I always thought it was funny that Kali offered the hash for the exact same reason that you mentioned.

  28. hash codes on websites offering a download are also used to make sure the download went well and nothing got corrupted (or involuntarily changed by a machine error or noise)

  29. But what if you use 2 hashes? So I send a file, and it generates 2 hashes using 2 different algorithms? Surely that lowers the chances of hash collisions astronomically.

  30. 1:58 – it should be different? it's just nice to have, hence the pigeon stuff. "Should" is not a word you can use in a definition.
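
Several comments above ask about checking two different hashes of the same file. The mechanics can be sketched as below, using MD5 and SHA-1 because those are the two algorithms named in the question. This only shows how both digests can be computed in one pass and compared together; it makes no claim about how much harder a simultaneous collision is. The function names are illustrative.

```python
import hashlib


def md5_and_sha1(chunks):
    """Feed the same chunks to MD5 and SHA-1 in one pass; return both hex digests."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    for chunk in chunks:
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_both(chunks, expected_md5: str, expected_sha1: str) -> bool:
    """True only if BOTH digests match the expected hex values."""
    return md5_and_sha1(chunks) == (expected_md5.lower(), expected_sha1.lower())


print(md5_and_sha1([b"a", b"bc"]))
# ('900150983cd24fb0d6963f7d28e17f72', 'a9993e364706816aba3e25717850c26c9cd0d89d')
```
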
