Update 1st June 2009 – Added a note mentioning about case-insensitive comparisons in MySQL.
I’ve been looking at generating random identifiers in PHP, and making sure they are random enough. Looking at the PHP function uniqid(), and its suggested better token, I don’t think this is an adiquate enough way:
$better_token = md5(uniqid(mt_rand(), true));
Md5 is a 128 bit hash algorithm. Looking at the data inside the function, there doesn’t appear to be enough data (i.e. entrophy) for the 128 bits, so at best, the uniqueness of the resulting string will be less than the capacity of the md5.
uniqid(), this alone generates 13 hexadecimal characters, which is the equivallent of 52 bits,
rand(), generates a random integer number, which would fill 32 bits
uniqid(”, true), this appends additional entropy to the string, 9 decimal numbers and a dot, less than 20 bits of randomness
Adding these all up, 52+32+20 = 104 bits.
So what happens when you md5 a value that has 104 bits of randomness? You get at most 2^104 possible random values back in a md5 that has 2^128 values. Hashing algorithms can have multiple input values which generate the same output value, causing collisions, so the actual randomness would be less.
A better way
*nix systems (including Solaris, Linux, OSX and FreeBSD) have special device files usually /dev/random, which can generate as much randomness as required by reading in the number of bytes needed. /dev/urandom is a non-blocking version which sacrifices entrophy in order for the file to not lock.
$uniqueId = bin2hex(file_get_contents('/dev/urandom', 0, null, -1, 16));
This will return a random 32 character hexadecimal (128 bits). You shouldn’t need to md5 this.
Generating a shorter (yet just as random) string suitable for urls
The hexadecimal format consists of the numbers 0-9 and a-f, 16 symbols in total, yet there are many more acceptable acceptable characters available in a url.
You can have a shorter identifier if you encode the binary data as base 64 instead, which as its name suggests, has 64 symbols. However, the standard ascii symbols used in this encoding include + and /, which have special meaning in URLs. You can convert these to – and _ respectively to make them more compatible.
Also, base-64 encoding pads the end with = if there isn’t enough bytes in a block. These are not needed by an identifier, so can be stripped.
If the data string is already in binary, you can just do the following to convert it to base 64:
$base64encodedString = str_replace( array('+','/','='), array('-','_',''), base64_encode(md5($data, true)));
If however the data was in hexadecimal, you can convert it to binary using the php function pack() beforehand:
$base64encodedString = str_replace( array('+','/','='), array('-','_',''), base64_encode(pack('H*', md5($data))));
Doing this will shorten a 32 character hexadecimal into a 22 character base-64 encoded string.
Please note that when doing MySQL comparisons, your text field’s collate will most likely be case-insensitive. Comparing a base64 string against one in the database when the collation is case-insensitive (e.g. latin1_*_ci, utf8_*_ci) will dramatically increase the chance of collisions as 26 of the 64 characters will match the other 26 case ones, giving a significant reduction in bits.
So would you ultimately recommend:
$random_url_string = str_replace(
array(‘+’,'/’,'=’),
array(‘-’,'_’,”),
base64_encode(`dd if=/dev/urandom bs=16 count=1`);
Actually, I guess it would be better not to open a child process to read the file, I was just being lazy. I’ve updated the post.
So it would be:
$random_url_string = str_replace(
array(’+’,’/’,’=’),
array(’-’,’_’,”),
base64_encode(file_get_contents(‘/dev/urandom’, FILE_BINARY, null, -1, 16)));
I originally wrote the article thinking PHP integer’s were 16 bit, that would have made 88 bits, but obviously they are 32 bits, so is a lot stronger.
Either way, md5ing seems a strange way to obfuscate into a 128 bit hash a random string with less randomness than 128 bits of random data.
FILE_BINARY is only available in php 5.2.7 and above, so you get a PHP Notice:
Message: Use of undefined constant FILE_BINARY – assumed ‘FILE_BINARY’
It looks like you can simple replace that flag with null, as it’s the default behaviour anyway, e.g.
$uniqueId = str_replace(
array(‘+’,'/’,'=’),
array(‘-’,'_’,”),
base64_encode(file_get_contents(‘/dev/urandom’, null, null, -1, 16)));
Ah didn’t notice that it was the default.
The php.net documentation usually says what version the functions and constants are available from. Not sure why it doesn’t mentioned it.