Make md5 calculation always go through an the buffer so that A) we don't
handle packets out of sequence if some data goes through the buffer and
some doesn't, B) it works on systems that can't handle aligned access,
C) we just have one code path to worry about.
While we're at it, sizeof() and RESERVE_CONFIG_BUFFER() really don't combine
well, which is why md5sum has been reading and processing data 4 bytes at a
time. I suspect that the existence of CONFIG_MD5_SIZE_VS_SPEED to do loop
unrolling and such in the algorithm was an attempt to work around that bug.