This *significantly* improves parallel performance of regex.
Currently if I have a large number of threads all using regexes; even if
they are using idependent regex objects, performance is still extremely poor
due to the lock inside of the mem_block_cache.