Test run using Mach-based sempahores in Python on Mac OS X

Just did a quick hack to make Python use native Mach semaphores for the GIL instead of the mutex/condition variable pair, and got sizable speed up (13%) on David Beazley’s thread test from his presentation (I have 4 cores instead of 2 though). Guess I need to create a patch to add support for it.