The initial idea for safe points (the one actually in my GSoC proposal) was to implement it via polling. This method, as already outlined in the previous post, involved emitting polling code at certain points in the code. This polling code then checked a condition and suspended the current thread based on the same. However, this approach has the bad effect of imposing a runtime overhead or about 2 % in best cases. Note that this is as opposed to a collection time overhead - the polling slowed down the executing code, even when there no collections.
One solution to this problem is implementing safe points via patching. Patching incurs an slight overhead as well, but that is O(n) on the number of threads. I observed a slowdown of 0.6 % to 1.6 % (details later) with 50 threads.
The core idea is to record the places where a thread is allowed to halt and, when a thread needs to be stopped safely, patch just enough of those sites to ensure that the code executes at least one of them. The patched (modified) instructions halt the thread when executed (how that happens is described below). When the thread needs to resume execution, the instructions that were patched are 'un-patched' by replacing the new halting code with the old instructions it initially replaced.
Implementing safe points by patching is, at best, non-trivial and personally it is one of the most difficult things I've done so far. There are several issues that that complicate the situation:
- Multiple threads executing the same method.
- Variable length instructions.
- Deciding which site(s) to actually patch.
Combine this with the fact that debugging, as we know it, is not possible (more so because of certain implementation decisions, mentioned below) and you have nice little task to keep you busy for two weeks. My initial approach towards the third point was incorrect - I was trying to get around by patching just one site. This complicated matters since I then had to make various exceptions for methods in which execution could not safely halt (managed allocators, write barriers et al). My mentor, Mark, then showed my how things would simplify if I patched multiple sites. It did simplify things by an order of magnitude and that is the approach the current implementation uses.
Firstly, all threads need to be stopped before any patching is done - it is possible, otherwise, for a particular thread to attempt execution of instructions currently being modified. This may lead to incorrect behavior. Parallel patching may be worth looking at some point in the future, especially if it has the potential to improve performance. Nevertheless, this implementation follows the following stop-world logic:
- Stop all threads.
- If a thread is currently executing native code, halt it right away. Otherwise patch the required sites and restart the thread.
- The managed threads then subsequently halt themselves as they 'hit' a patched site.
This is more or less straightforward to implement - the current implementation uses POSIX signals and two semaphores for synchronization. I could not figure out a way to get this to work predictably using only one semaphore.
Coming back to the 'issues' mentioned earlier, issues (1) and (2) synergise (in a very weird use of the term) and cause a problem. Initially the patching involved emitting code that would cause a SIGSEGV and this is what triggered the problem. Upon executing such code, the thread would receive a SIGSEGV and then the signal could be caught and the thread halted. Or so I thought. Now, this instruction takes up 13 bytes (8 bytes if the page to be dereferenced is mapped to a 32 bit address, but let's keep things simple here). Suppose we have two threads, one with an IP 70, the other with 80. There is a safe point at 75, all numbers mentioned being in base 10. For the first thread, the obvious choice will be to patch the instruction at address 75 (rather to at least patch that instruction). Since the SIGSEGV takes up 13 bytes, everything from 70 to 83 is overwritten. When the execution is resumed (for the patches to actually take effect and the threads to halt), the second thread executes a malformed instruction and receives a SIGILL.
The easiest solution I could find was to insert a x86 trap instruction (0xCC). This generates a SIGTRAP instead of a SIGSEGV. Since a trap instruction is only one byte long it automatically takes care of issues (1) and (2). It does have the disadvantage of debugging difficult - I intend to ultimately remove SIGTRAPS and make the patches generate SIGSEGVs instead.
The are several issues with the current approach:
- AOT is not supported. While it may take a while before I can support, I might just be able to hack up support for pre-compiled assemblies. Perhaps this is something I should pursue post GSoC. Currently it might make sense to have a policy which detects if a method has safe points and then decides appropriately.
- Bugfixes. While I've tested the patch as much as I can, I'm sure it has bugs. Dirty, hairy, ugly, non-trivial, multi-threaded, throw-your-monitor-out-of-the-window bugs.
I will elaborate further on any progress (and more on the current state) in later posts.