Oracle bug 7708133 FP corruption running a JRockit test program Chuck Anderson chuck.anderson@oracle.com 06/10/2009 This bug was discovered by running a JRockit test program that did roughly the following: - alloc a page; mark it as not writeable. - create a bunch of threads that have a SIGSEGV signal handler - the threads do the following in a loop: - store a marker value in xmm0 - attempt to write to the non-writeable page to trigger a #GP. The kernel's #GP handler calls the thread's SIGSEGV signal handler. - the SIGSEGV signal handler: - writes a different marker value in xmm0 - marks the page as writeable - on return from the signal handler: - if xmm0 does not have the original marker value then issue an error message and cause the program to exit - else mark page as not writeable The test program running under EL5U3 fails within 10 seconds in both a PV guest and on bare metal. With the fixes described below, three concurrent instances of the test program ran without error for several hours. TBD: There are other issues that need to be fixed - cleaning up FP state after an error saving/restoring it. - sequences that I believe need to be made non-preemptable TBD: Need to patch 32-bit code. ****************** i387.c save_i387() ****************** A test program causes the following scenario: - user thread writes to xmm0. This will cause PF_USED_MATH to be set in struct task_struct.flags indicating that this task has used the FPU (there is state in FP regs such as xmm0). used_math() will return a non-zero value if the current task's PF_USED_MATH is set. - user thread attempts to write to a non-writeable page. The user thread has a SIGSEGV handler. - kernel general protection exception handler is invoked. - kernel #GP path sets up to call user thread SIGSEGV handler. - used_math(), called from setup_rt_frame(), returns non-zero (current task's PF_USED_MATH is set) so save_i387() is called to save the user thread's FP state so that it can be restored after returning from the signal handler. This code sequence in save_i387() is not correct: call clear_used_math() save FP state in a user buffer return 1 clear_used_math() clears PF_USED_MATH indicating that the task does not have FP state that needs to be saved before switching to another context. If we get preempted before we save our FP state, our FP state would be cleared because PF_USED_MATH is not set. For example, we could get a page fault attempting to save our FP state to the user's buffer. When the page fault handling completes and returns to the interrupted context (the #GP handler), the user thread FP state is not restored (because it wasn't saved) and the xmm registers are cleared. The original #GP path continues, saving the now cleared FP state. That cleared state will be "restored" when we return to the user thread from the signal handler. This causes the test program to complain that xmm0's value is NULL rather than the non-zero value it had prior to the general protection fault. This problem affects 32 and 64-bit flavors of EL3, EL4 and EL5 (at least through 2.6.18-128.1.1.0.1), both bare metal and OVM guests. The fix is to move the call to clear_used_math() after the save of the FP state. ***************************** signal.c restore_sigcontext() ***************************** If we needed to save FP state prior to calling a signal handler (used_math() was true) then we saved it to the address pointed to by struct sigcontext.fpstate. If !used_math() then we set fpstate = NULL. restore_sigcontext() is called to restore that context after returning from the signal handler. If fpstate != NULL then restore_i387() is called to restore FP state. The problem is that TS_USEDFPU may not be set. When set, TS_USEDFPU indicates that the FP state for the task is in the FP registers. We just restored state to the FP registers so TS_USEDFPU must also be set. Otherwise, if we are preempted after restoring FP state then context switching code will assume that there is no context in the FP registers that needs to be saved. It will assume that the FP context for this task has been saved to struct task_struct.thread.i387.fxsave, which has a stale context. It could be from the signal handler causing us to inherit the signal handler's FP values. The correct state is in the FP registers. The fix is to check if TS_USEDFPU is set. If not, set it and make sure CR0.ts is not set (clts()) to indicate that the FPU has this task's context and is ready for use. We must also clear TS_USEDFPU when we save our FP context in save_i387() and set CR0.ts (stts()) to indicate that the FPU must be initialized on first use by the signal handler. diff -up linux-2.6.18.x86_64/arch/x86_64/kernel/i387.c.orig linux-2.6.18.x86_64/arch/x86_64/kernel/i387.c --- linux-2.6.18.x86_64/arch/x86_64/kernel/i387.c.orig 2009-05-29 13:15:35.000000000 -0700 +++ linux-2.6.18.x86_64/arch/x86_64/kernel/i387.c 2009-06-10 10:09:41.000000000 -0700 @@ -93,17 +93,18 @@ int save_i387(struct _fpstate __user *bu if (!used_math()) return 0; - clear_used_math(); /* trigger finit */ if (task_thread_info(tsk)->status & TS_USEDFPU) { err = save_i387_checking((struct i387_fxsave_struct __user *)buf); if (err) return err; + task_thread_info(tsk)->status &= ~TS_USEDFPU; stts(); - } else { + } else { if (__copy_to_user(buf, &tsk->thread.i387.fxsave, sizeof(struct i387_fxsave_struct))) return -1; } - return 1; + clear_used_math(); /* trigger finit */ + return 1; } /* diff -up linux-2.6.18.x86_64/arch/x86_64/kernel/signal.c.orig linux-2.6.18.x86_64/arch/x86_64/kernel/signal.c --- linux-2.6.18.x86_64/arch/x86_64/kernel/signal.c.orig 2009-05-27 13:15:01.000000000 -0700 +++ linux-2.6.18.x86_64/arch/x86_64/kernel/signal.c 2009-06-10 10:07:47.000000000 -0700 @@ -95,14 +95,22 @@ restore_sigcontext(struct pt_regs *regs, { struct _fpstate __user * buf; + struct task_struct *me = current; + err |= __get_user(buf, &sc->fpstate); if (buf) { if (!access_ok(VERIFY_READ, buf, sizeof(*buf))) goto badframe; + if (!used_math()) { + init_fpu(me); + } + if (!(task_thread_info(current)->status & TS_USEDFPU)) { + clts(); + task_thread_info(current)->status |= TS_USEDFPU; + } err |= restore_i387(buf); } else { - struct task_struct *me = current; if (used_math()) { clear_fpu(me); clear_used_math();