[Ocfs2-devel] Bug 48 "[kernel 2.6 porting] System halt during reboot after mount an OCFS volume." in bugzilla is fixed.

Sonic Zhang sonic.zhang at intel.com
Wed Mar 24 15:37:50 CST 2004


Hi all,

I successfully root cause and fix bug 48 "[kernel 2.6 porting] System halt
during reboot after mount an OCFS volume.".

In current OCFS v2 driver, ocfs_volume_thread, ocfs_recv_thread and
ocfs_commit_thread are assumed to be terminated by the ocfs_dismount_volume
routine. But, if the system reboots, all processes and kernel threads will
receive signal SIGTERM before ocfs_dismount_volume routine is called. 

These kernel threads don't exit correctly. For example, they don't know 
they
should exit loop after received signal SIGTERM and clear their task_struct
pointers in ocfs_super to indiate their status. That's the cause of the 
system
halt in ocfs_dismount_volume routine when system reboots.

I attach a patch to fix this bug. Please review.

Thank you

This patch is against svn version 807.
----------------------------------------------------------------
--- ocfs2.old/src/journal.c    2004-03-22 16:02:55.000000000 +0800
+++ ocfs2/src/journal.c    2004-03-22 16:09:57.000000000 +0800
@@ -1034,12 +1034,13 @@
    /* The OCFS_JOURNAL_IN_SHUTDOWN will signal to commit_cache to not
     * drop the trans_lock (which we want to hold until we
     * completely destroy the journal. */
-    if (osb->commit && osb->commit->c_task) {
-        /* Wait for the commit thread */
-        LOG_TRACE_STR ("Waiting for ocfs2commit to exit....");
-        send_sig (SIGINT, osb->commit->c_task, 0);
-        wait_for_completion(&osb->commit->c_complete);
-        osb->commit->c_task = NULL;
+    if (osb->commit) {
+        if(osb->commit->c_task) {
+            /* Wait for the commit thread */
+            LOG_TRACE_STR ("Waiting for ocfs2commit to exit....");
+            send_sig (SIGINT, osb->commit->c_task, 0);
+            wait_for_completion(&osb->commit->c_complete);
+        }
        ocfs_free(osb->commit);
    }
    
@@ -1808,7 +1809,7 @@
            break;
    }

-
+    commit->c_task = NULL;

        /* Flush all scheduled tasks */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)


--- ocfs2.old/src/nm.c.old    2004-03-23 17:09:29.000000000 +0800
+++ ocfs2/src/nm.c    2004-03-24 10:18:35.000000000 +0800
@@ -118,6 +118,8 @@
        OcfsIpcCtxt.recv_sock = NULL;
    }

+    OcfsIpcCtxt.task = NULL;
+
    /* signal main thread of ipcdlm's exit */
    complete (&(OcfsIpcCtxt.complete));

@@ -249,6 +251,8 @@
    __u64 cfg_seq_num;
    int which, pruned, prune_iters = 0;
    struct buffer_head *bh = NULL;
+    int signr;
+    siginfo_t info;

    LOG_ENTRY ();

@@ -258,6 +262,7 @@

    sprintf (proc, "ocfs2nm-%d", osb->osb_id);
    ocfs_daemonize (proc, strlen(proc));
+    allow_signal(SIGTERM);

    osb->dlm_task = current;

@@ -437,7 +442,11 @@
            osb->hbt = 50 + j;
        }
        set_current_state (TASK_INTERRUPTIBLE);
-        schedule_timeout (osb->hbt - j);
+        if( schedule_timeout (osb->hbt - j) < osb->hbt -j ) {
+            signr = dequeue_signal_lock(current, &current->blocked, &info);
+            if(signr == SIGTERM)
+                OcfsGlobalCtxt.flags |= OCFS_FLAG_SHUTDOWN_VOL_THREAD;
+        }
    }

        /* Flush all scheduled tasks */
@@ -447,6 +456,8 @@
        flush_scheduled_tasks ();
#endif

+    osb->dlm_task = NULL;
+
    complete (&(osb->dlm_complete));
eek:
    LOG_EXIT_LONG (0);








More information about the Ocfs2-devel mailing list