[Ocfs2-users] hung process -- sles10 sp2

Charlie Sharkey charlie.sharkey at bustech.com
Wed Jan 13 13:15:08 PST 2010


Here's the output of that ps command.
I'll check for a newer version of ocfs2-tools.


  PID STAT COMMAND         WIDE-WCHAN-COLUMN
    1 S    init            -
    2 S    migration/0     migration_thread
    3 SN   ksoftirqd/0     ksoftirqd
    4 S    migration/1     migration_thread
    5 SN   ksoftirqd/1     ksoftirqd
    6 S    migration/2     migration_thread
    7 SN   ksoftirqd/2     ksoftirqd
    8 S    migration/3     migration_thread
    9 SN   ksoftirqd/3     ksoftirqd
   10 S    migration/4     migration_thread
   11 SN   ksoftirqd/4     ksoftirqd
   12 S    migration/5     migration_thread
   13 SN   ksoftirqd/5     ksoftirqd
   14 S    migration/6     migration_thread
   15 SN   ksoftirqd/6     ksoftirqd
   16 S    migration/7     migration_thread
   17 SN   ksoftirqd/7     ksoftirqd
   18 S<   events/0        worker_thread
   19 S<   events/1        worker_thread
   20 S<   events/2        worker_thread
   21 S<   events/3        worker_thread
   22 S<   events/4        worker_thread
   23 S<   events/5        worker_thread
   24 S<   events/6        worker_thread
   25 S<   events/7        worker_thread
   26 S<   khelper         worker_thread
   27 S<   kthread         worker_thread
   37 S<   kblockd/0       worker_thread
   38 S<   kblockd/1       worker_thread
   39 S<   kblockd/2       worker_thread
   40 S<   kblockd/3       worker_thread
   41 S<   kblockd/4       worker_thread
   42 S<   kblockd/5       worker_thread
   43 S<   kblockd/6       worker_thread
   44 S<   kblockd/7       worker_thread
   45 S<   kacpid          worker_thread
   46 S<   kacpi_notify    worker_thread
  327 S    pdflush         pdflush
  328 S    pdflush         pdflush
  329 S    kswapd0         kswapd
  330 S<   aio/0           worker_thread
  331 S<   aio/1           worker_thread
  332 S<   aio/2           worker_thread
  333 S<   aio/3           worker_thread
  334 S<   aio/4           worker_thread
  335 S<   aio/5           worker_thread
  336 S<   aio/6           worker_thread
  337 S<   aio/7           worker_thread
  582 S<   cqueue/0        worker_thread
  583 S<   cqueue/1        worker_thread
  584 S<   cqueue/2        worker_thread
  585 S<   cqueue/3        worker_thread
  586 S<   cqueue/4        worker_thread
  587 S<   cqueue/5        worker_thread
  588 S<   cqueue/6        worker_thread
  589 S<   cqueue/7        worker_thread
  590 S<   kseriod         serio_thread
  623 S<   kpsmoused       worker_thread
 1056 S<   ata/0           worker_thread
 1057 S<   ata/1           worker_thread
 1058 S<   ata/2           worker_thread
 1059 S<   ata/3           worker_thread
 1060 S<   ata/4           worker_thread
 1061 S<   ata/5           worker_thread
 1062 S<   ata/6           worker_thread
 1063 S<   ata/7           worker_thread
 1064 S<   ata_aux         worker_thread
 1093 S<   scsi_eh_0       scsi_error_handler
 1218 S<   scsi_eh_1       scsi_error_handler
 1232 S<   qla2xxx_1_dpc   144669341936254977
 2061 S<   scsi_eh_2       scsi_error_handler
 2111 S<   qla2xxx_2_dpc   18446604440027791361
 2190 S    kjournald       kjournald
 2251 S<s  udevd           -
 3469 S<   khubd           hub_thread
 4474 S<   scsi_eh_3       scsi_error_handler
 4475 S<   usb-storage     -
 4620 S<   kmpathd/0       worker_thread
 4621 S<   kmpathd/1       worker_thread
 4622 S<   kmpathd/2       worker_thread
 4623 S<   kmpathd/3       worker_thread
 4624 S<   kmpathd/4       worker_thread
 4625 S<   kmpathd/5       worker_thread
 4626 S<   kmpathd/6       worker_thread
 4627 S<   kmpathd/7       worker_thread
 5768 S    kjournald       kjournald
 5770 S    kjournald       kjournald
 5823 S<   kauditd         kauditd_thread
 6117 Ss   resmgrd         -
 6249 Ss   acpid           -
 6326 Ss   dbus-daemon     -
 6494 Ss   hald            -
 6695 S    hald-addon-acpi -
 7050 S<   bond            worker_thread
 7244 S    hald-addon-stor -
 7495 Ss   syslog-ng       -
 7499 Ss   klogd           syslog
 7524 SLl  multipathd      stext
 7529 Ss   portmap         -
 7547 Ss   slpd            -
 7626 Ss   irqbalance      1
 7658 SN   kipmi0          -
 7725 S    snmpd           -
 7950 S<   CID_control     OS_cidWait
 7951 D<   CID_timer       -
 7952 S<   CID_sched_0     OS_cidWait
 7953 S<   CID_sched_1     OS_cidWait
 7975 S    btitool         OS_cidWait
 7989 Ss   startpar        -
 8094 Sl   qlremote        stext
 8146 Ss   sshd            -
 8191 S<   user_dlm        worker_thread
 8206 Ss   ntpd            -
 8217 S<   o2net           worker_thread
 8250 Ss   cron            -
 8261 S<   o2hb-D5304888F9 -
 8272 Ss   httpd2-prefork  -
 8273 S    httpd2-prefork  -
 8274 S    httpd2-prefork  -
 8275 S    httpd2-prefork  -
 8276 S    httpd2-prefork  -
 8277 S    httpd2-prefork  -
 8324 S<   ocfs2_wq        worker_thread
 8325 S<   ocfs2dc         ocfs2_downconvert_thread
 8326 S<   dlm_thread      -
 8327 S<   dlm_reco_thread -
 8328 S<   dlm_wq          worker_thread
 8329 S    kjournald       kjournald
 8330 S<   ocfs2cmt        ocfs2_commit_thread
 8336 S<   o2hb-B98C95FB4B -
 8353 S<   ocfs2dc         ocfs2_downconvert_thread
 8354 S<   dlm_thread      -
 8355 S<   dlm_reco_thread -
 8356 S<   dlm_wq          worker_thread
 8357 S    kjournald       kjournald
 8358 S<   ocfs2cmt        ocfs2_commit_thread
 8364 S<   o2hb-B3EE601AEB -
 8381 S<   ocfs2dc         ocfs2_downconvert_thread
 8382 S<   dlm_thread      -
 8383 S<   dlm_reco_thread -
 8384 S<   dlm_wq          worker_thread
 8385 S    kjournald       kjournald
 8386 S<   ocfs2cmt        ocfs2_commit_thread
 8392 S<   o2hb-2043DFCC18 -
 8409 S<   ocfs2dc         ocfs2_downconvert_thread
 8410 S<   dlm_thread      -
 8411 S<   dlm_reco_thread -
 8412 S<   dlm_wq          worker_thread
 8413 S    kjournald       kjournald
 8414 S<   ocfs2cmt        ocfs2_commit_thread
 8420 S<   o2hb-6B6685A881 -
 8437 S<   ocfs2dc         ocfs2_downconvert_thread
 8438 S<   dlm_thread      -
 8439 S<   dlm_reco_thread -
 8440 S<   dlm_wq          worker_thread
 8441 S    kjournald       kjournald
 8442 S<   ocfs2cmt        ocfs2_commit_thread
 8538 S    logger          pipe_wait
 8540 Ss   startpar        -
 8542 Dsl  vt              ocfs2_wait_for_mask
 8555 Ss+  mingetty        -
 8556 Ss+  mingetty        -
 8557 Ss+  mingetty        -
 8558 Ss+  mingetty        -
 8559 Ss+  mingetty        -
 8560 Ss+  mingetty        -
 8615 S<   dlm_thread      -
 8616 S<   dlm_reco_thread -
 8617 S<   dlm_wq          worker_thread
 9369 R+   ps              -
10405 Ss   sshd            -
10407 Ss+  vtcon           -
26609 Ss   sshd            -
26611 Ss   bash            wait
26698 S+   gdb             wait
26894 Ss   sshd            -
26896 Ss+  bash            -
29881 Ss   sshd            -
29883 Ss   bash            wait
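
In the listing above, two entries are in uninterruptible sleep (STAT
beginning with D): PID 7951 (CID_timer) and PID 8542 (vt), the latter
waiting in ocfs2_wait_for_mask. A minimal sketch for pulling such rows out
of output in this same pid,stat,comm,wchan layout follows; the function
name filter_d_state is made up for illustration and is not part of
ocfs2-tools.

```shell
# Hypothetical helper: print only the rows whose STAT column (field 2)
# begins with "D", i.e. processes in uninterruptible sleep.
# The header row passes through the filter unprinted because "STAT"
# does not start with "D".
filter_d_state() {
    awk '$2 ~ /^D/'
}

# Example use on a live node, with the same ps invocation as in this thread:
#   ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_d_state
```

The filter reads standard input, so it works equally well on a saved copy
of the listing.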


-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
Sent: Wednesday, January 13, 2010 4:04 PM
To: Charlie Sharkey
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] hung process -- sles10 sp2

Charlie Sharkey wrote:
>
> version info
> ---------------
> n1 kernel: OCFS2 Node Manager 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
> n1 kernel: OCFS2 DLM 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
> n1 kernel: OCFS2 DLMFS 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
> ocfs2-tools-1.4.0-0.5
> ocfs2console-1.4.0-0.5
> Linux n1 2.6.16.60-0.34-smp #1 SMP Fri Jan 16 14:59:01 UTC 2009 x86_64
> x86_64 x86_64 GNU/Linux
>
> ============================================================================
>
> One of the nodes of a six-node cluster got a hung process. The 'ps -elf'
> command shows it as:
>
> 5 D vtape 8542 1 6 77 0 - 77376 ocfs2_ Jan12 ? 01:34:31
> /opt/bti/mas/bin/vt -d -p /var/run/vt.pid
>
> The system isn't hung; I can ssh into the system and ls each ocfs2
> directory. I have run the debugfs.ocfs2 command (debugfs.ocfs2 -R "stats")
> and it shows no errors. I ran the 'scanlocks2' script and it didn't show
> any hung locks. It did create some files (/tmp/_fsl_dm-22 ->
> /tmp/_fsl_dm-26). The contents of those files are: "Debug string proto 2
> found, but 1 is the highest I understand."
>

You have an old debugfs.ocfs2. See if SLES has a newer ocfs2-tools.
With it, rerun scanlocks2. That will tell us whether the dlm is involved.

Meanwhile, what does this say?
ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN



