[Ocfs2-users] hung process -- sles10 sp2
Charlie Sharkey
charlie.sharkey at bustech.com
Wed Jan 13 13:15:08 PST 2010
Here's the result of the command.
I'll check for a newer version of tools.
PID STAT COMMAND WIDE-WCHAN-COLUMN
1 S init -
2 S migration/0 migration_thread
3 SN ksoftirqd/0 ksoftirqd
4 S migration/1 migration_thread
5 SN ksoftirqd/1 ksoftirqd
6 S migration/2 migration_thread
7 SN ksoftirqd/2 ksoftirqd
8 S migration/3 migration_thread
9 SN ksoftirqd/3 ksoftirqd
10 S migration/4 migration_thread
11 SN ksoftirqd/4 ksoftirqd
12 S migration/5 migration_thread
13 SN ksoftirqd/5 ksoftirqd
14 S migration/6 migration_thread
15 SN ksoftirqd/6 ksoftirqd
16 S migration/7 migration_thread
17 SN ksoftirqd/7 ksoftirqd
18 S< events/0 worker_thread
19 S< events/1 worker_thread
20 S< events/2 worker_thread
21 S< events/3 worker_thread
22 S< events/4 worker_thread
23 S< events/5 worker_thread
24 S< events/6 worker_thread
25 S< events/7 worker_thread
26 S< khelper worker_thread
27 S< kthread worker_thread
37 S< kblockd/0 worker_thread
38 S< kblockd/1 worker_thread
39 S< kblockd/2 worker_thread
40 S< kblockd/3 worker_thread
41 S< kblockd/4 worker_thread
42 S< kblockd/5 worker_thread
43 S< kblockd/6 worker_thread
44 S< kblockd/7 worker_thread
45 S< kacpid worker_thread
46 S< kacpi_notify worker_thread
327 S pdflush pdflush
328 S pdflush pdflush
329 S kswapd0 kswapd
330 S< aio/0 worker_thread
331 S< aio/1 worker_thread
332 S< aio/2 worker_thread
333 S< aio/3 worker_thread
334 S< aio/4 worker_thread
335 S< aio/5 worker_thread
336 S< aio/6 worker_thread
337 S< aio/7 worker_thread
582 S< cqueue/0 worker_thread
583 S< cqueue/1 worker_thread
584 S< cqueue/2 worker_thread
585 S< cqueue/3 worker_thread
586 S< cqueue/4 worker_thread
587 S< cqueue/5 worker_thread
588 S< cqueue/6 worker_thread
589 S< cqueue/7 worker_thread
590 S< kseriod serio_thread
623 S< kpsmoused worker_thread
1056 S< ata/0 worker_thread
1057 S< ata/1 worker_thread
1058 S< ata/2 worker_thread
1059 S< ata/3 worker_thread
1060 S< ata/4 worker_thread
1061 S< ata/5 worker_thread
1062 S< ata/6 worker_thread
1063 S< ata/7 worker_thread
1064 S< ata_aux worker_thread
1093 S< scsi_eh_0 scsi_error_handler
1218 S< scsi_eh_1 scsi_error_handler
1232 S< qla2xxx_1_dpc 144669341936254977
2061 S< scsi_eh_2 scsi_error_handler
2111 S< qla2xxx_2_dpc 18446604440027791361
2190 S kjournald kjournald
2251 S<s udevd -
3469 S< khubd hub_thread
4474 S< scsi_eh_3 scsi_error_handler
4475 S< usb-storage -
4620 S< kmpathd/0 worker_thread
4621 S< kmpathd/1 worker_thread
4622 S< kmpathd/2 worker_thread
4623 S< kmpathd/3 worker_thread
4624 S< kmpathd/4 worker_thread
4625 S< kmpathd/5 worker_thread
4626 S< kmpathd/6 worker_thread
4627 S< kmpathd/7 worker_thread
5768 S kjournald kjournald
5770 S kjournald kjournald
5823 S< kauditd kauditd_thread
6117 Ss resmgrd -
6249 Ss acpid -
6326 Ss dbus-daemon -
6494 Ss hald -
6695 S hald-addon-acpi -
7050 S< bond worker_thread
7244 S hald-addon-stor -
7495 Ss syslog-ng -
7499 Ss klogd syslog
7524 SLl multipathd stext
7529 Ss portmap -
7547 Ss slpd -
7626 Ss irqbalance 1
7658 SN kipmi0 -
7725 S snmpd -
7950 S< CID_control OS_cidWait
7951 D< CID_timer -
7952 S< CID_sched_0 OS_cidWait
7953 S< CID_sched_1 OS_cidWait
7975 S btitool OS_cidWait
7989 Ss startpar -
8094 Sl qlremote stext
8146 Ss sshd -
8191 S< user_dlm worker_thread
8206 Ss ntpd -
8217 S< o2net worker_thread
8250 Ss cron -
8261 S< o2hb-D5304888F9 -
8272 Ss httpd2-prefork -
8273 S httpd2-prefork -
8274 S httpd2-prefork -
8275 S httpd2-prefork -
8276 S httpd2-prefork -
8277 S httpd2-prefork -
8324 S< ocfs2_wq worker_thread
8325 S< ocfs2dc ocfs2_downconvert_thread
8326 S< dlm_thread -
8327 S< dlm_reco_thread -
8328 S< dlm_wq worker_thread
8329 S kjournald kjournald
8330 S< ocfs2cmt ocfs2_commit_thread
8336 S< o2hb-B98C95FB4B -
8353 S< ocfs2dc ocfs2_downconvert_thread
8354 S< dlm_thread -
8355 S< dlm_reco_thread -
8356 S< dlm_wq worker_thread
8357 S kjournald kjournald
8358 S< ocfs2cmt ocfs2_commit_thread
8364 S< o2hb-B3EE601AEB -
8381 S< ocfs2dc ocfs2_downconvert_thread
8382 S< dlm_thread -
8383 S< dlm_reco_thread -
8384 S< dlm_wq worker_thread
8385 S kjournald kjournald
8386 S< ocfs2cmt ocfs2_commit_thread
8392 S< o2hb-2043DFCC18 -
8409 S< ocfs2dc ocfs2_downconvert_thread
8410 S< dlm_thread -
8411 S< dlm_reco_thread -
8412 S< dlm_wq worker_thread
8413 S kjournald kjournald
8414 S< ocfs2cmt ocfs2_commit_thread
8420 S< o2hb-6B6685A881 -
8437 S< ocfs2dc ocfs2_downconvert_thread
8438 S< dlm_thread -
8439 S< dlm_reco_thread -
8440 S< dlm_wq worker_thread
8441 S kjournald kjournald
8442 S< ocfs2cmt ocfs2_commit_thread
8538 S logger pipe_wait
8540 Ss startpar -
8542 Dsl vt ocfs2_wait_for_mask
8555 Ss+ mingetty -
8556 Ss+ mingetty -
8557 Ss+ mingetty -
8558 Ss+ mingetty -
8559 Ss+ mingetty -
8560 Ss+ mingetty -
8615 S< dlm_thread -
8616 S< dlm_reco_thread -
8617 S< dlm_wq worker_thread
9369 R+ ps -
10405 Ss sshd -
10407 Ss+ vtcon -
26609 Ss sshd -
26611 Ss bash wait
26698 S+ gdb wait
26894 Ss sshd -
26896 Ss+ bash -
29881 Ss sshd -
29883 Ss bash wait
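
For anyone who wants to pull the blocked tasks out of a listing like the one above, here is a minimal sketch in Python. It assumes the output of `ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN` has been captured as text and that command names contain no spaces. The function name `find_blocked` and the `sample` string are made up for illustration; the sample rows are copied from the listing above.

```python
def find_blocked(ps_output):
    """Return (pid, stat, comm, wchan) tuples for tasks whose STAT starts with D.

    Assumes four whitespace-separated columns: PID, STAT, COMMAND, WCHAN,
    and that COMMAND contains no spaces (an assumption, not guaranteed by ps).
    """
    blocked = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header row and anything that is not a process row.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, stat, comm = fields[0], fields[1], fields[2]
        wchan = fields[3].strip() if len(fields) > 3 else "-"
        # D is uninterruptible sleep, usually a task waiting inside the kernel.
        if stat.startswith("D"):
            blocked.append((int(pid), stat, comm, wchan))
    return blocked


# Rows copied from the listing above, used here only as sample input.
sample = """PID STAT COMMAND WIDE-WCHAN-COLUMN
7950 S< CID_control OS_cidWait
7951 D< CID_timer -
8542 Dsl vt ocfs2_wait_for_mask
9369 R+ ps -
"""

for row in find_blocked(sample):
    print(*row)
```

On the sample rows this picks out PID 7951 (CID_timer) and PID 8542 (vt, waiting in ocfs2_wait_for_mask).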
-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Wednesday, January 13, 2010 4:04 PM
To: Charlie Sharkey
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] hung process -- sles10 sp2
Charlie Sharkey wrote:
>
> version info
>
> ---------------
>
> n1 kernel: OCFS2 Node Manager 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> n1 kernel: OCFS2 DLM 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> n1 kernel: OCFS2 DLMFS 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> ocfs2-tools-1.4.0-0.5
>
> ocfs2console-1.4.0-0.5
>
> Linux n1 2.6.16.60-0.34-smp #1 SMP Fri Jan 16 14:59:01 UTC 2009 x86_64
> x86_64 x86_64 GNU/Linux
>
> ============================================================================
>
> One of the nodes of a six node cluster got a hung process. The 'ps
> -elf' command shows it as:
>
> 5 D vtape 8542 1 6 77 0 - 77376 ocfs2_ Jan12 ? 01:34:31
> /opt/bti/mas/bin/vt -d -p /var/run/vt.pid
>
> The system isn't hung, I can ssh into the system and ls each ocfs2
> directory. I have run the debugfs.ocfs2
>
> command: debugfs.ocfs2 -R "stats" and it shows no errors. I ran the
> 'scanlocks2' script and it didn't show
>
> any hung locks. It did create some files (/tmp/_fsl_dm-22 ->
> /tmp/_fsl_dm-26). The contents of those files
>
> are: "Debug string proto 2 found, but 1 is the highest I understand."
>
You have an old debugfs.ocfs2. See if SLES has a newer ocfs2-tools.
With it, rerun scanlocks2. That will tell us whether dlm is involved.
Meanwhile, what does this say?
ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN