[Ocfs2-users] hung process -- sles10 sp2

Sunil Mushran sunil.mushran at oracle.com
Wed Jan 13 13:03:59 PST 2010


Charlie Sharkey wrote:
>
> version info
>
> ---------------
>
> n1 kernel: OCFS2 Node Manager 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> n1 kernel: OCFS2 DLM 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> n1 kernel: OCFS2 DLMFS 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC 2008
>
> ocfs2-tools-1.4.0-0.5
>
> ocfs2console-1.4.0-0.5
>
> Linux n1 2.6.16.60-0.34-smp #1 SMP Fri Jan 16 14:59:01 UTC 2009 x86_64 
> x86_64 x86_64 GNU/Linux
>
> ============================================================================
>
> One of the nodes of a six-node cluster got a hung process. The 'ps 
> -elf' command shows it as:
>
> 5 D vtape 8542 1 6 77 0 - 77376 ocfs2_ Jan12 ? 01:34:31 
> /opt/bti/mas/bin/vt -d -p /var/run/vt.pid
>
> The system isn't hung; I can ssh into the system and ls each ocfs2 
> directory. I have run the debugfs.ocfs2 command
>
> debugfs.ocfs2 -R "stats"
>
> and it shows no errors. I ran the 'scanlocks2' script and it didn't 
> show any hung locks. It did create some files (/tmp/_fsl_dm-22 through 
> /tmp/_fsl_dm-26). The contents of those files are:
>
> "Debug string proto 2 found, but 1 is the highest I understand."
>

You have an old debugfs.ocfs2; that is what the "proto 2" message is
telling you. See if SLES has a newer ocfs2-tools package. With it, rerun
scanlocks2. That will tell us whether the DLM is involved or not.

Meanwhile, what does this say?
ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
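
On a busy node that listing can be long. As a rough sketch only (not part
of the original reply), the output could be narrowed to processes in
uninterruptible sleep, the "D" state the hung vt process is showing. The
function name filter_d_state is made up for illustration, and the filter
assumes the column order of the ps command above (PID first, STAT second).

```shell
# Keep the header line plus any process whose STAT column begins with "D"
# (uninterruptible sleep). Reads "pid stat comm wchan" lines on stdin.
# filter_d_state is a hypothetical helper name, not an OCFS2 or procps tool.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Assumed usage, with the same ps invocation as above:
#   ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_d_state
```

The last column of each remaining line is the kernel function the process
is sleeping in, which is the part of interest here.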



