[Ocfs2-users] Dlm question

Charlie Sharkey charlie.sharkey at bustech.com
Wed Feb 13 13:27:00 PST 2008



I have upgraded to ocfs2 1.2.8 and am getting the same lock problem.
Here are the /var/log/messages entries from:  echo R mas PAF1A9 >/proc/fs/ocfs2_dlm/debug
I'm not sure how to decode this; is this lock still held?


N1 kernel: (13416,1):dlm_dump_one_lock_resource:259 struct dlm_ctxt: mas, node=0, key=4125434387

N1 kernel: (13416,1):dlm_print_one_lock_resource:294 lockres: PAF1A9, owner=0, state=0

N1 kernel: (13416,1):__dlm_print_one_lock_resource:309 lockres: PAF1A9, owner=0, state=0

N1 kernel: (13416,1):__dlm_print_one_lock_resource:311   last used: 20693697, on purge list: no

N1 kernel: (13416,1):dlm_print_lockres_refmap:277   refmap nodes: [ ], inflight=0

N1 kernel: (13416,1):__dlm_print_one_lock_resource:313   granted queue:

N1 kernel: (13416,1):__dlm_print_one_lock_resource:325     type=5, conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n), bast=(empty=y,pend=n)

N1 kernel: (13416,1):__dlm_print_one_lock_resource:328   converting queue:

N1 kernel: (13416,1):__dlm_print_one_lock_resource:343   blocked queue:


Thank you,

charlie


-----Original Message-----
From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com] 
Sent: Friday, February 01, 2008 2:29 PM
To: Charlie Sharkey
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] Dlm question

There are 3 issues. I'll address them in the reverse order.

debugfs, not to be confused with debugfs.ocfs2, is a kernel component.
It used to be shipped with the ocfs2 kernel module package as
RHEL4/SLES9 did not bundle it.

RHEL5/SLES10 build/ship it as part of the kernel (not as a module), and
hence the scanlocks check fails. The solution is to comment out the
section that checks whether it is loaded. (The section is commented
with "# is debugfs loaded?")
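As a rough sketch of what the commented-out check could be replaced with (the messages here are illustrative, not scanlocks' actual output): on kernels that build debugfs in, it never appears in lsmod, but it is listed in /proc/filesystems either way.

```shell
# Illustrative alternative to the "# is debugfs loaded?" check:
# a built-in debugfs never shows up as a module, but any kernel
# that supports it lists it in /proc/filesystems.
if grep -qw debugfs /proc/filesystems; then
    echo "debugfs available"
else
    echo "debugfs not available"
fi
```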

The second issue is the oops. File a bugzilla with Novell.
We will handle this via them, as we need to see what patches your
kernel/ocfs2 has.

The first issue indicates that the lock is busy (-16 is EBUSY),
meaning there are holders. As the locks are files, you can use fuser to
see which pid is using it. If you want to see the state of the lock, you
will have to dump it via the dlm proc interface:
# echo R domain lock >/proc/fs/ocfs2_dlm/debug
Here, domain is the directory in /dlm and lock is the file in it.
The state will be dumped in /var/log/messages.
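Putting the two steps above together, a small helper along these lines could be used on a node (the function name is illustrative; the fuser and echo commands are exactly those described above):

```shell
# Illustrative helper to inspect a dlmfs lock. dlmfs locks live as
# files under /dlm/<domain>/<lock>, so fuser shows the holding pids,
# and the o2dlm proc interface dumps the lock resource state into
# /var/log/messages.
dump_o2dlm_lock() {
    domain="$1"
    lock="$2"
    # Which pids currently have the lock file in use?
    fuser -v "/dlm/$domain/$lock"
    # Ask the dlm to log the lock resource state (R = resource).
    echo "R $domain $lock" > /proc/fs/ocfs2_dlm/debug
}

# Example (only meaningful on a node running o2dlm):
# dump_o2dlm_lock mas PAF1A9
# tail /var/log/messages
```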

No, scanlocks cannot dump dlmfs locks.

Sunil

Charlie Sharkey wrote:
>  
>
> Hi,
>
> I'm having some dlm issues on a system. It looks like the scenario
> went something like:
>
>   19:45:29  -->  19:47:31   16 dlm locks are released using
>                             o2dlm_unlock(). ocfs2 logs an error into
>                             /var/log/messages, but returns ok to the
>                             application
>
>   19:49:15                  a dlm lock (o2dlm_lock()) is put on P00000
>                             -- ok
>
>   19:49:37                  lock on P00000 is released -- ok
>
>   19:49:40                  a lock is attempted on P00000, and the
>                             lock fails. The returned error is
>                             "Trylock failed"
>
> Here is the data from /var/log/messages:
>
> Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy
> Jan 31 19:45:43 N1 kernel: (25038,1):dlmfs_unlink:512 ERROR: unlink P20010, error -16 from destroy
> Jan 31 19:45:44 N1 kernel: (25030,1):dlmfs_unlink:512 ERROR: unlink P20002, error -16 from destroy
> Jan 31 19:45:59 N1 kernel: (25034,3):dlmfs_unlink:512 ERROR: unlink P60006, error -16 from destroy
> Jan 31 19:46:07 N1 kernel: (25043,0):dlmfs_unlink:512 ERROR: unlink P70015, error -16 from destroy
> Jan 31 19:46:08 N1 kernel: (25035,0):dlmfs_unlink:512 ERROR: unlink P70007, error -16 from destroy
> Jan 31 19:46:10 N1 kernel: (25041,1):dlmfs_unlink:512 ERROR: unlink P50013, error -16 from destroy
> Jan 31 19:46:25 N1 kernel: (25040,1):dlmfs_unlink:512 ERROR: unlink P40012, error -16 from destroy
> Jan 31 19:46:30 N1 kernel: (25042,1):dlmfs_unlink:512 ERROR: unlink P60014, error -16 from destroy
> Jan 31 19:46:30 N1 kernel: (25032,1):dlmfs_unlink:512 ERROR: unlink P40004, error -16 from destroy
> Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 ERROR: unlink P00000, error -16 from destroy
> Jan 31 19:47:08 N1 kernel: (25036,1):dlmfs_unlink:512 ERROR: unlink P00008, error -16 from destroy
> Jan 31 19:47:09 N1 kernel: (25029,1):dlmfs_unlink:512 ERROR: unlink P10001, error -16 from destroy
> Jan 31 19:47:19 N1 kernel: (25037,0):dlmfs_unlink:512 ERROR: unlink P10009, error -16 from destroy
> Jan 31 19:47:30 N1 kernel: (25039,1):dlmfs_unlink:512 ERROR: unlink P30011, error -16 from destroy
> Jan 31 19:47:31 N1 kernel: (25031,1):dlmfs_unlink:512 ERROR: unlink P30003, error -16 from destroy
>
>
> Here is data from the application dlm log file
>
> 01/31/2008 19:42:50  C000: Dlm Lock fd/id 150/P00000, returning: ok
> 01/31/2008 19:42:50  C001: Dlm Lock fd/id 152/P10001, returning: ok
> 01/31/2008 19:42:50  C002: Dlm Lock fd/id 154/P20002, returning: ok
> 01/31/2008 19:42:51  C003: Dlm Lock fd/id 156/P30003, returning: ok
> 01/31/2008 19:42:51  C004: Dlm Lock fd/id 158/P40004, returning: ok
> 01/31/2008 19:42:52  C005: Dlm Lock fd/id 160/P50005, returning: ok
> 01/31/2008 19:42:52  C006: Dlm Lock fd/id 162/P60006, returning: ok
> 01/31/2008 19:42:52  C007: Dlm Lock fd/id 164/P70007, returning: ok
> 01/31/2008 19:42:53  C008: Dlm Lock fd/id 166/P00008, returning: ok
> 01/31/2008 19:42:53  C009: Dlm Lock fd/id 168/P10009, returning: ok
> 01/31/2008 19:42:53  C00A: Dlm Lock fd/id 170/P20010, returning: ok
> 01/31/2008 19:42:54  C00B: Dlm Lock fd/id 172/P30011, returning: ok
> 01/31/2008 19:42:54  C00C: Dlm Lock fd/id 174/P40012, returning: ok
> 01/31/2008 19:42:54  C00D: Dlm Lock fd/id 178/P50013, returning: ok
> 01/31/2008 19:42:55  C00E: Dlm Lock fd/id 180/P60014, returning: ok
> 01/31/2008 19:42:58  C00F: Dlm Lock fd/id 182/P70015, returning: ok
> 01/31/2008 19:45:29  C005: Dlm UnLock.  fd/id 160/P50005, returning ok
> 01/31/2008 19:45:43  C00A: Dlm UnLock.  fd/id 170/P20010, returning ok
> 01/31/2008 19:45:44  C002: Dlm UnLock.  fd/id 154/P20002, returning ok
> 01/31/2008 19:45:59  C006: Dlm UnLock.  fd/id 162/P60006, returning ok
> 01/31/2008 19:46:07  C00F: Dlm UnLock.  fd/id 182/P70015, returning ok
> 01/31/2008 19:46:08  C007: Dlm UnLock.  fd/id 164/P70007, returning ok
> 01/31/2008 19:46:10  C00D: Dlm UnLock.  fd/id 178/P50013, returning ok
> 01/31/2008 19:46:25  C00C: Dlm UnLock.  fd/id 174/P40012, returning ok
> 01/31/2008 19:46:30  C00E: Dlm UnLock.  fd/id 180/P60014, returning ok
> 01/31/2008 19:46:30  C004: Dlm UnLock.  fd/id 158/P40004, returning ok
> 01/31/2008 19:47:07  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
> 01/31/2008 19:47:08  C008: Dlm UnLock.  fd/id 166/P00008, returning ok
> 01/31/2008 19:47:09  C001: Dlm UnLock.  fd/id 152/P10001, returning ok
> 01/31/2008 19:47:19  C009: Dlm UnLock.  fd/id 168/P10009, returning ok
> 01/31/2008 19:47:30  C00B: Dlm UnLock.  fd/id 172/P30011, returning ok
> 01/31/2008 19:47:31  C003: Dlm UnLock.  fd/id 156/P30003, returning ok
> 01/31/2008 19:49:15  C000: Dlm Lock fd/id 150/P00000, returning: ok
> 01/31/2008 19:49:37  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
> 01/31/2008 19:49:40  C000: Dlm Lock fd/id 150/P00000, returning: 
> Trylock failed
>
>
> I also had a problem with this system the day before this. Here is the
> data from that:
>
>   SYSTEM MAP: /boot/System.map-2.6.16.46-0.14.PTF.284042.0-smp
> DEBUG KERNEL: ../vmlinux.debug (2.6.16.46-0.14.PTF.284042.0-smp)
>     DUMPFILE: vmcore
>         CPUS: 4
>         DATE: Wed Jan 30 17:44:49 2008
>       UPTIME: 9 days, 00:55:51
> LOAD AVERAGE: 1.10, 1.06, 1.01
>        TASKS: 341
>     NODENAME: N1
>      RELEASE: 2.6.16.46-0.14.PTF.284042.0-smp
>      VERSION: #1 SMP Thu May 17 14:00:09 UTC 2007
>      MACHINE: i686  (2327 Mhz)
>       MEMORY: 2 GB
>        PANIC: "kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2780!"
>          PID: 31585
>      COMMAND: "masx"
>         TASK: dab912d0  [THREAD_INFO: d5840000]
>          CPU: 0
>        STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 31585  TASK: dab912d0  CPU: 0   COMMAND: "masx"
>  #0 [d5841d78] crash_kexec at c013bb1a
>  #1 [d5841dbc] die at c01055fe
>  #2 [d5841dec] do_invalid_op at c0105ce2
>  #3 [d5841e9c] error_code (via invalid_op) at c0104e4d
>     EAX: 00000051  EBX: ca668280  ECX: 00000000  EDX: 00000296  EBP: da0c1c00
>     DS:  007b      ESI: da0c1c00  ES:  007b      EDI: ca668280
>     CS:  0060      EIP: fb835e95  ERR: ffffffff  EFLAGS: 00010296
>  #4 [d5841ed0] dlm_empty_lockres at fb835e95
>  #5 [d5841ee0] dlm_unregister_domain at fb827305
>  #6 [d5841f18] dlmfs_clear_inode at fb6c2eae
>  #7 [d5841f24] clear_inode at c0175dfe
>  #8 [d5841f30] generic_delete_inode at c0175eee
>  #9 [d5841f3c] iput at c0175838
> #10 [d5841f48] dput at c01744e0
> #11 [d5841f54] do_rmdir at c016e63d
> #12 [d5841fb8] sysenter_entry at c0103bd4
>     EAX: 00000028  EBX: 08299988  ECX: 00000000  EDX: 08273be4
>     DS:  007b      ESI: 00000000  ES:  007b      EDI: 082ebf28
>     SS:  007b      ESP: bf999f7c  EBP: bf999fa8
>     CS:  0073      EIP: ffffe410  ERR: 00000028  EFLAGS: 00000246
>
>
> This is a Suse Sles10 SP1 system, with a suse nfs patch.
> Ocfs2 tools version 1.2.3-0.7
> Ocfs2 version  1.2.5-SLES-r2997
>
> I was hoping you would have some ideas on this.
>
> Also, another question: I have been trying to run one of the debugging
> scripts, for example, scanlocks. I keep getting the message 'Module
> debugfs not loaded'. I don't see any debugfs.ko on the system. Isn't
> it a part of the ocfs2 tools?
>
> Thank you,
>
> Charlie
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   



