[Ocfs2-users] Dlm question

Sunil Mushran Sunil.Mushran at oracle.com
Fri Feb 1 11:29:00 PST 2008


There are 3 issues. I'll address them in the reverse order.

debugfs, not to be confused with debugfs.ocfs2, is a kernel
component. It used to be shipped with the ocfs2 kernel module
package as RHEL4/SLES9 did not bundle it.

RHEL5/SLES10 build and ship it as part of the kernel (not as a module),
and hence the scanlocks check fails. The solution is to comment out
the section that checks whether it is loaded. (The section is marked
with the comment "#is debugfs loaded?")
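
As an alternative to looking for a module, one could check whether the
running kernel knows about debugfs at all. The following is only a sketch
and is not what scanlocks does: the function name debugfs_available is made
up for illustration, and it assumes that a built-in debugfs is listed in
/proc/filesystems. It checks only that the filesystem type is registered,
not that it is mounted.

```shell
#!/bin/sh
# Hypothetical helper: succeed if debugfs is registered with the running
# kernel, whether it was built in or loaded as a module.
# The optional argument exists only so the check can be tried on a sample file.
debugfs_available() {
    fs_list="${1:-/proc/filesystems}"
    # Lines look like "nodev<TAB>debugfs"; compare the last field exactly.
    awk '$NF == "debugfs" { found = 1 } END { exit !found }' "$fs_list"
}

if debugfs_available; then
    echo "debugfs is available"
else
    echo "debugfs is not available"
fi
```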

The second issue is the oops. File a bugzilla with NOVELL.
We will handle this via them as we need to see what patches
your kernel/ocfs2 has.

The first issue indicates that the lock is busy (-16 is EBUSY),
meaning there are holders. As the locks are files, you can use
fuser to see which pid is using it. If you want to see the state
of the lock, you will have to dump it via the dlm proc interface.
# echo R domain lock >/proc/fs/ocfs2_dlm/debug
Here domain will be the directory in /dlm and lock the file in it.
The state will be dumped in /var/log/messages.
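
The dump step can be wrapped in a small function so the domain and lock
names are not mistyped. This is a rough sketch only: dump_dlm_lock is a
made-up name, and the optional third argument is there just so the function
can be tried against an ordinary file instead of the proc interface.

```shell
#!/bin/sh
# Hypothetical helper: ask the dlm to dump the state of one lock resource.
#   $1 = domain (the directory in /dlm)
#   $2 = lock   (the file in that directory)
#   $3 = debug file; defaults to the proc path given above
dump_dlm_lock() {
    domain="$1"
    lock="$2"
    debug_file="${3:-/proc/fs/ocfs2_dlm/debug}"
    if [ -z "$domain" ] || [ -z "$lock" ]; then
        echo "usage: dump_dlm_lock domain lock [debug_file]" >&2
        return 2
    fi
    # Same request as the echo command above.
    echo "R $domain $lock" > "$debug_file"
}
```

For example, with a placeholder domain called mydomain, one would first run
"fuser -v /dlm/mydomain/P00000" to see the holders, then
"dump_dlm_lock mydomain P00000" as root, and then look in /var/log/messages.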

No, scanlocks cannot dump dlmfs locks.

Sunil

Charlie Sharkey wrote:
>  
>
> Hi,
>
> I'm having some dlm issues on a system. It looks like
> the scenario went something like: 
>
>   19:45:29 --> 19:47:31   16 dlm locks are released using o2dlm_unlock().
>                           ocfs2 logs an error into /var/log/messages,
>                           but returns ok to the application
>
>   19:49:15                a dlm lock (o2dlm_lock()) is put on P00000 -- ok
>
>   19:49:37                lock on P00000 is released -- ok
>
>   19:49:40                a lock is attempted on P00000, and the lock fails.
>                           Returned error is "Trylock failed"
>   
>
> Here is the data from /var/log/messages:
>
> Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy
> Jan 31 19:45:43 N1 kernel: (25038,1):dlmfs_unlink:512 ERROR: unlink P20010, error -16 from destroy
> Jan 31 19:45:44 N1 kernel: (25030,1):dlmfs_unlink:512 ERROR: unlink P20002, error -16 from destroy
> Jan 31 19:45:59 N1 kernel: (25034,3):dlmfs_unlink:512 ERROR: unlink P60006, error -16 from destroy
> Jan 31 19:46:07 N1 kernel: (25043,0):dlmfs_unlink:512 ERROR: unlink P70015, error -16 from destroy
> Jan 31 19:46:08 N1 kernel: (25035,0):dlmfs_unlink:512 ERROR: unlink P70007, error -16 from destroy
> Jan 31 19:46:10 N1 kernel: (25041,1):dlmfs_unlink:512 ERROR: unlink P50013, error -16 from destroy
> Jan 31 19:46:25 N1 kernel: (25040,1):dlmfs_unlink:512 ERROR: unlink P40012, error -16 from destroy
> Jan 31 19:46:30 N1 kernel: (25042,1):dlmfs_unlink:512 ERROR: unlink P60014, error -16 from destroy
> Jan 31 19:46:30 N1 kernel: (25032,1):dlmfs_unlink:512 ERROR: unlink P40004, error -16 from destroy
> Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 ERROR: unlink P00000, error -16 from destroy
> Jan 31 19:47:08 N1 kernel: (25036,1):dlmfs_unlink:512 ERROR: unlink P00008, error -16 from destroy
> Jan 31 19:47:09 N1 kernel: (25029,1):dlmfs_unlink:512 ERROR: unlink P10001, error -16 from destroy
> Jan 31 19:47:19 N1 kernel: (25037,0):dlmfs_unlink:512 ERROR: unlink P10009, error -16 from destroy
> Jan 31 19:47:30 N1 kernel: (25039,1):dlmfs_unlink:512 ERROR: unlink P30011, error -16 from destroy
> Jan 31 19:47:31 N1 kernel: (25031,1):dlmfs_unlink:512 ERROR: unlink P30003, error -16 from destroy
>
>
> Here is data from the application dlm log file
>
> 01/31/2008 19:42:50  C000: Dlm Lock fd/id 150/P00000, returning: ok
> 01/31/2008 19:42:50  C001: Dlm Lock fd/id 152/P10001, returning: ok
> 01/31/2008 19:42:50  C002: Dlm Lock fd/id 154/P20002, returning: ok
> 01/31/2008 19:42:51  C003: Dlm Lock fd/id 156/P30003, returning: ok
> 01/31/2008 19:42:51  C004: Dlm Lock fd/id 158/P40004, returning: ok
> 01/31/2008 19:42:52  C005: Dlm Lock fd/id 160/P50005, returning: ok
> 01/31/2008 19:42:52  C006: Dlm Lock fd/id 162/P60006, returning: ok
> 01/31/2008 19:42:52  C007: Dlm Lock fd/id 164/P70007, returning: ok
> 01/31/2008 19:42:53  C008: Dlm Lock fd/id 166/P00008, returning: ok
> 01/31/2008 19:42:53  C009: Dlm Lock fd/id 168/P10009, returning: ok
> 01/31/2008 19:42:53  C00A: Dlm Lock fd/id 170/P20010, returning: ok
> 01/31/2008 19:42:54  C00B: Dlm Lock fd/id 172/P30011, returning: ok
> 01/31/2008 19:42:54  C00C: Dlm Lock fd/id 174/P40012, returning: ok
> 01/31/2008 19:42:54  C00D: Dlm Lock fd/id 178/P50013, returning: ok
> 01/31/2008 19:42:55  C00E: Dlm Lock fd/id 180/P60014, returning: ok
> 01/31/2008 19:42:58  C00F: Dlm Lock fd/id 182/P70015, returning: ok
> 01/31/2008 19:45:29  C005: Dlm UnLock.  fd/id 160/P50005, returning ok
> 01/31/2008 19:45:43  C00A: Dlm UnLock.  fd/id 170/P20010, returning ok
> 01/31/2008 19:45:44  C002: Dlm UnLock.  fd/id 154/P20002, returning ok
> 01/31/2008 19:45:59  C006: Dlm UnLock.  fd/id 162/P60006, returning ok
> 01/31/2008 19:46:07  C00F: Dlm UnLock.  fd/id 182/P70015, returning ok
> 01/31/2008 19:46:08  C007: Dlm UnLock.  fd/id 164/P70007, returning ok
> 01/31/2008 19:46:10  C00D: Dlm UnLock.  fd/id 178/P50013, returning ok
> 01/31/2008 19:46:25  C00C: Dlm UnLock.  fd/id 174/P40012, returning ok
> 01/31/2008 19:46:30  C00E: Dlm UnLock.  fd/id 180/P60014, returning ok
> 01/31/2008 19:46:30  C004: Dlm UnLock.  fd/id 158/P40004, returning ok
> 01/31/2008 19:47:07  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
> 01/31/2008 19:47:08  C008: Dlm UnLock.  fd/id 166/P00008, returning ok
> 01/31/2008 19:47:09  C001: Dlm UnLock.  fd/id 152/P10001, returning ok
> 01/31/2008 19:47:19  C009: Dlm UnLock.  fd/id 168/P10009, returning ok
> 01/31/2008 19:47:30  C00B: Dlm UnLock.  fd/id 172/P30011, returning ok
> 01/31/2008 19:47:31  C003: Dlm UnLock.  fd/id 156/P30003, returning ok
> 01/31/2008 19:49:15  C000: Dlm Lock fd/id 150/P00000, returning: ok
> 01/31/2008 19:49:37  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
> 01/31/2008 19:49:40  C000: Dlm Lock fd/id 150/P00000, returning: Trylock failed
>
>
> I also had a problem with this system the day before this. Here is
> the data from that:
>
>   SYSTEM MAP: /boot/System.map-2.6.16.46-0.14.PTF.284042.0-smp
> DEBUG KERNEL: ../vmlinux.debug (2.6.16.46-0.14.PTF.284042.0-smp)
>     DUMPFILE: vmcore
>         CPUS: 4
>         DATE: Wed Jan 30 17:44:49 2008
>       UPTIME: 9 days, 00:55:51
> LOAD AVERAGE: 1.10, 1.06, 1.01
>        TASKS: 341
>     NODENAME: N1
>      RELEASE: 2.6.16.46-0.14.PTF.284042.0-smp
>      VERSION: #1 SMP Thu May 17 14:00:09 UTC 2007
>      MACHINE: i686  (2327 Mhz)
>       MEMORY: 2 GB
>        PANIC: "kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2780!"
>          PID: 31585
>      COMMAND: "masx"
>         TASK: dab912d0  [THREAD_INFO: d5840000]
>          CPU: 0
>        STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 31585  TASK: dab912d0  CPU: 0   COMMAND: "masx"
>  #0 [d5841d78] crash_kexec at c013bb1a
>  #1 [d5841dbc] die at c01055fe
>  #2 [d5841dec] do_invalid_op at c0105ce2
>  #3 [d5841e9c] error_code (via invalid_op) at c0104e4d
>     EAX: 00000051  EBX: ca668280  ECX: 00000000  EDX: 00000296  EBP: da0c1c00
>     DS:  007b      ESI: da0c1c00  ES:  007b      EDI: ca668280
>     CS:  0060      EIP: fb835e95  ERR: ffffffff  EFLAGS: 00010296
>  #4 [d5841ed0] dlm_empty_lockres at fb835e95
>  #5 [d5841ee0] dlm_unregister_domain at fb827305
>  #6 [d5841f18] dlmfs_clear_inode at fb6c2eae
>  #7 [d5841f24] clear_inode at c0175dfe
>  #8 [d5841f30] generic_delete_inode at c0175eee
>  #9 [d5841f3c] iput at c0175838
> #10 [d5841f48] dput at c01744e0
> #11 [d5841f54] do_rmdir at c016e63d
> #12 [d5841fb8] sysenter_entry at c0103bd4
>     EAX: 00000028  EBX: 08299988  ECX: 00000000  EDX: 08273be4
>     DS:  007b      ESI: 00000000  ES:  007b      EDI: 082ebf28
>     SS:  007b      ESP: bf999f7c  EBP: bf999fa8
>     CS:  0073      EIP: ffffe410  ERR: 00000028  EFLAGS: 00000246
>
>
> This is a Suse Sles10 SP1 system, with a suse nfs patch.
> Ocfs2 tools version 1.2.3-0.7
> Ocfs2 version  1.2.5-SLES-r2997
>
> I was hoping you would have some ideas on this.
>
> Also, another question. I have been trying to run one of the debugging
> scripts, for example, scanlocks. I keep getting the message 'Module
> debugfs not loaded'. I don't see any debugfs.ko on the system. Isn't it
> a part of the ocfs2 tools?
>
> Thank you,
>
> Charlie
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   



