[Ocfs2-users] Dlm question

Charlie Sharkey charlie.sharkey at bustech.com
Fri Feb 1 08:00:54 PST 2008

Hi,

I'm having some dlm issues on a system. The sequence of events appears
to have been:

  19:45:29 --> 19:47:31   16 dlm locks are released using o2dlm_unlock().
                          ocfs2 logs an error into /var/log/messages for
                          each one, but returns ok to the application.

  19:49:15                A dlm lock (o2dlm_lock()) is taken on P00000 -- ok.

  19:49:37                The lock on P00000 is released -- ok.

  19:49:40                A lock on P00000 is attempted and fails. The
                          returned error is "Trylock failed".
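For reference, here is a minimal sketch of the same lock / unlock / trylock sequence done directly against dlmfs with POSIX calls. It assumes dlmfs is mounted with a domain directory such as /dlm/mydomain (both the mount point and domain name are assumptions), and it follows my reading of the kernel's dlmfs documentation (Documentation/filesystems/dlmfs.txt): open() with O_CREAT takes the lock (O_RDWR for EX), O_NONBLOCK makes it a trylock, close() drops the lock, and unlink() destroys the lock resource. The helper names are hypothetical and this is not libo2dlm's code. The point is only that the unlink() result is checked instead of discarded, so an EBUSY like the ones in the log below would reach the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define DLMFS_PATH_MAX 4096

/* Hypothetical helpers. `dir` is a dlmfs domain directory (e.g. an
 * assumed /dlm/mydomain); `lockid` is a lock name such as "P00000". */

/* Take an EX lock (O_RDWR). With `trylock` set, O_NONBLOCK asks dlmfs
 * for a trylock instead of blocking. Returns an fd, or -errno. */
int dlmfs_lock_ex(const char *dir, const char *lockid, int trylock)
{
    char path[DLMFS_PATH_MAX];
    int flags = O_CREAT | O_RDWR | (trylock ? O_NONBLOCK : 0);
    int fd;

    if (snprintf(path, sizeof(path), "%s/%s", dir, lockid) >= (int)sizeof(path))
        return -ENAMETOOLONG;
    fd = open(path, flags, 0600);
    return fd < 0 ? -errno : fd;
}

/* Drop the lock (close), then destroy the lock resource (unlink).
 * An unlink failure such as -EBUSY is returned to the caller instead
 * of being reported as success. Returns 0, or -errno. */
int dlmfs_unlock_and_destroy(const char *dir, const char *lockid, int fd)
{
    char path[DLMFS_PATH_MAX];

    if (close(fd) < 0)
        return -errno;
    if (snprintf(path, sizeof(path), "%s/%s", dir, lockid) >= (int)sizeof(path))
        return -ENAMETOOLONG;
    return unlink(path) < 0 ? -errno : 0;
}
```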
  

Here is the data from /var/log/messages:

Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy
Jan 31 19:45:43 N1 kernel: (25038,1):dlmfs_unlink:512 ERROR: unlink P20010, error -16 from destroy
Jan 31 19:45:44 N1 kernel: (25030,1):dlmfs_unlink:512 ERROR: unlink P20002, error -16 from destroy
Jan 31 19:45:59 N1 kernel: (25034,3):dlmfs_unlink:512 ERROR: unlink P60006, error -16 from destroy
Jan 31 19:46:07 N1 kernel: (25043,0):dlmfs_unlink:512 ERROR: unlink P70015, error -16 from destroy
Jan 31 19:46:08 N1 kernel: (25035,0):dlmfs_unlink:512 ERROR: unlink P70007, error -16 from destroy
Jan 31 19:46:10 N1 kernel: (25041,1):dlmfs_unlink:512 ERROR: unlink P50013, error -16 from destroy
Jan 31 19:46:25 N1 kernel: (25040,1):dlmfs_unlink:512 ERROR: unlink P40012, error -16 from destroy
Jan 31 19:46:30 N1 kernel: (25042,1):dlmfs_unlink:512 ERROR: unlink P60014, error -16 from destroy
Jan 31 19:46:30 N1 kernel: (25032,1):dlmfs_unlink:512 ERROR: unlink P40004, error -16 from destroy
Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 ERROR: unlink P00000, error -16 from destroy
Jan 31 19:47:08 N1 kernel: (25036,1):dlmfs_unlink:512 ERROR: unlink P00008, error -16 from destroy
Jan 31 19:47:09 N1 kernel: (25029,1):dlmfs_unlink:512 ERROR: unlink P10001, error -16 from destroy
Jan 31 19:47:19 N1 kernel: (25037,0):dlmfs_unlink:512 ERROR: unlink P10009, error -16 from destroy
Jan 31 19:47:30 N1 kernel: (25039,1):dlmfs_unlink:512 ERROR: unlink P30011, error -16 from destroy
Jan 31 19:47:31 N1 kernel: (25031,1):dlmfs_unlink:512 ERROR: unlink P30003, error -16 from destroy
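The -16 in each of these messages is a negated errno value; on Linux, errno 16 is EBUSY ("Device or resource busy"), so dlmfs apparently found each lock resource still busy when it tried to destroy it during unlink. To line these messages up against the application log that follows, a small hypothetical C helper (not part of ocfs2-tools) can pull out the lock name and error code:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the lock name and the (negative) errno
 * from a kernel line of the form
 *   "... dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy"
 * `lockid` must have room for 32 bytes. Returns 1 on a match, else 0. */
int parse_dlmfs_unlink_error(const char *line, char lockid[32], int *err)
{
    const char *p = strstr(line, "ERROR: unlink ");

    if (p == NULL)
        return 0;
    return sscanf(p, "ERROR: unlink %31[^,], error %d", lockid, err) == 2;
}

/* True when the logged value is -EBUSY (errno 16 on Linux). */
int is_ebusy(int err)
{
    return err == -EBUSY;
}
```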


Here is the data from the application's dlm log file:

01/31/2008 19:42:50  C000: Dlm Lock fd/id 150/P00000, returning: ok
01/31/2008 19:42:50  C001: Dlm Lock fd/id 152/P10001, returning: ok
01/31/2008 19:42:50  C002: Dlm Lock fd/id 154/P20002, returning: ok
01/31/2008 19:42:51  C003: Dlm Lock fd/id 156/P30003, returning: ok
01/31/2008 19:42:51  C004: Dlm Lock fd/id 158/P40004, returning: ok
01/31/2008 19:42:52  C005: Dlm Lock fd/id 160/P50005, returning: ok
01/31/2008 19:42:52  C006: Dlm Lock fd/id 162/P60006, returning: ok
01/31/2008 19:42:52  C007: Dlm Lock fd/id 164/P70007, returning: ok
01/31/2008 19:42:53  C008: Dlm Lock fd/id 166/P00008, returning: ok
01/31/2008 19:42:53  C009: Dlm Lock fd/id 168/P10009, returning: ok
01/31/2008 19:42:53  C00A: Dlm Lock fd/id 170/P20010, returning: ok
01/31/2008 19:42:54  C00B: Dlm Lock fd/id 172/P30011, returning: ok
01/31/2008 19:42:54  C00C: Dlm Lock fd/id 174/P40012, returning: ok
01/31/2008 19:42:54  C00D: Dlm Lock fd/id 178/P50013, returning: ok
01/31/2008 19:42:55  C00E: Dlm Lock fd/id 180/P60014, returning: ok
01/31/2008 19:42:58  C00F: Dlm Lock fd/id 182/P70015, returning: ok
01/31/2008 19:45:29  C005: Dlm UnLock.  fd/id 160/P50005, returning ok
01/31/2008 19:45:43  C00A: Dlm UnLock.  fd/id 170/P20010, returning ok
01/31/2008 19:45:44  C002: Dlm UnLock.  fd/id 154/P20002, returning ok
01/31/2008 19:45:59  C006: Dlm UnLock.  fd/id 162/P60006, returning ok
01/31/2008 19:46:07  C00F: Dlm UnLock.  fd/id 182/P70015, returning ok
01/31/2008 19:46:08  C007: Dlm UnLock.  fd/id 164/P70007, returning ok
01/31/2008 19:46:10  C00D: Dlm UnLock.  fd/id 178/P50013, returning ok
01/31/2008 19:46:25  C00C: Dlm UnLock.  fd/id 174/P40012, returning ok
01/31/2008 19:46:30  C00E: Dlm UnLock.  fd/id 180/P60014, returning ok
01/31/2008 19:46:30  C004: Dlm UnLock.  fd/id 158/P40004, returning ok
01/31/2008 19:47:07  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
01/31/2008 19:47:08  C008: Dlm UnLock.  fd/id 166/P00008, returning ok
01/31/2008 19:47:09  C001: Dlm UnLock.  fd/id 152/P10001, returning ok
01/31/2008 19:47:19  C009: Dlm UnLock.  fd/id 168/P10009, returning ok
01/31/2008 19:47:30  C00B: Dlm UnLock.  fd/id 172/P30011, returning ok
01/31/2008 19:47:31  C003: Dlm UnLock.  fd/id 156/P30003, returning ok
01/31/2008 19:49:15  C000: Dlm Lock fd/id 150/P00000, returning: ok
01/31/2008 19:49:37  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
01/31/2008 19:49:40  C000: Dlm Lock fd/id 150/P00000, returning: Trylock failed


I also had a problem with this system the day before. Here is the
data from that:

  SYSTEM MAP: /boot/System.map-2.6.16.46-0.14.PTF.284042.0-smp
DEBUG KERNEL: ../vmlinux.debug (2.6.16.46-0.14.PTF.284042.0-smp)
    DUMPFILE: vmcore
        CPUS: 4
        DATE: Wed Jan 30 17:44:49 2008
      UPTIME: 9 days, 00:55:51
LOAD AVERAGE: 1.10, 1.06, 1.01
       TASKS: 341
    NODENAME: N1
     RELEASE: 2.6.16.46-0.14.PTF.284042.0-smp
     VERSION: #1 SMP Thu May 17 14:00:09 UTC 2007
     MACHINE: i686  (2327 Mhz)
      MEMORY: 2 GB
       PANIC: "kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2780!"
         PID: 31585
     COMMAND: "masx"
        TASK: dab912d0  [THREAD_INFO: d5840000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 31585  TASK: dab912d0  CPU: 0   COMMAND: "masx"
 #0 [d5841d78] crash_kexec at c013bb1a
 #1 [d5841dbc] die at c01055fe
 #2 [d5841dec] do_invalid_op at c0105ce2
 #3 [d5841e9c] error_code (via invalid_op) at c0104e4d
    EAX: 00000051  EBX: ca668280  ECX: 00000000  EDX: 00000296  EBP: da0c1c00
    DS:  007b      ESI: da0c1c00  ES:  007b      EDI: ca668280
    CS:  0060      EIP: fb835e95  ERR: ffffffff  EFLAGS: 00010296
 #4 [d5841ed0] dlm_empty_lockres at fb835e95
 #5 [d5841ee0] dlm_unregister_domain at fb827305
 #6 [d5841f18] dlmfs_clear_inode at fb6c2eae
 #7 [d5841f24] clear_inode at c0175dfe
 #8 [d5841f30] generic_delete_inode at c0175eee
 #9 [d5841f3c] iput at c0175838
#10 [d5841f48] dput at c01744e0
#11 [d5841f54] do_rmdir at c016e63d
#12 [d5841fb8] sysenter_entry at c0103bd4
    EAX: 00000028  EBX: 08299988  ECX: 00000000  EDX: 08273be4
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 082ebf28
    SS:  007b      ESP: bf999f7c  EBP: bf999fa8
    CS:  0073      EIP: ffffe410  ERR: 00000028  EFLAGS: 00000246
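The backtrace shows the panic was reached from rmdir(2) on a dlmfs directory (do_rmdir, dlmfs_clear_inode, dlm_unregister_domain, dlm_empty_lockres), which as I understand it is how dlmfs unregisters a domain. Whether or not this is related to the EBUSY unlinks above, one defensive teardown ordering is sketched below under the same assumptions as the earlier sketch (hypothetical helper name, caller-supplied domain directory): unlink every lock file first, and rmdir the domain only if all of those unlinks succeeded. This is not a fix for the BUG at dlmmaster.c:2780; it only avoids tearing the domain down while lock resources are known to be busy.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical teardown helper for a dlmfs domain directory. Every lock
 * file is unlinked first; the domain directory is removed (which, on
 * dlmfs, unregisters the domain) only if all unlinks succeeded.
 * Returns 0, or the first -errno seen. */
int dlmfs_teardown_domain(const char *dir, const char *const lockids[], size_t n)
{
    char path[4096];
    int first_err = 0;

    for (size_t i = 0; i < n; i++) {
        if (snprintf(path, sizeof(path), "%s/%s", dir, lockids[i]) >= (int)sizeof(path)) {
            if (first_err == 0)
                first_err = -ENAMETOOLONG;
            continue;
        }
        if (unlink(path) < 0) {
            int err = -errno;

            /* Log every failure, but remember only the first one. */
            fprintf(stderr, "unlink %s: error %d\n", path, err);
            if (first_err == 0)
                first_err = err;
        }
    }
    if (first_err != 0)
        return first_err;  /* leave the domain registered */
    return rmdir(dir) < 0 ? -errno : 0;
}
```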


This is a SUSE SLES 10 SP1 system with a SUSE NFS patch.
ocfs2-tools version 1.2.3-0.7
OCFS2 version 1.2.5-SLES-r2997

I was hoping you would have some ideas on this.

Also, another question: I have been trying to run one of the debugging
scripts, for example scanlocks, and I keep getting the message 'Module
debugfs not loaded'. I don't see any debugfs.ko on the system. Isn't it
part of the ocfs2 tools?
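Since the message refers to a module, one quick check is whether the running kernel has the debugfs filesystem type registered at all, either built in or loaded as a module. /proc/filesystems lists every registered type, one per line, as an optional "nodev" tag, a tab, and the type name. A minimal sketch of that check (the helper name is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Returns 1 if filesystem type `fstype` appears in a
 * /proc/filesystems-style file at `path`, 0 if not, and -1 if the
 * file cannot be opened. Each line is an optional "nodev" tag, a tab,
 * and the type name. */
int fs_type_registered(const char *path, const char *fstype)
{
    char line[256];
    FILE *f = fopen(path, "r");
    int found = 0;

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof(line), f) != NULL) {
        char *name = strrchr(line, '\t');

        name = name ? name + 1 : line;
        name[strcspn(name, "\n")] = '\0';  /* strip the newline */
        if (strcmp(name, fstype) == 0)
            found = 1;
    }
    fclose(f);
    return found;
}
```

Called as fs_type_registered("/proc/filesystems", "debugfs") on the affected node, a result of 0 means the running kernel does not currently have debugfs registered.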

Thank you,

Charlie