[Ocfs2-users] Dlm question

Sunil Mushran Sunil.Mushran at oracle.com
Wed Feb 13 13:37:25 PST 2008


Yes. It shows node 0 not only as the owner (or master) but also
as holding an EX lock.

struct dlm_ctxt: mas, node=0, key=4125434387
lockres: PAF1A9, owner=0, state=0
last used: 20693697, on purge list: no
refmap nodes: [ ], inflight=0
 granted queue:
   type=5, conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
 converting queue:
 blocked queue:
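
For anyone else decoding such a dump, here is a minimal Python sketch that
pulls the fields out of a granted-queue line and names the lock mode. The
mode table is my reading of the o2dlm LKM_* constants and is an assumption;
only type=5 meaning EX is confirmed above. The function name and regular
expression are hypothetical and based only on the line format shown here.

```python
import re

# Assumed mapping of o2dlm lock mode numbers (LKM_* constants) to names.
# Only 5 == EX is confirmed by the dump discussed in this thread.
LOCK_MODES = {-1: "IV", 0: "NL", 1: "CR", 2: "CW", 3: "PR", 4: "PW", 5: "EX"}

# Matches the per-lock line printed under the granted/converting/blocked queues.
LOCK_RE = re.compile(
    r"type=(-?\d+), conv=(-?\d+), node=(\d+), cookie=(\d+:\d+)"
)

def decode_lock_line(line):
    """Return a dict describing one lock line, or None if it does not match."""
    m = LOCK_RE.search(line)
    if m is None:
        return None
    lock_type, conv, node = int(m.group(1)), int(m.group(2)), int(m.group(3))
    return {
        "mode": LOCK_MODES.get(lock_type, "unknown"),
        # conv=-1 (IV) means no conversion is in progress.
        "converting_to": LOCK_MODES.get(conv, "unknown"),
        "node": node,
        "cookie": m.group(4),
    }

if __name__ == "__main__":
    sample = ("type=5, conv=-1, node=0, cookie=0:35003811, "
              "ast=(empty=y,pend=n), bast=(empty=y,pend=n)")
    print(decode_lock_line(sample))
```

A lock on the granted queue with mode EX and an empty converting queue is
a lock that is currently held, which is what the dump above shows for node 0.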

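The dlmfs_unlink errors quoted further down all report error -16. As a
rough sketch, the lock names and errno names can be extracted from such
lines like this. The regular expression is an assumption based only on the
sample lines in this thread, and parse_unlink_errors is a hypothetical
helper name.

```python
import errno
import re

# Pattern assumed from the dlmfs_unlink lines quoted in this thread.
UNLINK_RE = re.compile(
    r"dlmfs_unlink:\d+ ERROR: unlink ([^,\s]+), error (-?\d+) from"
)

def parse_unlink_errors(text):
    """Return (lock_name, errno_name) pairs for each dlmfs_unlink error."""
    results = []
    for m in UNLINK_RE.finditer(text):
        # The kernel logs the negative errno value; look up its symbolic name.
        code = abs(int(m.group(2)))
        results.append((m.group(1), errno.errorcode.get(code, "unknown")))
    return results

if __name__ == "__main__":
    sample = ("Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 "
              "ERROR: unlink P00000, error -16 from destroy")
    print(parse_unlink_errors(sample))
```

On Linux, errno 16 is EBUSY, so each pair names a lock that still had
holders when the unlink was attempted.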

Charlie Sharkey wrote:
> I have upgraded to ocfs2 1.2.8 and am getting the same lock problem.
> Here are the /var/log/messages entries from:
>   echo R mas PAF1A9 > /proc/fs/ocfs2_dlm/debug
> I'm not sure how to decode this. Is this lock still held?
>
>
> N1 kernel: (13416,1):dlm_dump_one_lock_resource:259 struct dlm_ctxt: mas, node=0, key=4125434387
> N1 kernel: (13416,1):dlm_print_one_lock_resource:294 lockres: PAF1A9, owner=0, state=0
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:309 lockres: PAF1A9, owner=0, state=0
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:311   last used: 20693697, on purge list: no
> N1 kernel: (13416,1):dlm_print_lockres_refmap:277   refmap nodes: [ ], inflight=0
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:313   granted queue:
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:325     type=5, conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:328   converting queue:
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:343   blocked queue:
>
> Thank you,
>
> charlie
>
>
> -----Original Message-----
> From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com] 
> Sent: Friday, February 01, 2008 2:29 PM
> To: Charlie Sharkey
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] Dlm question
>
> There are 3 issues. I'll address them in the reverse order.
>
> debugfs, not to be confused with debugfs.ocfs2, is a kernel component.
> It used to be shipped with the ocfs2 kernel module package as
> RHEL4/SLES9 did not bundle it.
>
> RHEL5/SLES10 build/ship it as part of the kernel (not as a module) and
> hence the scanlocks check fails. The solution is to comment out the section
> that checks whether it is loaded. (The section is commented with "#is
> debugfs loaded?")
>
> The second issue is the oops. File a bugzilla with NOVELL.
> We will handle this via them as we need to see what patches your
> kernel/ocfs2 has.
>
> The first issue indicates that the lock is busy. (-16 is EBUSY).
> Meaning there are holders. As the locks are files, you can use fuser to
> see which pid is using it. If you want to see the state of the lock, you
> will have to dump it via the dlm proc interface.
> # echo R domain lock >/proc/fs/ocfs2_dlm/debug
> Here domain will be the directory in /dlm and lock the file in it.
> The state will be dumped in /var/log/messages.
>
> No, scanlocks cannot dump dlmfs locks.
>
> Sunil
>
> Charlie Sharkey wrote:
>   
>>  
>>
>> Hi,
>>
>> I'm having some dlm issues on a system. It looks like the scenario 
>> went something like:
>>
>>   19:45:29 --> 19:47:31   16 dlm locks are released using o2dlm_unlock().
>>                           ocfs2 logs an error into /var/log/messages, but
>>                           returns ok to the application
>>
>>   19:49:15                a dlm lock (o2dlm_lock()) is put on P00000 -- ok
>>
>>   19:49:37                lock on P00000 is released -- ok
>>
>>   19:49:40                a lock is attempted on P00000, and the lock fails.
>>                           Returned error is "Trylock failed"
>>   
>>
>> Here is the data from /var/log/messages:
>>
>> Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy
>> Jan 31 19:45:43 N1 kernel: (25038,1):dlmfs_unlink:512 ERROR: unlink P20010, error -16 from destroy
>> Jan 31 19:45:44 N1 kernel: (25030,1):dlmfs_unlink:512 ERROR: unlink P20002, error -16 from destroy
>> Jan 31 19:45:59 N1 kernel: (25034,3):dlmfs_unlink:512 ERROR: unlink P60006, error -16 from destroy
>> Jan 31 19:46:07 N1 kernel: (25043,0):dlmfs_unlink:512 ERROR: unlink P70015, error -16 from destroy
>> Jan 31 19:46:08 N1 kernel: (25035,0):dlmfs_unlink:512 ERROR: unlink P70007, error -16 from destroy
>> Jan 31 19:46:10 N1 kernel: (25041,1):dlmfs_unlink:512 ERROR: unlink P50013, error -16 from destroy
>> Jan 31 19:46:25 N1 kernel: (25040,1):dlmfs_unlink:512 ERROR: unlink P40012, error -16 from destroy
>> Jan 31 19:46:30 N1 kernel: (25042,1):dlmfs_unlink:512 ERROR: unlink P60014, error -16 from destroy
>> Jan 31 19:46:30 N1 kernel: (25032,1):dlmfs_unlink:512 ERROR: unlink P40004, error -16 from destroy
>> Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 ERROR: unlink P00000, error -16 from destroy
>> Jan 31 19:47:08 N1 kernel: (25036,1):dlmfs_unlink:512 ERROR: unlink P00008, error -16 from destroy
>> Jan 31 19:47:09 N1 kernel: (25029,1):dlmfs_unlink:512 ERROR: unlink P10001, error -16 from destroy
>> Jan 31 19:47:19 N1 kernel: (25037,0):dlmfs_unlink:512 ERROR: unlink P10009, error -16 from destroy
>> Jan 31 19:47:30 N1 kernel: (25039,1):dlmfs_unlink:512 ERROR: unlink P30011, error -16 from destroy
>> Jan 31 19:47:31 N1 kernel: (25031,1):dlmfs_unlink:512 ERROR: unlink P30003, error -16 from destroy
>>
>>
>> Here is data from the application dlm log file
>>
>> 01/31/2008 19:42:50  C000: Dlm Lock fd/id 150/P00000, returning: ok
>> 01/31/2008 19:42:50  C001: Dlm Lock fd/id 152/P10001, returning: ok
>> 01/31/2008 19:42:50  C002: Dlm Lock fd/id 154/P20002, returning: ok
>> 01/31/2008 19:42:51  C003: Dlm Lock fd/id 156/P30003, returning: ok
>> 01/31/2008 19:42:51  C004: Dlm Lock fd/id 158/P40004, returning: ok
>> 01/31/2008 19:42:52  C005: Dlm Lock fd/id 160/P50005, returning: ok
>> 01/31/2008 19:42:52  C006: Dlm Lock fd/id 162/P60006, returning: ok
>> 01/31/2008 19:42:52  C007: Dlm Lock fd/id 164/P70007, returning: ok
>> 01/31/2008 19:42:53  C008: Dlm Lock fd/id 166/P00008, returning: ok
>> 01/31/2008 19:42:53  C009: Dlm Lock fd/id 168/P10009, returning: ok
>> 01/31/2008 19:42:53  C00A: Dlm Lock fd/id 170/P20010, returning: ok
>> 01/31/2008 19:42:54  C00B: Dlm Lock fd/id 172/P30011, returning: ok
>> 01/31/2008 19:42:54  C00C: Dlm Lock fd/id 174/P40012, returning: ok
>> 01/31/2008 19:42:54  C00D: Dlm Lock fd/id 178/P50013, returning: ok
>> 01/31/2008 19:42:55  C00E: Dlm Lock fd/id 180/P60014, returning: ok
>> 01/31/2008 19:42:58  C00F: Dlm Lock fd/id 182/P70015, returning: ok
>> 01/31/2008 19:45:29  C005: Dlm UnLock.  fd/id 160/P50005, returning ok
>> 01/31/2008 19:45:43  C00A: Dlm UnLock.  fd/id 170/P20010, returning ok
>> 01/31/2008 19:45:44  C002: Dlm UnLock.  fd/id 154/P20002, returning ok
>> 01/31/2008 19:45:59  C006: Dlm UnLock.  fd/id 162/P60006, returning ok
>> 01/31/2008 19:46:07  C00F: Dlm UnLock.  fd/id 182/P70015, returning ok
>> 01/31/2008 19:46:08  C007: Dlm UnLock.  fd/id 164/P70007, returning ok
>> 01/31/2008 19:46:10  C00D: Dlm UnLock.  fd/id 178/P50013, returning ok
>> 01/31/2008 19:46:25  C00C: Dlm UnLock.  fd/id 174/P40012, returning ok
>> 01/31/2008 19:46:30  C00E: Dlm UnLock.  fd/id 180/P60014, returning ok
>> 01/31/2008 19:46:30  C004: Dlm UnLock.  fd/id 158/P40004, returning ok
>> 01/31/2008 19:47:07  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
>> 01/31/2008 19:47:08  C008: Dlm UnLock.  fd/id 166/P00008, returning ok
>> 01/31/2008 19:47:09  C001: Dlm UnLock.  fd/id 152/P10001, returning ok
>> 01/31/2008 19:47:19  C009: Dlm UnLock.  fd/id 168/P10009, returning ok
>> 01/31/2008 19:47:30  C00B: Dlm UnLock.  fd/id 172/P30011, returning ok
>> 01/31/2008 19:47:31  C003: Dlm UnLock.  fd/id 156/P30003, returning ok
>> 01/31/2008 19:49:15  C000: Dlm Lock fd/id 150/P00000, returning: ok
>> 01/31/2008 19:49:37  C000: Dlm UnLock.  fd/id 150/P00000, returning ok
>> 01/31/2008 19:49:40  C000: Dlm Lock fd/id 150/P00000, returning: Trylock failed
>>
>>
>> I also had a problem with this system the day before this. Here is the
>> data from that:
>>
>>   SYSTEM MAP: /boot/System.map-2.6.16.46-0.14.PTF.284042.0-smp
>> DEBUG KERNEL: ../vmlinux.debug (2.6.16.46-0.14.PTF.284042.0-smp)
>>     DUMPFILE: vmcore
>>         CPUS: 4
>>         DATE: Wed Jan 30 17:44:49 2008
>>       UPTIME: 9 days, 00:55:51
>> LOAD AVERAGE: 1.10, 1.06, 1.01
>>        TASKS: 341
>>     NODENAME: N1
>>      RELEASE: 2.6.16.46-0.14.PTF.284042.0-smp
>>      VERSION: #1 SMP Thu May 17 14:00:09 UTC 2007
>>      MACHINE: i686  (2327 Mhz)
>>       MEMORY: 2 GB
>>        PANIC: "kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2780!"
>>          PID: 31585
>>      COMMAND: "masx"
>>         TASK: dab912d0  [THREAD_INFO: d5840000]
>>          CPU: 0
>>        STATE: TASK_RUNNING (PANIC)
>>
>> crash> bt
>> PID: 31585  TASK: dab912d0  CPU: 0   COMMAND: "masx"
>>  #0 [d5841d78] crash_kexec at c013bb1a
>>  #1 [d5841dbc] die at c01055fe
>>  #2 [d5841dec] do_invalid_op at c0105ce2
>>  #3 [d5841e9c] error_code (via invalid_op) at c0104e4d
>>     EAX: 00000051  EBX: ca668280  ECX: 00000000  EDX: 00000296  EBP: da0c1c00
>>     DS:  007b      ESI: da0c1c00  ES:  007b      EDI: ca668280
>>     CS:  0060      EIP: fb835e95  ERR: ffffffff  EFLAGS: 00010296
>>  #4 [d5841ed0] dlm_empty_lockres at fb835e95
>>  #5 [d5841ee0] dlm_unregister_domain at fb827305
>>  #6 [d5841f18] dlmfs_clear_inode at fb6c2eae
>>  #7 [d5841f24] clear_inode at c0175dfe
>>  #8 [d5841f30] generic_delete_inode at c0175eee
>>  #9 [d5841f3c] iput at c0175838
>> #10 [d5841f48] dput at c01744e0
>> #11 [d5841f54] do_rmdir at c016e63d
>> #12 [d5841fb8] sysenter_entry at c0103bd4
>>     EAX: 00000028  EBX: 08299988  ECX: 00000000  EDX: 08273be4
>>     DS:  007b      ESI: 00000000  ES:  007b      EDI: 082ebf28
>>     SS:  007b      ESP: bf999f7c  EBP: bf999fa8
>>     CS:  0073      EIP: ffffe410  ERR: 00000028  EFLAGS: 00000246
>>
>>
>> This is a Suse Sles10 SP1 system, with a suse nfs patch.
>> Ocfs2 tools version 1.2.3-0.7
>> Ocfs2 version  1.2.5-SLES-r2997
>>
>> I was hoping you would have some ideas on this.
>>
>> Also, another question. I have been trying to run one of the debugging
>> scripts, for example, scanlocks. I keep getting the message 'Module
>> debugfs not loaded'. I don't see any debugfs.ko on the system. Isn't
>> it a part of the ocfs2 tools?
>>
>> Thank you,
>>
>> Charlie