[Ocfs2-users] Dlm question
Sunil Mushran
Sunil.Mushran at oracle.com
Wed Feb 13 13:37:25 PST 2008
Yes. The dump shows node 0 not only as the owner (or master) but also
holding an EX lock.
struct dlm_ctxt: mas, node=0, key=4125434387
lockres: PAF1A9, owner=0, state=0
last used: 20693697, on purge list: no
refmap nodes: [ ], inflight=0
granted queue:
type=5, conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
converting queue:
blocked queue:
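How to read the granted-queue entry above: type is the granted mode, and conv is the mode of any pending conversion, with -1 meaning none. The converting and blocked queues are empty, so no other request is waiting on this resource. The Python sketch below decodes such a line. It assumes the o2dlm mode numbering from dlmapi.h (NL=0, PR=3, EX=5) and that the cookie prints as node:sequence; decode_lock_entry is a hypothetical helper, not part of the ocfs2 tools.

```python
import re

# Assumed o2dlm lock-mode numbers (LKM_*MODE in dlmapi.h).
# ocfs2 itself only uses NL, PR and EX.
LOCK_MODES = {0: "NL", 1: "CR", 2: "CW", 3: "PR", 4: "PW", 5: "EX"}

# Matches e.g. "type=5, conv=-1, node=0, cookie=0:35003811"
ENTRY_RE = re.compile(
    r"type=(?P<type>-?\d+),\s*conv=(?P<conv>-?\d+),\s*node=(?P<node>\d+),"
    r"\s*cookie=(?P<cnode>\d+):(?P<cseq>\d+)"
)


def decode_lock_entry(line: str) -> dict:
    """Decode one queue entry from a dlm lockres dump (hypothetical helper)."""
    m = ENTRY_RE.search(line)
    if m is None:
        raise ValueError(f"not a lock queue entry: {line!r}")
    conv = int(m["conv"])
    return {
        "mode": LOCK_MODES.get(int(m["type"]), "unknown"),
        # conv=-1 means no conversion is pending
        "pending_conversion": None if conv == -1 else LOCK_MODES.get(conv, "unknown"),
        "holder_node": int(m["node"]),
        # assumed: the cookie is printed as <node>:<sequence>
        "cookie_node": int(m["cnode"]),
        "cookie_seq": int(m["cseq"]),
    }
```

Applied to the entry in this dump, it reports an EX lock held by node 0 with no conversion pending, which matches the reading above.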
Charlie Sharkey wrote:
> I have upgraded to ocfs2 1.2.8 and am getting the same lock problem.
> Here are the /var/log/messages entries from:
> echo R mas PAF1A9 > /proc/fs/ocfs2_dlm/debug
>
> I'm not sure how to decode this. Is this lock still held?
>
>
> N1 kernel: (13416,1):dlm_dump_one_lock_resource:259 struct dlm_ctxt: mas, node=0, key=4125434387
> N1 kernel: (13416,1):dlm_print_one_lock_resource:294 lockres: PAF1A9, owner=0, state=0
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:309 lockres: PAF1A9, owner=0, state=0
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:311 last used: 20693697, on purge list: no
> N1 kernel: (13416,1):dlm_print_lockres_refmap:277 refmap nodes: [ ], inflight=0
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:313 granted queue:
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:325 type=5, conv=-1, node=0, cookie=0:35003811, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:328 converting queue:
> N1 kernel: (13416,1):__dlm_print_one_lock_resource:343 blocked queue:
>
> Thank you,
>
> charlie
>
>
> -----Original Message-----
> From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com]
> Sent: Friday, February 01, 2008 2:29 PM
> To: Charlie Sharkey
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] Dlm question
>
> There are 3 issues. I'll address them in the reverse order.
>
> debugfs, not to be confused with debugfs.ocfs2, is a kernel component.
> It used to be shipped with the ocfs2 kernel module package as
> RHEL4/SLES9 did not bundle it.
>
> RHEL5/SLES10 build and ship it as part of the kernel (not as a module),
> hence the scanlocks check fails. The solution is to comment out the
> section that checks whether it is loaded (the section commented with
> "#is debugfs loaded?").
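Instead of deleting the check outright, a test that works whether debugfs is built in or modular could look for it in /proc/filesystems, which lists every registered filesystem type either way. The following is a minimal Python sketch under that assumption; it is not the code scanlocks actually ships, and debugfs_available is a hypothetical name.

```python
def debugfs_available(proc_filesystems: str) -> bool:
    """Return True if the text of /proc/filesystems lists debugfs.

    Each line is an optional "nodev" flag followed by a filesystem type.
    The type appears whether the filesystem is built into the kernel
    (RHEL5/SLES10) or loaded as a module (RHEL4/SLES9).
    """
    for line in proc_filesystems.splitlines():
        fields = line.split()
        if fields and fields[-1] == "debugfs":
            return True
    return False
```

Usage would be along the lines of debugfs_available(open("/proc/filesystems").read()). This only confirms kernel support; debugfs still has to be mounted before anything can be read from it.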
>
> The second issue is the oops. File a bugzilla with Novell.
> We will handle this via them as we need to see what patches your
> kernel/ocfs2 has.
>
> The first issue indicates that the lock is busy (-16 is EBUSY),
> meaning there are holders. As the locks are files, you can use fuser to
> see which pid is using it. If you want to see the state of the lock, you
> will have to dump it via the dlm proc interface:
> # echo R domain lock >/proc/fs/ocfs2_dlm/debug
> Here domain is the directory in /dlm and lock is the file in it.
> The state will be dumped in /var/log/messages.
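The same two steps in Python, as a minimal sketch: decode a negative kernel error with the standard errno table, and write the R command to the dlm debug file. Writing needs root, and the dump goes to the kernel log rather than back to the caller. The function names are hypothetical; the debug path and the "R domain lock" syntax come from the message above. To find the holders themselves, fuser -v /dlm/<domain>/<lock> lists the processes that have the lock file open.

```python
import errno

# Path taken from the message above; writing to it requires root.
DLM_DEBUG = "/proc/fs/ocfs2_dlm/debug"


def errno_name(code: int) -> str:
    """Map a kernel-style negative error (e.g. -16) to its errno name."""
    return errno.errorcode.get(abs(code), "unknown")


def dump_lockres(domain: str, lock: str, debug_path: str = DLM_DEBUG) -> None:
    """Ask the dlm to print one lock resource to the kernel log.

    domain is the directory under /dlm and lock is the file in it.
    Equivalent to: echo R <domain> <lock> > /proc/fs/ocfs2_dlm/debug
    The dump appears in /var/log/messages, not in this process.
    """
    with open(debug_path, "w") as f:
        f.write(f"R {domain} {lock}\n")
```

For the resource in this thread the call would be dump_lockres("mas", "PAF1A9").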
>
> No, scanlocks cannot dump dlmfs locks.
>
> Sunil
>
> Charlie Sharkey wrote:
>
>>
>>
>> Hi,
>>
>> I'm having some dlm issues on a system. It looks like the scenario
>> went something like:
>>
>> 19:45:29 --> 19:47:31  16 dlm locks are released using o2dlm_unlock().
>>                         ocfs2 logs an error into /var/log/messages, but
>>                         returns ok to the application.
>>
>> 19:49:15                a dlm lock (o2dlm_lock()) is put on P00000 -- ok
>>
>> 19:49:37                the lock on P00000 is released -- ok
>>
>> 19:49:40                a lock is attempted on P00000 and fails. The
>>                         returned error is "Trylock failed".
>>
>>
>> Here is the data from /var/log/messages:
>>
>> Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink P50005, error -16 from destroy
>> Jan 31 19:45:43 N1 kernel: (25038,1):dlmfs_unlink:512 ERROR: unlink P20010, error -16 from destroy
>> Jan 31 19:45:44 N1 kernel: (25030,1):dlmfs_unlink:512 ERROR: unlink P20002, error -16 from destroy
>> Jan 31 19:45:59 N1 kernel: (25034,3):dlmfs_unlink:512 ERROR: unlink P60006, error -16 from destroy
>> Jan 31 19:46:07 N1 kernel: (25043,0):dlmfs_unlink:512 ERROR: unlink P70015, error -16 from destroy
>> Jan 31 19:46:08 N1 kernel: (25035,0):dlmfs_unlink:512 ERROR: unlink P70007, error -16 from destroy
>> Jan 31 19:46:10 N1 kernel: (25041,1):dlmfs_unlink:512 ERROR: unlink P50013, error -16 from destroy
>> Jan 31 19:46:25 N1 kernel: (25040,1):dlmfs_unlink:512 ERROR: unlink P40012, error -16 from destroy
>> Jan 31 19:46:30 N1 kernel: (25042,1):dlmfs_unlink:512 ERROR: unlink P60014, error -16 from destroy
>> Jan 31 19:46:30 N1 kernel: (25032,1):dlmfs_unlink:512 ERROR: unlink P40004, error -16 from destroy
>> Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 ERROR: unlink P00000, error -16 from destroy
>> Jan 31 19:47:08 N1 kernel: (25036,1):dlmfs_unlink:512 ERROR: unlink P00008, error -16 from destroy
>> Jan 31 19:47:09 N1 kernel: (25029,1):dlmfs_unlink:512 ERROR: unlink P10001, error -16 from destroy
>> Jan 31 19:47:19 N1 kernel: (25037,0):dlmfs_unlink:512 ERROR: unlink P10009, error -16 from destroy
>> Jan 31 19:47:30 N1 kernel: (25039,1):dlmfs_unlink:512 ERROR: unlink P30011, error -16 from destroy
>> Jan 31 19:47:31 N1 kernel: (25031,1):dlmfs_unlink:512 ERROR: unlink P30003, error -16 from destroy
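All 16 unlinks failed with -16 (EBUSY), and each timestamp matches an "UnLock ... returning ok" line in the application log that follows. To pull the affected lock names out of a longer log, a small Python filter like the one below may help. It assumes one syslog entry per line in exactly the format shown; busy_unlinks is a hypothetical helper.

```python
import re

# One "dlmfs_unlink ... error N from destroy" syslog line, as shown above.
UNLINK_ERR = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ kernel: \((?P<pid>\d+),\d+\):"
    r"dlmfs_unlink:\d+ ERROR: unlink (?P<lock>[^,\s]+), "
    r"error (?P<err>-\d+) from destroy"
)

EBUSY = -16


def busy_unlinks(log_text: str) -> list:
    """Return (timestamp, pid, lock name) for each dlmfs unlink that hit EBUSY."""
    hits = []
    for line in log_text.splitlines():
        m = UNLINK_ERR.search(line)
        if m and int(m["err"]) == EBUSY:
            hits.append((m["ts"], int(m["pid"]), m["lock"]))
    return hits
```

The pid in each tuple is the process that called unlink, which can then be matched against the application's own records.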
>>
>>
>> Here is the data from the application's dlm log file:
>>
>> 01/31/2008 19:42:50 C000: Dlm Lock fd/id 150/P00000, returning: ok
>> 01/31/2008 19:42:50 C001: Dlm Lock fd/id 152/P10001, returning: ok
>> 01/31/2008 19:42:50 C002: Dlm Lock fd/id 154/P20002, returning: ok
>> 01/31/2008 19:42:51 C003: Dlm Lock fd/id 156/P30003, returning: ok
>> 01/31/2008 19:42:51 C004: Dlm Lock fd/id 158/P40004, returning: ok
>> 01/31/2008 19:42:52 C005: Dlm Lock fd/id 160/P50005, returning: ok
>> 01/31/2008 19:42:52 C006: Dlm Lock fd/id 162/P60006, returning: ok
>> 01/31/2008 19:42:52 C007: Dlm Lock fd/id 164/P70007, returning: ok
>> 01/31/2008 19:42:53 C008: Dlm Lock fd/id 166/P00008, returning: ok
>> 01/31/2008 19:42:53 C009: Dlm Lock fd/id 168/P10009, returning: ok
>> 01/31/2008 19:42:53 C00A: Dlm Lock fd/id 170/P20010, returning: ok
>> 01/31/2008 19:42:54 C00B: Dlm Lock fd/id 172/P30011, returning: ok
>> 01/31/2008 19:42:54 C00C: Dlm Lock fd/id 174/P40012, returning: ok
>> 01/31/2008 19:42:54 C00D: Dlm Lock fd/id 178/P50013, returning: ok
>> 01/31/2008 19:42:55 C00E: Dlm Lock fd/id 180/P60014, returning: ok
>> 01/31/2008 19:42:58 C00F: Dlm Lock fd/id 182/P70015, returning: ok
>> 01/31/2008 19:45:29 C005: Dlm UnLock. fd/id 160/P50005, returning ok
>> 01/31/2008 19:45:43 C00A: Dlm UnLock. fd/id 170/P20010, returning ok
>> 01/31/2008 19:45:44 C002: Dlm UnLock. fd/id 154/P20002, returning ok
>> 01/31/2008 19:45:59 C006: Dlm UnLock. fd/id 162/P60006, returning ok
>> 01/31/2008 19:46:07 C00F: Dlm UnLock. fd/id 182/P70015, returning ok
>> 01/31/2008 19:46:08 C007: Dlm UnLock. fd/id 164/P70007, returning ok
>> 01/31/2008 19:46:10 C00D: Dlm UnLock. fd/id 178/P50013, returning ok
>> 01/31/2008 19:46:25 C00C: Dlm UnLock. fd/id 174/P40012, returning ok
>> 01/31/2008 19:46:30 C00E: Dlm UnLock. fd/id 180/P60014, returning ok
>> 01/31/2008 19:46:30 C004: Dlm UnLock. fd/id 158/P40004, returning ok
>> 01/31/2008 19:47:07 C000: Dlm UnLock. fd/id 150/P00000, returning ok
>> 01/31/2008 19:47:08 C008: Dlm UnLock. fd/id 166/P00008, returning ok
>> 01/31/2008 19:47:09 C001: Dlm UnLock. fd/id 152/P10001, returning ok
>> 01/31/2008 19:47:19 C009: Dlm UnLock. fd/id 168/P10009, returning ok
>> 01/31/2008 19:47:30 C00B: Dlm UnLock. fd/id 172/P30011, returning ok
>> 01/31/2008 19:47:31 C003: Dlm UnLock. fd/id 156/P30003, returning ok
>> 01/31/2008 19:49:15 C000: Dlm Lock fd/id 150/P00000, returning: ok
>> 01/31/2008 19:49:37 C000: Dlm UnLock. fd/id 150/P00000, returning ok
>> 01/31/2008 19:49:40 C000: Dlm Lock fd/id 150/P00000, returning: Trylock failed
>>
>>
>> I also had a problem with this system the day before. Here is the
>> data from that:
>>
>> SYSTEM MAP: /boot/System.map-2.6.16.46-0.14.PTF.284042.0-smp
>> DEBUG KERNEL: ../vmlinux.debug (2.6.16.46-0.14.PTF.284042.0-smp)
>> DUMPFILE: vmcore
>> CPUS: 4
>> DATE: Wed Jan 30 17:44:49 2008
>> UPTIME: 9 days, 00:55:51
>> LOAD AVERAGE: 1.10, 1.06, 1.01
>> TASKS: 341
>> NODENAME: N1
>> RELEASE: 2.6.16.46-0.14.PTF.284042.0-smp
>> VERSION: #1 SMP Thu May 17 14:00:09 UTC 2007
>> MACHINE: i686 (2327 Mhz)
>> MEMORY: 2 GB
>> PANIC: "kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2780!"
>> PID: 31585
>> COMMAND: "masx"
>> TASK: dab912d0 [THREAD_INFO: d5840000]
>> CPU: 0
>> STATE: TASK_RUNNING (PANIC)
>>
>> crash> bt
>> PID: 31585 TASK: dab912d0 CPU: 0 COMMAND: "masx"
>> #0 [d5841d78] crash_kexec at c013bb1a
>> #1 [d5841dbc] die at c01055fe
>> #2 [d5841dec] do_invalid_op at c0105ce2
>> #3 [d5841e9c] error_code (via invalid_op) at c0104e4d
>> EAX: 00000051 EBX: ca668280 ECX: 00000000 EDX: 00000296 EBP: da0c1c00
>> DS: 007b ESI: da0c1c00 ES: 007b EDI: ca668280
>> CS: 0060 EIP: fb835e95 ERR: ffffffff EFLAGS: 00010296
>> #4 [d5841ed0] dlm_empty_lockres at fb835e95
>> #5 [d5841ee0] dlm_unregister_domain at fb827305
>> #6 [d5841f18] dlmfs_clear_inode at fb6c2eae
>> #7 [d5841f24] clear_inode at c0175dfe
>> #8 [d5841f30] generic_delete_inode at c0175eee
>> #9 [d5841f3c] iput at c0175838
>> #10 [d5841f48] dput at c01744e0
>> #11 [d5841f54] do_rmdir at c016e63d
>> #12 [d5841fb8] sysenter_entry at c0103bd4
>> EAX: 00000028 EBX: 08299988 ECX: 00000000 EDX: 08273be4
>> DS: 007b ESI: 00000000 ES: 007b EDI: 082ebf28
>> SS: 007b ESP: bf999f7c EBP: bf999fa8
>> CS: 0073 EIP: ffffe410 ERR: 00000028 EFLAGS: 00000246
>>
>>
>> This is a SUSE SLES10 SP1 system with a SUSE NFS patch.
>> OCFS2 tools version 1.2.3-0.7
>> OCFS2 version 1.2.5-SLES-r2997
>>
>> I was hoping you would have some ideas on this.
>>
>> Also, another question. I have been trying to run one of the debugging
>> scripts, for example scanlocks. I keep getting the message 'Module
>> debugfs not loaded'. I don't see any debugfs.ko on the system. Isn't
>> it part of the ocfs2 tools?
>>
>> Thank you,
>>
>> Charlie
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users