[Ocfs2-users] Dlm question
Sunil Mushran
Sunil.Mushran at oracle.com
Fri Feb 1 11:29:00 PST 2008
There are 3 issues. I'll address them in reverse order.
debugfs, not to be confused with debugfs.ocfs2, is a kernel
component. It used to be shipped with the ocfs2 kernel module
package as RHEL4/SLES9 did not bundle it.
RHEL5/SLES10 build/ship it as part of the kernel (not as a module)
and hence the scanlocks check fails. The solution is to comment out
the section that checks whether it is loaded. (Section commented
with "#is debugfs loaded?")
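For reference, a sketch of what that check could be replaced with. This is illustrative only, not the actual scanlocks code: probing /proc/filesystems covers both the built-in case (RHEL5/SLES10) and the modular case, whereas an lsmod check only catches the latter.

```shell
# Sketch: detect debugfs support whether it is built in or a module.
# (The probe and the messages here are assumptions, not scanlocks itself.)
if grep -qw debugfs /proc/filesystems 2>/dev/null; then
    echo "debugfs supported"
else
    echo "debugfs not supported"
fi
```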
The second issue is the oops. File a bugzilla with Novell.
We will handle this through them, as we need to see which patches
your kernel/ocfs2 has.
The first issue indicates that the lock is busy. (-16 is EBUSY).
Meaning there are holders. As the locks are files, you can use
fuser to see which pid is using it. If you want to see the state
of the lock, you will have to dump it via the dlm proc interface.
# echo R domain lock >/proc/fs/ocfs2_dlm/debug
Here, domain is the directory under /dlm and lock is the file within it.
The state will be dumped in /var/log/messages.
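Putting the two steps together, something like the following. The domain and lock names are placeholders; substitute your own. The fuser and echo commands are shown as comments since they only make sense on a node with dlmfs mounted; the printf just demonstrates the string that gets written to the proc interface.

```shell
#!/bin/sh
# Placeholder names -- replace with your actual domain and lock.
domain=mydomain          # directory under /dlm
lock=P00000              # lock file inside that directory

# 1. EBUSY (-16) on unlink means the lock file still has holders.
#    See which pids have it open:
#      fuser -v "/dlm/$domain/$lock"

# 2. Dump the lock state via the dlm proc interface; the output
#    appears in /var/log/messages:
#      echo "R $domain $lock" > /proc/fs/ocfs2_dlm/debug

# Show the exact string that step 2 writes to the proc file:
printf 'R %s %s\n' "$domain" "$lock"
```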
No, scanlocks cannot dump dlmfs locks.
Sunil
Charlie Sharkey wrote:
>
>
> Hi,
>
> I'm having some dlm issues on a system. It looks like
> the scenario went something like:
>
> 19:45:29 --> 19:47:31  16 dlm locks are released using o2dlm_unlock().
>                        ocfs2 logs an error into /var/log/messages,
>                        but returns ok to the application.
>
> 19:49:15  a dlm lock (o2dlm_lock()) is put on P00000 -- ok
>
> 19:49:37  lock on P00000 is released -- ok
>
> 19:49:40  a lock on P00000 is attempted, and the lock fails.
>           Returned error is "Trylock failed".
>
>
> Here is the data from /var/log/messages:
>
> Jan 31 19:45:29 N1 kernel: (25033,1):dlmfs_unlink:512 ERROR: unlink
> P50005, error -16 from destroy
> Jan 31 19:45:43 N1 kernel: (25038,1):dlmfs_unlink:512 ERROR: unlink
> P20010, error -16 from destroy
> Jan 31 19:45:44 N1 kernel: (25030,1):dlmfs_unlink:512 ERROR: unlink
> P20002, error -16 from destroy
> Jan 31 19:45:59 N1 kernel: (25034,3):dlmfs_unlink:512 ERROR: unlink
> P60006, error -16 from destroy
> Jan 31 19:46:07 N1 kernel: (25043,0):dlmfs_unlink:512 ERROR: unlink
> P70015, error -16 from destroy
> Jan 31 19:46:08 N1 kernel: (25035,0):dlmfs_unlink:512 ERROR: unlink
> P70007, error -16 from destroy
> Jan 31 19:46:10 N1 kernel: (25041,1):dlmfs_unlink:512 ERROR: unlink
> P50013, error -16 from destroy
> Jan 31 19:46:25 N1 kernel: (25040,1):dlmfs_unlink:512 ERROR: unlink
> P40012, error -16 from destroy
> Jan 31 19:46:30 N1 kernel: (25042,1):dlmfs_unlink:512 ERROR: unlink
> P60014, error -16 from destroy
> Jan 31 19:46:30 N1 kernel: (25032,1):dlmfs_unlink:512 ERROR: unlink
> P40004, error -16 from destroy
> Jan 31 19:47:07 N1 kernel: (25028,1):dlmfs_unlink:512 ERROR: unlink
> P00000, error -16 from destroy
> Jan 31 19:47:08 N1 kernel: (25036,1):dlmfs_unlink:512 ERROR: unlink
> P00008, error -16 from destroy
> Jan 31 19:47:09 N1 kernel: (25029,1):dlmfs_unlink:512 ERROR: unlink
> P10001, error -16 from destroy
> Jan 31 19:47:19 N1 kernel: (25037,0):dlmfs_unlink:512 ERROR: unlink
> P10009, error -16 from destroy
> Jan 31 19:47:30 N1 kernel: (25039,1):dlmfs_unlink:512 ERROR: unlink
> P30011, error -16 from destroy
> Jan 31 19:47:31 N1 kernel: (25031,1):dlmfs_unlink:512 ERROR: unlink
> P30003, error -16 from destroy
>
>
> Here is data from the application dlm log file
>
> 01/31/2008 19:42:50 C000: Dlm Lock fd/id 150/P00000, returning: ok
> 01/31/2008 19:42:50 C001: Dlm Lock fd/id 152/P10001, returning: ok
> 01/31/2008 19:42:50 C002: Dlm Lock fd/id 154/P20002, returning: ok
> 01/31/2008 19:42:51 C003: Dlm Lock fd/id 156/P30003, returning: ok
> 01/31/2008 19:42:51 C004: Dlm Lock fd/id 158/P40004, returning: ok
> 01/31/2008 19:42:52 C005: Dlm Lock fd/id 160/P50005, returning: ok
> 01/31/2008 19:42:52 C006: Dlm Lock fd/id 162/P60006, returning: ok
> 01/31/2008 19:42:52 C007: Dlm Lock fd/id 164/P70007, returning: ok
> 01/31/2008 19:42:53 C008: Dlm Lock fd/id 166/P00008, returning: ok
> 01/31/2008 19:42:53 C009: Dlm Lock fd/id 168/P10009, returning: ok
> 01/31/2008 19:42:53 C00A: Dlm Lock fd/id 170/P20010, returning: ok
> 01/31/2008 19:42:54 C00B: Dlm Lock fd/id 172/P30011, returning: ok
> 01/31/2008 19:42:54 C00C: Dlm Lock fd/id 174/P40012, returning: ok
> 01/31/2008 19:42:54 C00D: Dlm Lock fd/id 178/P50013, returning: ok
> 01/31/2008 19:42:55 C00E: Dlm Lock fd/id 180/P60014, returning: ok
> 01/31/2008 19:42:58 C00F: Dlm Lock fd/id 182/P70015, returning: ok
> 01/31/2008 19:45:29 C005: Dlm UnLock. fd/id 160/P50005, returning ok
> 01/31/2008 19:45:43 C00A: Dlm UnLock. fd/id 170/P20010, returning ok
> 01/31/2008 19:45:44 C002: Dlm UnLock. fd/id 154/P20002, returning ok
> 01/31/2008 19:45:59 C006: Dlm UnLock. fd/id 162/P60006, returning ok
> 01/31/2008 19:46:07 C00F: Dlm UnLock. fd/id 182/P70015, returning ok
> 01/31/2008 19:46:08 C007: Dlm UnLock. fd/id 164/P70007, returning ok
> 01/31/2008 19:46:10 C00D: Dlm UnLock. fd/id 178/P50013, returning ok
> 01/31/2008 19:46:25 C00C: Dlm UnLock. fd/id 174/P40012, returning ok
> 01/31/2008 19:46:30 C00E: Dlm UnLock. fd/id 180/P60014, returning ok
> 01/31/2008 19:46:30 C004: Dlm UnLock. fd/id 158/P40004, returning ok
> 01/31/2008 19:47:07 C000: Dlm UnLock. fd/id 150/P00000, returning ok
> 01/31/2008 19:47:08 C008: Dlm UnLock. fd/id 166/P00008, returning ok
> 01/31/2008 19:47:09 C001: Dlm UnLock. fd/id 152/P10001, returning ok
> 01/31/2008 19:47:19 C009: Dlm UnLock. fd/id 168/P10009, returning ok
> 01/31/2008 19:47:30 C00B: Dlm UnLock. fd/id 172/P30011, returning ok
> 01/31/2008 19:47:31 C003: Dlm UnLock. fd/id 156/P30003, returning ok
> 01/31/2008 19:49:15 C000: Dlm Lock fd/id 150/P00000, returning: ok
> 01/31/2008 19:49:37 C000: Dlm UnLock. fd/id 150/P00000, returning ok
> 01/31/2008 19:49:40 C000: Dlm Lock fd/id 150/P00000, returning: Trylock
> failed
>
>
> I also had a problem with this system the day before this. Here is
> the data from that:
>
> SYSTEM MAP: /boot/System.map-2.6.16.46-0.14.PTF.284042.0-smp
> DEBUG KERNEL: ../vmlinux.debug (2.6.16.46-0.14.PTF.284042.0-smp)
> DUMPFILE: vmcore
> CPUS: 4
> DATE: Wed Jan 30 17:44:49 2008
> UPTIME: 9 days, 00:55:51
> LOAD AVERAGE: 1.10, 1.06, 1.01
> TASKS: 341
> NODENAME: N1
> RELEASE: 2.6.16.46-0.14.PTF.284042.0-smp
> VERSION: #1 SMP Thu May 17 14:00:09 UTC 2007
> MACHINE: i686 (2327 Mhz)
> MEMORY: 2 GB
> PANIC: "kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2780!"
> PID: 31585
> COMMAND: "masx"
> TASK: dab912d0 [THREAD_INFO: d5840000]
> CPU: 0
> STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 31585 TASK: dab912d0 CPU: 0 COMMAND: "masx"
> #0 [d5841d78] crash_kexec at c013bb1a
> #1 [d5841dbc] die at c01055fe
> #2 [d5841dec] do_invalid_op at c0105ce2
> #3 [d5841e9c] error_code (via invalid_op) at c0104e4d
> EAX: 00000051 EBX: ca668280 ECX: 00000000 EDX: 00000296 EBP:
> da0c1c00
> DS: 007b ESI: da0c1c00 ES: 007b EDI: ca668280
> CS: 0060 EIP: fb835e95 ERR: ffffffff EFLAGS: 00010296
> #4 [d5841ed0] dlm_empty_lockres at fb835e95
> #5 [d5841ee0] dlm_unregister_domain at fb827305
> #6 [d5841f18] dlmfs_clear_inode at fb6c2eae
> #7 [d5841f24] clear_inode at c0175dfe
> #8 [d5841f30] generic_delete_inode at c0175eee
> #9 [d5841f3c] iput at c0175838
> #10 [d5841f48] dput at c01744e0
> #11 [d5841f54] do_rmdir at c016e63d
> #12 [d5841fb8] sysenter_entry at c0103bd4
> EAX: 00000028 EBX: 08299988 ECX: 00000000 EDX: 08273be4
> DS: 007b ESI: 00000000 ES: 007b EDI: 082ebf28
> SS: 007b ESP: bf999f7c EBP: bf999fa8
> CS: 0073 EIP: ffffe410 ERR: 00000028 EFLAGS: 00000246
>
>
> This is a Suse Sles10 SP1 system, with a suse nfs patch.
> Ocfs2 tools version 1.2.3-0.7
> Ocfs2 version 1.2.5-SLES-r2997
>
> I was hoping you would have some ideas on this.
>
> Also, another question. I have been trying to run one of the debugging
> scripts, for example, scanlocks. I keep getting the message 'Module
> debugfs not loaded'. I don't see any debugfs.ko on the system. Isn't it
> a part of the ocfs2 tools?
>
> Thank you,
>
> Charlie
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>