<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
This has been fixed for some time now; see the two commits below.<br>
<br>
============================================<br>
commit 14741472a05245ed5778aa0aec055e1f920b6ef8<br>
Author: Srinivas Eeda <a class="moz-txt-link-rfc2396E" href="mailto:srinivas.eeda@oracle.com">&lt;srinivas.eeda@oracle.com&gt;</a><br>
Date: Mon Mar 22 16:50:47 2010 -0700<br>
<br>
ocfs2: Fix a race in o2dlm lockres mastery<br>
<br>
In o2dlm, the master of a lock resource keeps a map of all
interested<br>
nodes. This prevents the master from purging the resource before an<br>
interested node can create a lock.<br>
<br>
A race between the mastery thread and the mastery handler allowed an<br>
interested node to discover who the master is without informing the<br>
master directly. This is easily fixed by holding the dlm spinlock a<br>
little longer in the mastery handler.<br>
<br>
Signed-off-by: Srinivas Eeda <a class="moz-txt-link-rfc2396E" href="mailto:srinivas.eeda@oracle.com">&lt;srinivas.eeda@oracle.com&gt;</a><br>
Signed-off-by: Joel Becker <a class="moz-txt-link-rfc2396E" href="mailto:joel.becker@oracle.com">&lt;joel.becker@oracle.com&gt;</a><br>
<br>
<br>
commit a524812b7eaa7783d7811198921100f079034e61<br>
Author: Wengang Wang <a class="moz-txt-link-rfc2396E" href="mailto:wen.gang.wang@oracle.com">&lt;wen.gang.wang@oracle.com&gt;</a><br>
Date: Fri Jul 30 16:14:44 2010 +0800<br>
<br>
ocfs2/dlm: avoid incorrect bit set in refmap on recovery master<br>
<br>
In the following situation, there remains an incorrect bit in
refmap on the<br>
recovery master. Finally the recovery master will fail at purging
the lockres<br>
due to the incorrect bit in refmap.<br>
<br>
1) node A has no interest on lockres A any longer, so it is purging
it.<br>
2) the owner of lockres A is node B, so node A is sending de-ref
message<br>
to node B.<br>
3) at this time, node B crashed. node C becomes the recovery
master. it recovers<br>
lockres A(because the master is the dead node B).<br>
4) node A migrated lockres A to node C with a refbit there.<br>
5) node A failed to send de-ref message to node B because it
crashed. The failure<br>
is ignored. no other action is done for lockres A any more.<br>
<br>
Normally, re-sending the deref message to the recovery master would
fix it. However,<br>
ignoring the failure of the deref to the original master and not
recovering the lockres<br>
to the recovery master has the same effect, and the latter is simpler.<br>
<br>
Signed-off-by: Wengang Wang <a class="moz-txt-link-rfc2396E" href="mailto:wen.gang.wang@oracle.com">&lt;wen.gang.wang@oracle.com&gt;</a><br>
Acked-by: Srinivas Eeda <a class="moz-txt-link-rfc2396E" href="mailto:srinivas.eeda@oracle.com">&lt;srinivas.eeda@oracle.com&gt;</a><br>
Cc: <a class="moz-txt-link-abbreviated" href="mailto:stable@kernel.org">stable@kernel.org</a><br>
Signed-off-by: Joel Becker <a class="moz-txt-link-rfc2396E" href="mailto:joel.becker@oracle.com">&lt;joel.becker@oracle.com&gt;</a><br>
============================================<br>
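<br>
For anyone trying to picture the second commit, here is a rough toy model in Python. It is only a sketch, not the o2dlm code: the <code>LockRes</code> class, the <code>recover</code> helper and the node numbers are made up for illustration, and the refmap is modelled as a plain set of node numbers. It replays steps 1) to 5) above and shows why a leftover refmap bit keeps the recovery master from purging the lockres.<br>
<pre>
```python
class LockRes:
    """Toy lock resource as its master sees it: a name plus a refmap."""

    def __init__(self, name):
        self.name = name
        self.refmap = set()  # node numbers the master believes are interested

    def set_ref(self, node):
        self.refmap.add(node)

    def deref(self, node):
        """Clear the node's bit. Returns False if the bit was already clear."""
        if node not in self.refmap:
            return False
        self.refmap.remove(node)
        return True

    def can_purge(self):
        # The master may only purge once no node is marked as interested.
        return not self.refmap


def recover(name, reporting_nodes):
    """Rebuild a lockres on the recovery master from the nodes that send it."""
    res = LockRes(name)
    for node in reporting_nodes:
        res.set_ref(node)
    return res


NODE_A, NODE_B, NODE_C = 0, 1, 2  # hypothetical node numbers

# Steps 1-2: node B masters the lockres; node A is purging it and owes B a deref.
on_b = LockRes("lockres A")
on_b.set_ref(NODE_A)

# Step 3: node B dies before the deref arrives; node C becomes recovery master.
# Steps 4-5 without the fix: node A still sends the lockres to C, which sets
# A's bit there, and A never sends C a deref because it already finished purging.
without_fix = recover("lockres A", [NODE_A])

# With the fix: node A skips a lockres it is already purging during recovery.
with_fix = recover("lockres A", [])

print("without fix, C can purge:", without_fix.can_purge())
print("with fix, C can purge:", with_fix.can_purge())
```
</pre>
With the fix, the purging node no longer hands the lockres to the recovery master, so no bit is set for it there and the purge can proceed.<br>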
<br>
<br>
On 09/29/2010 06:47 AM, Charlie Sharkey wrote:
<blockquote
cite="mid:03FB5D708BE3C8448E8079186A56CDE601D5432D@BTIBURMAIL.bustech.com"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I got the following crash on a Sles10 SP2
system, info
below.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Is this a known problem ? It looks similar to
bug# 912<o:p></o:p></p>
<p class="MsoNormal">
<a class="moz-txt-link-freetext" href="http://oss.oracle.com/bugzilla/show_bug.cgi?id=912">http://oss.oracle.com/bugzilla/show_bug.cgi?id=912</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">version info<o:p></o:p></p>
<p class="MsoNormal">-----------------<o:p></o:p></p>
<p class="MsoNormal">OCFS2 Node Manager 1.4.1-1-SLES Wed Jul 23
18:33:42 UTC 2008
(build f922955d99ef972235bd0c1fc236c5ddbb368611)<o:p></o:p></p>
<p class="MsoNormal">OCFS2 DLM 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC
2008 (build
f922955d99ef972235bd0c1fc236c5ddbb368611)<o:p></o:p></p>
<p class="MsoNormal">OCFS2 DLMFS 1.4.1-1-SLES Wed Jul 23 18:33:42 UTC
2008 (build
f922955d99ef972235bd0c1fc236c5ddbb368611)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">crash info<o:p></o:p></p>
<p class="MsoNormal">------------- <o:p></o:p></p>
<p class="MsoNormal"> KERNEL: ./vmlinux-2.6.16.60-0.42.10<o:p></o:p></p>
<p class="MsoNormal"> DUMPFILE: ../n2_vmcore_20100925<o:p></o:p></p>
<p class="MsoNormal"> CPUS: 8<o:p></o:p></p>
<p class="MsoNormal"> DATE: Sat Sep 25 12:48:00 2010<o:p></o:p></p>
<p class="MsoNormal"> UPTIME: 10 days, 04:08:44<o:p></o:p></p>
<p class="MsoNormal"> LOAD AVERAGE: 9.39, 9.11, 8.67<o:p></o:p></p>
<p class="MsoNormal"> TASKS: 484<o:p></o:p></p>
<p class="MsoNormal"> NODENAME: n2<o:p></o:p></p>
<p class="MsoNormal"> RELEASE: 2.6.16.60-0.42.10-smp<o:p></o:p></p>
<p class="MsoNormal"> VERSION: #1 SMP Tue Apr 27 05:11:27 UTC 2010<o:p></o:p></p>
<p class="MsoNormal"> MACHINE: x86_64 (2926 Mhz)<o:p></o:p></p>
<p class="MsoNormal"> MEMORY: 2.9 GB<o:p></o:p></p>
<p class="MsoNormal"> PANIC: ""<o:p></o:p></p>
<p class="MsoNormal"> PID: 6557<o:p></o:p></p>
<p class="MsoNormal"> COMMAND: "dlm_thread"<o:p></o:p></p>
<p class="MsoNormal"> TASK: ffff81012ac89860 [THREAD_INFO:
ffff81010532e000]<o:p></o:p></p>
<p class="MsoNormal"> CPU: 4<o:p></o:p></p>
<p class="MsoNormal"> STATE: TASK_RUNNING (PANIC)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">crash> bt<o:p></o:p></p>
<p class="MsoNormal">PID: 6557 TASK: ffff81012ac89860 CPU: 4
COMMAND:
"dlm_thread"<o:p></o:p></p>
<p class="MsoNormal"> #0 [ffff81010532fa50] machine_kexec at
ffffffff8011c0b6<o:p></o:p></p>
<p class="MsoNormal"> #1 [ffff81010532fb20] crash_kexec at
ffffffff80154022<o:p></o:p></p>
<p class="MsoNormal"> #2 [ffff81010532fbe0] __die at ffffffff802ec658<o:p></o:p></p>
<p class="MsoNormal"> #3 [ffff81010532fc20] die at ffffffff8010c7e6<o:p></o:p></p>
<p class="MsoNormal"> #4 [ffff81010532fc50] do_invalid_op at
ffffffff8010cd97<o:p></o:p></p>
<p class="MsoNormal"> #5 [ffff81010532fd10] error_exit at
ffffffff8010bced<o:p></o:p></p>
<p class="MsoNormal"> [exception RIP: dlm_drop_lockres_ref+480]<o:p></o:p></p>
<p class="MsoNormal"> RIP: ffffffff88511d2a RSP:
ffff81010532fdc8 RFLAGS:
00010286<o:p></o:p></p>
<p class="MsoNormal"> RAX: ffff81006181cc08 RBX:
0000000000000000 RCX:
000000000001109c<o:p></o:p></p>
<p class="MsoNormal"> RDX: 000000000000001f RSI:
0000000000000296 RDI:
ffffffff8035ba1c<o:p></o:p></p>
<p class="MsoNormal"> RBP: ffff81006181cbc0 R8:
ffffffff8045a260 R9:
000000000000001f<o:p></o:p></p>
<p class="MsoNormal"> R10: 0000000000000000 R11:
0000000000000000 R12:
ffff810129b05c00<o:p></o:p></p>
<p class="MsoNormal"> R13: 000000000000001f R14:
ffff81004ada2320 R15:
000000000000026d<o:p></o:p></p>
<p class="MsoNormal"> ORIG_RAX: ffffffffffffffff CS: 0010 SS:
0018<o:p></o:p></p>
<p class="MsoNormal"> #6 [ffff81010532fdc0] dlm_drop_lockres_ref at
ffffffff88511d2a<o:p></o:p></p>
<p class="MsoNormal"> #7 [ffff81010532fe40] dlm_run_purge_list at
ffffffff8852035c<o:p></o:p></p>
<p class="MsoNormal"> #8 [ffff81010532fe90] dlm_thread at
ffffffff88520718<o:p></o:p></p>
<p class="MsoNormal"> #9 [ffff81010532ff10] kthread at
ffffffff801480cd<o:p></o:p></p>
<p class="MsoNormal">#10 [ffff81010532ff50] kernel_thread at
ffffffff8010bea6<o:p></o:p></p>
<p class="MsoNormal">crash> <o:p></o:p></p>
<p class="MsoNormal">
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">text extracted from the core file:<o:p></o:p></p>
<p class="MsoNormal">-----------------------------------------<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">&lt;3&gt;(6345,7):dlm_deref_lockres_handler:2302
ERROR:
27870DB34A7241CC8EBDD43647ABE1FB:M0000000000000078b4305e00000000: node
0 trying
to drop ref but it is already dropped!<o:p></o:p></p>
<p class="MsoNormal">&lt;3&gt;(6557,4):dlm_drop_lockres_ref:2234
ERROR: while
dropping ref on
130ADCC7DE934141AF05DA025CCD14A4:O0000000000000079a3bfbc00000000
(master=0) got
-22.<o:p></o:p></p>
<p class="MsoNormal">&lt;1&gt;Kernel BUG at
fs/ocfs2/dlm/dlmmaster.c:2236<o:p></o:p></p>
<p class="MsoNormal">&lt;4&gt;Modules linked in: af_packet ocfs2
ocfs2_dlmfs
ocfs2_dlm ocfs2_nodemanager configfs btipbsa4 ipmi_devintf ipmi_si
ipmi_msghandler bonding ipv6 bticomp_aha363 dock smi button battery
btismc ac
st loop dm_round_robin dm_multipath dm_mod usbhid usb_storage ide_core
i2c_i801
igb e1000 hw_random i2c_core uhci_hcd ehci_hcd usbcore ext3 jbd qla2xxx
firmware_class qla2xxx_conf intermodule edd fan thermal processor sg
megaraid_sas ata_piix libata sd_mod scsi_mod<o:p></o:p></p>
<p class="MsoNormal">&lt;4&gt;Pid: 6557, comm: dlm_thread Tainted:
P U
2.6.16.60-0.42.10-smp #1<o:p></o:p></p>
<p class="MsoNormal">&lt;4&gt;RIP: 0010:[&lt;ffffffff88511d2a&gt;]
&lt;ffffffff88511d2a&gt;{:ocfs2_dlm:dlm_drop_lockres_ref+480}<o:p></o:p></p>
<p class="MsoNormal">&lt;4&gt;Process dlm_thread (pid: 6557,
threadinfo
ffff81010532e000, task ffff81012ac89860)<o:p></o:p></p>
<p class="MsoNormal">&lt;4&gt;Call Trace:
&lt;ffffffff8852035c&gt;{:ocfs2_dlm:dlm_run_purge_list+771}<o:p></o:p></p>
<p class="MsoNormal">&lt;4&gt;
&lt;ffffffff88520718&gt;{:ocfs2_dlm:dlm_thread+131}
&lt;ffffffff8014820e&gt;{autoremove_wake_function+0}<o:p></o:p></p>
<p class="MsoNormal">&lt;4&gt;
&lt;ffffffff88520695&gt;{:ocfs2_dlm:dlm_thread+0}
&lt;ffffffff80147e05&gt;{keventd_create_kthread+0}<o:p></o:p></p>
<p class="MsoNormal">&lt;1&gt;RIP
&lt;ffffffff88511d2a&gt;{:ocfs2_dlm:dlm_drop_lockres_ref+480} RSP
&lt;ffff81010532fdc8&gt;<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<pre wrap="">
_______________________________________________
Ocfs2-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>
<a class="moz-txt-link-freetext" href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a></pre>
</blockquote>
<br>
</body>
</html>