[Ocfs2-users] servers blocked on ocfs2
frank
frank at si.ct.upc.edu
Sun Dec 12 23:58:17 PST 2010
After that, all operations on the nodes froze; we could not even log in.
Node 0 kept logging this kind of message until it stopped logging at 10:49:
Dec 4 10:49:34 heraclito kernel: (sendmail,19074,6):ocfs2_inode_lock_full:2121 ERROR: status = -22
Dec 4 10:49:34 heraclito kernel: (sendmail,19074,6):_ocfs2_statfs:1266 ERROR: status = -22
Dec 4 10:49:34 heraclito kernel: (sendmail,19074,6):dlm_send_remote_convert_request:393 ERROR: dlm status = DLM_IVLOCKID
Dec 4 10:49:34 heraclito kernel: (sendmail,19074,6):dlmconvert_remote:327 ERROR: dlm status = DLM_IVLOCKID
Dec 4 10:49:34 heraclito kernel: (sendmail,19074,6):ocfs2_cluster_lock:1258 ERROR: DLM error DLM_IVLOCKID while calling dlmlock on resource M000000000000000000000b6f931666: bad lockid
Node 1 kept logging this kind of message until it stopped logging at 10:00:
Dec 4 10:00:20 parmenides kernel: (o2net,10545,14):dlm_convert_lock_handler:489 ERROR: did not find lock to convert on grant queue! cookie=0:6
Dec 4 10:00:20 parmenides kernel: lockres: M000000000000000000000b6f931666, owner=1, state=0
Dec 4 10:00:20 parmenides kernel: last used: 0, refcnt: 4, on purge list: no
Dec 4 10:00:20 parmenides kernel: on dirty list: no, on reco list: no, migrating pending: no
Dec 4 10:00:20 parmenides kernel: inflight locks: 0, asts reserved: 0
Dec 4 10:00:20 parmenides kernel: refmap nodes: [ 0 ], inflight=0
Dec 4 10:00:20 parmenides kernel: granted queue:
Dec 4 10:00:20 parmenides kernel: type=5, conv=-1, node=1, cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec 4 10:00:20 parmenides kernel: converting queue:
Dec 4 10:00:20 parmenides kernel: type=0, conv=3, node=0, cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec 4 10:00:20 parmenides kernel: blocked queue:
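
If this happens again, the live lock state can be dumped while the hang
is still in progress (it is lost on reboot). A minimal sketch of how we
could capture it, assuming debugfs.ocfs2 from ocfs2-tools is installed;
/dev/sdb1 here is a placeholder for the real ocfs2 device:

  # Expose the ocfs2/o2dlm debugging files (skip if already mounted).
  mount -t debugfs debugfs /sys/kernel/debug

  # Dump the filesystem-level lock state; run on every node.
  debugfs.ocfs2 -R "fs_locks" /dev/sdb1

  # Dump the o2dlm lock resources and grep for the lockres named in the
  # logs (M000000000000000000000b6f931666) to see its owner and queues
  # from each node's point of view.
  debugfs.ocfs2 -R "dlm_locks" /dev/sdb1

The output is the same lockres/queue dump the kernel printed above, but
for every resource, so the two nodes' views of cookie 0:6 could be
compared before the error path fires.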
We rebooted both nodes at 13:03 and recovered services as usual, with no
further problems.
Frank
On 10/12/10 20:40, Joel Becker wrote:
> On Fri, Dec 10, 2010 at 11:38:04AM -0800, Joel Becker wrote:
>> On Fri, Dec 10, 2010 at 08:42:19AM +0100, frank wrote:
>>> Anyway, if there was a cut in the heartbeat or something similar, one of
>>> the nodes should have fenced itself, shouldn't it? Why did the nodes
>>> stall? Can we avoid that?
>> If both nodes saw the network go down, but the disk heartbeat
>> was still working, the higher node should have fenced. Was there no
>> fencing? Were both nodes just hung? How were they hung? All
>> operations, or just ocfs2 operations?
> Oh, I see. While node 0 was waiting for node 1 to kill itself,
> node 1 managed to reconnect. The invalid lock stuff was weird, though.
> After this, did all operations return to normal, or were many operations
> permanently frozen?
>
> Joel
>
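
As to the "can we avoid that?" question above: with the o2cb stack,
whether a node fences or just hangs waiting is governed by the disk
heartbeat threshold and the network timeouts, normally set in
/etc/sysconfig/o2cb (or via "service o2cb configure") and required to be
identical on every node. A sketch of the relevant settings, assuming the
stock defaults of that era; the values are illustrative, not a
recommendation:

  # /etc/sysconfig/o2cb (excerpt)
  # Missed 2s disk-heartbeat iterations before a node is declared dead
  # (31 is roughly 60 seconds).
  O2CB_HEARTBEAT_THRESHOLD=31
  # Network idle timeout in ms; once a link has been idle this long it
  # is declared down, and in a two-node cluster the higher-numbered
  # node should fence itself rather than stall.
  O2CB_IDLE_TIMEOUT_MS=30000
  # Keepalive and reconnect delays, in ms.
  O2CB_KEEPALIVE_DELAY_MS=2000
  O2CB_RECONNECT_DELAY_MS=2000

Lowering O2CB_IDLE_TIMEOUT_MS makes a flaky interconnect fence sooner
instead of leaving both nodes blocked, at the cost of more false
positives on a congested network.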