[Ocfs2-users] servers blocked on ocfs2
frank
frank at si.ct.upc.edu
Mon Dec 13 23:59:38 PST 2010
On 13/12/10 20:49, Sunil Mushran wrote:
> On 12/12/2010 11:58 PM, frank wrote:
>> After that, all operations on the nodes froze; we could not even log in.
>>
>> Node 0 kept logging this kind of message until it stopped logging at
>> 10:49:
>>
>> Dec 4 10:49:34 heraclito kernel:
>> (sendmail,19074,6):ocfs2_inode_lock_full:2121 ERROR: status = -22
>> Dec 4 10:49:34 heraclito kernel:
>> (sendmail,19074,6):_ocfs2_statfs:1266 ERROR: status = -22
>> Dec 4 10:49:34 heraclito kernel:
>> (sendmail,19074,6):dlm_send_remote_convert_request:393 ERROR: dlm
>> status = DLM_IVLOCKID
>> Dec 4 10:49:34 heraclito kernel:
>> (sendmail,19074,6):dlmconvert_remote:327 ERROR: dlm status = DLM_IVLOCKID
>> Dec 4 10:49:34 heraclito kernel:
>> (sendmail,19074,6):ocfs2_cluster_lock:1258 ERROR: DLM error
>> DLM_IVLOCKID while calling dlmlock on resource
>> M000000000000000000000b6f931666: bad lockid
>
> Node 0 is trying to upconvert the lock level.
>
>> Node 1 kept logging this kind of message until it stopped logging at
>> 10:00:
>>
>> Dec 4 10:00:20 parmenides kernel:
>> (o2net,10545,14):dlm_convert_lock_handler:489 ERROR: did not find
>> lock to convert on grant queue! cookie=0:6
>> Dec 4 10:00:20 parmenides kernel: lockres:
>> M000000000000000000000b6f931666, owner=1, state=0
>> Dec 4 10:00:20 parmenides kernel: last used: 0, refcnt: 4, on
>> purge list: no
>> Dec 4 10:00:20 parmenides kernel: on dirty list: no, on reco list:
>> no, migrating pending: no
>> Dec 4 10:00:20 parmenides kernel: inflight locks: 0, asts reserved: 0
>> Dec 4 10:00:20 parmenides kernel: refmap nodes: [ 0 ], inflight=0
>> Dec 4 10:00:20 parmenides kernel: granted queue:
>> Dec 4 10:00:20 parmenides kernel: type=5, conv=-1, node=1,
>> cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n),
>> pending=(conv=n,lock=n,cancel=n,unlock=n)
>> Dec 4 10:00:20 parmenides kernel: converting queue:
>> Dec 4 10:00:20 parmenides kernel: type=0, conv=3, node=0,
>> cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n),
>> pending=(conv=n,lock=n,cancel=n,unlock=n)
>> Dec 4 10:00:20 parmenides kernel: blocked queue:
>
> Node 1 does not find that lock in the granted queue because that lock
> is in the
> converting queue. Do you have the very first error message on both nodes
> relating to this resource?
Here they are:
Node 0:
Dec 4 09:15:06 heraclito kernel: o2net: connection to node parmenides
(num 1) at 192.168.1.2:7777 has been idle for 30.0 seconds, shutting it
down.
Dec 4 09:15:06 heraclito kernel: (swapper,0,7):o2net_idle_timer:1503
here are some times that might help debug the situation: (tmr
1291450476.228826
now 1291450506.229456 dr 1291450476.228760 adv
1291450476.228842:1291450476.228843 func (de6e01eb:500)
1291450476.228827:1291450476.228829)
Dec 4 09:15:06 heraclito kernel: o2net: no longer connected to node
parmenides (num 1) at 192.168.1.2:7777
Dec 4 09:15:06 heraclito kernel:
(vzlist,22622,7):dlm_send_remote_convert_request:395 ERROR: status = -112
Dec 4 09:15:06 heraclito kernel:
(snmpd,16452,10):dlm_send_remote_convert_request:395 ERROR: status = -112
Dec 4 09:15:06 heraclito kernel:
(snmpd,16452,10):dlm_wait_for_node_death:370
0D3E49EB1F614A3EAEC0E2A74A34AFFF: waiting 5000ms for notification of
death of node 1
Dec 4 09:15:06 heraclito kernel:
(httpd,4615,10):dlm_do_master_request:1334 ERROR: link to 1 went down!
Dec 4 09:15:06 heraclito kernel:
(httpd,4615,10):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:06 heraclito kernel:
(python,20750,10):dlm_do_master_request:1334 ERROR: link to 1 went down!
Dec 4 09:15:06 heraclito kernel:
(python,20750,10):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:06 heraclito kernel:
(vzlist,22622,7):dlm_wait_for_node_death:370
0D3E49EB1F614A3EAEC0E2A74A34AFFF: waiting 5000ms for notification of
death of node 1
Dec 4 09:15:06 heraclito kernel: o2net: accepted connection from node
parmenides (num 1) at 192.168.1.2:7777
Dec 4 09:15:11 heraclito kernel:
(snmpd,16452,5):dlm_send_remote_convert_request:393 ERROR: dlm status =
DLM_IVLOCKID
Dec 4 09:15:11 heraclito kernel: (snmpd,16452,5):dlmconvert_remote:327
ERROR: dlm status = DLM_IVLOCKID
Dec 4 09:15:11 heraclito kernel:
(snmpd,16452,5):ocfs2_cluster_lock:1258 ERROR: DLM error DLM_IVLOCKID
while calling dlmlock on resource
M000000000000000000000b6f931666: bad lockid
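On node 0 everything starts with the o2net idle timer (30 seconds) dropping
the link to node 1 at 09:15:06; the remote convert request then fails with
status -112, and once the connection is re-established the retried convert
gets DLM_IVLOCKID five seconds later. If it would help, I can also capture
node 0's current view of this lock resource. My understanding is that with a
recent enough ocfs2-tools, and with debugfs mounted on /sys/kernel/debug, it
would be something like this (exact command support depends on the tools
version, so please correct me if I have it wrong):

# debugfs.ocfs2 -R "dlm_locks M000000000000000000000b6f931666" /dev/mapper/mpath2
# debugfs.ocfs2 -R "fs_locks" /dev/mapper/mpath2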
Node 1:
Dec 4 09:15:06 parmenides kernel: o2net: connection to node heraclito
(num 0) at 192.168.1.3:7777 has been idle for 30.0 seconds, shutting it
down.
Dec 4 09:15:06 parmenides kernel: (swapper,0,9):o2net_idle_timer:1503
here are some times that might help debug the situation: (tmr
1291450476.231519
now 1291450506.232462 dr 1291450476.231506 adv
1291450476.231522:1291450476.231522 func (de6e01eb:505)
1291450475.650496:1291450475.650501)
Dec 4 09:15:06 parmenides kernel: o2net: no longer connected to node
heraclito (num 0) at 192.168.1.3:7777
Dec 4 09:15:06 parmenides kernel:
(snmpd,12342,11):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec 4 09:15:06 parmenides kernel:
(minilogd,12700,0):dlm_wait_for_lock_mastery:1117 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel:
(smbd,25555,12):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec 4 09:15:06 parmenides kernel:
(python,12439,9):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec 4 09:15:06 parmenides kernel:
(python,12439,9):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel:
(smbd,25555,12):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel:
(minilogd,12700,0):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec 4 09:15:06 parmenides kernel:
(minilogd,12700,0):dlm_get_lock_resource:917 ERROR: status = -107
Dec 4 09:15:06 parmenides kernel:
(dlm_thread,10627,4):dlm_drop_lockres_ref:2211 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel:
(dlm_thread,10627,4):dlm_purge_lockres:206 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel: o2net: connected to node heraclito
(num 0) at 192.168.1.3:7777
Dec 4 09:15:06 parmenides kernel:
(snmpd,12342,11):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:11 parmenides kernel:
(o2net,10545,6):dlm_convert_lock_handler:489 ERROR: did not find lock to
convert on grant queue! cookie=0:6
Dec 4 09:15:11 parmenides kernel: lockres:
M000000000000000000000b6f931666, owner=1, state=0
Dec 4 09:15:11 parmenides kernel: last used: 0, refcnt: 4, on purge
list: no
Dec 4 09:15:11 parmenides kernel: on dirty list: no, on reco list:
no, migrating pending: no
Dec 4 09:15:11 parmenides kernel: inflight locks: 0, asts reserved: 0
Dec 4 09:15:11 parmenides kernel: refmap nodes: [ 0 ], inflight=0
Dec 4 09:15:11 parmenides kernel: granted queue:
Dec 4 09:15:11 parmenides kernel: type=5, conv=-1, node=1,
cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n),
pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec 4 09:15:11 parmenides kernel: converting queue:
Dec 4 09:15:11 parmenides kernel: type=0, conv=3, node=0,
cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n),
pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec 4 09:15:11 parmenides kernel: blocked queue:
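Both nodes hit the 30-second o2net idle timeout at 09:15:06 and then
reconnected, so whatever went wrong with this lock happened around that
disconnect/reconnect. In case the network timeouts themselves matter: I
believe the values in effect can be read from configfs while the cluster is
up; this assumes our cluster is named "ocfs2" and that this kernel has the
configurable o2cb timeouts, so the paths may differ here:

# cat /sys/kernel/config/cluster/ocfs2/idle_timeout_ms
# cat /sys/kernel/config/cluster/ocfs2/keepalive_delay_ms
# cat /sys/kernel/config/cluster/ocfs2/reconnect_delay_ms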
>
> Also, this is definitely a system object. Can you list the system
> directory?
> # debugfs.ocfs2 -R "ls -l //" /dev/sdX
>
# debugfs.ocfs2 -R "ls -l //" /dev/mapper/mpath2
6    drwxr-xr-x   4   0   0            3896   19-Oct-2010 08:42   .
6    drwxr-xr-x   4   0   0            3896   19-Oct-2010 08:42   ..
7    -rw-r--r--   1   0   0               0   19-Oct-2010 08:42   bad_blocks
8    -rw-r--r--   1   0   0          831488   19-Oct-2010 08:42   global_inode_alloc
9    -rw-r--r--   1   0   0            4096   19-Oct-2010 08:47   slot_map
10   -rw-r--r--   1   0   0         1048576   19-Oct-2010 08:42   heartbeat
11   -rw-r--r--   1   0   0   2199023255552   19-Oct-2010 08:42   global_bitmap
12   drwxr-xr-x   2   0   0           12288   14-Dec-2010 08:58   orphan_dir:0000
13   drwxr-xr-x   2   0   0           16384   14-Dec-2010 08:50   orphan_dir:0001
14   -rw-r--r--   1   0   0      1103101952   19-Oct-2010 08:42   extent_alloc:0000
15   -rw-r--r--   1   0   0      1103101952   19-Oct-2010 08:42   extent_alloc:0001
16   -rw-r--r--   1   0   0     14109638656   19-Oct-2010 08:42   inode_alloc:0000
17   -rw-r--r--   1   0   0      6673137664   19-Oct-2010 08:42   inode_alloc:0001
18   -rw-r--r--   1   0   0       268435456   19-Oct-2010 08:46   journal:0000
19   -rw-r--r--   1   0   0       268435456   19-Oct-2010 08:47   journal:0001
20   -rw-r--r--   1   0   0               0   19-Oct-2010 08:42   local_alloc:0000
21   -rw-r--r--   1   0   0               0   19-Oct-2010 08:42   local_alloc:0001
22   -rw-r--r--   1   0   0               0   19-Oct-2010 08:42   truncate_log:0000
23   -rw-r--r--   1   0   0               0   19-Oct-2010 08:42   truncate_log:0001
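One more observation, in case it saves a step: if I am reading the lock name
format right (one type character, six characters of padding, then 16 hex
digits of block number and 8 of generation), M000000000000000000000b6f931666
points at block 0xb = 11, which in the listing above is the global_bitmap
system inode; that would also fit the _ocfs2_statfs calls in node 0's traces.
Assuming our ocfs2-tools has the decode command, this should confirm the
block number:

# debugfs.ocfs2 -R "decode M000000000000000000000b6f931666" /dev/mapper/mpath2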
Thanks once more for your help.
Regards.
Frank