[Ocfs2-users] servers blocked on ocfs2

frank frank at si.ct.upc.edu
Mon Dec 13 23:59:38 PST 2010


On 13/12/10 20:49, Sunil Mushran wrote:
> On 12/12/2010 11:58 PM, frank wrote:
>> After that, all node operations froze; we could not log in either.
>>
>> Node 0 kept logging this kind of message until the logging stopped 
>> at 10:49:
>>
>> Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):ocfs2_inode_lock_full:2121 ERROR: status = -22
>> Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):_ocfs2_statfs:1266 ERROR: status = -22
>> Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):dlm_send_remote_convert_request:393 ERROR: dlm status = DLM_IVLOCKID
>> Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):dlmconvert_remote:327 ERROR: dlm status = DLM_IVLOCKID
>> Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):ocfs2_cluster_lock:1258 ERROR: DLM error DLM_IVLOCKID while calling dlmlock on resource M000000000000000000000b6f931666: bad lockid
>
> Node 0 is trying to upconvert the lock level.
>
>> Node 1 kept logging this kind of message until the logging stopped 
>> at 10:00:
>>
>> Dec  4 10:00:20 parmenides kernel: (o2net,10545,14):dlm_convert_lock_handler:489 ERROR: did not find lock to convert on grant queue! cookie=0:6
>> Dec  4 10:00:20 parmenides kernel: lockres: M000000000000000000000b6f931666, owner=1, state=0
>> Dec  4 10:00:20 parmenides kernel:   last used: 0, refcnt: 4, on purge list: no
>> Dec  4 10:00:20 parmenides kernel:   on dirty list: no, on reco list: no, migrating pending: no
>> Dec  4 10:00:20 parmenides kernel:   inflight locks: 0, asts reserved: 0
>> Dec  4 10:00:20 parmenides kernel:   refmap nodes: [ 0 ], inflight=0
>> Dec  4 10:00:20 parmenides kernel:   granted queue:
>> Dec  4 10:00:20 parmenides kernel:     type=5, conv=-1, node=1, cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
>> Dec  4 10:00:20 parmenides kernel:   converting queue:
>> Dec  4 10:00:20 parmenides kernel:     type=0, conv=3, node=0, cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
>> Dec  4 10:00:20 parmenides kernel:   blocked queue:
>
> Node 1 does not find that lock in the granted queue because that lock
> is in the converting queue. Do you have the very first error message
> on both nodes relating to this resource?
Here they are:

Node 0:
Dec  4 09:15:06 heraclito kernel: o2net: connection to node parmenides (num 1) at 192.168.1.2:7777 has been idle for 30.0 seconds, shutting it down.
Dec  4 09:15:06 heraclito kernel: (swapper,0,7):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1291450476.228826 now 1291450506.229456 dr 1291450476.228760 adv 1291450476.228842:1291450476.228843 func (de6e01eb:500) 1291450476.228827:1291450476.228829)
Dec  4 09:15:06 heraclito kernel: o2net: no longer connected to node parmenides (num 1) at 192.168.1.2:7777
Dec  4 09:15:06 heraclito kernel: (vzlist,22622,7):dlm_send_remote_convert_request:395 ERROR: status = -112
Dec  4 09:15:06 heraclito kernel: (snmpd,16452,10):dlm_send_remote_convert_request:395 ERROR: status = -112
Dec  4 09:15:06 heraclito kernel: (snmpd,16452,10):dlm_wait_for_node_death:370 0D3E49EB1F614A3EAEC0E2A74A34AFFF: waiting 5000ms for notification of death of node 1
Dec  4 09:15:06 heraclito kernel: (httpd,4615,10):dlm_do_master_request:1334 ERROR: link to 1 went down!
Dec  4 09:15:06 heraclito kernel: (httpd,4615,10):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:06 heraclito kernel: (python,20750,10):dlm_do_master_request:1334 ERROR: link to 1 went down!
Dec  4 09:15:06 heraclito kernel: (python,20750,10):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:06 heraclito kernel: (vzlist,22622,7):dlm_wait_for_node_death:370 0D3E49EB1F614A3EAEC0E2A74A34AFFF: waiting 5000ms for notification of death of node 1
Dec  4 09:15:06 heraclito kernel: o2net: accepted connection from node parmenides (num 1) at 192.168.1.2:7777
Dec  4 09:15:11 heraclito kernel: (snmpd,16452,5):dlm_send_remote_convert_request:393 ERROR: dlm status = DLM_IVLOCKID
Dec  4 09:15:11 heraclito kernel: (snmpd,16452,5):dlmconvert_remote:327 ERROR: dlm status = DLM_IVLOCKID
Dec  4 09:15:11 heraclito kernel: (snmpd,16452,5):ocfs2_cluster_lock:1258 ERROR: DLM error DLM_IVLOCKID while calling dlmlock on resource M000000000000000000000b6f931666: bad lockid
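
For context, the "idle for 30.0 seconds" message is the o2net idle timeout expiring; that is what drops the connection in the first place. On our distro the cluster timeouts are set in /etc/sysconfig/o2cb (changeable with "service o2cb configure"); the stock defaults look like this, shown only as a sketch for reference, with our annotations:

O2CB_HEARTBEAT_THRESHOLD=31      # disk heartbeat threshold
O2CB_IDLE_TIMEOUT_MS=30000       # o2net idle timeout (the 30.0s in the log above)
O2CB_KEEPALIVE_DELAY_MS=2000     # keepalive probe delay
O2CB_RECONNECT_DELAY_MS=2000     # delay before reconnecting after a drop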

Node 1:
Dec  4 09:15:06 parmenides kernel: o2net: connection to node heraclito (num 0) at 192.168.1.3:7777 has been idle for 30.0 seconds, shutting it down.
Dec  4 09:15:06 parmenides kernel: (swapper,0,9):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1291450476.231519 now 1291450506.232462 dr 1291450476.231506 adv 1291450476.231522:1291450476.231522 func (de6e01eb:505) 1291450475.650496:1291450475.650501)
Dec  4 09:15:06 parmenides kernel: o2net: no longer connected to node heraclito (num 0) at 192.168.1.3:7777
Dec  4 09:15:06 parmenides kernel: (snmpd,12342,11):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec  4 09:15:06 parmenides kernel: (minilogd,12700,0):dlm_wait_for_lock_mastery:1117 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: (smbd,25555,12):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec  4 09:15:06 parmenides kernel: (python,12439,9):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec  4 09:15:06 parmenides kernel: (python,12439,9):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: (smbd,25555,12):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: (minilogd,12700,0):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec  4 09:15:06 parmenides kernel: (minilogd,12700,0):dlm_get_lock_resource:917 ERROR: status = -107
Dec  4 09:15:06 parmenides kernel: (dlm_thread,10627,4):dlm_drop_lockres_ref:2211 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: (dlm_thread,10627,4):dlm_purge_lockres:206 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: o2net: connected to node heraclito (num 0) at 192.168.1.3:7777
Dec  4 09:15:06 parmenides kernel: (snmpd,12342,11):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:11 parmenides kernel: (o2net,10545,6):dlm_convert_lock_handler:489 ERROR: did not find lock to convert on grant queue! cookie=0:6
Dec  4 09:15:11 parmenides kernel: lockres: M000000000000000000000b6f931666, owner=1, state=0
Dec  4 09:15:11 parmenides kernel:   last used: 0, refcnt: 4, on purge list: no
Dec  4 09:15:11 parmenides kernel:   on dirty list: no, on reco list: no, migrating pending: no
Dec  4 09:15:11 parmenides kernel:   inflight locks: 0, asts reserved: 0
Dec  4 09:15:11 parmenides kernel:   refmap nodes: [ 0 ], inflight=0
Dec  4 09:15:11 parmenides kernel:   granted queue:
Dec  4 09:15:11 parmenides kernel:     type=5, conv=-1, node=1, cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  4 09:15:11 parmenides kernel:   converting queue:
Dec  4 09:15:11 parmenides kernel:     type=0, conv=3, node=0, cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  4 09:15:11 parmenides kernel:   blocked queue:
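
In case it is useful, we can also dump the live DLM state for that resource on each node with debugfs.ocfs2 (a sketch only; assumes ocfs2-tools 1.4+ and that debugfs is mounted):

# mount -t debugfs debugfs /sys/kernel/debug     (if not already mounted)
# debugfs.ocfs2 -R "dlm_locks M000000000000000000000b6f931666" /dev/mapper/mpath2
# debugfs.ocfs2 -R "fs_locks" /dev/mapper/mpath2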

>
> Also, this is definitely a system object. Can you list the system 
> directory?
> # debugfs.ocfs2 -R "ls -l //" /dev/sdX
>
# debugfs.ocfs2 -R "ls -l //" /dev/mapper/mpath2
     6               drwxr-xr-x   4     0     0            3896 19-Oct-2010 08:42 .
     6               drwxr-xr-x   4     0     0            3896 19-Oct-2010 08:42 ..
     7               -rw-r--r--   1     0     0               0 19-Oct-2010 08:42 bad_blocks
     8               -rw-r--r--   1     0     0          831488 19-Oct-2010 08:42 global_inode_alloc
     9               -rw-r--r--   1     0     0            4096 19-Oct-2010 08:47 slot_map
     10              -rw-r--r--   1     0     0         1048576 19-Oct-2010 08:42 heartbeat
     11              -rw-r--r--   1     0     0   2199023255552 19-Oct-2010 08:42 global_bitmap
     12              drwxr-xr-x   2     0     0           12288 14-Dec-2010 08:58 orphan_dir:0000
     13              drwxr-xr-x   2     0     0           16384 14-Dec-2010 08:50 orphan_dir:0001
     14              -rw-r--r--   1     0     0      1103101952 19-Oct-2010 08:42 extent_alloc:0000
     15              -rw-r--r--   1     0     0      1103101952 19-Oct-2010 08:42 extent_alloc:0001
     16              -rw-r--r--   1     0     0     14109638656 19-Oct-2010 08:42 inode_alloc:0000
     17              -rw-r--r--   1     0     0      6673137664 19-Oct-2010 08:42 inode_alloc:0001
     18              -rw-r--r--   1     0     0       268435456 19-Oct-2010 08:46 journal:0000
     19              -rw-r--r--   1     0     0       268435456 19-Oct-2010 08:47 journal:0001
     20              -rw-r--r--   1     0     0               0 19-Oct-2010 08:42 local_alloc:0000
     21              -rw-r--r--   1     0     0               0 19-Oct-2010 08:42 local_alloc:0001
     22              -rw-r--r--   1     0     0               0 19-Oct-2010 08:42 truncate_log:0000
     23              -rw-r--r--   1     0     0               0 19-Oct-2010 08:42 truncate_log:0001
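
For reference, since you say the lock is on a system object, the lock name should map back to one of the inodes above; debugfs.ocfs2 can decode it (a sketch; <blkno> stands for the block number that decode prints, not a literal argument):

# debugfs.ocfs2 -R "decode M000000000000000000000b6f931666" /dev/mapper/mpath2
# debugfs.ocfs2 -R "findpath <blkno>" /dev/mapper/mpath2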

Thanks once more for your help.
Regards.

Frank


