[Ocfs2-users] OCFS2 fencing

Tao Ma tao.ma at oracle.com
Thu Mar 12 22:26:55 PDT 2009


Hi ramya,

ramya tn wrote:
> Hi All,
>  
> One of our systems fenced itself a few days ago, and this has been 
> happening very frequently for many days.
> Unfortunately, we are not able to stop the fencing because we are 
> not sure what is causing it.
>  
> The error I found in the log file is:
> .
> ..
> .
> .
> .
> Feb 20 23:36:41 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 
> 0x20000
> Feb 20 23:36:41 ImageInt1 kernel: end_request: I/O error, dev sdc, 
> sector 656216192
> Feb 20 23:36:41 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 
> 0x20000
> Feb 20 23:36:42 ImageInt1 kernel: end_request: I/O error, dev sdc, 
> sector 657248384
> Feb 20 23:36:42 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 
> 0x20000
> Feb 20 23:36:42 ImageInt1 kernel: end_request: I/O error, dev sdc, 
> sector 667312256
> Feb 20 23:36:42 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 
> 0x20000
> Feb 20 23:36:42 ImageInt1 kernel: end_request: I/O error, dev sdc, 
> sector 670408832
> Feb 20 23:36:42 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 
> 0x20000
> Feb 20 23:36:42 ImageInt1 kernel: end_request: I/O error, dev sdc, 
> sector 670666880
> .
> .
> .
> .
> .
> Feb 20 23:53:21 ImageInt1 kernel: Index 13: took 0 ms to do submit_bio 
> for write
> Feb 20 23:53:21 ImageInt1 kernel: Index 14: took 0 ms to do checking slots
> Feb 20 23:53:21 ImageInt1 kernel: Index 15: took 50 ms to do waiting for 
> write completion
> Feb 20 23:53:21 ImageInt1 kernel: Index 16: took 1904 ms to do msleep
> Feb 20 23:53:21 ImageInt1 kernel: Index 17: took 0 ms to do allocating 
> bios for read
> Feb 20 23:53:21 ImageInt1 kernel: Index 18: took 0 ms to do bio alloc read
> Feb 20 23:53:21 ImageInt1 kernel: Index 19: took 0 ms to do bio add page 
> read
> Feb 20 23:53:21 ImageInt1 kernel: Index 20: took 0 ms to do submit_bio 
> for read
> Feb 20 23:53:21 ImageInt1 kernel: Index 21: took 44652 ms to do waiting 
> for read completion
> Feb 20 23:53:21 ImageInt1 kernel: Index 22: took 0 ms to do bio alloc write
> Feb 20 23:53:21 ImageInt1 kernel: Index 23: took 0 ms to do bio add page 
> write
> Feb 20 23:53:21 ImageInt1 kernel: Index 0: took 0 ms to do submit_bio 
> for write
> Feb 20 23:53:21 ImageInt1 kernel: Index 1: took 0 ms to do checking slots
> Feb 20 23:53:21 ImageInt1 kernel: Index 2: took 9307 ms to do waiting 
> for write completion
> Feb 20 23:53:21 ImageInt1 kernel: Index 3: took 0 ms to do allocating 
> bios for read
> Feb 20 23:53:21 ImageInt1 kernel: Index 4: took 0 ms to do bio alloc read
> Feb 20 23:53:21 ImageInt1 kernel: Index 5: took 0 ms to do bio add page read
> Feb 20 23:53:21 ImageInt1 kernel: Index 6: took 0 ms to do submit_bio 
> for read
> Feb 20 23:53:22 ImageInt1 kernel: Index 7: took 35756 ms to do waiting 
> for read completion
> Feb 20 23:53:22 ImageInt1 kernel: Index 8: took 0 ms to do bio alloc write
> Feb 20 23:53:22 ImageInt1 kernel: Index 9: took 0 ms to do bio add page 
> write
> Feb 20 23:53:22 ImageInt1 kernel: Index 10: took 0 ms to do submit_bio 
> for write
> Feb 20 23:53:22 ImageInt1 kernel: Index 11: took 0 ms to do checking slots
> Feb 20 23:53:22 ImageInt1 kernel: Index 12: took 84549 ms to do waiting 
> for write completion
> Feb 20 23:53:22 ImageInt1 kernel: *** ocfs2 is very sorry to be fencing 
> this system by restarting ***
> I found the same SCSI errors each time it fences. Can anyone suggest 
> what could be the reason for these SCSI errors, and whether it is 
> these SCSI errors that are causing the fencing?
I don't know the reason for the SCSI errors, so I will just answer your 
second question.
Yes, SCSI errors will cause ocfs2 fencing. OCFS2 needs to heartbeat on the 
disk, so if it tries many times and still fails to write to the disk 
because of the SCSI errors, it will fence itself.
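
The "Index N: took X ms to do ..." lines quoted above show where the time 
went just before the fence. As a rough illustration only, a small Python 
script along the following lines could pick out the slow steps from such a 
log. The function name slow_steps and the 10000 ms cutoff are assumptions 
made for this sketch; neither is an OCFS2 setting, and the sample assumes 
the wrapped log lines have been joined back into single lines.

```python
import re

# Matches the timing lines quoted above, e.g.
# "kernel: Index 21: took 44652 ms to do waiting for read completion"
TIMING_RE = re.compile(r"Index (\d+): took (\d+) ms to do (.+)")


def slow_steps(log_text, limit_ms=10000):
    """Return (index, ms, step) for every timing line slower than limit_ms.

    limit_ms is an arbitrary illustrative cutoff, not an OCFS2 parameter.
    """
    found = []
    for match in TIMING_RE.finditer(log_text):
        index = int(match.group(1))
        ms = int(match.group(2))
        step = match.group(3).strip()
        if ms > limit_ms:
            found.append((index, ms, step))
    return found


# A few lines taken from the log above, unwrapped to one entry per line.
SAMPLE = """\
Feb 20 23:53:21 ImageInt1 kernel: Index 20: took 0 ms to do submit_bio for read
Feb 20 23:53:21 ImageInt1 kernel: Index 21: took 44652 ms to do waiting for read completion
Feb 20 23:53:21 ImageInt1 kernel: Index 2: took 9307 ms to do waiting for write completion
Feb 20 23:53:22 ImageInt1 kernel: Index 7: took 35756 ms to do waiting for read completion
Feb 20 23:53:22 ImageInt1 kernel: Index 12: took 84549 ms to do waiting for write completion
"""

for index, ms, step in slow_steps(SAMPLE):
    print(f"Index {index}: {ms} ms in '{step}'")
```

On the sample lines this flags Index 21, Index 7 and Index 12, all of them 
waits for read or write completion, which fits the picture of disk I/O 
stalling before the node fences.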

Regards,
Tao


