[Ocfs2-users] Large syslog created after cluster node logout iSCSI LUN

Wed Feb 26 01:11:56 PST 2014

Hi everyone,

I have run into an OCFS2 issue.

The OS is Oracle Linux 6.5, using the latest Oracle UEK kernel 3.8.13-26.1.1.el6uek.x86_64.

There are two nodes in the OCFS2 cluster, and both use an iSCSI SAN as shared storage.
The cluster uses global heartbeat mode. There are three iSCSI LUNs: one is used as the
heartbeat device, and the other two are formatted as OCFS2 volumes with mkfs.ocfs2 and mounted on each node.

The problem occurs when I intentionally log out of one iSCSI LUN (an OCFS2 volume) using the command: iscsiadm -m node -T xxx -u.
After 5 minutes or more, a large number of identical log messages start being written to the syslog (/var/log/messages). The contents are as follows:

Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520

.............................................................................................

The syslog file grows quickly and eventually fills all the remaining space on the / filesystem, which leaves the host blocked and unresponsive.
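
To gauge how badly the log is being flooded, the matching lines can be counted. This is only a minimal sketch: the helper name count_dir_errors is made up for illustration, and the default path assumes the messages land in /var/log/messages as described above.

```shell
#!/bin/sh
# count_dir_errors is a hypothetical helper name, not part of any OCFS2 tool.
# It prints how many lines in a log file contain the repeated OCFS2 error.
count_dir_errors() {
    # Default to the syslog path mentioned above if no file is given.
    logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'ocfs2_dir_foreach_blk_id.*Unable to read inode block' "$logfile"
}
```

Running count_dir_errors /var/log/messages a few seconds apart gives a rough idea of the rate at which the messages are written.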

According to the error log, the messages are logged by the function ocfs2_dir_foreach_blk_id in the source file fs/ocfs2/dir.c:

static int ocfs2_dir_foreach_blk_id(struct inode *inode,
                                    u64 *f_version,
                                    loff_t *f_pos, void *priv,
                                    filldir_t filldir, int *filldir_err)
{
        int ret, i, filldir_ret;
        unsigned long offset = *f_pos;
        struct buffer_head *di_bh = NULL;
        struct ocfs2_dinode *di;
        struct ocfs2_inline_data *data;
        struct ocfs2_dir_entry *de;
        ret = ocfs2_read_inode_block(inode, &di_bh);
        if (ret) {
                mlog(ML_ERROR, "Unable to read inode block for dir %llu\n",
                     (unsigned long long)OCFS2_I(inode)->ip_blkno);
                goto out;
        }
        di = (struct ocfs2_dinode *)di_bh->b_data;
        data = &di->id2.i_data;
.............................................................................................

I can disable mlog(ML_ERROR) logging with the command debugfs.ocfs2 -l ERROR off, but then a kernel process
appears that consumes a large amount of CPU, and it cannot be killed:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5141 root      20   0     0    0    0 R 97.2  0.0  33:03.89 kworker/u:0
 2464 root      20   0  193m  28m 6212 S  1.0  2.8   0:19.48 Xorg
 3331 root      20   0  289m 8972 4944 S  0.7  0.9   0:06.58 gnome-terminal
 2941 root      20   0  130m 4804 1512 S  0.3  0.5   0:00.29 gconfd-2
 2990 root      20   0  299m 7268 5136 S  0.3  0.7   0:03.71 wnck-applet
 3056 root      20   0  272m 6572 4092 S  0.3  0.6   0:00.21 notification-da
 6073 root      20   0 15088 1196  852 R  0.3  0.1   0:00.36 top
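
For anyone trying to spot the same symptom without watching top, a filter along these lines may help. It is a rough sketch only: busy_kworkers is a made-up helper name, the 50% default threshold is arbitrary, and the input is assumed to be three columns (PID, CPU percentage, command name) such as ps -eo pid,pcpu,comm produces.

```shell
#!/bin/sh
# busy_kworkers is a hypothetical helper name used for illustration.
# It reads "PID %CPU COMMAND" lines on stdin and prints the PID of every
# kworker thread whose CPU usage is at or above the given threshold.
busy_kworkers() {
    # The 50% default is an arbitrary assumption; pass another value to change it.
    min="${1:-50}"
    # The header line is skipped naturally because its third field is
    # "COMMAND", which does not start with "kworker".
    awk -v min="$min" '$3 ~ /^kworker/ && ($2 + 0) >= (min + 0) { print $1 }'
}
```

It would be used as: ps -eo pid,pcpu,comm | busy_kworkers 50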

If I unmount the OCFS2 volume within 5 minutes of the logout, the problem does not occur, and the volume
can be re-mounted successfully. After 5 minutes or more, however, the OCFS2 volume cannot be unmounted:
the umount process hangs. Even if I reconnect the iSCSI LUN, the mount operation
also hangs, and the OCFS2 volume cannot be mounted anymore.

This may be a bug in OCFS2. For now I have to reboot the host to recover. Has this issue
already been fixed, or is there any other way to avoid it?

Thanks a lot!

Tony Zhang
