[Ocfs2-users] Any good ideas about clock differences between nodes of an OCFS2 cluster? Thanks

Thu Jul 5 02:19:02 PDT 2012

Hi everyone,

I have a question about OCFS2 that came up while I was using it.

I set up a cluster with two nodes, vmc-4 and vmc-7.
The system time on vmc-4 differs from the system time on vmc-7 by more than one day. Both nodes use the same iSCSI storage target, target3.

The times are:
root@vmc-7:~# date
Sun Jun 24 06:54:20 CST 2012
root@vmc-4:~# date
Mon Jun 25 09:53:35 CST 2012

On vmc-4, I was copying more than 200 GB of data files to /vms/target3, where target3 is mounted as OCFS2.
At the same time, I added vmc-7 to the OCFS2 cluster so that it would join vmc-4.
I ran this command on vmc-4:
o2cb_ctl -C -i -n vmc-7 -t node -a number=2 -a ip_address=192.168.0.7 -a ip_port=7100 -a cluster=iPool

I copied /etc/ocfs2/cluster.conf from vmc-4 to vmc-7 and started the o2cb service with "service o2cb onload" and "service o2cb online".
Then I mounted the OCFS2 device on /vms/target3 on vmc-7:
mount -t ocfs2 /dev/sdb /vms/target3

Listing the files on vmc-7 with debugfs.ocfs2 gave an interesting result:

vmc-7:
debugfs: ls
        513             16   1    2  .
        513             16   2    2  ..
        1412353         24   10   2  lost+found
        1412354         16   3    1  111
        1412355         16   4    1  abbb
        1412356         20   7    1  BeiJing
        1412357         24   9    2  caibackup
        1412486         3764 8    1  caiWinXP
debugfs: slotmap
        Slot#   Node#
            1       2
debugfs: hb
        node: node              seq       generation checksum
           1:    1 000000004fe7c52e ea6ac3ff9a179e7d 1b8ea7f3
           2:    2 000000004fe64998 43909c20ab5bc319 8a18147e

debugfs: quit

root@vmc-7:~# mounted.ocfs2 -f
Device                FS     Nodes
/dev/sdb              ocfs2  vmc-7
/dev/sdc              ocfs2  Not mounted
/dev/sdd              ocfs2  Not mounted
root@vmc-7:~#
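The `seq` column of the `hb` output above can be used as a rough cross-check of the clock difference as the shared disk sees it. The sketch below assumes that the o2hb heartbeat code stores each node's wall-clock time (seconds since the Unix epoch) in `seq`, and that node 1 is vmc-4 (only vmc-7's `number=2` is given above); both are assumptions, not something stated in this report. `date -d @SECONDS` is GNU date syntax.

```shell
#!/bin/sh
# Decode the heartbeat "seq" values from the debugfs.ocfs2 "hb" output above.
# Assumption: seq is the writing node's wall-clock time in seconds since the
# Unix epoch, printed in hexadecimal.
seq_node1=0x000000004fe7c52e   # node 1 (assumed to be vmc-4)
seq_node2=0x000000004fe64998   # node 2 (vmc-7, added with number=2)

# Shell arithmetic expansion accepts 0x-prefixed hexadecimal constants.
t1=$(( $seq_node1 ))
t2=$(( $seq_node2 ))

# Print both timestamps in UTC (CST in the date outputs above is UTC+8).
echo "node 1 heartbeat time: $(date -u -d "@$t1")"
echo "node 2 heartbeat time: $(date -u -d "@$t2")"

skew=$(( t1 - t2 ))
echo "difference: $skew s ($(( skew / 3600 )) h $(( skew % 3600 / 60 )) min)"
```

With the values above, the difference is 97174 s (26 h 59 min). The decoded times, about 01:56 UTC on Jun 25 and 22:56 UTC on Jun 23 (09:56 and 06:56 CST), line up with the `date` outputs of vmc-4 and vmc-7.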

At the same time, the file list on vmc-4 with debugfs.ocfs2 was:

root@vmc-4:~# mount | grep ocfs
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sdc on /vms/target3 type ocfs2 (rw,_netdev,heartbeat=local)
root@vmc-4:~# debugfs.ocfs2 /dev/sdc
debugfs.ocfs2 1.6.3
debugfs: ls
        513             16   1    2  .
        513             16   2    2  ..
        1412353         24   10   2  lost+found
        1412354         16   3    1  111
        1412355         16   4    1  abbb
        1412356         20   7    1  BeiJing
        1412357         24   9    2  caibackup
        1412486         3764 8    1  caiWinXP
debugfs: quit

root@vmc-4:~# ls -al /vms/target3/
total 64238596
drwxr-xr-x  4 root root        3896 Jun 25 09:50 .
drwxr-xr-x 15 root root        4096 Jun 25 09:39 ..
-rw-------  1 root root     6291968 Jun 25 09:40 111
-rw-------  1 root root     6291968 Jun 25 09:40 abbb
-rw-------  1 root root    12582912 Jun 25 09:40 BeiJing
drwxr-xr-x  8 root root        3896 Jun 25 09:40 caibackup
-rw-------  1 root root  3495952384 Jun 25 09:41 caiWinXP
-rwxr--r--  1 root root 21550333952 Jun 25 09:45 convirture_ev
-rw-r--r--  1 root root 17825792000 Jun 25 09:48 convirture_ev-clone0
-rwxr--r--  1 root root  6744440832 Jun 25 09:49 convirture_ev-clone0-clone0
-rwxr--r--  1 root root  2153775104 Jun 25 09:49 convirture_ev-clone1
-rw-------  1 root root  1786773504 Jun 25 09:50 DEBIAN_WXS
-rw-r--r--  1 root root 12195987456 Jun 25 09:52 dvsc.img
drwxr-xr-x  2 root root        3896 Jun 25 09:37 lost+found
root@vmc-4:~#

Why are there so many files on vmc-4, but so few files on vmc-7 (as shown by debugfs.ocfs2) at the same time?

After I unmounted the device on vmc-7 and mounted it again, vmc-7 panicked because the mount timed out. The kernel log is below:
Jun 25 06:03:56 vmc-7 kernel: [ 9226.663878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 25 06:03:56 vmc-7 kernel: [ 9226.675662] mount.ocfs2     D ffffffff81806240     0  1214   1213 0x00000000
Jun 25 06:03:56 vmc-7 kernel: [ 9226.675668]  ffff880212529838 0000000000000082 ffff8802125297f8 ffffffffa04f156b

When I rebooted vmc-4, it panicked as well.

Then I adjusted the time on vmc-7 so that it differed from vmc-4 by only about 10 seconds, and rebooted both nodes.
After that, the OCFS2 cluster worked fine: the file lists on both nodes matched, and mounting succeeded too.

This scenario is reproducible with two nodes whose clocks differ by a large amount; I tested it several times and it happened every time.

My questions are:
Is this caused by the large time difference (more than 24 hours) between vmc-4 and vmc-7?
What is the maximum allowed time difference between nodes of an OCFS2 cluster?
What are the side effects of clock differences between nodes of an OCFS2 cluster, and should all nodes use NTP to keep the same time?

Thanks

-------------------------------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!