[Ocfs2-users] unmounted volumes

Sunil Mushran Sunil.Mushran at oracle.com
Wed Sep 26 11:00:01 PDT 2007


Automatic umount?

The messages do not indicate a umount. How did you detect that the volumes
were umounted?

As in, did you only run "mount -t ocfs2", or did you also check "cat /proc/mounts"?
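
The difference matters because mount typically reports from /etc/mtab, which can go stale, while /proc/mounts is the kernel's own view. A minimal sketch of that check, assuming the usual /proc/mounts field order (device, mount point, filesystem type, options); the function name is made up for illustration:

```python
def ocfs2_mounts(proc_mounts_text):
    """Return (device, mount_point) pairs for every ocfs2 entry."""
    found = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Assumed field order: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "ocfs2":
            found.append((fields[0], fields[1]))
    return found
```

Pass it the contents of /proc/mounts on the affected node; an empty list would mean the kernel has no ocfs2 volume mounted.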

When ocfs2 umounts, it prints the umount message in syslog. I don't
see that message.
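
One way to look for it is to filter syslog for the ocfs2 mount and umount lines. A rough sketch: the "Mounting device" wording is taken from the logs quoted below, while the "Unmounting device" wording is an assumption about the umount message and may differ between releases. The function name is hypothetical.

```python
import re

# "Mounting device" appears in the quoted logs; "Unmounting device" is an
# assumed wording for the umount message and should be checked per release.
EVENT_RE = re.compile(
    r"kernel: ocfs2: (Mounting|Unmounting) device \((\d+),(\d+)\)"
)


def ocfs2_mount_events(syslog_text):
    """Return (event, major, minor) tuples in the order they were logged."""
    events = []
    for line in syslog_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return events
```

If a device shows "Mounting" entries with no "Unmounting" entry in between, the volume was most likely not umounted through the normal path.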

Charlie Sharkey wrote:
>  
> I have seen a problem on a two-node system where some (but not all) of
> the ocfs2 volumes on node 2 became unmounted. Correcting the problem only
> required issuing the mount command, but I am curious whether anyone has an
> explanation of what may have happened. One thing I should mention is that
> the hostname of node 1 was changed about a week ago. At that time,
> corrections were made to the cluster.conf and hosts files on both nodes,
> and both machines were rebooted.
>
> Any ideas?
>  
> System Info:
> -------------
>  
> SuSe Sles10 SP1    2.6.16.46-0.12-smp
>  
> /proc/fs/ocfs2/version
> OCFS2 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)
>
> rpm -qa | grep ocfs2
> ocfs2-tools-1.2.3-0.7
> ocfs2console-1.2.3-0.7
>
> /etc/sysconfig/o2cb
> #
> # This is a configuration file for automatic startup of the O2CB
> # driver. It is generated by running /etc/init.d/o2cb configure.
> # Please use that method to modify this file
> #
> # O2CB_ENABELED: 'true' means to load the driver on boot.
> O2CB_ENABLED=true
>
> # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
> O2CB_BOOTCLUSTER=ocfs2
>
> # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
>
> #O2CB_HEARTBEAT_THRESHOLD=240 bti changed, 240 was default
> O2CB_HEARTBEAT_THRESHOLD=90
>
> # O2CB_HEARTBEAT_MODE: Whether to use the native "kernel" or the "user"
> # driven heartbeat (for example, for integration with heartbeat 2.0.x)
>
> O2CB_HEARTBEAT_MODE="kernel"
>
> # O2CB_IDLE_TIMEOUT_MS: The time frame in which all cluster memberships
> # for a given ocfs2 filesystem must be configured on all nodes.
> O2CB_IDLE_TIMEOUT_MS=120000
>
> # O2CB_RECONNECT_DELAY_MS: How long to wait for the other nodes to recognise us
> O2CB_RECONNECT_DELAY_MS=2000
>
> # O2CB_KEEPALIVE_DELAY_MS: How often to remind our peers that we are alive
> O2CB_KEEPALIVE_DELAY_MS=5000
>
>
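For reference, the O2CB documentation relates the heartbeat threshold to a timeout as threshold = (timeout in seconds / 2) + 1, that is, one heartbeat iteration every 2 seconds. A small sketch, assuming that formula applies to this release; the helper names are illustrative:

```python
# Assumed: one disk heartbeat iteration every 2 seconds.
HEARTBEAT_INTERVAL_SECS = 2


def heartbeat_timeout_secs(threshold):
    """Seconds of missed heartbeats before a node is considered dead."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for_timeout(timeout_secs):
    """Inverse mapping: threshold needed for a given timeout in seconds."""
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1
```

Under that assumption, O2CB_HEARTBEAT_THRESHOLD=90 works out to (90 - 1) * 2 = 178 seconds.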
> Messages from CN2:   (this is the node that had the unmounted volumes)
>
> Sep 25 05:45:01 CN2 run-crons[22005]: time.cron returned 1
> Sep 25 06:41:58 CN2 syslog-ng[3969]: STATS: dropped 0
> Sep 25 06:45:01 CN2 run-crons[23956]: time.cron returned 1
> Sep 25 07:41:58 CN2 syslog-ng[3969]: STATS: dropped 0
> Sep 25 07:45:01 CN2 run-crons[25893]: time.cron returned 1
> Sep 25 08:41:59 CN2 syslog-ng[3969]: STATS: dropped 0
> Sep 25 08:45:01 CN2 run-crons[27833]: time.cron returned 1
> Sep 25 09:42:00 CN2 syslog-ng[3969]: STATS: dropped 0
> Sep 25 09:45:01 CN2 run-crons[29770]: time.cron returned 1
> Sep 25 09:58:13 CN2 zmd: NetworkManagerModule (WARN): Failed to connect
> to NetworkManager
> Sep 25 09:58:19 CN2 zmd: Daemon (WARN): Not starting remote web server
> Sep 25 10:40:33 CN2 sshd[31688]: Accepted publickey for root from
> 192.168.100.3 port 48567 ssh2
> Sep 25 10:42:00 CN2 syslog-ng[3969]: STATS: dropped 0
> Sep 25 10:43:25 CN2 sshd[31798]: Accepted publickey for root from
> 192.168.100.3 port 48569 ssh2
> Sep 25 10:43:54 CN2 kernel: ocfs2_dlm: Nodes in domain
> ("967C5C3174B341A399FAF031B4D544FE"): 0 1 
> Sep 25 10:43:54 CN2 kernel: (31849,1):ocfs2_find_slot:261 slot 1 is
> already allocated to this node!
> Sep 25 10:43:54 CN2 kernel: (31849,1):ocfs2_check_volume:1654 File
> system was not unmounted cleanly, recovering volume.
> Sep 25 10:43:54 CN2 kernel: kjournald starting.  Commit interval 5
> seconds
> Sep 25 10:43:54 CN2 kernel: ocfs2: Mounting device (253,5) on (node 1,
> slot 1)
> Sep 25 10:43:58 CN2 kernel: ocfs2_dlm: Nodes in domain
> ("655FEE13B3604FCE8E780BA2F525EB6A"): 0 1 
> Sep 25 10:43:58 CN2 kernel: (31860,1):ocfs2_find_slot:261 slot 1 is
> already allocated to this node!
> Sep 25 10:43:58 CN2 kernel: (31860,1):ocfs2_check_volume:1654 File
> system was not unmounted cleanly, recovering volume.
> Sep 25 10:43:59 CN2 kernel: kjournald starting.  Commit interval 5
> seconds
> Sep 25 10:43:59 CN2 kernel: ocfs2: Mounting device (253,9) on (node 1,
> slot 1)
> Sep 25 10:45:01 CN2 run-crons[31894]: time.cron returned 1
>
>
> Messages from CN1: (this is the node whose hostname was changed, from
> CN1 to bustech-bu)
>
> Sep 25 05:45:10 bustech-bu su: (to nobody) root on none
> Sep 25 06:30:01 bustech-bu run-crons[31122]: time.cron returned 1
> Sep 25 06:42:38 bustech-bu syslog-ng[3901]: STATS: dropped 0
> Sep 25 07:30:01 bustech-bu run-crons[597]: time.cron returned 1
> Sep 25 07:42:38 bustech-bu syslog-ng[3901]: STATS: dropped 0
> Sep 25 07:55:00 bustech-bu kernel: (6823,3):ocfs2_broadcast_vote:725
> ERROR: status = -92
> Sep 25 07:55:00 bustech-bu kernel: (6823,3):ocfs2_do_request_vote:798
> ERROR: status = -92
> Sep 25 07:55:00 bustech-bu kernel: (6823,3):ocfs2_rename:1099 ERROR:
> status = -92
>      The above three messages continue over and over again.
> Sep 25 07:55:10 bustech-bu kernel: (6823,0):ocfs2_broadcast_vote:725
> ERROR: status = -92
> Sep 25 07:55:10 bustech-bu kernel: (6823,0):ocfs2_do_request_vote:798
> ERROR: status = -92
> Sep 25 07:55:10 bustech-bu kernel: (6823,0):ocfs2_rename:1099 ERROR:
> status = -92
> Sep 25 08:30:01 bustech-bu run-crons[2562]: time.cron returned 1
> Sep 25 08:42:38 bustech-bu syslog-ng[3901]: STATS: dropped 7647
> Sep 25 09:30:01 bustech-bu run-crons[4528]: time.cron returned 1
> Sep 25 09:42:39 bustech-bu syslog-ng[3901]: STATS: dropped 0
> Sep 25 10:30:01 bustech-bu run-crons[6600]: time.cron returned 1
> Sep 25 10:40:55 bustech-bu sshd[7010]: Accepted publickey for root from
> 192.168.100.3 port 42585 ssh2
> Sep 25 10:42:39 bustech-bu syslog-ng[3901]: STATS: dropped 0
> Sep 25 10:43:54 bustech-bu kernel: ocfs2_dlm: Node 1 joins domain
> 967C5C3174B341A399FAF031B4D544FE
> Sep 25 10:43:54 bustech-bu kernel: ocfs2_dlm: Nodes in domain
> ("967C5C3174B341A399FAF031B4D544FE"): 0 1 
> Sep 25 10:43:58 bustech-bu kernel: ocfs2_dlm: Node 1 joins domain
> 655FEE13B3604FCE8E780BA2F525EB6A
> Sep 25 10:43:58 bustech-bu kernel: ocfs2_dlm: Nodes in domain
> ("655FEE13B3604FCE8E780BA2F525EB6A"): 0 1 
>



