[Ocfs2-users] unmounted volumes
Sunil Mushran
Sunil.Mushran at oracle.com
Thu Sep 27 09:49:43 PDT 2007
Yes, the dlm_join_domain call timed out. The other node's logs may have
more information.
Such issues are best diagnosed by filing a bugzilla with all the
message files attached.
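
For anyone gathering the message files, here is a minimal Python sketch that pulls the join timeouts and the mount/umount events out of a syslog so they can be lined up by timestamp. The patterns are copied from the kernel messages quoted below. It assumes each log entry is on one unwrapped line; the function name summarize and the SAMPLE list are only illustrative and are not part of ocfs2-tools.

```python
import re

# Patterns taken from the kernel messages quoted in this thread.
JOIN_TIMEOUT = re.compile(r"Timed out joining dlm domain (\w+) after (\d+) msecs")
DEVICE_EVENT = re.compile(r"ocfs2: (Unmounting|Mounting) device \((\d+),(\d+)\)")
TIMESTAMP = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) ")


def summarize(lines):
    """Return (timestamp, kind, detail) tuples for join timeouts and (un)mounts."""
    events = []
    for line in lines:
        stamp = TIMESTAMP.match(line)
        when = stamp.group(1) if stamp else "?"
        m = JOIN_TIMEOUT.search(line)
        if m:
            # detail is the dlm domain that could not be joined
            events.append((when, "join-timeout", m.group(1)))
            continue
        m = DEVICE_EVENT.search(line)
        if m:
            kind = "umount" if m.group(1) == "Unmounting" else "mount"
            # detail is the device as "major,minor"
            events.append((when, kind, f"{m.group(2)},{m.group(3)}"))
    return events


# Sample lines copied (unwrapped) from the CN2 log in this thread.
SAMPLE = [
    "Sep 21 10:33:28 CN2 kernel: (5504,1):dlm_join_domain:1304 Timed out joining dlm domain 967C5C3174B341A399FAF031B4D544FE after 90400 msecs",
    "Sep 21 10:33:28 CN2 kernel: ocfs2: Unmounting device (253,5) on (node 1)",
    "Sep 21 10:34:09 CN2 kernel: ocfs2: Mounting device (253,7) on (node 1, slot 1)",
]

if __name__ == "__main__":
    for when, kind, detail in summarize(SAMPLE):
        print(when, kind, detail)
```

Running the same function over the messages file from each node gives a short list per node that is easier to compare than the full logs.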
Charlie Sharkey wrote:
> Looking back through the log, it appears that these devices never mounted
> at boot. See the timestamps at 10:33:28 and 10:34:01 (timeout joining
> dlm domain, unmounting device).
>
> It looks as though these two volumes are the first to attempt to mount.
> After the entry at 10:34:05 (connected to node ...), the remaining
> volumes mount ok (10:34:09 .....).
>
> Is this a startup timing issue that can be corrected by adjusting one of
> the values in the o2cb file (O2CB_HEARTBEAT_THRESHOLD?)?
>
> Thank you,
> cs
>
> Sep 21 10:32:57 CN2 kernel: OCFS2 Node Manager 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)
> Sep 21 10:32:57 CN2 kernel: o2cb heartbeat: registered disk mode
> Sep 21 10:32:57 CN2 kernel: OCFS2 DLM 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)
> Sep 21 10:32:57 CN2 kernel: OCFS2 DLMFS 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)
> Sep 21 10:32:57 CN2 kernel: OCFS2 User DLM kernel interface loaded
> Sep 21 10:32:57 CN2 sshd[5482]: Server listening on 192.168.100.20 port 22.
> Sep 21 10:32:57 CN2 su: (to postgres) root on /dev/pts/4
> Sep 21 10:32:58 CN2 zmd: NetworkManagerModule (WARN): Failed to connect to NetworkManager
> Sep 21 10:32:59 CN2 zmd: Daemon (WARN): Not starting remote web server
> Sep 21 10:33:01 CN2 kernel: o2net: connected to node bustech-bu (num 0) at 192.168.200.10:7777
> Sep 21 10:33:05 CN2 kernel: OCFS2 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)
> Sep 21 10:33:28 CN2 kernel: (5504,1):dlm_join_domain:1304 Timed out joining dlm domain 967C5C3174B341A399FAF031B4D544FE after 90400 msecs
> Sep 21 10:33:28 CN2 kernel: ocfs2: Unmounting device (253,5) on (node 1)
> Sep 21 10:33:28 CN2 multipathd: dm-5: umount map (uevent)
> Sep 21 10:33:29 CN2 kernel: o2net: no longer connected to node bustech-bu (num 0) at 192.168.200.10:7777
> Sep 21 10:33:33 CN2 kernel: o2net: connected to node bustech-bu (num 0) at 192.168.200.10:7777
> Sep 21 10:34:01 CN2 kernel: (5712,3):dlm_join_domain:1304 Timed out joining dlm domain 655FEE13B3604FCE8E780BA2F525EB6A after 90400 msecs
> Sep 21 10:34:01 CN2 kernel: ocfs2: Unmounting device (253,9) on (node 1)
> Sep 21 10:34:01 CN2 multipathd: dm-9: umount map (uevent)
> Sep 21 10:34:01 CN2 kernel: o2net: no longer connected to node bustech-bu (num 0) at 192.168.200.10:7777
> Sep 21 10:34:05 CN2 kernel: o2net: connected to node bustech-bu (num 0) at 192.168.200.10:7777
> Sep 21 10:34:09 CN2 kernel: ocfs2_dlm: Nodes in domain ("78267BA75A624D80ABF564DCC3294FFE"): 0 1
> Sep 21 10:34:09 CN2 kernel: (5741,0):ocfs2_find_slot:261 slot 1 is already allocated to this node!
> Sep 21 10:34:09 CN2 kernel: (5741,0):ocfs2_check_volume:1654 File system was not unmounted cleanly, recovering volume.
> Sep 21 10:34:09 CN2 kernel: kjournald starting. Commit interval 5 seconds
> Sep 21 10:34:09 CN2 kernel: ocfs2: Mounting device (253,7) on (node 1, slot 1)
> Sep 21 10:34:14 CN2 kernel: ocfs2_dlm: Nodes in domain ("2A180DDE7D2A4D5A925DDB7714CF05BA"): 0 1
> Sep 21 10:34:14 CN2 kernel: (5752,0):ocfs2_find_slot:261 slot 1 is already allocated to this node!
> Sep 21 10:34:14 CN2 kernel: (5752,0):ocfs2_check_volume:1654 File system was not unmounted cleanly, recovering volume.
> Sep 21 10:34:14 CN2 kernel: kjournald starting. Commit interval 5 seconds
> Sep 21 10:34:14 CN2 kernel: ocfs2: Mounting device (253,6) on (node 1, slot 1)
> Sep 21 10:34:18 CN2 kernel: ocfs2_dlm: Nodes in domain ("1FDC6873AA0A43E8872EDB64F2930C0E"): 0 1
> Sep 21 10:34:18 CN2 kernel: (5763,0):ocfs2_find_slot:261 slot 1 is already allocated to this node!
> Sep 21 10:34:18 CN2 kernel: (5763,0):ocfs2_check_volume:1654 File system was not unmounted cleanly, recovering volume.
> Sep 21 10:34:18 CN2 kernel: kjournald starting. Commit interval 5 seconds
> Sep 21 10:34:18 CN2 kernel: ocfs2: Mounting device (253,8) on (node 1, slot 1)
>
>
>
> -----Original Message-----
> From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com]
> Sent: Wednesday, September 26, 2007 2:00 PM
> To: Charlie Sharkey
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] unmounted volumes
>
> Automatic umount?
>
> The messages do not indicate a umount. How did you detect that the
> volumes were umounted?
>
> That is, did you only check "mount -t ocfs2", or did you also run "cat
> /proc/mounts"?
>
> When ocfs2 umounts, it prints the umount message in syslog. I don't see
> that message.
>
> Charlie Sharkey wrote:
>
>> I have seen a problem on a two node system where some (but not all) of
>> the ocfs2 volumes on node 2 became unmounted. Correcting the problem
>> only required issuing the mount command, but I am curious if anyone
>> has an explanation of what may have happened. One thing I should
>> mention is that the hostname of node 1 was changed about a week ago. At
>> that time corrections were made to the cluster.conf and hosts files of
>> both nodes, and both machines were rebooted.
>>
>> Any ideas?
>>
>> System Info:
>> -------------
>>
>> SuSe Sles10 SP1 2.6.16.46-0.12-smp
>>
>> /proc/fs/ocfs2/version
>> OCFS2 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)
>>
>> rpm -qa | grep ocfs2
>> ocfs2-tools-1.2.3-0.7
>> ocfs2console-1.2.3-0.7
>>
>> /etc/sysconfig/o2cb
>> #
>> # This is a configuration file for automatic startup of the O2CB
>> # driver. It is generated by running /etc/init.d/o2cb configure.
>> # Please use that method to modify this file
>> #
>>
>> # O2CB_ENABELED: 'true' means to load the driver on boot.
>> O2CB_ENABLED=true
>>
>> # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
>> O2CB_BOOTCLUSTER=ocfs2
>>
>> # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
>> #O2CB_HEARTBEAT_THRESHOLD=240 bti changed, 240 was default
>> O2CB_HEARTBEAT_THRESHOLD=90
>>
>> # O2CB_HEARTBEAT_MODE: Whether to use the native "kernel" or the "user"
>> # driven heartbeat (for example, for integration with heartbeat 2.0.x)
>> O2CB_HEARTBEAT_MODE="kernel"
>>
>> # O2CB_IDLE_TIMEOUT_MS: The time frame in which all cluster memberships
>> # for a given ocfs2 filesystem must be configured on all nodes.
>> O2CB_IDLE_TIMEOUT_MS=120000
>>
>> # O2CB_RECONNECT_DELAY_MS: How long to wait for the other nodes to recognise us
>> O2CB_RECONNECT_DELAY_MS=2000
>>
>> # O2CB_KEEPALIVE_DELAY_MS: How often to remind our peers that we are alive
>> O2CB_KEEPALIVE_DELAY_MS=5000
>>
>>
>> Messages from CN2: (this is the node that had the unmounted volumes)
>>
>> Sep 25 05:45:01 CN2 run-crons[22005]: time.cron returned 1
>> Sep 25 06:41:58 CN2 syslog-ng[3969]: STATS: dropped 0
>> Sep 25 06:45:01 CN2 run-crons[23956]: time.cron returned 1
>> Sep 25 07:41:58 CN2 syslog-ng[3969]: STATS: dropped 0
>> Sep 25 07:45:01 CN2 run-crons[25893]: time.cron returned 1
>> Sep 25 08:41:59 CN2 syslog-ng[3969]: STATS: dropped 0
>> Sep 25 08:45:01 CN2 run-crons[27833]: time.cron returned 1
>> Sep 25 09:42:00 CN2 syslog-ng[3969]: STATS: dropped 0
>> Sep 25 09:45:01 CN2 run-crons[29770]: time.cron returned 1
>> Sep 25 09:58:13 CN2 zmd: NetworkManagerModule (WARN): Failed to connect to NetworkManager
>> Sep 25 09:58:19 CN2 zmd: Daemon (WARN): Not starting remote web server
>> Sep 25 10:40:33 CN2 sshd[31688]: Accepted publickey for root from 192.168.100.3 port 48567 ssh2
>> Sep 25 10:42:00 CN2 syslog-ng[3969]: STATS: dropped 0
>> Sep 25 10:43:25 CN2 sshd[31798]: Accepted publickey for root from 192.168.100.3 port 48569 ssh2
>> Sep 25 10:43:54 CN2 kernel: ocfs2_dlm: Nodes in domain ("967C5C3174B341A399FAF031B4D544FE"): 0 1
>> Sep 25 10:43:54 CN2 kernel: (31849,1):ocfs2_find_slot:261 slot 1 is already allocated to this node!
>> Sep 25 10:43:54 CN2 kernel: (31849,1):ocfs2_check_volume:1654 File system was not unmounted cleanly, recovering volume.
>> Sep 25 10:43:54 CN2 kernel: kjournald starting. Commit interval 5 seconds
>> Sep 25 10:43:54 CN2 kernel: ocfs2: Mounting device (253,5) on (node 1, slot 1)
>> Sep 25 10:43:58 CN2 kernel: ocfs2_dlm: Nodes in domain ("655FEE13B3604FCE8E780BA2F525EB6A"): 0 1
>> Sep 25 10:43:58 CN2 kernel: (31860,1):ocfs2_find_slot:261 slot 1 is already allocated to this node!
>> Sep 25 10:43:58 CN2 kernel: (31860,1):ocfs2_check_volume:1654 File system was not unmounted cleanly, recovering volume.
>> Sep 25 10:43:59 CN2 kernel: kjournald starting. Commit interval 5 seconds
>> Sep 25 10:43:59 CN2 kernel: ocfs2: Mounting device (253,9) on (node 1, slot 1)
>> Sep 25 10:45:01 CN2 run-crons[31894]: time.cron returned 1
>>
>>
>> Messages from CN1: (this is the node whose hostname was changed, from
>> CN1 to bustech-bu)
>>
>> Sep 25 05:45:10 bustech-bu su: (to nobody) root on none
>> Sep 25 06:30:01 bustech-bu run-crons[31122]: time.cron returned 1
>> Sep 25 06:42:38 bustech-bu syslog-ng[3901]: STATS: dropped 0
>> Sep 25 07:30:01 bustech-bu run-crons[597]: time.cron returned 1
>> Sep 25 07:42:38 bustech-bu syslog-ng[3901]: STATS: dropped 0
>> Sep 25 07:55:00 bustech-bu kernel: (6823,3):ocfs2_broadcast_vote:725 ERROR: status = -92
>> Sep 25 07:55:00 bustech-bu kernel: (6823,3):ocfs2_do_request_vote:798 ERROR: status = -92
>> Sep 25 07:55:00 bustech-bu kernel: (6823,3):ocfs2_rename:1099 ERROR: status = -92
>> The above three messages continue over and over again.
>> Sep 25 07:55:10 bustech-bu kernel: (6823,0):ocfs2_broadcast_vote:725 ERROR: status = -92
>> Sep 25 07:55:10 bustech-bu kernel: (6823,0):ocfs2_do_request_vote:798 ERROR: status = -92
>> Sep 25 07:55:10 bustech-bu kernel: (6823,0):ocfs2_rename:1099 ERROR: status = -92
>> Sep 25 08:30:01 bustech-bu run-crons[2562]: time.cron returned 1
>> Sep 25 08:42:38 bustech-bu syslog-ng[3901]: STATS: dropped 7647
>> Sep 25 09:30:01 bustech-bu run-crons[4528]: time.cron returned 1
>> Sep 25 09:42:39 bustech-bu syslog-ng[3901]: STATS: dropped 0
>> Sep 25 10:30:01 bustech-bu run-crons[6600]: time.cron returned 1
>> Sep 25 10:40:55 bustech-bu sshd[7010]: Accepted publickey for root from 192.168.100.3 port 42585 ssh2
>> Sep 25 10:42:39 bustech-bu syslog-ng[3901]: STATS: dropped 0
>> Sep 25 10:43:54 bustech-bu kernel: ocfs2_dlm: Node 1 joins domain 967C5C3174B341A399FAF031B4D544FE
>> Sep 25 10:43:54 bustech-bu kernel: ocfs2_dlm: Nodes in domain ("967C5C3174B341A399FAF031B4D544FE"): 0 1
>> Sep 25 10:43:58 bustech-bu kernel: ocfs2_dlm: Node 1 joins domain 655FEE13B3604FCE8E780BA2F525EB6A
>> Sep 25 10:43:58 bustech-bu kernel: ocfs2_dlm: Nodes in domain ("655FEE13B3604FCE8E780BA2F525EB6A"): 0 1
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>>