[Ocfs2-tools-devel] [PATCH] ocfs2_controld: Fail-fast when leaving a mountgroup unexpectedly.
Joel Becker
Joel.Becker at oracle.com
Fri Aug 22 16:42:39 PDT 2008
When ocfs2_controld leaves a group unexpectedly (a notification from cpg,
or an unclean exit), it currently sends node down events to the filesystem.
This doesn't work, because the filesystem BUG()s when given the local
node number.

A better solution is to fail fast: exit as soon as we discover we are
leaving a live group unexpectedly. If the group isn't live, clean up
our state and continue.
Signed-off-by: Joel Becker <joel.becker at oracle.com>
---
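Not part of the commit message: the decision described above can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the ocfs2_controld code. The names classify_leave(), fail_fast(), handle_unexpected_leave() and the group_is_live flag are hypothetical stand-ins for finish_leave() and its mg->mg_group check, and stderr stands in for log_error().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical outcome of an unexpected leave (illustration only). */
enum leave_action { LEAVE_CLEANUP, LEAVE_FAIL_FAST };

/* A live group means we cannot safely continue; otherwise just clean up. */
enum leave_action classify_leave(int group_is_live)
{
	return group_is_live ? LEAVE_FAIL_FAST : LEAVE_CLEANUP;
}

/* Log, pause 10ms so a network syslogd can receive the message, then exit. */
void fail_fast(const char *uuid)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 10000000 };

	fprintf(stderr, "Group %s is live, exiting\n", uuid);
	nanosleep(&ts, NULL);
	_exit(1);	/* skips atexit handlers and stdio flushing */
}

/* Returns LEAVE_CLEANUP when the caller should tear down state and go on. */
enum leave_action handle_unexpected_leave(const char *uuid, int group_is_live)
{
	assert(uuid != NULL);
	fprintf(stderr, "Unexpected leave of group %s\n", uuid);

	if (classify_leave(group_is_live) == LEAVE_FAIL_FAST)
		fail_fast(uuid);	/* does not return */

	fprintf(stderr, "No mg_group for group %s\n", uuid);
	return LEAVE_CLEANUP;
}
```

With a zero flag, handle_unexpected_leave() logs and returns LEAVE_CLEANUP so the caller can release its state. With a nonzero flag, the process exits with status 1 after the short pause.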
ocfs2_controld/mount.c | 42 ++++++++++++++++++++++++++++--------------
1 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/ocfs2_controld/mount.c b/ocfs2_controld/mount.c
index 0bf9ed0..11c3e08 100644
--- a/ocfs2_controld/mount.c
+++ b/ocfs2_controld/mount.c
@@ -17,6 +17,7 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <limits.h>
+#include <unistd.h>
 #include <inttypes.h>
 #include <errno.h>
 #include <syslog.h>
@@ -437,18 +438,11 @@ static void mount_node_down(int nodeid, void *user_data)
 	dlmcontrol_node_down(mg->mg_uuid, nodeid);
 }
 
-static void force_node_down(int nodeid, void *user_data)
-{
-	struct mountgroup *mg = user_data;
-
-	log_error("Forcing node %d down in group %s", nodeid, mg->mg_uuid);
-	mount_node_down(nodeid, mg);
-}
-
 static void finish_leave(struct mountgroup *mg)
 {
 	struct list_head *p, *n;
 	struct service *ms;
+	struct timespec ts;
 
 	if (list_empty(&mg->mg_services) &&
 	    mg->mg_ms_in_progress) {
@@ -463,13 +457,33 @@ static void finish_leave(struct mountgroup *mg)
 		goto out;
 	}
 
-	/* This leave is unexpected */
-
+	/*
+	 * This leave is unexpected. If we weren't part of the group, we
+	 * just cleanup our state. However, if we were part of a group, we
+	 * cannot safely continue and must die. Fail-fast allows other
+	 * nodes to make a decision about us.
+	 */
 	log_error("Unexpected leave of group %s", mg->mg_uuid);
-	if (mg->mg_group)
-		for_each_node(mg->mg_group, force_node_down, mg);
-	else
-		log_error("No mg_group for group %s", mg->mg_uuid);
+
+
+	if (mg->mg_group) {
+		log_error("Group %s is live, exiting", mg->mg_uuid);
+
+		/*
+		 * The _exit(2) may cause a reboot, and we want the errors
+		 * to hit syslogd(8). We can't call sync(2) which might
+		 * sleep on an ocfs2 operation. I'd say sleeping for 10ms
+		 * is a good compromise. Local syslogd(8) won't have time
+		 * to write to disk, but a network syslogd(8) should get
+		 * the data.
+		 */
+		ts.tv_sec = 0;
+		ts.tv_nsec = 10000000;
+		nanosleep(&ts, NULL);
+		_exit(1);
+	}
+
+	log_error("No mg_group for group %s", mg->mg_uuid);
 
 	list_for_each_safe(p, n, &mg->mg_services) {
 		ms = list_entry(p, struct service, ms_list);
--
1.5.6.3
--
"Heav'n hath no rage like love to hatred turn'd, nor Hell a fury,
like a woman scorn'd."
- William Congreve
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127