[Ocfs2-users] re: o2hb_do_disk_heartbeat:963 ERROR: Device "sdb1" another node is heartbeating in our slot!

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Fri Mar 16 13:35:03 PDT 2007


Btw, upgrade the kernel to #283; 282 had a serious bug in OCFSv2 (relating to
simultaneous appends to a file).

Another thing - try to keep the OCR and CSS files off OCFSv2. The reason is
that by keeping CRS files on OCFS you de facto make one cluster (CRS) depend
on another (OCFS), which can influence CRS decisions in a fault situation.

(It's usually simple to create 2 more partitions or LUNs for the OCRFile and
CSSFile - 102MB and 22MB respectively).

As for your case - these experiments really could have broken the heartbeat
(did you allow access to the same disks from these new experimental
servers?)
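
To see which nodes the configuration actually lists before removing anything,
here is a minimal sketch in Python. It assumes the usual stanza layout of
/etc/ocfs2/cluster.conf (a "node:" or "cluster:" header followed by indented
"key = value" lines). The sample text is made up: the node names are taken
from the listing in your message, but the addresses, port and node numbers are
placeholders, and parse_stanzas / audit_nodes are hypothetical helper names,
not part of the ocfs2 tools.

```python
from collections import Counter

# Node names as listed under /config/cluster/ocfs2/node/ in the quoted message.
NODE_NAMES = ["dbo1", "dbo2", "dbo3", "dbo4", "dbt3", "dbt4"]

# Hypothetical cluster.conf text: IP addresses, port and node numbers are
# invented purely for illustration.
SAMPLE_CONF = "".join(
    f"node:\n\tip_port = 7777\n\tip_address = 10.0.0.{i + 1}\n"
    f"\tnumber = {i}\n\tname = {name}\n\tcluster = ocfs2\n\n"
    for i, name in enumerate(NODE_NAMES)
) + "cluster:\n\tnode_count = 6\n\tname = ocfs2\n"


def parse_stanzas(text):
    """Split the text into stanzas: a 'header:' line plus key = value lines."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Start of a new stanza such as "node:" or "cluster:".
            current = {"_type": line[:-1]}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


def audit_nodes(text, expected):
    """Return (names not in `expected`, node numbers that appear twice)."""
    nodes = [s for s in parse_stanzas(text) if s["_type"] == "node"]
    unexpected = sorted(n["name"] for n in nodes if n["name"] not in expected)
    counts = Counter(n["number"] for n in nodes)
    duplicate_numbers = sorted(num for num, c in counts.items() if c > 1)
    return unexpected, duplicate_numbers


unexpected, duplicate_numbers = audit_nodes(SAMPLE_CONF, {"dbo1", "dbo2", "dbo3"})
print("nodes not expected in the cluster:", unexpected)
print("node numbers used more than once:", duplicate_numbers)
```

The sketch only reads and reports; it does not change the file or the running
cluster. Run it against a copy of the real file from each machine, the
experimental ones included, and compare the results.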


----- Original Message ----- 
From: "Peter Santos" <psantos at cheetahmail.com>
To: <ocfs2-users at oss.oracle.com>
Sent: Friday, March 16, 2007 1:04 PM
Subject: [Ocfs2-users] re: o2hb_do_disk_heartbeat:963 ERROR: Device "sdb1"
another node is heartbeating in our slot!


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Folks,
>
> I'm trying to wrap my head around something that happened in our
> environment.
> Basically, we noticed the error in /var/log/messages with no other errors.
>
> "Mar 16 13:38:02 dbo3 kernel: (3712,3):o2hb_do_disk_heartbeat:963 ERROR: Device "sdb1": another node is heartbeating in our slot!"
> Usually there are a number of other errors, but this one was it.
>
> Our RAC cluster is made up of 3 nodes (dbo1,dbo2,dbo3) and they use ocfs2
> for the ocr /voting file, but ASM is where the datafiles are located.
> This is suse9 kernel 282.
>
>
> A while back one of our SA's was trying to install ocfs2 on a couple of
> red-hat machines, and didn't properly configure ocfs2 to add the nodes.
> I believe he just copied directories and the /etc/ocfs2/cluster.conf file.
> Anyway, when he turned the machines on today, they were still
> mis configured and I believe that is the cause of the error message
> "another node is heartbeating in our slot" message? would you agree ?
>
> As I mentioned there are only 3 nodes in our cluster, but the
> /etc/cluster.conf file shows 6 and so does the following:
> oracle at dbo1:/etc/ocfs2> ls /config/cluster/ocfs2/node/
> dbo1  dbo2  dbo3  dbo4  dbt3  dbt4
>
> So my question, is how do I permanently remove dbt3, dbt4 and dbo4 ? I
> checked out the ocfs2 guide, but it only has information on adding a node
> to both an online/offline cluster.
>
>
> More importantly is how the oracle clusterware behaved.  After this
> happened, my ASM and RDBMS instances stayed up. None of the machines
> rebooted. But the CRS deamon appears to be having issues.
>
> When I run "crsctl check crs" on all 3 nodes, I get the error "Cannot
> communicate with CRS" on all 3 nodes.
> The cssd log directory has a core file .. yet I can log into all 3
> database instances as if nothing happened.
>
> I suspect this is a bug?
>
> The CRSD log files reveal some sort of issue relating to problems writing
> to the ocr file ..which is on ocfs2. But if there really was a problem,
> wouldn't ocfs2 have rebooted the machine? And when RAC has a problem
> accessing the ocfs2 volume, there are usually a large number of io errors
> in the system log
>
>
> Any insight is greatly appreciated.
>
> - -peter
>
>
> alertdbo3.log
> =============
> 2007-03-16 13:38:25.471
> [crsd(4994)]CRS-1006:The OCR location /ocfs2/oracrs/ocr.crs is inaccessible. Details in /data/app/crs/oracle/product/10.2.0/crs/log/dbo3/crsd/crsd.log.
>
> 2007-03-16 13:38:43.377
> [client(13125)]CRS-1006:The OCR location /ocfs2/oracrs/ocr.crs is inaccessible. Details in /data/app/crs/oracle/product/10.2.0/crs/log/dbo3/client/css.log.
>
>
> crsd.log
> =============
> 2007-03-16 13:38:11.708: [  OCRCLI][1407371616]proac_set_value: Response message returned with failure keyname = [CRS.CUR.ora!ORACTAH!ORACTAH3!inst.REASON], retcode = 26
> 2007-03-16 13:38:11.710: [  OCRCLI][1417865568]proac_set_value: Response message returned with failure keyname = [CRS.CUR.ora!dbo3!LISTENER_DBO3!lsnr.REASON], retcode = 26
> 2007-03-16 13:38:24.159: [  OCRMSG][1407371616]prom_rpc: CLSC recv failure..ret code 7
> 2007-03-16 13:38:24.159: [  OCRMSG][1407371616]prom_rpc: possible OCR retry scenario
> 2007-03-16 13:38:24.159: [ COMMCRS][1417865568]clscsendx: (0xc80100) Physical connection (0xc7fa30) not active
>
> 2007-03-16 13:38:24.159: [  OCRMSG][1417865568]prom_rpc: CLSC send failure..ret code 11
> 2007-03-16 13:38:24.159: [  OCRMSG][1417865568]prom_rpc: possible OCR retry scenario
> 2007-03-16 13:38:25.036: [  OCRMAS][1182845280]th_master:13: I AM THE NEW OCR MASTER at incar 3. Node Number = 3
> 2007-03-16 13:38:25.046: [  OCRRAW][1182845280]proprioo: for disk 0 (/ocfs2/oracrs/ocr.crs), id match (1), my id set (1201294405,1028247821) total id sets (1), 1st set (1201294405,1028247821), 2nd set (0,0) my votes (2), total votes (2)
> 2007-03-16 13:38:25.102: [  OCRRAW][1182845280]rrecover:3: recovery required
> 2007-03-16 13:38:25.471: [  OCRRAW][1182845280]rtnode:3: invalid tnode 1085
> 2007-03-16 13:38:25.471: [  OCRRAW][1182845280]propropen:0: could not read tnode addrd=0
> 2007-03-16 13:38:25.471: [  OCRRAW][1182845280]proprseterror: Error in accessing physical storage [26] Marking context invalid.
> 2007-03-16 13:38:25.471: [  OCRUTL][1182845280]u_freem: INVALID PROU_BEGIN_MEMTAG for memory [99351708] Begin tag [99351170] Expected begin tag [5072426d]
> [  OCRMAS][1182845280]th_calc_av:8.1': Error reading key [SYSTEM.version.node_numbers.node3]
> 2007-03-16 13:38:25.471: [  OCRMAS][1182845280]th_master:9: Shutdown CacheMaster. prev AV [169869824] new calc av [169869824] my sv [169869824]
> 2007-03-16 13:38:39.932: [  CRSOCR][1438853472]0OCR api procr_open_key failed for key CRS.CUR. OCR error code = 3 OCR error msg:
> 2007-03-16 13:38:39.932: [  CRSOCR][1438853472][PANIC]0Failed to open key: CRS.CUR(File: caaocr.cpp, line: 472)
>
>
> * The cssd directory has a core file, but nothing in the ocssd.log file.
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFF+vg0oyy5QBCjoT0RAkemAJ9NSS2e9gndC62WErJlgr82aAwuZwCgjfk8
> xFtWactcUf2LcoUKLexmaPQ=
> =Av6M
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>



