[Ocfs2-users] Reservation conflicts

Antonis Kopsaftis akops at edu.teiath.gr
Wed Dec 15 11:10:00 PST 2010


We have been using RDM in our (production) setup for almost 15 months and
it works without problems.
Initially we used ocfs2, but last month we switched to gfs2, as Oracle
decided to stop fixing bugs on the Red Hat 5.x OS.

We have also tried VMware disk sharing in a lab environment (with gfs2),
and we have not experienced any problems. As far as I know, VMware disk
sharing is supposed to be a production-quality feature, according to
VMware.

akops

On 15/12/2010 8:17, Sunil Mushran wrote:
> Ideally the SCSI reservation error should be trapped by the hypervisor/mgmt
> domain and should not bubble up to the guest. That is, if VMFS is doing the
> reservation. Have you looked into the logs on all machines? See if there
> is a way to get VMFS to log that info.
>
> As far as RDMs go, that's how I believe people use them. But you'll
> have to
> get confirmation from actual VMware users.
>
> On 12/15/2010 09:59 AM, brad hancock wrote:
>> We have never used RDM in the past, due to backup reasons and so that the
>> VM admins don't have to deal with the SAN admins. Do you think RDM would
>> resolve the issue?
>>  
>>
>>
>> On Tue, Dec 14, 2010 at 3:25 PM, Sunil Mushran
>> <sunil.mushran at oracle.com <mailto:sunil.mushran at oracle.com>> wrote:
>>
>>     I meant: does it repeat for 60 secs at a stretch? If not, and it
>>     seems it does not, then the messages
>>     should be only annoying.
>>
>>     VMFS uses SCSI reservations to perform disk-based locking. See if
>>     they have
>>     some logging in ESX that shows when VMFS performs a reserve/release
>>     on a SCSI device. You'll have to look at the logs of all nodes.
>>     As in, that log
>>     will be on a different node than the one that got this error.
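A minimal sketch of that cross-node check, assuming copies of each guest's
/var/log/messages have been pulled to one place first (the script, its
file-name arguments and Python itself are my assumptions, not anything
ESX or VMFS provides):

    #!/usr/bin/env python
    # Hypothetical helper: merge "reservation conflict" lines from syslog
    # copies taken from every node, sorted by timestamp, so conflicts on
    # one node can be lined up against activity on the others.
    # Assumes standard syslog lines ("Dec 14 07:37:52 host kernel: ...").
    import re
    import sys
    from datetime import datetime

    PATTERN = re.compile(r'reservation conflict', re.IGNORECASE)

    def parse_ts(line):
        # Syslog timestamps carry no year; assumes all logs are from one year.
        return datetime.strptime(line[:15], '%b %d %H:%M:%S')

    events = []
    for path in sys.argv[1:]:
        for line in open(path):
            if PATTERN.search(line):
                events.append((parse_ts(line), path, line.rstrip()))

    for ts, path, line in sorted(events):
        print('%-12s  %s' % (path, line))

Run it as, e.g., "python merge_conflicts.py node1.log node2.log" and read
the merged output top to bottom.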
>>
>>     BTW, any reason you are not using RDM?
>>
>>
>>     On 12/14/2010 12:51 PM, brad hancock wrote:
>>>     The issue does repeat. 
>>>
>>>     I looked through the vSphere 4.1 logs and the host logs, and didn't
>>>     see anything weird that corresponds with these times.
>>>
>>>     What is a reservation conflict? Can this issue cause the nodes
>>>     to see different data?
>>>
>>>
>>>     Dec 14 07:37:52 mdcvmsmes02 kernel: [351952.113847] sd 1:0:0:0:
>>>     reservation conflict
>>>     Dec 14 07:37:52 mdcvmsmes02 kernel: [351952.113859] sd 1:0:0:0:
>>>     [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
>>>     Dec 14 07:37:52 mdcvmsmes02 kernel: [351952.113868] end_request:
>>>     I/O error, dev sdb, sector 1735
>>>     Dec 14 07:37:52 mdcvmsmes02 kernel: [351952.114134]
>>>     (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
>>>     Dec 14 07:37:52 mdcvmsmes02 kernel: [351952.114379]
>>>     (1882,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.233764] sd 1:0:0:0:
>>>     reservation conflict
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.233775] sd 1:0:0:0:
>>>     [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.233855] end_request:
>>>     I/O error, dev sdb, sector 1735
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.234112]
>>>     (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.234365]
>>>     (1882,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.234789] sd 1:0:0:0:
>>>     reservation conflict
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.234793] sd 1:0:0:0:
>>>     [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.234796] end_request:
>>>     I/O error, dev sdb, sector 1735
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.235033]
>>>     (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
>>>     Dec 14 07:51:01 mdcvmsmes02 kernel: [352762.235273]
>>>     (1882,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>>>     Dec 14 09:23:15 mdcvmsmes02 kernel: [358423.734356] sd 1:0:0:0:
>>>     reservation conflict
>>>     Dec 14 09:23:15 mdcvmsmes02 kernel: [358423.734366] sd 1:0:0:0:
>>>     [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
>>>     Dec 14 09:23:15 mdcvmsmes02 kernel: [358423.734370] end_request:
>>>     I/O error, dev sdb, sector 1735
>>>     Dec 14 09:23:15 mdcvmsmes02 kernel: [358423.734620]
>>>     (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
>>>     Dec 14 09:23:15 mdcvmsmes02 kernel: [358423.734882]
>>>     (1882,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.184302] sd 1:0:0:0:
>>>     reservation conflict
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.184312] sd 1:0:0:0:
>>>     [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.184316] end_request:
>>>     I/O error, dev sdb, sector 1735
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.184565]
>>>     (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.184809]
>>>     (1882,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.188045] sd 1:0:0:0:
>>>     reservation conflict
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.188045] sd 1:0:0:0:
>>>     [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.188045] end_request:
>>>     I/O error, dev sdb, sector 1735
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.188045]
>>>     (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
>>>     Dec 14 10:25:27 mdcvmsmes02 kernel: [362254.188045]
>>>     (1882,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062] sd 1:0:0:0:
>>>     reservation conflict
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062] sd 1:0:0:0:
>>>     [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062] end_request:
>>>     I/O error, dev sdb, sector 1735
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062]
>>>     (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062]
>>>     (1882,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062] sd 1:0:0:0:
>>>     reservation conflict
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062] sd 1:0:0:0:
>>>     [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062] end_request:
>>>     I/O error, dev sdb, sector 1735
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062]
>>>     (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
>>>     Dec 14 10:33:08 mdcvmsmes02 kernel: [362727.621062]
>>>     (1882,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>     On Tue, Dec 14, 2010 at 11:38 AM, Sunil Mushran
>>>     <sunil.mushran at oracle.com <mailto:sunil.mushran at oracle.com>> wrote:
>>>
>>>         sd 1:0:0:0: reservation conflict
>>>
>>>         That's the cause of the error in the guest. You'll have to
>>>         track the error
>>>         to ESX's management domain. See the logs.
>>>
>>>         Does this error occur repeatedly? It is only a
>>>         problem for o2hb
>>>         if it continues for the next 60 secs. Otherwise it can be ignored.
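As a rough way to apply that 60-second rule to a guest's
/var/log/messages, something like the sketch below could work (the use of
the bracketed kernel timestamps and the 10-second grouping gap are
assumptions of mine; an illustration, not a supported tool):

    #!/usr/bin/env python
    # Sketch: find the longest run of "reservation conflict" errors, using
    # the kernel printk timestamps (e.g. [351952.113847]).  A run that
    # approaches the ~60s disk-heartbeat timeout puts o2hb at risk of
    # fencing the node; short, isolated bursts are just noise.
    import re
    import sys

    STAMP = re.compile(r'\[(\d+\.\d+)\].*reservation conflict')
    MAX_GAP = 10.0      # assumed: conflicts closer than this form one run
    THRESHOLD = 60.0    # roughly the o2hb timeout Sunil mentions

    times = []
    for line in open(sys.argv[1]):
        m = STAMP.search(line)
        if m:
            times.append(float(m.group(1)))

    longest = 0.0
    start = prev = None
    for t in times:
        if start is None or t - prev > MAX_GAP:
            start = t          # start a new run
        longest = max(longest, t - start)
        prev = t

    if longest >= THRESHOLD:
        print('WARNING: conflicts persisted for %.0f seconds' % longest)
    else:
        print('longest run was %.0f seconds; likely just annoying' % longest)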
>>>
>>>
>>>         On 12/14/2010 07:20 AM, brad hancock wrote:
>>>>         The issue is starting to come up again. Both machines are
>>>>         logging the error a couple of minutes apart from each other.
>>>>
>>>>         sd 1:0:0:0: reservation conflict
>>>>         Dec 13 16:40:07 mdcvmsmes01 kernel: [295051.378262] sd
>>>>         1:0:0:0: [sdb] Result: hostbyte=DID_OK d
>>>>         Dec 13 16:40:07 mdcvmsmes01 kernel: [295051.378347]
>>>>         end_request: I/O error, dev sdb, sector 173
>>>>         Dec 13 16:40:07 mdcvmsmes01 kernel: [295051.378694]
>>>>         (0,1):o2hb_bio_end_io:225 ERROR: IO Error -
>>>>         Dec 13 16:40:07 mdcvmsmes01 kernel: [295051.379055]
>>>>         (1897,1):o2hb_do_disk_heartbeat:753 ERROR:
>>>>
>>>>         Should I open a bug report? Who with, VMware or Oracle?
>>>>
>>>>
>>>>
>>>>         On Sun, Dec 12, 2010 at 9:25 AM, brad hancock
>>>>         <braddhancock at gmail.com <mailto:braddhancock at gmail.com>> wrote:
>>>>
>>>>             Kevin,
>>>>             I modified the VMFS virtual disk to Independent, and I
>>>>             haven't seen the issue since the change Friday morning.
>>>>             I noticed this didn't work for you. I will continue to
>>>>             watch it and let the list know. The issue I saw after
>>>>             several weeks was that the data was not in sync: two nodes
>>>>             saw different data on the same OCFS2 drive.
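One blunt, hypothetical way to confirm that kind of divergence: hash the
same slice of the shared LUN from each node while the cluster is quiet
and compare the digests. The device name and 64 MB sample size below are
assumptions, and matching hashes only rule out divergence in that range.

    #!/usr/bin/env python
    # Hypothetical check: run the same command on both nodes (ideally with
    # the volume unmounted and caches dropped) and compare the printed
    # digests.  /dev/sdb and 64 MB are placeholders; adjust to the real LUN.
    import hashlib
    import sys

    device = sys.argv[1] if len(sys.argv) > 1 else '/dev/sdb'
    sample = 64 * 1024 * 1024      # bytes to hash from the start of the LUN
    chunk = 1024 * 1024

    h = hashlib.sha256()
    remaining = sample
    with open(device, 'rb') as dev:
        while remaining > 0:
            block = dev.read(min(chunk, remaining))
            if not block:
                break
            h.update(block)
            remaining -= len(block)

    print('%s  %s (first %d MB)' % (h.hexdigest(), device, sample // 2 ** 20))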
>>>>
>>>>             We have vSphere 4.1 and an HP EVA 3000 SAN.
>>>>
>>>>             Thanks,
>>>>
>>>>
>>>>
>>>>             On Sat, Dec 11, 2010 at 10:41 AM,
>>>>             <kevin at utahsysadmin.com
>>>>             <mailto:kevin at utahsysadmin.com>> wrote:
>>>>
>>>>                 On Fri, 10 Dec 2010 06:26:06 -0800,
>>>>                 ocfs2-users-request at oracle.com
>>>>                 <mailto:ocfs2-users-request at oracle.com> wrote:
>>>>                 >
>>>>                 > My setup has the SCSI controller set to Physical
>>>>                 so the guest can be on
>>>>                 > different hosts, but I do not have the disk setup
>>>>                 as Independent. I am
>>>>                 > going
>>>>                 > to change that setting in VMware and see if it
>>>>                 makes a difference.
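For what it's worth, a quick sketch for eyeballing those two settings
straight from a VM's .vmx file (the key names scsiX.sharedBus and
scsiX:Y.mode are assumed from common VMware configs; confirm in the
vSphere client if anything looks off):

    #!/usr/bin/env python
    # Sketch: print the SCSI bus-sharing and disk-mode lines from a .vmx
    # so the "physical" sharedBus and "independent" mode settings discussed
    # above can be checked quickly.
    import re
    import sys

    INTERESTING = re.compile(
        r'^scsi\d+(:\d+)?\.(sharedBus|mode|fileName)\s*=', re.IGNORECASE)

    for line in open(sys.argv[1]):
        if INTERESTING.match(line.strip()):
            print(line.rstrip())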
>>>>                 >
>>>>                 > > [2037805.922718] end_request: I/O error, dev
>>>>                 sdb, sector 1735
>>>>                 > > [2037805.922974] (0,0):o2hb_bio_end_io:225
>>>>                 ERROR: IO Error -5
>>>>                 > > [2037805.923370]
>>>>                 (27506,0):o2hb_do_disk_heartbeat:753 ERROR: status =
>>>>                 -5
>>>>
>>>>                 Brad,
>>>>
>>>>                 I have had the same issue for over a year on ESX
>>>>                 3.5 as well as on vSphere
>>>>                 4.0.  I have not tried yet on 4.1.  The error
>>>>                 occurs when I put the shared
>>>>                 disk on either SATA or FC LUNs on our SAN.  It also
>>>>                 doesn't matter if the
>>>>                 virtual machines are on the same physical host or
>>>>                 not (with independent
>>>>                 disks).  The only problem that has come from it is
>>>>                 the occasional reboot of
>>>>                 one of the VMs, which for me is tolerable.  I keep
>>>>                 hoping to upgrade to a
>>>>                 new SAN, thinking that might fix it.  The IOPS that the
>>>>                 vSphere 4.0 release can drive is higher than what the
>>>>                 SAN can deliver (it's 5 years old), so I didn't think it
>>>>                 was VMware's fault.  If you have fairly new
>>>>                 hardware, maybe there is a real
>>>>                 bug somewhere.  I don't get I/O errors in any of my
>>>>                 other implementations
>>>>                 on this SAN.  I sent a post like yours to the list
>>>>                 when I first built it,
>>>>                 but never opened a bug report with either OCFS2 or
>>>>                 VMware.  If you create a
>>>>                 bug report I could add information from my
>>>>                 implementation as well.  (I
>>>>                 actually have two of these setups and they both
>>>>                 have the same errors.)
>>>>
>>>>                 Of course, if you find a solution, please post that
>>>>                 as well.
>>>>
>>>>                 Thanks,
>>>>                 Kevin
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>