[Ocfs2-users] Ftp server... single file seems locked

Fri Apr 2 13:02:23 PDT 2010

Following the issue in bugtraq, we're guessing that the issue is
buried in the 'double locking' that OCFS2 does with its sendfile
implementation.

I've disabled the use of sendfile within proftpd as a temporary measure.
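
For anyone wanting to try the same workaround, here is a minimal sketch.
It assumes the stock ProFTPD `UseSendfile` directive is what gets turned
off; the helper name `disable_sendfile` and the config path in the usage
note are illustrative assumptions, not details from this setup.

```shell
# Hypothetical helper: make sure a ProFTPD config file turns sendfile off.
# $1 is the path to the config file (the caller supplies it).
disable_sendfile() {
    conf="$1"
    if grep -Eiq '^[[:space:]]*UseSendfile[[:space:]]' "$conf"; then
        # A UseSendfile line already exists: print it with its line number
        # and leave the file untouched so it can be reviewed by hand.
        grep -Ein '^[[:space:]]*UseSendfile[[:space:]]' "$conf"
    else
        # No setting yet: append the directive at server-config level.
        printf 'UseSendfile off\n' >> "$conf"
    fi
}
```

Usage would be something like `disable_sendfile /etc/proftpd.conf` (the path
varies by distribution), followed by `proftpd -t` to check the config syntax
and a restart of the daemon.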

However, I'm concerned about something else... The same problem
cropped up on the second host, which I had failed over to after this
morning.  I had left the first node up so we could do further
debugging/information gathering.  However, when I attempted to reboot
the node where the problem first occurred, the second node ALSO
decided to reboot.

Why did the other node in the cluster go down?

--Jason

On Fri, Apr 2, 2010 at 3:40 PM, Jason Price <japrice at gmail.com> wrote:
> For reference, my 'reader' process that's doing the spinlocking is in
> an R state.  The 'writer' process is in a D state, as is every other
> proftpd process that's attempting to get to that file.
>
> --Jason
>
> On Fri, Apr 2, 2010 at 3:12 PM, David Johle <djohle at industrialinfo.com> wrote:
>>
>> FWIW, I have seen a similar problem here on occasion, but with vsftpd
>> instead.
>>
>> When I run `ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN` I usually see
>> one node with a single vsftpd in D (uninterruptible I/O) state, and multiple
>> vsftpd processes on the other node, presumably waiting for the resource.
>>
>> I also believe this happens when multiple processes are trying to read & write the
>> same file via FTP.  And if left alone for a bit, other programs that may
>> read the same file will get hung waiting as well.  Mine are typically not
>> busy waits though, but I have seen a couple that were.
>>
>> Sometimes I will find that all is cleared and back to normal after a short
>> while (a timeout somewhere perhaps?).  Usually the only solution is to
>> reboot one or both nodes, which I have to instigate via kernel panic/self
>> fence because a normal shutdown also gets caught up by the non-killable
>> processes.
>>
>>
>> I need to get a netconsole set up to capture some stuff for the next time so
>> that I can add it to the bugzilla as well.
>>
>>
>> At 10:52 AM 4/2/2010, Jason Price wrote:
>>>
>>> Message: 1
>>> Date: Fri, 2 Apr 2010 11:38:24 -0400
>>> From: Jason Price <japrice at gmail.com>
>>> Subject: [Ocfs2-users] Ftp server... single file seems locked
>>> To: ocfs2-users at oss.oracle.com
>>>
>>> I'm setting up an HA ftp server (amongst other services).
>>>
>>> When two connections happen simultaneously, and (more specifically) the same
>>> user from two IPs attempts to access the same file (one for reading, and one
>>> for writing), the processes both hang.  And all subsequent attempts to
>>> either read or write the file fail.
>>>
>>> The two processes that seem to have caused the lock:
>>> user  24139  1657 Thu Apr  1 18:25:01 2010 proftpd: cbs - ::ffff:xxx.yyy.0.253: RETR prim_wo_img_dom.obs
>>> user  24142  1657 Thu Apr  1 18:25:01 2010 proftpd: cbs - ::ffff:xxx.yyy.103.208: STOR prim_wo_img_dom.obs
>>>
>>> (there are 49 other processes trying to do the same things, but these are
>>> the first ones.)
>>
>
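
The state check quoted above (`ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN`)
can be narrowed to just the stuck processes. This is a rough sketch, not
something taken from the thread: the function name `filter_dstate` is made
up for illustration, and it assumes the column order of that exact `ps`
command (PID, STAT, COMMAND, WCHAN).

```shell
# Hypothetical filter: read `ps` output on stdin, keep the header line and
# every row whose second column (STAT) starts with D, i.e. processes in
# uninterruptible sleep.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example invocation (run on each node):
#   ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_dstate
```

The wchan column of the surviving rows shows which kernel function each
process is sleeping in, which is probably the most useful detail to attach
to the bugzilla entry.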