[Ocfs2-users] No space left on device in one node

Tue Jan 26 13:50:02 PST 2010

Hi Sunil!

Thanks for your reply. We'll reduce the number of node slots to 2,
and I have subscribed to bug #1189 to see when a fix becomes available.

Alex
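
For readers following along, here is a minimal dry-run sketch of the slot reduction discussed in this thread. It only prints the commands for review and does not run them. The device /dev/drbd0 and mount point /cluster come from the df output quoted below; the helper name plan_slot_change is hypothetical, and unmounting on every node first is a conservative assumption rather than something stated in the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for review instead of running them.
# DEVICE and MOUNTPOINT are taken from the df output in this thread;
# SLOTS=2 is the value suggested in the reply. Adjust all three as needed.
DEVICE=/dev/drbd0
MOUNTPOINT=/cluster
SLOTS=2

# plan_slot_change is a hypothetical helper, not part of ocfs2-tools.
# $1 = device, $2 = mount point, $3 = new number of node slots
plan_slot_change() {
    # Reject empty or non-numeric slot counts.
    case "$3" in
        ''|*[!0-9]*)
            echo "slot count must be a positive integer" >&2
            return 1
            ;;
    esac
    # Reject zero: at least one slot is needed to mount the volume.
    if [ "$3" -lt 1 ]; then
        echo "slot count must be at least 1" >&2
        return 1
    fi
    # Assumption: unmount on every node before changing the slot count.
    echo "umount $2"
    # Inspect the superblock to see the current slot count.
    echo "debugfs.ocfs2 -R \"stats\" $1"
    # Set the new number of node slots.
    echo "tunefs.ocfs2 -N $3 $1"
    echo "mount -t ocfs2 $1 $2"
}

plan_slot_change "$DEVICE" "$MOUNTPOINT" "$SLOTS"
```

Printing the plan first leaves room to check the device name and the current slot count before anything touches the filesystem.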

On 26.01.2010 at 19:12, Sunil Mushran wrote:

> You are running into bz#1189.
> 
> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1189
> 
> I'll be attaching a potential fix to that bugzilla soon.
> 
> In your case, you will be better off reducing the number of node slots
> from 4 to 3, or maybe even 2, as DRBD supports a maximum of 2 nodes.
> 
> Alexander Barton wrote:
> 
>> Hi!
>> 
>> We operate a 2-node cluster running OCFS2 on top of DRBD. It shows about 4.3 GB free space on the OCFS2 filesystem using df on both nodes, but one node can't even write 10 MB:
>> 
>> df (output identical on both nodes):
>> 
>>  $ df -k /cluster
>>  Filesystem           1K-blocks      Used Available Use% Mounted on
>>  /dev/drbd0            83883484  80071096   3812388  96% /cluster
>> 
>>  $ df -i /cluster
>>  Filesystem             Inodes    IUsed   IFree IUse% Mounted on
>>  /dev/drbd0           20970871 20017778  953093   96% /cluster
>> 
>> dd test on CL1-N1 -- FAILING:
>> 
>>  $ dd if=/dev/zero of=`hostname`.tst bs=1M count=10
>>  dd: writing `cl1-n1.tst': No space left on device
>>  1+0 records in
>>  0+0 records out
>>  1032192 bytes (1,0 MB) copied, 1,56907 s, 658 kB/s
>> 
>> same dd test on CL1-N2 -- OK:
>> 
>>  $ dd if=/dev/zero of=`hostname`.tst bs=1M count=10
>>  10+0 records in
>>  10+0 records out
>>  10485760 bytes (10 MB) copied, 1,58164 s, 6,6 MB/s
>> 
>> We are running Debian Linux. The problems occurred while running Linux kernel 2.6.26, and based on <http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg03661.html> we hoped that a newer kernel would fix them.
>> 
>> We therefore upgraded to Linux kernel 2.6.32 (using the Debian package linux-image-2.6.32-trunk-amd64_2.6.32-5_amd64.deb from sid), upgraded the userland tools to ocfs2-tools 1.4.3-1, and ran fsck.ocfs2 -fy, which showed no errors. The problem persists, however: one node can't write data while the other one has no problems ...
>> 
>>  $ modinfo ocfs2
>>  filename:       /lib/modules/2.6.32-trunk-amd64/kernel/fs/ocfs2/ocfs2.ko
>>  license:        GPL
>>  author:         Oracle
>>  version:        1.5.0
>>  description:    OCFS2 1.5.0
>>  srcversion:     944B0B239B4DEBAF58A7FE1
>>  depends:        jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
>>  vermagic:       2.6.32-trunk-amd64 SMP mod_unload modversions 
>> (Isn't the 1.5.0 version number a little strange here?)
>> 
>> "fsck.ocfs2 -f" doesn't show any errors at all.
>> No (kernel) messages are logged either.
>> 
>> I think this is similar to bug #1167 (http://oss.oracle.com/bugzilla/show_bug.cgi?id=1167), so I updated the information there as well and attached the output of the "stat_sysdir.sh" script run on the failing node.
>> 
>> Do you have any idea what is going wrong here?
>> 
>> Any workarounds?
>> 
>> Anything we can test to help debug this issue?
>> 
>> Thanks
>> Alex