[Ocfs2-users] ocfs2 hangs during webserver usage

Sunil Mushran Sunil.Mushran at oracle.com
Wed Apr 25 11:51:48 PDT 2007


I am unclear as to the exact issue. As in, are process(es) on a
particular node hanging when accessing (a) ocfs2 volume or is the
problem cluster-wide. As in, whether processes on all nodes are
hanging.

Let me list some steps that will help you gather the relevant info.

Start with downloading the following:

# wget http://oss.oracle.com/~smushran/.debug/scripts/scanlocks
# wget http://oss.oracle.com/~smushran/.debug/scripts/dumplock

Run:
# ./scanlocks
This scripts merely runs the debugfs.ocfs2 command fs_locks for all
mounted ocfs2 volumes and prints the "Busy" locks. That the locks
are busy is in itself not a problem. The locks are transitioning
between states all the time. However, if scanlocks shows that lockres
as busy on more than one run, it does warrant a closer look. Run it
on all the nodes.

NOTE: If no locks show up as Busy, the chance that ocfs2_dlm is
the cause of the hang reduces to almost 0. But, never say never.

Next, do:
# alias psall='ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN '
# psall | grep D

This will show D state processes. D state indicates that that process
is in an uninterruptible sleep. The interesting problem here is
determining whether the D state process is leading to the "Busy" lockres
or vice-versa. As in, if ocfs2rec (recovery) shows up in D state,
it means that a recovery is in hung state which will cause all
normal dlm operations to hang. The solution in that case is figuring out
as to why the recovery is stuck. Prior to 1.2.5, one common reason
was the way we fenced - panic() would not always reboot the box leading to
a stuck recovery. But we don't have the same issue here.

The psall output will also show the kernel function name. But interpreting
that requires code knowledge. Note that ocfs2 may not be even involved.
I have triaged many "hang" issues where the reason turned out to be
an errant nfs server. On second thoughts, maybe start with this step so
that you know what processes are hanging which may narrow down the focus.

Once you have determined one or more lockres' on busy states are the
cause of the hang, do the following:

# dumplock \"<lockname>\" /dev/sdX
Note the < >. Replace lockname with the lockres name obtained above.
Do specify the correct device. Run the command on all nodes.
Ignore what it prints on the xterm. Instead look at /var/log/messages.

At this stage, you have to get us involved. The faq lists more about
reading the lockres, but it still needs some work.

File a bugzilla with all the information collected. Attach the
files created by scanlocks (/tmp/__fsl_<dev>). Attach the message files
from all nodes containg the dumplock output. psall output can also
be attached. (Preferably use attachments as they are easier to read.)

To answer your qs as to the number of lockres', that's normal. As in,
dlm on a node keeps track of lockres' the node is interested in.
In fs terms, if the node has not "stat-ed" the file, the dlm will not
create a lockres for it.

Sunil

Stephan Hendl wrote:
> Hi all,
>
> the last days I often got the problem that apache webservers running on a 4 node ocfs2 cluster with rhel4 (2.6.9-42.0.10.ELsmp) and ocfs2-1.2.5 got hanging ... The webservers are triggerd by a lvs load balancer. Three of the 4 nodes are answering the webserverrequests while on node is a socalled "administrative" node where new webserver data are sent to and so on.
>
> The most of the requests are read requests, only a few like php session files etc. are write requests. The slabinfo and meminfo are shown below. Once the filesystem hangs (nothing is reported in the messages file) there are many lockres entries in the debugfs output:
>
> 54331 lockres lines (27165 times "M", 27164 times D, 0 time "W", 1 time "R" and 1 time "S")
>
> Aafter rebooting he whole cluster there are 2845 lockres line only. How can I debug the problem in a more detailed way? We also habe theharmless ocfs2_file_sendfile errors "-104" and "-32" - sometime 5 to 10 times in a minute. The SAN devices are connected via device-mapper-multipath with no errors in the logfile.
>
> Since the swap space isn't used at any time I don't think that it could be a memory leak...
>
> Regards
> Stephan
>
> slabinfo - version: 2.0
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> fib6_nodes             7    119     32  119    1 : tunables  120   60    8 : slabdata      1      1      0
> ip6_dst_cache          7     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
> ndisc_cache            1     20    192   20    1 : tunables  120   60    8 : slabdata      1      1      0
> rawv6_sock             6     11    704   11    2 : tunables   54   27    8 : slabdata      1      1      0
> udpv6_sock             1     11    704   11    2 : tunables   54   27    8 : slabdata      1      1      0
> tcpv6_sock           803    804   1216    3    1 : tunables   24   12    8 : slabdata    268    268      0
> ocfs2_lock             0      0     16  226    1 : tunables  120   60    8 : slabdata      0      0      0
> ocfs2_inode_cache  28659  39576    896    4    1 : tunables   54   27    8 : slabdata   9894   9894      0
> ocfs2_uptodate       692   1071     32  119    1 : tunables  120   60    8 : slabdata      9      9      0
> ocfs2_em_ent       25652  38735     64   61    1 : tunables  120   60    8 : slabdata    635    635      0
> dlmfs_inode_cache      1      7    576    7    1 : tunables   54   27    8 : slabdata      1      1      0
> dlm_mle_cache          0      0    320   12    1 : tunables   54   27    8 : slabdata      0      0      0
> ip_fib_alias          15    226     16  226    1 : tunables  120   60    8 : slabdata      1      1      0
> ip_fib_hash           15    119     32  119    1 : tunables  120   60    8 : slabdata      1      1      0
> dm_mpath             771    810     28  135    1 : tunables  120   60    8 : slabdata      6      6      0
> ext3_inode_cache  237241 239554    552    7    1 : tunables   54   27    8 : slabdata  34222  34222      0
> ext3_xattr             0      0     48   81    1 : tunables  120   60    8 : slabdata      0      0      0
> journal_handle        20    135     28  135    1 : tunables  120   60    8 : slabdata      1      1      0
> journal_head         263    648     48   81    1 : tunables  120   60    8 : slabdata      8      8      0
> revoke_table          14    290     12  290    1 : tunables  120   60    8 : slabdata      1      1      0
> revoke_record          0      0     16  226    1 : tunables  120   60    8 : slabdata      0      0      0
> dm_tio              1335   1582     16  226    1 : tunables  120   60    8 : slabdata      7      7      0
> dm_io               1334   1480     20  185    1 : tunables  120   60    8 : slabdata      8      8      0
> scsi_cmd_cache         8     20    384   10    1 : tunables   54   27    8 : slabdata      2      2      0
> qla2xxx_srbs         263    310    128   31    1 : tunables  120   60    8 : slabdata     10     10      0
> sgpool-128            32     33   2560    3    2 : tunables   24   12    8 : slabdata     11     11      0
> sgpool-64             32     33   1280    3    1 : tunables   24   12    8 : slabdata     11     11      0
> sgpool-32             32     36    640    6    1 : tunables   54   27    8 : slabdata      6      6      0
> sgpool-16             32     36    320   12    1 : tunables   54   27    8 : slabdata      3      3      0
> sgpool-8              35     60    192   20    1 : tunables  120   60    8 : slabdata      3      3      0
> unix_sock            102    140    512    7    1 : tunables   54   27    8 : slabdata     20     20      0
> ip_mrt_cache           0      0    128   31    1 : tunables  120   60    8 : slabdata      0      0      0
> tcp_tw_bucket          0      0    128   31    1 : tunables  120   60    8 : slabdata      0      0      0
> tcp_bind_bucket      172    904     16  226    1 : tunables  120   60    8 : slabdata      4      4      0
> tcp_open_request     657    682    128   31    1 : tunables  120   60    8 : slabdata     22     22      0
> inet_peer_cache        0      0     64   61    1 : tunables  120   60    8 : slabdata      0      0      0
> secpath_cache          0      0    128   31    1 : tunables  120   60    8 : slabdata      0      0      0
> xfrm_dst_cache         0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
> ip_dst_cache         247    690    256   15    1 : tunables  120   60    8 : slabdata     46     46      0
> arp_cache             11     40    192   20    1 : tunables  120   60    8 : slabdata      2      2      0
> raw_sock               6      7    576    7    1 : tunables   54   27    8 : slabdata      1      1      0
> udp_sock              31     35    576    7    1 : tunables   54   27    8 : slabdata      5      5      0
> tcp_sock             385    413   1152    7    2 : tunables   24   12    8 : slabdata     59     59      0
> flow_cache             0      0    128   31    1 : tunables  120   60    8 : slabdata      0      0      0
> mqueue_inode_cache      1      7    576    7    1 : tunables   54   27    8 : slabdata      1      1      0
> relayfs_inode_cache      0      0    348   11    1 : tunables   54   27    8 : slabdata      0      0      0
> isofs_inode_cache      0      0    372   10    1 : tunables   54   27    8 : slabdata      0      0      0
> hugetlbfs_inode_cache      1     11    344   11    1 : tunables   54   27    8 : slabdata      1      1      0
> ext2_inode_cache       0      0    488    8    1 : tunables   54   27    8 : slabdata      0      0      0
> ext2_xattr             0      0     48   81    1 : tunables  120   60    8 : slabdata      0      0      0
> dquot                  0      0    144   27    1 : tunables  120   60    8 : slabdata      0      0      0
> eventpoll_pwq        750    856     36  107    1 : tunables  120   60    8 : slabdata      8      8      0
> eventpoll_epi        750    775    128   31    1 : tunables  120   60    8 : slabdata     25     25      0
> kioctx                 0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
> kiocb                  0      0    128   31    1 : tunables  120   60    8 : slabdata      0      0      0
> dnotify_cache          1    185     20  185    1 : tunables  120   60    8 : slabdata      1      1      0
> fasync_cache           0      0     16  226    1 : tunables  120   60    8 : slabdata      0      0      0
> shmem_inode_cache    293    324    444    9    1 : tunables   54   27    8 : slabdata     36     36      0
> posix_timers_cache      0      0    112   35    1 : tunables  120   60    8 : slabdata      0      0      0
> uid_cache              5     61     64   61    1 : tunables  120   60    8 : slabdata      1      1      0
> cfq_pool              64    119     32  119    1 : tunables  120   60    8 : slabdata      1      1      0
> crq_pool               0      0     40   96    1 : tunables  120   60    8 : slabdata      0      0      0
> deadline_drq         179    375     52   75    1 : tunables  120   60    8 : slabdata      5      5      0
> as_arq                 0      0     64   61    1 : tunables  120   60    8 : slabdata      0      0      0
> blkdev_ioc           194    370     20  185    1 : tunables  120   60    8 : slabdata      2      2      0
> blkdev_queue          44     64    488    8    1 : tunables   54   27    8 : slabdata      8      8      0
> blkdev_requests      208    250    160   25    1 : tunables  120   60    8 : slabdata     10     10      0
> biovec-(256)         256    256   3072    2    2 : tunables   24   12    8 : slabdata    128    128      0
> biovec-128           256    260   1536    5    2 : tunables   24   12    8 : slabdata     52     52      0
> biovec-64            256    260    768    5    1 : tunables   54   27    8 : slabdata     52     52      0
> biovec-16            256    260    192   20    1 : tunables  120   60    8 : slabdata     13     13      0
> biovec-4             256    305     64   61    1 : tunables  120   60    8 : slabdata      5      5      0
> biovec-1             462    678     16  226    1 : tunables  120   60    8 : slabdata      3      3     60
> bio                  431    527    128   31    1 : tunables  120   60    8 : slabdata     17     17      0
> file_lock_cache       24     82     96   41    1 : tunables  120   60    8 : slabdata      2      2      0
> sock_inode_cache     697    720    448    9    1 : tunables   54   27    8 : slabdata     80     80      0
> skbuff_head_cache   2096   2160    192   20    1 : tunables  120   60    8 : slabdata    108    108      0
> sock                   7     20    384   10    1 : tunables   54   27    8 : slabdata      2      2      0
> proc_inode_cache    1672   1672    360   11    1 : tunables   54   27    8 : slabdata    152    152      0
> sigqueue             130    216    148   27    1 : tunables  120   60    8 : slabdata      8      8      0
> radix_tree_node   119663 131250    276   14    1 : tunables   54   27    8 : slabdata   9375   9375      0
> bdev_cache            33     42    512    7    1 : tunables   54   27    8 : slabdata      6      6      0
> mnt_cache             35     93    128   31    1 : tunables  120   60    8 : slabdata      3      3      0
> audit_watch_cache      0      0     48   81    1 : tunables  120   60    8 : slabdata      0      0      0
> inode_cache         1112   1144    344   11    1 : tunables   54   27    8 : slabdata    103    104      0
> dentry_cache      287138 307892    152   26    1 : tunables  120   60    8 : slabdata  11842  11842      0
> filp                2427   2520    192   20    1 : tunables  120   60    8 : slabdata    126    126      0
> names_cache           18     28   4096    1    1 : tunables   24   12    8 : slabdata     18     28      0
> avc_node              12    600     52   75    1 : tunables  120   60    8 : slabdata      8      8      0
> key_jar               10     31    128   31    1 : tunables  120   60    8 : slabdata      1      1      0
> idr_layer_cache      101    116    136   29    1 : tunables  120   60    8 : slabdata      4      4      0
> buffer_head       427352 706125     52   75    1 : tunables  120   60    8 : slabdata   9415   9415      0
> mm_struct            211    330    704   11    2 : tunables   54   27    8 : slabdata     30     30      0
> vm_area_struct     23032  23400     88   45    1 : tunables  120   60    8 : slabdata    520    520     52
> fs_cache             211    488     64   61    1 : tunables  120   60    8 : slabdata      8      8      0
> files_cache          207    306    448    9    1 : tunables   54   27    8 : slabdata     34     34      0
> signal_cache         275    460    192   20    1 : tunables  120   60    8 : slabdata     22     23      0
> sighand_cache        271    306   1344    3    1 : tunables   24   12    8 : slabdata     99    102      3
> task_struct          395    425   1408    5    2 : tunables   24   12    8 : slabdata     85     85      0
> anon_vma            1602   2034     16  226    1 : tunables  120   60    8 : slabdata      9      9      0
> pgd                  214    476     32  119    1 : tunables  120   60    8 : slabdata      4      4      0
> pmd                  630    669   4096    1    1 : tunables   24   12    8 : slabdata    630    669     16
> size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
> size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
> size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
> size-65536             2      2  65536    1   16 : tunables    8    4    0 : slabdata      2      2      0
> size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
> size-32768             5      5  32768    1    8 : tunables    8    4    0 : slabdata      5      5      0
> size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
> size-16384             7      7  16384    1    4 : tunables    8    4    0 : slabdata      7      7      0
> size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
> size-8192             15     15   8192    1    2 : tunables    8    4    0 : slabdata     15     15      0
> size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    8 : slabdata      0      0      0
> size-4096            443    450   4096    1    1 : tunables   24   12    8 : slabdata    443    450      0
> size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    8 : slabdata      0      0      0
> size-2048            574    576   2048    2    1 : tunables   24   12    8 : slabdata    288    288      0
> size-1620(DMA)         0      0   1664    4    2 : tunables   24   12    8 : slabdata      0      0      0
> size-1620             40     40   1664    4    2 : tunables   24   12    8 : slabdata     10     10      0
> size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    8 : slabdata      0      0      0
> size-1024            389    392   1024    4    1 : tunables   54   27    8 : slabdata     98     98      0
> size-512(DMA)          0      0    512    8    1 : tunables   54   27    8 : slabdata      0      0      0
> size-512            1966   2408    512    8    1 : tunables   54   27    8 : slabdata    301    301      0
> size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
> size-256           91976 104355    256   15    1 : tunables  120   60    8 : slabdata   6957   6957      0
> size-128(DMA)          0      0    128   31    1 : tunables  120   60    8 : slabdata      0      0      0
> size-128           95488 142910    128   31    1 : tunables  120   60    8 : slabdata   4610   4610      0
> size-64(DMA)           0      0     64   61    1 : tunables  120   60    8 : slabdata      0      0      0
> size-64            17868  27572     64   61    1 : tunables  120   60    8 : slabdata    452    452      0
> size-32(DMA)           0      0     32  119    1 : tunables  120   60    8 : slabdata      0      0      0
> size-32            96180 108171     32  119    1 : tunables  120   60    8 : slabdata    909    909     15
> kmem_cache           165    165    256   15    1 : tunables  120   60    8 : slabdata     11     11      0
>
>
> [root at wcl3 ~]# cat meminfo
> MemTotal:      3984940 kB
> MemFree:         93700 kB
> Buffers:        256600 kB
> Cached:        1886060 kB
> SwapCached:          0 kB
> Active:        2722236 kB
> Inactive:       748132 kB
> HighTotal:     3104728 kB
> HighFree:        75712 kB
> LowTotal:       880212 kB
> LowFree:         17988 kB
> SwapTotal:     3911816 kB
> SwapFree:      3911656 kB
> Dirty:             584 kB
> Writeback:           0 kB
> Mapped:        1365184 kB
> Slab:           371708 kB
> CommitLimit:   5904284 kB
> Committed_AS:  4488308 kB
> PageTables:      27328 kB
> VmallocTotal:   106488 kB
> VmallocUsed:      8620 kB
> VmallocChunk:    97268 kB
> HugePages_Total:     0
> HugePages_Free:      0
> Hugepagesize:     2048 kB
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   



More information about the Ocfs2-users mailing list