[Ocfs2-users] ocfs2 hangs during webserver usage
Sunil Mushran
Sunil.Mushran at oracle.com
Wed Apr 25 11:51:48 PDT 2007
I am unclear as to the exact issue. That is, are processes on a
particular node hanging when accessing an ocfs2 volume, or is the
problem cluster-wide, with processes on all nodes hanging?
Let me list some steps that will help you gather the relevant info.
Start by downloading the following:
# wget http://oss.oracle.com/~smushran/.debug/scripts/scanlocks
# wget http://oss.oracle.com/~smushran/.debug/scripts/dumplock
Run:
# ./scanlocks
This script merely runs the debugfs.ocfs2 command fs_locks for all
mounted ocfs2 volumes and prints the "Busy" locks. That a lock is
busy is not in itself a problem; locks transition between states
all the time. However, if scanlocks shows the same lockres as busy
on more than one run, it warrants a closer look. Run it on all the
nodes.
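If you cannot fetch the script, its core can be approximated with a
short loop. This is a sketch only: the exact fs_locks output layout
varies between debugfs.ocfs2 versions, and matching on the literal word
"Busy" is an assumption based on the description above.

```shell
#!/bin/sh
# Approximation of what scanlocks does: for every mounted ocfs2 volume,
# dump the lock state via debugfs.ocfs2 and keep only records marked
# "Busy". The match below assumes fs_locks prints that word in the
# flags field.
busy_locks() {
    grep 'Busy'
}

# Find mounted ocfs2 volumes in /proc/mounts (field 3 is the fs type)
for dev in $(awk '$3 == "ocfs2" {print $1}' /proc/mounts); do
    echo "=== $dev ==="
    echo "fs_locks" | debugfs.ocfs2 "$dev" 2>/dev/null | busy_locks
done
```

Run it a couple of times; only a lockres that stays busy across runs is
worth chasing.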
NOTE: If no locks show up as Busy, the chance that ocfs2_dlm is
the cause of the hang reduces to almost 0. But, never say never.
Next, do:
# alias psall='ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN '
# psall | grep D
This will show D state processes. D state indicates that the process
is in an uninterruptible sleep. The interesting problem here is
determining whether the D state process is leading to the "Busy" lockres
or vice versa. For example, if ocfs2rec (recovery) shows up in D state,
it means that recovery is hung, which will cause all normal dlm
operations to hang. The solution in that case is figuring out why the
recovery is stuck. Prior to 1.2.5, one common reason was the way we
fenced - panic() would not always reboot the box, leading to a stuck
recovery. But we don't have the same issue here.
The psall output will also show the kernel function name, but interpreting
that requires code knowledge. Note that ocfs2 may not even be involved.
I have triaged many "hang" issues where the reason turned out to be
an errant nfs server. On second thought, maybe start with this step so
that you know which processes are hanging, which may narrow down the focus.
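Transient D states are normal, so it helps to sample a few times and
only worry about a PID that shows up in every sample. A minimal polling
sketch (the columns match the psall alias above):

```shell
#!/bin/sh
# Poll for D-state (uninterruptible sleep) processes. A PID that
# appears in every sample is genuinely stuck; one that appears once is
# likely just a transient wait.
filter_d() {
    # stat is the second column; D-state entries start with "D"
    awk '$2 ~ /^D/'
}

for i in 1 2 3; do
    echo "--- sample $i ---"
    ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_d
    sleep 2
done
```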
Once you have determined that one or more lockres' stuck in the busy
state are the cause of the hang, do the following:
# dumplock \"<lockname>\" /dev/sdX
Note the < >. Replace lockname with the lockres name obtained above.
Do specify the correct device. Run the command on all nodes.
Ignore what it prints on the xterm. Instead look at /var/log/messages.
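Collecting the dumplock output from every node can be scripted. The
sketch below assumes password-less root ssh; the hostnames, device,
and lockres name are all placeholders - substitute your own. The
network loop is guarded behind a --run flag so the helper can be
inspected safely.

```shell
#!/bin/sh
# Run dumplock on every node for one busy lockres and capture the
# kernel messages it logs. NODES, DEV, and LOCKRES are hypothetical
# placeholders - replace them with your own values.
NODES="node1 node2 node3 node4"            # hypothetical hostnames
DEV="/dev/sdX"                             # the shared ocfs2 device
LOCKRES="M000000000000000000021ab5b1bf66"  # hypothetical lockres name

# Build the remote command; the lockres name stays quoted, as in the
# dumplock invocation above.
build_cmd() {
    printf './dumplock "%s" %s && tail -n 50 /var/log/messages' "$1" "$2"
}

if [ "${1:-}" = "--run" ]; then
    for node in $NODES; do
        ssh "root@$node" "$(build_cmd "$LOCKRES" "$DEV")" \
            > "dumplock.$node.log" 2>&1
    done
fi
```

The per-node log files can then be attached to the bugzilla along with
the scanlocks output.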
At this stage, you have to get us involved. The faq lists more about
reading the lockres, but it still needs some work.
File a bugzilla with all the information collected. Attach the
files created by scanlocks (/tmp/__fsl_<dev>). Attach the message files
from all nodes containing the dumplock output. The psall output can also
be attached. (Preferably use attachments as they are easier to read.)
To answer your question as to the number of lockres', that's normal.
The dlm on a node only keeps track of the lockres' that node is
interested in. In fs terms, if the node has not "stat-ed" a file,
the dlm will not create a lockres for it.
Sunil
Stephan Hendl wrote:
> Hi all,
>
> the last days I often got the problem that apache webservers running on a 4 node ocfs2 cluster with rhel4 (2.6.9-42.0.10.ELsmp) and ocfs2-1.2.5 hang ... The webservers are triggered by an LVS load balancer. Three of the 4 nodes answer the webserver requests while one node is a so-called "administrative" node where new webserver data is sent to, and so on.
>
> Most of the requests are read requests; only a few, like php session files etc., are write requests. The slabinfo and meminfo are shown below. Once the filesystem hangs (nothing is reported in the messages file) there are many lockres entries in the debugfs output:
>
> 54331 lockres lines (27165 times "M", 27164 times "D", 0 times "W", 1 time "R" and 1 time "S")
>
> After rebooting the whole cluster there are only 2845 lockres lines. How can I debug the problem in a more detailed way? We also have the harmless ocfs2_file_sendfile errors "-104" and "-32" - sometimes 5 to 10 times a minute. The SAN devices are connected via device-mapper-multipath with no errors in the logfile.
>
> Since the swap space isn't used at any time I don't think that it could be a memory leak...
>
> Regards
> Stephan
>
> slabinfo - version: 2.0
> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> fib6_nodes 7 119 32 119 1 : tunables 120 60 8 : slabdata 1 1 0
> ip6_dst_cache 7 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
> ndisc_cache 1 20 192 20 1 : tunables 120 60 8 : slabdata 1 1 0
> rawv6_sock 6 11 704 11 2 : tunables 54 27 8 : slabdata 1 1 0
> udpv6_sock 1 11 704 11 2 : tunables 54 27 8 : slabdata 1 1 0
> tcpv6_sock 803 804 1216 3 1 : tunables 24 12 8 : slabdata 268 268 0
> ocfs2_lock 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0 0
> ocfs2_inode_cache 28659 39576 896 4 1 : tunables 54 27 8 : slabdata 9894 9894 0
> ocfs2_uptodate 692 1071 32 119 1 : tunables 120 60 8 : slabdata 9 9 0
> ocfs2_em_ent 25652 38735 64 61 1 : tunables 120 60 8 : slabdata 635 635 0
> dlmfs_inode_cache 1 7 576 7 1 : tunables 54 27 8 : slabdata 1 1 0
> dlm_mle_cache 0 0 320 12 1 : tunables 54 27 8 : slabdata 0 0 0
> ip_fib_alias 15 226 16 226 1 : tunables 120 60 8 : slabdata 1 1 0
> ip_fib_hash 15 119 32 119 1 : tunables 120 60 8 : slabdata 1 1 0
> dm_mpath 771 810 28 135 1 : tunables 120 60 8 : slabdata 6 6 0
> ext3_inode_cache 237241 239554 552 7 1 : tunables 54 27 8 : slabdata 34222 34222 0
> ext3_xattr 0 0 48 81 1 : tunables 120 60 8 : slabdata 0 0 0
> journal_handle 20 135 28 135 1 : tunables 120 60 8 : slabdata 1 1 0
> journal_head 263 648 48 81 1 : tunables 120 60 8 : slabdata 8 8 0
> revoke_table 14 290 12 290 1 : tunables 120 60 8 : slabdata 1 1 0
> revoke_record 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0 0
> dm_tio 1335 1582 16 226 1 : tunables 120 60 8 : slabdata 7 7 0
> dm_io 1334 1480 20 185 1 : tunables 120 60 8 : slabdata 8 8 0
> scsi_cmd_cache 8 20 384 10 1 : tunables 54 27 8 : slabdata 2 2 0
> qla2xxx_srbs 263 310 128 31 1 : tunables 120 60 8 : slabdata 10 10 0
> sgpool-128 32 33 2560 3 2 : tunables 24 12 8 : slabdata 11 11 0
> sgpool-64 32 33 1280 3 1 : tunables 24 12 8 : slabdata 11 11 0
> sgpool-32 32 36 640 6 1 : tunables 54 27 8 : slabdata 6 6 0
> sgpool-16 32 36 320 12 1 : tunables 54 27 8 : slabdata 3 3 0
> sgpool-8 35 60 192 20 1 : tunables 120 60 8 : slabdata 3 3 0
> unix_sock 102 140 512 7 1 : tunables 54 27 8 : slabdata 20 20 0
> ip_mrt_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
> tcp_tw_bucket 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
> tcp_bind_bucket 172 904 16 226 1 : tunables 120 60 8 : slabdata 4 4 0
> tcp_open_request 657 682 128 31 1 : tunables 120 60 8 : slabdata 22 22 0
> inet_peer_cache 0 0 64 61 1 : tunables 120 60 8 : slabdata 0 0 0
> secpath_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
> xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
> ip_dst_cache 247 690 256 15 1 : tunables 120 60 8 : slabdata 46 46 0
> arp_cache 11 40 192 20 1 : tunables 120 60 8 : slabdata 2 2 0
> raw_sock 6 7 576 7 1 : tunables 54 27 8 : slabdata 1 1 0
> udp_sock 31 35 576 7 1 : tunables 54 27 8 : slabdata 5 5 0
> tcp_sock 385 413 1152 7 2 : tunables 24 12 8 : slabdata 59 59 0
> flow_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
> mqueue_inode_cache 1 7 576 7 1 : tunables 54 27 8 : slabdata 1 1 0
> relayfs_inode_cache 0 0 348 11 1 : tunables 54 27 8 : slabdata 0 0 0
> isofs_inode_cache 0 0 372 10 1 : tunables 54 27 8 : slabdata 0 0 0
> hugetlbfs_inode_cache 1 11 344 11 1 : tunables 54 27 8 : slabdata 1 1 0
> ext2_inode_cache 0 0 488 8 1 : tunables 54 27 8 : slabdata 0 0 0
> ext2_xattr 0 0 48 81 1 : tunables 120 60 8 : slabdata 0 0 0
> dquot 0 0 144 27 1 : tunables 120 60 8 : slabdata 0 0 0
> eventpoll_pwq 750 856 36 107 1 : tunables 120 60 8 : slabdata 8 8 0
> eventpoll_epi 750 775 128 31 1 : tunables 120 60 8 : slabdata 25 25 0
> kioctx 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
> kiocb 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
> dnotify_cache 1 185 20 185 1 : tunables 120 60 8 : slabdata 1 1 0
> fasync_cache 0 0 16 226 1 : tunables 120 60 8 : slabdata 0 0 0
> shmem_inode_cache 293 324 444 9 1 : tunables 54 27 8 : slabdata 36 36 0
> posix_timers_cache 0 0 112 35 1 : tunables 120 60 8 : slabdata 0 0 0
> uid_cache 5 61 64 61 1 : tunables 120 60 8 : slabdata 1 1 0
> cfq_pool 64 119 32 119 1 : tunables 120 60 8 : slabdata 1 1 0
> crq_pool 0 0 40 96 1 : tunables 120 60 8 : slabdata 0 0 0
> deadline_drq 179 375 52 75 1 : tunables 120 60 8 : slabdata 5 5 0
> as_arq 0 0 64 61 1 : tunables 120 60 8 : slabdata 0 0 0
> blkdev_ioc 194 370 20 185 1 : tunables 120 60 8 : slabdata 2 2 0
> blkdev_queue 44 64 488 8 1 : tunables 54 27 8 : slabdata 8 8 0
> blkdev_requests 208 250 160 25 1 : tunables 120 60 8 : slabdata 10 10 0
> biovec-(256) 256 256 3072 2 2 : tunables 24 12 8 : slabdata 128 128 0
> biovec-128 256 260 1536 5 2 : tunables 24 12 8 : slabdata 52 52 0
> biovec-64 256 260 768 5 1 : tunables 54 27 8 : slabdata 52 52 0
> biovec-16 256 260 192 20 1 : tunables 120 60 8 : slabdata 13 13 0
> biovec-4 256 305 64 61 1 : tunables 120 60 8 : slabdata 5 5 0
> biovec-1 462 678 16 226 1 : tunables 120 60 8 : slabdata 3 3 60
> bio 431 527 128 31 1 : tunables 120 60 8 : slabdata 17 17 0
> file_lock_cache 24 82 96 41 1 : tunables 120 60 8 : slabdata 2 2 0
> sock_inode_cache 697 720 448 9 1 : tunables 54 27 8 : slabdata 80 80 0
> skbuff_head_cache 2096 2160 192 20 1 : tunables 120 60 8 : slabdata 108 108 0
> sock 7 20 384 10 1 : tunables 54 27 8 : slabdata 2 2 0
> proc_inode_cache 1672 1672 360 11 1 : tunables 54 27 8 : slabdata 152 152 0
> sigqueue 130 216 148 27 1 : tunables 120 60 8 : slabdata 8 8 0
> radix_tree_node 119663 131250 276 14 1 : tunables 54 27 8 : slabdata 9375 9375 0
> bdev_cache 33 42 512 7 1 : tunables 54 27 8 : slabdata 6 6 0
> mnt_cache 35 93 128 31 1 : tunables 120 60 8 : slabdata 3 3 0
> audit_watch_cache 0 0 48 81 1 : tunables 120 60 8 : slabdata 0 0 0
> inode_cache 1112 1144 344 11 1 : tunables 54 27 8 : slabdata 103 104 0
> dentry_cache 287138 307892 152 26 1 : tunables 120 60 8 : slabdata 11842 11842 0
> filp 2427 2520 192 20 1 : tunables 120 60 8 : slabdata 126 126 0
> names_cache 18 28 4096 1 1 : tunables 24 12 8 : slabdata 18 28 0
> avc_node 12 600 52 75 1 : tunables 120 60 8 : slabdata 8 8 0
> key_jar 10 31 128 31 1 : tunables 120 60 8 : slabdata 1 1 0
> idr_layer_cache 101 116 136 29 1 : tunables 120 60 8 : slabdata 4 4 0
> buffer_head 427352 706125 52 75 1 : tunables 120 60 8 : slabdata 9415 9415 0
> mm_struct 211 330 704 11 2 : tunables 54 27 8 : slabdata 30 30 0
> vm_area_struct 23032 23400 88 45 1 : tunables 120 60 8 : slabdata 520 520 52
> fs_cache 211 488 64 61 1 : tunables 120 60 8 : slabdata 8 8 0
> files_cache 207 306 448 9 1 : tunables 54 27 8 : slabdata 34 34 0
> signal_cache 275 460 192 20 1 : tunables 120 60 8 : slabdata 22 23 0
> sighand_cache 271 306 1344 3 1 : tunables 24 12 8 : slabdata 99 102 3
> task_struct 395 425 1408 5 2 : tunables 24 12 8 : slabdata 85 85 0
> anon_vma 1602 2034 16 226 1 : tunables 120 60 8 : slabdata 9 9 0
> pgd 214 476 32 119 1 : tunables 120 60 8 : slabdata 4 4 0
> pmd 630 669 4096 1 1 : tunables 24 12 8 : slabdata 630 669 16
> size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
> size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
> size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
> size-65536 2 2 65536 1 16 : tunables 8 4 0 : slabdata 2 2 0
> size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
> size-32768 5 5 32768 1 8 : tunables 8 4 0 : slabdata 5 5 0
> size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
> size-16384 7 7 16384 1 4 : tunables 8 4 0 : slabdata 7 7 0
> size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
> size-8192 15 15 8192 1 2 : tunables 8 4 0 : slabdata 15 15 0
> size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
> size-4096 443 450 4096 1 1 : tunables 24 12 8 : slabdata 443 450 0
> size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
> size-2048 574 576 2048 2 1 : tunables 24 12 8 : slabdata 288 288 0
> size-1620(DMA) 0 0 1664 4 2 : tunables 24 12 8 : slabdata 0 0 0
> size-1620 40 40 1664 4 2 : tunables 24 12 8 : slabdata 10 10 0
> size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
> size-1024 389 392 1024 4 1 : tunables 54 27 8 : slabdata 98 98 0
> size-512(DMA) 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
> size-512 1966 2408 512 8 1 : tunables 54 27 8 : slabdata 301 301 0
> size-256(DMA) 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
> size-256 91976 104355 256 15 1 : tunables 120 60 8 : slabdata 6957 6957 0
> size-128(DMA) 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
> size-128 95488 142910 128 31 1 : tunables 120 60 8 : slabdata 4610 4610 0
> size-64(DMA) 0 0 64 61 1 : tunables 120 60 8 : slabdata 0 0 0
> size-64 17868 27572 64 61 1 : tunables 120 60 8 : slabdata 452 452 0
> size-32(DMA) 0 0 32 119 1 : tunables 120 60 8 : slabdata 0 0 0
> size-32 96180 108171 32 119 1 : tunables 120 60 8 : slabdata 909 909 15
> kmem_cache 165 165 256 15 1 : tunables 120 60 8 : slabdata 11 11 0
>
>
> [root at wcl3 ~]# cat meminfo
> MemTotal: 3984940 kB
> MemFree: 93700 kB
> Buffers: 256600 kB
> Cached: 1886060 kB
> SwapCached: 0 kB
> Active: 2722236 kB
> Inactive: 748132 kB
> HighTotal: 3104728 kB
> HighFree: 75712 kB
> LowTotal: 880212 kB
> LowFree: 17988 kB
> SwapTotal: 3911816 kB
> SwapFree: 3911656 kB
> Dirty: 584 kB
> Writeback: 0 kB
> Mapped: 1365184 kB
> Slab: 371708 kB
> CommitLimit: 5904284 kB
> Committed_AS: 4488308 kB
> PageTables: 27328 kB
> VmallocTotal: 106488 kB
> VmallocUsed: 8620 kB
> VmallocChunk: 97268 kB
> HugePages_Total: 0
> HugePages_Free: 0
> Hugepagesize: 2048 kB
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>