[Ocfs2-tools-devel] [RFC] changing default block device detection method and more

Mon Nov 26 15:52:39 PST 2007

[Sorry I took so long to respond.  Out of town and all.]

On Tue, Nov 20, 2007 at 08:05:24AM +0100, Fabio Massimo Di Nitto wrote:
> so our tools in general use /proc/partitions to detect available block devices
> on the system. While this is a quick and fairly inexpensive operation, it
> has a major limitation of not discovering devices that are not exported to
> /proc/partitions. This generally fits well most of the users out there but of
> course not me :)

	We should, of course, detect all available devices.  If they are
not in /proc/partitions, we need to fix that.
	Do we know what devices don't appear there?  You mentioned that
some folks are saying they "won't add them", do you have the email
thread to reference?  I'd like to read up on it.
	The following is a bunch of observations.  I don't think I know
what approach is best, other than that "/proc/partitions is missing some
items, we need to do something" is a driving statement.
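
	To see what is missing on a given box, it helps to have the
/proc/partitions contents in a form that can be compared against /dev or
/sys/block.  A minimal sketch, assuming the usual four-column layout
(major, minor, #blocks, name); the function names are made up for
illustration and are not from ocfs2-tools:

```python
def parse_proc_partitions(text):
    """Return {name: (major, minor)} from /proc/partitions-style text."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        # The header row and blank lines fail this check and are skipped.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, _blocks, name = fields
        devices[name] = (int(major), int(minor))
    return devices


def read_proc_partitions(path="/proc/partitions"):
    """Parse the live file; path is overridable for testing."""
    with open(path) as f:
        return parse_proc_partitions(f.read())
```

Anything found by another scan whose device number is absent from this
dictionary is a candidate for the "missing" list.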

> One of them could be to scan directly /dev for block devices and that is the
> same approach I used in another piece of software that was using
> /proc/partitions. The general experience is that it is still very fast to stat
> the bits we need, we catch all the devices around and sometimes a bit too many
> (for example all loop devices are also shown).

	Well, we want to catch *active* loop devices.
	The basic concern is what happens for expensive scans (many
non-active devices in /dev).  Some people keep a static /dev (turning
off udev).  That scan will be expensive, but perhaps we say "those
people can be ignored", as udev is the common case.  Debian (I don't know
about Ubuntu) boots with a static /dev, moving it to /dev/.static when
udev starts up.  On my laptop:

  # find /dev  -ls | grep -v '\/dev\/.static' | grep 'brw' | wc -l
  32
  # find /dev  -ls  | grep 'brw' | wc -l
  4566

This means that for Debian and perhaps Ubuntu, we need to know to avoid
/dev/.static in our scans.  Fabbione, have you accounted for this in
your code elsewhere?  Note that /dev/.static is a separate filesystem,
so you can check for that.
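
	The separate-filesystem check could look roughly like this.  It
is only a sketch: it walks a directory tree, prunes any subdirectory
whose st_dev differs from the root's, and collects block nodes.  The
function name and the "wanted" parameter are hypothetical, the latter
existing so the walk can be exercised without real device nodes.  Note
that this also skips other mounts under /dev, which is assumed to be
acceptable since they normally hold no block devices.

```python
import os
import stat


def scan_block_devices(root="/dev", wanted=stat.S_ISBLK):
    """Return sorted paths under root whose mode satisfies wanted().

    Subdirectories on a different filesystem than root (for example a
    /dev/.static mount) are pruned by comparing st_dev.
    """
    root_dev = os.stat(root).st_dev
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        keep = []
        for d in dirnames:
            try:
                st = os.lstat(os.path.join(dirpath, d))
            except OSError:
                continue
            # Keep real directories on the same filesystem only.
            if stat.S_ISDIR(st.st_mode) and st.st_dev == root_dev:
                keep.append(d)
        dirnames[:] = keep  # in-place edit tells os.walk what to descend into
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: symlinks to devices are not reported twice.
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # node vanished between listing and stat
            if wanted(mode):
                found.append(path)
    return sorted(found)
```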
	Maybe we just ignore that case and assume the majority uses EL.
Checking EL4, it pre-populates md, loop, floppy, and ramdisk devices.
On a powerpc, the same find command gives 108 devices, only 13 of which
are in /proc/partitions.  Jumping to EL5, it only pre-populates the
ramdisk and loop devices, giving 53 in /dev and 29 in /proc/partitions
for this machine.
	Interestingly, /dev/scd0 is in /dev (so an S_ISBLK() check picks
it up) but not in /proc/partitions, and it shows up in /sys as
/sys/block/sr0.  That will be ugly to match whether we scan from /dev or
/sys.  Which leads to Fabbione's next point.

> Another solution could be to walk /sys/block that indeed exports only available
> block devices in the system. The problem is that /sys/block does not have any
> knowledge on how /dev is populated. So once we get to the major/minor
> information of each single device, we still have the issue to go around /dev to
> stat all block devices to match them with /sys/block entries.

	Actually, from that direction, you can just mknod a temporary
name with the device number (if you are just checking something).  But
yes, you'd have to match up as in the "sr0 <-> scd0" case above.
	Going the other way, if you scan /dev and then need to check
/sys for media type (it's really unsafe to blindly open tape or cd
devices according to Alan and Al, which is why we avoid them in the IDE
case), you have to do the same matching in the opposite direction.
	I don't know which direction is best, and I don't have a
solution, but I figured this was food for thought.
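
	For what it's worth, the matching itself is mechanical once both
sides are reduced to device numbers.  A rough sketch, assuming each
/sys/block/<name>/dev file holds "major:minor"; all function names here
are hypothetical:

```python
import os


def sysfs_devnums(sys_block="/sys/block"):
    """Return {name: (major, minor)} read from each <name>/dev file."""
    result = {}
    for name in sorted(os.listdir(sys_block)):
        try:
            with open(os.path.join(sys_block, name, "dev")) as f:
                major, minor = f.read().strip().split(":")
            result[name] = (int(major), int(minor))
        except (OSError, ValueError):
            continue  # no dev file, or not in major:minor form
    return result


def node_devnum(path):
    """Return (major, minor) of a device node without opening it."""
    st = os.lstat(path)
    return (os.major(st.st_rdev), os.minor(st.st_rdev))


def match_sysfs_to_dev(sysfs, dev_nodes):
    """Map each sysfs name to the /dev paths sharing its device number.

    sysfs is {name: (major, minor)}; dev_nodes is {path: (major, minor)}.
    """
    by_num = {}
    for path, num in dev_nodes.items():
        by_num.setdefault(num, []).append(path)
    return {name: sorted(by_num.get(num, [])) for name, num in sysfs.items()}
```

With this, sr0 and scd0 pair up because they share a device number, no
matter what the names are.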

> One general comment that I heard in a previous discussion is that scanning /dev
> is racy. This is true no matter if you start by /dev or by /sys/block as usually
> /sys/block is populated before the kernel has sent the hotplug event down to
> userland (like udev) and the /dev entry has not been created yet.

	You are absolutely right that it's all racy and we can mainly
ignore it.  If we *need* the scan to find something (we're expecting it
to appear), we can sleep and retry, or we can connect to the kernel's
uevent netlink socket and wait for an add@/block/... message.
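
	The sleep-and-retry variant is the simpler of the two.  A
minimal sketch, with a hypothetical function name and arbitrary default
timeout and polling interval:

```python
import os
import time


def wait_for_path(path, timeout=5.0, interval=0.1):
    """Poll until path exists; return False if timeout seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller that expects a node to show up would call this once before
giving up, rather than treating the first failed lookup as final.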

> (**) it's there as a matter of commodity. I am personally strongly against
> shipping copies of code that live in other upstream and it is more wise to ask
> the users to get them from the original upstream or distro maintainers rather
> than having to carry the burden of maintaining them ourselves. So no matter what path
> we take i will not touch this code as it will introduce a delta compared to
> upstream.

	If EL4 at some update level has a new enough blkid, we can drop
the internal one and depend on the external one.  If not, I'd rather
build the internal one than force EL4 users to get a not-from-the-distro
package.

> 2) the scanning code uses some kind of filtering to avoid work on media that are
> not disks. This generally needs some cleanup to fit with new system. For example
> it checks for cdrom only on IDE chains. Now we have cdrom that are exported as
> scsi devices due to libata changes in the kernel. I am pretty sure we want to
> rework those filters a bit to match the new reality. I already have done some
> work in this direction before for totally different reasons and all the
> information we need are exported in /sys/block so we can stop parsing bits in
> /proc in one go.

	Certainly we want to keep and expand the media checks.  No
argument here.
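
	As a rough idea of what a /sys-based media check might look
like: SCSI-attached devices (which now includes libata CD-ROMs) expose
their peripheral device type in /sys/block/<name>/device/type, where 0
is a disk, 1 is tape and 5 is a CD-ROM.  The sketch below assumes that
file is the only signal consulted and treats devices without it as
disks, which is a simplification; the names are hypothetical.

```python
import os

# SCSI peripheral device type codes.
SCSI_TYPE_TAPE = 1
SCSI_TYPE_CDROM = 5
NON_DISK_TYPES = {SCSI_TYPE_TAPE, SCSI_TYPE_CDROM}


def read_int(path):
    """Return the integer stored in a sysfs attribute, or None."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def looks_like_disk(name, sys_block="/sys/block"):
    """Return False if sysfs reports a tape or CD-ROM type for name."""
    scsi_type = read_int(os.path.join(sys_block, name, "device", "type"))
    # Assumption: no type attribute (loop, md, ...) means "treat as disk".
    return scsi_type not in NON_DISK_TYPES
```

This never opens the device node itself, which is the point of doing the
check from /sys.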

Joel

-- 

Life's Little Instruction Book #314

	"Never underestimate the power of forgiveness."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127