[Ocfs2-devel] orphan cleanup

Joel Becker Joel.Becker at oracle.com
Thu Apr 30 11:53:46 PDT 2009


Srini,
	Ok, you can go ahead and cook up the background orphan cleaner.
We can do this in a workqueue, a thread, or a timer.  I don't see
why a timer wouldn't work.  When the timer fires, you do this:

1. Take EX on a new orphan_scan lock.
2. Check the LVB for the last scan time.  If the time since the last
   scan is less than the scan timeout, reset the timer for (timeout -
   time since last scan), drop the EX, and exit.
3. Call ocfs2_queue_recovery_completion() for all slots with NULL, NULL,
   NULL on the non-orphan-dir arguments.  This sets up the orphan
   recovery.
4. Update the LVB with the current scan time.
5. Drop the EX to an NL.
6. Reset the timer for the scan timeout.
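
For illustration only, here is a rough userspace sketch of the six steps
above in Python.  It is not kernel code.  OrphanScanLock,
queue_orphan_recovery(), orphan_scan_timer_fired() and the 600-second
SCAN_TIMEOUT are all made-up names and values, and the real work in step
3 would be done by ocfs2_queue_recovery_completion().

```python
from dataclasses import dataclass

SCAN_TIMEOUT = 600.0  # seconds; hypothetical value for a many-minute period


@dataclass
class OrphanScanLock:
    """Hypothetical stand-in for the cluster-wide orphan_scan lock."""
    level: str = "NL"                     # "NL" or "EX"
    lvb_last_scan: float = float("-inf")  # LVB contents; -inf = never scanned


def queue_orphan_recovery(slot, queued):
    """Placeholder for ocfs2_queue_recovery_completion() on one slot."""
    queued.append(slot)


def orphan_scan_timer_fired(lock, now, num_slots, queued):
    """Handle one timer expiry; return the delay until the next expiry."""
    lock.level = "EX"                       # 1. take EX
    elapsed = now - lock.lvb_last_scan      # 2. read last scan time from LVB
    if elapsed < SCAN_TIMEOUT:
        lock.level = "NL"                   #    too soon: drop the lock
        return SCAN_TIMEOUT - elapsed       #    and sleep for the remainder
    for slot in range(num_slots):           # 3. queue orphan recovery per slot
        queue_orphan_recovery(slot, queued)
    lock.lvb_last_scan = now                # 4. record the scan time in the LVB
    lock.level = "NL"                       # 5. drop EX to NL
    return SCAN_TIMEOUT                     # 6. rearm for a full timeout
```

With a fresh lock, an expiry at now=1000.0 queues every slot and returns
600.0.  A second expiry at now=1200.0 queues nothing and returns 400.0,
the remainder of the period.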

	Points about this scheme:

- Doesn't need a process.
- Don't need to change the locking protocol version, as older versions
  just ignore this problem.
- Ensures only one node runs the scan each timeout period.
- Uses our existing orphan recovery code unchanged.
- We don't need to keep a PR on the orphan scan lock.  It's just extra
  network traffic and downconvert processing we don't care about.
  Better to wake up once when our timeout fires than to wake up every
  time another node goes to make a scan.
- I realize that I've updated the scan time when the scan is queued,
  not when it completes.  It doesn't really make much of a difference
  with many-minute scan periods, and it is a lot simpler than trying to
  add code to wait on all the orphans.
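
To make the "only one node runs the scan each timeout period" point
concrete, here is a toy Python simulation of several nodes sharing one
last-scan value.  It is only a sketch: simulate(), its arguments and the
600-second SCAN_TIMEOUT are hypothetical, and the EX lock is modelled
simply by handling one timer expiry at a time.

```python
import heapq

SCAN_TIMEOUT = 600  # seconds; hypothetical scan period


def simulate(start_offsets, run_until):
    """Return (time, node) for every scan queued before run_until.

    start_offsets gives each node's first timer expiry, in seconds.
    """
    last_scan = None  # shared LVB contents; None means no scan yet
    scans = []
    timers = [(t, node) for node, t in enumerate(start_offsets)]
    heapq.heapify(timers)
    while timers and timers[0][0] < run_until:
        now, node = heapq.heappop(timers)
        # Handling one expiry at a time stands in for holding the EX.
        if last_scan is not None and now - last_scan < SCAN_TIMEOUT:
            # Another node scanned recently: sleep for the remainder.
            delay = SCAN_TIMEOUT - (now - last_scan)
        else:
            scans.append((now, node))  # this node queues the scan
            last_scan = now            # and updates the shared scan time
            delay = SCAN_TIMEOUT
        heapq.heappush(timers, (now + delay, node))
    return scans
```

For example, simulate([0, 50, 120], 1800) yields three scans, at 0, 600
and 1200 seconds, even though three nodes each have their own timer.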

Joel

-- 

Life's Little Instruction Book #232

	"Keep your promises."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


