[BDB 11gR2 Beta] Fwd: Potential migration issues with BDB locking model [#18275]

Wed Mar 3 09:07:05 PST 2010

Michael,

Excellent write up from Mike Owens, one of our Beta testers. It's long, 
but well worth reading.

After a quick scan, my takeaways from this are:

   1. We need to make sure that deadlock detection is configured by
      default for people who are using the SQL API and that the errors
      that are returned to the API make sense within their frame of
      reference so that they don't get too surprised by blocking/error
      behavior that they don't expect/didn't code for. Can you review
      and comment on Mike's observations, especially towards the end?
      Are we keeping in line with what a SQLite user would expect to see
      if they actually had finer grained locking?
   2. Steve, some of this content is what needs to be in the
      documentation. Obviously not the first thing that they read, but a
      discussion of locking models and how SQLite applications are
      affected should be in the documentation somewhere.

Regards,

Dave

-------- Original Message --------
Subject: 	Potential migration issues with BDB locking model
Date: 	Wed, 3 Mar 2010 09:13:44 -0600
From: 	Mike Owens <mikeowens at gmail.com>
To: 	Dave Segleau <dave.segleau at oracle.com>, Sleepycat Software Support 
<support at sleepycat.com>

Dave,

Here is an overview of the issue I talked to you about the other day in detail.

SQLite's concurrency model is different than most other databases. Some common
patterns you use with other databases may not work like you expect in SQLite.
To write good code, you must understand the transaction model, the locking
model, cursors and how they all relate to each other. This is the most important
thing to understand in programming SQLite when concurrency is involved.

SQLite uses database level locking, which is implemented using file locking on
the database file. It keeps three different file locks to implement six lock
states. In order to accommodate its locking model from the SQL level, SQLite has
unique transaction semantics

BEGIN             -->  Read lock
BEGIN IMMEDIATE   -->  "Modify" lock, called a RESERVED lock
BEGIN EXCLUSIVE   -->  Write lock

There can be multiple readers but only one writer. The latter two are used as a
kind of contract. If you can get them to complete (not get back SQLITE_BUSY),
then you are guaranteed to be able to start modifying and eventually write to
the database. By modifying, I mean changing data in form of modifying pages in
the page cache, which are not committed to disk (this would be writing
pages which
is done only in EXCLUSIVE).

There is a specific protocol a programmer needs to follow in SQLite to avoid
deadlocks when writing applications that modify the contents of a
database. Typically, it is best to start with the best transaction that supports
the job are doing. If all you plan on doing in reading, then a simple
BEGIN will do (or none at all whereby you run in autocommit mode). If you plan
on modifying the database (INSERT, UPDATE, DELETE), then you should
start with BEGIN IMMEDIATE. By doing this, you will wait on the
correct locks in a
deterministic way and when you obtain them -- and more importantly you will
know the state of the database so as to avoid deadlocks.

For example, here is a deadlock scenario. Say you have two concurrent programs X
and Y connected to the same database, and neither knows anything about the
other. X gets a READ lock. Y gets a RESERVED lock (READ and RESERVED can
coexist -- at this point you have potentially multiple readers but just one
"modifier"). X decides it wants to do an UPDATE. So it tries, but is
unsuccessful because Y has the only available RESERVED lock. Y decides it wants
to commit its changes to disk and tries a COMMIT, which attempts to get an
EXCLUSIVE lock. Y's attempt fails because it cannot got an EXCLUSIVE lock while
there are any read locks on the database. Both X and Y just take the brute force
approach and decide to just keep retrying in an endless loop until their
operation succeeds -- X it's UPDATE and Y it's COMMIT. They are now
deadlocked. X
has a READ lock and will never get RESERVED because Y has it. Y has RESERVED but
will never get EXCLUSIVE because it's blocked by X's READ lock.

The solution is simple: they follow a simple protocol. X should never
start with a BEGIN if
it intends to modify. It should start with a BEGIN IMMEDIATE. By doing
so, when it
fails (because Y has it), it falls back to an UNLOCKED state (does not
hold a READ lock
on the database) thus making it possible for any connection in
RESERVED to go to EXCLUSIVE and complete. Thus in this protocol, a RESERVED
connection can busywait safely because it assumes that all other potential
writers will start by attempting RESERVED and fail back into an
unlocked state which
does not interfere by introducing inhibitory READ locks. This protocol
thus provides
connections with a means of getting out of each others way.

The point in all this is that both BEGIN IMMEDIATE and BEGIN EXCLUSIVE are
contracts that guarantee the state of the database. A writer can assume that
once he has either of these locks that he can proceed in a specific
way according to this
protocol and complete work without encountering deadlock (assuming
everyone else plays along and follows the proper protocol as well).

While Berkely DB allows for finer grain locking, it's lock model changes these
semantics. In BDB, both X and Y can get a RESERVED lock as long as they are not
trying to modify the same page. In order to see this, just open two
instance of dbsql and do
the following in parallel, instruction by instruction in each session:

Session 1:

dbsql>  create table a(x int);
dbsql>  begin immediate;
dbsql>  insert into a values (1);
dbsql>  commit;

Session 2:

dbsql>  create table b(x int);
dbsql>  begin immediate;
dbsql>  insert into b values (1);
dbsql>  commit;

No problem -- both sessions operated concurrently after successfully
starting a reserved
transaction at the same time. Two readers working simultaneously. It
worked because
they did not operate on the same pages. Now try this in both:

dbsql>  begin immediate;
dbsql>  insert into a values (2);

The second session will block.

In this case, the block will resolve once the first session COMMITs.
But what would
happen if the first session tries to update a page already locked by the second
session -- it would deadlock. Here is the scenario:

Session 1:

dbsql>  begin immediate;
dbsql>  insert into a values (3);
dbsql>  insert into b values (3);
Error: database table is locked

Session 2:

dbsql>  begin immediate;
dbsql>  insert into b values (3);
dbsql>  insert into a values (3);

We see that see that session 1 gets an error. But in code, if this is just an
SQLITE_LOCKED, how does is interpret it any differently?

This leads programs written for SQLite into trouble in that they would assume
that a brute force model is acceptable from RESERVED should they run into
SQLITE_LOCKED. But in BDB they could in fact they could be in a deadlock
as you effectively have two writers fighting over the same page and
don't know it.

The locking model in BDB changes from one reader / multiple writers to
multiple readers/writers. This is not necessarily a bad thing, but a new
protocol for handling SQLITE_BUSY in BDB must be clearly defined to avoid
deadlocks in BDB that wouldn't happen in SQLite. What does an application do
now if it successfully gets RESERVED and runs into SQLITE_LOCKED?

One thing I do see however that may easily explain/resolve all of this
is whether
the behavior in BDB mirrors SQLite in shared cache mode. I have been looking
through the source and it seems more and more that it might and therefore
SQLITE_LOCK might simply be interpreted as in shared cache mode, in which
case the behavior would always  be to rollback the transaction and retry.
If that is true, then the protocol would be simpler -- just start over
every time
you get SQLITE_LOCKED. SQLite's shared cache mode also changes the
locking semantics as well.

I am looking into this more and will write all of this out in more
detail in my report.
In the meantime, if there is a programmer/engineer would might be able
to comment
on this, so much the better. In any case, it is something that would
need to be covered
clearly in the documentation.

Mike

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/bdb-beta-feedback/attachments/20100303/d858dcaa/attachment.html