httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From (Robert S. Thau)
Subject Re: help with a concurrency locking problem?
Date Thu, 23 May 1996 22:18:13 GMT
[ Surveys the following note for last proofread before sending out... ]
[ Thinks "hoo boy, thesis avoidance has hit with a *vengeance*"... ]

Sameer, your original code does have a problem.  The code

      cls->tableptr[orig->finfo.st_uid].bytes_sent += orig->bytes_sent;

translates into something like (in made up pseudo-assembler):

      ... get orig->bytes_sent into r2 ...
      ... compute address in r5 ...

      ld   r7, (r5)       # A: Fetch value
      addi r7, r2         # B: Add orig->bytes_sent
      st   r7, (r5)       # C: Store updated value

Now, suppose that process 3 is executing this code, and gets
interrupted at instruction B --- it has an updated value in its
registers, but has not yet written it out.  Then, if some other process
(say, process 5) comes along and does an update, it will use the
original value, since process 3 hasn't written out its result yet.
Then, when process 3 gets restarted, it finally writes out the value
which it has already computed --- wiping out the results of the update
done by process 5.

In short, this code *is* unsafe.  However, I would not expect updates
to be lost very frequently for this reason --- if you're using a
decent compiler at a reasonable optimization level, the unprotected
critical section is only a few instructions long, and lost updates
should not be happening very often.  I'd be really surprised if it
caused every third update to be lost, which sounds like about what you
suspect is happening.

Still, it probably ought to be fixed.  However, adding a "locked"
field to the structure isn't the proper fix unless your machine
supports an atomic test-and-set instruction, and you use that
instruction to manipulate the lock field.  Otherwise (assuming the
"locked" field is properly declared volatile), the machine code will
probably wind up looking something like:

     # compute address in r3

     ld   r4, (r3)      # A: load lock value
     tst  r4            # B: check if locked
     bnz  while_loop    # C: Go back if so

     ldi  r4, 1         # D: Get a "1"
     st   r4, (r3)      # E: Set the lock

     ... perform update as above ...

This code has a *very* similar race condition --- if some process gets
interrupted at D, another process can come in and grab the lock, after
which the first can proceed and they both think they have it.

So, this particular locking strategy doesn't work.  There actually are
locking algorithms in the literature that work in a load-store model
(Leslie Lamport's published several papers on them, which make
interesting reading), but they are a bit tricky.  The easiest thing
would be to use fcntl() --- even if you just have one lock for the
entire table, I doubt processes will be waiting for it terribly often.
(If you do want to lock the individual records, and you're on a system
that supports sysv IPC, then a bank of semaphores is what you're
really after --- read the semop man page for details).

Concurrent programming doesn't always land you this deep --- if you
can arrange things so that, by policy, only one thread at a time ever
has permission to write any given data structure, and all the others
may only read it, life stays pretty simple.  (This rule is what keeps
the scoreboard code semi-sane).  However, when different threads want
to write the same shared data structure (i.e., the same words of
memory), things get hairy fast.  Studying a decent OS internals text
--- particularly one that deals with management of multi-processor
systems --- may help to get the lay of the land.


View raw message