lucene-dev mailing list archives

From "Robert Engels" <>
Subject RE: FieldsReader synchronized access vs. ThreadLocal ?
Date Wed, 17 May 2006 07:21:43 GMT
The test results seem hard to believe. Doubling the CPUs only increased
throughput by 20%??? That seems rather low for a primarily "read only" test.

Peter did not seem to answer many of the follow-up questions (at least I
could not find the answers) regarding whether CPU usage was at 100%.
If the OS cache is too small for the size of the index and the
number of queries being executed, you will not see linear increases
with the number of CPUs, because you will quickly become IO bound
(especially if the queries return a wide variety of documents that
are scattered throughout the index).

Since reading a document is a relatively expensive operation (especially if
the data blocks are not in the OS cache), synchronizing it means no other
thread can read a document, or even begin to read one (on an OS/hardware
combination that supports multiple concurrent scatter/gather IO requests).
This is not just a concern when lots of documents are being read. Since the
isDeleted() method uses the same synchronized lock as document(), all query
scorers that filter out deleted documents will also be affected, as they
will block while a document is being read.
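
The two designs can be contrasted in a small plain-Java sketch. This is not Lucene's FieldsReader or SegmentReader: Stream, SynchronizedReader and ThreadLocalReader are hypothetical stand-ins, the deleted-docs array is assumed read-only, and giving each thread its own clone of the stream is an assumption about how the ThreadLocal version works (one plausible way, not necessarily the patch's). With a private file pointer per thread, document() needs no lock, so isDeleted() no longer waits behind a document read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalFieldsSketch {

    /** Hypothetical cloneable stream: clones share the bytes but keep their own pointer. */
    static class Stream implements Cloneable {
        private final byte[] data;
        private int pointer;
        Stream(byte[] data) { this.data = data; }
        void seek(int pos) { pointer = pos; }
        byte readByte() { return data[pointer++]; }
        @Override
        public Stream clone() {
            try { return (Stream) super.clone(); } // shallow copy: same bytes, copied pointer
            catch (CloneNotSupportedException e) { throw new AssertionError(e); }
        }
    }

    /** Current design: one shared stream, one monitor for document() and isDeleted(). */
    static class SynchronizedReader {
        private final Stream stream;
        private final boolean[] deleted;

        SynchronizedReader(Stream stream, boolean[] deleted) {
            this.stream = stream;
            this.deleted = deleted;
        }

        synchronized byte document(int n) {
            stream.seek(n);           // moves the shared file pointer
            return stream.readByte(); // a slow disk read would hold the monitor here
        }

        // Same monitor as document(): must wait for any in-flight read.
        synchronized boolean isDeleted(int n) { return deleted[n]; }
    }

    /** Proposed design: each thread lazily gets its own clone of the stream. */
    static class ThreadLocalReader {
        final AtomicInteger clonesCreated = new AtomicInteger();
        private final boolean[] deleted; // assumed read-only after construction
        private final ThreadLocal<Stream> perThread;

        ThreadLocalReader(Stream base, boolean[] deleted) {
            this.deleted = deleted;
            this.perThread = ThreadLocal.withInitial(() -> {
                clonesCreated.incrementAndGet();
                return base.clone();
            });
        }

        byte document(int n) { // no lock: the pointer being moved is private to this thread
            Stream s = perThread.get();
            s.seek(n);
            return s.readByte();
        }

        boolean isDeleted(int n) { return deleted[n]; }
    }

    static byte expected(int n) { return (byte) (n % 128); }

    static byte[] sampleData(int docs) {
        byte[] data = new byte[docs];
        for (int i = 0; i < docs; i++) data[i] = expected(i);
        return data;
    }

    /** Reads every live document from several threads; returns the number of wrong reads. */
    static int concurrentMismatches(ThreadLocalReader reader, int threads, int docs) {
        AtomicInteger mismatches = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int n = 0; n < docs; n++) {
                    if (!reader.isDeleted(n) && reader.document(n) != expected(n)) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        int docs = 10_000;
        ThreadLocalReader reader = new ThreadLocalReader(new Stream(sampleData(docs)), new boolean[docs]);
        System.out.println("mismatches = " + concurrentMismatches(reader, 4, docs));
        System.out.println("clones = " + reader.clonesCreated.get());
    }
}
```

Running main should print "mismatches = 0" and "clones = 4": one clone per worker thread, created on first use, and no thread ever disturbs another's file pointer. The cost of the design is one extra stream object per thread that touches the reader.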

In order to test this, I wrote the attached test case. It uses 2 threads:
one reads every document in a segment, and the other reads the same
document repeatedly (as many times as there are documents in the index).
The theory is that "readsame" should execute rather quickly (since the
needed disk blocks will quickly be in the OS cache), whereas "readall"
will be much slower (since almost every document retrieval will require
disk access).
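
The attached test case is not preserved in the archive, so the following is only a hypothetical reconstruction of the two-thread setup described above. SimulatedSegment fakes the OS cache by sleeping on the first read of each document (an assumption standing in for real disk IO), a lambda guarded by one shared monitor stands in for the synchronized document() path, and an unguarded reader stands in for the ThreadLocal version.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntUnaryOperator;

public class ReadSameVsReadAll {

    /** Hypothetical segment: a document's first read "hits disk" (sleeps); later reads are cached. */
    static final class SimulatedSegment {
        private final Set<Integer> osCache = ConcurrentHashMap.newKeySet();
        private final long missMillis;

        SimulatedSegment(long missMillis) { this.missMillis = missMillis; }

        int load(int doc) {
            if (osCache.add(doc)) { // cache miss: pay the simulated IO cost
                try {
                    Thread.sleep(missMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            return doc; // stand-in for the decoded document
        }
    }

    /** Elapsed wall time and a checksum of the documents one worker read. */
    static final class Result {
        final String name;
        final long millis;
        final long checksum;

        Result(String name, long millis, long checksum) {
            this.name = name;
            this.millis = millis;
            this.checksum = checksum;
        }
    }

    static Result timed(String name, IntUnaryOperator reader, int docs, boolean same) {
        long start = System.nanoTime();
        long checksum = 0;
        for (int i = 0; i < docs; i++) {
            checksum += reader.applyAsInt(same ? 0 : i); // "readsame" always asks for doc 0
        }
        return new Result(name, (System.nanoTime() - start) / 1_000_000, checksum);
    }

    /** Runs the two workers concurrently against the same reader. */
    static Result[] run(IntUnaryOperator reader, int docs) {
        Result[] results = new Result[2];
        Thread readAll = new Thread(() -> results[0] = timed("ReadAllThread", reader, docs, false));
        Thread readSame = new Thread(() -> results[1] = timed("ReadSameThread", reader, docs, true));
        readAll.start();
        readSame.start();
        try {
            readAll.join();
            readSame.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results; // join() makes the workers' writes visible here
    }

    static void print(Result[] results) {
        for (Result r : results) System.out.println(r.name + ", time = " + r.millis);
    }

    public static void main(String[] args) {
        int docs = 200;

        // Synchronized variant: one monitor guards every read, including the simulated IO.
        SimulatedSegment lockedSegment = new SimulatedSegment(1);
        Object lock = new Object();
        print(run(doc -> { synchronized (lock) { return lockedSegment.load(doc); } }, docs));

        // Unsynchronized variant: stands in for per-thread (ThreadLocal) streams, no shared lock.
        SimulatedSegment freeSegment = new SimulatedSegment(1);
        print(run(freeSegment::load, docs));
    }
}
```

Exact timings depend on the machine. Under these assumptions, ReadSameThread should take roughly as long as ReadAllThread in the synchronized run, because it queues behind every simulated disk read, and far less in the unsynchronized run. Raising missMillis models a slower disk.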

I tested using a segment containing 100k documents. I ran the test on a
single-CPU machine (1.2 GHz P4).

I used the Windows "cleanmem" utility to clear the system cache before
running the tests. (It seemed unreliable at times. Does anyone know a
fool-proof method of emptying the system cache on Windows???)

Running the unmodified SegmentReader and FieldsReader (synchronized)
over multiple tests, I got the following:

ReadSameThread, time = 2359
ReadAllThread, time = 2469

ReadSameThread, time = 2671
ReadAllThread, time = 2968

Using the modified (unsynchronized, using ThreadLocal) classes, I got the following:

ReadSameThread, time = 1328
ReadAllThread, time = 1859

ReadSameThread, time = 1671
ReadAllThread, time = 1953
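
Averaging the two runs of each configuration (the message does not state the time unit, so the figures are left unitless) gives the relative improvement from removing the lock:

```latex
\begin{aligned}
\text{ReadSame:}\quad & \tfrac{2359+2671}{2} = 2515 \;\to\; \tfrac{1328+1671}{2} = 1499.5, & 1 - \tfrac{1499.5}{2515} &\approx 0.40\\
\text{ReadAll:}\quad & \tfrac{2469+2968}{2} = 2718.5 \;\to\; \tfrac{1859+1953}{2} = 1906, & 1 - \tfrac{1906}{2718.5} &\approx 0.30
\end{aligned}
```

That is roughly a 40% reduction for ReadSameThread and a 30% reduction for ReadAllThread, on a single CPU.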

I believe that using an MMap directory merely mitigates the problem, since
the OS reads the blocks much more efficiently (faster). Imagine running
Lucene on a VERY SLOW disk subsystem: the synchronized block would have an
even greater negative impact.

Hopefully, this is enough to demonstrate the value of using ThreadLocals to
support simultaneous IO.

I look forward to your thoughts and those of others; hopefully someone can
run the test on a multiple-CPU machine.


-----Original Message-----
From: Doug Cutting []
Sent: Tuesday, May 16, 2006 3:17 PM
Subject: Re: FieldsReader synchronized access vs. ThreadLocal ?

Robert Engels wrote:
> It seems that in a highly multi-threaded server this synchronized method
> could lead to significant blocking when the documents are being retrieved?

Perhaps, but I'd prefer to wait for someone to demonstrate this as a
performance bottleneck before adding another ThreadLocal.

Peter Keegan has recently demonstrated pretty good concurrency using
mmap directory on four and eight CPU systems:

Peter also wondered if the SegmentReader.document(int) method might be a
bottleneck, and tried patching it to run unsynchronized:

Unfortunately that did not improve his performance:

