incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Threads, revisited
Date Wed, 06 Jan 2010 03:28:57 GMT

Recent changes to VTable and the addition of LockFreeRegistry have made Lucy's
core OO framework thread safe.  There are still a couple of TODO items:

  * JSON::XS is not thread safe, and we need to replace it with our own custom 
    JSON parsing module.
  * Lucy->error must be made into a thread local and tested as such.
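
To illustrate what "thread local and tested as such" could look like, here is a minimal Python sketch. It is not Lucy code: `set_error`, `get_error`, and `worker` are hypothetical names, and Python's `threading.local` stands in for whatever mechanism the host language provides. Each thread records its own error, and no thread sees an error set by another.

```python
import threading

# Hypothetical per-thread error slot; a stand-in, not Lucy's actual API.
_state = threading.local()

def set_error(message):
    """Record an error visible only to the calling thread."""
    _state.error = message

def get_error():
    """Return the calling thread's last error, or None if it never set one."""
    return getattr(_state, "error", None)

def worker(name, results):
    # Each worker sets an error and reads it back from its own slot.
    set_error(f"failure in {name}")
    results[name] = get_error()

results = {}
threads = [
    threading.Thread(target=worker, args=(f"t{i}", results))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never called set_error, so its slot is still empty.
main_error = get_error()
```

A test for the real thing would presumably make the same two checks: every thread reads back exactly what it wrote, and a thread that wrote nothing sees nothing.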

Nevertheless, we are close to being able to advertise basic support for
threads.

In my opinion, we should stop there.

Here is a documentation class I would like to add, Lucy::Docs::ThreadSupport.

    /** Using Lucy with threads.
     *
     * Lucy's primary concurrency model is processes.  Threads are also
     * supported, so long as those threads look like processes -- meaning that
     * objects must not be shared across threads.
     *
     * For illustration: It is safe to start up several threads, each with its
     * own search object, and have those searchers operate concurrently
     * against the same index.  However it is not safe (and will eventually
     * result in segfaults or other severe memory errors) to start up a search
     * object in a "boss" thread and then search against that object in
     * several worker threads.
     */

    inert class Lucy::Docs::ThreadSupport { }

The advantage of stopping at that level of support is that we can provide our
users with a rule which is clear and for practical purposes absolute:

  Don't share objects across threads -- period.

There are other classes besides VTable which it might be nice to make thread
safe, like Folder -- but the instant we start down that path, we destroy the
clarity of our "don't share" rule.  
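
As a rough picture of the safe pattern, here is a small Python sketch under stated assumptions. `ToySearcher`, `INDEX`, and `worker` are hypothetical and have nothing to do with Lucy's real classes. The point is only the shape: each thread constructs its own search object against the same read-only index, and no object is ever handed from one thread to another.

```python
import threading

class ToySearcher:
    """Hypothetical stand-in for a search object with mutable internal state."""

    def __init__(self, index):
        self.index = index   # read-only data, opened separately by each searcher
        self.scratch = []    # mutable state that would be unsafe to share

    def hits(self, term):
        # Collect the ids of documents containing the exact term.
        self.scratch.clear()
        for doc_id, text in enumerate(self.index):
            if term in text.split():
                self.scratch.append(doc_id)
        return list(self.scratch)

# An immutable toy "index" that every searcher reads but never modifies.
INDEX = (
    "threads look like processes",
    "share nothing across threads",
    "one searcher per thread",
)

def worker(term, out):
    # The searcher is created inside the thread and never leaves it.
    searcher = ToySearcher(INDEX)
    out[term] = searcher.hits(term)

out = {}
workers = [
    threading.Thread(target=worker, args=(term, out))
    for term in ("threads", "searcher", "processes")
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The unsafe variant would build one `ToySearcher` in the main thread and pass it to every worker, at which point the workers would race on `scratch`.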

There are two use cases where threads would be particularly useful, and it's
worth addressing those specifically.

First... during indexing, it is faster if you can offload analysis to worker
threads running on multiple CPUs.  However, having implemented a
BackgroundMerger for KinoSearch which can operate simultaneously with an
Indexer, I am now reasonably confident that we can achieve this using multiple
Indexer objects in separate processes (or threads) writing to separate

Second... at search time, it would be useful to dedicate one CPU to each index
segment to achieve the fastest possible search response.  For this, I don't
have as good an answer, but I think we will be able to max out speed on some
systems using non-portable IPC techniques.  If we can achieve no-compromises
performance on a few Unixen that way, I think that's good enough for me.

Prior discussion:

Marvin Humphrey
