lucene-dev mailing list archives

From Jason Rutherglen
Subject Re: Future projects
Date Tue, 07 Apr 2009 23:05:00 GMT
> I think we should keep it simple, unless we discover real perf problems
> with the current approach.

Simple is good, however the indexing performance will lag because we're back
to the indexing speed of pre ram buffer? (i.e. merging segments using a

> need to do a merge sort (across the N thread states)

I'm confused about why a merge sort is required?

On Tue, Apr 7, 2009 at 1:45 AM, Michael McCandless wrote:

> On Mon, Apr 6, 2009 at 6:43 PM, Jason Rutherglen wrote:
> >> The realtime reader would have to have sub-readers per thread,
> >> and an aggregate reader that "joins" them by interleaving the
> >> docIDs
> >
> > Nice (i.e. nice and complex)!
> Right, this is why I like the current [simple] near real-time
> approach.  I think we should keep it simple, unless we discover real
> perf problems with the current approach.
> > Not knowing too much about the
> > internals, how would the interleaving work? Does each subreader
> > have a "start" ala Multi*Reader? Or are the doc ids incremented
> > from a synced place such that no two readers have the same doc
> > id?
> The docID must be woven/interleaved together (unlike MultiReader,
> where they are concatenated).  DW ensures that a given docID is used
> by only 1 thread.  So you'd need to do a merge sort (across the N
> thread states) on reading the postings for a given term.  Probably
> we'd then suggest for best searching performance to use a single
> thread for indexing when NRT search will be used.
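To make the merge-sort point concrete, here is a minimal Java sketch (not Lucene code). It assumes each thread state can expose the docIDs for one term as an ascending int array, and that no docID appears in more than one array, as described above. The names InterleavedDocIds, Cursor and merge are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class InterleavedDocIds {

    /** Cursor over one thread state's ascending docIDs for a single term. */
    private static final class Cursor {
        final int[] docIds;
        int pos;

        Cursor(int[] docIds) { this.docIds = docIds; }

        int current() { return docIds[pos]; }

        boolean advance() { return ++pos < docIds.length; }
    }

    /** Merges N ascending, mutually disjoint docID lists into one ascending list. */
    static List<Integer> merge(List<int[]> perThreadDocIds) {
        // The queue always holds the smallest unread docID of each thread state.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] docIds : perThreadDocIds) {
            if (docIds.length > 0) {
                queue.add(new Cursor(docIds));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            merged.add(smallest.current());
            if (smallest.advance()) {
                queue.add(smallest); // re-insert with its next docID
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three thread states whose docIDs are woven together, not concatenated.
        List<int[]> perThread = Arrays.asList(
            new int[] {0, 3, 4},
            new int[] {1, 5},
            new int[] {2, 6, 7});
        System.out.println(merge(perThread)); // prints [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```

With MultiReader-style concatenation no queue is needed, since each sub-reader's docIDs can simply be read in turn. With a single indexing thread the queue holds one cursor and the merge degenerates to a plain scan.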
> >> BTW there are benefits to not reusing the RAM buffer, outside
> >> of faster near real-time search
> >
> > Not reusing the RAM buffer means not reusing the pooled byte
> > arrays after a flush or something else?
> Pooled byte, char and int arrays, *PerThread, *PerField classes, norms,
> etc.
> > SSDs are cool, I can't see management approving of those quite
> > yet, are there many places piloting Lucene on SSDs that you're
> > aware of?
> Yes they are still somewhat expensive, though the gain in productivity
> is sizable, and prices have been coming down...
> EG I have a zillion Lucene source code checkouts, and it used to be
> that whenever I switched back to one and did an "svn up" or "svn diff",
> it was a good 30 seconds of disk heads grinding away before anything
> really happened.  Now it's a second or two.  VMware/Parallels also become
> much more responsive.  Not hearing disk heads grinding is somewhat
> disconcerting at first, though.
> At least several people on java-user have posted benchmarks with SSDs.
> SSDs are clearly the future and I think we need to think more about
> what their adoption means for our prioritization of Lucene's ongoing
> improvements.  EG I think it means the CPU cost of searching, and
> single-search concurrency (using more than one thread on one search), become
> important, because once your index is on an SSD Lucene will spend far
> less time waiting for IO to complete even on "normal" installations
> that don't cache the entire index in RAM.  I think we especially need
> to figure out how to leverage concurrency in the IO system (but alas
> we don't have an async IO API from Java... we'd have to "emulate" it
> using threads).
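As a rough illustration of emulating async IO with threads, here is a minimal Java sketch. It is not Lucene code, and the names ThreadedReads and readAsync are hypothetical. It assumes reads go through a FileChannel, whose positional read(ByteBuffer, long) can be called from several threads at once, so multiple requests can be outstanding against the device at the same time.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadedReads {
    private final FileChannel channel;
    private final ExecutorService pool;

    ThreadedReads(Path file, int threads) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Queues a positional read and returns at once; the Future completes when the bytes arrive. */
    Future<byte[]> readAsync(long position, int length) {
        return pool.submit(() -> {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long offset = position;
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, offset); // positional read, no shared file pointer
                if (n < 0) {
                    break; // end of file: return what was read so far
                }
                offset += n;
            }
            byte[] result = new byte[buffer.position()];
            buffer.flip();
            buffer.get(result);
            return result;
        });
    }

    void close() throws IOException {
        pool.shutdown();
        channel.close();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("threaded-reads", ".bin");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        ThreadedReads reads = new ThreadedReads(file, 2);
        // Both requests are in flight before either result is awaited.
        Future<byte[]> head = reads.readAsync(0, 4);
        Future<byte[]> tail = reads.readAsync(6, 4);
        System.out.println(new String(head.get(), StandardCharsets.US_ASCII)); // prints 0123
        System.out.println(new String(tail.get(), StandardCharsets.US_ASCII)); // prints 6789
        reads.close();
        Files.delete(file);
    }
}
```

The pool size bounds how many reads are in flight at once. Whether this helps a single search depends on being able to issue independent reads (for example, postings for different terms) before blocking on any of them.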
> > From what you've said so far, this is how I understand realtime
> > ram buffer readers could work:
> >
> > There'd be an IndexWriter.getRAMReader method that gathers all
> > the ram buffers from the various threads, marks a doc id as the
> > last one for the overall RAMBufferMultiReader. A new set of
> > classes, RAMBufferTermEnum, RAMBufferTermDocs,
> > RAMBufferTermPositions would be implemented that can read from
> > the ram buffer.
> Right, but we shouldn't start work on this until we see a reason to.
> And even once that reason appears, we should next do the intermediate
> optimization of using RAMDir for newly flushed segments.
> > I don't think the current field cache API would like growing
> > arrays? Something hopefully LUCENE-831 will support.
> I'm thinking for LUCENE-831 we should make field-cache segment
> centric, which would then play well w/ NRT.
> Mike
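One way to read "segment centric" is that cached field values are keyed by segment rather than by the top-level reader, so a reopened reader only has to load values for segments it has not seen before. The Java sketch below illustrates that reading only; it is not the LUCENE-831 design, and PerSegmentFieldCache, get, retainOnly and the string segment names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PerSegmentFieldCache {
    // Keyed by segment name: a hypothetical stand-in for whatever identifies a segment.
    private final Map<String, int[]> valuesBySegment = new HashMap<>();
    private int loadCount;

    /** Returns cached values for one segment, loading them only on first use. */
    int[] get(String segment, Function<String, int[]> loader) {
        int[] cached = valuesBySegment.get(segment);
        if (cached == null) {
            cached = loader.apply(segment);
            valuesBySegment.put(segment, cached);
            loadCount++;
        }
        return cached;
    }

    /** Drops entries for segments that are no longer live (for example, merged away). */
    void retainOnly(List<String> liveSegments) {
        valuesBySegment.keySet().retainAll(liveSegments);
    }

    int loadCount() { return loadCount; }

    public static void main(String[] args) {
        PerSegmentFieldCache cache = new PerSegmentFieldCache();
        Function<String, int[]> loader = seg -> new int[] {seg.length()}; // fake per-segment load
        for (String seg : Arrays.asList("_0", "_1")) {
            cache.get(seg, loader);
        }
        // After a reopen that adds segment "_2", only "_2" triggers a load.
        for (String seg : Arrays.asList("_0", "_1", "_2")) {
            cache.get(seg, loader);
        }
        System.out.println(cache.loadCount()); // prints 3
    }
}
```

Because each segment's array is fixed once the segment is written, nothing here needs a growing array; new documents show up as new segments with their own entries.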