lucene-dev mailing list archives

From "Michael Busch (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-2312) Search on IndexWriter's RAM Buffer
Date Mon, 22 Mar 2010 17:01:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848210#action_12848210 ]

Michael Busch edited comment on LUCENE-2312 at 3/22/10 5:01 PM:
----------------------------------------------------------------

bq. So.. what does this mean for allowing an IR impl to directly search IW's RAM buffer?

The main idea is to have an approach that is lock-free.  Then write performance will not suffer
no matter how big your query load is.

When you open/reopen a RAMReader it would first ask the MemoryBarrier for the last sync'ed
docID (volatile read).  This would be the maxDoc for that reader and it's safe for the reader
to read up to that id, because it can be sure that all changes the writer thread made up to
that maxDoc are visible to the reader.
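
A minimal sketch of the handshake described above, not Lucene code. The names MemoryBarrierSketch, writeDoc, sync, openReaderMaxDoc and read are hypothetical, and the int[] buffer stands in for the real RAM buffer. It assumes a single writer thread, and it assumes maxDoc is exclusive (last sync'ed docID + 1). It relies on the Java Memory Model rule that a volatile write happens-before a later volatile read of the same field.

```java
/** Hypothetical illustration of publishing writes through one volatile field. */
final class MemoryBarrierSketch {
    // Plain (non-volatile) storage, written only by the single writer thread.
    private final int[] buffer;

    // Highest docID whose writes have been published; -1 means none yet.
    private volatile int lastSyncedDocId = -1;

    MemoryBarrierSketch(int capacity) {
        buffer = new int[capacity];
    }

    /** Writer thread: plain write, no lock and no volatile access. */
    void writeDoc(int docId, int value) {
        buffer[docId] = value;
    }

    /** Writer thread: the volatile write publishes all earlier plain writes. */
    void sync(int docId) {
        lastSyncedDocId = docId;
    }

    /** Reader: one volatile read when the reader is opened or reopened. */
    int openReaderMaxDoc() {
        return lastSyncedDocId + 1; // assumption: maxDoc is exclusive
    }

    /** Reader: only docIDs below the maxDoc snapshot are safe to read. */
    int read(int docId, int maxDoc) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("docId " + docId + " is not below maxDoc " + maxDoc);
        }
        return buffer[docId];
    }
}
```

A reader would call openReaderMaxDoc() once and keep that value for its lifetime, so documents added after the snapshot stay invisible until the next reopen.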

If we called MemoryBarrier.sync() let's say every 100 docs, then the max. search latency would
be the amount of time it takes to index 100 docs.  Doing no volatile/atomic writes and not
going through explicit locks for 100 docs will allow the JVM to do all its nice optimizations.
 I think this will work, but honestly I do not have a good feel for how much performance
this approach would gain compared to writing to volatile variables for every document.
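
To make the batching concrete, here is a rough sketch of a writer that publishes once every N documents. It is not Lucene code: BatchedSyncWriterSketch, addDocument, visibleMaxDoc and unpublishedDocs are hypothetical names, a single writer thread is assumed, and the actual indexing work is left out. It does not measure the performance gain questioned above; it only shows how the sync interval bounds how far readers can lag.

```java
/** Hypothetical illustration: one volatile write per syncInterval documents. */
final class BatchedSyncWriterSketch {
    private final int syncInterval;

    // Touched only by the writer thread, so a plain field is enough.
    private int nextDocId = 0;

    // Published upper bound (exclusive) that readers may search up to.
    private volatile int visibleMaxDoc = 0;

    BatchedSyncWriterSketch(int syncInterval) {
        if (syncInterval <= 0) {
            throw new IllegalArgumentException("syncInterval must be positive");
        }
        this.syncInterval = syncInterval;
    }

    /** Writer thread: index one document; sync only on every N-th document. */
    int addDocument() {
        int docId = nextDocId++;
        // Real indexing into plain, non-volatile buffers would happen here.
        if (nextDocId % syncInterval == 0) {
            sync();
        }
        return docId;
    }

    /** Writer thread: a single volatile write publishes the whole batch. */
    void sync() {
        visibleMaxDoc = nextDocId;
    }

    /** Reader: volatile read of the published bound. */
    int visibleMaxDoc() {
        return visibleMaxDoc;
    }

    /** Writer thread only: documents indexed but not yet searchable. */
    int unpublishedDocs() {
        return nextDocId - visibleMaxDoc;
    }
}
```

With syncInterval = 100, at most 99 documents are ever waiting to be published, so the worst-case staleness is roughly the time it takes to index one batch.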

      was (Author: michaelbusch):
    bq. So.. what does this mean for allowing an IR impl to directly search IW's RAM buffer?

The main idea is to have an approach that is lock-free.  Then write performance will not suffer
no matter how big your query load is.

When you open/reopen a RAMReader it would first ask the MemoryBarrier for the last sync'ed
docID.  This would be the maxDoc for that reader and it's safe for the reader to read up to
that id, because it can be sure that all changes the writer thread made up to that maxDoc
are visible to the reader.

If we called MemoryBarrier.sync() let's say every 100 docs, then the max. search latency would
be the amount of time it takes to index 100 docs.  Doing no volatile/atomic writes and not
going through explicit locks for 100 docs will allow the JVM to do all its nice optimizations.
 I think this will work, but honestly I do not have a good feel for how much performance
this approach would gain compared to writing to volatile variables for every document.
  
> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer users near-realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max doc ids:
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

