lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Busch (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
Date Mon, 22 Mar 2010 16:31:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848198#action_12848198
] 

Michael Busch commented on LUCENE-2312:
---------------------------------------

Hi Brian - good to see you on this list!

In my previous comment I actually quoted some sections of the concurrency book:
https://issues.apache.org/jira/browse/LUCENE-2312?focusedCommentId=12845712&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12845712

Did I understand correctly that a volatile write can be used to enforce a cache->RAM write-through
of *all* updates a thread made that came before the volatile write in the thread's program
order?

My idea here was to use this behavior to avoid volatile writes for every document, but instead
to periodically do such a volatile write (say e.g. every 100 documents).  I implemented a
class called MemoryBarrier, which keeps track of when the last volatile write happened.  A
reader thread can ask the MemoryBarrier what the last successfully processed docID before
crossing the barrier was.  The reader will then never attempt to read beyond that document.

Of course there are tons of details regarding safe publication of all involved fields and
objects.  I was just wondering if this general "memory barrier" approach seems right and if
indeed performance gains can be expected compared to doing volatile writes for every document?

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer user's near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Todays Lucene based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max doc ids.
 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message