lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2312) Search on IndexWriter's RAM Buffer
Date Thu, 25 Aug 2011 04:40:32 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090764#comment-13090764
] 

Jason Rutherglen commented on LUCENE-2312:
------------------------------------------

A benchmark plan is, compare the speed of NRT vs. RT.  

Index documents in a single thread, in a 2nd thread open a reader and perform a query.  It
would be nice to synchronize the point / max doc at which RT and NRT open new readers to additionally
verify the correctness of the directly comparable search results.  To make the test fair,
concurrent merge scheduler should be turned off in the NRT test.

The hypothesis is that array copying, even on large [RT] indexes is no big deal compared with
the excessive segment merging with NRT.

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, LUCENE-2312.patch
>
>
> In order to offer user's near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Todays Lucene based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max doc ids.
 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message