lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1313) Realtime Search
Date Wed, 29 Apr 2009 10:04:30 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704051#action_12704051 ]

Michael McCandless commented on LUCENE-1313:
--------------------------------------------

{quote}
I assume it's ok for the IW.mergescheduler to be used which may
not immediately perform the merge to disk (in the case of
ConcurrentMergeScheduler)?
{quote}

Only if we "accept" requiring MergePolicy to be aware that some
segments are in RAMDir and some are in the "real" Dir and to "act
accordingly", ie 1) don't mix the dirs when merging, 2) when RAM is
"full" merge every single RAM segment into a single "real Dir" segment
(requires IW to provide exposure on how much RAM DW's buffer is
currently consuming), 3) properly "maintain" the RAM segments (ie,
merge RAM -> RAM somehow) so that searchers don't search too many RAM
segments.
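A minimal sketch of the three rules above, in plain Java rather than against the real MergePolicy API. Segment, Merge, selectMerge, ramBudgetBytes and maxRamSegments are all hypothetical names invented for illustration; none of them exist in Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RamAwareMergeSketch {
    /** Hypothetical segment descriptor; not a Lucene class. */
    static final class Segment {
        final String name;
        final boolean inRam;
        final long sizeBytes;
        Segment(String name, boolean inRam, long sizeBytes) {
            this.name = name;
            this.inRam = inRam;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Hypothetical merge request: the inputs plus where the merged segment is written. */
    static final class Merge {
        final List<Segment> inputs;
        final boolean targetInRam;
        Merge(List<Segment> inputs, boolean targetInRam) {
            this.inputs = inputs;
            this.targetInRam = targetInRam;
        }
    }

    /** Returns the next merge to run, or null if no merge is needed. Assumes maxRamSegments >= 1. */
    static Merge selectMerge(List<Segment> segments, long ramBudgetBytes, int maxRamSegments) {
        // Rule 1: only RAM segments are ever candidates here, so the dirs are never mixed.
        List<Segment> ram = new ArrayList<>();
        long ramUsed = 0;
        for (Segment s : segments) {
            if (s.inRam) {
                ram.add(s);
                ramUsed += s.sizeBytes;
            }
        }
        // Rule 2: RAM is "full", so merge every RAM segment into one "real Dir" segment.
        if (!ram.isEmpty() && ramUsed >= ramBudgetBytes) {
            return new Merge(ram, false);
        }
        // Rule 3: too many RAM segments, so merge the smallest ones RAM -> RAM.
        if (ram.size() > maxRamSegments) {
            ram.sort(Comparator.comparingLong((Segment s) -> s.sizeBytes));
            int count = ram.size() - maxRamSegments + 1; // leaves exactly maxRamSegments afterwards
            return new Merge(new ArrayList<>(ram.subList(0, count)), true);
        }
        return null;
    }
}
```

Picking the smallest RAM segments for the RAM -> RAM case is just one plausible choice; the comment itself only says "merge RAM -> RAM somehow".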

I think this approach is probably best: you're right that allowing CMS
to manage these RAM segments is nice since it'll happen in the BG and
will not block updates.

It does mean, though, that the RAM usage semantics of IW is no longer
so "crisp" as flushing today ("once RAM is full, stop world & flush it
to disk, then resume") but I think that's acceptable and perhaps
preferable since world is no longer stopped to flush RAM -> disk.

Though one tricky part is... if a large RAM -> RAM merge takes place,
we temporarily double the RAM consumption.  I think MergePolicy simply
shouldn't do that.  Ie at no point should it be merging a very large
percentage of the RAM segments.  It should instead merge RAM -> disk.
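To make the arithmetic concrete, here is a rough sketch of the guard this implies. It assumes the merge inputs stay resident until the merged copy is complete, so a RAM -> RAM merge can add up to the combined input size on top of current usage. RamMergeGuard and its method and parameter names are hypothetical, not Lucene API.

```java
final class RamMergeGuard {
    /**
     * Upper bound on RAM in use while a RAM -> RAM merge runs: the inputs are
     * still live while the merged segment (at most as large as the inputs
     * combined) is being written.
     */
    static long peakRamBytes(long ramUsedBytes, long mergeInputBytes) {
        return ramUsedBytes + mergeInputBytes;
    }

    /** True if the merge should go RAM -> disk because a RAM target could exceed the budget. */
    static boolean shouldMergeToDisk(long ramUsedBytes, long mergeInputBytes, long ramBudgetBytes) {
        return peakRamBytes(ramUsedBytes, mergeInputBytes) > ramBudgetBytes;
    }
}
```

With 60 MB in use and a 100 MB budget, merging 50 MB of RAM segments in RAM could peak at 110 MB, so that merge would be sent to disk; merging 30 MB peaks at 90 MB and could stay in RAM.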

This'd also mean advanced users that implement their own MergePolicy
must realize when IW is used with NRT reader that additional smarts is
recommended wrt 

{quote}
When implementing using
addIndexesNoOptimize (which blocks) I realized we probably don't
want blocking to occur because that means shutting down the
updates.
{quote}
Right, this is one of the strong reasons to do the "internal" approach
vs "external" one.

{quote}
Also a random thought, it seems like ConcurrentMergeScheduler
works great for RAMDir merging, how does it compare with
SerialMS on an FSDirectory? It seems like it shouldn't be too much
faster given the IO sequential access bottleneck?
{quote}

By far the biggest win of CMS over SMS is in the first merge, because
it does not block the further addition of docs.  Thus an app can
continue indexing into RAM buffer (consuming CPU & RAM resources)
while a BG thread consumes RAM + IO resources.  This is very much a
win.
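The difference can be illustrated with a toy model in plain Java. This is only a sketch of the scheduling idea, not the actual ConcurrentMergeScheduler or SerialMergeScheduler code; the Scheduler interface and the class names below are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

final class MergeSchedulerSketch {
    /** Hypothetical scheduler abstraction; not the Lucene MergeScheduler API. */
    interface Scheduler {
        Future<?> merge(Runnable mergeWork);
    }

    /** Runs the merge inline: the indexing thread is blocked until it finishes. */
    static final class Serial implements Scheduler {
        public Future<?> merge(Runnable mergeWork) {
            mergeWork.run();
            return CompletableFuture.completedFuture(null);
        }
    }

    /** Hands the merge to a BG thread: the indexing thread returns immediately. */
    static final class Concurrent implements Scheduler {
        private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "merge-bg");
            t.setDaemon(true); // do not keep the JVM alive for the sketch
            return t;
        });
        public Future<?> merge(Runnable mergeWork) {
            return pool.submit(mergeWork);
        }
    }

    /** Reports whether the merge work executed on the thread that requested it. */
    static boolean mergeRunsOnCallerThread(Scheduler scheduler) {
        Thread caller = Thread.currentThread();
        AtomicReference<Thread> mergeThread = new AtomicReference<>();
        try {
            scheduler.merge(() -> mergeThread.set(Thread.currentThread())).get();
        } catch (Exception e) {
            throw new IllegalStateException("merge failed", e);
        }
        return mergeThread.get() == caller;
    }
}
```

With the serial variant the merge occupies the caller, so no docs can be added until it returns; with the concurrent variant the caller gets a Future back and can keep indexing into the RAM buffer while the merge proceeds.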

Beyond the first merge... in theory, modern IO systems have concurrency
(eg the NCQ in a single SATA drive) so you should "gain" by having
several threads performing IO at once.  The OS & hard drives attempt
to re-order the requests in a more optimal way (like an elevator,
sweeping floors).  I haven't explicitly tested this with Lucene...
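As a rough illustration of the elevator analogy only: the sketch below orders pending block requests the way a simple SCAN sweep would, serving everything at or above the current head position on the way up, then the rest on the way back down. Real OS schedulers and NCQ firmware are considerably more involved; ElevatorOrderSketch and scanOrder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ElevatorOrderSketch {
    /** Returns the pending block addresses in simple elevator (SCAN) service order. */
    static List<Long> scanOrder(long headPosition, List<Long> pending) {
        List<Long> up = new ArrayList<>();
        List<Long> down = new ArrayList<>();
        for (long block : pending) {
            if (block >= headPosition) {
                up.add(block);   // served during the upward sweep
            } else {
                down.add(block); // served after the direction reverses
            }
        }
        Collections.sort(up);                  // ascending while sweeping up
        down.sort(Collections.reverseOrder()); // descending while sweeping down
        List<Long> order = new ArrayList<>(up);
        order.addAll(down);
        return order;
    }
}
```

For a head at block 50 and requests arriving as 10, 95, 60, 20, 70, the service order is 60, 70, 95, 20, 10. The more requests are outstanding at once (eg from several merge threads), the more room there is for this kind of reordering.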

I believe SSDs handle concurrent requests very well since under the
hood most of them are multi-channel, basically RAID0, devices (eg Intel
X25M has 10 channels).


> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch,
>                      LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Realtime search with transactional semantics.  
> Possible future directions:
>   * Optimistic concurrency
>   * Replication
> Encoding each transaction into a set of bytes by writing to a RAMDirectory enables replication.
> It is difficult to replicate using other methods because while the document may easily be
> serialized, the analyzer cannot.
> I think this issue can hold realtime benchmarks which include indexing and searching
> concurrently.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

