lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
Date Sat, 19 Mar 2011 14:30:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008776#comment-13008776 ]

Michael McCandless commented on LUCENE-2573:
--------------------------------------------

  * I think once we sync up to trunk again, the FP should hold the
    IW's config instance, and pull settings "live" from it?  Ie this
    way we keep our live changes to flush-by-RAM.  The same goes for
    Healthiness (it won't get updates to RAM buffer now).

  * Should we rename *ByRAMFP --> *ByRAMOrDocCountFP?  Since it "ors"
    docCount and RAM usage trigger right?  Oh, I see, not quite -- it
    requires RAM buffer be set.  I think we should relax that?  Ie a
    single flush policy (the default) flushes by either/or?

  * Shouldn't these flush policies also trigger by
    maxBufferedDelCount?

  * Maybe FP.init should throw IllegalStateExc not IllegalArgExc?
    (Because the problem is not the arg itself: no arg is allowed
    once the "state" of FP has already been init'ed.)

  * Probably FP.writer should be a SetOnce?

  * Hmm we still have a FlushPolicy.message?  Can't we just make IW
    protected and then FlushPolicy impl can call IW.message?  (And
    also remove FP.setInfoStream).

  * Is IW.FlushControl not really used anymore?  We should remove it?

  * I still think LW (the low water mark) should be 1.0x your RAM
    buffer.  Ie, IW will start flushing once that much RAM is in use.

  * I still see "synchronized (docWriter.flushControl) {" in
    IndexWriter

  * We should jdoc that IWC.setFlushPolicy takes effect only on init
    of IW?

  * Add "for testing only" comment to IW.getDocsWriter?

  * I wonder whether we should convey "what changed" to the FP?  EG,
    we can 1) buffer a new del term, 2) add a new doc, 3) both
    (updateDocument).  It could be we have onUpdate, onAdd, onDelete?
    Or maybe we keep single method but rename to onChange?  Ie, it's
    called because *something* about the incoming DWPT has changed.

  * The flush policy shouldn't have to compute "delta" RAM like it
    does now?  Actually why can't it just call
    flushControl.activeBytes(), and we ensure the delta was already
    folded into that?  Ie we'd call commitPerThreadBytes before
    FP.visit.  (Then commitPerThreadBytes wouldn't ever add to
    flushBytes, which is sort of spooky -- like flushBytes should get
    incr'd only when we pull a DWPT out for flushing).

  * I don't think we should ever markAllWritersPending, ie, that's
    not the right "reaction" when flushing is too slow (eg you're on a
    slow hard drive) since over time this will result in flushing lots
    of tiny segments unnecessarily.  A better reaction is to stall the
    incoming threads; this way the flusher threads catch up, and once
    you resume, then the small DWPTs have a chance to get big before
    they are flushed.

  * Misspelled: markLargesWriterPending -> markLargestWriterPending
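
The "stall the incoming threads" reaction suggested above (instead of
markAllWritersPending) could look roughly like the sketch below.  It is
only an illustration, not code from the patch: StallControl,
hardLimitBytes, updateStalled and waitIfStalled are hypothetical names;
only activeBytes and flushBytes echo names used in this comment.

```java
/** Hypothetical helper, not part of the patch: parks indexing threads while flushing lags. */
final class StallControl {
    private final long hardLimitBytes; // assumed upper bound on active + flushing RAM
    private boolean stalled;

    StallControl(long hardLimitBytes) {
        this.hardLimitBytes = hardLimitBytes;
    }

    /** Re-evaluates the stall state; wakes parked threads once RAM drops back under the limit. */
    synchronized void updateStalled(long activeBytes, long flushBytes) {
        stalled = activeBytes + flushBytes > hardLimitBytes;
        if (!stalled) {
            notifyAll();
        }
    }

    /** Called by an indexing thread before it buffers another document. */
    synchronized void waitIfStalled() throws InterruptedException {
        while (stalled) {
            wait(); // loop guards against spurious wakeups
        }
    }

    synchronized boolean isStalled() {
        return stalled;
    }

    public static void main(String[] args) throws InterruptedException {
        StallControl control = new StallControl(100);
        control.updateStalled(80, 40);            // 120 > 100: flushers are behind
        System.out.println(control.isStalled());  // prints "true"
        control.updateStalled(80, 0);             // the pending flush finished
        control.waitIfStalled();                  // returns immediately
        System.out.println(control.isStalled());  // prints "false"
    }
}
```

With this shape no DWPT is forced to flush early; incoming threads
simply wait until the flusher threads have released enough RAM.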

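Two of the API suggestions above (a set-once FP.init that fails with
IllegalStateException, and separate onAdd / onDelete / onUpdate hooks)
can be sketched together.  This is a guess at one possible shape, not
the patch's FlushPolicy: SketchFlushPolicy and CountingFlushPolicy are
hypothetical classes, Object stands in for IndexWriter, and an
AtomicReference stands in for a set-once holder.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a flush policy; not Lucene's FlushPolicy class. */
abstract class SketchFlushPolicy {
    // Set-once holder for the owning writer (Object stands in for IndexWriter).
    private final AtomicReference<Object> writer = new AtomicReference<>();

    /** Binds the policy to its writer; a second call is a state error, not an argument error. */
    final void init(Object indexWriter) {
        if (indexWriter == null) {
            throw new IllegalArgumentException("writer must not be null");
        }
        if (!writer.compareAndSet(null, indexWriter)) {
            throw new IllegalStateException("flush policy is already initialized");
        }
    }

    /** Called after a new document was added to a DWPT. */
    abstract void onAdd();

    /** Called after a new delete term was buffered. */
    abstract void onDelete();

    /** updateDocument does both, so by default it triggers both hooks (an assumption). */
    void onUpdate() {
        onDelete();
        onAdd();
    }

    public static void main(String[] args) {
        CountingFlushPolicy policy = new CountingFlushPolicy();
        policy.init(new Object());
        policy.onUpdate();
        System.out.println(policy.adds + " add, " + policy.deletes + " delete");
    }
}

/** Minimal policy that only counts events, to show which hooks fire. */
final class CountingFlushPolicy extends SketchFlushPolicy {
    int adds;
    int deletes;

    @Override
    void onAdd() {
        adds++;
    }

    @Override
    void onDelete() {
        deletes++;
    }
}
```

A single onChange method would be the simpler alternative; the
three-hook variant is shown only because it makes "what changed"
explicit to the policy.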

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
>                      LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across
> all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are used, flush
>   at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using
> total values (e.g. low water mark at 120MB, high water mark at 140MB)?  Or shall we keep for
> simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110%
> for the water marks?
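
The linear steps in the quoted description can be worked out with a few
lines of Java.  This is a sketch of the arithmetic only, not code from
any of the attached patches: TieredWaterMarks and thresholds are
hypothetical names, and using the low water mark when there is a single
DWPT is an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the tiered water marks; not part of the patch. */
final class TieredWaterMarks {

    /**
     * Returns one flush threshold per DWPT, in percent of the allowed RAM,
     * spaced linearly from lowPercent to highPercent (both inclusive).
     */
    static double[] thresholds(double lowPercent, double highPercent, int numDwpts) {
        if (numDwpts < 1) {
            throw new IllegalArgumentException("need at least one DWPT");
        }
        if (highPercent < lowPercent) {
            throw new IllegalArgumentException("high water mark is below low water mark");
        }
        double[] steps = new double[numDwpts];
        if (numDwpts == 1) {
            steps[0] = lowPercent; // assumption: a lone DWPT flushes at the low water mark
            return steps;
        }
        double stride = (highPercent - lowPercent) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            steps[i] = lowPercent + i * stride;
        }
        return steps;
    }

    public static void main(String[] args) {
        // The example from the description: 5 DWPTs between 90% and 110%.
        System.out.println(Arrays.toString(thresholds(90, 110, 5)));
        // prints [90.0, 95.0, 100.0, 105.0, 110.0]
    }
}
```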

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

