lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
Date Mon, 14 Mar 2011 19:07:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006579#comment-13006579 ]

Michael McCandless commented on LUCENE-2573:
--------------------------------------------

I still see a healtiness (mis-spelled) in DW.

I'd rather not have the stalling/healthiness be baked into the API, at
all.  Can we put the hijack logic entirely private in the flush-by-ram
policies?  (Ie remove isStalled()/hijackThreadsForFlush()).

Instead of

{noformat}
+    synchronized (docWriter.docWriterSession) {
+      netBytes = docWriter.docWriterSession.netBytes();
+    }
{noformat}

, shouldn't we just make that method sync'd?
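A minimal sketch of what that could look like, assuming the session object guards its counter with its own monitor. The class name DocWriterSessionSketch, the field, and addBytes() are hypothetical; only netBytes() appears in the patch.

```java
// Hypothetical stand-in for the patch's DocWriterSession.
// Only the method name netBytes() comes from the patch; the rest is assumed.
final class DocWriterSessionSketch {
  private long bytesUsed; // assumed counter, guarded by "this"

  // Writers update the counter under the same monitor.
  synchronized void addBytes(long delta) {
    bytesUsed += delta;
  }

  // Synchronized accessor: callers no longer need an external
  // synchronized (session) { ... } block around the read.
  synchronized long netBytes() {
    return bytesUsed;
  }
}
```

With this, the call site shrinks to a plain `netBytes = session.netBytes();` and the locking stays inside the class.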


Be careful defaulting TermsHash.trackAllocations to true -- eg term
vectors wants this to be false.

Can we move FlushSpecification out of FlushPolicy?  Ie, it's a private
impl detail of DW right?  (Not part of FlushPolicy's API).  Actually
why do we need it?  Can't we just return the DWPT?

Why do we have a separate DocWriterSession?  Can't this be absorbed
into DocWriter?

Instead of FlushPolicy.message, can't the policy call DW.message?

On the by-RAM flush policies... when you hit the high water mark, we
should 1) flush all DWPTs and 2) stall any other threads.

Why do we dereference the DWPTs with their ord?  EG, can't we just
store their "state" (active or flushPending) on the DWPT instead of in
a separate states[]?

Do we really need FlushState.Aborted?  And if not... do we really need
FlushState (since it just becomes 2 states, ie, Active or Flushing,
which I think is then redundant w/ flushPending boolean?).
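A rough sketch of keeping the state on the DWPT itself, assuming a single flushPending boolean is enough once Aborted is gone. DwptSketch and FlushControlSketch are hypothetical names, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-thread writer: the flush state lives on the object itself,
// so no ord lookup into a separate states[] array is needed.
final class DwptSketch {
  long bytesUsed;        // assumed RAM accounting field
  boolean flushPending;  // false = active, true = selected for flush
}

// Hypothetical holder that guards the flags with its own monitor.
final class FlushControlSketch {
  private final List<DwptSketch> dwpts = new ArrayList<>();

  synchronized DwptSketch newDwpt() {
    DwptSketch d = new DwptSketch();
    dwpts.add(d);
    return d;
  }

  synchronized void setFlushPending(DwptSketch d) {
    d.flushPending = true;
  }

  // Returns the first DWPT marked for flush, or null if none is pending.
  synchronized DwptSketch nextPending() {
    for (DwptSketch d : dwpts) {
      if (d.flushPending) {
        return d;
      }
    }
    return null;
  }
}
```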

I think the default low water should be 1X of your RAM buffer?  And
high water maybe 2X?  (For both flush-by-RAM policies).
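As an illustration of those defaults, here is a sketch of the decision a by-RAM policy might make, assuming low water = 1X and high water = 2X of the RAM buffer. The class, enum, and method names are hypothetical, and flushing a single DWPT at the low mark is taken from the issue description rather than from the patch.

```java
// Hypothetical decision helper; not part of the patch's FlushPolicy API.
final class ByRamDecisionSketch {
  enum Action { NONE, FLUSH_ONE, FLUSH_ALL_AND_STALL }

  static Action decide(long activeBytes, long ramBufferBytes) {
    long lowWater = ramBufferBytes;       // assumed default: 1X of the RAM buffer
    long highWater = 2 * ramBufferBytes;  // assumed default: 2X of the RAM buffer
    if (activeBytes >= highWater) {
      // At the high water mark: flush all DWPTs and stall other threads.
      return Action.FLUSH_ALL_AND_STALL;
    }
    if (activeBytes >= lowWater) {
      // Between the marks: start flushing, but do not stall indexing threads.
      return Action.FLUSH_ONE;
    }
    return Action.NONE;
  }
}
```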


> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across
> all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are used, flush
>   at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using
> total values (e.g. low water mark at 120MB, high water mark at 140MB)?  Or shall we keep for
> simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110%
> for the water marks?
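
The linear steps described above can be sketched as follows, assuming the marks are expressed as fractions of the configured RAM buffer. TieredWaterMarksSketch and flushThresholdsMB() are hypothetical names, and using the low mark when there is only one DWPT is an assumption, since the description does not cover that case.

```java
// Hypothetical helper that spaces flush thresholds evenly between the
// low and high water marks, one threshold per DWPT.
final class TieredWaterMarksSketch {
  static double[] flushThresholdsMB(double ramBufferMB, double lowFraction,
                                    double highFraction, int numDWPTs) {
    if (numDWPTs < 1) {
      throw new IllegalArgumentException("numDWPTs must be >= 1");
    }
    double[] thresholds = new double[numDWPTs];
    if (numDWPTs == 1) {
      // Assumption: with a single DWPT, flush at the low water mark.
      thresholds[0] = lowFraction * ramBufferMB;
      return thresholds;
    }
    double step = (highFraction - lowFraction) / (numDWPTs - 1);
    for (int i = 0; i < numDWPTs; i++) {
      thresholds[i] = (lowFraction + i * step) * ramBufferMB;
    }
    return thresholds;
  }
}
```

For a 100 MB buffer with marks at 0.90 and 1.10 and 5 DWPTs, this yields thresholds of approximately 90, 95, 100, 105 and 110 MB, matching the percentages in the example (up to floating-point rounding).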

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

