lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
Date Tue, 15 Mar 2011 16:43:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007003#comment-13007003
] 

Michael McCandless commented on LUCENE-2573:
--------------------------------------------

bq. it currently holds the ram usage for that DWPT when it was checked out so that I can reduce
the flushBytes accordingly. We can maybe get rid of it entirely but I don't want to rely on
the DWPT bytesUsed() though.

Hmm, but, once a DWPT is pulled from production, its bytesUsed()
should not be changing anymore?  Like why can't we use it to hold its
bytesUsed?

bq. I generally don't like cluttering DocWriter and let it grow like IW. DocWriterSession
might not be the ideal name for this class but its really a ram tracker for this DW. Yet,
we can move out some parts that do not directly relate to mem tracking. Maybe DocWriterBytes?

Well DocWriter is quite small now :) (On RT branch).  And adding
another class means we have to be careful about proper sync'ing (lock
order, to avoid deadlock)... and I think it should get smaller if we
can remove state[] array, FlushState enum, etc. but, OK I guess we can
leave it as separate for now.  How about DocumentsWriterRAMUsage?
RAMTracker?

{quote}
bq. Instead of FlushPolicy.message, can't the policy call DW.message?

I don't want to couple that API to DW. What would be the benefit beside from saving a single
method?
{quote}

Hmm, good point.  Though, it already has a SetOnce<DocumentsWriter> --
how come?  Can the policy call IW.message?  I just think FlushPolicy
ought to be very lean, ie show you exactly what you need to
implement...

{quote}
bq. On the by-RAM flush policies... when you hit the high water mark, we should 1) flush
all DWPTs and 2) stall any other threads.

Well I am not sure if we should do that. I don't really see why we should forcefully stop
the world here. Incoming threads will pick up a flush immediately and if we have enough resources
to index further why should we wait until all DWPT are flushed. if we stall I fear that we
could queue up threads that could help flushing while stalling would simply stop them doing
anything, right? You can still control this with the healthiness though. We currently do flush
all DWPT btw. once we hit the HW.
{quote}

As long as we default the high mark to something "generous" (2X low
mark), I think this approach should work well.

Ie, we "begin" flushing as soon as low mark is crossed on active RAM.
We pick the biggest DWPT and take it out of rotation, and immediately
deduct its RAM usage from the active pool.  If, while we are still
flushing, active RAM again grows above the low mark, then we pull
another DWPT, etc.  But then if ever the total flushing + active
exceeds the high mark, we stall.
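
A minimal sketch of the accounting described above, to make the two
thresholds concrete. It is not code from the patch: the class name
TieredFlushSketch and its methods are hypothetical, DWPTs are modeled as
plain byte counters, and the actual flush is left to the caller.

```java
/**
 * Hypothetical sketch of low/high water mark accounting.
 * Not Lucene code: names and structure are assumptions for illustration.
 */
final class TieredFlushSketch {
    private final long lowMark;       // start flushing when active RAM exceeds this
    private final long highMark;      // stall when active + flushing RAM exceeds this
    private final long[] writerBytes; // bytes held by each active writer (stand-in for a DWPT)
    private long activeBytes;         // sum of writerBytes
    private long flushingBytes;       // bytes pulled from rotation but not yet flushed

    TieredFlushSketch(int numWriters, long lowMark, long highMark) {
        this.writerBytes = new long[numWriters];
        this.lowMark = lowMark;
        this.highMark = highMark;
    }

    /**
     * Records newly indexed bytes for one writer. If active RAM is now above
     * the low mark, pulls the biggest writer out of rotation, deducts its RAM
     * from the active pool right away, and returns the pulled byte count.
     * Returns 0 if nothing was pulled.
     */
    synchronized long onBytesIndexed(int writer, long bytes) {
        writerBytes[writer] += bytes;
        activeBytes += bytes;
        if (activeBytes <= lowMark) {
            return 0;
        }
        int biggest = 0;
        for (int i = 1; i < writerBytes.length; i++) {
            if (writerBytes[i] > writerBytes[biggest]) {
                biggest = i;
            }
        }
        long pulled = writerBytes[biggest];
        writerBytes[biggest] = 0;   // a fresh, empty writer takes the slot
        activeBytes -= pulled;
        flushingBytes += pulled;
        return pulled;
    }

    /** Called once a pulled writer has been written out. */
    synchronized void onFlushDone(long bytes) {
        flushingBytes -= bytes;
    }

    /** Incoming threads should stall while active + flushing exceeds the high mark. */
    synchronized boolean shouldStall() {
        return activeBytes + flushingBytes > highMark;
    }
}
```

With a low mark of 100 and a high mark of 200 (2X low), indexing 60 bytes
into writer 0 and then 50 into writer 1 pulls writer 0, leaving 50 active
and 60 flushing. Indexing keeps going without a stall until the two pools
together pass 200.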

BTW why do we track flushPending RAM vs flushing RAM?  Is that
distinction necessary?  (Can't we just track "flushing" RAM?).


> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
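
The linear steps in the description can be worked out as below. This is
only an illustrative sketch: LinearWaterMarks and flushThresholds are
hypothetical names, not part of Lucene, and the percentages are the
example values from the description. Integer arithmetic is used so the
steps come out exact.

```java
/**
 * Hypothetical helper, not Lucene code: computes evenly spaced flush
 * thresholds between a low and a high water mark.
 */
final class LinearWaterMarks {
    /**
     * @param ramBufferBytes allowed RAM, e.g. the value behind setRAMBufferSizeMB()
     * @param lowPercent     low water mark as a percentage of allowed RAM (e.g. 90)
     * @param highPercent    high water mark as a percentage of allowed RAM (e.g. 110)
     * @param numWriters     number of DWPTs in use
     * @return one threshold per DWPT, from the low mark up to the high mark
     */
    static long[] flushThresholds(long ramBufferBytes, int lowPercent,
                                  int highPercent, int numWriters) {
        if (numWriters < 1 || lowPercent > highPercent) {
            throw new IllegalArgumentException("need numWriters >= 1 and low <= high");
        }
        long low = ramBufferBytes * lowPercent / 100;
        long high = ramBufferBytes * highPercent / 100;
        long[] thresholds = new long[numWriters];
        for (int i = 0; i < numWriters; i++) {
            // With a single writer there are no steps: flush at the low mark.
            thresholds[i] = numWriters == 1
                ? low
                : low + (high - low) * i / (numWriters - 1);
        }
        return thresholds;
    }
}
```

For 5 DWPTs and marks at 90% and 110%, the thresholds land at 90%, 95%,
100%, 105% and 110% of the allowed RAM, matching the example in the
description.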

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

