lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
Date Tue, 07 Sep 2010 13:47:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906798#action_12906798 ]

Jason Rutherglen commented on LUCENE-2573:
------------------------------------------

bq. shouldn't tiered flushing take care of this

That was a few minutes of faulty thinking on my part.

{quote}but this won't be most efficient, in general? Ie we could end up creating tiny segments
depending on luck-of-the-thread-scheduling?{quote}

True.  Instead, we may want to simply skip flushing the current DWPT when it is not the
highest RAM user, and flush only once addDoc is called on the thread with the highest RAM
usage.
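A minimal sketch of that policy (class and parameter names here are hypothetical, not the actual DWPT API): the DWPT handling the current addDoc flushes only if no peer DWPT is using more RAM.

```java
import java.util.Map;

// Hypothetical flush-policy sketch: once total RAM crosses the low
// water mark, flush only the DWPT that is the largest RAM consumer,
// instead of whichever DWPT's addDoc happens to run first.
class FlushPolicySketch {
    // ramUsed maps a DWPT id to its current RAM usage in bytes.
    static boolean shouldFlush(Map<String, Long> ramUsed,
                               String currentDwpt,
                               long lowWaterMarkBytes) {
        long total = ramUsed.values().stream()
                            .mapToLong(Long::longValue).sum();
        if (total < lowWaterMarkBytes) {
            return false; // under the low water mark, nothing to do
        }
        // Flush only if no other DWPT uses more RAM than this one.
        long current = ramUsed.get(currentDwpt);
        return ramUsed.values().stream().noneMatch(v -> v > current);
    }
}
```

This avoids the tiny-segment problem: a small DWPT that happens to cross the threshold first defers to the larger one.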

bq. there's no longer a need to track per-doc pending RAM

I'll remove it from the code.

{quote}If a buffer is not in the pool (ie not free), then it's in use and we count that as
RAM used{quote}

Ok, I'll make the change.  
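A sketch of that accounting rule (an illustrative pool, not the actual Lucene classes): any block not sitting in the free pool is considered in use and counted toward RAM used.

```java
import java.util.ArrayDeque;

// Hypothetical block-pool sketch: a block checked out of the free
// pool is "in use" and counted toward RAM used; recycling it back
// into the pool removes it from the count.
class BlockPoolSketch {
    static final int BLOCK_SIZE = 32 * 1024;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private long bytesUsed;      // RAM held by checked-out blocks
    private long bytesAllocated; // net allocated, free blocks included

    byte[] getBlock() {
        byte[] block;
        if (free.isEmpty()) {
            bytesAllocated += BLOCK_SIZE;
            block = new byte[BLOCK_SIZE];
        } else {
            block = free.pop();
        }
        bytesUsed += BLOCK_SIZE; // not in the pool => counts as used
        return block;
    }

    void recycle(byte[] block) {
        free.push(block);
        bytesUsed -= BLOCK_SIZE;
    }

    long bytesUsed() { return bytesUsed; }
    long bytesAllocated() { return bytesAllocated; }
}
```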

{quote}we have to track net allocated, in order to trim the buffers (drop them, so GC can
reclaim) when we are over the .setRAMBufferSizeMB{quote}

I haven't seen this in the realtime branch.  Reclamation of extra allocated free blocks may
need to be reimplemented.  

I'll increment the num-bytes-used counter when a block is handed out from the pool for use.
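A sketch of what such trimming could look like (a hypothetical pool, not the branch's code): while net allocated RAM exceeds the configured budget, free blocks are dropped so the GC can reclaim them.

```java
import java.util.ArrayDeque;

// Hypothetical trimming sketch: when net allocated RAM exceeds the
// budget (e.g. the setRAMBufferSizeMB limit), drop pooled free
// blocks so the GC can reclaim their memory.
class TrimmablePool {
    static final int BLOCK_SIZE = 32 * 1024;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private long bytesAllocated; // net allocated, free blocks included

    byte[] getBlock() {
        if (!free.isEmpty()) {
            return free.pop();
        }
        bytesAllocated += BLOCK_SIZE;
        return new byte[BLOCK_SIZE];
    }

    void recycle(byte[] block) {
        free.push(block);
    }

    // Drop free blocks until allocation is back under the budget.
    void trim(long budgetBytes) {
        while (bytesAllocated > budgetBytes && !free.isEmpty()) {
            free.pop(); // last reference dropped; GC can reclaim it
            bytesAllocated -= BLOCK_SIZE;
        }
    }

    long bytesAllocated() { return bytesAllocated; }
}
```

Note that trimming can only reclaim blocks that are actually free; blocks still checked out stay allocated, which is why net allocated has to be tracked separately from bytes used.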

On this topic, do you have any thoughts yet about how to make the block pools concurrent?
I'm still leaning towards a random access file (seek style) interface, because it is easy
to make concurrent and hides the underlying block management mechanism, rather than exposing
it directly as we do today, which can invite problematic usage in the future.
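A rough sketch of such a facade (hypothetical class, loosely modeled on java.io.RandomAccessFile): callers address logical positions, the block layout stays hidden, and a single lock per instance is enough to make it thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seek-style facade over pooled blocks. Callers never
// see block boundaries, so the block management mechanism can change
// (or be made lock-free later) without touching call sites.
class BlockRandomAccess {
    static final int BLOCK_SIZE = 1024;
    private final List<byte[]> blocks = new ArrayList<>();
    private long pos;

    synchronized void seek(long newPos) {
        pos = newPos;
    }

    synchronized void writeByte(byte b) {
        int idx = (int) (pos / BLOCK_SIZE);
        while (blocks.size() <= idx) {
            blocks.add(new byte[BLOCK_SIZE]); // grow lazily, block by block
        }
        blocks.get(idx)[(int) (pos % BLOCK_SIZE)] = b;
        pos++;
    }

    synchronized byte readByte() {
        byte b = blocks.get((int) (pos / BLOCK_SIZE))[(int) (pos % BLOCK_SIZE)];
        pos++;
        return b;
    }
}
```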

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across
all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
 
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are used, flush
at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using
total values (e.g. low water mark at 120MB, high water mark at 140MB)?  Or shall we keep for
simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110%
for the water marks?
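The linear-step scheme in the description can be sketched numerically (a toy helper, not part of the patch): with n DWPTs, the k-th flush trigger is spaced evenly between the low and high water marks.

```java
// Sketch of the linear-step water marks from the issue description:
// flush thresholds are spaced evenly between the low and high marks,
// one per DWPT. Assumes numDwpts >= 2.
class WaterMarks {
    static double[] flushThresholds(double lowPct, double highPct, int numDwpts) {
        double[] thresholds = new double[numDwpts];
        double step = (highPct - lowPct) / (numDwpts - 1);
        for (int k = 0; k < numDwpts; k++) {
            thresholds[k] = lowPct + k * step;
        }
        return thresholds;
    }
}
```

For 5 DWPTs with 90% and 110% marks this gives the 90, 95, 100, 105, 110 steps mentioned above.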

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


