lucene-dev mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: [jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
Date Thu, 31 Mar 2011 14:31:32 GMT
Dr
On Mar 31, 2011 9:44 AM, "Simon Willnauer (JIRA)" <jira@apache.org> wrote:
>
> [https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013974#comment-13013974]
>
> Simon Willnauer commented on LUCENE-2573:
> -----------------------------------------
>
> I ran a couple of benchmarks with interesting results. The graph below shows
documents per second for the RT branch with DWPT, which yields very good IO/CPU
utilization; overall throughput is much better than trunk's.
> !http://people.apache.org/~simonw/DocumentsWriterPerThread_dps.png!
> Yet the peak performance is much better on trunk than on DWPT. I think part
of the reason is that we flush concurrently, which takes at most one thread
out of the loop at a time; those are the little drops in docs/sec. This does
not yet explain the constantly lower max indexing rate. I suspect it is at
least partly because flushing is very CPU intensive. At the same time, CMS
might kick in much more often, since we are writing more segments that are
also smaller than on trunk. Eventually, I need to run a profiler and see what
is going on.
> !http://people.apache.org/~simonw/Trunk_dps.png!
>
> Interestingly, besides the nice CPU utilization we also have nearly perfect
IO utilization. The graph below shows that we are consistently using IO to
flush segments. The width of each bar shows the time it took to flush a
single DWPT; there is almost no overlap.
> !http://people.apache.org/~simonw/DocumentsWriterPerThread_flush.png!
>
> Overall those are super results! Good job everybody!
>
> simon
>
>> Tiered flushing of DWPTs by RAM with low/high water marks
>> ---------------------------------------------------------
>>
>> Key: LUCENE-2573
>> URL: https://issues.apache.org/jira/browse/LUCENE-2573
>> Project: Lucene - Java
>> Issue Type: Improvement
>> Components: Index
>> Reporter: Michael Busch
>> Assignee: Simon Willnauer
>> Priority: Minor
>> Fix For: Realtime Branch
>>
>> Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
LUCENE-2573.patch, LUCENE-2573.patch
>>
>>
>> Now that we have DocumentsWriterPerThreads we need to track total
consumed RAM across all DWPTs.
>> A flushing strategy idea that was discussed in LUCENE-2324 was to use a
tiered approach:
>> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
>> - Flush all DWPTs at a high water mark (e.g. at 110%)
>> - Use linear steps in between high and low watermark: E.g. when 5 DWPTs
are used, flush at 90%, 95%, 100%, 105% and 110%.
>> Should we allow the user to configure the low and high water mark values
explicitly using total values (e.g. low water mark at 120MB, high water mark
at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB()
config method and use something like 90% and 110% for the water marks?
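
The linear-step idea in the issue description can be sketched as follows. This is only an illustration, not code from the patch: the class name `TieredFlushThresholds`, the method `thresholds`, and its parameters are hypothetical, and the choice to flush a single DWPT at the low water mark is an assumption the description does not cover.

```java
// Hypothetical helper, not part of Lucene: computes the RAM levels (in MB)
// at which successive DWPTs would be flushed under the tiered scheme.
final class TieredFlushThresholds {

    /**
     * @param ramBufferMB allowed RAM, e.g. the value passed to setRAMBufferSizeMB()
     * @param lowFactor   low water mark as a fraction of ramBufferMB (e.g. 0.90)
     * @param highFactor  high water mark as a fraction of ramBufferMB (e.g. 1.10)
     * @param numDwpts    number of active DWPTs (must be at least 1)
     * @return ascending thresholds in MB; entry i triggers the (i+1)-th flush
     */
    static double[] thresholds(double ramBufferMB, double lowFactor,
                               double highFactor, int numDwpts) {
        if (numDwpts < 1) {
            throw new IllegalArgumentException("numDwpts must be >= 1");
        }
        if (lowFactor > highFactor) {
            throw new IllegalArgumentException("lowFactor must be <= highFactor");
        }
        double low = ramBufferMB * lowFactor;
        double high = ramBufferMB * highFactor;
        double[] result = new double[numDwpts];
        if (numDwpts == 1) {
            // Assumption: with a single DWPT, flush at the low water mark.
            result[0] = low;
            return result;
        }
        // Equal steps from the low to the high water mark, inclusive.
        double step = (high - low) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            result[i] = low + i * step;
        }
        return result;
    }
}
```

With an allowed RAM of 100 MB, water marks of 90% and 110%, and 5 DWPTs, the thresholds come out at roughly 90, 95, 100, 105 and 110 MB (up to floating-point rounding), matching the example in the description.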
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
