lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
Date Thu, 31 Mar 2011 13:44:05 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013974#comment-13013974 ]

Simon Willnauer commented on LUCENE-2573:
-----------------------------------------

I ran a couple of benchmarks with interesting results. The graph below shows documents per second
for the RT branch with DWPT, which yields very good IO/CPU utilization; overall throughput
is much better than on trunk.
!http://people.apache.org/~simonw/DocumentsWriterPerThread_dps.png! 
Yet when we look at trunk, its peak performance is much better than with DWPT. I think the
reason is that we flush concurrently, which takes at most one thread out of the indexing loop;
those are the small drops in docs/sec. This does not yet explain the consistently lower
maximum indexing rate. I suspect that this is at least partly due to the fact that flushing
is very CPU intensive. At the same time, CMS (ConcurrentMergeScheduler) might kick in far
more often, since we are writing more segments, which are also smaller than on trunk. I will
need to run a profiler to see what is going on.
!http://people.apache.org/~simonw/Trunk_dps.png! 

Interestingly, besides the nice CPU utilization, we also have nearly perfect IO utilization.
The graph below shows that we are consistently using IO to flush segments. The width of each
bar shows the time it took to flush a single DWPT; there is almost no overlap.
!http://people.apache.org/~simonw/DocumentsWriterPerThread_flush.png! 
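To put a number on "almost no overlap", one could log the start and end time of every DWPT flush and sweep over them. Below is a minimal Java sketch under that assumption. FlushOverlap and busyAndOverlap are hypothetical names, not Lucene API, and the intervals in main are made-up values for illustration, not measurements from the benchmark above.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper (not part of Lucene): measures how long at least one flush
// was running (IO busy time) and how long two or more flushes overlapped.
final class FlushOverlap {

    // intervals[i] = {startMillis, endMillis} of the i-th DWPT flush.
    // Returns {busyMillis, overlapMillis}.
    static long[] busyAndOverlap(long[][] intervals) {
        // Sweep line: +1 event at each flush start, -1 at each flush end.
        long[][] events = new long[intervals.length * 2][];
        int k = 0;
        for (long[] iv : intervals) {
            events[k++] = new long[] {iv[0], +1};
            events[k++] = new long[] {iv[1], -1};
        }
        // Sort by time; on ties, ends (-1) come before starts (+1),
        // so back-to-back flushes are not counted as overlapping.
        Arrays.sort(events, Comparator.<long[]>comparingLong(e -> e[0]).thenComparingLong(e -> e[1]));

        long busy = 0;
        long overlap = 0;
        long prev = 0;
        int active = 0;
        for (long[] e : events) {
            long dt = e[0] - prev;
            if (active >= 1) busy += dt;    // at least one flush running
            if (active >= 2) overlap += dt; // two or more flushes running at once
            active += (int) e[1];
            prev = e[0];
        }
        return new long[] {busy, overlap};
    }

    public static void main(String[] args) {
        // Made-up intervals for illustration only, not benchmark data.
        long[][] flushes = {{0, 10}, {10, 20}, {15, 25}};
        System.out.println(Arrays.toString(busyAndOverlap(flushes))); // [25, 5]
    }
}
```

An overlap close to zero, with busy time close to the wall-clock duration of the run, would match what the graph shows.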

Overall those are super results! Good job everybody!

simon

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
>                      LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads, we need to track the total RAM consumed across
> all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
>
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps between the low and high water marks: e.g. when 5 DWPTs are used, flush
>   at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water marks explicitly using
> absolute values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep
> the single setRAMBufferSizeMB() config method for simplicity and use something like 90% and
> 110% for the water marks?
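A minimal Java sketch of the linear stepping described in the issue, assuming the water marks are either given as absolute values or derived from a single RAM buffer size as percentages (90%/110%). TieredFlushThresholds, thresholds(), fromBuffer() and numToFlush() are hypothetical names for illustration, not Lucene's DocumentsWriter API. Flushing a single DWPT at the low water mark is also an assumption, since the issue does not cover that case.

```java
import java.util.Arrays;

// Hypothetical sketch of tiered DWPT flush thresholds (not Lucene API).
final class TieredFlushThresholds {

    // Linear steps from lowMB (first DWPT) to highMB (last DWPT).
    // Assumption: with a single DWPT we simply flush at the low water mark.
    static double[] thresholds(double lowMB, double highMB, int numDWPTs) {
        if (numDWPTs < 1 || lowMB > highMB) {
            throw new IllegalArgumentException("need numDWPTs >= 1 and lowMB <= highMB");
        }
        double[] t = new double[numDWPTs];
        if (numDWPTs == 1) {
            t[0] = lowMB;
            return t;
        }
        double step = (highMB - lowMB) / (numDWPTs - 1);
        for (int i = 0; i < numDWPTs; i++) {
            t[i] = lowMB + i * step;
        }
        return t;
    }

    // Variant for the "single setRAMBufferSizeMB()" option: derive the marks as
    // percentages (e.g. 90% and 110%) of one RAM buffer size.
    static double[] fromBuffer(double ramBufferMB, double lowPct, double highPct, int numDWPTs) {
        return thresholds(ramBufferMB * lowPct / 100.0, ramBufferMB * highPct / 100.0, numDWPTs);
    }

    // Number of DWPTs that should be flushing once total RAM across all DWPTs reaches usedMB.
    static int numToFlush(double usedMB, double[] thresholds) {
        int n = 0;
        for (double t : thresholds) {
            if (usedMB >= t) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // 5 DWPTs, marks at 90% / 110% of a 100MB buffer -> 90, 95, 100, 105, 110
        System.out.println(Arrays.toString(fromBuffer(100.0, 90.0, 110.0, 5)));
        // Absolute marks from the issue text: 120MB low, 140MB high -> 120, 125, ..., 140
        System.out.println(Arrays.toString(thresholds(120.0, 140.0, 5)));
        // 101MB consumed crosses the 90, 95 and 100 thresholds -> 3
        System.out.println(numToFlush(101.0, fromBuffer(100.0, 90.0, 110.0, 5)));
    }
}
```

The sketch only answers how many DWPTs should be flushing at a given total RAM usage; which DWPT is picked at each step (the first one, the largest one, etc.) is left open.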

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


