lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
Date Thu, 31 Mar 2011 15:27:11 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014016#comment-13014016 ]

Jason Rutherglen commented on LUCENE-2573:
------------------------------------------

bq. influenced due to the fact that flushing is very very CPU intensive

Do you think this is due mostly to the vint decoding?  We're not interleaving postings on
flush with this patch so the CPU consumption should be somewhat lower.

bq. At the same time CMS might kick in way more often since we are writing more segments which
are also smaller compared to trunk

This is probably the more likely case.  In general, we may be able to default to a higher overall
RAM buffer size, and perhaps there won't be degradation in indexing performance like there
is with trunk?  In the future with RT we could get fancy and selectively merge segments as
we're flushing, if writing larger segments is important.  

I'd personally prefer to write out 1-2 GB segments, and limit the number of DWPTs to 2-3,
mainly for servers that are concurrently indexing and searching (e.g., the RT use case).  I
think the current default number of thread states is a bit high.  

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
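
For illustration only, the linear steps described in the issue could be computed roughly as follows. This is a minimal sketch, not code from any of the attached patches: the class TieredFlushMarks, the method flushThresholdsMB and its parameters are hypothetical names, and the single-DWPT case (flush at the low water mark) is an assumption the issue does not address.

```java
// Hypothetical helper, not part of Lucene or the LUCENE-2573 patches.
// Computes the RAM levels (in MB) at which successive DWPTs would be
// flushed, using evenly spaced steps from the low to the high water mark.
final class TieredFlushMarks {

    static double[] flushThresholdsMB(double ramBufferMB,
                                      double lowFraction,
                                      double highFraction,
                                      int numDWPTs) {
        if (numDWPTs < 1) {
            throw new IllegalArgumentException("numDWPTs must be >= 1");
        }
        if (lowFraction > highFraction) {
            throw new IllegalArgumentException("low water mark exceeds high water mark");
        }
        double[] thresholds = new double[numDWPTs];
        if (numDWPTs == 1) {
            // Assumption: with a single DWPT, flush at the low water mark.
            thresholds[0] = ramBufferMB * lowFraction;
            return thresholds;
        }
        // Evenly spaced steps: the first DWPT flushes at the low mark,
        // the last at the high mark.
        double step = (highFraction - lowFraction) / (numDWPTs - 1);
        for (int i = 0; i < numDWPTs; i++) {
            thresholds[i] = ramBufferMB * (lowFraction + i * step);
        }
        return thresholds;
    }
}
```

With a 100 MB buffer, water marks of 0.90 and 1.10, and 5 DWPTs, this yields thresholds of approximately 90, 95, 100, 105 and 110 MB, matching the example in the issue description.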

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

