lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
Date Wed, 21 Apr 2010 15:13:52 GMT


Michael McCandless commented on LUCENE-2324:

bq. Sounds like we use the current thread affinity system (ie, a
hash map), and when the max threads is reached, new threads get
kind of round-robined onto existing DWPTs?

Yeah something along those lines... and clearing out all mappings for
a given DWPT when it flushes.
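
A minimal sketch of the affinity scheme discussed above, not code from the attached patch. All names here (ThreadAffinitySketch, Dwpt, dwptFor, onFlush) are hypothetical, and the sketch assumes a single lock around the map is acceptable: a thread keeps its DWPT once mapped, new threads get a fresh DWPT until the max is reached, later threads are round-robined onto existing DWPTs, and every mapping to a DWPT is cleared when it flushes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names come from the patch. */
final class ThreadAffinitySketch {

  /** Stand-in for a per-thread DocumentsWriter (DWPT). */
  static final class Dwpt {
    final int id;
    Dwpt(int id) { this.id = id; }
  }

  private final int maxDwpts;
  private final List<Dwpt> dwpts = new ArrayList<>();
  private final Map<Thread, Dwpt> affinity = new HashMap<>();
  private int next = 0; // round-robin cursor, used once maxDwpts is reached

  ThreadAffinitySketch(int maxDwpts) {
    if (maxDwpts < 1) {
      throw new IllegalArgumentException("maxDwpts must be >= 1");
    }
    this.maxDwpts = maxDwpts;
  }

  /** Returns the DWPT bound to thread t, binding one first if needed. */
  synchronized Dwpt dwptFor(Thread t) {
    Dwpt d = affinity.get(t);
    if (d == null) {
      if (dwpts.size() < maxDwpts) {
        // Still below the limit: this thread gets its own new DWPT.
        d = new Dwpt(dwpts.size());
        dwpts.add(d);
      } else {
        // Limit reached: share an existing DWPT, chosen round-robin.
        d = dwpts.get(next);
        next = (next + 1) % dwpts.size();
      }
      affinity.put(t, d);
    }
    return d;
  }

  /** Clears every thread mapping that points at the flushed DWPT. */
  synchronized void onFlush(Dwpt flushed) {
    affinity.values().removeIf(d -> d == flushed);
  }
}
```

After onFlush, threads that were bound to the flushed DWPT are simply re-bound on their next call to dwptFor, while threads bound to other DWPTs keep their affinity.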

bq. if RAM usage grows too much beyond your first trigger and before
that first flush has finished, start a 2nd DWPT flushing, etc.

bq. What's the definition of the "too much" portion of the above?

We could do something simple, eg, at 90% RAM used, you flush your
first DWPT.  At 110% RAM used, you flush all DWPTs.  And take linear
steps in between?

EG if I have 5 DWPTs, I'd flush first one at 90%, 2nd at 95%, 3rd at
100%, 4th at 105% and 5th at 110%.
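
The linear tiers above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (FlushTiersSketch, flushThresholds, numToFlush) are hypothetical, RAM usage is expressed as a percentage of the configured buffer, and with a single DWPT the sketch assumes the lower bound is used as its only trigger (the message does not cover that case).

```java
/** Hypothetical illustration of linearly spaced flush triggers. */
final class FlushTiersSketch {

  /**
   * Returns one RAM-usage trigger (in percent) per DWPT, spaced linearly
   * from lowPct (first DWPT flushes) to highPct (all DWPTs flushing).
   */
  static double[] flushThresholds(int numDwpts, double lowPct, double highPct) {
    if (numDwpts < 1) {
      throw new IllegalArgumentException("numDwpts must be >= 1");
    }
    double[] thresholds = new double[numDwpts];
    if (numDwpts == 1) {
      // Assumption: a lone DWPT flushes at the lower bound.
      thresholds[0] = lowPct;
      return thresholds;
    }
    double step = (highPct - lowPct) / (numDwpts - 1);
    for (int i = 0; i < numDwpts; i++) {
      thresholds[i] = lowPct + i * step;
    }
    return thresholds;
  }

  /** Number of DWPTs that should be flushing at the given RAM usage. */
  static int numToFlush(double ramUsedPct, double[] thresholds) {
    int n = 0;
    for (double t : thresholds) {
      if (ramUsedPct >= t) {
        n++;
      }
    }
    return n;
  }
}
```

For 5 DWPTs with bounds of 90 and 110, flushThresholds yields 90, 95, 100, 105 and 110, matching the example above, and numToFlush(100.0, ...) reports that 3 DWPTs should be flushing.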

Of course, if flushing is fast, RAM is freed up quickly and we only
ever flush 1 DWPT at a time... the tiers are only needed to
self-regulate RAM consumed by ongoing indexing against the time it
takes to do the flush.

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>                 Key: LUCENE-2324
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>         Attachments: lucene-2324.patch, LUCENE-2324.patch
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.

