lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
Date Wed, 21 Apr 2010 14:43:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859375#action_12859375 ]

Michael McCandless commented on LUCENE-2324:
--------------------------------------------

{quote}
I think the reason why we have two different APIs in mind (you: sourceID, me:
expert thread binder API) is that we have different goals for
them?
{quote}

Yes, I think you're right!

{quote}
You want to make the out-of-the-box indexing performance as good as
possible, and users should have to set a minimum amount of
easy-to-understand parameters (such as buffer size in MB). I think
that's the right thing to do of course.  (though that doesn't prevent
us from adding an expert API in addition, as we always have)
{quote}

Right -- simple things should be simple and complex things should be
possible.

{quote}
I'm thinking a lot about real-time indexing and the searchable RAM buffer
these days, so the thread-binder API could help you to have more control over
where your docs will actually end up and which reader will see them. But I
think too that this API would be very "expert" and not many people would use
it.
{quote}

In fact now I want both :)

I.e., make it possible (optionally) to declare the sourceID, and IW
optimizes based on this hint.

But also let advanced apps directly control individual DWPTs.
(I think a new experimental Indexer interface can work well here...).
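
To make the sourceID idea concrete, here is a minimal sketch of how an
optional hint could route documents to per-thread writers. Everything in it
is hypothetical: SourceIdRouter, the Dwpt stand-in, and the modulo routing
are illustration only, not Lucene's API and not what IW does today.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: route documents to private writers by an optional sourceID hint. */
final class SourceIdRouter {
    /** Stand-in for a DWPT (assumption): it only buffers documents privately. */
    static final class Dwpt {
        final List<String> docs = new ArrayList<>();
    }

    private final Dwpt[] writers;

    SourceIdRouter(int numWriters) {
        writers = new Dwpt[numWriters];
        for (int i = 0; i < numWriters; i++) {
            writers[i] = new Dwpt();
        }
    }

    /**
     * With a hint, the same sourceID always maps to the same writer.
     * Without one (null), fall back to the calling thread's hash code.
     */
    Dwpt pick(Integer sourceId) {
        int key = (sourceId != null) ? sourceId : Thread.currentThread().hashCode();
        return writers[Math.floorMod(key, writers.length)];
    }

    void addDocument(String doc, Integer sourceId) {
        Dwpt w = pick(sourceId);
        synchronized (w) { // only this one writer is locked, never all of them
            w.docs.add(doc);
        }
    }
}
```

An app that declares sourceID 7 for every doc from one feed would see those
docs land in a single writer, which is the kind of locality the hint is
meant to expose; apps that pass no hint keep working unchanged.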

{quote}
bq. We can do this as a separate issue... it's fairly orthogonal.

Yeah I was just thinking the same - I agree.
{quote}

OK I'll open this...


{quote}
bq. Is this sync really so bad? First, we should move all allocators/pools to per-DWPT, so
they don't need to be sync'd.

OK cool that we agree on that. I was worried you wanted to have global pools
too. If it's only the single long, it's not very complicated, I agree.
{quote}

Yeah let's not do global pools (anymore!)...

bq. Sorry if I'm being annoying :)

No, you're not!  You're asking good questions (as usual)!

{quote}
My goal is to have a default indexing chain that isn't slower than the one we
have today, but that is searchable, and very fast at that. That's not trivial,
but I think we can do it!
{quote}

This is an awesome goal, and I agree it's very reachable.  Though the
deletes/sequence ID/merging/NRT interaction is going to be fun...

{quote}
I'll implement the global flush trigger and make all pools DWPT-local. The
explicit thread-binder or sourceID APIs we can worry about later, as we agreed
above.
{quote}

OK thanks.
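
For reference, a rough sketch of what "global flush trigger, DWPT-local
pools" could look like. The only shared state is the single long discussed
above; each writer's own accounting is private and needs no sync. The class
names, the byte-based threshold, and the "flush whichever writer crossed the
limit" policy are all assumptions for illustration, not the actual patch.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: one shared counter triggers flushes; all other state is per-DWPT. */
final class GlobalFlushTrigger {
    /** Stand-in for a DWPT (assumption); its fields are touched only by the owning thread. */
    static final class Dwpt {
        long bytesUsed;
        int flushCount;
    }

    private final AtomicLong globalBytes = new AtomicLong(); // the single shared long
    private final long maxBytes;

    GlobalFlushTrigger(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Called by the owning thread after buffering a doc; returns true if it flushed. */
    boolean afterDocument(Dwpt dwpt, long docBytes) {
        dwpt.bytesUsed += docBytes;
        if (globalBytes.addAndGet(docBytes) < maxBytes) {
            return false;
        }
        // Over the global budget: flush only this writer's private segment.
        globalBytes.addAndGet(-dwpt.bytesUsed);
        dwpt.bytesUsed = 0;
        dwpt.flushCount++;
        return true;
    }

    long globalBytesUsed() {
        return globalBytes.get();
    }
}
```

A real policy would probably pick the largest writer rather than the one that
happened to cross the limit, but the synchronization story is the same.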


> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: lucene-2324.patch, LUCENE-2324.patch
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.
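
The summary quoted above can be sketched as follows: N independent in-RAM
segments, each flushing to its own private on-disk segment without touching
the others, with merging left to the normal merge machinery. This is only an
illustration under stated assumptions; the class, the doc-count threshold,
and the segment naming scheme are hypothetical and not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: N independent RAM segments that flush on their own. */
final class IndependentRamSegments {
    /** One in-RAM segment with its own buffer and flush generation (assumption). */
    static final class RamSegment {
        final int id;
        final List<String> buffered = new ArrayList<>();
        int generation;

        RamSegment(int id) {
            this.id = id;
        }
    }

    private final RamSegment[] ram;
    private final int maxBufferedDocs;
    private final List<String> flushed = new ArrayList<>();

    IndependentRamSegments(int n, int maxBufferedDocs) {
        this.ram = new RamSegment[n];
        for (int i = 0; i < n; i++) {
            ram[i] = new RamSegment(i);
        }
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Adds a doc to one RAM segment; only that segment is locked. */
    void addDocument(int slot, String doc) {
        RamSegment seg = ram[Math.floorMod(slot, ram.length)];
        synchronized (seg) {
            seg.buffered.add(doc);
            if (seg.buffered.size() >= maxBufferedDocs) {
                flush(seg);
            }
        }
    }

    /** Writes one private segment; nothing is merged at flush time. */
    private void flush(RamSegment seg) {
        String name = "_" + seg.id + "_" + seg.generation++;
        seg.buffered.clear();
        synchronized (flushed) {
            flushed.add(name);
        }
    }

    /** Names of flushed segments, which normal merging would later combine. */
    List<String> flushedSegments() {
        synchronized (flushed) {
            return new ArrayList<>(flushed);
        }
    }
}
```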

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

