lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
Date Tue, 16 Mar 2010 20:18:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846102#action_12846102 ]

Jason Rutherglen commented on LUCENE-2324:
------------------------------------------

Michael,

For LUCENE-2312, I don't think searching is going to be an
issue. I've got basic per-thread doc writers working (though not
thoroughly tested). I didn't see a great need to rework all the
classes, and even if we did, I'm not sure it would help with the
byte array read/write issues. I'd prefer to get a proof of concept
more or less working and then refine it from there. I think there
are two main design/implementation issues to solve before we can
roll something out:

1) A new skip list implementation that writes a new skip entry at
fixed intervals (i.e., a single level). Right now trunk has a
multilevel skip list that needs to know the number of docs ahead
of time.
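A minimal sketch of what item 1 could look like. It assumes postings are written in increasing doc ID order and that a skip entry only needs the last doc ID of each interval plus the file pointer just past it. The class `SingleLevelSkipList`, its methods, and the `skipInterval` parameter are hypothetical, not existing Lucene APIs. Because an entry is appended every `skipInterval` docs as they arrive, nothing requires the total doc count up front:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-level skip list: one entry every skipInterval docs,
// appended incrementally, so the total doc count is never needed up front.
public class SingleLevelSkipList {
    /** Last doc ID of an interval and the postings file pointer just after it. */
    static final class Entry {
        final int doc;
        final long filePointer;
        Entry(int doc, long filePointer) { this.doc = doc; this.filePointer = filePointer; }
    }

    private final int skipInterval;
    private final List<Entry> entries = new ArrayList<>();
    private int docCount;

    public SingleLevelSkipList(int skipInterval) {
        if (skipInterval <= 0) throw new IllegalArgumentException("skipInterval must be > 0");
        this.skipInterval = skipInterval;
    }

    /** Called once per posting, in increasing doc order; filePointer is the position after it. */
    public void addDoc(int doc, long filePointer) {
        docCount++;
        if (docCount % skipInterval == 0) {
            entries.add(new Entry(doc, filePointer));
        }
    }

    /** Binary search for the last entry whose doc is < target; null means scan from the start. */
    public Entry skipTo(int target) {
        int lo = 0, hi = entries.size() - 1;
        Entry best = null;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Entry e = entries.get(mid);
            if (e.doc < target) { best = e; lo = mid + 1; } else { hi = mid - 1; }
        }
        return best;
    }

    public int size() { return entries.size(); }

    public static void main(String[] args) {
        SingleLevelSkipList skips = new SingleLevelSkipList(4);
        for (int doc = 0; doc < 20; doc++) {
            skips.addDoc(doc * 2, (doc + 1) * 10L); // even doc IDs, 10 bytes per posting
        }
        Entry e = skips.skipTo(25);
        System.out.println(skips.size() + " " + e.doc + " " + e.filePointer); // prints "5 22 120"
    }
}
```

A reader would call `skipTo`, seek to the returned file pointer, and scan forward linearly. A single level gives up some lookup speed compared with the multilevel list, in exchange for being able to append entries while the segment is still being written.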

2) Figure out, from the low levels up to the high levels, how the
byte/char/int arrays are made visible to reader threads. The main
challenge here is that the DocumentsWriter (DW) code that uses
these arrays is hard for me to understand well enough to know what
can be changed without breaking a bunch of other things as a side
effect. If there were a Directory-like abstraction that we could
simply override and reimplement, we could do that; maybe there is
one already, I'm not sure yet.
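One possible approach for item 2, sketched under the assumption of a single writer thread per buffer. The writer fills slots first and then updates a `volatile` size, and readers only read below a size they have just read. The Java memory model's happens-before rule for a volatile write followed by a volatile read then guarantees those slots are visible. `IntBlockStore` and `VolatilePublishedIntStore` are hypothetical names standing in for the kind of Directory-like abstraction mentioned above; they are not classes in trunk:

```java
import java.util.Arrays;

// Hypothetical abstraction over in-RAM int blocks, so the visibility strategy can be swapped.
interface IntBlockStore {
    void append(int value);   // single writer thread only
    int publishedSize();      // safe from any reader thread
    int get(int index);       // only for index < a value previously returned by publishedSize()
}

// Single-writer, multi-reader store: slots become visible through a volatile size.
final class VolatilePublishedIntStore implements IntBlockStore {
    private static final int BLOCK_SIZE = 1 << 10;
    private volatile int[][] blocks = new int[4][];
    private int size;               // writer-private count
    private volatile int published; // count readers may rely on

    @Override
    public void append(int value) {
        int block = size / BLOCK_SIZE;
        int offset = size % BLOCK_SIZE;
        int[][] b = blocks;
        if (block == b.length) {
            b = Arrays.copyOf(b, b.length * 2); // copies references; existing blocks never move
            blocks = b;
        }
        if (b[block] == null) {
            b[block] = new int[BLOCK_SIZE];
        }
        b[block][offset] = value;
        size++;
        published = size; // volatile write publishes every write above
    }

    @Override
    public int publishedSize() { return published; }

    @Override
    public int get(int index) { return blocks[index / BLOCK_SIZE][index % BLOCK_SIZE]; }
}

public class VisibilityDemo {
    static final int N = 100_000;

    public static void main(String[] args) throws InterruptedException {
        IntBlockStore store = new VolatilePublishedIntStore();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < N; i++) store.append(i);
        });
        Thread reader = new Thread(() -> {
            int checked = 0;
            while (checked < N) {
                int n = store.publishedSize();      // volatile read first...
                for (int i = checked; i < n; i++) { // ...then read only below that bound
                    if (store.get(i) != i) throw new IllegalStateException("unpublished slot " + i);
                }
                checked = n;
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("published " + store.publishedSize()); // prints "published 100000"
    }
}
```

Growing the block table only copies references, so blocks a reader can already see are never moved. The same pattern would carry over to byte and char blocks.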

However, if reworking the *PerThread classes makes the tie-in to
the IO system (e.g., the byte array pooling) more abstract and
easier to work with, then I'm all for it.

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.
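To make the quoted design concrete, here is a rough sketch that binds each indexing thread to its own private RAM segment through a `ThreadLocal`. The description above only says a doc is added to one of N segments, so thread affinity is an assumption here. Documents are plain strings, and `PerThreadSegments`, `RamSegment`, `FlushedSegment`, and `maxBufferedDocs` are hypothetical stand-ins. Merging the flushed segments is not shown:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: each thread owns a private RAM segment and flushes it independently.
public class PerThreadSegments {
    /** Private buffer touched only by its owning thread. */
    static final class RamSegment {
        private final List<String> docs = new ArrayList<>();
    }

    /** Stand-in for a flushed segment that normal merging would later combine. */
    static final class FlushedSegment {
        final String owner;
        final int docCount;
        FlushedSegment(String owner, int docCount) { this.owner = owner; this.docCount = docCount; }
    }

    private final int maxBufferedDocs;
    private final ThreadLocal<RamSegment> segment = ThreadLocal.withInitial(RamSegment::new);
    final ConcurrentLinkedQueue<FlushedSegment> flushed = new ConcurrentLinkedQueue<>();

    PerThreadSegments(int maxBufferedDocs) { this.maxBufferedDocs = maxBufferedDocs; }

    /** Adds a doc to the calling thread's own segment; no lock shared with other threads. */
    public void addDocument(String doc) {
        RamSegment seg = segment.get();
        seg.docs.add(doc);
        if (seg.docs.size() >= maxBufferedDocs) flush(seg);
    }

    /** Flushes only the calling thread's segment; other threads keep indexing. */
    public void flushCurrentThread() {
        RamSegment seg = segment.get();
        if (!seg.docs.isEmpty()) flush(seg);
    }

    private void flush(RamSegment seg) {
        flushed.add(new FlushedSegment(Thread.currentThread().getName(), seg.docs.size()));
        seg.docs.clear();
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadSegments writer = new PerThreadSegments(100);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.execute(() -> {
                for (int i = 0; i < 250; i++) writer.addDocument("doc-" + i);
                writer.flushCurrentThread();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        int total = writer.flushed.stream().mapToInt(s -> s.docCount).sum();
        System.out.println(writer.flushed.size() + " segments, " + total + " docs"); // prints "12 segments, 1000 docs"
    }
}
```

Each task flushes at 100 and 200 docs and then flushes its last 50, so four tasks yield 12 small segments. That is the kind of output the "normal" segment merging in the summary would then combine, instead of the merge-on-flush done today.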

-- 
This message is automatically generated by JIRA.



