lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations
Date Wed, 29 Sep 2010 16:59:33 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916199#action_12916199 ]

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

bq. We'd need to increase the level 0 slice size...

Yes. 
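For reference, here's roughly how the slice levels look today (paraphrasing ByteBlockPool from the current codebase, so the values may drift): level 0 is only 5 bytes, which is the size we'd be bumping.

{code:java}
// Paraphrased from ByteBlockPool: each slice level maps to the next level
// and a slice size in bytes. Level 0 is just 5 bytes.
static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
static final int[] LEVEL_SIZE_ARRAY = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};
static final int FIRST_LEVEL_SIZE = LEVEL_SIZE_ARRAY[0]; // 5
{code}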

{quote}but the reader needs to read 'beyond' the end of a given
slice, still? Ie say global maxDoc is 42, and a given posting
just read doc 27 (which in fact is its last doc). It would then
try to read the next doc?{quote}

The posting-upto should stop the reader before it reaches a byte
element whose value is 0; i.e., the read-past-the-end scenario
above should never happen.

The main 'issue', which really isn't one, is that a reader
cannot maintain its own copy of the byte[][] spine, because the
spine keeps growing: new buffers are added and the master
posting-upto advances, which would allow 'older' readers to
continue past their original point-in-time byte[][]. This is
solved by synchronizing the obtainment of the byte[] buffer from
the BBP, thereby preventing out-of-bounds exceptions.
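A minimal sketch of what I mean (the class and method names here are made up, not what's in the patch):

{code:java}
// Readers never cache the byte[][] spine; they re-obtain the buffer for each
// address under the same lock the writer uses to grow the spine, so a
// concurrent growth can't leave a reader indexing into a stale array.
class ConcurrentBytePool {
  static final int BLOCK_SHIFT = 15;            // 32 KB blocks, as in Lucene
  static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;

  private byte[][] buffers = new byte[10][];
  private int bufferUpto = -1;                  // index of the current write buffer

  // Writer side: add a buffer, growing the spine if needed.
  synchronized void newBuffer() {
    if (1 + bufferUpto == buffers.length) {
      byte[][] grown = new byte[buffers.length * 2][];
      System.arraycopy(buffers, 0, grown, 0, buffers.length);
      buffers = grown;
    }
    buffers[++bufferUpto] = new byte[BLOCK_SIZE];
  }

  // Reader side: obtain the buffer holding a global address. Synchronizing
  // here is what prevents the out-of-bounds case described above.
  synchronized byte[] getBuffer(long globalAddress) {
    return buffers[(int) (globalAddress >>> BLOCK_SHIFT)];
  }
}
{code}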

{quote}We don't store tf now do we? Adding 4 bytes per unique
term isn't innocuous!{quote}

What I meant is: if we're merely maintaining the term-freq array
during normal, non-RT indexing, then we're not constantly
creating new arrays, and we're in innocuous territory. However,
the array has no use in that case, so it shouldn't be created at
all unless RT has been flipped on, modally.
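Something like this, purely to illustrate the modal allocation (hypothetical names):

{code:java}
// The term-freq parallel array exists only when RT mode is on; non-RT
// indexing never reads it, so it shouldn't pay the 4 bytes per unique term.
class RTPostingsArrays {
  final int[] termFreqs;          // parallel to the term IDs; null in non-RT mode

  RTPostingsArrays(int size, boolean realtime) {
    termFreqs = realtime ? new int[size] : null;
  }
}
{code}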

{quote}Hmm the full copy of the tf parallel array is going to
put a highish cost on reopen? So some sort of transactional
(incremental copy-on-write) data structure is needed (eg
PagedInts)...{quote}

Right, to me this is the remaining 'problem', or rather the part
that still needs an agreed-upon solution. For now we can assume
PagedInts is the answer.
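To sketch the copy-on-write behavior I have in mind (names are illustrative, not a final API): a reopen clones only the page table, and pages stay shared until the writer next touches one, so reopen cost is O(numPages) rather than a full copy of the tf array.

{code:java}
class CopyOnWritePagedInts {
  static final int PAGE_SHIFT = 10;                  // 1024 ints per page
  static final int PAGE_MASK = (1 << PAGE_SHIFT) - 1;

  private final int[][] pages;
  private final boolean[] shared;                    // is page i shared with a snapshot?

  CopyOnWritePagedInts(int numPages) {
    this(new int[numPages][1 << PAGE_SHIFT], new boolean[numPages]);
  }

  private CopyOnWritePagedInts(int[][] pages, boolean[] shared) {
    this.pages = pages;
    this.shared = shared;
  }

  int get(int index) {
    return pages[index >> PAGE_SHIFT][index & PAGE_MASK];
  }

  // Writer-side mutation: clone a page only on its first write after a snapshot.
  void set(int index, int value) {
    int p = index >> PAGE_SHIFT;
    if (shared[p]) {
      pages[p] = pages[p].clone();
      shared[p] = false;
    }
    pages[p][index & PAGE_MASK] = value;
  }

  // Reader snapshot at reopen: shallow-copy the page table and mark all pages
  // shared so the writer copies-on-write from here on.
  CopyOnWritePagedInts snapshot() {
    java.util.Arrays.fill(shared, true);
    return new CopyOnWritePagedInts(pages.clone(), new boolean[pages.length]);
  }
}
{code}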

In addition, to summarize the skip list: it needs to store the
doc, the address into the BBP, and the length from that address
to the end of the slice. This allows us to point to a document
anywhere in the postings BBP and still continue with slice
iteration. In the test code I've written, the slice level is
stored as well; I'm not sure why, or whether that's required. I
think it's a hint to the BBP reader as to the level of the next
slice.
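In other words, a skip entry carries something like this (field names hypothetical):

{code:java}
// Enough state to land a reader on an arbitrary doc inside the postings BBP
// and resume normal slice iteration from there.
class SkipEntry {
  int doc;        // document this entry points at
  long address;   // absolute address into the byte block pool
  int remaining;  // bytes left from 'address' to the end of the current slice
  int level;      // slice level; possibly redundant, but lets the reader size
                  // the next slice without re-deriving it
}
{code}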



> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



