lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2340) FixedIntBlockIndexOutput encodes unnecessary integers at the end of a list
Date Mon, 22 Mar 2010 19:57:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848305#action_12848305 ]

Michael McCandless commented on LUCENE-2340:
--------------------------------------------

bq. This can be problematic and causes a large overhead when using a large blockSize (e.g., 1024),
on small segments, or on rare-term posting lists.

The block is "shared" across postings, so a rare posting list in an otherwise big segment
should be fine?

Small segments will indeed be wasteful, but they'll presumably quickly be merged away.
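
To make the point above concrete, here is a rough sketch of the padding arithmetic. It assumes all postings in a file share one block buffer and that only the last block is padded at close time; the class and method names are hypothetical and are not part of the flex branch.

```java
// Hypothetical sketch: padding cost when all postings share one block buffer
// and only the final, partially filled block is padded on close.
public class SharedBlockPadding {

    /** Ints written to the file when totalInts values are packed into fixed-size blocks. */
    static long writtenInts(long totalInts, int blockSize) {
        long blocks = (totalInts + blockSize - 1) / blockSize; // round up
        return blocks * blockSize;
    }

    /** Ints written purely as padding in the final block. */
    static long paddingInts(long totalInts, int blockSize) {
        return writtenInts(totalInts, blockSize) - totalInts;
    }

    public static void main(String[] args) {
        int blockSize = 1024;
        // Large segment: padding is bounded by blockSize - 1 no matter how many
        // posting lists contributed to the total.
        System.out.println(paddingInts(1_000_003L, blockSize));
        // Tiny segment: almost the whole block is padding.
        System.out.println(paddingInts(5L, blockSize));
    }
}
```

Under these assumptions the waste is at most blockSize - 1 integers per file, which is negligible for a large segment and dominant for a very small one.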

bq. The new implementation of SimpleIntBlockIndex* is even sillier than the previous one,
and stores a vint at the beginning of each block to record the length of the block.

Would other less-silly impls also need to do this?  I.e., the thing I want to avoid is foisting
onto all block-based codecs the need to track the size of every block...
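
For a sense of what tracking every block's size would cost, the sketch below counts the bytes a vint-style length header would add per block. It assumes the usual 7-bits-per-byte variable-length encoding; the helper names are made up for illustration and do not exist in Lucene.

```java
// Hypothetical sketch: byte cost of prefixing every block with its length as a vint
// (7 payload bits per byte, high bit used as a continuation flag).
public class VIntHeaderCost {

    /** Number of bytes a non-negative int occupies in the 7-bits-per-byte encoding. */
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Total header bytes if every one of `blocks` blocks stores its length up front. */
    static long headerBytes(long blocks, int blockLength) {
        return blocks * vIntBytes(blockLength);
    }

    public static void main(String[] args) {
        // With blockSize 1024, each header takes 2 bytes, paid on every block,
        // even though only the last block can actually be short.
        System.out.println(headerBytes(977, 1024));
    }
}
```

The byte cost is small, but every codec's reader would have to decode and honor the header on every block, which is the burden the question above is about.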

> FixedIntBlockIndexOutput encodes unnecessary integers at the end of a list
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-2340
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2340
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Renaud Delbru
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-1458-FixedIntBlockIndexOutput.patch, LUCENE-1458-FixedIntBlockIndexOutput.patch
>
>
> At closing time, the current FixedIntBlockIndexOutput flushes a full block of blockSize integers even if the block holds only a few integers.
> This can be problematic and causes a large overhead when using a large blockSize (e.g., 1024), on small segments, or on rare-term posting lists.
> One solution would be to add a secondary flushBlock method with an additional parameter: the valid length of the buffer. This method would be called only from the FixedIntBlockIndexOutput#close() method.
> How this final block of integers is encoded is left to subclasses.
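
The proposal in the description could look roughly like the sketch below. This is not the attached patch and not the real FixedIntBlockIndexOutput; the class names, the `write` and `close` signatures, and the recording subclass are assumptions made only to show the control flow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a fixed-block writer with a secondary flush for the last block.
public class PartialFlushSketch {

    abstract static class BlockWriter {
        protected final int[] buffer;
        private int upto; // number of valid ints currently in the buffer

        BlockWriter(int blockSize) {
            buffer = new int[blockSize];
        }

        /** Encode a completely filled buffer. */
        protected abstract void flushBlock();

        /** Encode only the first validLength ints; how to do so is left to subclasses. */
        protected abstract void flushBlock(int validLength);

        void write(int v) {
            buffer[upto++] = v;
            if (upto == buffer.length) {
                flushBlock();
                upto = 0;
            }
        }

        /** Only close() uses the partial flush, so full blocks pay no extra cost. */
        void close() {
            if (upto > 0) {
                flushBlock(upto);
                upto = 0;
            }
        }
    }

    /** Toy subclass that records how many ints each flush covered. */
    static class RecordingWriter extends BlockWriter {
        final List<Integer> flushedLengths = new ArrayList<>();

        RecordingWriter(int blockSize) {
            super(blockSize);
        }

        @Override
        protected void flushBlock() {
            flushedLengths.add(buffer.length);
        }

        @Override
        protected void flushBlock(int validLength) {
            flushedLengths.add(validLength);
        }
    }

    public static void main(String[] args) {
        RecordingWriter w = new RecordingWriter(4);
        for (int i = 0; i < 10; i++) {
            w.write(i);
        }
        w.close();
        System.out.println(w.flushedLengths);
    }
}
```

A matching reader would still need some way to learn the length of the final block, which this sketch leaves open.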

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

