lucene-dev mailing list archives

From: Adrien Grand <jpou...@gmail.com>
Subject: Re: Compressed stored fields and multiGet(sorted luceneId[])?
Date: Thu, 08 Nov 2012 10:43:06 GMT
Hi!

I'll try to complement what Simon and Robert said:

On Thu, Nov 8, 2012 at 8:56 AM, eksdev <eksdev@googlemail.com> wrote:

> Just a theoretical question: would it make sense to add some sort of
> StoredDocument[] bulkGet(int[] docId) to fetch multiple stored documents in
> one go?
>
> The reasoning behind it is that, now that compressed blocks make random
> access more expensive, a user sometimes needs to fetch several documents
> at once. If several of the requested documents happen to come from the
> same block, it is a win. I would also assume that, even without
> compression, bulk access on sorted docIds could be a win (sequential
> access)?
>
> Does that make sense, and is it doable? Or even worse, does it already exist? :)
>

Even with small documents (100 bytes each, so about 160 docs per chunk) and a
small index (100K docs), there would still be 625 chunks, so the probability
of two documents from the same results page being in the same chunk is very
low. So I think we should not optimize for this case.

However, CompressingStoredFieldsFormat implements efficient sequential
iteration internally in order to improve merging performance: when merging
a segment, every chunk gets decompressed only once.
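For reference, here is a minimal sketch of how a caller could approximate the
sorted bulk access you describe today, using only the existing public
IndexReader.document(int) API (the SortedBulkFetch class and fetch method are
hypothetical, not part of Lucene). It does not share decompression work
between calls; it only makes file access forward-only so the OS page cache
and readahead can help:

import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

// Hypothetical helper, not a Lucene API: fetches stored documents in
// ascending docID order so that file access is forward-only. Documents that
// share a chunk end up close together on disk, but each document(int) call
// may still decompress its chunk independently.
public class SortedBulkFetch {
  public static Document[] fetch(IndexReader reader, int[] docIds) throws IOException {
    int[] sorted = docIds.clone();
    Arrays.sort(sorted); // sequential access pattern
    Document[] docs = new Document[sorted.length];
    for (int i = 0; i < sorted.length; i++) {
      docs[i] = reader.document(sorted[i]); // standard stored-fields lookup
    }
    return docs;
  }
}

A real bulkGet inside the stored fields reader could of course go further and
decompress each chunk only once, the way the merge path already does.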

> By the way, I am impressed by how well compression does, even on really
> short stored documents: at approx. 150 bytes each we observe a 35% size
> reduction. Fetching 1000 short documents on a fully cached index is
> observably slower (2-3x), but as soon as memory gets low, compression wins
> quickly.
>

Awesome! Thank you for trying it!

-- 
Adrien
