lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: DocValues space usage
Date Tue, 09 Apr 2013 16:21:20 GMT
On Tue, Apr 9, 2013 at 9:06 AM, Wei Wang <> wrote:

> Thanks for the hint. Could you point to some Codec that might do this for
> some types, even just as a side effect as you mentioned? It will be
> helpful to have something to start with.

Have a look at the diskdv/ codec in the codecs/ module. It's a lot simpler
than the default codec because it doesn't have the "tradeoff speed for space"
performance hacks of the default codec. It might already do something that's
good enough for your needs.

> And could you elaborate a bit more on "the facet on tons of sparse
> fields"? I just got a vague idea from the comments.

Look at the lucene/facet module. Unlike applications such as Solr and
Elasticsearch, which would build fieldcaches/docvalues/whatever on hundreds
of "fields", I think this one uses just a single binary docvalues field to
implement ordinal storage across all "fields" (I think it calls them
dimensions or something else).

Of course you can simulate this yourself with other approaches too.
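
As a rough illustration of the idea (not the actual lucene/facet code), here is
a minimal plain-Java sketch of packing ordinals from many sparse "fields" into
one byte[] per document, which could then be stored in a single binary
docvalues field. The class name PackedOrdinals, the "dimension/value" label
scheme, and the sort + delta + variable-length-int encoding are all assumptions
made for the example; ordinals are assumed to be non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the lucene/facet implementation. */
public class PackedOrdinals {
    // One global ordinal space shared by every dimension ("field").
    private final Map<String, Integer> ordinals = new HashMap<>();

    /** Returns the ordinal for dimension/value, assigning the next free one if it is new. */
    public int ordinalFor(String dimension, String value) {
        String label = dimension + "/" + value;
        Integer ord = ordinals.get(label);
        if (ord == null) {
            ord = ordinals.size();
            ordinals.put(label, ord);
        }
        return ord;
    }

    /** Packs one document's ordinals: sort, delta-encode, write 7-bit variable-length ints. */
    public static byte[] encode(int[] docOrdinals) {
        int[] sorted = docOrdinals.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int ord : sorted) {
            int gap = ord - previous;          // non-negative because input is sorted
            while ((gap & ~0x7F) != 0) {       // more than 7 bits left: set continuation bit
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
            previous = ord;
        }
        return out.toByteArray();
    }

    /** Reverses encode(): reads each variable-length gap and sums the gaps back into ordinals. */
    public static int[] decode(byte[] packed) {
        List<Integer> result = new ArrayList<>();
        int previous = 0;
        int i = 0;
        while (i < packed.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = packed[i++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            result.add(previous);
        }
        int[] out = new int[result.size()];
        for (int j = 0; j < out.length; j++) {
            out[j] = result.get(j);
        }
        return out;
    }
}
```

With something like this, a document that only has values for a handful of
dimensions costs a few bytes in one field, instead of an entry in each of
hundreds of mostly-empty per-field structures.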
