lucene-java-user mailing list archives

From Olivier Binda <>
Subject Compressing docValues with variable length bytes[] by block of 16k ?
Date Sat, 08 Aug 2015 14:19:36 GMT

Are there any plans to implement compression of variable-length byte[]
binary doc values, say in blocks of 16 KB as is done for stored fields?

My .cfs file goes from about 2 MB to roughly 400 KB when I zip it.
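To make the question concrete, here is a minimal sketch, in plain Java and not against any Lucene codec API, of what block-compressing variable-length binary values could look like. Each document's value is length-prefixed and appended to a buffer, every ~16 KB of buffered data is deflated as an independent block, and a lookup inflates only the block holding the requested document. The class name `BlockCompressedBinaryValues`, its methods, the in-memory layout, and the choice of `java.util.zip.Deflater` (used only because it ships with the JDK) are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Hypothetical sketch (not a Lucene API): one variable-length byte[] value per
 * document, docIDs assigned 0, 1, 2, ... in add() order, stored in
 * independently deflate-compressed blocks of roughly 16 KB uncompressed data.
 */
public class BlockCompressedBinaryValues {
    static final int BLOCK_SIZE = 16 * 1024; // flush threshold, in uncompressed bytes

    private final List<byte[]> blocks = new ArrayList<>();          // compressed blocks
    private final List<Integer> blockFirstDoc = new ArrayList<>();  // first docID in each block
    private final List<Integer> blockRawLength = new ArrayList<>(); // uncompressed size of each block
    private ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private DataOutputStream pendingOut = new DataOutputStream(pending);
    private int pendingFirstDoc = 0;
    private int nextDoc = 0;

    /** Appends the value for the next docID, flushing once the block reaches BLOCK_SIZE. */
    public void add(byte[] value) throws IOException {
        pendingOut.writeInt(value.length); // length prefix lets values vary in size
        pendingOut.write(value);
        nextDoc++;
        if (pending.size() >= BLOCK_SIZE) flush();
    }

    /** Compresses any pending values into one independent block. */
    public void flush() {
        if (nextDoc == pendingFirstDoc) return; // nothing pending
        byte[] raw = pending.toByteArray();
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        blocks.add(out.toByteArray());
        blockFirstDoc.add(pendingFirstDoc);
        blockRawLength.add(raw.length);
        pending = new ByteArrayOutputStream();
        pendingOut = new DataOutputStream(pending);
        pendingFirstDoc = nextDoc;
    }

    /** Returns docID's value, decompressing only the block that contains it. */
    public byte[] get(int docID) throws IOException, DataFormatException {
        if (docID < 0 || docID >= nextDoc) throw new IndexOutOfBoundsException("docID " + docID);
        if (docID >= pendingFirstDoc) flush(); // value is still uncompressed; seal the block first
        int lo = 0, hi = blockFirstDoc.size() - 1; // binary search: last block with firstDoc <= docID
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (blockFirstDoc.get(mid) <= docID) lo = mid; else hi = mid - 1;
        }
        byte[] raw = new byte[blockRawLength.get(lo)];
        Inflater inflater = new Inflater();
        inflater.setInput(blocks.get(lo));
        int off = 0;
        while (off < raw.length) {
            int n = inflater.inflate(raw, off, raw.length - off);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("truncated block");
            }
            off += n;
        }
        inflater.end();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        for (int d = blockFirstDoc.get(lo); d < docID; d++) {
            in.skipBytes(in.readInt()); // skip earlier values in this block
        }
        byte[] value = new byte[in.readInt()];
        in.readFully(value);
        return value;
    }

    /** Total size of all compressed blocks, in bytes. */
    public long compressedBytes() {
        long total = 0;
        for (byte[] b : blocks) total += b.length;
        return total;
    }
}
```

Usage: call `add()` once per document in docID order, then `flush()`; `get(docID)` returns the original bytes. The trade-off is the one the question hints at: every random `get()` pays for inflating up to a whole ~16 KB block, which is cheap for occasional access but adds up if a value is read for every hit, as when doc values back sorting or faceting.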

Best regards,

On 08/08/2015 02:32 PM, jamie wrote:
> Greetings
> Our app primarily uses Lucene for its intended purpose, i.e., to search
> across large amounts of unstructured text. Recently, however, our
> requirements expanded to include looking up specific documents in the
> index by associated, custom-defined unique keys. For our purposes, a
> unique key is the string representation of a 128-bit murmur hash,
> stored in a Lucene field named uid. We currently use a TermsFilter to
> look up documents in the Lucene index as follows:
> List<Term> terms = new LinkedList<>();
> for (String id : ids) {
>     terms.add(new Term("uid", id));
> }
> TermsFilter idFilter = new TermsFilter(terms);
> ... search logic...
> At any time we may need to look up, say, a couple of thousand
> documents. Our problem is performance: on very large indexes with 30
> million records or more, the lookup can be excruciatingly slow. At
> this stage, it's not practical for us to move the data over to a
> fit-for-purpose database, nor to change the uid field to a numeric
> type. I fully appreciate that Lucene is not designed to be a
> database; however, is there anything we can do to improve the
> performance of these look-ups?
> Much appreciated,
> Jamie
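The lookup in the quoted message comes down to probing the uid term dictionary, which Lucene keeps in sorted order, once per requested key. As a rough, stdlib-only model (not Lucene's code, and not a claim about how TermsFilter works internally), the sketch below treats one segment as a sorted array of uids with a parallel array of docIDs. Sorting the requested keys once lets each binary search resume where the previous one stopped. The class `SortedUidLookup`, its `lookup` method, and the parallel-array layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, stdlib-only model of a batched uid lookup in one index segment:
 * sortedUids plays the role of the segment's term dictionary (sorted, one uid
 * per document) and docIds[i] is the document that holds sortedUids[i].
 */
public class SortedUidLookup {

    /** Returns the docIDs of the requested uids that exist, in ascending uid order. */
    static List<Integer> lookup(String[] sortedUids, int[] docIds, List<String> requested) {
        String[] keys = requested.toArray(new String[0]);
        Arrays.sort(keys); // sort once so each probe can resume where the previous one stopped
        List<Integer> hits = new ArrayList<>();
        int from = 0; // entries before this index are ruled out for the remaining keys
        for (String key : keys) {
            if (from >= sortedUids.length) {
                break; // all remaining keys sort after the last term
            }
            int pos = Arrays.binarySearch(sortedUids, from, sortedUids.length, key);
            if (pos >= 0) {
                hits.add(docIds[pos]);
                from = pos + 1; // a repeated request for the same uid is reported once
            } else {
                from = -pos - 1; // insertion point: larger keys cannot sort before it
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] uids = {"0a", "1f", "3c", "7d", "9e"};
        int[] docs = {4, 0, 3, 1, 2};
        System.out.println(lookup(uids, docs, Arrays.asList("9e", "0a", "zz", "3c"))); // prints [4, 3, 2]
    }
}
```

The point of the model is only the access pattern: with sorted keys, the searched range shrinks monotonically instead of every key starting a fresh search over the whole dictionary. Whether a real index benefits further depends on how the query already orders its seeks across segments.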
