lucene-dev mailing list archives

From eksdev <>
Subject Re: Compressed stored fields and multiGet(sorted luceneId[])?
Date Thu, 08 Nov 2012 13:18:37 GMT

On Nov 8, 2012, at 11:30 AM, Robert Muir <> wrote:

Thanks, everybody, for the responses, and even more thanks for the great project.

> Why are you retrieving thousands of stored fields?

I do not think it is all that rare for people to actually do something with the
information beyond displaying summaries.
Clustering in Solr does exactly that, and online record linkage follows exactly the same pattern.

A pattern of "fetch thousands of candidates and run some heavy processing on them" is surely
not typical "web search engine" usage, but, philosophically, is a model of:
a) search data
b) do something with it
c) deliver
really that strange?

You say b) should not be done using stored fields. OK, I trust you, but going to a
database/NoSQL/anything else is even slower. What approach would you recommend?

"the probability of two documents of the same results page being in the same chunk is very

Adrian, Robert, this is 100% correct, no objection there.
In this particular case we rely heavily on locality of reference: we simply sort the data
and reindex from time to time. You have to be lucky to be able to sort the documents, but
we do not use Lucene for big chunks of text, rather for almost fully structured data, and
we know how to sort this data to preserve locality of reference. This is also a bit unusual,
but I do not think it is all that rare a scenario.
Sorting data (where possible) was a great optimisation tip for many applications, even before
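
To illustrate why sorting helps here, a minimal sketch follows. It is not Lucene code: it assumes, purely for illustration, that stored fields are packed into fixed-size chunks of `docs_per_chunk` documents (real chunks are sized differently), and the function name `group_by_chunk` is hypothetical. It shows that a sorted list of neighbouring doc ids touches far fewer chunks than the same number of scattered ids.

```python
from itertools import groupby


def group_by_chunk(sorted_doc_ids, docs_per_chunk):
    """Group ascending doc ids by the chunk that would hold them.

    Assumption: every chunk holds exactly `docs_per_chunk` documents,
    so doc id d lives in chunk d // docs_per_chunk.
    """
    return {
        chunk: list(ids)
        for chunk, ids in groupby(sorted_doc_ids, key=lambda d: d // docs_per_chunk)
    }


# 64 neighbouring ids, as produced when the index is sorted for locality.
clustered = list(range(1000, 1064))
# 64 ids spread evenly over a much larger index.
scattered = list(range(0, 64000, 1000))

print(len(group_by_chunk(clustered, 16)))  # chunks to decompress for clustered ids
print(len(group_by_chunk(scattered, 16)))  # chunks to decompress for scattered ids
```

With these made-up numbers the clustered request needs 5 chunk decompressions and the scattered one needs 64, one per document.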

"really you should roll your own codec for this and specialise."

Yes, I have already started thinking about it, but we will first try playing with the chunk
size to see if we can achieve the goal without our own codec …
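
As a rough way to reason about the chunk size experiment, the sketch below estimates how many documents get decompressed for one batch of clustered ids at several chunk sizes. It rests on the simplifying assumption that a chunk holds a fixed number of documents and must be decompressed whole; `docs_decompressed` is a hypothetical helper, not a Lucene API, and the numbers are illustrative only.

```python
def docs_decompressed(doc_ids, docs_per_chunk):
    """Estimate decompression work for fetching `doc_ids`.

    Assumption: a chunk holds exactly `docs_per_chunk` documents and is
    always decompressed as a whole, even if only one document is needed.
    """
    chunks_touched = {d // docs_per_chunk for d in doc_ids}
    return len(chunks_touched) * docs_per_chunk


# 64 neighbouring ids, i.e. a batch that benefits from sorted indexing.
wanted = list(range(1000, 1064))

for size in (4, 16, 64, 256):
    print(size, docs_decompressed(wanted, size))
```

Under these assumptions, smaller chunks waste less decompression work per request (64 documents at size 4 versus 512 at size 256), while larger chunks generally compress better, so the right setting depends on the data and has to be measured.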
