lucene-java-user mailing list archives

From Savia Beson <eks...@googlemail.com>
Subject Re: Please Help solve problem of bad read performance in lucene 4.2.1
Date Sun, 07 Jul 2013 19:00:56 GMT
  
Compression did a lot of good, but there is one set of use cases where we see a significant
speed loss when using the defaults.

There is the lovely concept of DocValues (DV), which is perfect for smallish single fields, and
there are stored fields, which are perfect for clunky texts, but there is a gap for applications
that need 10-20 smallish fields and retrieve many documents for post-processing (e.g. think
clustering-like applications).

Stored fields got slow for this case due to the compression penalty (default codec), and DVs
require a seek per field…

The user has a few options to tweak:
1. Write your own codec with smaller chunks to reduce the compression penalty (maybe Adrien comes
up with more crazy speed-ups in compression, like the static dictionaries he mentioned :)
2. Pack the fields into a structure, store it as a DV byte array, and be happy with parsing it
back and forth
3. Use the old non-compressing codec

The second would probably work nicely (I do not know if DVs are intended for that?), but it
requires the user to serialise many fields into a single byte[]… that kind of defeats the concept
of Lucene fields, as the user has to pack the fields into the document. That would be a kind of
"stored fields for a collection of small-sized fields over DVs".
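
To make option 2 concrete, here is a minimal sketch of the packing step in plain Java. The class name PackedFields and the wire layout (a field count followed by name/value pairs) are made up for illustration and are not a Lucene API. The resulting byte[] could then be wrapped in a BytesRef and indexed with a BinaryDocValuesField, assuming DVs are an acceptable home for this kind of data, which is the open question above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: packs several small fields into one byte[] and back. */
final class PackedFields {
    private PackedFields() {}

    /** Layout (an assumption of this sketch): 2-byte field count, then name/value pairs. */
    static byte[] pack(Map<String, String> fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(fields.size());
            for (Map.Entry<String, String> e : fields.entrySet()) {
                // writeUTF stores a 2-byte length prefix, so each string must
                // encode to at most 65535 bytes; fine for "smallish" fields.
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reverses pack(), preserving the original field order. */
    static Map<String, String> unpack(byte[] packed) {
        Map<String, String> fields = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            int count = in.readUnsignedShort();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                String value = in.readUTF();
                fields.put(name, value);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return fields;
    }
}
```

At search time the application would read the single binary value for each hit and call unpack() on it, so the cost is one lookup per document instead of one per field, at the price of doing the serialisation by hand.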

Thinking aloud, as I am not really happy to hear that Lucene cannot retrieve more than 20-ish
documents in meaningful time. It did in the past and is more than able to do it today, maybe for
the moment not in the most comfortable way :)


  


On Jul 7, 2013, at 6:53 PM, Chris Zhang <zhangjcmail@gmail.com> wrote:

> thanks Jack,
> yes, i should evaluate lucene by query performance.
> 
> 
> On Mon, Jul 8, 2013 at 12:45 AM, Jack Krupansky <jack@basetechnology.com> wrote:
> 
>> To be clear, Lucene and Solr are "search" engines, NOT "storage" engines.
>> Has someone claimed otherwise to you?
>> 
>> What is your query performance in 4.x vs. 3.x? That's the true, proper
>> measure of Lucene and Solr performance.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Chris Zhang
>> Sent: Sunday, July 07, 2013 12:26 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Please Help solve problem of bad read performance in lucene
>> 4.2.1
>> 
>> 
>> thanks Adrien,
>> In my project, almost all hit docs are supposed to be fetched for every
>> query, that's why I am upset by the poor reading performance. Maybe I
>> should store the field values that need to be stored in a high
>> performance storage engine.
>> In the above test case, reading all docs in lucene 3.0 takes about
>> 78 sec, a reading speed of approximately 10MB/s, but 700+ sec in
>> lucene 4.2.1, which indicates a reading speed of less than 1MB/s. So I think
>> the lucene committers should pay attention to this.
>> 
>> 
>> On Sun, Jul 7, 2013 at 10:23 PM, Adrien Grand <jpountz@gmail.com> wrote:
>> 
>>> Indeed, Lucene 4.1+ may be a bit slower for indices that completely
>>> fit in your file-system cache. On the other hand, you should see
>>> better performance with indices which are larger than the amount of
>>> physical memory of your machine. Your reading benchmark only measures
>>> IndexReader.document(int) which should only be used to display summary
>>> results (that is, only called 10 or 20 times per displayed page). Most
>>> of the time, the bottleneck is rather searching which can be made more
>>> efficient on small indices by switching to an in-memory postings
>>> format.
>>> 
>>> --
>>> Adrien
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> 
>>> 
>>> 
>> 
>> 
>> 



