lucene-java-user mailing list archives

From "Sascha Janz" <Sascha.J...@gmx.net>
Subject Aw: RE: Re: Performance StringCoding.decode
Date Wed, 06 Aug 2014 15:56:42 GMT


Hi,

No, not for all results, but users can configure the result list size up to 100 documents.

I was already afraid that this is a point where there is nothing I can do to optimize.

The call for reading the docs comes from IndexSearcher.document(int n).

I also tried loading only specific fields with IndexSearcher.document(int n, Set<String>),
but this made no difference.

We have about 10 TextFields per document, with one "large" body field (the email content).


The results are sorted by timestamp, which is defined like this:

doc.add(new Field("timestamp", Long.toString(timestamp), Field.Store.YES, Field.Index.NOT_ANALYZED,
    Field.TermVector.NO));
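Because this timestamp is indexed NOT_ANALYZED as text, a sort that uses the string value (for example a `SortField` of type `STRING`) compares it character by character, not numerically. The plain-JDK sketch below (no Lucene; `TimestampOrder` and `padded` are hypothetical names) shows the difference and one common workaround, fixed-width zero padding. Current epoch-millisecond values all have 13 digits, so the plain `Long.toString` form happens to sort correctly for them; the mismatch only appears once values of different lengths are mixed.

```java
import java.util.Arrays;

// Plain-JDK sketch (no Lucene): string order vs numeric order for timestamps.
// TimestampOrder and padded() are hypothetical names used for illustration.
final class TimestampOrder {

    // Width 19 fits every non-negative long (Long.MAX_VALUE has 19 digits).
    static final int WIDTH = 19;

    // Left-pad a non-negative value with zeros so that comparing the strings
    // character by character gives the same order as comparing the numbers.
    static String padded(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format("%0" + WIDTH + "d", value);
    }

    public static void main(String[] args) {
        long[] timestamps = {999L, 1000L, 1400000000000L};

        String[] plain = new String[timestamps.length];
        String[] fixedWidth = new String[timestamps.length];
        for (int i = 0; i < timestamps.length; i++) {
            plain[i] = Long.toString(timestamps[i]); // same encoding as the Field above
            fixedWidth[i] = padded(timestamps[i]);
        }
        Arrays.sort(plain);      // lexicographic: "1000" < "1400000000000" < "999"
        Arrays.sort(fixedWidth); // same as numeric order: 999, 1000, 1400000000000

        System.out.println(Arrays.toString(plain));      // [1000, 1400000000000, 999]
        System.out.println(Arrays.toString(fixedWidth));
    }
}
```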

Greetings,
Sascha
 

Sent: Wednesday, 06 August 2014 at 10:50
From: "Uwe Schindler" <uwe@thetaphi.de>
To: java-user@lucene.apache.org
Subject: RE: Re: Performance StringCoding.decode
Hi,

It looks like you are fetching the stored fields of *all* search results. In general, Lucene
is made to return the most relevant documents to the user; stored fields are then fetched
only for a small number of top-ranking results, say the top 10. If you do this for all results
(which can be thousands), this is of course a performance problem: the stored fields are
compressed on disk, and after decompression the bytes have to be converted to UTF-16 Java
Strings. There is not much Lucene can do about that.
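The "rank first, load later" pattern can be illustrated outside Lucene. The sketch below is plain Java and only an analogy: scores live in a `float[]` indexed by doc ID, each stored document is a UTF-8 `byte[]`, and the names `TopHitsLoader`, `topK` and `load` are hypothetical. It keeps the k best doc IDs in a bounded min-heap and decodes bytes to `String` (the `String(byte[], Charset)` step seen in the profile) only for those k hits, so the decoding work grows with the page size rather than with the total number of matches.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java analogy (not Lucene API): rank first, then decode stored bytes
// only for the top-k hits. TopHitsLoader, topK() and load() are hypothetical.
final class TopHitsLoader {

    // Counts UTF-8 -> String conversions, i.e. the work seen in the profile.
    static int decodeCount = 0;

    // Returns the doc IDs with the k highest scores, best first.
    static List<Integer> topK(float[] scores, int k) {
        // Min-heap ordered by score: the weakest of the current top-k is at the head.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the lowest-scoring candidate
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(scores[b], scores[a])); // best first
        return result;
    }

    // Decodes one stored document; this is the String(byte[], Charset) step.
    static String load(byte[][] stored, int doc) {
        decodeCount++;
        return new String(stored[doc], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f};
        byte[][] stored = new byte[scores.length][];
        for (int doc = 0; doc < stored.length; doc++) {
            stored[doc] = ("body of doc " + doc).getBytes(StandardCharsets.UTF_8);
        }
        for (int doc : topK(scores, 2)) {
            System.out.println(doc + ": " + load(stored, doc));
        }
        // Only 2 of the 5 stored documents were decoded.
        System.out.println("decoded " + decodeCount);
    }
}
```

In Lucene itself the ranking step is done by the searcher and its collector; the point of the sketch is only that the expensive decode runs once per displayed hit.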

If you use stored fields for ranking purposes (inside function queries), you should change
them to numeric docvalues fields.
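The idea behind the DocValues suggestion is column-oriented storage: one primitive value per document, addressable by doc ID, with no decompression or UTF-8 decoding on access. In Lucene 4.x the corresponding field class is `NumericDocValuesField`. The plain-Java sketch below does not use Lucene at all; it only models the two access patterns with a `String[]` "stored field" and a `long[]` column, and the names `ColumnSort`, `fromStored`, `fromColumn` and `newestFirst` are hypothetical.

```java
import java.util.Arrays;

// Plain-Java analogy (not Lucene API) for column-style per-document values,
// the access pattern numeric DocValues provide. ColumnSort, fromStored(),
// fromColumn() and newestFirst() are hypothetical names.
final class ColumnSort {

    // Row-style: the value lives inside a stored field as text and must be
    // decoded and parsed for every document that is compared.
    static long fromStored(String[] storedTimestamps, int doc) {
        return Long.parseLong(storedTimestamps[doc]);
    }

    // Column-style: the value is already a primitive long at position doc.
    static long fromColumn(long[] timestampColumn, int doc) {
        return timestampColumn[doc];
    }

    // Returns all doc IDs ordered newest first, reading only the column.
    static Integer[] newestFirst(long[] timestampColumn) {
        Integer[] docs = new Integer[timestampColumn.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs,
            (a, b) -> Long.compare(fromColumn(timestampColumn, b), fromColumn(timestampColumn, a)));
        return docs;
    }

    public static void main(String[] args) {
        long[] column = {300L, 100L, 200L};
        String[] stored = {"300", "100", "200"};
        System.out.println(Arrays.toString(newestFirst(column)));    // [0, 2, 1]
        System.out.println(fromStored(stored, 2) == fromColumn(column, 2)); // true
    }
}
```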

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Sascha Janz [mailto:Sascha.Janz@gmx.net]
> Sent: Wednesday, August 06, 2014 10:27 AM
> To: java-user@lucene.apache.org
> Subject: Aw: Re: Performance StringCoding.decode
>
> I used JMC (Java Mission Control) from JDK 7u40+.
>
>
> See here:
>
> http://www.oracle.com/technetwork/java/javase/2col/jmc-relnotes-2004763.html
>
>
>
> Sent: Tuesday, 05 August 2014 at 17:41
> From: "dizh@neusoft.com" <dizh@neusoft.com>
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Subject: Re: Performance StringCoding.decode how to monitor? use jprofile?
>
>
>
>
>
> From: Sascha Janz
> Date: 2014-08-05 22:36
> To: java-user@lucene.apache.org
> Subject: Performance StringCoding.decode
>
> Hi,
>
> I want to speed up our search performance, so I ran tests and monitored them
> with Java Mission Control.
>
> The analysis showed that one hotspot is:
>
> sun.nio.cs.UTF_8$Decoder.decode(byte[], int, int, char[])
> - java.lang.StringCoding.decode(Charset, byte[], int, int)
> - java.lang.String.<init>(byte[], int, int, Charset)
> - org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(DataInput, StoredFieldVisitor, FieldInfo, int)
> - org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(int, StoredFieldVisitor)
> - org.apache.lucene.index.SegmentReader.document(int, StoredFieldVisitor)
> - org.apache.lucene.index.IndexReader.document(int, Set)
>
> We use JDK 1.7.0_55 and Lucene 4.9.0.
>
> Is there a way to speed this up, or could we make some changes in the Lucene
> IndexWriterConfig, e.g. use another codec?
>
> We use the default values of IndexWriterConfig.
>
>
> Regards,
> Sascha
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

