lucene-solr-user mailing list archives

From "Erwin Gunadi" <festiva.s...@gmail.com>
Subject RE: Performance problem on Solr query on stemmed values
Date Wed, 26 Feb 2014 08:29:04 GMT
Hi Erick,

thank you for the reply.
Yes, I'm using the fast vector highlighter (Solr 4.3). Every request should
only deliver 10 results.

Here is my schema configuration for both fields:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.WordDelimiterFilterFactory"
			catenateWords="1" catenateNumbers="1" catenateAll="1"
			preserveOriginal="1" />
		<filter class="solr.ASCIIFoldingFilterFactory" />
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.ASCIIFoldingFilterFactory" />
	</analyzer>
	<analyzer type="multiterm">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		<filter class="solr.ASCIIFoldingFilterFactory" />
	</analyzer>
</fieldType>
<fieldType name="textSpell" class="solr.TextField"
	positionIncrementGap="100" omitNorms="true">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.SnowballPorterFilterFactory" language="German2" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.StopFilterFactory" />
		<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
		<filter class="solr.ShingleFilterFactory" />
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.StandardTokenizerFactory" />
		<filter class="solr.SnowballPorterFilterFactory" language="German2" />
		<filter class="solr.LowerCaseFilterFactory" />
		<filter class="solr.StandardFilterFactory" />
		<filter class="solr.RemoveDuplicatesTokenFilterFactory" />
	</analyzer>
	<analyzer type="multiterm">
		<tokenizer class="solr.WhitespaceTokenizerFactory" />
		<filter class="solr.ASCIIFoldingFilterFactory" />
	</analyzer>
</fieldType> 
<field name="spell" type="textSpell" indexed="true" multiValued="true" />
<field name="content" type="text" stored="true" indexed="true"
	multiValued="true" termVectors="true" termPositions="true"
	termOffsets="true" />

The content field contains around 5000 to 6000 words on average (a rough
estimate only).

Best regards
Erwin




-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Tuesday, February 25, 2014 3:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance problem on Solr query on stemmed values

Right, highlighting may have to re-analyze the input in order to return the
highlighted data. This will be significantly slower than the search,
especially if you're returning a large number of rows.

You can get better performance in highlighting by using
FastVectorHighlighter. See:

https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter

1000x is unusual, though, unless your fields are very large or you're
returning a lot of documents.
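
As a rough illustration of the setup under discussion, highlighting with the
FastVectorHighlighter could be enabled through request handler defaults in
solrconfig.xml. This is only a sketch: the thread does not show the poster's
solrconfig.xml, so the handler name /select and the choice to set these as
defaults (rather than passing them on each request) are assumptions. The
hl.useFastVectorHighlighter parameter applies to Solr 4.x and needs
termVectors, termPositions and termOffsets on the highlighted field, which
the content field in the schema above declares.

```xml
<!-- Sketch only: handler name and use of defaults are assumptions,
     not taken from the poster's actual solrconfig.xml. -->
<requestHandler name="/select" class="solr.SearchHandler">
	<lst name="defaults">
		<!-- Return 10 results per request, as described in the thread -->
		<int name="rows">10</int>
		<!-- Turn highlighting on and restrict it to the content field -->
		<str name="hl">true</str>
		<str name="hl.fl">content</str>
		<!-- Use the FastVectorHighlighter (Solr 4.x parameter name) -->
		<str name="hl.useFastVectorHighlighter">true</str>
	</lst>
</requestHandler>
```

The same parameters can also be passed per request, for example
hl=true&hl.fl=content&hl.useFastVectorHighlighter=true.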

Best,
Erick


On Tue, Feb 25, 2014 at 5:23 AM, Erwin Gunadi <festiva.sing@gmail.com> wrote:

> Hi,
>
> I would like to know whether anyone has experienced this kind of
> phenomenon.
>
> We are having a performance problem with queries on stemmed values.
>
> I've documented the symptoms I'm currently facing:
>
> | Search on field content | Search on field spell | Highlighting (on content field) | Processing speed |
> |-------------------------|-----------------------|---------------------------------|------------------|
> | active                  | active                | active                          | Slow             |
> | active                  | not active            | active                          | Fast             |
> | active                  | active                | not active                      | Fast             |
> | not active              | active                | active                          | Slow             |
> | not active              | active                | not active                      | Fast             |
>
> *Fast means 1000x faster than "slow".
>
> The content field is our index field, which holds the original text, and
> spell is the field with the stemmed values.
>
> According to my measurements, searching on both fields (stemmed and not
> stemmed) is really fast.
>
> But when I add highlighting to the query, it takes too long to process.
>
> Best Regards
>
> Erwin
>
>

