lucene-solr-user mailing list archives

From Sven Maurmann <sven.maurm...@kippdata.de>
Subject Re: Problem with text field in Solr
Date Fri, 15 Jan 2010 12:56:00 GMT
Hi,

At first glance at your configuration, it appears that you have run into the
following:

You use a wildcard query against a stemmed term (aviation becomes aviat
in the index). If a wildcard query has a trailing asterisk as its only
wildcard, it is rewritten as a prefix query, and the prefix is not (!)
stemmed.

Therefore everything seems to be fine for your first two examples (as avia
and aviat are both prefixes of the stemmed form of aviation), but the
remaining three queries try to match the prefixes aviati, aviatio and
aviation against the stem aviat of aviation - and fail.
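
To illustrate the comparison, here is a minimal Python sketch. It is not Solr
or Lucene code: the indexed stem aviat is hard-coded from the example above,
and the helper name prefix_matches is hypothetical. It only mimics how an
unstemmed prefix is checked against the stemmed term in the index.

```python
# Illustration only: this mimics the comparison, it is not Solr or Lucene code.
# Assumption: the stemmer indexed "aviation" as "aviat", as described above.
INDEXED_TERM = "aviat"


def prefix_matches(query: str, indexed_term: str = INDEXED_TERM) -> bool:
    """Return True if a trailing-asterisk query matches the indexed term.

    The prefix is taken literally (no stemming), as in a prefix query.
    """
    prefix = query.rstrip("*")
    return indexed_term.startswith(prefix)


# The five queries from the original question.
QUERIES = ["avia*", "aviat*", "aviati*", "aviatio*", "aviation*"]
RESULTS = {q: prefix_matches(q) for q in QUERIES}

if __name__ == "__main__":
    for q, hit in RESULTS.items():
        print(f"BODY:{q:<10} -> {'match' if hit else 'no match'}")
```

Only avia* and aviat* are prefixes of aviat; the three longer prefixes are
longer than the indexed term itself and so cannot match it.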

You may want to consult either the Lucene documentation (on the QueryParser,
for example) or the appropriate chapters in the excellent book Lucene in
Action (LIA) by Hatcher and Gospodnetic.

Hope that helps.

Sven



--On Friday, January 15, 2010 04:15:40 PM +0530 deepak agrawal <dk.agwl@gmail.com> wrote:

> Hi,
>
> I am using Solr with a BODY field of type text, and I have a problem
> when searching for documents whose BODY contains a word like aviation:
>
> when I search for BODY:avia* (aviation is returned)
> when I search for BODY:aviat* (aviation is returned)
> when I search for BODY:aviati* (aviation is not returned)
> when I search for BODY:aviatio* (aviation is not returned)
> when I search for BODY:aviation* (aviation is not returned)
>
> Please help me: how can I search for these kinds of words with
> aviati*, aviatio* and aviation*?
>
> Below are the details of how BODY is defined with the text type.
>
> <field name="BODY" type="text" indexed="true" stored="true"
>        multiValued="true" termVectors="true"/>
>
>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory"
>                 synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <!-- Case insensitive stop word removal.
>              enablePositionIncrements=true ensures that a 'gap' is left to
>              allow for accurate phrase queries.
>         -->
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory"
>                 protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory"
>                 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>                 words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="0"
>                 catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory"
>                 protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> --
> DEEPAK AGRAWAL
> +91-9379433455
> GOOD LUCK.....



-- 
kippdata informationstechnologie GmbH
Sven Maurmann               Tel: 0228 98549 -12
Bornheimer Str. 33a         Fax: 0228 98549 -50
D-53111 Bonn                sven.maurmann@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann
