lucene-solr-user mailing list archives

From openvictor Open <openvic...@gmail.com>
Subject Re: Terms and termscomponent questions
Date Tue, 01 Feb 2011 15:07:54 GMT
Dear Erick,

Thank you for your answer. Here is my fieldType definition. I took the
standard one because I don't need anything more specific for this field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

And here is my field:

<field name="p_field" type="text" indexed="true" stored="true"/>

But now I have a doubt: do I really put a space between the words, or is it
just a comma? If I only put a comma, would that affect the whole analysis
process? What I don't really understand is that I find the separate words,
but also their concatenation (again, in one direction only). Let me
explain: if I have "man" "bear" "pig", I will find
"manbearpig" and "bearpig", but never "pigman" or any other combination in a
different order.
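
To make the comma-versus-space question concrete, here is a rough Python sketch of what a whitespace tokenizer followed by a word-delimiter step with word catenation enabled might do. It is a simplified model and not Solr's actual implementation: the function names are made up for illustration, and it ignores stop words, stemming, case-change splitting and letter/number splitting. It only covers the full in-order join and does not try to reproduce partial joins such as "bearpig".

```python
import re

def whitespace_tokenize(text):
    # Split on whitespace only, so "man,bear,pig" stays a single token.
    return text.split()

def word_delimiter(token, catenate_words=True):
    # Hypothetical, simplified stand-in for a word-delimiter filter:
    # split the token on runs of non-alphanumeric characters.
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    terms = list(parts)
    # With catenation on, also emit the parts joined in their original order.
    if catenate_words and len(parts) > 1:
        terms.append("".join(parts))
    return terms

def analyze(text, catenate_words=True):
    terms = []
    for token in whitespace_tokenize(text):
        terms.extend(t.lower() for t in word_delimiter(token, catenate_words))
    return terms

print(analyze("man,bear,pig"))    # comma only: one whitespace token
print(analyze("man, bear, pig"))  # comma and space: three whitespace tokens
print(analyze("man,bear,pig", catenate_words=False))
```

In this model, the comma-only input yields the three words plus "manbearpig", while the comma-and-space input yields only the three words. Because the join follows the original order of the parts inside one token, a term like "pigman" can never appear.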

Thank you very much
Best Regards,
Victor

2011/2/1 Erick Erickson <erickerickson@gmail.com>

> Nope, this isn't what I'd expect. There are a couple of possibilities:
> 1> check out what WordDelimiterFilterFactory is doing, although
>     if you're really sending spaces that's probably not it.
> 2> Let's see the <field> and <fieldType> definitions for the field
>     in question. type="text" doesn't say anything about analysis,
>     and that's where I'd expect you're having trouble. In particular
>     if your analysis chain uses KeywordTokenizerFactory for instance.
> 3> Look at the admin/schema browse page, look at your field and
>     see what the actual tokens are. That'll tell you what TermsComponents
>     is returning, perhaps the concatenation is happening somewhere
>     else.
>
> Bottom line: Solr will not concatenate terms like this unless you tell it to,
> so I suspect you're telling it to, you just don't realize it <G>...
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open <openvictor@gmail.com> wrote:
>
> > Dear Solr users,
> >
> > I am currently using Solr and TermsComponent to build an auto-suggest for
> > my website.
> >
> > I have a field called p_field, indexed and stored with type="text" in the
> > schema XML. Nothing out of the usual.
> > I feed Solr a set of words separated by a comma and a space, such as (for
> > two documents):
> >
> > Document 1:
> > word11, word12, word13. word14
> >
> > Document 2:
> > word21, word22, word23. word24
> >
> >
> > When I query my newly designed field with the prefix "word1", I get:
> > word11, word12, word13. word14 word11word12 word11word13 etc...
> > Is it normal to get concatenations of the words, and not only the indexed
> > words? Did I miss something about Terms?
> >
> > Thank you very much,
> > Best regards all,
> > Victor
> >
>
