lucene-solr-dev mailing list archives

From Nathan Folkman <nathan.folk...@gmail.com>
Subject Tokenizer Question
Date Wed, 04 Feb 2009 22:41:37 GMT
I'm having trouble getting the following queries to work as I'd expect:

tag_calais:"company" -> should match: company:IBM Business Partners
tag_calais:"products" -> should match: industryterm:business products, industryterm:Industrial products, industryterm:Consumer products
domain:"com.*"
domain:"com.ibm*"

I suspect it has something to do with how the indexed data is being tokenized.

schema.xml:

<types>
     <fieldType name="calais" class="solr.StrField">
         <analyzer>
             <tokenizer class="solr.PatternTokenizerFactory" pattern=": *" group="-1" />
         </analyzer>
     </fieldType>
     <fieldType name="domain" class="solr.StrField">
         <analyzer>
             <tokenizer class="solr.PatternTokenizerFactory" pattern=". *" group="-1" />
         </analyzer>
     </fieldType>
     ...
</types>
<fields>
     <field name="domain" type="domain" indexed="true" stored="true" required="true" />
     <field name="tag_calais" type="calais" indexed="true" stored="true" multiValued="true" />
     ...
</fields>

Example document:

<?xml version="1.0" ?>
<add>
   <doc>
     <field name="domain">
       com.ibm
     </field>
     <field name="tag_calais">
       industryterm:business products
     </field>
     <field name="tag_calais">
       industryterm:Industrial products
     </field>
     <field name="tag_calais">
       industryterm:Consumer products
     </field>
     <field name="tag_calais">
       country:United States
     </field>
     <field name="tag_calais">
       company:IBM Business Partners
     </field>
     ...
   </doc>
</add>
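
One way to test the tokenization guess outside Solr is to apply the two patterns from schema.xml to the sample values with an ordinary regex split. The sketch below is Python rather than Solr itself, and it rests on assumptions: that group="-1" makes PatternTokenizerFactory split on the pattern, that empty pieces are dropped, and that the whitespace around each field value is trimmed first. The helper name split_tokens is hypothetical.

```python
import re

def split_tokens(pattern, value):
    """Split value on a regex, roughly mimicking a split-style tokenizer.

    Assumptions: empty pieces are discarded, and surrounding whitespace in
    the field value is stripped first (Solr itself may not strip it).
    """
    return [piece for piece in re.split(pattern, value.strip()) if piece]

if __name__ == "__main__":
    # ": *" from the calais field type
    print(split_tokens(r": *", "company:IBM Business Partners"))
    # ". *" from the domain field type: an unescaped "." matches any character
    print(split_tokens(r". *", "com.ibm"))
    # Escaped variant that splits on a literal dot
    print(split_tokens(r"\. *", "com.ibm"))
```

Under these assumptions, ": *" separates the tag type from its value, while the unescaped "." in ". *" consumes every character and leaves no tokens. Escaping it as "\. *" splits on the literal dot instead.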

Any suggestions? Thanks!

- n
