lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Query on Synonyms feature in Solr
Date Mon, 13 Jun 2011 12:44:32 GMT
I think the point is that you need to expand synonyms at
index time but not at query time. The field type definitions you
provided each use a single <analyzer>, so the expansion happens at
both index and query time.
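
A minimal sketch of what index-time-only expansion could look like, assuming
the same Taxonomy.txt and protwords.txt files and the same filter chain as in
the original field type. The factory class name solr.SynonymFilterFactory is
an assumption on my part (the original definitions say solr.SynonymFilter),
and the choice of the whitespace tokenizer is only for illustration:

```xml
<fieldType name="Synonym_document" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: expand synonyms so that every variant is written to the index -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="Taxonomy.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <!-- Query time: the same chain without the synonym filter -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

Documents have to be reindexed after a change like this, since the expansion
only takes effect when the field is analyzed at index time.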

Or have you tried that already?

Best
Erick

On Mon, Jun 13, 2011 at 7:46 AM, rajini maski <rajinimaski@gmail.com> wrote:
> Karsten,
>
> I have tried both of the cases you mention below.
>
> "WhitespaceTokenizerFactory" generates two tokens, "private" and
> "schools", so I don't get the results I need. It first splits
> "private schools" into "private" and "schools" and only then tries to
> match them in the synonym filter. The match fails because my synonym
> flat file has a list like this:
> Private schools,NGO Schools,Unaided schools
>
> So after the split, the synonym filter looks for "private" and not for
> "Private schools", and the match fails.
>
>
> In the case of KeywordTokenizerFactory, the entire content of the field
> is taken as one keyword.
> E.g.: document_data = "Tamil Nadu state private school fee determination
> committee headed by Justice Raviraja has submitted the private schools fees
> structure to the district educational officers on Monday"
>
> is treated as one keyword. But note that "private school" is only a
> part of the sentence in that field, so this will not match our
> search either :(
>
> Any other suggestions to fix this?
>
> Regards,
> Rajani Maski
>
>
>
> On Mon, Jun 13, 2011 at 4:54 PM, <karsten-solr@gmx.de> wrote:
>
>> Hi rajini,
>>
>> Multi-word synonyms like "private schools" usually cause problems.
>>
>> See e.g. Solr-1-4-Enterprise-Search-Server Page 56:
>> "For multi-word synonyms to work, the analysis must be applied at
>> index-time and with expansion so that both the original words and the
>> combined word get indexed. ..."
>>
>> Your problem:
>> The input of Synonym Filter must be the exact !Token! "Private schools".
>>
>> So "WhitespaceTokenizerFactory" generates two tokens: "private" "schools"
>> and for "KeywordTokenizerFactory" the whole text is one token.
>>
>> Best regards
>>  Karsten
>>
>>
>>
>> -------- Original Message --------
>> > Date: Mon, 13 Jun 2011 16:07:35 +0530
>> > From: rajini maski <rajinimaski@gmail.com>
>> > To: solr-user@lucene.apache.org
>> > Subject: Query on Synonyms feature in Solr
>>
>> > I want to enable the synonyms feature on documents in Solr.
>> >
>> >
>> > I have one field in Solr that holds the content of a document (say,
>> > field name: document_data).
>> >
>> > The data in that field is:
>> >
>> > "Tamil Nadu state private school fee determination committee headed by
>> > Justice Raviraja has submitted the private schools fees structure to the
>> > district educational officers on Monday"
>> >
>> > The synonyms for "private school" in the synonym flat file are:
>> >
>> > Private schools,NGO Schools,Unaided schools
>> >
>> >
>> > Now when I search on this field with document_data=unaided schools, I
>> > need to get results.
>> >
>> > Which tokenizer and analyzer filters can I apply to the
>> > "document_data" field in order to get the results above?
>> >
>> >
>> >
>> >
>> > This is the indexed document :
>> > <add>
>> > <doc>
>> > <field name="ID">SOLR200</field>
>> > <field name="document_data">Tamil Nadu state private school fee
>> > determination committee headed by Justice Raviraja has submitted the
>> > private
>> > schools fees structure to the district educational officers on
>> > Monday</field>
>> > </doc>
>> > </add>
>> >
>> >
>> > So far I have tried these two field types, and with neither could I
>> > get the above results:
>> >
>> > <fieldType name="Synonym_document" class="solr.TextField"
>> >     positionIncrementGap="100">
>> >   <analyzer>
>> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
>> >     <filter class="solr.SynonymFilterFactory" synonyms="Taxonomy.txt"
>> >         ignoreCase="true" expand="true"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
>> >         protected="protwords.txt"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> >
>> > <fieldType name="Synonym_document" class="solr.TextField"
>> >     positionIncrementGap="100">
>> >   <analyzer>
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.SynonymFilterFactory" synonyms="Taxonomy.txt"
>> >         ignoreCase="true" expand="true"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
>> >         protected="protwords.txt"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> >
>> >  <field name="document_data" type="Synonym_document" indexed="true"
>> > multiValued="true"/>
>> >
>> > Neither worked for my query.
>> > Could anyone please tell me which tokenizer and analyzer filters I can
>> > apply to the "document_data" field in order to get the results above?
>> >
>> >
>> > Regards,
>> > Rajani
>>
>
