Hi,

Please have a look at http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ and at a working Solr plugin that deboosts the expanded synonyms. The plugin code currently lacks the ability to configure a different dictionary for each field, but that could be added. Also see SOLR-4381 for eventual inclusion in Solr.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5 March 2013 at 17:26, Tom Burton-West <tburtonw@umich.edu> wrote:

Thanks Erick,

Payloads might work, but I'm looking at a more general problem.

Here is another use case:

We have a mix of Traditional and Simplified Chinese documents indexed in the same OCR field. When a user searches using Traditional Chinese, I would like to also search in Simplified Chinese, but rank the results matching Traditional Chinese higher. Similarly, if a user enters a query in Simplified Chinese, I want to also search in Traditional Chinese but rank matches of the Simplified Chinese query terms higher.

Since it is not always possible to determine whether a short query is in Simplified or Traditional Chinese, here is what I would like to do:

1) Convert the query to Traditional Chinese
2) Convert the query to Simplified Chinese
(One of these two steps would not be necessary if I could reliably determine the nature of the query)

q1=QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1

Again, this could be done with copy fields, but that would increase my index size too much. What I really want to be able to do is to query the same index (i.e. the documents as created) with the user's query processed/analyzed in 3 different ways.

I could do this myself in the app layer, but I would really like to be able to use Solr.
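
For illustration, the app-layer variant of the three-way boosted query above could be sketched roughly as follows. This is only a sketch under assumptions: the field name "ocr", the function names, and the two tiny character maps are hypothetical stand-ins (a real application would plug in a proper Traditional/Simplified conversion library), and quoting each variant as a phrase is a simplification.

```python
from typing import Callable

def quote_phrase(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and quotes for the query parser."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_boosted_query(
    field: str,
    query: str,
    to_traditional: Callable[[str], str],
    to_simplified: Callable[[str], str],
    entered_boost: int = 10,
    variant_boost: int = 1,
) -> str:
    """Build: QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1."""
    clauses = [f"{field}:{quote_phrase(query)}^{entered_boost}"]
    for convert in (to_traditional, to_simplified):
        clauses.append(f"{field}:{quote_phrase(convert(query))}^{variant_boost}")
    return " OR ".join(clauses)

# Hypothetical stand-in converters covering only two characters, for demonstration.
_TO_TRAD = str.maketrans({"汉": "漢", "语": "語"})
_TO_SIMP = str.maketrans({"漢": "汉", "語": "语"})

def to_trad(text: str) -> str:
    return text.translate(_TO_TRAD)

def to_simp(text: str) -> str:
    return text.translate(_TO_SIMP)

example = build_boosted_query("ocr", "汉语", to_trad, to_simp)
print(example)
```

The query as entered gets the high boost and both converted forms get the low boost, so no guess about the script of the input is needed. The drawback is the one already noted: the conversion logic lives outside Solr.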


Tom



On Mon, Mar 4, 2013 at 8:19 PM, Erick Erickson <erickerickson@gmail.com> wrote:
Tom:

I wonder if you could do something with payloads here. Index all terms with payloads of 10, but synonyms with 1?

Random thought off the top of my head.

Erick


<fieldType name="plain" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- copyField works on field names; this assumes fields named "plain" and "syn" that use the types above -->
<copyField source="plain" dest="syn"/>

On Mon, Mar 4, 2013 at 4:43 PM, Jack Krupansky <jack@basetechnology.com> wrote:
Please clarify, and try providing a couple more use cases. I mean, the case you provided suggests that the contents of the index will be different between the two fields, while you told us that you wanted to share the same indexed field. In other words, it sounds like you will have two copies of similar data anyway.
 
Maybe you simply want one copy of the stored value for the field and then have one or more copyfields that index the same source data differently, but don’t re-store the copied source data.

-- Jack Krupansky
 
Sent: Monday, March 04, 2013 3:57 PM
Subject: Ability to specify 2 different query analyzers for same indexed field in Solr
 
Hello,
 
We would like to be able to specify two different fields that both use the same indexed field but use different analyzers.   An example use-case for this might be doing query-time synonym expansion with the synonyms weighted lower than an exact match.  
 
q=exact_field^10 OR synonyms^1
 
The normal way to do this in Solr, which is just to set up separate analyzer chains and use a copyfield, will not work for us because the field in question is huge.  It is about 7 TB of OCR.
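
For reference, the conventional two-field setup described above is typically queried by weighting the two fields against each other. A minimal sketch, assuming the edismax query parser and the field names exact_field and synonyms from the example query (the function name and the sample query text are hypothetical):

```python
from urllib.parse import parse_qs, urlencode

def two_field_params(user_query: str, exact_boost: int = 10, synonym_boost: int = 1) -> dict:
    """Request parameters that weight exact matches above synonym matches."""
    return {
        "q": user_query,
        "defType": "edismax",
        # qf lists the fields to search, each with its boost.
        "qf": f"exact_field^{exact_boost} synonyms^{synonym_boost}",
    }

query_string = urlencode(two_field_params("heart attack"))
# Decode again to show the qf value that Solr would receive.
print(parse_qs(query_string)["qf"])
```

This only works when both fields exist in the index, which is exactly the duplication that is too expensive here.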
 
Is there currently a way to do this in Solr? If not:
 
1) should I open a JIRA issue?
2) can someone point me towards the part of the code I might need to modify?
 
Tom
 
Tom Burton-West
Information Retrieval Programmer
Digital Library Production Service
University of Michigan Library