lucene-dev mailing list archives

From Tom Burton-West <tburt...@umich.edu>
Subject Re: Ability to specify 2 different query analyzers for same indexed field in Solr
Date Tue, 05 Mar 2013 16:26:28 GMT
Thanks Erick,

Payloads might work, but I'm looking at a more general problem.

Here is another use case:

We have a mix of Traditional and Simplified Chinese documents indexed in
the same OCR field.
When a user searches using Traditional Chinese, I would like to also
search in Simplified Chinese, but rank the results matching Traditional
Chinese higher. Similarly, if a user enters a query in Simplified
Chinese, I want to also search in Traditional Chinese but rank matches of
the Simplified Chinese query terms higher.

Since it is not always possible to determine whether a short query is in
Simplified or Traditional Chinese, here is what I would like to do:

1) Convert the query to Traditional Chinese
2) Convert the query to Simplified Chinese
(One of these two steps would not be necessary if I could reliably
determine the nature of the query)

q1=QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1

Again, this could be done with copy fields, but that would increase my
index size too much. What I really want to be able to do is to query the
same index (i.e., the document as created) with the user's query
processed/analyzed in 3 different ways.

I could do this myself in the app layer, but I would really like to be able
to use Solr.
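
For reference, the app-layer version could be sketched roughly as below. This is only an illustration under stated assumptions: the field name `ocr`, the helper names `convert` and `build_query`, and the three-character conversion table are hypothetical, and a real system would need a complete Traditional/Simplified conversion resource (the mapping is not strictly one-to-one). The sketch also quotes each variant as a phrase, which is a choice made here and not something decided above.

```python
# Hypothetical, tiny Traditional -> Simplified table, for illustration only.
# A real deployment would need a complete conversion resource.
T2S = {"漢": "汉", "語": "语", "書": "书"}
S2T = {simp: trad for trad, simp in T2S.items()}


def convert(text, table):
    """Map each character through the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def escape_phrase(text):
    """Escape backslashes and double quotes so the text is safe inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(user_query, field="ocr", entered_boost=10, variant_boost=1):
    """Build one boosted OR query against a single field from up to three variants.

    The query as entered gets the high boost; the Traditional and Simplified
    conversions get the low boost. A conversion identical to an earlier
    variant is dropped so no clause is repeated.
    """
    variants = [(user_query, entered_boost)]
    seen = {user_query}
    for candidate in (convert(user_query, S2T), convert(user_query, T2S)):
        if candidate not in seen:
            seen.add(candidate)
            variants.append((candidate, variant_boost))
    return " OR ".join(
        f'{field}:"{escape_phrase(text)}"^{boost}' for text, boost in variants
    )


if __name__ == "__main__":
    print(build_query("漢語"))
    print(build_query("汉语"))
```

The resulting string would be sent as the `q` parameter against the one existing field, so nothing is added to the index. What it cannot do is run the three variants through different Solr analysis chains, which is the part I would like Solr to handle.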


Tom



On Mon, Mar 4, 2013 at 8:19 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> Tom:
>
> I wonder if you could do something with payloads here. Index all terms
> with payloads of 10, but synonyms with 1?
>
> Random thought off the top of my head.
>
> Erick
>
>
>> <fieldType name="plain">
>>   <analyzer type="index">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <fieldType name="syn">
>>   <analyzer type="index">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>> <copyField source="plain" dest="syn"/>
>>
>> On Mon, Mar 4, 2013 at 4:43 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>>
>>>   Please clarify, and try providing a couple more use cases. I mean,
>>> the case you provided suggests that the contents of the index will be
>>> different between the two fields, while you told us that you wanted to
>>> share the same indexed field. In other words, it sounds like you will have
>>> two copies of similar data anyway.
>>>
>>> Maybe you simply want one copy of the stored value for the field and
>>> then have one or more copyfields that index the same source data
>>> differently, but don’t re-store the copied source data.
>>>
>>> -- Jack Krupansky
>>>
>>>  *From:* Tom Burton-West <tburtonw@umich.edu>
>>> *Sent:* Monday, March 04, 2013 3:57 PM
>>> *To:* dev@lucene.apache.org
>>> *Subject:* Ability to specify 2 different query analyzers for same
>>> indexed field in Solr
>>>
>>> Hello,
>>>
>>> We would like to be able to specify two different fields that both use
>>> the same indexed field but use different analyzers.   An example use-case
>>> for this might be doing query-time synonym expansion with the synonyms
>>> weighted lower than an exact match.
>>>
>>> q=exact_field^10 OR synonyms^1
>>>
>>> The normal way to do this in Solr, which is just to set up separate
>>> analyzer chains and use a copyfield, will not work for us because the field
>>> in question is huge.  It is about 7 TB of OCR.
>>>
>>> Is there a way to do this currently in Solr? If not,
>>>
>>> 1) should I open a JIRA issue?
>>> 2) can someone point me towards the part of the code I might need to
>>> modify?
>>>
>>> Tom
>>>
>>>  Tom Burton-West
>>> Information Retrieval Programmer
>>> Digital Library Production Service
>>> University of Michigan Library
>>> http://www.hathitrust.org/blogs/large-scale-search
>>>
>>>
>>>
>>
>>
>
