lucene-dev mailing list archives

From Tom Burton-West <tburt...@umich.edu>
Subject Re: Ability to specify 2 different query analyzers for same indexed field in Solr
Date Thu, 07 Mar 2013 19:03:20 GMT
Thanks Jan,

The blog post is very good; I hadn't realized how many pitfalls there are
with synonyms.

I would still like the ability to specify two different query analysis
chains for one index, rather than having to write a custom parser for each
use case. For example, the Traditional/Simplified Chinese use case in my
previous message could probably be solved with a custom query parser along
the lines of the synonym solution in the blog post. But if there were a way
to specify two different query analysis chains for the same indexed field,
I would not have to write a custom query parser.
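
To make the alternative concrete, here is a minimal sketch of the app-layer
workaround for the Chinese case, assuming the conversion happens before the
query reaches Solr. Everything in it is hypothetical: the field name "ocr",
the function names, and especially the three-character mapping table, which
is only a stand-in for a real Traditional/Simplified converter. It also
quotes each variant as a phrase, which is my assumption and not part of the
original formula.

```python
# Hypothetical app-layer sketch: build one boosted query string from the
# query as entered plus its Traditional and Simplified renderings.
# The mapping table is illustrative only, not a real conversion table.
T2S = {"圖": "图", "書": "书", "館": "馆"}  # Traditional -> Simplified
S2T = {s: t for t, s in T2S.items()}        # Simplified -> Traditional


def convert(text, table):
    """Map each character through the table, leaving unknown characters as-is."""
    return "".join(table.get(ch, ch) for ch in text)


def quote(term):
    """Wrap the term as a phrase, escaping backslashes and double quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(field, user_query, entered_boost=10, variant_boost=1):
    """Return QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1.

    A variant identical to a clause already emitted is skipped, so a query
    that is already Traditional produces two clauses instead of three.
    """
    clauses = [f"{field}:{quote(user_query)}^{entered_boost}"]
    seen = {user_query}
    for variant in (convert(user_query, S2T), convert(user_query, T2S)):
        if variant not in seen:
            seen.add(variant)
            clauses.append(f"{field}:{quote(variant)}^{variant_boost}")
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(build_query("ocr", "圖書館"))
```

This works against a single indexed field, but every client has to repeat
the conversion and boosting logic, which is exactly what I would like to
avoid by configuring it once in Solr.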

Tom



On Tue, Mar 5, 2013 at 5:39 PM, Jan Høydahl <jan.asf@cominvent.com> wrote:

> Hi,
>
> Please have a look at
> http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ and the
> accompanying working Solr plugin for deboosting the expanded synonyms. The
> plugin code currently lacks the ability to configure different dictionaries
> for each field, but that could be added. Also see SOLR-4381 for eventual
> inclusion in Solr.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 5 March 2013 at 17:26, Tom Burton-West <tburtonw@umich.edu> wrote:
>
> Thanks Erick,
>
> Payloads might work, but I'm looking at a more general problem.
>
> Here is another use case:
>
> We have a mix of Traditional and Simplified Chinese documents indexed in
> the same OCR field.
>  When a user searches using Traditional Chinese, I would like to also
> search in Simplified Chinese, but rank the results matching Traditional
> Chinese higher.   Similarly, if a user enters a query in Simplified
> Chinese, I want to also search in Traditional Chinese but rank matches of
> the Simplified Chinese query terms higher.
>
> Since it is not always possible to determine whether a short query is in
> Simplified or Traditional Chinese, here is what I would like to do:
>
> 1) Convert the query to Traditional Chinese
> 2) Convert the query to Simplified Chinese
> (One of these two steps would not be necessary if I could reliably
> determine the nature of the query)
>
> q1=QueryAsEntered^10 OR QueryTraditional^1 OR QuerySimplified^1
>
> Again, this could be done with copy fields, but that would increase my
> index size too much. What I really want is to query the same index (i.e.
> the documents as created) with the user's query processed/analyzed in
> three different ways.
>
> I could do this myself in the app layer, but I would really like to be
> able to use Solr.
>
>
> Tom
>
>
>
> On Mon, Mar 4, 2013 at 8:19 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> Tom:
>>
>> I wonder if you could do something with payloads here. Index all terms
>> with payloads of 10, but synonyms with 1?
>>
>> Random thought off the top of my head.
>>
>> Erick
>>
>>
>>> <fieldType name="plain" class="solr.TextField">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> <fieldType name="syn" class="solr.TextField">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>             ignoreCase="true" expand="true"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>> <copyField source="plain" dest="syn"/>
>>>
>>> On Mon, Mar 4, 2013 at 4:43 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>>>
>>>>   Please clarify, and try providing a couple more use cases. I mean,
>>>> the case you provided suggests that the contents of the index will be
>>>> different between the two fields, while you told us that you wanted to
>>>> share the same indexed field. In other words, it sounds like you will have
>>>> two copies of similar data anyway.
>>>>
>>>> Maybe you simply want one copy of the stored value for the field and
>>>> then have one or more copyfields that index the same source data
>>>> differently, but don’t re-store the copied source data.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>>  *From:* Tom Burton-West <tburtonw@umich.edu>
>>>> *Sent:* Monday, March 04, 2013 3:57 PM
>>>> *To:* dev@lucene.apache.org
>>>> *Subject:* Ability to specify 2 different query analyzers for same
>>>> indexed field in Solr
>>>>
>>>> Hello,
>>>>
>>>> We would like to be able to specify two different fields that both use
>>>> the same indexed field but use different analyzers.   An example use-case
>>>> for this might be doing query-time synonym expansion with the synonyms
>>>> weighted lower than an exact match.
>>>>
>>>> q=exact_field^10 OR synonyms^1
>>>>
>>>> The normal way to do this in Solr, which is just to set up separate
>>>> analyzer chains and use a copyfield, will not work for us because the field
>>>> in question is huge.  It is about 7 TB of OCR.
>>>>
>>>> Is there a way to do this currently in Solr? If not:
>>>>
>>>> 1) should I open a JIRA issue?
>>>> 2) can someone point me towards the part of the code I might need to
>>>> modify?
>>>>
>>>> Tom
>>>>
>>>>  Tom Burton-West
>>>> Information Retrieval Programmer
>>>> Digital Library Production Service
>>>> University of Michigan Library
>>>> http://www.hathitrust.org/blogs/large-scale-search
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
