lucene-solr-user mailing list archives

From elisabeth benoit <elisaelisael...@gmail.com>
Subject Re: Multi-words synonyms matching
Date Wed, 25 Apr 2012 13:39:58 GMT
I'm not at the office until next Wednesday, and I don't have my Solr at
hand, but doesn't debugQuery=on give information only about how the q
parameter is matched, and nothing about the fq parameter? Or do you mean
that "parsed_filter_queries" gives information about fq?

CATEGORY_ANALYZED is being populated by a copyField instruction in
schema.xml, and has the same field type as my catchall field, the search
field for my searchHandler (the one being used by q parameter).

CATEGORY (a string) is copied into CATEGORY_ANALYZED (field type is text)

CATEGORY (a string) is also copied into the catchall field (field type is
text), and a lot of other fields are copied into that catchall field too.

So as far as I can see, the same analysis should be done in both cases, but
obviously I'm missing something, and the only thing I can think of is a
difference in behavior between the q and fq parameters.

I'll check parsed_filter_queries first thing in the morning next
Wednesday.
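
For anyone following along, here is a minimal Python sketch of that check. It only builds the request parameters and reads the "parsed_filter_queries" entry from an already decoded JSON response. The helper names and the sample response are hypothetical and written by hand for illustration; they are not output from a real Solr instance.

```python
from urllib.parse import urlencode


def build_debug_params(q, fq):
    """Return a URL-encoded query string with debug output switched on."""
    return urlencode({"q": q, "fq": fq, "debugQuery": "on", "wt": "json"})


def parsed_filter_queries(response):
    """Return the parsed form of each fq from a decoded JSON response."""
    return response.get("debug", {}).get("parsed_filter_queries", [])


# Hypothetical response fragment, hand-written for illustration only.
sample = {
    "debug": {
        "filter_queries": ['CATEGORY_ANALYZED:"hotel de ville"'],
        "parsed_filter_queries": [
            'PhraseQuery(CATEGORY_ANALYZED:"hotel de ville")'
        ],
    }
}

query_string = build_debug_params(
    '"hotel de ville"', 'CATEGORY_ANALYZED:"hotel de ville"'
)
print(query_string)
print(parsed_filter_queries(sample))
```

Comparing the parsed form of the fq with the parsed form of the q should show whether the two are really turned into the same kind of query.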

Thanks a lot for your help.

Elisabeth


2012/4/24 Erick Erickson <erickerickson@gmail.com>

> Elisabeth:
>
> What shows up in the debug section of the response when you add
> &debugQuery=on? There should be some bit of that section like:
> "parsed_filter_queries"
>
> My other question is "are you absolutely sure that your
> CATEGORY_ANALYZED field has the correct content?". How does it
> get populated?
>
> Nothing jumps out at me here....
>
> Best
> Erick
>
> On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
> <elisaelisaelisa@gmail.com> wrote:
> > yes, thanks, but this is NOT my question.
> >
> > I was wondering why I have multiple matches with q="hotel de ville" and no
> > match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm
> > searching in the same solr fieldType.
> >
> > Why is q parameter behaving differently in that case? Why do the quotes
> > work in one case and not in the other?
> >
> > Does anyone know?
> >
> > Thanks,
> > Elisabeth
> >
> > 2012/4/24 Jeevanandam <jeeva@myjeeva.com>
> >
> >>
> >> usage of q and fq
> >>
> >> q => is typically the main query for the search request
> >>
> >> fq => is Filter Query; generally used to restrict the super set of
> >> documents without influencing score (more info:
> >> http://wiki.apache.org/solr/CommonQueryParameters#q)
> >>
> >> For example:
> >> ------------
> >> q="hotel de ville" ===> returns 100 documents
> >>
> >> q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===>
> >> returns 40 documents from super set of 100 documents
> >>
> >>
> >> hope this helps!
> >>
> >> - Jeevanandam
> >>
> >>
> >>
> >> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
> >>
> >>> Hello,
> >>>
> >>> I'd like to resume this post.
> >>>
> >>> The only way I found to avoid splitting synonyms into words in
> >>> synonyms.txt is to use the line
> >>>
> >>>  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>> ignoreCase="true" expand="true"
> >>> tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >>>
> >>> in schema.xml
> >>>
> >>> where tokenizerFactory="solr.KeywordTokenizerFactory"
> >>>
> >>> instructs SynonymFilterFactory not to break synonyms into words on
> >>> white spaces when parsing the synonyms file.
> >>>
> >>> So now it works fine, "mairie" is mapped into "hotel de ville" and when I
> >>> send request q="hotel de ville" (quotes are mandatory to prevent the
> >>> analyzer from splitting hotel de ville on white spaces), I get answers
> >>> with the word "mairie".
> >>>
> >>> But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
> >>> doesn't work!!!
> >>>
> >>> CATEGORY_ANALYZED is the same field type as the default search field.
> >>> This means that when I send q="hotel de ville" and
> >>> fq=CATEGORY_ANALYZED:"hotel de ville", solr uses the same analyzer, the
> >>> one with the line
> >>>
> >>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>> ignoreCase="true" expand="true"
> >>> tokenizerFactory="solr.KeywordTokenizerFactory"/>.
> >>>
> >>> Does anyone have a clue what is different between q analysis behaviour
> >>> and fq analysis behaviour?
> >>>
> >>> Thanks a lot
> >>> Elisabeth
> >>>
> >>> 2012/4/12 elisabeth benoit <elisaelisaelisa@gmail.com>
> >>>
> >>>  oh, that's right.
> >>>>
> >>>> thanks a lot,
> >>>> Elisabeth
> >>>>
> >>>>
> >>>> 2012/4/11 Jeevanandam Madanagopal <jeeva@myjeeva.com>
> >>>>
> >>>>  Elisabeth -
> >>>>>
> >>>>> As you described, the mapping below might suit your need.
> >>>>> mairie => hotel de ville, mairie
> >>>>>
> >>>>> mairie gets expanded to "hotel de ville" and "mairie" at index time. So
> >>>>> "mairie" and "hotel de ville" are both searchable on the document.
> >>>>>
> >>>>> However, the whitespace tokenizer splitting at query time will still be
> >>>>> a problem, as described by Markus.
> >>>>>
> >>>>> --Jeevanandam
> >>>>>
> >>>>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
> >>>>>
> >>>>> > <<Have you tried the "=>" mapping instead? Something
> >>>>> > <<like
> >>>>> > <<hotel de ville => mairie
> >>>>> > <<might work for you.
> >>>>> >
> >>>>> > Yes, thanks, I've tried it but from what I understand it doesn't solve
> >>>>> > my problem, since this means hotel de ville will be replaced by mairie
> >>>>> > at index time (I use synonyms only at index time). So when a user asks
> >>>>> > for "hôtel de ville", it won't match.
> >>>>> >
> >>>>> > In fact, at index time I have mairie in my data, but I want the user to
> >>>>> > be able to request "mairie" or "hôtel de ville" and have mairie as the
> >>>>> > answer, and not have mairie as an answer when requesting "hôtel".
> >>>>> >
> >>>>> >
> >>>>> > <<To map `mairie` to `hotel de ville` as single token you must escape
> >>>>> > <<your white space.
> >>>>> >
> >>>>> > <<mairie, hotel\ de\ ville
> >>>>> >
> >>>>> > <<This results in a problem if your tokenizer splits on white space at
> >>>>> > <<query time.
> >>>>> >
> >>>>> > Ok, I guess this means I have a problem. No simple solution since at
> >>>>> > query time my tokenizer does split on white spaces.
> >>>>> >
> >>>>> > I guess my problem is more or less one of the problems discussed in
> >>>>> >
> >>>>> > http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
> >>>>> >
> >>>>> >
> >>>>> > Thanks a lot for your answers,
> >>>>> > Elisabeth
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > 2012/4/10 Erick Erickson <erickerickson@gmail.com>
> >>>>> >
> >>>>> >> Have you tried the "=>" mapping instead? Something
> >>>>> >> like
> >>>>> >> hotel de ville => mairie
> >>>>> >> might work for you.
> >>>>> >>
> >>>>> >> Best
> >>>>> >> Erick
> >>>>> >>
> >>>>> >> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
> >>>>> >> <elisaelisaelisa@gmail.com> wrote:
> >>>>> >>> Hello,
> >>>>> >>>
> >>>>> >>> I've read several posts on this issue, but can't find a real
> >>>>> >>> solution to my multi-words synonyms matching problem.
> >>>>> >>>
> >>>>> >>> I have in my synonyms.txt an entry like
> >>>>> >>>
> >>>>> >>> mairie, hotel de ville
> >>>>> >>>
> >>>>> >>> and my index time analyzer is configured as follows for synonyms.
> >>>>> >>>
> >>>>> >>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>> >>> ignoreCase="true" expand="true"/>
> >>>>> >>>
> >>>>> >>> The problem I have is that now "mairie" matches with "hotel" and I
> >>>>> >>> would only want "mairie" to match with "hotel de ville" and "mairie".
> >>>>> >>>
> >>>>> >>> When I look into the analyzer, I see that "mairie" is mapped into
> >>>>> >>> "hotel", and words "de ville" are added in second and third position.
> >>>>> >>> To change that, I tried to do
> >>>>> >>>
> >>>>> >>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>> >>> ignoreCase="true" expand="true"
> >>>>> >>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one
> >>>>> >>> post)
> >>>>> >>>
> >>>>> >>> and I can see now in the analyzer that "mairie" is mapped to "hotel
> >>>>> >>> de ville", but now when I have query "hotel de ville", it doesn't
> >>>>> >>> match at all with "mairie".
> >>>>> >>>
> >>>>> >>> Does anyone have a clue what I'm doing wrong?
> >>>>> >>>
> >>>>> >>> I'm using Solr 3.4.
> >>>>> >>>
> >>>>> >>> Thanks,
> >>>>> >>> Elisabeth
> >>>>> >>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>
>
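
The tokenizerFactory behaviour described in this thread can be illustrated with a small toy model in Python. This is only a sketch of the idea, not Solr's actual implementation: the function names are made up for illustration, and the two tokenizers are crude stand-ins for a whitespace-splitting tokenizer and solr.KeywordTokenizerFactory.

```python
def whitespace_tokenize(text):
    """Stand-in for a tokenizer that splits on white space."""
    return text.split()


def keyword_tokenize(text):
    """Stand-in for KeywordTokenizerFactory: the whole input is one token."""
    return [text]


def parse_synonym_line(line, tokenize):
    """Split a comma-separated synonyms.txt entry and tokenize each synonym."""
    return [tokenize(entry.strip()) for entry in line.split(",")]


line = "mairie, hotel de ville"

# Default parsing: "hotel de ville" becomes three separate tokens, which is
# why "mairie" ends up matching a query for "hotel" alone.
print(parse_synonym_line(line, whitespace_tokenize))
# [['mairie'], ['hotel', 'de', 'ville']]

# With the keyword tokenizer the synonym stays a single token.
print(parse_synonym_line(line, keyword_tokenize))
# [['mairie'], ['hotel de ville']]

# A query split on white space yields three tokens, which do not equal the
# single indexed token "hotel de ville".
query_tokens = whitespace_tokenize("hotel de ville")
indexed_tokens = keyword_tokenize("hotel de ville")
print(query_tokens == indexed_tokens)  # False
```

The last comparison is the mismatch Markus pointed out: a single-token synonym in the index can only be matched if the query side also produces that single token.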
