lucene-solr-user mailing list archives

From Parvesh Garg <parv...@zettata.com>
Subject Re: Compound words
Date Mon, 28 Oct 2013 20:41:34 GMT
Hi Roman, thanks for the link, I will go through it.

Erick, I will try expand=true again and check the results, and will
update this thread with the findings. I remember we rejected expand=true
because of some weird spaghetti problem, so I will check it out again.

Thanks,

Parvesh Garg
http://www.zettata.com


On Mon, Oct 28, 2013 at 9:01 PM, Roman Chyla <roman.chyla@gmail.com> wrote:

> Hi Parvesh,
> I think you should check the following JIRA issue:
> https://issues.apache.org/jira/browse/SOLR-5379. You will find links there
> to other possible solutions/problems :-)
> Roman
> On 28 Oct 2013 09:06, "Erick Erickson" <erickerickson@gmail.com> wrote:
>
> > Consider setting expand=true at index time. That
> > puts all the tokens in your index, and then you
> > may not need to have any synonym
> > processing at query time since all the variants will
> > already be in the index.
> >
> > As it is, you've replaced the words in the original with
> > synonyms, essentially collapsed them down to a single
> > word and then you have to do something at query time
> > to get matches. If all the variants are in the index, you
> > shouldn't have to. That's what I meant by "raw".
> >
> > Best,
> > Erick
> >
> >
> > On Mon, Oct 28, 2013 at 8:02 AM, Parvesh Garg <parvesh@zettata.com> wrote:
> >
> > > Hi Erick,
> > >
> > > Thanks for the suggestion. Like I said, I'm an infant.
> > >
> > > We tried synonyms both ways, sea biscuit => seabiscuit and seabiscuit =>
> > > sea biscuit, and didn't understand exactly how it worked. But I just
> > > checked the analysis tool, and it seems to work perfectly fine at index
> > > time. Now I can happily discard my own filter and 4 days of work. I'm
> > > happy I got to know a few ways on how/when not to write a Solr filter :)
> > >
> > > I tried the string "sea biscuit sea bird" with expand=false and the
> > > tokens I got were seabiscuit, sea, and bird at positions 1, 2, and 3
> > > respectively. But at query time, when I enter the same term "sea biscuit
> > > sea bird", using edismax with qf, pf2, and pf3, the parsedQuery looks
> > > like this:
> > >
> > > "+((text:sea) (text:biscuit) (text:sea) (text:bird))
> > > ((text:\"biscuit sea\") (text:\"sea bird\"))
> > > ((text:\"seabiscuit sea\") (text:\"biscuit sea bird\"))"
> > >
> > > What I wanted instead was this
> > >
> > > "+((text:seabiscuit) (text:sea) (text:bird)) ((text:\"seabiscuit sea\")
> > > (text:\"sea bird\")) (text:\"seabiscuit sea bird\")"
> > >
> > > Looks like there isn't any other way than to pre-process the query myself
> > > and create the compound word. What do you mean by "just query the raw
> > > string"? Am I still missing something?
> > >
> > > Parvesh Garg
> > > http://www.zettata.com
> > > (This time I did remove my phone number :) )
> > >
> > > On Mon, Oct 28, 2013 at 4:14 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > > Why did you reject using synonyms? You can have multi-word
> > > > synonyms just fine at index time, and at query time, since the
> > > > multiple words are already substituted in the index you don't
> > > > need to do the same substitution, just query the raw strings.
> > > >
> > > > I freely acknowledge you may have very good reasons for doing
> > > > this yourself, I'm just making sure you know what's already
> > > > there.
> > > >
> > > > See:
> > > >
> > > > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> > > >
> > > > Look particularly at the explanations for "sea biscuit" in that section.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > >
> > > >
> > > > On Mon, Oct 28, 2013 at 3:47 AM, Parvesh Garg <parvesh@zettata.com> wrote:
> > > >
> > > > > One more thing: is there a way to remove my "accidentally sent phone
> > > > > number in the signature" from the previous mail? aarrrggghhh
> > > > >
> > > >
> > >
> >
>
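
The query-side pre-processing Parvesh describes in this thread (creating the compound word before the query reaches edismax) could look roughly like the Python sketch below. It is a simplified model only, not Solr's SynonymFilterFactory or the edismax parser. The RULES table and the collapse, shingles, and preprocess functions are hypothetical names, and the only rule assumed is the "sea biscuit" one discussed above.

```python
# Hypothetical rule table: multi-word phrase -> compound word.
# A simplified model for illustration, not Solr's actual synonym handling.
RULES = {("sea", "biscuit"): "seabiscuit"}


def collapse(tokens, rules=RULES):
    """Replace each matching multi-word phrase with its compound form."""
    out = []
    i = 0
    while i < len(tokens):
        for phrase, compound in rules.items():
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                out.append(compound)
                i += n
                break
        else:
            # No rule matched at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def shingles(tokens, size):
    """Return all runs of `size` adjacent tokens, joined by spaces."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def preprocess(query):
    """Collapse compounds, then build the 2- and 3-word phrases from the result."""
    tokens = collapse(query.lower().split())
    return tokens, shingles(tokens, 2), shingles(tokens, 3)


terms, pairs, triples = preprocess("sea biscuit sea bird")
print(terms)    # ['seabiscuit', 'sea', 'bird']
print(pairs)    # ['seabiscuit sea', 'sea bird']
print(triples)  # ['seabiscuit sea bird']
```

The three lists line up with the single terms, two-word phrases, and three-word phrase in the parsedQuery Parvesh said he wanted. Whether this step is needed at all depends on how the index-time synonyms are configured, which is the open question in the thread.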
