lucene-solr-user mailing list archives

From John Bickerstaff <j...@johnbickerstaff.com>
Subject Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
Date Wed, 01 Jun 2016 18:48:01 GMT
Thanks Jeff.  I've installed "out of the box" with 5.4 and didn't make any
modifications on Ubuntu - so I'm not sure why it wouldn't get picked up,
but I'll keep chipping away at it...

I appreciate the new one to try.  That's a good test.

On Wed, Jun 1, 2016 at 12:45 PM, Jeff Wartes <jwartes@whitepages.com> wrote:

> In the interests of the specific questions to me:
>
> I’m using 5.4, solrcloud.
> I’ve never used the blob store thing, didn’t even know it existed before
> this thread.
>
> I’m uncertain how not finding the class could be specific to hon, it
> really feels like a general solr config issue, but you could try some other
> foreign jar and see if that works.
> Here’s one I use: https://github.com/whitepages/SOLR-4449 (although this
> one is also why I use WEB-INF/lib, because it overrides a protected method,
> so it might not be the greatest example)
>
>
> On 5/31/16, 4:02 PM, "John Bickerstaff" <john@johnbickerstaff.com> wrote:
>
> >Thanks Jeff,
> >
> >I believe I tried that, and it still refused to load.  But I'd sure love
> >it to work since the other process is a bit convoluted - although I see
> >its value in a large Solr installation.
> >
> >When I "locate" the jar on the linux command line I get:
> >
>
> >/opt/solr-5.4.0/server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar
> >
> >But the log file is still carrying class not found exceptions when I
> >restart...
> >
> >Are you in "Cloud" mode?  What version of Solr are you using?
> >
> >On Tue, May 31, 2016 at 4:08 PM, Jeff Wartes <jwartes@whitepages.com> wrote:
> >
> >> I’ve generally been dropping foreign plugin jars in this dir:
> >> server/solr-webapp/webapp/WEB-INF/lib/
> >> This is because it then gets loaded by the same classloader as Solr
> >> itself, which can be useful if you’re, say, overriding some
> >> solr-protected-space method.
> >>
> >> If you don’t care about the classloader, I believe you can use whatever
> >> dir you want, with the appropriate bit of solrconfig.xml to load it.
> >> Something like:
> >> <lib regex=".*\.jar" dir="${solr.install.dir}/dist"/>
> >>
> >>
> >> On 5/31/16, 2:13 PM, "John Bickerstaff" <john@johnbickerstaff.com> wrote:
> >>
> >> >All --
> >> >
> >> >I'm now attempting to use the hon_lucene_synonyms project from github.
> >> >
> >> >I found the documents that were inferred from the dead links on the
> >> >readme in the repository -- however, given that I'm using Solr 5.4.x, I
> >> >no longer have the need to integrate into a war file (as far as I can
> >> >see).
> >> >
> >> >The suggestion on the readme is that I can drop the hon_lucene_synonyms
> >> >jar file into the $SOLR_HOME directory, but this does not seem to be
> >> >working - I'm getting class not found exceptions.
> >> >
> >> >Does anyone on this list have direct experience with getting this
> >> >plugin to work in Solr 5.x?
> >> >
> >> >Thanks in advance...
> >> >
> >> >On Mon, May 30, 2016 at 6:57 PM, MaryJo Sminkey <mjsminkey@gmail.com> wrote:
> >> >
> >> >> It's been a while since I installed it so I really can't say. I'm more
> >> >> of a code monkey than a server gal (particularly Linux... I'm amazed I
> >> >> got Solr installed in the first place, LOL!) So I had asked our network
> >> >> guy to look it over recently and see if it looked like I did it okay.
> >> >> He said since it shows up in the list of jars in the Solr admin that
> >> >> it's installed.... if that's not necessarily true, I probably need to
> >> >> point him in the right direction for what else to do since he really
> >> >> doesn't know Solr well either.
> >> >>
> >> >> Mary Jo
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff <john@johnbickerstaff.com> wrote:
> >> >>
> >> >> > Thanks for the comment Mary Jo...
> >> >> >
> >> >> > The error loading the class rings a bell - did you find and follow
> >> >> > instructions for adding that to the WAR file?  I vaguely remember
> >> >> > seeing something about that.
> >> >> >
> >> >> > I'm going to try my own tests on the auto phrasing one..  If I'm
> >> >> > successful, I'll post back.
> >> >> >
> >> >> > On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsminkey@gmail.com> wrote:
> >> >> >
> >> >> > > This is a very timely discussion for me, as we're trying to tackle
> >> >> > > the multi-term synonym issue as well and have not been able to get
> >> >> > > the hon-lucene plugin to work. The jar shows up as installed, but
> >> >> > > when we set up the sample request handler it throws this error:
> >> >> > >
> >> >> > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> >> >> > > Error loading class
> >> >> > > 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> >> >> > >
> >> >> > > I have tried the auto-phrasing one as well (I did set up a field
> >> >> > > using copy to configure it on) but when testing it didn't seem to
> >> >> > > return the synonyms as expected. So I gave up on that one too (am
> >> >> > > willing to give it another try though, that was a while ago). Would
> >> >> > > definitely like to hear what other people have found works on the
> >> >> > > latest versions of Solr 5.x and/or 6. Just sucks that this issue
> >> >> > > has never been fixed in the core product, such that you still need
> >> >> > > to mess with plugins and patches to get such basic functionality
> >> >> > > working properly.
> >> >> > >
> >> >> > >
> >> >> > > *Mary Jo Sminkey*
> >> >> > > *Senior ColdFusion Developer*
> >> >> > >
> >> >> > > *CF Webtools*
> >> >> > > You Dream It... We Build It. <https://www.cfwebtools.com/>
> >> >> > > 11204 Davenport Suite 100
> >> >> > > Omaha, Nebraska 68154
> >> >> > > O: 402.408.3733 x128
> >> >> > > E:  maryjo.sminkey@cfwebtools.com
> >> >> > > Skype: maryjos.cfwebtools
> >> >> > >
> >> >> > >
> >> >> > > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <john@johnbickerstaff.com> wrote:
> >> >> > >
> >> >> > > > So I'm looking at the solution mentioned here:
> >> >> > > >
> >> >> > > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >> >> > > >
> >> >> > > > The thing that's troubling me slightly is that the way it's
> >> >> > > > documented it seems to be missing a small but important link...
> >> >> > > >
> >> >> > > > What exactly causes the results listed to be returned?
> >> >> > > >
> >> >> > > > Here's my thought process:
> >> >> > > >
> >> >> > > > 1. The entry for the /autophrase searchHandler does not specify a
> >> >> > > > default search field.
> >> >> > > > 2. The field type "text_autophrase" is set up as the one with the
> >> >> > > > AutoPhrasingFilterFactory as part of its indexing.
> >> >> > > >
> >> >> > > > There isn't any mention (perhaps because it's too obvious) of the
> >> >> > > > need to copy or otherwise get data into the "text_autophrase" field
> >> >> > > > at index time.
> >> >> > > >
> >> >> > > > There isn't any explicit listing of "text_autophrase" as the default
> >> >> > > > search field in the /autophrase search handler.
> >> >> > > >
> >> >> > > > There isn't any explicit statement of "df=text_autophrase" in the
> >> >> > > > query statement: [/autophrase?q=New+York]
> >> >> > > >
> >> >> > > > Therefore it seems to me that if someone tries to implement this,
> >> >> > > > they're going to be disappointed in the results unless they:
> >> >> > > > a. copy or otherwise get ALL the text they're interested in -- into
> >> >> > > > the "text_autophrase" field as part of the schema.xml setup (to
> >> >> > > > happen at index time)
> >> >> > > > b. somehow explicitly declare "text_autophrase" as the default
> >> >> > > > search field - either in the searchHandler or wherever else the
> >> >> > > > default field is configured.
> >> >> > > >
> >> >> > > > If anyone out there has done this specific approach - could you
> >> >> > > > validate whether my thought process is correct and / or if I'm
> >> >> > > > missing something? Yes - I get that I can set it all up and try -
> >> >> > > > but it's what I don't know I don't know that bothers me...
> >> >> > > >
> >> >> > > > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <john@johnbickerstaff.com> wrote:
> >> >> > > >
> >> >> > > > > Thank you Steve -- very helpful.
> >> >> > > > >
> >> >> > > > > I can see that whatever implementation I decide to try, some
> >> >> > > > > testing will be in order.  If anyone is aware of significant
> >> >> > > > > gotchas with this synonym thing that are not mentioned in the
> >> >> > > > > already-listed URLs, please feel free to comment.
> >> >> > > > >
> >> >> > > > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sarowe@gmail.com> wrote:
> >> >> > > > >
> >> >> > > > >> I’m working on addressing problems using multi-term synonyms
> >> >> > > > >> at query time in Lucene and Solr.
> >> >> > > > >>
> >> >> > > > >> I recommend these two blogs for understanding the issues (the
> >> >> > > > >> second one was mentioned earlier in this thread):
> >> >> > > > >>
> >> >> > > > >> <http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html>
> >> >> > > > >> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
> >> >> > > > >>
> >> >> > > > >> In addition to the already-mentioned projects, there is also:
> >> >> > > > >>
> >> >> > > > >> <https://issues.apache.org/jira/browse/SOLR-5379>
> >> >> > > > >>
> >> >> > > > >> All of these projects try in various ways to work around the
> >> >> > > > >> fact that Lucene’s QueryParser splits on whitespace before
> >> >> > > > >> sending text to analysis, one token at a time, so in a synonym
> >> >> > > > >> filter, multi-word synonyms can never match and add
> >> >> > > > >> alternatives.  See
> >> >> > > > >> <https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve
> >> >> > > > >> posted a patch to directly address that problem - note that
> >> >> > > > >> it’s still a work in progress.
> >> >> > > > >>
> >> >> > > > >> Once LUCENE-2605 has been fixed, there is still work to do
> >> >> > > > >> getting (e)dismax to work with the modified Lucene QueryParser,
> >> >> > > > >> and addressing problems with how queries are constructed from
> >> >> > > > >> Lucene’s “sausagized” token stream.
> >> >> > > > >>
> >> >> > > > >> --
> >> >> > > > >> Steve
> >> >> > > > >> www.lucidworks.com
> >> >> > > > >>
> >> >> > > > >> > On May 26, 2016, at 2:21 PM, John Bickerstaff <john@johnbickerstaff.com> wrote:
> >> >> > > > >> >
> >> >> > > > >> > Thanks Chris --
> >> >> > > > >> >
> >> >> > > > >> > The two projects I'm aware of are:
> >> >> > > > >> >
> >> >> > > > >> > https://github.com/healthonnet/hon-lucene-synonyms
> >> >> > > > >> >
> >> >> > > > >> > and the one referenced from the Lucidworks page here:
> >> >> > > > >> >
> >> >> > > > >> > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >> >> > > > >> >
> >> >> > > > >> > ... which is here: https://github.com/LucidWorks/auto-phrase-tokenfilter
> >> >> > > > >> >
> >> >> > > > >> > Is there anything else out there that you would recommend I
> >> >> > > > >> > look at?
> >> >> > > > >> >
> >> >> > > > >> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <chris@depahelix.com> wrote:
> >> >> > > > >> >
> >> >> > > > >> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
> >> >> > > > >> >>
> >> >> > > > >> >> Suyash Sonawane and I have worked on multiple word synonyms
> >> >> > > > >> >> at Wayfair.  We worked mostly off of Ted Sullivan's work and
> >> >> > > > >> >> also off of some suggestions from Koorosh Vakhshoori.  We
> >> >> > > > >> >> have gotten to a point where we have a more sophisticated
> >> >> > > > >> >> internal implementation; however, we've found that it is
> >> >> > > > >> >> very difficult to make it do what you want it to do, and
> >> >> > > > >> >> also be sufficiently performant.  Watch out for exceptional
> >> >> > > > >> >> situations with mm (minimum should match).
> >> >> > > > >> >>
> >> >> > > > >> >> Trey Grainger (now at Lucidworks) and Simon Hughes of
> >> >> > > > >> >> Dice.com have also done work in this area.
> >> >> > > > >> >>
> >> >> > > > >> >> It should be very possible to get this kind of thing working
> >> >> > > > >> >> on SolrCloud.  I haven't tried it yet but I think
> >> >> > > > >> >> theoretically, it should just work.  The synonyms stuff is
> >> >> > > > >> >> mostly about doing things at index time and query time.  The
> >> >> > > > >> >> index time stuff should translate to SolrCloud directly,
> >> >> > > > >> >> while the query time stuff might pose some issues, but
> >> >> > > > >> >> probably not too bad, if there are any issues at all.
> >> >> > > > >> >>
> >> >> > > > >> >> I've had decent luck porting our various plugins from 4.10.x
> >> >> > > > >> >> to 5.5.0 because a lot of stuff is just Java, and it still
> >> >> > > > >> >> works within the Jetty context.
> >> >> > > > >> >>
> >> >> > > > >> >> -Chris.
> >> >> > > > >> >>
> >> >> > > > >> >>
> >> >> > > > >> >>
> >> >> > > > >> >>
> >> >> > > > >> >> ----------------------------------------
> >> >> > > > >> >> From: "John Bickerstaff" <john@johnbickerstaff.com>
> >> >> > > > >> >> Sent: Thursday, May 26, 2016 1:51 PM
> >> >> > > > >> >> To: solr-user@lucene.apache.org
> >> >> > > > >> >> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
> >> >> > > > >> >> Hey Jeff (or anyone interested in multi-word synonyms), here
> >> >> > > > >> >> are some potentially interesting links...
> >> >> > > > >> >>
> >> >> > > > >> >> http://wiki.apache.org/solr/QueryParser (search the page for
> >> >> > > > >> >> synonym_edismax)
> >> >> > > > >> >>
> >> >> > > > >> >> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> >> >> > > > >> >> (blog post about what became the synonym_edismax Query Parser)
> >> >> > > > >> >>
> >> >> > > > >> >> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >> >> > > > >> >>
> >> >> > > > >> >> This last was useful for lots of reasons and contains links
> >> >> > > > >> >> to other interesting, related web pages...
> >> >> > > > >> >>
> >> >> > > > >> >> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwartes@whitepages.com> wrote:
> >> >> > > > >> >>
> >> >> > > > >> >>> Oh, interesting. I've certainly encountered issues with
> >> >> > > > >> >>> multi-word synonyms, but I hadn't come across this. If you
> >> >> > > > >> >>> end up using it with a recent solr version, I'd be glad to
> >> >> > > > >> >>> hear your experience.
> >> >> > > > >> >>>
> >> >> > > > >> >>> I haven't used it, but I am aware of one other project in
> >> >> > > > >> >>> this vein that you might be interested in looking at:
> >> >> > > > >> >>> https://github.com/LucidWorks/auto-phrase-tokenfilter
> >> >> > > > >> >>>
> >> >> > > > >> >>>
> >> >> > > > >> >>> On 5/26/16, 9:29 AM, "John Bickerstaff" <john@johnbickerstaff.com> wrote:
> >> >> > > > >> >>>
> >> >> > > > >> >>>> Ahh - for question #3 I may have spoken too soon. This
> >> >> > > > >> >>>> line from the github repository readme suggests a way.
> >> >> > > > >> >>>>
> >> >> > > > >> >>>> Update: We have tested to run with the jar in
> >> >> > > > >> >>>> $SOLR_HOME/lib as well, and it works (Jetty).
> >> >> > > > >> >>>>
> >> >> > > > >> >>>> I'll try that and only respond back if that doesn't work.
> >> >> > > > >> >>>>
> >> >> > > > >> >>>> Questions 1 and 2 still stand of course... If anyone on
> >> >> > > > >> >>>> the list has experience in this area...
> >> >> > > > >> >>>>
> >> >> > > > >> >>>> Thanks.
> >> >> > > > >> >>>>
> >> >> > > > >> >>>> On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <john@johnbickerstaff.com> wrote:
> >> >> > > > >> >>>>
> >> >> > > > >> >>>>> Hi all,
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>>> I'm creating a Solr Cloud that will index and search
> >> >> > > > >> >>>>> medical text. Multi-word synonyms are a pretty important
> >> >> > > > >> >>>>> factor.
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>>> I find that there are some challenges around multi-word
> >> >> > > > >> >>>>> synonyms and I also found on the wiki that there is a
> >> >> > > > >> >>>>> recommended 3rd-party parser (synonym_edismax parser)
> >> >> > > > >> >>>>> created by Nolan Lawson and found here:
> >> >> > > > >> >>>>> https://github.com/healthonnet/hon-lucene-synonyms
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>>> Here's the thing - the instructions on the github site
> >> >> > > > >> >>>>> involve bringing the jar file into the war file - which
> >> >> > > > >> >>>>> is not applicable any more... at least I think it's
> >> >> > > > >> >>>>> not...
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>>> I have three questions:
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>>> 1. Is this still a good solution for multi-word synonyms
> >> >> > > > >> >>>>> (i.e. Solr Cloud doesn't break it in some way)?
> >> >> > > > >> >>>>> 2. Is there a tool or plug-in out there that the
> >> >> > > > >> >>>>> contributors would recommend above this one?
> >> >> > > > >> >>>>> 3. Assuming 1 = yes and 2 = no, can anyone tell me an
> >> >> > > > >> >>>>> updated procedure for bringing it in to Solr Cloud (I'm
> >> >> > > > >> >>>>> running 5.4.x)?
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>>> Thanks
> >> >> > > > >> >>>>>
> >> >> > > > >> >>>
> >> >> > > > >> >>>
> >> >> > > > >> >>
> >> >> > > > >> >>
> >> >> > > > >> >>
> >> >> > > > >>
> >> >> > > > >>
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >>
> >>
>
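
Steve Rowe's point in the thread, that the query parser splits on whitespace before analysis so a synonym filter never sees a multi-word phrase, can be illustrated with a toy sketch. This is not Lucene's code: the synonym table, the function names, and the sample query are all made up for illustration.

```python
# Toy model only: a hypothetical synonym table keyed by token sequences.
SYNONYMS = {("heart", "attack"): ["myocardial infarction"]}


def expand(tokens):
    """Return the alternatives for every run of tokens that matches a synonym key."""
    found = []
    for i in range(len(tokens)):
        for key, alternatives in SYNONYMS.items():
            if tuple(tokens[i:i + len(key)]) == key:
                found.extend(alternatives)
    return found


def analyze_whole(query):
    """Hand the full token sequence to the synonym step in one call."""
    return expand(query.lower().split())


def analyze_per_token(query):
    """Mimic whitespace pre-splitting: the synonym step sees one token per call."""
    found = []
    for word in query.lower().split():
        found.extend(expand([word]))
    return found
```

With the query "heart attack symptoms", analyze_whole finds the two-word entry, while analyze_per_token returns nothing because no single call ever contains both "heart" and "attack".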
>
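
On the auto-phrasing question raised in the thread (no default search field on /autophrase and no "df" in the sample query), one way to test the theory is to pass the field explicitly on the request. The sketch below only builds the URL. The base URL and collection name are placeholders, and it assumes the handler honours Solr's standard "df" parameter and that a "text_autophrase" field has been populated at index time.

```python
from urllib.parse import urlencode


def autophrase_url(base_url, query, default_field="text_autophrase"):
    """Build an /autophrase request that names the default search field explicitly."""
    # urlencode applies plus-encoding, so "New York" becomes "New+York".
    params = urlencode({"q": query, "df": default_field})
    return f"{base_url}/autophrase?{params}"
```

For example, autophrase_url("http://localhost:8983/solr/mycollection", "New York") yields a URL ending in /autophrase?q=New+York&df=text_autophrase, where the host, port, and "mycollection" are illustrative values only.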

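For the "Error loading class" reports in this thread, one quick sanity check is whether the jar on disk contains the class name that solrconfig.xml asks for. A jar is a zip archive, so the Python standard library can inspect it. This is a minimal sketch: the helper name is made up, and a positive result only shows the class is in the jar, not that Solr's classloader can see the jar.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the fully qualified class name."""
    # A class a.b.C is stored inside the archive as a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

It could be called with the jar path and class name quoted earlier in the thread, for example the hon-lucene-synonyms-2.0.0.jar under WEB-INF/lib and com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin. If it returns False, a mismatch between the configured class name and the package inside that jar version is one possible cause (an assumption, not something the thread confirms).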