lucene-solr-user mailing list archives

From Lasitha Wattaladeniya <watt...@gmail.com>
Subject Re: Stemming with SOLR
Date Mon, 19 Dec 2016 01:54:47 GMT
Thank you all for the replies. I am considering the suggestions.

On 17 Dec 2016 01:50, "Susheel Kumar" <susheel2777@gmail.com> wrote:

> To handle irregular nouns
> (http://www.ef.com/english-resources/english-grammar/singular-and-plural-nouns/),
> the simplest way is to handle them using StemmerOverrideFilterFactory. The
> list is not so long. Otherwise, go for commercial solutions like Basistech,
> as Alex suggested, or customize Hunspell extensively to handle most of them.
>
> Thanks,
> Susheel
>
> On Thu, Dec 15, 2016 at 9:46 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
>
> > If you need the full fidelity solution taking care of multiple
> > edge-cases, it could be worth looking at commercial solutions.
> >
> >
> > http://www.basistech.com/ has one, including a free-level SAAS plan.
> >
> > Regards,
> >    Alex.
> > ----
> > http://www.solr-start.com/ - Resources for Solr users, new and experienced
> >
> >
> > On 15 December 2016 at 21:28, Lasitha Wattaladeniya <wattale@gmail.com>
> > wrote:
> > > Hi all,
> > >
> > > Thanks for the replies,
> > >
> > > > @eric, ahmet: since those stemmers are rule-based ("logical") stemmers,
> > > > they won't work on irregular words such as caught, ran and so on. So in
> > > > our case they won't work.
> > > >
> > > > @susheel: Yes, I thought about it, but the problem we have is that the
> > > > documents we index are somewhat large texts, so copying them into
> > > > duplicate fields will affect both index time (we have jobs that index
> > > > data periodically) and query time. I wonder why there isn't a proper
> > > > solution to this.
> > >
> > > Regards,
> > > Lasitha
> > >
> > > Lasitha Wattaladeniya
> > > Software Engineer
> > >
> > > Mobile : +6593896893
> > > Blog : techreadme.blogspot.com
> > >
> > > > On Fri, Dec 16, 2016 at 12:58 AM, Susheel Kumar <susheel2777@gmail.com>
> > > > wrote:
> > >
> > > >> We did an extensive comparison of Snowball, KStem and Hunspell in the
> > > >> past, and there are cases where one of them works better than the
> > > >> others. You may utilise all three of them by having 3 different fields
> > > >> (fieldTypes) and, at query time, searching in all of them.
> > > >>
> > > >> For the cases where none of them works (e.g. wolves, wolf etc.), use
> > > >> StemmerOverrideFilterFactory.
> > >>
> > >> HTH.
> > >>
> > >> Thanks,
> > >> Susheel
> > >>
> > > >> On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> > > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > KStemFilter returns legitimate English words, please use it.
> > >> >
> > >> > Ahmet
> > >> >
> > >> >
> > >> >
> > >> > On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <
> > >> > wattale@gmail.com> wrote:
> > >> > Hello devs,
> > >> >
> > > >> > I'm trying to develop an indexing and querying flow that converts
> > > >> > words to their original form (lemmatization). I have been doing a bit
> > > >> > of research lately, but the information on the internet is very
> > > >> > limited. I tried using HunspellStemFilterFactory, but it doesn't
> > > >> > convert each word to its original form; instead it gives suggestions
> > > >> > for some words (Hunspell works correctly for some English words, but
> > > >> > for others it gives multiple suggestions or no suggestions; I used the
> > > >> > en_us.dic provided by OpenOffice).
> > > >> >
> > > >> > I know this is a generic problem in search, so is there anyone who can
> > > >> > point me in the right direction or to some information :)
> > >> >
> > >> > Best regards,
> > >> > Lasitha Wattaladeniya
> > >> > Software Engineer
> > >> >
> > >> > Mobile : +6593896893
> > >> > Blog : techreadme.blogspot.com
> > >> >
> > >>
> >
>
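
For the irregular forms discussed in this thread (wolves, caught, ran), the override idea can be sketched in a few lines of Python. This is only an illustration of "look the word up in a hand-made dictionary first, fall back to a rule-based stemmer otherwise"; it is not Solr code. IRREGULAR_FORMS, naive_stem, normalize and to_stemdict are made-up names, and the tab-separated output assumes the word<TAB>stem dictionary layout that StemmerOverrideFilterFactory reads, so please check that against the reference guide for your Solr version.

```python
# Illustration only: hypothetical names, not part of Solr or Lucene.

# Hand-maintained overrides for words a rule-based stemmer cannot handle.
IRREGULAR_FORMS = {
    "wolves": "wolf",
    "caught": "catch",
    "ran": "run",
    "children": "child",
}


def naive_stem(token: str) -> str:
    """Very rough suffix stripper standing in for a rule-based stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(token: str) -> str:
    """Use the override dictionary first, then fall back to the stemmer."""
    lowered = token.lower()
    if lowered in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[lowered]
    return naive_stem(lowered)


def to_stemdict(overrides: dict) -> str:
    """Render overrides as tab-separated 'word<TAB>stem' lines (assumed format)."""
    return "\n".join(f"{word}\t{stem}" for word, stem in sorted(overrides.items())) + "\n"
```

Here normalize("Wolves") gives "wolf" from the dictionary, while normalize("jumped") falls through to the suffix rule and gives "jump". The text from to_stemdict(IRREGULAR_FORMS) could be saved as the dictionary file for the override filter, which has to sit before the stemmer in the analyzer chain for the overrides to take effect.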

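Susheel's suggestion of indexing the text into three differently stemmed fields and searching all of them might look roughly like this on the query side. It is a sketch, assuming the edismax query parser and three hypothetical field names (text_snowball, text_kstem, text_hunspell) that would each need their own fieldType and copyField in the schema; build_query_params is a made-up helper.

```python
from urllib.parse import urlencode

# Hypothetical field names, one per stemmer; they are not defined anywhere in this thread.
STEMMED_FIELDS = ["text_snowball", "text_kstem", "text_hunspell"]


def build_query_params(user_query: str, fields=STEMMED_FIELDS) -> str:
    """Build a URL query string that searches every stemmed field via edismax."""
    params = {
        "q": user_query,
        "defType": "edismax",     # query parser that accepts a list of fields
        "qf": " ".join(fields),   # query fields, space separated
    }
    return urlencode(params)
```

The resulting string can be appended to a collection's /select URL. As Lasitha points out in the thread, the cost of this approach is on the indexing side, since every document is analyzed and stored three times.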