lucene-solr-user mailing list archives

From Robert Oschler <robert.osch...@gmail.com>
Subject Re: Fastest way to import a giant word list into Solr/Lucene?
Date Sat, 31 Oct 2015 03:26:00 GMT
Thanks Walter.   I believe I have what I need now.  Have a great weekend.

On Fri, Oct 30, 2015 at 11:13 PM, Walter Underwood <wunder@wunderwood.org>
wrote:

> Read the links I have sent.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Oct 30, 2015, at 7:10 PM, Robert Oschler <robert.oschler@gmail.com>
> > wrote:
> >
> > Thanks Walter.  Are there any open source spell checkers that implement the
> > Peter Norvig or Damerau-Levenshtein algorithms?  I'm short on time so I
> > have to keep the custom coding down to a minimum.
> >
> >
> > On Fri, Oct 30, 2015 at 8:02 PM, Walter Underwood <wunder@wunderwood.org>
> > wrote:
> >
> >> Dedicated spell-checkers have better algorithms than Solr. They usually
> >> handle transposed characters as well as inserted, deleted, or substituted
> >> characters. This is an enhanced version of Levenshtein distance. It is
> >> called Damerau-Levenshtein and is too expensive to use in Solr search.
> >> Spell correctors can also use a bigger distance than 2, unlike Solr.
> >>
> >> The Peter Norvig corrector also handles words that have been run together.
> >> The Norvig corrector has been translated to many different computer
> >> languages.
> >>
> >> The Norvig corrector is an interesting approach. It is well worth reading
> >> this short article to learn more about spelling correction.
> >>
> >> http://norvig.com/spell-correct.html
> >>
> >> wunder
> >> Walter Underwood
> >> wunder@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Oct 30, 2015, at 4:45 PM, Robert Oschler <robert.oschler@gmail.com>
> >>> wrote:
> >>>
> >>> Hello Walter and Mikhail,
> >>>
> >>> Thank you for your answers.  Do those spell checkers have the same or
> >>> better fuzzy matching capability that SOLR/Lucene has (Levenshtein, max
> >>> distance 2)?  That's a critical requirement for my application.  I take it
> >>> by your suggestion of these spell checker apps they can easily be extended
> >>> with a user defined, supplementary dictionary, yes?
> >>>
> >>> Thanks.
> >>>
> >>> On Fri, Oct 30, 2015 at 3:07 PM, Mikhail Khludnev <
> >>> mkhludnev@griddynamics.com> wrote:
> >>>
> >>>> Perhaps
> >>>> FileBasedSpellChecker
> >>>> https://cwiki.apache.org/confluence/display/solr/Spell+Checking
> >>>>
> >>>> On Fri, Oct 30, 2015 at 9:37 PM, Robert Oschler <robert.oschler@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> I have a gigantic list of industry terms that I want to import into a
> >>>>> Solr/Lucene instance running on an AWS box.  What is the fastest way to
> >>>>> import the list into my Solr/Lucene instance?  I have admin/sudo privileges
> >>>>> on the box.
> >>>>>
> >>>>> Also, is there a document that shows me how to set up my Solr/Lucene config
> >>>>> file to be optimized for fast searches on single word entries using fuzzy
> >>>>> search?  I intend to use this Solr/Lucene instance to do spell checking on
> >>>>> the big industry word list I mentioned above.  Each data record will be a
> >>>>> single word from the file.  I'll want to take a single word query and do a
> >>>>> fuzzy search on the word against the index (Levenshtein, max distance 2 as
> >>>>> per Solr/Lucene's fuzzy search feature).  So what parameters will configure
> >>>>> Solr/Lucene to be optimized for such a search?  Also, if a document shows
> >>>>> the best index/read parameters to support single word fuzzy searching then
> >>>>> that would be a big help too.  Note, the contents of the index will change
> >>>>> very infrequently if that affects the optimal parameter mix.
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Thanks,
> >>>>> Robert Oschler
> >>>>> Twitter -> http://twitter.com/roschler
> >>>>> http://www.RobotsRule.com/
> >>>>> http://www.Robodance.com/
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Sincerely yours
> >>>> Mikhail Khludnev
> >>>> Principal Engineer,
> >>>> Grid Dynamics
> >>>>
> >>>> <http://www.griddynamics.com>
> >>>> <mkhludnev@griddynamics.com>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Robert Oschler
> >>> Twitter -> http://twitter.com/roschler
> >>> http://www.RobotsRule.com/
> >>> http://www.Robodance.com/
> >>
> >>
> >
> >
> > --
> > Thanks,
> > Robert Oschler
> > Twitter -> http://twitter.com/roschler
> > http://www.RobotsRule.com/
> > http://www.Robodance.com/
>
>


-- 
Thanks,
Robert Oschler
Twitter -> http://twitter.com/roschler
http://www.RobotsRule.com/
http://www.Robodance.com/
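
For readers following the thread: the transposition-aware distance Walter
describes can be sketched in a few lines of Python. This is only an
illustration, not code from any of the spell checkers mentioned. It computes
the "optimal string alignment" variant of Damerau-Levenshtein (each substring
is edited at most once), and the names osa_distance and suggest are
hypothetical helpers invented for this sketch. The brute-force scan in suggest
is an assumption made for brevity and would be slow on a gigantic word list.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each costing 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "teh" -> "the"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def suggest(word, dictionary, max_distance=2):
    """Hypothetical helper: dictionary words within max_distance, nearest first.

    Brute force over the whole list; shown only to illustrate the idea.
    """
    scored = [(osa_distance(word, w), w) for w in dictionary]
    return [w for dist, w in sorted(scored) if dist <= max_distance]
```

With plain Levenshtein, "teh" and "the" are two edits apart (two
substitutions); with the transposition step above they are one edit apart,
which is why dedicated correctors rank swapped-letter typos higher.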
