lucene-solr-user mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: term frequency with stemming
Date Mon, 27 Jul 2015 09:54:45 GMT
Apart from the funny "encrypted" message by Darin xD,
I would like to focus on the initial user requirement:

"get term
frequencies with fuzzy matching"

Solr/Lucene support fuzzy queries independently of the way you filter
your tokens at analysis time.
You can run fuzzy queries based on edit distance (by default calculated
with a Levenshtein automaton).

This lets you run your fuzzy query while leaving your indexed terms as
they are (so the term frequency is not affected).
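To make this concrete, here is a minimal Python sketch of query-time fuzzy matching against an unstemmed term dictionary. In Solr the query itself would simply be something like body:experience~2 (the ~N suffix sets the maximum number of edits; "body" is a hypothetical field name). The sketch computes plain Levenshtein distance with dynamic programming instead of building an automaton, so it only suits small term lists, and it counts a transposition as two edits, whereas Lucene's FuzzyQuery counts it as one by default. Summing the frequencies of all matching terms is an illustrative way to get a "fuzzy" term frequency, not a built-in Solr feature, and the term counts are made up.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_term_frequency(term_freqs, query_term, max_edits=2):
    """Sum the frequencies of every indexed term within max_edits of query_term.

    The index terms themselves are left untouched; only the query is fuzzy.
    """
    matches = {t: f for t, f in term_freqs.items()
               if levenshtein(query_term, t) <= max_edits}
    return sum(matches.values()), matches


# Hypothetical term frequencies for one document's unstemmed field.
doc_terms = Counter("the user experience and the users experiance".split())
total, matched = fuzzy_term_frequency(doc_terms, "experience")
print(total, matched)
```

Because the counts are read per indexed term, each original term keeps its own frequency and the fuzzy grouping happens only at query time.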

Can you give us more details about your use of stemming?
Stemming is usually something a little different from fuzzy search,
but it is a good way to solve some search requirements (always keep in
mind that stemming degrades the precision of your system in favour of
recall).
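A toy Python illustration of that trade-off follows. The stem table is hypothetical (it mimics the well-known case where the Porter stemmer reduces both "university" and "universe" to the same stem), and the three documents and their relevance labels are made up. Stemming lets the query find the plural form, raising recall, but also pulls in an unrelated document, lowering precision.

```python
# Hypothetical stem table standing in for a real stemmer such as Porter.
TOY_STEMS = {"university": "univers", "universities": "univers", "universe": "univers"}


def stem(token):
    return TOY_STEMS.get(token, token)


# Made-up corpus and hand-labelled relevance for the query "university".
DOCS = {
    "d1": "the university campus",
    "d2": "universities in europe",
    "d3": "the expanding universe",
}
RELEVANT = {"d1", "d2"}


def search(query, use_stemming):
    """Return ids of docs containing the (optionally stemmed) query term."""
    norm = stem if use_stemming else (lambda t: t)
    q = norm(query)
    return {doc_id for doc_id, text in DOCS.items()
            if q in {norm(t) for t in text.split()}}


def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


for use_stemming in (False, True):
    p, r = precision_recall(search("university", use_stemming), RELEVANT)
    print(f"stemming={use_stemming}: precision={p:.2f} recall={r:.2f}")
```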

Cheers


2015-07-25 20:21 GMT+01:00 Aki Balogh <aki@marketmuse.com>:

> I believe I found a solution: use a third-party stemmer to stem the term
> first, then pass it to termfreq.
>
> The only trick is, each term in a phrase has to be stemmed separately (i.e.
> "end-user experience" has to be broken down into "end-user" -> "end-us" and
> "experience" -> "experi") before being passed, i.e. termfreq(body, "end-us
> experi").
>
> From what I can tell, FunctionQuery / termfreq doesn't have a way to apply
> stemming.
>
> Akos (Aki) Balogh
> Co-Founder, MarketMuse
> https://www.MarketMuse.com <https://www.marketmuse.com/>
>
>
> On Fri, Jul 24, 2015 at 12:04 PM, Aki Balogh <aki@marketmuse.com> wrote:
>
> > Hi All,
> >
> > I'm using TermVectorComponent and stemming (Porter) in order to get term
> > frequencies with fuzzy matching. I'm stemming at index and query time.
> >
> > Is there a way to get term frequency from the index?
> > * termfreq doesn't support stemming or wildcards
> > * terms component doesn't allow additional filters
> > * I could use a copyField to save a non-stemmed version at indexing, and
> > run termfreq on that, but then I don't get any fuzzy matching
> >
> > Thanks,
> > Aki
> >
>
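The workaround described in the quoted message (stem each token separately, then pass the result to termfreq) can be sketched in Python as below. The stem table is a hypothetical stand-in for whatever third-party Porter stemmer is used (its two entries are the examples from the message), and splitting on whitespace mirrors that example rather than the field's real tokenizer, which may, for instance, also split on hyphens. One caveat worth checking: termfreq counts occurrences of a single indexed term and, as the message notes, does not apply stemming to its argument, so a multi-word string such as 'end-us experi' would only match if the field indexes such phrases as single terms (e.g. via shingles). The hypothetical helper per_token_termfreq shows the one-call-per-token alternative.

```python
from typing import Callable, List

# Hypothetical lookup standing in for a third-party Porter stemmer;
# the two entries are the examples quoted in the message.
TOY_STEMS = {"end-user": "end-us", "experience": "experi"}


def toy_stem(token: str) -> str:
    return TOY_STEMS.get(token, token)


def stem_phrase(phrase: str, stem: Callable[[str], str]) -> str:
    """Stem each whitespace-separated token separately, then rejoin them."""
    return " ".join(stem(token) for token in phrase.split())


def termfreq_expr(field: str, term: str) -> str:
    """Build a Solr termfreq function query; assumes term has no single quotes."""
    return f"termfreq({field},'{term}')"


def per_token_termfreq(field: str, phrase: str,
                       stem: Callable[[str], str]) -> List[str]:
    """One termfreq call per stemmed token, for fields that index single words."""
    return [termfreq_expr(field, stem(token)) for token in phrase.split()]


print(termfreq_expr("body", stem_phrase("end-user experience", toy_stem)))
print(per_token_termfreq("body", "end-user experience", toy_stem))
```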



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
