lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Combining PrefixQuery and FuzzyQuery
Date Mon, 19 Apr 2010 16:54:23 GMT
I am sorry, I dont understand what you are trying to say. Its confusing to me.

If you want a Fuzzy query, where the first 3 chars ("the") always match (a PrefixQuery) and
the rest of the term is fuzzy, use this Constructor and pass 3 as prefixlen:

http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/FuzzyQuery.html#FuzzyQuery(org.apache.lucene.index.Term,
float, int)

e.g. new FuzzyQuery(new Term("item.name","thesuffix"), 0.79, 3)

This will match all terms starting with "the" (prefix length 3) and will apply a fuzzy to
the rest ("suffix"). Is this what you want to have?

By the way neither prefix queries nor fuzzy queries have a "*" after the term, this is only
notation of QueryParser but not the term.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Lukas Österreicher [mailto:lukas.oesterreicher@austria.real.com]
> Sent: Monday, April 19, 2010 6:18 PM
> To: java-user@lucene.apache.org
> Subject: Re: Combining PrefixQuery and FuzzyQuery
> 
> Update to my last response with a sample of what I thought you might
> mean:
> 
> This does not work.
> Original query up till now:
> +(item.name:the* item.name:the)
> 
> New query would look like this (which states
> Match item.name where a term exists that is either
> Exactly the or starts with the):
> +(item.name:the*~0.79 item.name:the~0.79)
> 
> Until now these matched (I apply ignoring of cases as mentioned):
> On The Run
> The Final Cut
> Us And Them
> 
> With the change "Us And Them" will not match anymore.
> 
> What I want is a change so it would even match
> "Us and Thém"
> 
> Lukas
> 
> Am 19.04.10 17:13 schrieb "Uwe Schindler" unter <uwe@thetaphi.de>:
> 
> > Dont use PrefixQuery, only FuzzyQuery. There you pass in the whole
> term (with
> > prefix) and define how many characters are the prefix.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Lukas Österreicher
> [mailto:lukas.oesterreicher@austria.real.com]
> >> Sent: Monday, April 19, 2010 5:00 PM
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Combining PrefixQuery and FuzzyQuery
> >>
> >> Well, how would this look like in code?
> >> Currently I have the prefix query like this:
> >>
> >> BooleanQuery bQuery = new BooleanQuery();
> >> PrefixQuery prefixQuery = new PrefixQuery(new Term("item.name",
> >> termText));
> >> bQuery.add( prefixQuery, Occur.MUST);
> >>
> >> I dont see any class named PrefixTerm.
> >> I'd appreciate it if you could show me how it is done in java code.
> >>
> >> Lukas
> >>
> >> Am 19.04.10 16:48 schrieb "Uwe Schindler" unter <uwe@thetaphi.de>:
> >>
> >>> How about a fuzzy query with a prefix term? Its configureable.
> >>>
> >>> -----
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>> http://www.thetaphi.de
> >>> eMail: uwe@thetaphi.de
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Lukas Österreicher
> >> [mailto:lukas.oesterreicher@austria.real.com]
> >>>> Sent: Monday, April 19, 2010 4:43 PM
> >>>> To: java-user@lucene.apache.org
> >>>> Subject: Combining PrefixQuery and FuzzyQuery
> >>>>
> >>>> Hello.
> >>>>
> >>>> Is it possible to combine PrefixQuery and FuzzyQuery?
> >>>> The search on a term should both be fuzzy but also match with
> >> results
> >>>> that
> >>>> jut begin with that token (or an approximation of that token).
> >>>>
> >>>> If it is possible, can you give me an example on how to achieve
> >> this?
> >>>>
> >>>> Currently I only use the PrefixQuery and performance is ok.
> >>>> Would performance with such a combination be much worse?
> >>>>
> >>>> I would not even need a complete fuzzy search, it would suffice
> >>>> To have the matching be done without caring for cases (this I
> >> already
> >>>> have
> >>>> present by using a modified WhitespaceTokenizer which filters
> >>>> To lower cases) and with also matching characters where accents
> >>>> Also match, so e would match é and è.
> >>>>
> >>>> Finally, I would like to know how much sorting a string field
> >>>> Which is not too long (containing track or album title) affects
> >>>> performance
> >>>> Copared to not providing any sorting parameters.
> >>>>
> >>>> Thanx in advance,
> >>>> Lukas
> >>>
> >>>
> >>> -------------------------------------------------------------------
> --
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>
> >>
> >> --------------------------------------------------------------------
> -
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message