lucene-java-user mailing list archives

From Cool Coder <techcool.ku...@yahoo.com>
Subject Re: Help on FuzzyLikeThisQuery
Date Sat, 24 Nov 2007 19:20:17 GMT
I can now see a big improvement in my "related" help search results. I should mention that I have
a non-token list that removes all irrelevant tokens from the selected help topic. After filtering
out the non-tokens, I search the help system and show the results. However, I am not confident
in the non-token list. I feel it could be improved, or perhaps what I am really looking for is
some kind of human-readable tokenizer that can generate an equivalent query. I should also
mention that I am using SANDBOX/wordnet to generate synonym queries for the user-selected
help topic. So my system works something like this:
   
  - Remove all non-tokens from the user-selected help topic using the non-token list
  - Generate synonym queries using the Lucene Sandbox/wordnet API
  - Search the help system using FuzzyLikeThisQuery
  - Combine all results using Lucene ranking
  - Show only the first 10 results to users

  As I mentioned, this works fine in general, but it does not work as expected for generic topics.
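
  For readers following along, the first step above (non-token filtering) can be sketched
  in plain Java, without Lucene. This is only an illustration: the class name, the method
  name, and the contents of the non-token list are hypothetical, and a real list would need
  to be tuned to the help corpus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class NonTokenFilter {
    // Hypothetical non-token list; not the list used in the system described above.
    private static final Set<String> NON_TOKENS = new HashSet<>(
            Arrays.asList("a", "an", "the", "to", "of", "how", "and", "for", "in", "on"));

    /** Lowercases the topic, splits on non-alphanumeric characters, and drops non-tokens. */
    static List<String> filter(String topic) {
        List<String> kept = new ArrayList<>();
        for (String word : topic.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // split() can yield an empty first element when the text starts with a delimiter
            if (!word.isEmpty() && !NON_TOKENS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter("How to update the product blog"));
    }
}
```

  The surviving words would then be joined back into the string passed to addTerms().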
   
  Somebody suggested that I use JAMA (http://math.nist.gov/javanumerics/jama/) with Lucene.
But I am not sure whether I can afford the resources for R&D on JAMA and on using it with Lucene.
I would rather stay with your suggested query and keep improving the non-token
list filter.
   
  I would also appreciate any further suggestions.
   
  - RB
markharw00d <markharw00d@yahoo.co.uk> wrote:
  Cool Coder wrote:
>> Is there any way I can specify which terms are "MUST", I mean they have to appear in the result, and some terms are optional,

One "hands off" approach you could try with this is to rewrite the 
fuzzyQuery and then set the minimum number of terms you want a match on. 
e.g.

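// r is an already-open IndexReader over the help index
FuzzyLikeThisQuery flt = new FuzzyLikeThisQuery(50, new StandardAnalyzer());
flt.addTerms("product critical update", "title", 0.75f, FuzzyQuery.defaultPrefixLength);
// rewrite() turns the fuzzy query into a BooleanQuery of optional clauses
BooleanQuery q = (BooleanQuery) flt.rewrite(r);
// require at least half of those clauses to match
int minNumClauseMatches = Math.round(q.clauses().size() * 0.5f);
q.setMinimumNumberShouldMatch(minNumClauseMatches);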

In the above code I'm specifying at least half of the input terms must 
have a match.
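
To see what that threshold does to the two titles from the original question, here is a
small plain-Java model of the same arithmetic. It is only a sketch: it assumes one optional
clause per input term, uses exact whole-word matching instead of fuzzy matching, and ignores
scoring. The class and method names are hypothetical, and none of this is Lucene API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class MinShouldMatchSketch {
    /** Same arithmetic as the snippet above: a fraction of the clause count, rounded. */
    static int minClauses(int clauseCount, float fraction) {
        return Math.round(clauseCount * fraction);
    }

    /** Counts how many query terms occur as whole words in the title (exact match only). */
    static int matches(List<String> terms, String title) {
        List<String> words =
                Arrays.asList(title.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
        int count = 0;
        for (String term : terms) {
            if (words.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** True when the title matches at least the required number of terms. */
    static boolean accepted(List<String> terms, String title, float fraction) {
        return matches(terms, title) >= minClauses(terms.size(), fraction);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("product", "blog", "update");
        String[] titles = {
            "Slide Show Update - Full Control Panel",
            "Product manager: sent a mail to xyz@hy.com",
            "Product blog update schedule" // hypothetical title, not from the thread
        };
        for (String title : titles) {
            System.out.println(accepted(terms, title, 0.5f) + "  " + title);
        }
    }
}
```

With three terms the threshold is Math.round(1.5f), which is 2. Each of the two reported
titles matches only one term, so under this model both would be dropped.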

If a user wants more control then they really need to be more "hands on" 
and specify precisely which of these words are important to them in the 
actual query syntax.
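
As an illustration of the "hands on" route: in Lucene's classic query syntax a leading "+"
marks a term as required, and unmarked terms are optional. The helper below is a
hypothetical plain-Java sketch that only builds such a query string. It assumes simple
single-word terms with no characters that need escaping, and the resulting string would be
handed to a query parser rather than to FuzzyLikeThisQuery.

```java
import java.util.Arrays;
import java.util.List;

final class RequiredOptionalQuery {
    /** Prefixes required terms with '+' and appends optional terms unmarked. */
    static String build(List<String> required, List<String> optional) {
        StringBuilder sb = new StringBuilder();
        for (String term : required) {
            sb.append('+').append(term).append(' ');
        }
        for (String term : optional) {
            sb.append(term).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "product" and "blog" must appear; "update" only influences ranking.
        System.out.println(build(Arrays.asList("product", "blog"), Arrays.asList("update")));
    }
}
```

For the example in this thread, that gives the query string "+product +blog update".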

Cheers
Mark

> Hello,
> I am trying to use FuzzyLikeThisQuery to search my help system and show a set of help entries
> for a user-selected help topic. For any selected help topic, the system needs to display all related
> topics. This works to some extent, but if the query contains generic terms, the results returned by
> FuzzyLikeThisQuery include irrelevant topics. E.g.
> if the query is "product blog update", then I get results like
> 
> fuzzyLikeQuery.addTerms("product blog update", "title", 0.75f, FuzzyQuery.defaultPrefixLength);
> 
> --Slide Show Update - Full Control Panel
> --Product manager: sent a mail to xyz@hy.com
> 
> I would expect at least terms like "product" and "blog" to appear in the results.

> Is there any way I can specify which terms are "MUST", meaning they have to appear in the
> result, and which terms are optional, meaning they need not appear in the result?
> 
> Previously, I was using PhraseQuery, but it looks for an exact match. 
> I would appreciate your suggestions.
> 
> - BR






       