lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex <azli...@gmail.com>
Subject Filtering query results based on relevance/acuracy
Date Mon, 21 Sep 2009 22:17:53 GMT
Hi,

I'm, a total newbie with lucene and trying to understand how to achieve my
(complicated) goals. So what I'm doing is yet totally experimental for me
but is probably extremely trivial for the experts in this list :)

I use lucene and Hibernate Search to index locations by their name, type,
etc ...
The LocationType is an Object that has it's "name" field indexed both
tokenized and untokenized.

The following LocationType names are indexed
"Restaurant"
"Mexican Restaurant"
"Chinese Restaurant"
"Greek Restaurant"
etc...

Considering the following query  :

"Mexican Restaurant"

I systematically get all the entries as a result, most certainly because the
"Restaurant" keyword is present in all of them.
I'm trying to have a finer grained result set.
Obviously for "Mexican Restaurant" I want the "Mexican Restaurant" entry as
a result but NOT "Chinese Restaurant" nor "Greek Restaurant" as they are
irrelevant. But maybe "Restaurant" itself should be returned with a lower
wight/score or maybe it shouldn't ... im not sure about this one.

1)
How can I do that ?

Here is the code I use for querying :


String[] typeFields = {"name", "tokenized_name"};
        Map<String,Float> boostPerField = new HashMap<String,Float>(2);
        boostPerField.put( "name", (float) 4);
        boostPerField.put( "tokenized_name", (float) 2);


        QueryParser parser = new MultiFieldQueryParser(
                typeFields ,
                new StandardAnalyzer(),
                boostPerField
                );

        org.apache.lucene.search.Query luceneQuery;

        try {
            luceneQuery = parser.parse(queryString);
        }
        catch (ParseException e) {
            throw new RuntimeException("Unable to parse query: " +
queryString, e);
        }





I guess that there is a way to filter out results that have a score below a
given threshold or a way to filter out results based on score gap or
anything similar. But I have no idea on how to do this...


What is the best way to achieve what I want?

Thank you for your help !

Cheers,

Alex

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message