Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
Date: Tue, 22 Sep 2009 00:17:53 +0200
Message-ID: <76c1202b0909211517g77b3e53ck6ffcaa6a21018d2@mail.gmail.com>
Subject: Filtering query results based on relevance/acuracy
From: Alex <azlist1@gmail.com>
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=00c09ffb4c71620ba804741ddaf1

--00c09ffb4c71620ba804741ddaf1
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I'm a total newbie with Lucene and am trying to understand how to achieve my
(complicated) goals. What I'm doing is still entirely experimental for me,
but is probably trivial for the experts on this list :)

I use Lucene and Hibernate Search to index locations by their name, type,
etc.
The LocationType is an object that has its "name" field indexed both
tokenized and untokenized.

The following LocationType names are indexed
"Restaurant"
"Mexican Restaurant"
"Chinese Restaurant"
"Greek Restaurant"
etc...

Consider the following query:

"Mexican Restaurant"

I systematically get all the entries as a result, most certainly because the
"Restaurant" keyword is present in all of them.
I'm trying to get a finer-grained result set.
Obviously, for "Mexican Restaurant" I want the "Mexican Restaurant" entry as
a result but NOT "Chinese Restaurant" or "Greek Restaurant", as they are
irrelevant. Maybe "Restaurant" itself should be returned with a lower
weight/score, or maybe it shouldn't ... I'm not sure about this one.

1)
How can I do that?

Here is the code I use for querying :


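// Lucene 2.x package names (the query parser lives in org.apache.lucene.queryParser)
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;

        // Search both fields; boost matches on "name" over "tokenized_name"
        String[] typeFields = {"name", "tokenized_name"};
        Map<String,Float> boostPerField = new HashMap<String,Float>(2);
        boostPerField.put("name", 4f);
        boostPerField.put("tokenized_name", 2f);

        QueryParser parser = new MultiFieldQueryParser(
                typeFields,
                new StandardAnalyzer(),
                boostPerField
                );

        org.apache.lucene.search.Query luceneQuery;

        try {
            // queryString is the raw user input, e.g. "Mexican Restaurant"
            luceneQuery = parser.parse(queryString);
        }
        catch (ParseException e) {
            throw new RuntimeException("Unable to parse query: " + queryString, e);
        }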


I guess there is a way to filter out results whose score falls below a
given threshold, or to cut the list off at a large gap between scores, or
something similar. But I have no idea how to do this with Lucene...
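
To make the idea concrete, here is a rough sketch in plain Java of the kind
of post-filtering I have in mind. It does not use any Lucene API: the class
name, the method names and the scores are all made up for illustration, and
I am assuming the hits come back sorted by descending score. I don't know
whether comparing raw scores like this is actually meaningful.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Hibernate Search.
public class ScoreFilterSketch {

    // Keep every score that is at least minRatio times the best score.
    // Assumes sortedScores is in descending order.
    static List<Float> keepAboveRatio(float[] sortedScores, float minRatio) {
        List<Float> kept = new ArrayList<Float>();
        if (sortedScores.length == 0) {
            return kept;
        }
        float threshold = sortedScores[0] * minRatio;
        for (float score : sortedScores) {
            if (score < threshold) {
                break; // sorted input: everything after this is lower too
            }
            kept.add(score);
        }
        return kept;
    }

    // Cut the list at the first "gap": stop as soon as a score is less
    // than gapRatio times the score just before it.
    static List<Float> cutAtGap(float[] sortedScores, float gapRatio) {
        List<Float> kept = new ArrayList<Float>();
        for (int i = 0; i < sortedScores.length; i++) {
            if (i > 0 && sortedScores[i] < sortedScores[i - 1] * gapRatio) {
                break;
            }
            kept.add(sortedScores[i]);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up scores, best hit first.
        float[] scores = {4.0f, 3.0f, 2.5f, 1.0f};
        System.out.println(keepAboveRatio(scores, 0.5f)); // [4.0, 3.0, 2.5]
        System.out.println(cutAtGap(scores, 0.8f));       // [4.0]
    }
}
```

The first method is the "threshold" idea (relative to the best hit), the
second is the "score gap" idea. Which ratio to use is a guess on my part.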


What is the best way to achieve what I want?

Thank you for your help!

Cheers,

Alex

--00c09ffb4c71620ba804741ddaf1--