lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Numerical Range Query
Date Mon, 12 May 2008 20:33:19 GMT
Are you using NumberTools both at index and query time? Because
this works exactly as I expect....

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumberTools;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.ConstantScoreRangeQuery;

import java.io.IOException;

/**
 * Created by: eoericks
 * Date: May 12, 2008
 * History: $Log$
 */
public class Test {
    public static void main(String args[]) {
        try {
            Test test = new Test();
            test.doIndex();
            test.doSearch();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    private void doIndex() throws IOException {

        // create == true: start from an empty index on every run
        IndexWriter w = new IndexWriter(FSDirectory.getDirectory("C:/lucidx"),
                new StandardAnalyzer(), true);

        // "num" is encoded with NumberTools and indexed as a single untokenized term
        Document doc = new Document();
        doc.add(new Field("num", NumberTools.longToString(1),
                Field.Store.NO, Field.Index.UN_TOKENIZED));
        doc.add(new Field("name", "doc 1",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        w.addDocument(doc);

        doc = new Document();
        doc.add(new Field("num", NumberTools.longToString(11),
                Field.Store.NO, Field.Index.UN_TOKENIZED));
        doc.add(new Field("name", "doc 11",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        w.addDocument(doc);

        doc = new Document();
        doc.add(new Field("num", NumberTools.longToString(5),
                Field.Store.NO, Field.Index.UN_TOKENIZED));
        doc.add(new Field("name", "doc 5",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        w.addDocument(doc);

        doc = new Document();
        doc.add(new Field("num", NumberTools.longToString(9),
                Field.Store.NO, Field.Index.UN_TOKENIZED));
        doc.add(new Field("name", "doc 9",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        w.addDocument(doc);

        w.close();

    }

    private void doSearch() throws IOException {
        // same directory that doIndex() wrote to
        IndexSearcher r = new IndexSearcher(FSDirectory.getDirectory("C:/lucidx"));
        oneSearch(r, 1L);
        oneSearch(r, 2L);
        oneSearch(r, 5L);
        oneSearch(r, 9L);
        oneSearch(r, 0L);
        r.close();

    }
    private void oneSearch(IndexSearcher r, Long lower) throws IOException {
        System.out.println("\n\nSearching for greater than " + Long.toString(lower));
        // upper term null = open-ended range; includeLower == false makes it
        // strictly "greater than"
        Hits hits = r.search(new ConstantScoreRangeQuery("num",
                NumberTools.longToString(lower), null, false, true));
        for (int idx = 0; idx < hits.length(); ++idx) {
            System.out.println(hits.doc(idx).get("name"));
        }

    }
}


***output***

Searching for greater than 1
doc 11
doc 5
doc 9


Searching for greater than 2
doc 11
doc 5
doc 9


Searching for greater than 5
doc 11
doc 9


Searching for greater than 9
doc 11


Searching for greater than 0
doc 1
doc 11
doc 5
doc 9
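
In case it helps to see why the padding matters without involving Lucene at all, here is a rough sketch in plain Java. The class PaddedLongSketch and its encode() and greaterThan() methods are made up for illustration. encode() only imitates what I understand NumberTools.longToString to produce for non-negative values (a '0' prefix followed by 13 base-36 digits, which matches the "0000000000000b" you saw for 11); it is not the Lucene implementation and it ignores negative numbers. The sorted set stands in for the term dictionary, and tailSet(..., false) plays the role of an open-ended range with an exclusive lower bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PaddedLongSketch {
    // Assumption: 13 base-36 digits (enough for Long.MAX_VALUE) plus one prefix char.
    private static final int DIGITS = 13;

    /** Hypothetical stand-in for NumberTools.longToString; non-negative values only. */
    static String encode(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        String digits = Long.toString(n, 36);
        StringBuilder sb = new StringBuilder("0"); // prefix used for non-negative values
        for (int i = digits.length(); i < DIGITS; i++) {
            sb.append('0'); // left-pad so every term has the same width
        }
        return sb.append(digits).toString();
    }

    /** Values whose encoded term sorts strictly after encode(lower). */
    static List<Long> greaterThan(TreeSet<String> terms, long lower) {
        List<Long> out = new ArrayList<Long>();
        for (String term : terms.tailSet(encode(lower), false)) {
            out.add(Long.parseLong(term, 36)); // leading zeros do not change the value
        }
        return out;
    }

    public static void main(String[] args) {
        // Unpadded strings compare character by character, so "11" sorts before "5".
        System.out.println("11".compareTo("5") < 0);
        // With fixed-width terms the string order agrees with the numeric order.
        System.out.println(encode(11).compareTo(encode(5)) > 0);

        TreeSet<String> terms = new TreeSet<String>();
        for (long n : new long[] {1, 11, 5, 9}) {
            terms.add(encode(n));
        }
        System.out.println(greaterThan(terms, 5)); // the values above 5, in term order
    }
}
```

The point is only that the same encoding has to be applied to the indexed value and to the query bound; if either side is a plain decimal string, the comparison falls back to character order.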


On Mon, May 12, 2008 at 3:21 PM, Dan Hardiker <dhardiker@adaptavist.com>
wrote:

> Erick Erickson wrote:
>
> > Although I'm a bit puzzled by what you're actually getting back.
> > You might try using Luke to look at your index to see what's
> > there.
> >
>
> I've looked through with Luke and it doesn't look like much has changed
> between using NumberTools and not. NumberTools definitely does some padding
> which makes sense, however even though I'm using that, Lucene or Luke seems
> to be boiling it down to just the number. I'm not sure which.
>
> > See the NumberTools class for some help here.......
> >
> > BTW, at least in Lucene 2.1, the preferred way to go about this
> > would be ConstantScoreRangeQuery...
> >
>
> Taking your advice I'm now indexing using:
>
> document.add( new Field(RateUtils.SF_FILTERED_CNT,
> NumberTools.longToString( filteredCount ), Field.Store.YES,
> Field.Index.UN_TOKENIZED) );
>
> and searching using:
>
> int minRates = Long.valueOf( minRatesString ).intValue();
> luceneQuery.add( new ConstantScoreRangeQuery( RateUtils.SF_FILTERED_CNT,
> NumberTools.longToString(minRates), "", true, false ),
> BooleanClause.Occur.MUST );
>
> I get very odd results back now, but they seem to work similarly. The
> documentation for ConstantScoreRangeQuery is rather thin however I did find
> this example which suggests I'm doing the right thing:
>
>
> http://github.com/we4tech/semantic-repository/tree/master/development/idea-repository-core/src/main/java/com/ideabase/repository/core/index/ExtendedQueryParser.java
>
> The code _looks_ like it should work, it makes sense logically but it
> still doesn't do what I'm expecting.
>
> I've tried changing the indexing over to Field.Index.NO_NORMS and it makes
> the field value "0000000000000b" instead of "11", and "00000000000002"
> instead of "2" ... but that meant that the searching didn't pick up on that
> field _at all_.
>
> Surely "find me results where numeric field x is higher than y" can't be
> an uncommon request? I can think of many areas where you want to do that
> (age filtering for example).
>
> Any other suggestions of what I should be looking for, or where I can look
> to find out the next step to take?
>
>
> --
> Dan Hardiker
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
