lucene-java-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject Question about the extends the query parser to support NumericField on Lucene 2.9.0
Date Thu, 22 Oct 2009 21:56:24 GMT

Hi, I am having trouble getting NumericField support to work in the query parser.

My environment is like this:

Windows XP, with the following Java version:
C:\work\> java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)

I am using the Lucene 2.9.0 release.

I wrote my own query parser class to support numeric fields. Here is a copy of the overridden
methods:

    /**
     * Create a new range query for the query parser.
     * 
     * If the field is a numeric field, return a NumericRangeQuery;
     * otherwise, let the super class handle it.
     * 
     * @param fieldName The field name
     * @param part1 The lower bound
     * @param part2 The upper bound
     * @param inclusive Whether both bounds are inclusive
     * @return The query object
     * @throws IllegalArgumentException if the field type is not supported
     * @throws NumberFormatException if the query data does not match the field type
     */
    @Override
    protected Query newRangeQuery(String fieldName, String part1, String part2, boolean inclusive)
    {
        fieldName = fieldName.toLowerCase();
        if (LogUtil.getInstance().isDebugEnabled(DcQueryParser.class))
        {
            LogUtil.getInstance().debug(DcQueryParser.class,
                    "Create a new range query for: " + fieldName);
        }

        mFieldNames.add(fieldName);
        IFieldDefinition fieldDef = mIndexDef.getFieldDefinition(fieldName);
        // Trim first, so that the '+' check and the substring work on the same text.
        part1 = part1.trim();
        if (part1.startsWith("+"))
        {
            part1 = part1.substring(1);
        }
        part2 = part2.trim();
        if (part2.startsWith("+"))
        {
            part2 = part2.substring(1);
        }
        if (fieldDef != null && fieldDef.isNumericField())
        {
            if (fieldDef.getFieldType() == IFieldDefinition.FieldType.INT)
            {
                return NumericRangeQuery.newIntRange(fieldDef.getName(),
                        Integer.parseInt(part1), Integer.parseInt(part2), inclusive, inclusive);
            }
            else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.LONG)
            {
                return NumericRangeQuery.newLongRange(fieldDef.getName(),
                        Long.parseLong(part1), Long.parseLong(part2), inclusive, inclusive);
            }
            else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.FLOAT)
            {
                return NumericRangeQuery.newFloatRange(fieldDef.getName(),
                        Float.parseFloat(part1), Float.parseFloat(part2), inclusive, inclusive);
            }
            else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.DOUBLE)
            {
                return NumericRangeQuery.newDoubleRange(fieldDef.getName(),
                        Double.parseDouble(part1), Double.parseDouble(part2), inclusive, inclusive);
            }
            else
            {
                throw new IllegalArgumentException("Unsupported new Numeric field type, as the type is: "
                        + fieldDef.getFieldType().name());
            }
        }
        else
        {
            return super.newRangeQuery(fieldName, part1, part2, inclusive);
        }
    }
    
    /**
     * Create a new term query for the query parser.
     * If the field is a numeric field, use xxxToPrefixCoded;
     * otherwise, let the super class handle it.
     * 
     * @param term The term object
     * @return The query object
     * @throws IllegalArgumentException if the field type is not supported
     * @throws NumberFormatException if the query data does not match the field type
     */
    @Override
    protected Query newTermQuery(Term term)
    {
        System.out.println("......................1");
        String fieldName = term.field();
        if (LogUtil.getInstance().isDebugEnabled(DcQueryParser.class))
        {
            LogUtil.getInstance().debug(DcQueryParser.class,
                    "Create a new term query for: " + fieldName);
        }

        mFieldNames.add(fieldName);
        IFieldDefinition fieldDef = mIndexDef.getFieldDefinition(fieldName);
        if (fieldDef != null && fieldDef.isNumericField())
        {
            System.out.println("......................2");
            String queryString = term.text().trim();
            if (queryString.startsWith("+"))
            {
                // Strings are immutable, so the result must be assigned back.
                queryString = queryString.substring(1);
            }
            if (fieldDef.getFieldType() == IFieldDefinition.FieldType.INT)
            {
                return new TermQuery(new Term(term.field(),
                        NumericUtils.intToPrefixCoded(Integer.parseInt(queryString))));
            }
            else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.LONG)
            {
                return new TermQuery(new Term(term.field(),
                        NumericUtils.longToPrefixCoded(Long.parseLong(queryString))));
            }
            else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.FLOAT)
            {
                return new TermQuery(new Term(term.field(),
                        NumericUtils.floatToPrefixCoded(Float.parseFloat(queryString))));
            }
            else if (fieldDef.getFieldType() == IFieldDefinition.FieldType.DOUBLE)
            {
                return new TermQuery(new Term(term.field(),
                        NumericUtils.doubleToPrefixCoded(Double.parseDouble(queryString))));
            }
            else
            {
                throw new IllegalArgumentException("Unsupported new Numeric field type, as the type is: "
                        + fieldDef.getFieldType().name());
            }
        }
        else
        {
            return super.newTermQuery(term);
        }
    }

In my case, range queries work as expected. The problem I have now is with field (term) queries.

Here is my unit test.

I indexed one line of data, as follows:
operation,user_id,city,province,country,age,isbn,title,author,pub_year,pub_name,rating
A,56,cheyenne,wyoming,usa,-32,671623249,LONESOME DOVE,Larry McMurtry,1986,Pocket,7.0

To keep my case simple, I only set age as type int.
Right before I add the field to the document, I have the following statement, which prints
what is being added:

            if (fieldDef.isNumericField())
            {
                System.out.println("Add the numeric field for name: " + fieldDef.getName()
                        + " and value is " + docFieldValue);
                NumericField numField = new NumericField(fieldDef.getName(), Field.Store.YES, true);
                numField.setLongValue(Long.parseLong(docFieldValue));
                doc.add(numField);
            }

which prints the following message on my console:
------------------->  Add the numeric field for name: age and value is -32

This proves that I added one numeric field object to the document, with the name 'age' and
the value '-32'.

Here is my JUnit test case:
        IndexSearcher searcher = new IndexSearcher(new SimpleFSDirectory(indexDir), true);
        // The default analyzer is the standard analyzer in this case.
        MyQueryParser queryParser = new MyQueryParser("age", defaultAnalyzer);
        TopDocs docs = searcher.search(queryParser.parse("age:-32"), 10);
        Assert.assertTrue(docs.totalHits == 1);

I expected it to pass, but it fails with the following error message:

    [junit] Testcase: testBuildIndex took 9.516 sec
    [junit]     Caused an ERROR
    [junit] Cannot parse 'age:-32': Encountered " "-" "- "" at line 1, column 4.
    [junit] Was expecting one of:
    [junit]     "(" ...
    [junit]     "*" ...
    [junit]     <QUOTED> ...
    [junit]     <TERM> ...
    [junit]     <PREFIXTERM> ...
    [junit]     <WILDTERM> ...
    [junit]     "[" ...
    [junit]     "{" ...
    [junit]     <NUMBER> ...
    [junit]
    [junit] org.apache.lucene.queryParser.ParseException: Cannot parse 'age:-32': Encountered " "-" "- "" at line 1, column 4.
    [junit] Was expecting one of:
    [junit]     "(" ...
    [junit]     "*" ...
    [junit]     <QUOTED> ...
    [junit]     <TERM> ...
    [junit]     <PREFIXTERM> ...
    [junit]     <WILDTERM> ...
    [junit]     "[" ...
    [junit]     "{" ...
    [junit]     <NUMBER> ...
    [junit]
    [junit]     at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:181)
    [junit]     at nokia.dc.server.build.index.IndexBuilderTest.testBuildIndex(IndexBuilderTest.java:236)
    [junit] Caused by: org.apache.lucene.queryParser.ParseException: Encountered " "-" "- "" at line 1, column 4.
    [junit] Was expecting one of:
    [junit]     "(" ...
    [junit]     "*" ...
    [junit]     <QUOTED> ...
    [junit]     <TERM> ...
    [junit]     <PREFIXTERM> ...
    [junit]     <WILDTERM> ...
    [junit]     "[" ...
    [junit]     "{" ...
    [junit]     <NUMBER> ...
    [junit]
    [junit]     at org.apache.lucene.queryParser.QueryParser.generateParseException(QueryParser.java:1822)
    [junit]     at org.apache.lucene.queryParser.QueryParser.jj_consume_token(QueryParser.java:1704)
    [junit]     at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1331)
    [junit]     at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1241)
    [junit]     at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1230)
    [junit]     at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:176)
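
One possible reading of this trace, offered as a sketch rather than a confirmed diagnosis:
in the classic query syntax a leading '-' is the prohibit operator, so 'age:-32' may be
rejected by the grammar (the failure is inside Clause) before any overridden method runs.
Escaping the sign with a backslash is one way to turn the value into an ordinary term. The
helper below is hypothetical (the class and method names are not from Lucene) and uses only
the JDK; Lucene itself provides the static QueryParser.escape(String), which escapes all
special characters.

```java
// Hypothetical helper, not part of Lucene: escapes a leading sign in the
// value part of a "field:value" clause so the query syntax does not read
// it as the '+' (required) or '-' (prohibited) operator.
final class SignEscaper
{
    private SignEscaper() {}

    static String escapeLeadingSign(String field, String value)
    {
        String text = value.trim();
        if (text.startsWith("-") || text.startsWith("+"))
        {
            // A backslash turns the operator character into a literal one.
            text = "\\" + text;
        }
        return field + ":" + text;
    }
}
```

With this, escapeLeadingSign("age", "-32") yields the query text age:\-32. Whether the sign
then survives analysis depends on the analyzer in use, so the numeric conversion may be
safer in a getFieldQuery override, which (as far as I understand the 2.9 code) receives the
unescaped text before it is analyzed.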

My question is about supporting field queries in this case. As I said, range queries work
as I expect.

My questions are:

1) The output above proves that I set the field name to 'age', which matches the field name
I used in the query string. Why do I get the above error?
2) I overrode the newTermQuery method, and I thought it would be invoked in this case.
As you can see, I print a line with System.out in its first statement. But that line is never
printed before the error shows up, which means that execution does not reach the newTermQuery
method when the error happens.
3) I did the above because I saw a discussion about the same topic a few days ago, so
I basically copied the idea from Uwe Schindler's code.
My more general question is: when should we override the newXXX() methods, and when should
we override the getXXX() methods? What is the difference between them?
4) As you can see in my example above, we want to support query strings for numeric fields
with a leading '+'. Java (1.6) does not accept it and throws a NumberFormatException, but my case
needs to support it, so I remove the '+' from the query text and then pass it on to the super class.
I would like to know whether the '+' can cause a ParseException before it reaches my overridden methods.
5) For these numeric field features, the query parser methods I override do NOT declare ParseException
in their signatures. I would like to catch NumberFormatException and rethrow it as a ParseException,
so that my client only needs to handle ParseException. But ParseException is a checked exception,
and I can NOT add it to the signature of the overriding method. Is there any workaround?
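
To make questions 4 and 5 concrete, here is a minimal sketch of the behaviour I am after,
under stated assumptions: strip an optional leading '+' and report bad numbers as a checked
exception. As far as I can tell from the 2.9 API, getFieldQuery and getRangeQuery declare
"throws ParseException" while the newXXX factory methods do not, so a getXXX override looks
like a place where such a conversion would be allowed. The class below is hypothetical and
JDK-only; java.text.ParseException is only a stand-in for
org.apache.lucene.queryParser.ParseException so that the example compiles without Lucene.

```java
import java.text.ParseException;

// Hypothetical helper, not part of Lucene. java.text.ParseException is a
// stand-in for Lucene's checked ParseException.
final class NumericText
{
    private NumericText() {}

    // Removes surrounding whitespace and one optional leading '+'.
    static String stripPlus(String raw)
    {
        String text = raw.trim();
        return text.startsWith("+") ? text.substring(1) : text;
    }

    // Parses an int, reporting bad input as a checked exception
    // instead of the unchecked NumberFormatException.
    static int parseInt(String raw) throws ParseException
    {
        try
        {
            return Integer.parseInt(stripPlus(raw));
        }
        catch (NumberFormatException e)
        {
            throw new ParseException("Not a valid int: '" + raw + "'", 0);
        }
    }
}
```

Another option, if the conversion has to stay inside a newXXX method, might be to throw an
unchecked wrapper there and translate it into a ParseException in the code that calls
parse(); I have not tried that.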

Thanks for your kind help.



 		 	   		  