lucene-java-user mailing list archives

From Luis Alves <lafa...@gmail.com>
Subject Re: Question about the extends the query parser to support NumericField on Lucene 2.9.0
Date Tue, 27 Oct 2009 20:43:21 GMT
Hi,

The new query parser has the same restriction.
Since + and - are operators in the Lucene query syntax, you need to escape them
(age:\-32) or use double quotes, as Uwe suggested.
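
As a minimal sketch of the two workarounds, here is how the query string could be built before it is handed to the parser. The class and method names are made up for illustration and are not part of Lucene; for escaping every special character, Lucene's own static QueryParser.escape(String) is probably the better choice.

```java
public class SignedTermEscaper {

    /** Prefixes a leading '+' or '-' with a backslash so the parser
     *  reads the sign as part of the term instead of as an operator. */
    static String escapeLeadingSign(String value) {
        if (value.startsWith("-") || value.startsWith("+")) {
            return "\\" + value;
        }
        return value;
    }

    /** Builds field:value with the sign escaped, e.g. age:\-32 */
    static String escapedClause(String field, String value) {
        return field + ":" + escapeLeadingSign(value);
    }

    /** Builds field:"value", e.g. age:"-32".
     *  Assumes the value itself contains no double quotes. */
    static String quotedClause(String field, String value) {
        return field + ":\"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapedClause("age", "-32")); // prints age:\-32
        System.out.println(quotedClause("age", "-32"));  // prints age:"-32"
    }
}
```

Either string can then be passed to parse(); whether the term reaches newTermQuery or the phrase/field query path depends on the parser, so it is worth checking both forms against your overrides.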

We plan to add query parser extensions to the new query parser in
contrib in the near future.
This would allow users to extend parts of the syntax without having
to rewrite the query parser.

Another option with the new query parser is to create a
QueryNodeProcessor class
that undoes the parsing for nodes where the field name is "age".
This is super easy; if you are interested, I can post the code here.
But you have to use the new query parser that is in contrib and include
that jar in your classpath.



Uwe Schindler wrote:
> If you look into the testcase I provided with my QueryParser example, you
> will see, that the negative numbers have a problem in newTermQuery.
>
> "-" is a control character in QueryParser, which means to do a "NOT" on this
> term. Because of this the syntax of the query is wrong. To hit the negative
> number there is no way around putting the number in quotes: age:"-32":
>
> http://www.lucidimagination.com/search/document/ef7a9dc1444c9d28/how_do_you_properly_use_numericfield#de054d728e252174
>
> Sorry, I see no other solution without changing the query parser JavaCC
> syntax. Maybe the new Contrib QueryParser will handle this better in future
> (there is an open issue about that).
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: java8964 java8964 [mailto:java8964@hotmail.com]
>> Sent: Thursday, October 22, 2009 11:56 PM
>> To: java-user@lucene.apache.org
>> Subject: Question about the extends the query parser to support
>> NumericField on Lucene 2.9.0
>>
>>
>> Hi, I have a problem getting the query parser to support NumericField.
>>
>> My environment is like this:
>>
>> Windows XP with
>> C:\work\> java -version
>> java version "1.6.0_10"
>> Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
>> Java HotSpot(TM) Client VM (build 11.0-b15, mixed mode, sharing)
>>
>> I am using the lucene 2.9.0 releases.
>>
>> I wrote my own query parser class to support numeric fields; here is a copy
>> of the overridden methods:
>>
>>     /**
>>      * Create a new range query for the query parser.
>>      *
>>      * If the field is a numeric field, return a NumericRangeQuery;
>>      * otherwise, let the super class handle it.
>>      *
>>      * @param fieldName The field name
>>      * @param part1 The lower bound
>>      * @param part2 The upper bound
>>      * @throws IllegalArgumentException if the field type is not supported
>>      * @throws NumberFormatException if the query data does not match
>>      *         the field type
>>      */
>>     @Override
>>     protected Query newRangeQuery(String fieldName, String part1, String
>> part2, boolean inclusive)
>>     {
>>         fieldName = fieldName.toLowerCase();
>>         if (LogUtil.getInstance().isDebugEnabled(DcQueryParser.class))
>>         {
>>             LogUtil.getInstance().debug(DcQueryParser.class,
>>                     "Create a new range query for: " + fieldName);
>>         }
>>
>>         mFieldNames.add(fieldName);
>>         IFieldDefinition fieldDef =
>> mIndexDef.getFieldDefinition(fieldName);
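>>         // Trim first so substring(1) drops the '+' itself, not leading whitespace.
>>         part1 = part1.trim();
>>         part2 = part2.trim();
>>         if (part1.startsWith("+"))
>>         {
>>             part1 = part1.substring(1);
>>         }
>>         if (part2.startsWith("+"))
>>         {
>>             part2 = part2.substring(1);
>>         }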
>>         if (fieldDef != null && fieldDef.isNumericField())
>>         {
>>             if (fieldDef.getFieldType() == IFieldDefinition.FieldType.INT)
>>             {
>>                 return NumericRangeQuery.newIntRange(fieldDef.getName(),
>> Integer.parseInt(part1), Integer.parseInt(part2), inclusive, inclusive);
>>             }
>>             else if (fieldDef.getFieldType() ==
>> IFieldDefinition.FieldType.LONG)
>>             {
>>                   return
>> NumericRangeQuery.newLongRange(fieldDef.getName(), Long.parseLong(part1),
>> Long.parseLong(part2), inclusive, inclusive);
>>             }
>>             else if (fieldDef.getFieldType() ==
>> IFieldDefinition.FieldType.FLOAT)
>>             {
>>                    return
>> NumericRangeQuery.newFloatRange(fieldDef.getName(),
>> Float.parseFloat(part1), Float.parseFloat(part2), inclusive, inclusive);
>>             }
>>             else if (fieldDef.getFieldType() ==
>> IFieldDefinition.FieldType.DOUBLE)
>>             {
>>                    return
>> NumericRangeQuery.newDoubleRange(fieldDef.getName(),
>> Double.parseDouble(part1), Double.parseDouble(part2), inclusive,
>> inclusive);
>>             }
>>             else
>>             {
>>                 throw new IllegalArgumentException("Unsupported new
>> Numeric field type, as the type is: " + fieldDef.getFieldType().name());
>>             }
>>         }
>>         else
>>         {
>>             return super.newRangeQuery(fieldName, part1, part2,
>> inclusive);
>>         }
>>     }
>>
>>     /**
>>      * Create a new term query for the query parser.
>>      * If the field is a numeric field, use xxxToPrefixCoded;
>>      * otherwise, let the super class handle it.
>>      *
>>      * @param term The term object
>>      * @return The query object
>>      * @throws IllegalArgumentException if the field type is not supported
>>      * @throws NumberFormatException if the query data does not match
>>      *         the field type
>>      */
>>     @Override
>>     protected Query newTermQuery(Term term)
>>     {
>>         System.out.println("......................1");
>>         String fieldName = term.field();
>>         if (LogUtil.getInstance().isDebugEnabled(DcQueryParser.class))
>>         {
>>             LogUtil.getInstance().debug(DcQueryParser.class,
>>                     "Create a new term query for: " + fieldName);
>>         }
>>
>>         mFieldNames.add(fieldName);
>>         IFieldDefinition fieldDef =
>> mIndexDef.getFieldDefinition(fieldName);
>>         if (fieldDef != null && fieldDef.isNumericField())
>>         {
>>             System.out.println("......................2");
>>             String queryString = term.text().trim();
>>             if (queryString.startsWith("+"))
>>             {
>>                 // Strings are immutable: keep the result of substring().
>>                 queryString = queryString.substring(1);
>>             }
>>             if (fieldDef.getFieldType() == IFieldDefinition.FieldType.INT)
>>             {
>>                 return new TermQuery(new Term(term.field(),
>> NumericUtils.intToPrefixCoded(Integer.parseInt(queryString))));
>>             }
>>             else if (fieldDef.getFieldType() ==
>> IFieldDefinition.FieldType.LONG)
>>             {
>>                 return new TermQuery(new Term(term.field(),
>> NumericUtils.longToPrefixCoded(Long.parseLong(queryString))));
>>             }
>>             else if (fieldDef.getFieldType() ==
>> IFieldDefinition.FieldType.FLOAT)
>>             {
>>                    return new TermQuery(new Term(term.field(),
>> NumericUtils.floatToPrefixCoded(Float.parseFloat(queryString))));
>>             }
>>             else if (fieldDef.getFieldType() ==
>> IFieldDefinition.FieldType.DOUBLE)
>>             {
>>                    return new TermQuery(new Term(term.field(),
>> NumericUtils.doubleToPrefixCoded(Double.parseDouble(queryString))));
>>             }
>>             else
>>             {
>>                 throw new IllegalArgumentException("Unsupported new
>> Numeric field type, as the type is: " + fieldDef.getFieldType().name());
>>             }
>>         }
>>         else
>>         {
>>             return super.newTermQuery(term);
>>         }
>>     }
>>
>> In my case, the range query works as expected. The problem I have now is
>> with the field query.
>>
>> Here is my unit test:
>>
>> I indexed one line of data as follows:
>> operation,user_id,city,province,country,age,isbn,title,author,pub_year,pub_name,rating
>> A,56,cheyenne,wyoming,usa,-32,671623249,LONESOME DOVE,Larry McMurtry,1986,Pocket,7.0
>>
>> To keep my case simple, I only set age as type int.
>> Right before I add the field to the document, I have the following
>> statement to check the output:
>>
>>             if (fieldDef.isNumericField())
>>             {
>>                 System.out.println("Add the numeric field for name: " +
>> fieldDef.getName() + " and value is " + docFieldValue);
>>                 NumericField numField = new
>> NumericField(fieldDef.getName(), Field.Store.YES, true);
>>                 numField.setLongValue(Long.parseLong(docFieldValue));
>>                 doc.add(numField);
>>             }
>>
>> which prints the following message in my console:
>> ------------------->  Add the numeric field for name: age and value is -32
>>
>> which proves that I add one numeric field object into the document, the
>> name is 'age', and the value is '-32'.
>>
>> here is my junit test case:
>>         IndexSearcher searcher = new IndexSearcher(new
>> SimpleFSDirectory(indexDir), true);
>>         // The default analyzer is the standard analyzer in this case.
>>         MyQueryParser queryParser = new MyQueryParser("age",
>> defaultAnalyzer);
>>         TopDocs docs = searcher.search(queryParser.parse("age:-32"), 10);
>>         Assert.assertTrue(docs.totalHits == 1);
>>
>> I expect it will pass, but it gives me back the following error message:
>>
>>     [junit] Testcase: testBuildIndex took 9.516 sec
>>     [junit]     Caused an ERROR
>>     [junit] Cannot parse 'age:-32': Encountered " "-" "- "" at line 1,
>> column 4.
>>     [junit] Was expecting one of:
>>     [junit]     "(" ...
>>     [junit]     "*" ...
>>     [junit]     <QUOTED> ...
>>     [junit]     <TERM> ...
>>     [junit]     <PREFIXTERM> ...
>>     [junit]     <WILDTERM> ...
>>     [junit]     "[" ...
>>     [junit]     "{" ...
>>     [junit]     <NUMBER> ...
>>     [junit]
>>     [junit] org.apache.lucene.queryParser.ParseException: Cannot parse
>> 'age:-32': Encountered " "-" "- "" at line 1, column 4.
>>     [junit] Was expecting one of:
>>     [junit]     "(" ...
>>     [junit]     "*" ...
>>     [junit]     <QUOTED> ...
>>     [junit]     <TERM> ...
>>     [junit]     <PREFIXTERM> ...
>>     [junit]     <WILDTERM> ...
>>     [junit]     "[" ...
>>     [junit]     "{" ...
>>     [junit]     <NUMBER> ...
>>     [junit]
>>     [junit]     at
>> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:181)
>>     [junit]     at
>> nokia.dc.server.build.index.IndexBuilderTest.testBuildIndex(IndexBuilderTe
>> st.java:236)
>>     [junit] Caused by: org.apache.lucene.queryParser.ParseException:
>> Encountered " "-" "- "" at line 1, column 4.
>>     [junit] Was expecting one of:
>>     [junit]     "(" ...
>>     [junit]     "*" ...
>>     [junit]     <QUOTED> ...
>>     [junit]     <TERM> ...
>>     [junit]     <PREFIXTERM> ...
>>     [junit]     <WILDTERM> ...
>>     [junit]     "[" ...
>>     [junit]     "{" ...
>>     [junit]     <NUMBER> ...
>>     [junit]
>>     [junit]     at
>> org.apache.lucene.queryParser.QueryParser.generateParseException(QueryPars
>> er.java:1822)
>>     [junit]     at
>> org.apache.lucene.queryParser.QueryParser.jj_consume_token(QueryParser.jav
>> a:1704)
>>     [junit]     at
>> org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1331)
>>     [junit]     at
>> org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1241)
>>     [junit]     at
>> org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1
>> 230)
>>     [junit]     at
>> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:176)
>>
>> My question is about supporting the field query in this case. As I said,
>> the range query works as I expect.
>>
>> The questions are:
>>
>> 1) The above proves that I set the field name to 'age', which matches the
>> query field name I put in the query parser. Why did I get the above error?
>> 2) I overrode the newTermQuery method, and I thought it should be
>> invoked in this case. As you can see, I print a line with System.out in
>> the first statement. But before the above error showed up, I didn't see
>> that line in the output, which means the execution did not reach the
>> newTermQuery method when the error happened.
>> 3) I did the above because I saw a discussion about the same topic a few
>> days ago, so I basically copied the idea from Uwe Schindler's code.
>> My more general question is: when should we override the newXXX()
>> methods, and when should we override the getXXX() methods? What is the
>> difference between them?
>> 4) As you can see in my example above, we want to support query strings
>> for numeric fields with a '+' in them. Java does not accept it and throws
>> NumberFormatException, but my case needs to support it. So I remove
>> it from the query string and then pass it on. I would like to know
>> whether it will cause a ParseException before it reaches my overridden
>> methods.
>> 5) For these numeric field features, the query parser methods I override
>> do NOT declare ParseException in their signatures. I want to catch
>> NumberFormatException and rethrow it as ParseException, so my client only
>> needs to worry about ParseException. But ParseException is a checked
>> exception, and I can NOT add it to the overridden method signature. Any
>> workaround?
>>
>> Thanks for your kind help.
>>
>>
>>
>>
>>     
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>   



