lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Bug? QueryParser may not correctly interpret RangeQuery text
Date Wed, 05 Jun 2002 05:17:24 GMT
Hello,

Just curious what the status of this issue is, as the discussion seems
to have stopped.

--- "Eric D. Friedman" <eric@conveysoftware.com> wrote:
> Instead of reinventing the wheel for representing dates, how about
> using an existing standard?  ISO 8601 defines a simple lexical
> representation for dates, times (with optional millisecond precision),
> and timezones that is easy to implement.  This is what's used in the
> XML Schema "dateTime" datatype.
> 
> A summary of the ISO 8601 notation is available here:
> http://www.cl.cam.ac.uk/~mgk25/iso-time.html
> 
> The documentation for the XML Schema dateTime datatype is here:
> http://www.w3.org/TR/xmlschema-2/#dateTime

I agree, that is why I immediately suggested YYYY-MM-DD.  I dislike
U.S.-centric or Europe-centric approaches when there is a standard
format.
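
One practical point in favor of YYYY-MM-DD for range queries: if the terms are compared as plain strings (an assumption about how the range would be evaluated, not something verified against the code here), ISO-style dates sort chronologically with no extra work, while MM/DD/YYYY dates do not. A minimal sketch, with class and method names made up for illustration:

```java
import java.util.Arrays;

public class IsoDateOrder {
    // True when lower <= value <= upper under plain String comparison.
    // No date parsing happens here; that is the point of the sketch.
    public static boolean inRange(String value, String lower, String upper) {
        return lower.compareTo(value) <= 0 && value.compareTo(upper) <= 0;
    }

    // Returns a lexicographically sorted copy of the given strings.
    public static String[] sorted(String... dates) {
        String[] copy = dates.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // ISO-style dates: string order matches chronological order.
        System.out.println(Arrays.toString(
            sorted("2002-06-05", "2001-12-31", "2002-01-15")));
        System.out.println(inRange("2002-06-02", "2002-06-01", "2002-06-05"));

        // U.S.-style dates: string order puts June 2002 before December 2001.
        System.out.println(Arrays.toString(sorted("12/31/2001", "06/05/2002")));
    }
}
```

This only holds for fixed-width, zero-padded values, so years beyond 9999 or negative years would need separate handling.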

> I whipped up a JavaCC parser to handle this lexical representation
> (see attachment).
> 
> Note that for this to be useful in QueryParser, it's going to need its
> own lexical state.  This makes sense anyway, since it would be a
> mistake to have the query syntax infer magical properties about
> strings that appear to be dates.  Better is to have a keyword in the
> query syntax that introduces a date value:  something like
> date(<VALUE>) would work.  So would to_date(<VALUE>) for those who
> know SQL. I would have suggested date:<VALUE> but I think that already
> means something in the QueryParser's lexical specification. (I don't
> actually use QueryParser because the patches I've submitted previously
> haven't made it in yet, and until they do, QP is fatally crippled for
> my purposes).

I'll try to look for your patches in the archives (if you have the URL
handy, please send it to me), so that I can put them on the TODO list,
if it makes sense to do so.
As for the above comments about the parser, I'm afraid I'm still a
JavaCC neophyte. I don't dislike the date(<VALUE>) approach.  If users
can grasp field:value they shouldn't have a problem with
field:date(value), I think.
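
To make the field:date(value) idea concrete, here is a rough sketch in plain Java rather than JavaCC. It is not the QueryParser grammar and not a proposed patch; the class name, the regular expression, and the restriction to date-only CCYY-MM-DD values are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateClauseSketch {
    // Hypothetical clause shape: field:date(CCYY-MM-DD).
    // This is an illustration only, not actual QueryParser syntax.
    private static final Pattern CLAUSE =
        Pattern.compile("(\\w+):date\\(([^)]*)\\)");

    // Returns the field name, or null if the text is not a date clause.
    public static String fieldOf(String clause) {
        Matcher m = CLAUSE.matcher(clause);
        return m.matches() ? m.group(1) : null;
    }

    // Returns the date inside date(...); rejects anything else.
    public static LocalDate valueOf(String clause) {
        Matcher m = CLAUSE.matcher(clause);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a date clause: " + clause);
        }
        // LocalDate.parse expects the ISO form CCYY-MM-DD and throws
        // DateTimeParseException for anything it cannot read.
        return LocalDate.parse(m.group(2));
    }

    public static void main(String[] args) {
        String clause = "modified:date(2002-06-05)";
        System.out.println(fieldOf(clause) + " -> " + valueOf(clause));
    }
}
```

The explicit date(...) keyword means a plain term such as modified:2002-06-05 is left alone, which is the "no magical inference" property Eric describes.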

Otis


> On Sun, 2 Jun 2002, Peter Carlson wrote:
> 
> > I like this idea of [GOOP:GOOP] as it gives the most flexibility.
> > However, this requires the field to have a known characteristic
> > like a date field, number field or text field, correct? If you just
> > use the static Field.Date, this would require adding a new attribute
> > to the Field class? I like this idea but I don't know the
> > difficulty / backward compatibility issues.
> >
> > If the extra field attribute is too difficult, then I suggest we use
> > the nnnn-nn-nn format method so we can use the pattern to determine
> > the data type.
> >
> > For number fields, should this support only integers, or decimal
> > numbers too?
> >
> > I don't think we should use the : character, because we probably
> > want to support time formats in the date format. Something like
> > 03/01/2001 at 00:01:00. Maybe something like ">" or "|" or even
> > "->" ?
> >
> > Also, inclusive vs. exclusive should be accounted for with the [ vs {
> > characters.  I think this might already be done, but just wanted to
> > throw it out there.
> >
> > --Peter
> >
> >
> > On 6/2/02 2:13 AM, "Brian Goetz" <brian@quiotix.com> wrote:
> >
> > >>> How about:
> > >>>
> > >>>  DATE = nnnn-nn-nn
> > >>>  NUMBER = n*
> > >>>  RANGE = [ DATE : DATE ] | [ NUMBER : NUMBER ]
> > >>>
> > >>> An alternate, less parse-oriented approach would be this:
> > >>>   RANGE = [ GOOP : GOOP ]
> > >>> where
> > >>>   GOOP = any string of letters/numbers not containing : or ].
> > >>
> > >> I'd go for the first one as it's more explicit.  However, perhaps
> > >> the second approach is more extensible?
> > >
> > > When I first did the query parser, I defined terms by inclusion
> > > (stating valid characters) instead of exclusion (excluding
> > > non-term characters).  Turns out I missed quite a few in the first
> > > go-around, which taught me the lesson (again) that sometimes trying
> > > to be too specific is a rat's nest.  What about dates like
> > > 02-Mai-2002 (not a typo, French for May)?  Letting DateFormat
> > > figure it out has some merit.
> > >
> > >> DateField(Date) and NumberField(int) sounds right, but wouldn't
> > >> the Field class make more sense?
> > >
> > > I had in mind static methods of Field, just like Field.Text --
> > > Field.Date, Field.Number.   Sorry if that wasn't clear.  This seems
> > > an easy addition.
> > >
> > > --
> > > To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> > >
> > >
> >
> >
> PARSER_BEGIN(ISO8601Parser)
> 
> import java.io.*;
> import java.util.*;
> import java.text.*;
> 
> public class ISO8601Parser {
> 
>   static DateFormat fmt;
> 
>   public static void main(String args[]) throws ParseException {
>     String date;
> 
>     //date = "1999-05-31T13:20:00Z";
>     //date = "1999-05-31T13:20:00-00:01";
>     date = "1999-05-31T13:20:00.999-08:00";
> 
>     TimeZone utc = TimeZone.getTimeZone("UTC");
>     fmt = DateFormat.getDateTimeInstance();
>     fmt.setTimeZone(utc);
> 
>     ISO8601Parser parser = new ISO8601Parser(new StringReader(date));
>     Date d = parser.date();
>     System.out.println(fmt.format(d));
>   }
> }
> 
> PARSER_END(ISO8601Parser)
> 
> TOKEN :
> {
>   <#DIGIT: ["0"-"9"]>
> | <TWOD: <DIGIT><DIGIT>>         // two digits used for day, month, hours, minutes, seconds
> | <MILLIS: <TWOD><DIGIT>>        // millisecond precision is 000 .. 999
> | <YEAR: <TWOD><TWOD>(<DIGIT>)*> // at least 4 digits, but possibly more
> | <DASH: "-">                    // delimiter for CCYY-MM-DD; doubles as minus sign for signed ints
> | <COLON: ":">                   // delimiter for hh:mm:ss
> | <DOT: ".">                     // delimiter for ss.mmm (milliseconds)
> | <T: "T" >                      // delimiter between date and time
> | <Z: "Z" >                      // UTC timezone
> | <PLUS: "+">                    // indicates positive offset from UTC
> }
> 
> /**
>  * Input to this production is a series of tokens matching the
>  * following specification:
>  * CCYY-MM-DD -- a date with no time specification<br>
>  * CCYY-MM-DDThh:mm:ss -- a timestamp implicitly in the UTC timezone<br>
>  * CCYY-MM-DDThh:mm:ssZ -- a timestamp explicitly in the UTC timezone<br>
>  * CCYY-MM-DDThh:mm:ss-08:00 -- a timestamp with a negative 8 hour
>  * offset from UTC<br>
>  * CCYY-MM-DDThh:mm:ss.mmm -- a timestamp with millisecond precision<br>
>  * -CCYY-MM-DD -- a date whose year is before the common era (BCE)<br>
>  * NNCCYY-MM-DD -- a date whose year is > 9999<br>
>  *
>  * <p> Note that years greater than 9999 are allowed, but that 0000
>  * is not a valid year.
>  * Negative numbers are allowed when representing years BCE.
>  * </p>
>  *
>  * <p>Milliseconds are optional in the seconds field.  The timezone
>  * indicator is optional.
>  * </p>
>  *
>  *@return a java.util.Date instance in the UTC timezone, with
>  * millisecond precision.
>  */
> Date date() :
> {
>   int CCYY = 0, MM = 0, DD = 0, hh = 0, mm = 0, ss = 0, millis = 0;
>   int deltahh = 0, deltamm = 0;
>   boolean deltaPlus = true;
>   Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
> }
> {
>   CCYY = year() <DASH>
>   MM = twod() <DASH>
>   DD = twod()
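>   {
>     // getInstance() pre-fills the current time; clear it so fields the
>     // input omits (time of day, milliseconds) default to zero
>     c.clear();
>     MM--; // months are 0 based
>     c.set(c.YEAR, CCYY);
>     c.set(c.MONTH, MM);
>     c.set(c.DAY_OF_MONTH, DD);
>   }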
>   (
>     <T>
>     hh = twod() <COLON>
>     mm = twod() <COLON>
>     ss = twod()
>     {
>       c.set(c.HOUR_OF_DAY, hh);
>       c.set(c.MINUTE, mm);
>       c.set(c.SECOND, ss);
>     }
>     (
>       <DOT>
>       millis = millis()
>       {
>         c.set(c.MILLISECOND, millis);
>       }
>     )?
>     (
>       <Z> // we're already in UTC, so no adjustment needed
>       |
>       (
>         (
>           <PLUS> // somewhere ahead of UTC (east of Greenwich)
>           |
>           <DASH> // behind UTC (west of Greenwich)
>           {
>             deltaPlus = false;
>           }
>         )
>         deltahh = twod() <COLON>
>         deltamm = twod()
>         {
>           if (! deltaPlus) {
>             deltahh = -deltahh;
>             deltamm = -deltamm;
>           }
>           // millisecond offset
>           int offsetFromUTC = ((deltahh * 60) + deltamm) * 60 * 1000;
>           c.set(c.ZONE_OFFSET, offsetFromUTC);
>         }
>       )
>     )?
>   )?
>   {
>     return c.getTime();
>   }
> }
> 
> int millis() :
> {
>   Token t;
> }
> {
>   t = <MILLIS> {
>     return Integer.parseInt(t.image);
>   }
> }
> 
> int twod() :
> {
>   Token t;
> }
> {
>   t = <TWOD> {
>     return Integer.parseInt(t.image);
>   }
> }
> 
> int year() :
> {
>   Token t;
>   boolean positive = true;
> }
> {
>   (
>     <DASH>
>     {
>       positive = false;
>     }
>   )?
>   t = <YEAR> {
>     int year = Integer.parseInt(t.image);
>     if (year == 0) {
>       throw new IllegalArgumentException("0000 is not a legal year");
>     }
>     return positive ? year : -year;
>   }
> }
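
As a quick sanity check on the offset handling, the test string from main() above can be converted to UTC independently. The sketch below uses java.time, which is not part of the attachment; the class and method names are made up for illustration. A value of 13:20:00.999 at -08:00 should come out as 21:20:00.999 UTC, which gives something to compare the JavaCC parser's result against.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

public class OffsetCheck {
    // Parses an ISO 8601 timestamp that carries a zone designator
    // (Z or +hh:mm / -hh:mm) and returns the same moment in UTC.
    public static Instant toUtc(String isoTimestamp) {
        return OffsetDateTime.parse(isoTimestamp).toInstant();
    }

    public static void main(String[] args) {
        // Same sample value as the attachment's main() method.
        System.out.println(toUtc("1999-05-31T13:20:00.999-08:00"));
        // prints 1999-05-31T21:20:00.999Z
    }
}
```

Unlike the grammar above, OffsetDateTime.parse requires the zone designator, so the date-only and implicit-UTC forms would need a different entry point.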

