lucene-dev mailing list archives

From "Eric D. Friedman" <e...@conveysoftware.com>
Subject Re: Bug? QueryParser may not correctly interpret RangeQuery text
Date Mon, 03 Jun 2002 04:38:11 GMT
Instead of reinventing the wheel for representing dates, how about
using an existing standard?  ISO 8601 defines a simple lexical
representation for dates, times (with optional millisecond precision),
and timezones that is easy to implement.  This is what's used in the
XML Schema "dateTime" datatype.
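To illustrate how the ISO 8601 form could become an indexable term, here is a minimal sketch using java.time (which postdates this thread). It assumes every value carries an explicit offset or "Z". It normalizes each value to a hypothetical fixed-width UTC string (uuuuMMddHHmmssSSS) so that string order matches chronological order. The class and method names are illustrative only.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class Iso8601Sketch {
    // Hypothetical index encoding: fixed-width, zero-padded UTC, millisecond precision.
    // Because every field has a fixed width, lexicographic order equals time order.
    private static final DateTimeFormatter SORTABLE =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmssSSS");

    /**
     * Parses an ISO 8601 / xsd:dateTime value such as "2002-06-03T04:38:11Z"
     * or "2002-06-02T21:38:11.250-07:00". Values without an offset are
     * rejected by OffsetDateTime.parse (it throws DateTimeParseException).
     */
    static String toSortableUtc(String isoDateTime) {
        OffsetDateTime parsed = OffsetDateTime.parse(isoDateTime);
        return parsed.withOffsetSameInstant(ZoneOffset.UTC).format(SORTABLE);
    }

    public static void main(String[] args) {
        System.out.println(toSortableUtc("2002-06-03T04:38:11Z"));          // 20020603043811000
        System.out.println(toSortableUtc("2002-06-02T21:38:11.250-07:00")); // 20020603043811250
    }
}
```

Both sample inputs denote nearly the same instant, so after normalization they differ only in the millisecond digits.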

A summary of the ISO 8601 notation is available here:
http://www.cl.cam.ac.uk/~mgk25/iso-time.html

The documentation for the XML Schema dateTime datatype is here:
http://www.w3.org/TR/xmlschema-2/#dateTime

I whipped up a JavaCC parser to handle this lexical representation (see
attachment).

Note that for this to be useful in QueryParser, it's going to need its
own lexical state.  This makes sense anyway, since it would be a
mistake to have the query syntax infer magical properties about strings
that appear to be dates.  Better is to have a keyword in the query
syntax that introduces a date value:  something like date(<VALUE>)
would work.  So would to_date(<VALUE>) for those who know SQL. I would
have suggested date:<VALUE> but I think that already means something in
the QueryParser's lexical specification. (I don't actually use
QueryParser because the patches I've submitted previously haven't made
it in yet, and until they do, QP is fatally crippled for my purposes.)
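One reading of the date(<VALUE>) proposal is that a range endpoint is treated as a date only when it is explicitly wrapped, never merely because it looks like one. The sketch below uses a regular expression as a stand-in for the dedicated JavaCC lexical state, which is not shown here. The class, the method, and the handling of the to_date alias are illustrative assumptions, not QueryParser API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateKeywordSketch {
    // Hypothetical syntax from the message: date(<VALUE>) or to_date(<VALUE>).
    private static final Pattern DATE_CALL =
            Pattern.compile("(?:date|to_date)\\((.*)\\)");

    /** Returns the wrapped value if the token is an explicit date call, else empty. */
    static Optional<String> explicitDate(String token) {
        Matcher m = DATE_CALL.matcher(token.trim());
        return m.matches() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```

A bare token such as 2002-06-03 yields an empty result, so it would still be handled as an ordinary term.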

Eric

On Sun, 2 Jun 2002, Peter Carlson wrote:

> I like this idea of [GOOP:GOOP] as it gives the most flexibility. However,
> this requires the field to have a known characteristic like a date field,
> number field or text field, correct? If you just use the static Field.Date,
> would this require adding a new attribute to the Field class? I like this idea
> but I don't know the difficulty / backward compatibility issues.
>
> If the extra field attribute is too difficult, then I suggest we use the
> nnnn-nn-nn format method so we can use the pattern to determine the data
> type.
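Peter's fallback of inferring the type from the endpoint's pattern could look roughly like the sketch below. The regular expressions are a literal reading of the nnnn-nn-nn and n* shapes from the proposed grammar, with n* taken to mean at least one digit. The names are hypothetical. Note that 02-Mai-2002 falls through to TEXT, which is the brittleness Brian points out in the quoted message.

```java
import java.util.regex.Pattern;

final class RangeTypeSketch {
    enum Kind { DATE, NUMBER, TEXT }

    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}"); // nnnn-nn-nn
    private static final Pattern NUMBER = Pattern.compile("\\d+");              // n*, at least one digit

    static Kind classify(String endpoint) {
        if (DATE.matcher(endpoint).matches()) return Kind.DATE;
        if (NUMBER.matcher(endpoint).matches()) return Kind.NUMBER;
        return Kind.TEXT;
    }

    /** Both endpoints must agree, mirroring RANGE = [ DATE : DATE ] | [ NUMBER : NUMBER ]. */
    static Kind classifyRange(String lower, String upper) {
        Kind a = classify(lower);
        Kind b = classify(upper);
        return a == b ? a : Kind.TEXT;
    }
}
```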
>
> For number fields, should this support only integers, or decimal numbers
> too?
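On the integer question, plain decimal digits sort incorrectly if a range is evaluated by comparing indexed terms as strings ("10" sorts before "9"). A common workaround for non-negative integers is fixed-width zero padding, sketched below. The width of 10 is an assumption sized for Java's int. Negative or decimal values would need a more elaborate encoding, which this sketch does not attempt.

```java
final class PaddedNumberSketch {
    // Assumed width: Integer.MAX_VALUE (2147483647) has 10 digits.
    private static final int WIDTH = 10;

    /** Zero-pads a non-negative int so that string order matches numeric order. */
    static String encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format("%0" + WIDTH + "d", value);
    }

    public static void main(String[] args) {
        System.out.println("9".compareTo("10") > 0);             // true: unpadded "9" sorts after "10"
        System.out.println(encode(9).compareTo(encode(10)) < 0); // true: padded order is numeric
    }
}
```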
>
> I don't think we should use the : character, because we probably want to
> support time formats in the date format. Something like 03/01/2001 at
> 00:01:00. Maybe something like ">" or "|" or even "->" ?
>
> Also, inclusive vs. exclusive should be accounted for with the [ vs {
> characters.  I think this might already be done, but just wanted to throw it
> out there.
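The [ vs { distinction Peter mentions only changes whether the endpoints themselves match. Below is a minimal sketch. It assumes terms are compared lexicographically with String.compareTo and that the inclusive flag has already been read from the bracket character. The names are illustrative.

```java
final class RangeBoundsSketch {
    /**
     * Lexicographic range test on index terms. inclusive == true models
     * '[' ... ']' (endpoints match); false models '{' ... '}' (endpoints excluded).
     */
    static boolean inRange(String term, String lower, String upper, boolean inclusive) {
        int lo = term.compareTo(lower);
        int hi = term.compareTo(upper);
        return inclusive ? (lo >= 0 && hi <= 0) : (lo > 0 && hi < 0);
    }
}
```

Only terms equal to an endpoint behave differently under the two modes. Terms strictly between the bounds match either way.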
>
> --Peter
>
>
> On 6/2/02 2:13 AM, "Brian Goetz" <brian@quiotix.com> wrote:
>
> >>> How about:
> >>>
> >>>  DATE = nnnn-nn-nn
> >>>  NUMBER = n*
> >>>  RANGE = [ DATE : DATE ] | [ NUMBER : NUMBER ]
> >>>
> >>> An alternate, less parse-oriented approach would be this:
> >>>   RANGE = [ GOOP : GOOP ]
> >>> where
> >>>   GOOP = any string of letters/numbers not containing : or ].
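Below is a literal reading of the second grammar, with GOOP defined by exclusion (any characters other than : and ]). It is sketched as a regular expression rather than a JavaCC production. It also shows the limitation Peter raises above: an endpoint containing a time such as 00:01:00 cannot be expressed, because the colon is reserved as the separator. The names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GoopRangeSketch {
    // GOOP by exclusion: one or more characters that are neither ':' nor ']'.
    private static final Pattern RANGE =
            Pattern.compile("\\[([^:\\]]+):([^:\\]]+)\\]");

    /** Returns {lower, upper} trimmed, or null if the text is not a [ GOOP : GOOP ] range. */
    static String[] parse(String text) {
        Matcher m = RANGE.matcher(text.trim());
        if (!m.matches()) {
            return null;
        }
        String lower = m.group(1).trim();
        String upper = m.group(2).trim();
        // Whitespace-only GOOP would pass the regex, so reject it explicitly.
        return (lower.isEmpty() || upper.isEmpty()) ? null : new String[] { lower, upper };
    }
}
```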
> >>
> >> I'd go for the first one as it's more explicit.  However, perhaps the
> >> second approach is more extensible?
> >
> > When I first did the query parser, I defined terms by inclusion
> > (stating valid characters) instead of exclusion (excluding non-term
> > characters). Turns out I missed quite a few in the first go around,
> > which taught me the lesson (again) that sometimes trying to be too
> > specific is a rat's nest.  What about dates like 02-Mai-2002 (not a
> > typo, French for May)?  Letting DateFormat figure it out has some
> > merit.
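The idea of letting DateFormat figure it out can be illustrated with java.text.SimpleDateFormat and a locale. Under Locale.FRENCH, the same dd-MMM-yyyy pattern accepts French month names. The pattern, the UTC time zone, and the strict (non-lenient) setting are assumptions chosen for this example. The thread leaves open which locales and patterns a query parser should try.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class LocaleDateSketch {
    /** Parses day-month-year text using the given locale's month names; returns null on failure. */
    static Date parse(String text, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy", locale);
        fmt.setLenient(false);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        ParsePosition pos = new ParsePosition(0);
        Date d = fmt.parse(text, pos);
        // Require the whole string to be consumed, not just a prefix.
        return (d != null && pos.getIndex() == text.length()) ? d : null;
    }
}
```

With Locale.ENGLISH, the same input returns null, because "Mai" is not an English month name.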
> >
> >> DateField(Date) and NumberField(int) sound right, but wouldn't the Field
> >> class make more sense?
> >
> > I had in mind static methods of Field, just like Field.Text --
> > Field.Date, Field.Number.   Sorry if that wasn't clear.  This seems
> > an easy addition.
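The static-factory idea (Field.Date, Field.Number alongside Field.Text) and Peter's question about an extra attribute can be combined in a small sketch. SketchField and its methods are hypothetical stand-ins, not Lucene's Field class. The type tag represents the extra attribute a typed range query could consult. The zero-padded encodings are assumptions that handle only non-negative values.

```java
import java.util.Date;

/** Hypothetical stand-in for a field class; not Lucene's actual Field API. */
final class SketchField {
    enum Type { TEXT, DATE, NUMBER }

    final String name;
    final String value; // indexed string form
    final Type type;    // the extra attribute a typed range query would consult

    private SketchField(String name, String value, Type type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }

    static SketchField text(String name, String value) {
        return new SketchField(name, value, Type.TEXT);
    }

    static SketchField date(String name, Date value) {
        long millis = value.getTime();
        if (millis < 0) throw new IllegalArgumentException("pre-1970 dates not handled");
        // Assumed encoding: milliseconds since the epoch, zero-padded to 13 digits.
        return new SketchField(name, String.format("%013d", millis), Type.DATE);
    }

    static SketchField number(String name, int value) {
        if (value < 0) throw new IllegalArgumentException("negative numbers not handled");
        return new SketchField(name, String.format("%010d", value), Type.NUMBER);
    }
}
```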
> >
> > --
> > To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> >
> >
>
>
>
