lucene-dev mailing list archives

From "none none" <kor...@lycos.com>
Subject Re: Bug? QueryParser may not correctly interpret RangeQuery text
Date Wed, 05 Jun 2002 17:20:22 GMT
What about storing a timestamp and doing a number search or a range search ("[123 124]")?
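
To make the timestamp idea concrete, here is a minimal sketch. It assumes the index compares terms as plain strings, so a numeric timestamp has to be zero-padded to a fixed width before a range like the one above behaves numerically. The class name, the helper names and the 13-digit width are hypothetical and not part of Lucene.

```java
public class TimestampTerm {
    // Hypothetical width: 13 digits is enough for millisecond timestamps up to the year 2286.
    static final int WIDTH = 13;

    /** Zero-pads a non-negative timestamp so that string order matches numeric order. */
    static String toTerm(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("negative timestamp");
        }
        String s = Long.toString(millis);
        if (s.length() > WIDTH) {
            throw new IllegalArgumentException("timestamp too large");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    /** Inclusive range check done purely on strings, the way a term range would be. */
    static boolean inRange(String term, String lower, String upper) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = toTerm(123L);
        String upper = toTerm(124L);
        System.out.println(inRange(toTerm(123L), lower, upper));
        System.out.println(inRange(toTerm(1230L), lower, upper));
        // Unpadded strings: "1230" sorts between "123" and "124".
        System.out.println(inRange("1230", "123", "124"));
    }
}
```

The last call shows why the padding matters: without it, 1230 would fall inside the range 123 to 124.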

--

On Wed, 05 Jun 2002 00:01:46  
 Peter Carlson wrote:
>I guess from my perspective we are at
>
>field:[<goop>-><goop>]
>
>The delimiter is not yet defined, but the options currently discussed are
>-
>->
>;
>:
>|
>>
>
>The problem with - and : is that they may be part of a date format.
>
>The action taken by the QueryParser would depend on the type of field we
>were using (if that were an easy change). For Date fields, it would convert
>the <goop> to a Date using the SimpleDateFormat and try to guess the format
>(I think it will handle the ISO 8601 formats).
>
>OR
>
>If adding a type to a field is difficult, then the next option is to just
>support a date range and assume the data is a date.
>
>OR
>
>If adding a type to a field is difficult and we don't want to just support a
>Date format, then we would create a specific format like
>YYYY/MM/DDTHH:MM:SS
>for dates and just a set of digits for numbers.
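
A minimal sketch of what the third option could look like, assuming the fixed format is exactly YYYY/MM/DDTHH:MM:SS and that the timestamp is read as UTC (the time zone is my assumption, the thread does not say). The class and method names are hypothetical.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FixedFormatBound {
    static final String PATTERN = "yyyy/MM/dd'T'HH:mm:ss";

    /** Strictly parses YYYY/MM/DDTHH:MM:SS as UTC; returns null when the text does not match. */
    static Date asDate(String s) {
        if (s.length() != 19) {
            return null; // the fixed format is always 19 characters long
        }
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setLenient(false); // reject out-of-range fields such as month 13
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        ParsePosition pos = new ParsePosition(0);
        Date d = fmt.parse(s, pos);
        return (d != null && pos.getIndex() == s.length()) ? d : null;
    }

    /** True when the bound is a non-empty run of ASCII digits. */
    static boolean isNumber(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < '0' || ch > '9') {
                return false;
            }
        }
        return true;
    }

    /** Classifies a range bound as "date", "number" or "text". */
    static String classify(String s) {
        if (asDate(s) != null) {
            return "date";
        }
        return isNumber(s) ? "number" : "text";
    }

    public static void main(String[] args) {
        System.out.println(classify("2002/06/05T17:20:22"));
        System.out.println(classify("12345"));
        System.out.println(classify("2002-06-05"));
    }
}
```
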
>
>
>Does that sound about right? If so, what are people's preferences?
>
>
>My preferences are 
>Solve with Option 3 now, but determine how to solve with option 1.
>
>Delimiter preference would be ">". It seems intuitive to me.
>
>--Peter
>
>
>
>On 6/4/02 10:17 PM, "Otis Gospodnetic" <otis_gospodnetic@yahoo.com> wrote:
>
>> Hello,
>> 
>> Just curious what the status of this issue is, as the discussion seems
>> to have stopped.
>> 
>> --- "Eric D. Friedman" <eric@conveysoftware.com> wrote:
>>> Instead of reinventing the wheel for representing dates, how about
>>> using an existing standard?  ISO 8601 defines a simple lexical
>>> representation for dates, times (with optional millisecond precision),
>>> and timezones that is easy to implement.  This is what's used in the
>>> XML Schema "dateTime" datatype.
>>> 
>>> A summary of the ISO 8601 notation is available here:
>>> http://www.cl.cam.ac.uk/~mgk25/iso-time.html
>>> 
>>> The documentation for the XML Schema dateTime datatype is here:
>>> http://www.w3.org/TR/xmlschema-2/#dateTime
>> 
>> I agree, that is why I immediately suggested YYYY-MM-DD.  I dislike
>> U.S.-centric or Europe-centric approaches when there is a standard
>> format.
>> 
>>> I whipped up a JavaCC parser to handle this lexical representation
>>> (see attachment).
>>> 
>>> Note that for this to be useful in QueryParser, it's going to need its
>>> own lexical state.  This makes sense anyway, since it would be a
>>> mistake to have the query syntax infer magical properties about strings
>>> that appear to be dates.  Better is to have a keyword in the query
>>> syntax that introduces a date value:  something like date(<VALUE>)
>>> would work.  So would to_date(<VALUE>) for those who know SQL. I would
>>> have suggested date:<VALUE> but I think that already means something in
>>> the QueryParser's lexical specification. (I don't actually use
>>> QueryParser because the patches I've submitted previously haven't made
>>> it in yet, and until they do, QP is fatally crippled for my purposes).
>> 
>> I'll try to look for your patches in the archives (if you have the URL
>> handy please send it to me), so that I can put them on the TODO list, if
>> it makes sense to do so.
>> As for the above comments about the parser, I'm afraid I'm still a
>> JavaCC neophyte. I don't dislike the date(<VALUE>) approach.  If users can
>> grasp field:value they shouldn't have a problem with field:date(value),
>> I think.
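
For illustration only, a small sketch of how a field:date(value) clause could be told apart from a plain field:value clause. It uses a regular expression rather than a JavaCC lexical state, and the class name, the pattern and the field name "modified" are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateKeyword {
    // Hypothetical surface syntax: field:date(VALUE), where VALUE may not contain ')'.
    static final Pattern CLAUSE = Pattern.compile("(\\w+):date\\(([^)]*)\\)");

    /** Returns {field, value} when the clause uses the date(...) keyword, else null. */
    static String[] parse(String clause) {
        Matcher m = CLAUSE.matcher(clause);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = parse("modified:date(2002-06-05)");
        System.out.println(parts[0] + " / " + parts[1]);
        System.out.println(parse("modified:2002-06-05") == null);
    }
}
```

The explicit keyword means nothing is inferred from how the value looks, which is the point made above about not guessing that a string is a date.
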
>> 
>> Otis
>> 
>> 
>>> On Sun, 2 Jun 2002, Peter Carlson wrote:
>>> 
>>>> I like this idea of [GOOP:GOOP] as it gives the most flexibility. However,
>>>> this requires the field to have a known characteristic like a date field,
>>>> number field or text field, correct? If you just use the static Field.Date,
>>>> this would require adding a new attribute to the Field class? I like this idea
>>>> but I don't know the difficulty / backward compatibility issues.
>>>> 
>>>> If the extra field attribute is too difficult, then I suggest we use the
>>>> nnnn-nn-nn format method so we can use the pattern to determine the data
>>>> type.
>>>> 
>>>> For number fields, should this support only integers, or decimal numbers
>>>> too?
>>>> 
>>>> I don't think we should use the : character, because we probably want to
>>>> support time formats in the date format. Something like 03/01/2001 at
>>>> 00:01:00. Maybe something like ">" or "|" or even "->" ?
>>>> 
>>>> Also, inclusive vs. exclusive should be accounted for with the [ vs {
>>>> characters.  I think this might already be done, but just wanted to throw it
>>>> out there.
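
A rough sketch of the bracket handling described here, assuming ">" ends up as the delimiter (it is only one of the candidates) and that bounds are compared as strings. The opening character decides whether the lower bound is inclusive and the closing character decides the upper bound. All names are hypothetical.

```java
public class RangeExpr {
    final String lower;
    final String upper;
    final boolean lowerInclusive;
    final boolean upperInclusive;

    RangeExpr(String lower, String upper, boolean lowerInclusive, boolean upperInclusive) {
        this.lower = lower;
        this.upper = upper;
        this.lowerInclusive = lowerInclusive;
        this.upperInclusive = upperInclusive;
    }

    /** Parses "[lo>hi]" (inclusive) or "{lo>hi}" (exclusive); '>' is an assumed delimiter. */
    static RangeExpr parse(String s) {
        if (s.length() < 2) {
            throw new IllegalArgumentException("too short: " + s);
        }
        char open = s.charAt(0);
        char close = s.charAt(s.length() - 1);
        if (open != '[' && open != '{') {
            throw new IllegalArgumentException("range must start with [ or {");
        }
        if (close != ']' && close != '}') {
            throw new IllegalArgumentException("range must end with ] or }");
        }
        String body = s.substring(1, s.length() - 1);
        int i = body.indexOf('>');
        if (i < 0) {
            throw new IllegalArgumentException("missing delimiter '>'");
        }
        return new RangeExpr(body.substring(0, i).trim(), body.substring(i + 1).trim(),
                open == '[', close == ']');
    }

    /** String comparison of a term against both bounds, honoring inclusiveness. */
    boolean contains(String term) {
        int lo = term.compareTo(lower);
        int hi = term.compareTo(upper);
        return (lowerInclusive ? lo >= 0 : lo > 0) && (upperInclusive ? hi <= 0 : hi < 0);
    }

    public static void main(String[] args) {
        System.out.println(parse("[2002-01-01>2002-12-31]").contains("2002-01-01"));
        System.out.println(parse("{2002-01-01>2002-12-31}").contains("2002-01-01"));
    }
}
```
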
>>>> 
>>>> --Peter
>>>> 
>>>> 
>>>> On 6/2/02 2:13 AM, "Brian Goetz" <brian@quiotix.com> wrote:
>>>> 
>>>>>>> How about:
>>>>>>> 
>>>>>>>  DATE = nnnn-nn-nn
>>>>>>>  NUMBER = n*
>>>>>>>  RANGE = [ DATE : DATE ] | [ NUMBER : NUMBER ]
>>>>>>> 
>>>>>>> An alternate, less parse-oriented approach would be this:
>>>>>>>   RANGE = [ GOOP : GOOP ]
>>>>>>> where
>>>>>>>   GOOP = any string of letters/numbers not containing : or ].
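
The first, stricter grammar can be sketched with regular expressions. This is only an approximation of the proposal: NUMBER is written as n* above, and the sketch assumes at least one digit is required. The class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

public class StrictRange {
    static final String DATE = "\\d{4}-\\d{2}-\\d{2}"; // nnnn-nn-nn
    static final String NUMBER = "\\d+";               // assumed: one or more digits

    // RANGE = [ DATE : DATE ] | [ NUMBER : NUMBER ], with optional whitespace.
    static final Pattern RANGE = Pattern.compile(
            "\\[\\s*(?:" + DATE + "\\s*:\\s*" + DATE
            + "|" + NUMBER + "\\s*:\\s*" + NUMBER + ")\\s*\\]");

    /** True when the text is a range whose two bounds are both dates or both numbers. */
    static boolean isRange(String s) {
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRange("[2002-01-01 : 2002-06-05]"));
        System.out.println(isRange("[123:124]"));
        System.out.println(isRange("[2002-01-01 : 124]"));
    }
}
```

Mixed bounds are rejected, which is the explicitness this variant buys at the cost of extensibility.
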
>>>>>> 
>>>>>> I'd go for the first one as it's more explicit.  However, perhaps the
>>>>>> second approach is more extensible?
>>>>> 
>>>>> When I first did the query parser, I defined terms by inclusion
>>>>> (stating valid characters) instead of exclusion (excluding non-term
>>>>> characters.)  Turns out I missed quite a few in the first go around,
>>>>> which taught me the lesson (again) that sometimes trying to be too
>>>>> specific is a rat's nest.  What about dates like 02-Mai-2002 (not a
>>>>> typo, French for May)?  Letting DateFormat figure it out has some
>>>>> merit.
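
One way "letting DateFormat figure it out" might look is to try a list of candidate patterns in order and take the first strict match. This is a sketch under assumptions: the pattern list is hypothetical, values are read as UTC, and locale-specific month names such as "Mai" would need an explicit Locale, which is not covered here.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateGuesser {
    // Hypothetical candidate list, most specific first; not taken from QueryParser.
    static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd",
        "yyyy/MM/dd"
    };

    /** Returns the first strict parse that consumes the whole text, or null if none fits. */
    static Date guess(String text) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat fmt = new SimpleDateFormat(pattern);
            fmt.setLenient(false); // do not roll over values like month 13
            fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumed time zone
            ParsePosition pos = new ParsePosition(0);
            Date d = fmt.parse(text, pos);
            if (d != null && pos.getIndex() == text.length()) {
                return d;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(guess("2002-06-05T17:20:22") != null);
        System.out.println(guess("2002/06/05") != null);
        System.out.println(guess("goop") == null);
    }
}
```
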
>>>>> 
>>>>>> DateField(Date) and NumberField(int) sounds right, but wouldn't Field
>>>>>> class make more sense?
>>>>> 
>>>>> I had in mind static methods of Field, just like Field.Text --
>>>>> Field.Date, Field.Number.   Sorry if that wasn't clear.  This seems
>>>>> an easy addition.
>>>>> 
>>>>> --
>>>>> To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
>>>>> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> PARSER_BEGIN(ISO8601Parser)
>>> 
>>> import java.io.*;
>>> import java.util.*;
>>> import java.text.*;
>>> 
>>> public class ISO8601Parser {
>>> 
>>>   static DateFormat fmt;
>>> 
>>>   public static void main(String args[]) throws ParseException {
>>>     String date;
>>> 
>>>     //date = "1999-05-31T13:20:00Z";
>>>     //date = "1999-05-31T13:20:00-00:01";
>>>     date = "1999-05-31T13:20:00.999-08:00";
>>> 
>>>     TimeZone utc = TimeZone.getTimeZone("UTC");
>>>     fmt = DateFormat.getDateTimeInstance();
>>>     fmt.setTimeZone(utc);
>>> 
>>>     ISO8601Parser parser = new ISO8601Parser(new StringReader(date));
>>>     Date d = parser.date();
>>>     System.out.println(fmt.format(d));
>>>   }
>>> }
>>> 
>>> PARSER_END(ISO8601Parser)
>>> 
>>> TOKEN :
>>> {
>>>   <#DIGIT: ["0"-"9"]>
>>> | <TWOD: <DIGIT><DIGIT>>         // two digits used for day, month, hours, minutes, seconds
>>> | <MILLIS: <TWOD><DIGIT>>        // millisecond precision is 000 .. 999
>>> | <YEAR: <TWOD><TWOD>(<DIGIT>)*> // at least 4 digits, but possibly more
>>> | <DASH: "-">                    // delimiter for CCYY-MM-DD; doubles as minus sign for signed ints
>>> | <COLON: ":">                   // delimiter for hh:mm:ss
>>> | <DOT: ".">                     // delimiter for ss.mmm (milliseconds)
>>> | <T: "T" >                      // delimiter between date and time
>>> | <Z: "Z" >                      // UTC timezone
>>> | <PLUS: "+">                    // indicates positive offset from UTC
>>> }
>>> 
>>> /**
>>>  * Input to this production is a series of tokens matching the following specification:
>>>  * CCYY-MM-DD -- a date with no time specification<br>
>>>  * CCYY-MM-DDThh:mm:ss -- a timestamp implicitly in the UTC timezone<br>
>>>  * CCYY-MM-DDThh:mm:ssZ -- a timestamp explicitly in the UTC timezone<br>
>>>  * CCYY-MM-DDThh:mm:ss-08:00 -- a timestamp with a negative 8 hour offset from UTC<br>
>>>  * CCYY-MM-DDThh:mm:ss.mmm -- a timestamp with millisecond precision<br>
>>>  * -CCYY-MM-DD -- a date whose year is before the common era (BCE)<br>
>>>  * NNCCYY-MM-DD -- a date whose year is > 9999<br>
>>>  *
>>>  * <p> Note that years greater than 9999 are allowed, but that 0000 is not a valid year.
>>>  * Negative numbers are allowed when representing years BCE.
>>>  * </p>
>>>  *
>>>  * <p>Milliseconds are optional in the seconds field.  The timezone indicator is optional.
>>>  * </p>
>>>  *
>>>  *@return a java.util.Date instance in the UTC timezone, with millisecond precision.
>>>  */
>>> Date date() :
>>> {
>>>   int CCYY = 0, MM = 0, DD = 0, hh = 0, mm = 0, ss = 0, millis = 0;
>>>   int deltahh = 0, deltamm = 0;
>>>   boolean deltaPlus = true;
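>>>   Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
>>>   c.clear(); // start from zeroed fields so a date-only input does not inherit the current time of day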
>>> }
>>> {
>>>   CCYY = year() <DASH>
>>>   MM = twod() <DASH>
>>>   DD = twod()
>>>   {
>>>     MM--; // months are 0 based
>>>     c.set(c.YEAR, CCYY);
>>>     c.set(c.MONTH, MM);
>>>     c.set(c.DAY_OF_MONTH, DD);
>>>   }
>>>   (
>>>     <T>
>>>     hh = twod() <COLON>
>>>     mm = twod() <COLON>
>>>     ss = twod()
>>>     {
>>>       c.set(c.HOUR_OF_DAY, hh);
>>>       c.set(c.MINUTE, mm);
>>>       c.set(c.SECOND, ss);
>>>     }
>>>     (
>>>       <DOT>
>>>       millis = millis()
>>>       {
>>>         c.set(c.MILLISECOND, millis);
>>>       }
>>>     )?
>>>     (
>>>       <Z> // we're already in UTC, so no adjustment needed
>>>       |
>>>       (
>>>         (
>>>           <PLUS> // somewhere ahead of UTC (east of Greenwich)
>>>           |
>>>           <DASH> // behind UTC (west of Greenwich)
>>>           {
>>>             deltaPlus = false;
>>>           }
>>>         )
>>>         deltahh = twod() <COLON>
>>>         deltamm = twod()
>>>         {
>>>           if (! deltaPlus) {
>>>             deltahh = -deltahh;
>>>             deltamm = -deltamm;
>>>           }
>>>           // millisecond offset
>>>           int offsetFromUTC = ((deltahh * 60) + deltamm) * 60 * 1000;
>>>           c.set(c.ZONE_OFFSET, offsetFromUTC);
>>>         }
>>>       )
>>>     )?
>>>   )?
>>>   {
>>>     return c.getTime();
>>>   }
>>> }
>>> 
>>> int millis() :
>>> {
>>>   Token t;
>>> }
>>> {
>>>   t = <MILLIS> {
>>>     return Integer.parseInt(t.image);
>>>   }
>>> }
>>> 
>>> int twod() :
>>> {
>>>   Token t;
>>> }
>>> {
>>>   t = <TWOD> {
>>>     return Integer.parseInt(t.image);
>>>   }
>>> }
>>> 
>>> int year() :
>>> {
>>>   Token t;
>>>   boolean positive = true;
>>> }
>>> {
>>>   (
>>>     <DASH>
>>>     {
>>>       positive = false;
>>>     }
>>>   )?
>>>   t = <YEAR> {
>>>     int year = Integer.parseInt(t.image);
>>>     if (year == 0) {
>>>       throw new IllegalArgumentException("0000 is not a legal year");
>>>     }
>>>     return positive ? year : -year;
>>>   }
>>> }
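
To check the offset arithmetic without running JavaCC, the Calendar calls from the date() production can be mirrored in plain Java. This is a sketch, not the attachment itself: the class and method names are hypothetical, and it calls clear() first so that unset fields start at zero, which the original production does not do.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class OffsetCheck {
    /** Mirrors the production's Calendar calls for a timestamp with a UTC offset. */
    static Date build(int year, int month, int day, int hh, int mm, int ss, int millis,
                      boolean plus, int deltahh, int deltamm) {
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        c.clear(); // start from zeroed fields rather than the current time
        c.set(Calendar.YEAR, year);
        c.set(Calendar.MONTH, month - 1); // months are 0 based
        c.set(Calendar.DAY_OF_MONTH, day);
        c.set(Calendar.HOUR_OF_DAY, hh);
        c.set(Calendar.MINUTE, mm);
        c.set(Calendar.SECOND, ss);
        c.set(Calendar.MILLISECOND, millis);
        // Offset in milliseconds, negative when the zone is behind UTC.
        int sign = plus ? 1 : -1;
        int offsetFromUTC = sign * ((deltahh * 60) + deltamm) * 60 * 1000;
        c.set(Calendar.ZONE_OFFSET, offsetFromUTC);
        return c.getTime();
    }

    public static void main(String[] args) {
        // 1999-05-31T13:20:00.999-08:00, the sample value from the attachment's main().
        Date d = build(1999, 5, 31, 13, 20, 0, 999, false, 8, 0);
        System.out.println(d.getTime());
    }
}
```

With an offset of -08:00 the local fields are eight hours behind UTC, so the resulting instant is 21:20:00.999 UTC on the same day.
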

