lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Exception: cannot determine sort type
Date Fri, 24 Dec 2004 06:52:38 GMT

On Dec 23, 2004, at 6:15 PM, Kauler, Leto S wrote:
> Erik Hatcher wrote:
>> *Everything* in Lucene is indexed as a string.  But how a
>> date looks as
>> a string is a topic unto itself.  I prefer to use YYYYMMDD as a date
>> formatted as a string (but when sorting, this could be treated as a
>> numeric).
>
> Will RangeQuery still work with that?  We do have separate date fields
> which are indexed like the following code, but a move to the YYYYMMDD
> format might be good as then we could apply a blanket String-type sort.
>
> public static Date parseDate( String s ) throws ParseException {
>    // HH is the 24-hour clock; hh would read "12:30:00" as 00:30
>    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>    return dateFormat.parse(s);
> }
> doc[0].add(Field.Keyword(field, parseDate( dateInString )));

Using YYYYMMDD works better for RangeQuery than Field.Keyword(String,
Date) does.  The built-in Date field is precise to the millisecond, so
if you have lots of documents on the same day but at different
milliseconds, you end up with lots of distinct terms.  RangeQuery
expands into a BooleanQuery that ORs together all the matching terms,
and BooleanQuery has a built-in default limit of 1,024 clauses; exceed
it and you get a TooManyClauses exception.
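
To make the term-count point concrete, here is a minimal sketch that uses
only the JDK, not Lucene itself.  The class and method names
(DayTermSketch, toDayTerm, and so on) are hypothetical, and the UTC time
zone is an assumption chosen to keep the output reproducible.  It counts
how many distinct terms a set of timestamps would produce at millisecond
resolution versus YYYYMMDD resolution:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DayTermSketch {
    // Assumption: dates are rendered in UTC; pick whatever zone your index uses.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Render a timestamp as a fixed-width YYYYMMDD term.
    static String toDayTerm(Instant t) {
        return DAY.format(t);
    }

    // Number of distinct terms if every timestamp is indexed per millisecond.
    static int distinctMillisTerms(List<Instant> timestamps) {
        Set<Long> terms = new HashSet<>();
        for (Instant t : timestamps) {
            terms.add(t.toEpochMilli());
        }
        return terms.size();
    }

    // Number of distinct terms if every timestamp is indexed per day.
    static int distinctDayTerms(List<Instant> timestamps) {
        Set<String> terms = new HashSet<>();
        for (Instant t : timestamps) {
            terms.add(toDayTerm(t));
        }
        return terms.size();
    }

    // Hypothetical sample data: n documents, one millisecond apart, on one day.
    static List<Instant> sampleTimestamps(int n) {
        Instant start = Instant.parse("2004-12-24T00:00:00Z");
        List<Instant> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(start.plusMillis(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Instant> docs = sampleTimestamps(2000);
        System.out.println("millisecond terms: " + distinctMillisTerms(docs));
        System.out.println("day terms:         " + distinctDayTerms(docs));
    }
}
```

With 2,000 sample timestamps on a single day, a range covering that day
would have to match 2,000 millisecond-level terms, which is over the
1,024 default, but only one day-level term.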

A YYYYMMDD value is numeric, so to sort by that field I'd recommend
using a numeric sort type, as it'll use much less memory.  It is still
advisable to test both numeric and String sorting and see how each
performs.
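
As a rough illustration of why either sort type is usable here, the
sketch below (plain JDK, no Lucene; DaySortSketch and its methods are
hypothetical names) sorts the same YYYYMMDD terms once as Strings and
once as ints.  Because the terms are fixed-width and zero-padded, both
orders agree; the int form just needs one 4-byte value per document
instead of a String object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class DaySortSketch {
    // Sort the terms lexicographically, as a String-typed sort would.
    static List<String> sortAsStrings(List<String> dayTerms) {
        List<String> copy = new ArrayList<>(dayTerms);
        Collections.sort(copy);
        return copy;
    }

    // Parse each term to an int, sort numerically, then render back
    // to eight zero-padded digits so the result is comparable.
    static List<String> sortAsInts(List<String> dayTerms) {
        int[] keys = new int[dayTerms.size()];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = Integer.parseInt(dayTerms.get(i));
        }
        Arrays.sort(keys);
        List<String> out = new ArrayList<>();
        for (int key : keys) {
            out.add(String.format(Locale.ROOT, "%08d", key));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms =
                Arrays.asList("20041224", "20031101", "20041223", "19991231");
        System.out.println(sortAsStrings(terms));
        System.out.println(sortAsInts(terms));
    }
}
```

This only shows that the two orderings match; it says nothing about
speed or memory inside Lucene, which is what the tests suggested above
would have to measure.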

	Erik

