lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: year range field, proper data type?
Date Wed, 07 Jul 2010 19:59:46 GMT
This isn't a very worrisome case. Most of the messages you see on the
board about the dangers of dates arise because dates can be stored with
many unique values if they include milliseconds. Then, when you sort on
date, memory use explodes because all the dates are loaded into memory.

In your case, there are at most 10,000 distinct years, which isn't the
same magnitude of problem as, say, 10,000,000 documents each with a
unique timestamp.

That said, you might as well go for as much speed as you can get and use
a trie int. That way you won't be tripped up by three-digit years being
out of lexical order.
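
To make the lexical-order point concrete, here is a small sketch in plain
Python. It is purely illustrative and uses no Solr API: the in_range helper
is hypothetical and only mimics an inclusive [low TO high] comparison, and
the year 196 is a made-up value added to show a false match.

```python
# Sample years from the question, stored as strings without zero padding.
years = ["1990", "2010", "1492", "1209", "907"]

# Strings compare character by character, so "907" lands after "2010".
lexical = sorted(years)

# Integers compare by value, which is the order a year range needs.
numeric = sorted(int(y) for y in years)


def in_range(value, low, high):
    """Inclusive range check, analogous to [low TO high] (hypothetical helper)."""
    return low <= value <= high


print(lexical)   # ['1209', '1492', '1990', '2010', '907']
print(numeric)   # [907, 1209, 1492, 1990, 2010]

# A three-digit year compared as a string can fall inside a range it
# does not belong to; compared as an integer it does not.
print(in_range("196", "1950", "1975"))  # True
print(in_range(196, 1950, 1975))        # False
```

Zero-padding the strings ("0907", "0196") would also restore the order, but
an integer field avoids having to remember to pad at index and query time.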

Best
Erick

On Wed, Jul 7, 2010 at 10:55 AM, Jonathan Rochkind <rochkind@jhu.edu> wrote:

> So I will have a Solr field that contains "years", i.e., "1990", "2010",
> maybe even "1492", "1209" and "907"/"0907".
>
> I will be doing range limits over this field, i.e., [1950 TO 1975] or what
> have you.  The data represents publication dates of books on a large library's
> shelves; there will be around 3 million documents, with the range of data
> being concentrated in recent years, but with a long tail stretching off into
> the past.
>
> So it seems clear to me that I should use a trie field of some type, to
> efficiently accommodate the range queries.
>
> It seems to me that I probably don't need/want an actual date field, since
> the data isn't complex enough to demand it; it's just a four-digit year.
>
> So that pretty much leaves storing as a trie integer, or as a trie string.
>   Any advice on which is probably better in this case? Or on how to set up
> the trie field for this kind of data? Thanks for any,
>
> Jonathan
>
