Thanks very much for this; I'll give it a shot.
Keith.
On 4 Jul 2008, at 00:02, Paul Smith wrote:
>>
>>
>> (there are around 6,000,000 posts on the message board database)
>>
>> Date encoded as yyMMdd: appears to be using around 30M
>> Date encoded as yyMMddHHmmss: appears to be using more than 400M!
>>
>> I guess I would have understood if I was seeing the usage double
>> for sure, or even a little more; no idea how you guys encode the
>> indexes, if at all, but it's gone up over tenfold, which I can't
>> explain.
>
> Sort memory cost is based on the total # of unique terms for the
> given field (multiplied by the number of locale's involved if you
> have to do that too! but in temporal sorting you don't).
>
> This is easier than you think, just use 2 fields (date, time) and
> sort by both. This means the Date field's unique term count grows
> only 1 term per day. The Time field can be set to minutes (if you
> can get away with that) meaning that you only have fairly
> insignificant total term count for the time field. We use this at
> Aconex, and have indexes with millions of records (weekly 'work'
> searcher refreshed every 5 seconds, archive searcher is held in
> memory, with a Multisearcher done over the 2) and it works a treat.
> We regularly need to return million+ results from a search (don't
> ask) using this sort of sorting and the overall search time is only
> a few seconds.
>
> On a related note, work hard not to need to use Locale sensitive
> sorting if you can for any other fields, for large results the CPU
> penalty is horrific (even once you get past the synchronization
> bottleneck in the CollationKey stuff).
>
> cheers,
>
> Paul Smith
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|