jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@day.com>
Subject Re: Query that sorts a large result set.
Date Thu, 18 Jun 2009 11:25:18 GMT
On Thu, Jun 18, 2009 at 09:37, Ard Schrijvers <a.schrijvers@onehippo.com> wrote:
> If you happen to find the holy grail solution, I suppose you'll let us know
> :-) Also if you would have some memory usage numbers with and without the
> suggestion of mine regarding reducing the precision of your Date field, this
> would be very valuable.

hmm, I've been thinking about a solution that I would call
flyweight-substring-collation-key. it assumes that there is usually a
major overlap of substrings of the values to sort on, e.g. a
lastModified value. so instead of always keeping the entire value we'd
have a collation key that references multiple reusable substrings.

assume we have the following values:

- msqyw2shb
- msqyw2t93
- msqyw2u0v
- msqyw2usn
- msqyw2vkf
- msqyw2wc7
- msqyw2x3z
- msqyw2xvr
- msqyw2ynj
- msqyw2zfb

(those are date property values each 1 second after the previous one)
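The overlap is easy to see in these values. The quick check below assumes they read as base-36 numbers, which is an inference from the characters used and not something stated in the message. Under that assumption, neighbouring values differ by exactly 1000, which would be one second if the unit is milliseconds. All ten share the six-character prefix `msqyw2` and differ only in their last three characters.

```python
import os

# The ten example values from the message, one second apart.
VALUES = ["msqyw2shb", "msqyw2t93", "msqyw2u0v", "msqyw2usn", "msqyw2vkf",
          "msqyw2wc7", "msqyw2x3z", "msqyw2xvr", "msqyw2ynj", "msqyw2zfb"]

# Assumption: each value reads as a base-36 number (0-9, then a-z).
decoded = [int(v, 36) for v in VALUES]

# Differences between neighbouring values.
gaps = [b - a for a, b in zip(decoded, decoded[1:])]
print(gaps)  # [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]

# Characters shared by every value; only the tail changes.
print(os.path.commonprefix(VALUES))  # msqyw2
```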

we could create collation keys for use as comparables in the field
cache like this:

substring cache:
[0] msq
[1] shb
[2] t93
[3] u0v
[4] usn
[5] vkf
[6] wc7
[7] x3z
[8] xvr
[9] ynj
[10] yw2
[11] zfb

and then the actual comparables that reference the substrings in the cache:

- {0, 10, 1}
- {0, 10, 2}
- {0, 10, 3}
- {0, 10, 4}
- {0, 10, 5}
- {0, 10, 6}
- {0, 10, 7}
- {0, 10, 8}
- {0, 10, 9}
- {0, 10, 11}
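The construction above can be reproduced with a short sketch. It assumes fixed-width chunks of three characters, as in the example, and a substring cache that is kept sorted so that index order matches string order. `SubstringCache`, `key` and the chunk width are hypothetical names and parameters for illustration, not Jackrabbit API.

```python
class SubstringCache:
    """Hypothetical flyweight table: each distinct substring is stored once."""

    def __init__(self, values, width=3):
        self.width = width  # assumed fixed chunk width, as in the example
        distinct = {c for v in values for c in self._chunks(v)}
        # Sorting makes index order agree with string order.
        self.substrings = sorted(distinct)
        self._index = {s: i for i, s in enumerate(self.substrings)}

    def _chunks(self, value):
        # Cut the value at fixed offsets; the last chunk may be shorter.
        return [value[i:i + self.width] for i in range(0, len(value), self.width)]

    def key(self, value):
        # The collation key is just a tuple of references into the cache.
        return tuple(self._index[c] for c in self._chunks(value))


values = ["msqyw2shb", "msqyw2t93", "msqyw2u0v", "msqyw2usn", "msqyw2vkf",
          "msqyw2wc7", "msqyw2x3z", "msqyw2xvr", "msqyw2ynj", "msqyw2zfb"]
cache = SubstringCache(values)
print(cache.substrings)
# ['msq', 'shb', 't93', 'u0v', 'usn', 'vkf', 'wc7', 'x3z', 'xvr', 'ynj', 'yw2', 'zfb']
print(cache.key("msqyw2shb"), cache.key("msqyw2zfb"))
# (0, 10, 1) (0, 10, 11)
```

One consequence of keeping the cache sorted: a new substring inserted into it shifts the indexes of every substring after it, so existing keys would need rebuilding. The sketch ignores that case.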

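this results in lower memory consumption, and comparing the reference
indexes could even speed up the comparison (the substring cache is
sorted, so comparing two indexes gives the same order as comparing the
substrings they reference).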
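Here is a minimal sketch of such a comparison, assuming the sorted cache and fixed three-character chunks from the example. `compare_keys` and `expand` are hypothetical helpers, not Jackrabbit API. Each position costs one integer comparison instead of a character scan. Because the cache is sorted and all values are cut at the same offsets, the result agrees with comparing the original strings.

```python
# Sorted substring cache from the example (index order == string order).
CACHE = ["msq", "shb", "t93", "u0v", "usn", "vkf",
         "wc7", "x3z", "xvr", "ynj", "yw2", "zfb"]


def compare_keys(a, b):
    """compareTo()-style comparison of two collation keys (tuples of ints)."""
    for x, y in zip(a, b):
        if x != y:
            # One integer comparison decides a whole 3-character chunk.
            return -1 if x < y else 1
    # All shared positions equal: the shorter key sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def expand(key):
    """Rebuild the original string from its references (for checking only)."""
    return "".join(CACHE[i] for i in key)


first, last = (0, 10, 1), (0, 10, 11)
print(compare_keys(first, last), expand(first), expand(last))
# -1 msqyw2shb msqyw2zfb
```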

a quick test with 1 million date values showed that the memory
consumption drops to 50% with this approach.

regards
 marcel
