incubator-lucy-dev mailing list archives

From Nathan Kurz <>
Subject Re: SortWriter memory costs
Date Mon, 28 Dec 2009 01:10:32 GMT
On Sun, Dec 27, 2009 at 4:45 PM, Nathan Kurz <> wrote:
> This would be hard, though, so I'd suggest just sticking with your
> current approach and making your hash slightly more efficient.  Then
> think about how you can handle non-text fields directly --- instead of
> ords and offsets, handle dates, times, ints, floats, ratings etc.
> directly.  You can keep using an ord file for text, but treat this as
> an optimization rather than a requirement.    And think about how to
> make the intersegment merges efficient.

Simplifying, after rereading my poorly organized response:

I think numeric types are likely the main use for sorting.  While
converting strings to ords does allow them to be compared as
ints, this seems like it should be the special case rather than the
one determining the architecture.  I'd love to see a model of how
efficient date sorting based on UNIX timestamps would work first, and
then figure out how to add in free text afterward.  It might turn out
that ords are great, but I think that a direct sorting model might be
simpler and more general.
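
To make the contrast concrete, here is a rough sketch of the two models in Python. It is not Lucy code: the function names and the in-memory arrays are assumptions made up for illustration, and a real implementation would work against per-segment files. The direct model sorts doc ids by the stored numeric value itself; the ord model first maps each string to its rank among the unique sorted values and then sorts by that rank.

```python
from array import array


def sort_docs_direct(timestamps):
    """Direct model (hypothetical): sort doc ids by the raw numeric value."""
    values = array("q", timestamps)  # one signed 64-bit int per doc
    return sorted(range(len(values)), key=values.__getitem__)


def build_ords(strings):
    """Ord model (hypothetical): rank of each doc's string among unique values."""
    rank = {s: i for i, s in enumerate(sorted(set(strings)))}
    return array("i", (rank[s] for s in strings))


def sort_docs_by_ord(strings):
    """Sort doc ids by ord, so comparisons are int compares, not string compares."""
    ords = build_ords(strings)
    return sorted(range(len(ords)), key=ords.__getitem__)


if __name__ == "__main__":
    # Doc ids are list positions; the values are UNIX timestamps.
    print(sort_docs_direct([1261962632, 1230768000, 1262304000]))
    print(sort_docs_by_ord(["pear", "apple", "pear", "fig"]))
```

In this sketch the numeric path needs no extra mapping step at all, while the text path has to build the rank table before any comparison can happen. That extra step is the part I would treat as an optimization layered on top of the direct model.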

Nathan Kurz
