directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Karasulu <akaras...@apache.org>
Subject Re: Mavibot vs JDBM results
Date Mon, 11 Nov 2013 20:49:56 GMT
On Mon, Nov 11, 2013 at 10:46 PM, Howard Chu <hyc@symas.com> wrote:

> Alex Karasulu wrote:
>
>>     Now, this is an approach where we used plain Keys (ie, Keys can have
>>     various sizes, which is not really efficient, as we may have to
>>     allocate more pages than necessary to store nodes and leaves.
>>     Open LDAP uses another approach, which is smarter : they use the hash
>>     value of each key to retrieve the element. Obviously, this leads to
>>     compare the keys when we reach the leaf, as we may have more than one
>>     key with the same hash value, and it also destroys the ordering (one
>>     can't compare two hash values as the ordering will be different) but
>>     most of the case, it's not really a big deal.
>>     The main advantage of such an approach is that suddenly, Nodes have a
>>     fixed size (a hash can be stored as an int, and the references to a
>>     page are longs), so in a fixed page size, we can store a fixed number
>>     of elements. Assuming that a node needs at least 28 bytes to store its
>>     header and PageIO, in a 512 bytes page we can store (512 - 28) /
>>     ((nbValues+1) x (8+8) + nbKeys x 4 ) elements, so 16 keys (64 bytes)
>>     and 17 values (272 bytes). We hve 148 bytes remaining in this case.
>>     Atm, we store 16 element per node, which requires many physical pages,
>>     ie, many disk access.
>>
>>     This is something that worth being investigated in the near future.
>>
>>
>> Sounds like we need a minimal perfect order preserving hash function.
>>
>
> These are a pain to try to use for this purpose. All of the perfect order
> preserving hashes I've found only work because you generate the hash
> function based on knowing the entire set of data in advance. When you add
> new records you need to create a new hash function, and thus the hash
> values of every key changed.
>
> I.e., these are only useful for static data sets.
>

That's lame ... no silver bullets I guess.

-- 
Best Regards,
-- Alex

Mime
View raw message