directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Selcuk AYA <ayasel...@gmail.com>
Subject Re: Question about JDBM key/value replacement
Date Fri, 27 Apr 2012 16:35:39 GMT
On Fri, Apr 27, 2012 at 9:08 AM, Emmanuel Lécharny <elecharny@gmail.com> wrote:
> Le 4/27/12 5:46 PM, Selcuk AYA a écrit :
>
>> On Fri, Apr 27, 2012 at 6:06 AM, Emmanuel Lécharny<elecharny@gmail.com>
>>  wrote:
>>>
>>> Le 4/27/12 2:09 PM, Selcuk AYA a écrit :
>>>
>>>> Hi,
>>>> On Thu, Apr 26, 2012 at 4:52 PM, Emmanuel Lécharny<elecharny@gmail.com>
>>>>  wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> currently, we can replace an existing value when its associated key
>>>>> exists
>>>>> in the BTree. What I'd like to do is to replace both key and value in
>>>>> this
>>>>> case.
>>>>>
>>>>> The rational is that the key mau contain some extra information that
>>>>> are
>>>>> not
>>>>> used to find the key, like the nbChildren/nbDescendant for a
>>>>> ParentIdAndrdn
>>>>> key in the RdnIndex.
>>>>>
>>>>> Is there an easy way to do that with JDBM, or should we extend the API
>>>>> it
>>>>> offers ?
>>>>
>>>> as I understand you are trying to store some auxiliary information per
>>>> key?
>>>
>>> Yep.
>>>
>>>
>>>>  With JDBM we can:
>>>>   - provide<K, aux>    tuple as the key and pass the key comparator
to
>>>> be the comparator for the actual keys.
>>>>   - change JDBM code to update key inside when replacing an entry.
>>>>
>>>> however, in general, I do not think we can expect a key/value store to
>>>> update the key when replacing an entry. So I am not sure it is a good
>>>> idea to do this kind of thing.
>>>
>>>
>>> The pb is that those auxiliary informations are not part of the compare
>>> being done. Ie, two keys may be seen as equals, but the auxiliary
>>> information might be different. In this very case, I'd like the new
>>> version
>>> of the key replaced.
>>> It may be very specific to my need, and we can also consider that the
>>> value
>>> might cary those extra informations, too, instead of storing them in the
>>> key.
>>>
>>> So for the<ParentIdAndRdn, ID>  tuple, where the data structure is :
>>> ParentIdAndRdn :
>>> {
>>>    ID parentId,
>>>    Rdn[] rdns,
>>>    int nbChildren,
>>>    int nbDescendant
>>> }
>>>
>>> we could inject the two ints after having read the value if this value
>>> structure was :
>>>
>>> AugmentedID
>>> {
>>>    ID id
>>>    int nbChildren,
>>>    int nbDescendant
>>> }
>>
>> I think this would be better. However, note that in txns branch, we
>> removed most of the generics and assumed that indices store UUID as
>> their value. I hope we go  with removal/add ( I guess this is what you
>> are doing now)
>
> Should not be a big problem here. Generics are an issue anyway...
>
OK, on txn branch the assumption that we store UUID as the value of an
index has to be removed then.

>>
>>> and serializing the ParentIdAndRdn ID and Rdn[] only.
>>
>> I am kind of confused, I have a couple of questions:
>>
>> 1) Can you explain why we need to store nbChildren and nbDescedant but
>> do not need to serialize it?
>
> We *do* serialize those elements. What I suggest is that we store those
> information in the values, not in the keys, as in any case, we will get a
> tuple <key, value>, so we will always have the extended informations.
>
>
>> What if the page storing an entry is
>> kicked out of cache and we loose nbChildren and nbDescendant?
>
> Then we will read them from disk, if the cache does not contain those
> information, right ?
>
>
>>
>> 2)Connected with the above question, how do we maintain these
>> auxiliary data? Are there maintained only in memory or on disk as
>> well? Do they have to absolutely correct to get consistent data out of
>> the server?
>
> On memory and on disk. They are absolutely critical as they are used to
> browe the tree while doing a One/Subtree scope search.
>
>
>>
>> 3) Lets say I have<rdn1, rdn2,rdn3>  and<rdn1,rdn2, rdn4>  . Thread
1
>> adds<rdn1, rdn2,rdn3, rdn5>  and thread 2 add<rdn1,rdn2, rdn4,rdn6>
>> concurrently.
>> These two threads run concurrently but they should not
>> logically conflict. Now they have to read nbDescendants of<rdn1,rdn2>
>> and increment it but they have to do it in such a way that the net
>> effect of these two threads is serialized.
>
> yep.
>
>
>> I am not sure how to do
>> this.
>
> do you mean that the indexes aren't gloablly protected when we modify some
> data ? I thought that a modification will protect all the index it will
> update until they have all been updated, then the next modification can
> operate.

No.
>
> We don't really care if that serialize the modifications, because the server
> does not have to be fast when we inject data, only when we read data. At
> least, we could think about a serialization over the operations on one index
> like the RdnIndex (as an entry modification may update more than one entry
> in this index).
>
> Is that a problem ?

Depends. What I have been describing in the emails and trying to
implement is an optimistic locking scheme where modifications can go
in parallel unless they conflict. It seems we could just get a big
lock for write txns  rather than dealing with optimistic concurrency
control.
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

thanks
Selcuk

Mime
View raw message