db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: BTree package documentation
Date Tue, 06 Sep 2005 23:23:59 GMT
No, currently branch rows also include the RowLocation.  For most of
the code RowLocation is basically opaque, so most of the code just
treats the btree as follows:
o btree is created with N columns
o Every row must be unique when comparing all N columns
o Branch rows contain N+1 columns - all the columns from the leaf
  entry plus one to tell it which page is the sibling.

There is no reason that the branch rows always need the row location,
the minimum requirement is that 2 branchrows must uniquely discriminate
the keys on 2 different leaf pages.  Again for simplicity it is assumed
that the keys on a branch page are never duplicate, this simplifies
binary search somewhat.

Imagine a one column SQL non-unique index for which all the rows have
the same value.  In the case of a delete on a row in the base table,
a search on the index which includes the RowLocation would not be
able to efficiently find the right row if the RowLocation was not
included in the branch rows.

One could create compressed branch entries if the RowLocation were
not needed to differentiate between 2 adjacent entries (ie.
for 2 adjacent branch entries: mike, row loc 1, leaf page 113; stan,
row loc 2, leaf page 112 -- for these 2 branch rows the minimum
suffix of mike/stan would be enough without the row location).
Because the page/row format in derby does not require constant number
of column rows such a change would not be too hard.  One could also
do suffix compression within a column - for instance in the previous
example only m/s is really necessary, but such a decision is very
datatype dependent and needs to be supported by the datatype itself -
not a decision by store looking at the bytes.

Dibyendu Majumdar wrote:

> Mike Matrigali wrote:
>> See comments below, once updated I will work on getting it into the
>> codeline.
>> Comments on unique key handling.
>>> From one perspective the current btree implementation assumes all rows
>> it handles are unique.  This is because the row includes the the last
>> column which is the address of the base table row.  One could not use
>> the current btree implementation to store 2 rows which were exactly
>> the same including the row location.  At this level having all rows
>> being unique makes the code simpler as binary searches don't have to
>> worry about edge cases of duplicate keys.
>> Having said that the btree implementation does support rejecting rows
>> which are not unique given a subset of the columns of the row.
>> Currently this is only used to compare whether index rows are unique
>> comparing all columns other than the last one.  So to create a unique
>> SAL index you create a btree telling it to enforce uniqueness on
>> number of columns - 1; and to create a non-unique SQL index you create a
>> btree telling it to enforce uniqueness on all columns.
> Mike,
> I assume you mean that at the leaf level, no two IndexRows can contain
> the same set of keys
> and also the RowLocation. I assume my understanding is correct that
> BranchRows do not
> contain the RowLocation.
> I will send you an updated html - do you want the links in Javadoc format?
> Thanks
> Dibyendu

View raw message