hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1249) Rearchitecting of server, client, API, key format, etc for 0.20
Date Wed, 29 Apr 2009 19:25:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704271#action_12704271

Jonathan Gray commented on HBASE-1249:

(this was written together with erik holstad)


We don't have any profiling stats or end-to-end numbers at the moment, but we do have are
some early naive timing test that compare the 2 implementations and already for very few inserts
like 10 to 15 entries with no deletes in 2 storefiles + memcache I saw around 2x improvment.
These test were only done on the server and I have not tested for bigger tables and from the
client, since I only got the rpc to work the other day and we've done some reworks since then,
but it looks promising.

When we looked at the get code and how to make it faster we started to look what structures
were used and how many compares you have to do to get your result. Because to me the way to
speed things up is to cut down the total number of compares that you need to do to get your
result. Of course the total fetch time is made up by a number of causes, but on the server,
outside of HFile/HDFS, where I think most time is spent is comparing entries to see if they
should be included in the result or not (this decision made by looking at the query to see
if it was asked for, comparing for deletes, checking for ttl, maxVersions requested, timeRange,

So what I saw when looking at the old code is that for every new entry from the storefile
you compare it to all entries in the get set and then for every match between those you have
to check to see it is has been deleted.

Let {{*k*}} be the number KeyValues we will look at.  Let {{*l*}} be the number of columns
asked for by the client.

Old Way: We compare every entry from the storefile (k) with every entry from the getset (l)
which will give you {{*k * l*}} compares in total.

If the getset is sorted (storefile/memcache always is), then you are just doing a sorted merge
between two sorted lists.

New Way: We do a sorted merge between the entries in the storefile (k) with the entries in
the sorted getset (l) which will give you {{*k + l*}} compares in total.

The difference between {{*k * l*}} and {{*k + l*}} can be significant if you are matching
more than one column.

- current plan is to build the sortedset behind the scenes on the client in the Get
- this insertion sort will have a negligible impact on clients and no additional work will
be required server-side
- we will send the sortedset over the wire as a list.  in that case, it will not have to be
(this last optimization is important.  right now we do a LOT of serializing a sorted tree,
sending it over the wire, and then deserializing it.  the deserializing of trees actually
rebuilds the entire tree, thus reperforming the sort.  using lists (which are sorted) will
give us time and memory savings.)

On the delete part, we have two things to deal with.  Checking current KVs against the current
DeleteSet, and adding newly seen deletes into the DeleteSet.

Let {{*k*}} be the number of KeyValues we will look at.  Let {{*m*}} be the number of deletes
we have already seen.  Let {{*n*}} be the number of new deletes we will see.

Old way of checking current KVs agains the current DeleteSet:
- DeleteSet is a sortedSet, so it has {{*log(m)*}} inserts/gets/deletes/contains operations.
- For each KV seen (k), we have a {{*log(m)*}} operation to see if it is deleted.
- This will give you {{*k * log(m)*}} total comparisons.

Old way of inserting newly seen delete KVs into current DeleteSet:
- When we see a delete, we insert it to the deleteset in {{*log(m)*}}.  This has downsides.
- We're adding deletes which we know will not be relevant in this storefile (increasing m
increases subsequent delete checks so we have log(m + n) eventually).
- We're doing work which we may not potentially ever have to do (merging newly seen delete
with previous set).  If we are going to early-out of this storefile for some reason, there
would never have been a reason to pay anything but constant-time cost for deletes related
to older storefiles which will not end up being opened.
- Trees are bad!  Their allocation of lots of small nodes is a disaster for GC.  We should
use lists everywhere we can.
- This gives you {{*n * log(m)*}} total comparisons.

The new approach is similiar to how we now handle the get list.  Deletes will be stored in
two lists.  A currentDeleteList and a newDeleteList.

To start, the memcache has no deletes to look at.  You will, however, add any deletes seen
to the newDeleteList (in constant time, no significant extra work).  Since memcache (and storefiles)
are sorted, newDeleteList is sorted.

When moving to the next storefile, you will merge newDeleteList with currentDeleteList.

New way of handling deletes:
- currentDeleteList is a sorted ArrayList of deletes.  there is a DeleteFamily special-case
that just stores a stamp (which implies it's existence).
- We do a sorted merge of KVs seen (k) with the currentDeleteList (m), for {{*k + m*}} total
- Each seen new delete is a O(1) append to the newDeleteList.
- At the end of the storefile, you will then need to merge the delete lists, for {{*m + n*}}
total comparisons.
- This will give you {{*(k + m) + (m + n)*}} total comparisons for each storefile.

The difference in total is: Old = {{*(k + n) * log(m)*}} vs New = {{*(k + n) + 2m*}}

The algorithmic improvements seem clear.  The replacement of Trees/SortedSets with (Sorted)Lists/ArrayLists
will also be beneficial for GC/heap usage.

Additional complexity in things like merging delete lists are not a big deal if we make thorough
unit tests (which we have already done in that case).

> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>                 Key: HBASE-1249
>                 URL: https://issues.apache.org/jira/browse/HBASE-1249
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>         Attachments: HBASE-1249-Example-v1.pdf, HBASE-1249-Example-v2.pdf, HBASE-1249-GetQuery-v1.pdf,
HBASE-1249-GetQuery-v2.pdf, HBASE-1249-GetQuery-v3.pdf, HBASE-1249-GetQuery-v4.pdf, HBASE-1249-StoreFile-v1.pdf,
> To discuss all the new and potential issues coming out of the change in key format (HBASE-1234):
zero-copy reads, client binary protocol, update of API (HBASE-880), server optimizations,

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message