hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1249) Rearchitecting of server, client, API, key format, etc for 0.20
Date Tue, 10 Mar 2009 22:17:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680649#action_12680649 ]

Jonathan Gray commented on HBASE-1249:
--------------------------------------

As part of reworking basically everything, I'd like to propose that we change the server-side
methods to allow optimizations wherever possible, and change the client APIs to reflect the
implementation more closely.

A _very_ rough draft to show what I'm talking about:

getColumnsLatest(byte [] row, byte [][] columns)  - only takes columns, no families
getFamiliesLatest(byte [] row, byte [][] families)  - only takes families

getColumnsVersions(byte [] row, byte [][] columns, int numVersions)

getColumnsVersionsAfter(byte [] row, byte [][] columns, long afterStamp)
getColumnsVersionsBefore(byte [] row, byte [][] columns, long beforeStamp)

getLatest(byte [] row)  - implementation is the same as getFamiliesLatest() with all families
specified
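
To make the draft above concrete, here is a minimal sketch of how these read methods could
behave against a toy in-memory store for a single row.  It is not HBase code.  The class name
ReadApiSketch, the use of "family:qualifier" strings instead of byte [] arrays, the dropped row
argument, and the choice to treat afterStamp/beforeStamp as exclusive bounds are all assumptions
made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy single-row store (not HBase code). Columns are "family:qualifier" strings. */
public class ReadApiSketch {
    // column -> (timestamp -> value), with the newest timestamp first
    private final TreeMap<String, NavigableMap<Long, String>> cells = new TreeMap<>();
    private final TreeSet<String> families = new TreeSet<>();

    public void put(String column, long timestamp, String value) {
        families.add(column.substring(0, column.indexOf(':')));
        cells.computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
             .put(timestamp, value);
    }

    /** Explicit columns: one point lookup per column, no scanning. */
    public Map<String, String> getColumnsLatest(String[] columns) {
        Map<String, String> out = new TreeMap<>();
        for (String column : columns) {
            NavigableMap<Long, String> versions = cells.get(column);
            if (versions != null) {
                out.put(column, versions.firstEntry().getValue());
            }
        }
        return out;
    }

    /** Whole families: one range scan per family over the sorted column names. */
    public Map<String, String> getFamiliesLatest(String[] wanted) {
        Map<String, String> out = new TreeMap<>();
        for (String family : wanted) {
            String prefix = family + ":";
            for (Map.Entry<String, NavigableMap<Long, String>> e
                    : cells.tailMap(prefix, true).entrySet()) {
                if (!e.getKey().startsWith(prefix)) {
                    break; // left this family's key range
                }
                out.put(e.getKey(), e.getValue().firstEntry().getValue());
            }
        }
        return out;
    }

    /** Up to numVersions newest values per listed column, newest first. */
    public Map<String, List<String>> getColumnsVersions(String[] columns, int numVersions) {
        Map<String, List<String>> out = new TreeMap<>();
        for (String column : columns) {
            NavigableMap<Long, String> versions = cells.get(column);
            if (versions == null) {
                continue;
            }
            List<String> values = new ArrayList<>(versions.values());
            int n = Math.min(numVersions, values.size());
            out.put(column, new ArrayList<>(values.subList(0, n)));
        }
        return out;
    }

    /** Versions strictly newer than afterStamp (exclusive bound is an assumption). */
    public Map<String, List<String>> getColumnsVersionsAfter(String[] columns, long afterStamp) {
        return slice(columns, afterStamp, true);
    }

    /** Versions strictly older than beforeStamp (exclusive bound is an assumption). */
    public Map<String, List<String>> getColumnsVersionsBefore(String[] columns, long beforeStamp) {
        return slice(columns, beforeStamp, false);
    }

    /** Same as getFamiliesLatest() with all families specified. */
    public Map<String, String> getLatest() {
        return getFamiliesLatest(families.toArray(new String[0]));
    }

    private Map<String, List<String>> slice(String[] columns, long stamp, boolean after) {
        Map<String, List<String>> out = new TreeMap<>();
        for (String column : columns) {
            NavigableMap<Long, String> versions = cells.get(column);
            if (versions == null) {
                continue;
            }
            // The map is in descending timestamp order, so headMap holds the newer entries.
            NavigableMap<Long, String> part =
                after ? versions.headMap(stamp, false) : versions.tailMap(stamp, false);
            if (!part.isEmpty()) {
                out.put(column, new ArrayList<>(part.values()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ReadApiSketch row = new ReadApiSketch();
        row.put("info:name", 10L, "old");
        row.put("info:name", 20L, "new");
        row.put("stats:hits", 30L, "7");
        System.out.println(row.getColumnsLatest(new String[] {"info:name"}));
        System.out.println(row.getFamiliesLatest(new String[] {"info"}));
    }
}
```

Even in this toy version the two "latest" reads take different paths: explicit columns are
point lookups, while families are range scans over sorted column names.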


It's easy to see now that splitting families and columns into two fields will not work at all
with the current API.  We need a more hierarchical client API, client utilities, something more
like BatchUpdate even for reads, ...
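
As one possible reading of "something more like BatchUpdate even for reads", here is a sketch
of a hierarchical read request that keeps families and qualifiers as separate fields.  The
class RowReadSketch and all of its methods are hypothetical names invented for illustration;
nothing here is an existing or proposed HBase class, and Strings stand in for byte [] arrays.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical read request (not an HBase class): row -> families -> qualifiers. */
public class RowReadSketch {
    private final String row;
    // family -> explicitly requested qualifiers
    private final Map<String, Set<String>> columns = new LinkedHashMap<>();
    // families requested in full; their qualifier sets are ignored by a reader
    private final Set<String> wholeFamilies = new LinkedHashSet<>();

    public RowReadSketch(String row) {
        this.row = row;
    }

    /** Request every column in the family. */
    public RowReadSketch addFamily(String family) {
        wholeFamilies.add(family);
        columns.computeIfAbsent(family, f -> new LinkedHashSet<>());
        return this;
    }

    /** Request a single column, given as separate family and qualifier fields. */
    public RowReadSketch addColumn(String family, String qualifier) {
        columns.computeIfAbsent(family, f -> new LinkedHashSet<>()).add(qualifier);
        return this;
    }

    public String getRow() {
        return row;
    }

    public boolean readsWholeFamily(String family) {
        return wholeFamilies.contains(family);
    }

    public Set<String> getFamilies() {
        return Collections.unmodifiableSet(columns.keySet());
    }

    public Set<String> getQualifiers(String family) {
        return Collections.unmodifiableSet(
            columns.getOrDefault(family, Collections.emptySet()));
    }

    public static void main(String[] args) {
        RowReadSketch read = new RowReadSketch("row1")
            .addColumn("info", "name")
            .addFamily("stats");
        System.out.println(read.getFamilies());
        System.out.println(read.readsWholeFamily("stats"));
    }
}
```

A request object like this lets the server see up front whether a family is wanted in full or
only for specific qualifiers, instead of parsing that out of a flat list of column names.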

Also, when dealing with versions (or latest), we will not be able to make most of these
optimizations if the client can manually specify the timestamp, as described above.

There are a few reasons to do this.  For one, it makes it clearer to users how things are
implemented.  But more importantly, it ensures we write a server-side method for each of the
different cases for which we can make optimizations.  Right now, getting explicitly listed
columns shares code with getting all columns for explicitly listed families.  Each of these two
cases has its own unique possibilities for optimization.  There are also different optimizations
to be made for deletes, and more well-defined read types will make the cell cache easier.

> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
>                 Key: HBASE-1249
>                 URL: https://issues.apache.org/jira/browse/HBASE-1249
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> To discuss all the new and potential issues coming out of the change in key format (HBASE-1234):
> zero-copy reads, client binary protocol, update of API (HBASE-880), server optimizations,
> etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

