hbase-dev mailing list archives

From "Billy" <sa...@pearsonwholesale.com>
Subject Re: hypertable
Date Sat, 16 Feb 2008 07:18:02 GMT
I think we are on the right path: provide a stable platform as we go, then 
add features and improvements where needed, once everything is running 
correctly and MTBF is higher. I see us adding improvements as we go now, but 
they're revealing themselves in stability rather than in test numbers.

Currently there are not many open issues that would keep me from running 
hbase in a production setting. The only major one is Hadoop appends for 
hbase's logs, but I see that issue is making progress through its sub-issues 
in the Hadoop project.

I also think that, now that we are a subproject of Hadoop, we will pick up 
speed in fixing bugs and making improvements, and releases and feedback will 
be more common.

Billy


"stack" <stack@duboce.net> wrote in message 
news:47B5FDB4.10004@duboce.net...
> A couple of us (JimK, Chad, and myself) went down to see the Hypertable 
>fellas, Doug Judd, Luke Lu and Gordon Rios.  The lads were gracious hosts; 
>they bought us lunch and fed us good coffee.
>
> What we learned:
>
> + They have an interface that each FileSystem implements.  It's basic: 
> open, close, seek, read, write, flush, pread.  They underlined the 
> presence of an asynchronous read in the API.
>
> + To get to a filesystem implementation -- e.g. HDFS -- they go via a 
> 'broker'.  The broker is a server that implements the FileSystem 
> interface.  This extra-hop abstraction will allow them to go against 
> stores other than HDFS.
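
A rough sketch of the idea in Java (Hypertable's real interface is C++, and
these method shapes and names are my guesses from the list above, not its
actual API): clients code only against the interface, and a broker per
backing store implements it, so swapping stores never touches client code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative broker-style filesystem abstraction: open, close, seek,
// read, write, flush, pread, per the operation list in the post.
interface BrokerFs {
    int open(String path);                      // returns a file handle
    void close(int fd);
    void seek(int fd, long offset);
    byte[] read(int fd, int len);               // read from current offset
    void write(int fd, byte[] data);            // append at current offset
    void flush(int fd);
    byte[] pread(int fd, long offset, int len); // positional read, no seek
}

// A toy in-memory broker, standing in for an HDFS- or local-disk-backed one.
class MemBroker implements BrokerFs {
    private final Map<String, List<Byte>> files = new HashMap<>();
    private final Map<Integer, String> handles = new HashMap<>();
    private final Map<Integer, Long> offsets = new HashMap<>();
    private int nextFd = 1;

    public int open(String path) {
        files.computeIfAbsent(path, p -> new ArrayList<>());
        int fd = nextFd++;
        handles.put(fd, path);
        offsets.put(fd, 0L);
        return fd;
    }
    public void close(int fd) { handles.remove(fd); offsets.remove(fd); }
    public void seek(int fd, long offset) { offsets.put(fd, offset); }
    public byte[] read(int fd, int len) {
        byte[] out = pread(fd, offsets.get(fd), len);
        offsets.put(fd, offsets.get(fd) + out.length);
        return out;
    }
    public void write(int fd, byte[] data) {
        List<Byte> f = files.get(handles.get(fd));
        for (byte b : data) f.add(b);
        offsets.put(fd, (long) f.size());
    }
    public void flush(int fd) { /* no-op for the in-memory store */ }
    public byte[] pread(int fd, long offset, int len) {
        List<Byte> f = files.get(handles.get(fd));
        int n = (int) Math.min(len, f.size() - offset);
        byte[] out = new byte[Math.max(n, 0)];
        for (int i = 0; i < out.length; i++) out[i] = f.get((int) offset + i);
        return out;
    }
}
```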
>
> + They have their own file format rather than depending on FileSystem 
> types such as SequenceFile as hbase does.  It's made of blocks (64k or 
> 64M, I don't remember which).  At the end of the file is a block index. 
> Blocks are compressed.  Keys are 
> row/single-byte-column-family/column/single-byte-type/timestamp (IIRC). 
> The single-byte-column-family is used to look up into their chubby (called 
> Hyperspace) where the database schema is stored (the schema has the column 
> family name, attributes, etc).  The single-byte-type indicates the cell 
> type: insert, delete, column-family delete or row delete.
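
To make the key layout concrete, here is a hypothetical serialization of it
in Java. The field order matches the description above; the delimiters,
field widths and type codes are my own guesses (the post itself hedges with
"IIRC"), not Hypertable's actual on-disk encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of a row/family-code/column/type/timestamp cell key.
class CellKey {
    // Hypothetical 1-byte type codes; the post says the real set covers
    // insert, delete, column-family delete and row delete.
    static final byte TYPE_INSERT = 0, TYPE_DELETE = 1,
                      TYPE_DELETE_FAMILY = 2, TYPE_DELETE_ROW = 3;

    static byte[] encode(String row, byte familyCode, String qualifier,
                         byte type, long timestamp) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(row.getBytes(StandardCharsets.UTF_8));
            out.write(0);           // row terminator (assumed delimiter)
            out.write(familyCode);  // 1-byte code -> schema lookup in Hyperspace
            out.write(qualifier.getBytes(StandardCharsets.UTF_8));
            out.write(0);           // qualifier terminator (assumed delimiter)
            out.write(type);        // 1-byte cell type
            out.write(ByteBuffer.allocate(8).putLong(timestamp).array());
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // ByteArrayOutputStream won't throw
        }
    }
}
```

The point of the 1-byte family code is visible in the arithmetic: the full
family name lives once in the schema, and every key carries only one byte.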
>
> + To read, they open, read a block, and then run the decompress and the 
> parse of keys and values over in C++ land.  They talked up the fact that 
> they can do read-ahead; i.e. prefetch the next block so it's to hand when 
> the scanner crosses over into it.
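
The read-ahead trick can be sketched in a few lines of Java (Hypertable's
is C++; the block-loader function here is a stand-in for the real
pread-plus-decompress): while the caller consumes block i, a background task
fetches block i+1, so crossing a block boundary doesn't stall on I/O.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Sketch of a scanner that prefetches the next block on a background thread.
class ReadAheadScanner implements AutoCloseable {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final IntFunction<byte[]> loadBlock; // stand-in: pread + decompress
    private Future<byte[]> next;
    private int blockIdx = 0;

    ReadAheadScanner(IntFunction<byte[]> loadBlock) {
        this.loadBlock = loadBlock;
        this.next = fetch(0);             // prime the first block
    }

    private Future<byte[]> fetch(int i) {
        Callable<byte[]> task = () -> loadBlock.apply(i);
        return pool.submit(task);
    }

    byte[] nextBlock() {
        try {
            byte[] current = next.get();  // usually already completed
            next = fetch(++blockIdx);     // kick off prefetch of the next one
            return current;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public void close() { pool.shutdownNow(); }
}
```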
>
> + To write, they just call append.  In the HDFS case, the broker just 
> saves up the data and then writes it out when close is called.
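
A minimal sketch of that buffering broker, assuming an in-memory map as the
stand-in backing store (class and field names are illustrative): append only
accumulates bytes, and nothing becomes durable until close writes the whole
buffer out in one go.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Sketch: since HDFS (at the time) had no working append, the broker
// buffers appended bytes and materializes the file only on close().
class BufferingWriteBroker {
    private final Map<String, ByteArrayOutputStream> open = new HashMap<>();
    final Map<String, byte[]> store = new HashMap<>(); // fake backing FS

    void append(String path, byte[] data) {
        open.computeIfAbsent(path, p -> new ByteArrayOutputStream())
            .write(data, 0, data.length);      // buffered only, not durable yet
    }

    void close(String path) {
        ByteArrayOutputStream buf = open.remove(path);
        if (buf != null) store.put(path, buf.toByteArray()); // single write-out
    }
}
```

The obvious cost of this scheme, and the reason the append issue matters for
log files: data appended but not yet closed is lost if the writer dies.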
>
> + They haven't played with random reads.  Currently, if a rangeserver goes 
> down, the cluster is hosed (This is their highest priority at moment and 
> should be addressed soon).  We'll probably standardize on the bigtable 
> Performance Evaluation though it intentionally frustrates compression --  
> it uses random values -- so their compression work won't have a chance to 
> shine.
>
>
> Thoughts:
>
> + Their keying is better than hbase's.  We're missing the typing (we use 
> 'special' values to indicate cell delete).  We should also use codes to 
> represent families (I've been thinking we need such a thing for both 
> tables and columns every time I look at a meta scan in our master logs). 
> We should consider using the code in keys also.
>
> + At first I was thinking the read-ahead a nice idea but thinking on it 
> more, methinks it won't buy us much.  IIRC, DFSClient blocks when you go 
> off the end of one block while it closes socket to current datanode and 
> puts up a socket against the datanode that has the next block.  But hbase 
> usually writes out files that are the HDFS 64M block size or less.  This 
> means that, usually, when running a compaction of flushes, we shouldn't be 
> doing reads over the top of socket reconnects.  Let's measure. 
> Regardless, we should fix this blocking, if this is indeed the case, 
> either in DFSClient or at the application layer at Tom White's 
> block-caching level.
>
> + In their postings on hypertable -- on their website and in responses to 
> the slashdotting of hypertable -- there is the implication that HT is a 
> more 'true' implementation of bigtable paper.  One area in particular that 
> comes up is hbase's lack of support for 'locality groups'.  No one has as 
> yet asked for this "store of stores" feature.   I can see that if you've 
> botched your schema design up front or your access pattern changes over 
> the life of the application and you want to 'join' two column families, 
> it'd be useful (no need to change how the client accesses the table).  We 
> should probably add this facility, but it seems low priority to me.
>
> + Their postings also talk up better compression options and how they 
> include this and that compression algorithm lib natively, whereas Java 
> has to go across JNI chasms; and even then, Java takes 2 to 3 times the 
> memory C++ does, and even then its accesses are slower, etc.  On 
> compression, we've done little in hbase.  It's possible to enable it but 
> we've not done any profiling using SequenceFile compression options, 
> record vs. block, etc.  What with i/o always being orders of magnitude 
> slower than any other access, and with CPUs getting faster and faster, 
> there is a point at which using compression to get more data off the disk 
> in one go becomes a win.  We should spend some time looking at our 
> options here when we go about making HBaseMapFile.
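
A quick way to see both the compression win and why random values frustrate
it (the Performance Evaluation point above), using only the JDK's built-in
Deflater; the 64k block size here is just a plausible figure, not a measured
hbase or HT setting:

```java
import java.util.Random;
import java.util.zip.Deflater;

// Deflate a 64k block of repetitive data vs. a 64k block of random
// bytes: the former shrinks to a fraction of its size (less disk i/o per
// block read), while random data barely shrinks at all -- so a benchmark
// that writes random values understates compression's i/o savings.
class CompressionDemo {
    static int deflatedSize(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        byte[] out = new byte[input.length + 128]; // room for worst case
        int total = 0;
        while (!d.finished()) total += d.deflate(out);
        d.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[64 * 1024];   // all zeros
        byte[] random = new byte[64 * 1024];
        new Random(42).nextBytes(random);
        System.out.println("repetitive 64k -> " + deflatedSize(repetitive) + " bytes");
        System.out.println("random 64k     -> " + deflatedSize(random) + " bytes");
    }
}
```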
>
> + If I were to synopsize my impression of HT in a single word, 
> 'performance' would seem to be foremost.  For hbase, performance is 
> important but at the moment, our roadmap has 'robustness' and 
> 'scalability' as focus.  Should we be spending more time on performance 
> issues?
>
> Comments?
> St.Ack
>
> 



