hbase-dev mailing list archives

From Tatsuya Kawano <tatsuya6...@gmail.com>
Subject Re: Items to contribute (plan)
Date Tue, 25 Jan 2011 04:06:48 GMT
Thanks all for your replies. 

So besides the Japanese book chapter, which has a strict deadline, item 1 "RADOS (Ceph) integration"
will be the first thing I work on. I'll open an issue and work publicly, as Stack suggested.
Hopefully I'll start playing with the API soon and write a design proposal for critique.


Item 2 "mapreduce.HFileInputFormat" will come next, but it seems we need more discussion
of its features and benefits before we can be sure it's worth working on. I'll post a separate
reply later.

For now, item 3 "Enhance Get/Scan performance on RS" gets the lowest priority, and I'll work
on it as a personal project.

- Tatsuya

--
Tatsuya Kawano
Tokyo, Japan


On Jan 25, 2011, at 2:54 AM, Stack <stack@duboce.net> wrote:

> On Sat, Jan 22, 2011 at 5:18 PM, Tatsuya Kawano <tatsuya6502@gmail.com> wrote:
>> 1. RADOS integration
>> 
>> Run HBase not only on HDFS but also on the RADOS distributed object store (the lower layer of Ceph), so that the following options become available to HBase users:
>> 
>> -- No SPOF (RADOS has no name node(s), only ZK-like monitors and data nodes)
>> -- Instant backup of HBase tables (RADOS provides copy-on-write snapshots per object pool)
>> -- Extra durability option for the WAL (RADOS can do both synchronous and asynchronous disk flush; HDFS doesn't have the former option)
>> 
>> Note:
>> RADOS object = HFile, WAL
>> object pool = group of HFiles or WAL
>> 
>> Current status: Design phase
>> 
> 
> I know a few people are interested in this, Tatsuya, so I would suggest
> that you open an issue now and work publicly.
> 
> 
>> 2. mapreduce.HFileInputFormat
>> 
>> MR library to read data directly from HFiles. (Roughly 2.5 times faster than TableInputFormat in my tests)
>> 
>> Current status: Completed a proof-of-concept prototype and measured performance.
>> 
> 
> What about the in-memory edits?  Or are you thinking of reading the WALs too?
> 
> 
> 
>> 3. Enhance Get/Scan performance of RS
>> 
>> Add a hash code and a couple of flags to the HFile at flush time and change the scanner implementation so that:
>> 
>> -- Get/Scan operations get faster. (Fewer key comparisons for reconstructing a row: O(h * c) -> O(h).  [h = number of HFiles for the row, c = number of columns in an HFile])
>> -- HFiles become a bit smaller. (The flags eliminate duplicate bytes in keys (row, column family and qualifier) from HFiles.)
>> 
>> Current status: Completed a proof-of-concept prototype and measured performance.
>> 
>> Details:
>> https://github.com/tatsuya6502/hbase-mr-pof/
>> (I meant "poc" not "pof"...)
>> 
> 
> Sounds great.
> 
> 
>> 4. Writing Japanese books and documents
>> 
>> -- Currently I'm authoring a book chapter about HBase for a Japanese NoSQL book
>> -- I'll translate The Apache HBase Book into Japanese
>> 
>> 
> 
> All of the above sound great Tatsuya.
> Thanks,
> St.Ack
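
A note on item 3: the quoted drop from O(h * c) to O(h) key comparisons can be pictured with a small, self-contained Java sketch. Everything in it is an assumption made for illustration. The class RowFlagSketch, the Cell type and the sameRowAsPrevious flag are hypothetical names, not HBase's KeyValue or HFile format, and the prototype in the linked repository may work quite differently. The hash code mentioned in the proposal is not modeled, and the seek to the row's first cell is assumed to have happened already (the target row is simply the first row in each file).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a toy model, not HBase's KeyValue or HFile format. */
final class RowFlagSketch {

    /** A simplified cell carrying a hypothetical flag that would be set at flush time. */
    static final class Cell {
        final byte[] row;
        final String qualifier;
        final boolean sameRowAsPrevious; // hypothetical: true if the row equals the previous cell's row

        Cell(byte[] row, String qualifier, boolean sameRowAsPrevious) {
            this.row = row;
            this.qualifier = qualifier;
            this.sameRowAsPrevious = sameRowAsPrevious;
        }
    }

    /** Builds one "file" from sorted {row, qualifier} pairs, computing the flag once per cell. */
    static List<Cell> flush(String[][] sortedCells) {
        List<Cell> file = new ArrayList<>();
        byte[] previousRow = null;
        for (String[] rowAndQualifier : sortedCells) {
            byte[] row = rowAndQualifier[0].getBytes(StandardCharsets.UTF_8);
            boolean same = previousRow != null && Arrays.equals(previousRow, row);
            file.add(new Cell(row, rowAndQualifier[1], same));
            previousRow = row;
        }
        return file;
    }

    /** Baseline: compares the full row key of every cell until the row changes. */
    static int readRowBaseline(List<List<Cell>> files, byte[] target, List<Cell> out) {
        int comparisons = 0;
        for (List<Cell> file : files) {
            for (Cell cell : file) {
                comparisons++;
                if (!Arrays.equals(cell.row, target)) break;
                out.add(cell);
            }
        }
        return comparisons;
    }

    /** Flagged: compares the row key once per file, then follows the flag. */
    static int readRowWithFlags(List<List<Cell>> files, byte[] target, List<Cell> out) {
        int comparisons = 0;
        for (List<Cell> file : files) {
            if (file.isEmpty()) continue;
            comparisons++;
            if (!Arrays.equals(file.get(0).row, target)) continue;
            out.add(file.get(0));
            for (int i = 1; i < file.size() && file.get(i).sameRowAsPrevious; i++) {
                out.add(file.get(i));
            }
        }
        return comparisons;
    }

    /** Builds h files, each holding c cells of row "r1" followed by one cell of row "r2". */
    static List<List<Cell>> sampleFiles(int h, int c) {
        List<List<Cell>> files = new ArrayList<>();
        for (int f = 0; f < h; f++) {
            String[][] cells = new String[c + 1][];
            for (int q = 0; q < c; q++) cells[q] = new String[] {"r1", "q" + q};
            cells[c] = new String[] {"r2", "q0"};
            files.add(flush(cells));
        }
        return files;
    }

    public static void main(String[] args) {
        List<List<Cell>> files = sampleFiles(2, 3);
        byte[] target = "r1".getBytes(StandardCharsets.UTF_8);
        List<Cell> viaKeys = new ArrayList<>();
        List<Cell> viaFlags = new ArrayList<>();
        System.out.println("baseline comparisons: " + readRowBaseline(files, target, viaKeys));
        System.out.println("flagged comparisons:  " + readRowWithFlags(files, target, viaFlags));
        System.out.println("cells found: " + viaKeys.size() + " / " + viaFlags.size());
    }
}
```

In this toy setup with h = 2 files and c = 3 columns, the baseline makes 8 row-key comparisons (c + 1 per file, the last one detecting the row boundary), while the flagged variant makes 2, one per file. Both return the same 6 cells.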
