hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Items to contribute (plan)
Date Sun, 23 Jan 2011 01:45:22 GMT
#1 looks similar to what MapR has done.

On Sat, Jan 22, 2011 at 5:18 PM, Tatsuya Kawano <tatsuya6502@gmail.com> wrote:

>
> Hi,
>
> I wanted to let you know that I'm planning to contribute the following
> items to the HBase community. These are my spare-time projects, and I'll only
> be able to spend about 7 hours a week on them, so progress will be very
> slow. I'd like some feedback from you guys to prioritize them. Also, if
> someone or some team wants to work on them (with me or alone), I'll be happy
> to provide more details.
>
>
> 1. RADOS integration
>
> Run HBase not only on HDFS but also on the RADOS distributed object store
> (the lower layer of Ceph), so that the following options become available to
> HBase users:
>
> -- No SPOF (RADOS has no name node(s), only ZK-like monitors
> and data nodes)
> -- Instant backup of HBase tables (RADOS provides copy-on-write snapshots
> per object pool)
> -- Extra durability option on WAL (RADOS can do both synchronous and
> asynchronous disk flush. HDFS doesn't have the former option)
>
> Note:
> RADOS object = HFile, WAL
> object pool = group of HFiles or WAL
>
> Current status: Design phase
>
>
> 2. mapreduce.HFileInputFormat
>
> MR library to read data directly from HFiles. (Roughly 2.5 times faster
> than TableInputFormat in my tests)
>
> Current status: Completed a proof-of-concept prototype and measured
> performance.
>
>
> 3. Enhance Get/Scan performance of RS
>
> Add a hash code and a couple of flags to HFile at flush time and
> change the scanner implementation so that:
>
> -- Get/Scan operations will get faster. (fewer key comparisons for
> reconstructing a row: O(h * c) -> O(h). [h = number of HFiles for the row,
> c = number of columns in an HFile])
> -- The size of HFiles will become a bit smaller. (The flags will eliminate
> duplicate bytes in keys (row, column family and qualifier) from HFiles.)
>
> Current status: Completed a proof-of-concept prototype and measured
> performance.
>
> Details:
> https://github.com/tatsuya6502/hbase-mr-pof/
> (I meant "poc" not "pof"...)
>
>
> 4. Writing Japanese books and documents
>
> -- Currently I'm authoring a chapter about HBase for a Japanese NOSQL
> book
> -- I'll translate The Apache HBase Book into Japanese
>
>
> Thank you,
>
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
> http://twitter.com/#!/tatsuya6502
>
>
>
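For readers trying to picture the O(h * c) -> O(h) claim in item 3 of the quoted message, here is a minimal toy sketch in Python. It is not the prototype from the linked repository. The `same_row` flag, the `Cell` layout, and all function names are hypothetical, and the model assumes each HFile holds only cells of the one row being read. It also ignores timestamps, versions, and merge order across files. Under those assumptions, a reader that checks the row key on every cell does h * c comparisons, while a reader that trusts a "same row as previous cell" flag written at flush time does one comparison per HFile.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PlainCell = Tuple[str, str, str]  # (row, qualifier, value)


@dataclass
class Cell:
    row: Optional[str]      # omitted (None) when same_row is set
    qualifier: str
    value: str
    same_row: bool = False  # hypothetical flag written at flush time


def make_files(h: int, c: int, row: str = "row1") -> List[List[PlainCell]]:
    """Build h toy 'HFiles', each holding c columns of a single row."""
    return [[(row, f"q{i}", f"v{f}-{i}") for i in range(c)] for f in range(h)]


def flush(cells: List[PlainCell]) -> List[Cell]:
    """Encode sorted cells, dropping the row key when it repeats."""
    out: List[Cell] = []
    prev_row = None
    for row, qualifier, value in cells:
        repeated = row == prev_row
        out.append(Cell(None if repeated else row, qualifier, value, repeated))
        prev_row = row
    return out


def read_row_naive(hfiles: List[List[PlainCell]], target: str):
    """Compare the row key of every cell: h * c comparisons in this model."""
    comparisons = 0
    result = []
    for cells in hfiles:
        for row, qualifier, value in cells:
            comparisons += 1
            if row == target:
                result.append((qualifier, value))
    return result, comparisons


def read_row_flagged(hfiles: List[List[Cell]], target: str):
    """Compare the row key only where a new row starts: h comparisons here."""
    comparisons = 0
    result = []
    for cells in hfiles:
        in_target = False
        for cell in cells:
            if not cell.same_row:
                comparisons += 1
                in_target = cell.row == target
            if in_target:
                result.append((cell.qualifier, cell.value))
    return result, comparisons


if __name__ == "__main__":
    plain = make_files(h=3, c=4)
    print(read_row_naive(plain, "row1")[1])
    print(read_row_flagged([flush(f) for f in plain], "row1")[1])
```

The same flag is also what would let a writer omit the repeated row bytes, which matches the "HFiles will become a bit smaller" point, though how the real prototype encodes this is not stated in the message.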
