hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase
Date Mon, 17 Sep 2012 19:22:51 GMT
Hi Jason,

On Mon, Sep 17, 2012 at 6:55 AM, Dai, Jason <jason.dai@intel.com> wrote:
> I'd like to announce Project Panthera, our open source efforts that showcase better data
> analytics capabilities on Hadoop/HBase (through both SW and HW improvements), available at
> 2) A document store (built on top of HBase) for better query processing
>    Under Project Panthera, we will gradually make our implementation of the document
> store available as an extension to HBase (https://github.com/intel-hadoop/hbase-0.94-panthera).
> Specifically, today's release provides document store support in HBase by utilizing co-processors,
> which brings up-to 3x reduction in storage usage and up-to 1.8x speedup in query processing.
> Going forward, we will also use HBase-6800 <https://issues.apache.org/jira/browse/HBASE-6800>
> as the umbrella JIRA to track our efforts to get the document store idea reviewed and hopefully
> incorporated into Apache HBase.

Thank you for your interest in contributing to the HBase project. I
have two initial comments/suggestions. These are also at

1) From the attached document, it appears that the existing
coprocessor framework was sufficient for the implementation of the DOT
system on top, which is great to see. There has been some discussion
in the HBase PMC, documented in the archives of the
dev@hbase.apache.org mailing list, that coprocessor based applications
should begin as independent code contributions, perhaps hosted in a
GitHub repository. In your announcement on general@ I see you have
more or less done this already at
https://github.com/intel-hadoop/hbase-0.94-panthera , except that this is a
full fork of the HBase source tree with all history of individual
changes lost (a single commit of a source drop). It would be helpful
if only the changes on top of stock HBase code appeared there. Otherwise,
what you have done is, in effect, to fork the HBase project, which is not
conducive to contribution.

2) From the design document: "The co-processor framework needs to be
extended to provide observers for the filter operations, similar to
the observers of the data access operations." We would be delighted to
work with you on the necessary coprocessor framework extensions. I'd
recommend a separate JIRA specifically for this. Let's discuss what
Coprocessor API extensions or additions are necessary. Do you have a

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)
