hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6800) Build a Document Store on HBase for Better Query Processing
Date Mon, 17 Sep 2012 19:21:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457242#comment-13457242 ]

Andrew Purtell commented on HBASE-6800:

Thank you for your interest in contributing to the HBase project. I have two initial comments/suggestions:

1) From the attached document, it appears that the existing coprocessor framework was sufficient
for the implementation of the DOT system on top, which is great to see. There has been some
discussion in the HBase PMC, documented in the archives of the dev@hbase.apache.org mailing
list, that coprocessor based applications should begin as independent code contributions,
perhaps hosted in a GitHub repository. In your announcement on general@ I see you have sort of
done this already at https://github.com/intel-hadoop/hbase-0.94-panthera, except that this is
a full fork of the HBase source tree with all history of individual changes lost (a single
commit of a source drop). It would be helpful if only the changes on top of stock HBase code
appeared there. Otherwise, what you have done is in effect a fork of the HBase project, which is
not conducive to contribution.

2) From the design document: "The co-processor framework needs to be extended to provide observers
for the filter operations, similar to the observers of the data access operations." We would
be delighted to work with you on the necessary coprocessor framework extensions. I'd recommend
a separate JIRA specifically for this. Let's discuss what Coprocessor API extensions or additions
are necessary. Do you have a proposal?

> Build a Document Store on HBase for Better Query Processing
> -----------------------------------------------------------
>                 Key: HBASE-6800
>                 URL: https://issues.apache.org/jira/browse/HBASE-6800
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors, performance
>    Affects Versions: 0.96.0
>            Reporter: Jason Dai
>         Attachments: dot-deisgn.pdf
> In the last couple of years, increasingly more people begin to stream data into HBase
> in near time, and
> use high level queries (e.g., Hive) to analyze the data in HBase directly. While HBase
> already has very effective MapReduce integration with its good scanning performance, query
> processing using MapReduce on HBase still has significant gaps compared to HDFS: ~3x space
> overheads and 3~5x performance overheads according to our measurement.
> We propose to implement a document store on HBase, which can greatly improve query processing
> on HBase (by leveraging the relational model and read-mostly access patterns). According to
> our prototype, it can reduce space usage by up-to ~3x and speedup query processing by up-to

