hbase-issues mailing list archives

From "Jason Dai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6800) Build a Document Store on HBase for Better Query Processing
Date Tue, 18 Sep 2012 00:41:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457508#comment-13457508 ]

Jason Dai commented on HBASE-6800:

bq. coprocessor based applications should begin as independent code contributions, perhaps
hosted in a GitHub repository
bq. It would be helpful if only the changes on top of stock HBase code appear here.
This could work, though I think we need to figure out how to address several implications of the proposal, such as:
(1) How do users figure out which co-processor applications are stable enough to use in their production deployments?
(2) How do we ensure that the co-processor applications remain compatible with changes in the HBase project, and with each other?
(3) How do users get the co-processor applications? They could no longer get them from the Apache HBase release and might need to integrate them manually, which is not something the average business user will do. This is also the main reason we put the full HBase source tree out: several of our users and customers want a prototype of DOT to try out.

bq. We would be delighted to work with you on the necessary coprocessor framework extensions.
I'd recommend a separate JIRA specifically for this.
Yes, we do plan to submit the proposal for observers for filter operations as a separate JIRA (originally we planned to make it a sub-task of this JIRA).

> Build a Document Store on HBase for Better Query Processing
> -----------------------------------------------------------
>                 Key: HBASE-6800
>                 URL: https://issues.apache.org/jira/browse/HBASE-6800
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors, performance
>    Affects Versions: 0.96.0
>            Reporter: Jason Dai
>         Attachments: dot-deisgn.pdf
> In the last couple of years, more and more people have begun to stream data into HBase in near real time and to use high-level queries (e.g., Hive) to analyze the data in HBase directly. While HBase already has very effective MapReduce integration with good scanning performance, query processing using MapReduce on HBase still has significant gaps compared to HDFS: ~3x space overhead and 3-5x performance overhead according to our measurements.
> We propose to implement a document store on HBase, which can greatly improve query processing on HBase by leveraging the relational model and read-mostly access patterns. According to our prototype, it can reduce space usage by up to ~3x and speed up query processing by up to

