hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6800) Build a Document Store on HBase for Better Query Processing
Date Tue, 18 Sep 2012 16:52:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457945#comment-13457945 ]

Andrew Purtell commented on HBASE-6800:

bq. If moving your code into Apache is a goal, you could also start the co-processor project
in the apache incubator.  You could do that while being consistent with andrew's suggested
methodology (not forking HBase, mavenized integration...).

This is a good suggestion. Panthera is less an enhancement to HBase than a full application
on top of it, with a wider scope than HBase alone: it also covers Hive and additional new
components. Within the scope of the HBase project itself, API changes, core changes, and (incorporating
my earlier comment) utility coprocessors of sufficient generality make a lot of sense, as
does addressing the meta issues raised (i.e., should HBase have Eclipse-plugin-like tooling
for getting and installing CPs?). HBase should be a good platform for your work; let us know
what you need.
> Build a Document Store on HBase for Better Query Processing
> -----------------------------------------------------------
>                 Key: HBASE-6800
>                 URL: https://issues.apache.org/jira/browse/HBASE-6800
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors, performance
>    Affects Versions: 0.96.0
>            Reporter: Jason Dai
>         Attachments: dot-deisgn.pdf
> In the last couple of years, more and more people have begun to stream data into HBase
in near real time and to use high-level queries (e.g., Hive) to analyze the data in HBase
directly. While HBase
already has very effective MapReduce integration with its good scanning performance, query
processing using MapReduce on HBase still has significant gaps compared to HDFS: ~3x space
overhead and 3x to 5x performance overhead according to our measurements.
> We propose to implement a document store on HBase, which can greatly improve query processing
on HBase (by leveraging the relational model and read-mostly access patterns). According to
our prototype, it can reduce space usage by up to ~3x and speed up query processing by up to

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
