hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7697) Consolidate tools for getting data into, out of HBase
Date Wed, 30 Jan 2013 17:31:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566671#comment-13566671 ]

Nick Dimiduk commented on HBASE-7697:
-------------------------------------

That's exactly my point: those existing applications are just simple examples over the input
and output formats. They're not fully-featured data movement applications. HBase is shaping
up to be a world-class database; it should ship world-class tools to help users and administrators
manage their "planet-sized data".

Yes, I believe snapshot management falls into the domain of this/these tools as well.
                
> Consolidate tools for getting data into, out of HBase
> -----------------------------------------------------
>
>                 Key: HBASE-7697
>                 URL: https://issues.apache.org/jira/browse/HBASE-7697
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, mapreduce
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>
> The user experience for importing data into HBase and getting a dump out of HBase is
> pretty poor. The existing tools as I understand them include:
> - org.apache.hadoop.hbase.mapreduce.Export,
> - org.apache.hadoop.hbase.mapreduce.Import,
> - org.apache.hadoop.hbase.mapreduce.ImportTsv,
> - org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles, and
> - org.apache.hadoop.hbase.mapreduce.CopyTable
> Each one provides specific features that do not necessarily overlap with the others.
> For instance, Import and ImportTsv could have most of their logic combined, sharing common
> driver code and leaving the details of the file-format up to the user to provide via a pluggable
> mapper. Export and CopyTable both map over a target table; it's only the detail of what they
> do with the data that is different. Bulk operations via HFiles could be a more common use-case
> as well, not just a special case of ImportTsv.
> The list of [open issues|https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HBASE%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20text%20~%20%22ImportTsv%22%20ORDER%20BY%20updatedDate%20DESC]
> against ImportTsv alone indicates users are using the tool, and I certainly advise it for
> people getting started with a new HBase deployment.
> I propose a single interface for getting data into and out of HBase. It would be pluggable,
> allowing users to override details of their file formats and schemas. We can provide implementations
> that replicate existing tool behaviors as example modules. These tools are also a reasonable
> place, IMHO, to include support for creation and loading of snapshots.
> I started down the path of a specific tool intended to overcome some of the limitations
> of ImportTsv and it has since been refactored into a more general purpose application. Initial
> patches forthcoming. Comments strongly encouraged.
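
A rough illustration of the "shared driver plus pluggable format" idea from the description,
in plain Java with no HBase or MapReduce dependencies. Every name here (PluggableImportSketch,
Cell, RecordParser, TsvParser, runImport) is hypothetical and invented for this sketch; none of
it is taken from HBase or from the patches mentioned above. It assumes the simplest case:
one text line per record, with the first tab-separated field used as the row key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: these types do not exist in HBase.
class PluggableImportSketch {

    /** One value destined for a table: row key, "family:qualifier" column, value. */
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    /** The pluggable part: the user supplies the knowledge of the file format. */
    interface RecordParser {
        List<Cell> parse(String line);
    }

    /** Example module in the spirit of ImportTsv: field 0 is the row key. */
    static final class TsvParser implements RecordParser {
        private final String[] columns;

        TsvParser(String... columns) {
            this.columns = columns;
        }

        @Override
        public List<Cell> parse(String line) {
            String[] fields = line.split("\t", -1);
            List<Cell> cells = new ArrayList<>();
            // Pair each configured column with the field that follows the row key.
            for (int i = 0; i < columns.length && i + 1 < fields.length; i++) {
                cells.add(new Cell(fields[0], columns[i], fields[i + 1]));
            }
            return cells;
        }
    }

    /** The shared driver: format-agnostic, it only iterates and collects. */
    static List<Cell> runImport(Iterable<String> lines, RecordParser parser) {
        List<Cell> out = new ArrayList<>();
        for (String line : lines) {
            out.addAll(parser.parse(line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("row1\talice\t30", "row2\tbob\t41");
        for (Cell c : runImport(input, new TsvParser("d:name", "d:age"))) {
            System.out.println(c.row + " " + c.column + " = " + c.value);
        }
    }
}
```

Supporting another file format in this sketch means writing another RecordParser; runImport
stays unchanged, which is the kind of sharing the description suggests for Import and ImportTsv.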

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
