Date: Wed, 13 Mar 2013 17:08:13 +0000 (UTC)
From: "Nick Dimiduk (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Resolved] (HBASE-7697) Consolidate tools for getting data into, out of HBase

[ https://issues.apache.org/jira/browse/HBASE-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk resolved HBASE-7697.
---------------------------------

    Resolution: Invalid

Closing as invalid because this is pretty vague. If you're interested, see related mapreduce improvements in HBASE-8084.

> Consolidate tools for getting data into, out of HBase
> -----------------------------------------------------
>
>                 Key: HBASE-7697
>                 URL: https://issues.apache.org/jira/browse/HBASE-7697
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, mapreduce
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>
> The user experience for importing data into HBase and getting a dump out of HBase is pretty poor. The existing tools, as I understand them, include:
> - org.apache.hadoop.hbase.mapreduce.Export,
> - org.apache.hadoop.hbase.mapreduce.Import,
> - org.apache.hadoop.hbase.mapreduce.ImportTsv,
> - org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles, and
> - org.apache.hadoop.hbase.mapreduce.CopyTable
>
> Each one provides specific features that do not necessarily overlap with the others. For instance, Import and ImportTsv could have most of their logic combined, sharing common driver code and leaving the details of the file format up to the user to provide via a pluggable mapper. Export and CopyTable both map over a target table; only what they do with the data differs. Bulk operations via HFiles could also be a more common use case, not just a special case of ImportTsv.
>
> The list of [open issues|https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HBASE%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20text%20~%20%22ImportTsv%22%20ORDER%20BY%20updatedDate%20DESC] against ImportTsv alone indicates that users are using the tool, and I certainly recommend it for people getting started with a new HBase deployment.
>
> I propose a single interface for getting data into and out of HBase. It would be pluggable, allowing users to override the details of their file formats and schemas. We can provide implementations that replicate the existing tools' behaviors as example modules. These tools are also a reasonable place, IMHO, to support creating and loading snapshots.
>
> I started down the path of a specific tool intended to overcome some of the limitations of ImportTsv, and it has since been refactored into a more general-purpose application. Initial patches forthcoming. Comments strongly encouraged.
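
For illustration only, here is a minimal Java sketch of the "shared driver code plus pluggable mapper" idea from the description above. Every name in it (Cell, RecordParser, TsvParser, runImport) is hypothetical and not part of HBase, and the MapReduce job setup is replaced by a plain loop so the example runs standalone. It assumes an ImportTsv-like layout where the first tab-separated field is the row key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in HBase. It only
// illustrates "shared driver code + pluggable file-format parser".
public class PluggableImportSketch {

    // One cell bound for a table: row key, "family:qualifier", value.
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + column + "=" + value;
        }
    }

    // The pluggable part: turn one input line into zero or more cells.
    interface RecordParser {
        List<Cell> parse(String line);
    }

    // An ImportTsv-like parser (hypothetical): field 0 is the row key,
    // the remaining fields map positionally onto the configured columns.
    static final class TsvParser implements RecordParser {
        private final String[] columns;

        TsvParser(String... columns) {
            this.columns = columns.clone();
        }

        @Override
        public List<Cell> parse(String line) {
            String[] fields = line.split("\t", -1);
            List<Cell> cells = new ArrayList<>();
            // Fields beyond the configured columns are ignored.
            for (int i = 1; i < fields.length && i <= columns.length; i++) {
                cells.add(new Cell(fields[0], columns[i - 1], fields[i]));
            }
            return cells;
        }
    }

    // The shared driver: the same loop regardless of file format. A real
    // tool would configure a MapReduce job here instead of looping.
    static List<Cell> runImport(List<String> lines, RecordParser parser) {
        List<Cell> out = new ArrayList<>();
        for (String line : lines) {
            out.addAll(parser.parse(line));
        }
        return out;
    }

    public static void main(String[] args) {
        RecordParser tsv = new TsvParser("d:name", "d:age");
        List<Cell> cells = runImport(List.of("row1\tAlice\t30", "row2\tBob\t41"), tsv);
        cells.forEach(System.out::println);
    }
}
```

Supporting another file format would mean supplying a different RecordParser (it has a single method, so a lambda works) while runImport stays unchanged. The same shape could plausibly apply to Export and CopyTable, with the pluggable piece being what happens to each scanned row rather than how input is parsed.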