hbase-issues mailing list archives

From "Himanshu Vashishtha (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
Date Sun, 08 Apr 2012 01:00:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249442#comment-13249442 ]

Himanshu Vashishtha commented on HBASE-5741:

Yes, the javadoc is only half correct when it says that "the table must exist in HBase if
you are not using the option importtsv.bulk.output".
Without that property, importtsv inserts directly into the provided table, so
it throws an exception if the table doesn't exist.

When we use this option, importtsv configures the job by reading all the regions of the
table and using the start keys of these regions as partition markers for the
TotalOrderPartitioner class. This usage drops the first start row (which is always
an EMPTY_BYTE_ARRAY), so even if we create a new table, its single region yields no
partition markers and it is useless for configuring the job.
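
To illustrate the point about partition markers, here is a minimal sketch in plain Java. It is
not the ImportTsv or HFileOutputFormat code: the class PartitionMarkers and its two methods are
hypothetical names made up for this example, and Arrays.compareUnsigned is assumed to stand in
for HBase's byte-wise row key ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: shows how region start keys
// could be turned into partition markers for a total-order partitioner.
class PartitionMarkers {

    // Sorts the region start keys and drops the first one if it is empty.
    // The first region of a table always starts at the empty key, so
    // N regions yield N - 1 markers and a single region yields none.
    static List<byte[]> splitPoints(List<byte[]> regionStartKeys) {
        List<byte[]> sorted = new ArrayList<>(regionStartKeys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));
        if (!sorted.isEmpty() && sorted.get(0).length == 0) {
            return new ArrayList<>(sorted.subList(1, sorted.size()));
        }
        return sorted;
    }

    // Returns the index of the partition a row key falls into,
    // i.e. the number of markers that are <= the row key.
    static int partitionFor(byte[] rowKey, List<byte[]> markers) {
        int partition = 0;
        for (byte[] marker : markers) {
            if (Arrays.compareUnsigned(rowKey, marker) >= 0) {
                partition++;
            }
        }
        return partition;
    }

    public static void main(String[] args) {
        // A freshly created table (assumed here to have one region, start key empty).
        List<byte[]> newTable = new ArrayList<>();
        newTable.add(new byte[0]);
        System.out.println("markers for new table: " + splitPoints(newTable).size());

        // A table with three regions starting at "", "g" and "p".
        List<byte[]> threeRegions = Arrays.asList(
                new byte[0],
                "g".getBytes(StandardCharsets.UTF_8),
                "p".getBytes(StandardCharsets.UTF_8));
        List<byte[]> markers = splitPoints(threeRegions);
        System.out.println("partition for row 'k': "
                + partitionFor("k".getBytes(StandardCharsets.UTF_8), markers));
    }
}
```

With a single region there are no markers at all, so every row falls into partition 0, which
is why a table created on the fly would not help in partitioning the job's output.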

For the case with LoadIncrementalHFiles, the directory containing the HFiles
is assumed to follow a specific layout:

 --where the HFiles for each column family are stored in a separate sub-directory.

If we do create a table based on the parameters given to the importtsv command, it will not
be useful for the bulk load use case, because the importtsv job dumps HFiles based on
rows, so all column families for a specific row land in one HFile.

It would be great to know whether anyone actually uses this workflow: create HFiles from an
importtsv job, and then use bulk load to insert those HFiles into an HBase table.

I think we should change the javadoc.

Please let me know if you have any questions; hbase-mapreduce use cases are exciting.
> ImportTsv does not check for table existence 
> ---------------------------------------------
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase"
> (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create
> it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt
> to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan
> on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> * Creates an object to access a HBase table.
> ...
> public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads"
> command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> boolean tableExists = this.doesTableExist(tableName);
> if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv.
> We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered
> problems when prefetch META table:
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table:
> myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...
