hbase-issues mailing list archives

From "Azza Abouzeid (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2615) M/R on bulk imported tables
Date Thu, 03 Jun 2010 00:49:54 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874891#action_12874891 ]

Azza Abouzeid commented on HBASE-2615:
--------------------------------------

We modified the KeyValue constructor call in our data generation script to include the current
timestamp, and the MR jobs now work. Thanks, nice catch!

Perhaps the API could expose only the constructors that require a timestamp, or default tuples
to a MIN or current timestamp instead of MAX, to guarantee that they are read.
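
The comment above does not spell out why MAX-stamped tuples went unread. As a rough illustration only, here is a toy model in plain Java (not HBase code; every name in it is hypothetical). It assumes that a tuple written without an explicit timestamp is stamped with Long.MAX_VALUE, and that the reading side keeps only tuples whose timestamp falls in a half-open range [min, max). The second assumption is a guess at the mechanism, not something this thread confirms.

```java
// Toy model only: none of these names are HBase classes or methods.
final class TimestampSketch {
    // Assumption: a cell written without an explicit timestamp is stamped MAX.
    static final long DEFAULT_MAX_TIMESTAMP = Long.MAX_VALUE;

    // Assumption: the reader keeps cells whose timestamp lies in [min, max).
    static boolean inHalfOpenRange(long timestamp, long min, long max) {
        return min <= timestamp && timestamp < max;
    }

    // Counts the cells that a full-range read over [0, Long.MAX_VALUE) would keep.
    static int countVisible(long[] cellTimestamps) {
        int visible = 0;
        for (long ts : cellTimestamps) {
            if (inHalfOpenRange(ts, 0L, Long.MAX_VALUE)) {
                visible++;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long[] stampedWithMax = {DEFAULT_MAX_TIMESTAMP, DEFAULT_MAX_TIMESTAMP};
        long[] stampedWithNow = {now, now};
        System.out.println("MAX-stamped cells read: " + countVisible(stampedWithMax));
        System.out.println("now-stamped cells read: " + countVisible(stampedWithNow));
    }
}
```

Under these assumptions the MAX-stamped cells are never counted, while cells stamped with the current time are. That matches the symptom of a job reading zero records with no errors, and it is why passing an explicit timestamp to the KeyValue constructor would help.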

> M/R on bulk imported tables
> ---------------------------
>
>                 Key: HBASE-2615
>                 URL: https://issues.apache.org/jira/browse/HBASE-2615
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3, 0.20.4
>         Environment: os.arch=amd64; os.version=2.6.9-67.ELsmp; java.version=1.6.0_15; java.vendor=Sun Microsystems Inc.
>            Reporter: Azza Abouzeid
>         Attachments: dummydata.tar.gz
>
>
> We are bulk importing using loadtable.rb and running M/R jobs using HBase as input.
> We're taking the following steps:
> 1a. Load HBase with a M/R job using the normal API. 
> OR
> 1b. Load HBase with bulk import.
> THEN
> 2a. Using the shell, do a "count" over the table.
> OR
> 2b. Run a M/R job that scans the whole HBase table (and nothing else).
> Of the 4 combos, 3 are fine: 1a+2a, 1a+2b, 1b+2a. We're having trouble with 1b+2b. When we run the M/R job, it doesn't seem to read in any records, but there are no explicit errors in either the Hadoop or HBase logs.
> Any ideas on what might be wrong with the bulk import to cause this problem? We confirmed this problem exists in both hbase-0.20.3 and hbase-0.20.4.
> We have created dummy data (see attached). This is the test case:
> After loading the data into HDFS, in the hbase shell:
> create 'tiny', 'values'
> Execute: 
> {HBASE-HOME}/bin/hbase org.jruby.Main {HBASE-HOME}/bin/loadtable.rb tiny tinytable
> Then run the simple row counter
> {HADOOP-HOME}/bin/hadoop jar {HBASE-HOME}/hbase-0.20.x.jar rowcounter tiny values
> Notice that map input records read is always zero. We confirmed that other mapreduce jobs do not execute the map function at all, always returning 0 records.
> We also ran a major_compaction of all HBase tables (.META. and .ROOT. as well) but this did not fix the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

