hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2615) M/R on bulk imported tables
Date Fri, 28 May 2010 20:30:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873143#action_12873143 ]

stack commented on HBASE-2615:
------------------------------

Using the hfile tool, I confirmed that the keys in the two files are ordered:

./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
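
For anyone who wants to repeat the ordering check on a list of keys dumped from a file, here is a minimal sketch. It is not the hfile tool and does not read HFiles. It assumes the keys are already available as byte strings and that plain unsigned byte-wise comparison is the ordering of interest, which is a simplification of how HBase compares full keys. The function name first_out_of_order and the sample keys are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_out_of_order(keys: Iterable[bytes]) -> Optional[Tuple[int, bytes, bytes]]:
    """Return (index, previous_key, key) for the first key that sorts
    before its predecessor, or None if the keys are in non-decreasing order."""
    previous = None
    for index, key in enumerate(keys):
        # Python compares bytes objects lexicographically by unsigned byte value.
        if previous is not None and key < previous:
            return index, previous, key
        previous = key
    return None


if __name__ == "__main__":
    # Hypothetical sample keys, not taken from the attached data.
    ordered = [b"row-001", b"row-002", b"row-010"]
    shuffled = [b"row-001", b"row-010", b"row-002"]
    print(first_out_of_order(ordered))
    print(first_out_of_order(shuffled))
```

A result of None means no key sorted before the one preceding it. Anything else points at the first offending pair.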


> M/R on bulk imported tables
> ---------------------------
>
>                 Key: HBASE-2615
>                 URL: https://issues.apache.org/jira/browse/HBASE-2615
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3, 0.20.4
>         Environment: os.arch=amd64; os.version=2.6.9-67.ELsmp; java.version=1.6.0_15; java.vendor=Sun Microsystems Inc.
>            Reporter: Azza Abouzeid
>         Attachments: dummydata.tar.gz
>
>
> We are bulk importing using loadtable.rb and running M/R jobs using HBase as input.
> We're taking the following steps:
> 1a. Load HBase with an M/R job using the normal API.
> OR
> 1b. Load HBase with bulk import.
> THEN
> 2a. Using the shell, do a "count" over the table.
> OR
> 2b. Run a M/R job that scans the whole HBase table (and nothing else).
> Of the 4 combos, 3 are fine: 1a+2a, 1a+2b, 1b+2a.  We're having trouble with 1b+2b. When we run the M/R job, it doesn't seem to read in any records, but there are no explicit errors in either the Hadoop or HBase logs.
> Any ideas on what might be wrong with the bulk import to cause this problem?  We confirmed this problem exists in both hbase-0.20.3 and hbase-0.20.4.
> We have created dummy data (see attached). This is the test case:
> After loading the data into HDFS, in the hbase shell:
> create 'tiny', 'values'
> Execute: 
> {HBASE-HOME}/bin/hbase org.jruby.Main {HBASE-HOME}/bin/loadtable.rb tiny tinytable
> Then run the simple row counter
> {HADOOP-HOME}/bin/hadoop jar {HBASE-HOME}/hbase-0.20.x.jar rowcounter tiny values
> Notice that map input records read is always zero. We confirmed that other mapreduce jobs do not execute the map function at all, always returning 0 records.
> We also ran a major_compaction of all HBase tables (.META. and .ROOT. as well) but this did not fix the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
