hbase-dev mailing list archives

From "Ruslan Salyakhov (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-2378) Bulk insert with multiple reducers
Date Thu, 25 Mar 2010 19:24:27 GMT
Bulk insert with multiple reducers

                 Key: HBASE-2378
                 URL: https://issues.apache.org/jira/browse/HBASE-2378
             Project: Hadoop HBase
          Issue Type: Bug
          Components: mapreduce
    Affects Versions: 0.20.3
            Reporter: Ruslan Salyakhov

If I run an MR job to prepare HFiles with more than one reducer, some values for keys do not
appear in the table after running the loadtable.rb script. With one reducer, everything works.

- the row id must be formatted as an ImmutableBytesWritable
- the MR job should ensure a total ordering among all keys

MAPREDUCE-366  (patch-5668-3.txt)
- TotalOrderPartitioner that uses the new API (attached)

- patched HFileOutputFormat (attached)
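The total-ordering requirement is what a TotalOrderPartitioner provides: each reducer receives one contiguous key range, so the HFiles written by different reducers do not overlap. Below is a minimal sketch of that routing rule, assuming raw `byte[]` row keys compared as unsigned bytes (the order HBase uses for row keys) and an already sorted array of split points. `TotalOrderSketch`, `partitionFor`, `bytes`, and the split point `"1."` are hypothetical illustrations, not the MAPREDUCE-366 API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of total-order partitioning; not the MAPREDUCE-366 class.
class TotalOrderSketch {

    // Lexicographic comparison that treats each byte as unsigned (0..255).
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    };

    // splitPoints must be sorted with UNSIGNED_BYTES and hold (numReducers - 1) keys.
    // Keys below splitPoints[0] go to partition 0; a key that is >= splitPoints[i]
    // and < splitPoints[i + 1] goes to partition i + 1.
    static int partitionFor(byte[] key, byte[][] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key, UNSIGNED_BYTES);
        // Exact match at index i -> partition i + 1; otherwise binarySearch
        // returns -(insertionPoint) - 1, and the insertion point is the partition.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical: one split point gives two reducers, as in the "-r 2" run.
        byte[][] splits = { bytes("1.") };
        String[] rows = {
            "0.14.USA.IL.602.ELMHURST.", "0.245.GBR.ENG.826005.BEXLEY.",
            "1.14.SWE.Rollup.Rollup.Rollup.", "1.245.NOR.03.-1.OSLO." };
        for (String row : rows) {
            System.out.println(row + " -> reducer " + partitionFor(bytes(row), splits));
        }
    }
}
```

With the single split point, both `0.` rows are routed to reducer 0 and both `1.` rows to reducer 1. The comparator used for routing has to agree with the one used to sort the sampled split points; if the two orders differ, a reducer can receive keys outside its intended range.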

Input data (attached):
* my_sample_log_1k.txt - sample data, input for MyHFilesWriter

Source (attached):
* MyKeyComparator.java - comparator for my ImmutableBytesWritable keys
* TestTotalOrderPartitionerForMyKeys.java - test case for my keys (note that I've set up MyKeyComparator
to pass that test)
* MyHFilesWriter.java - My MR job to prepare HFiles
* HFileOutputFormat.java - from MAPREDUCE-366
* TotalOrderPartitioner.java - from MAPREDUCE-366
* MySampler.java - My RandomSampler, based on the Sampler from MAPREDUCE-366, but with the following
line added to the getSample method (without that line, it doesn't work):
            reader.initialize(splits.get(i), new TaskAttemptContext(job.getConfiguration(), new TaskAttemptID()));
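A sampler's job is to choose the split points that the partitioner reads. A rough sketch of that step, assuming ASCII row keys (for which Java `String` order coincides with unsigned byte order): sort the sampled keys and take `numPartitions - 1` evenly spaced ones, skipping any candidate that is not strictly greater than the previous split. This is similar in spirit to how Hadoop's InputSampler writes its partition file, but `SplitPointSketch` and `chooseSplitPoints` are hypothetical names, not MySampler's actual code, and the sample below simply reuses the row keys printed by the count of tst_hfiles_01.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: choose split points from sampled row keys.
class SplitPointSketch {

    // Sorts the samples and returns up to (numPartitions - 1) strictly
    // increasing, evenly spaced keys. Assumes numPartitions >= 1.
    static List<String> chooseSplitPoints(String[] samples, int numPartitions) {
        String[] sorted = samples.clone();
        // For ASCII keys, String order matches HBase's unsigned byte order.
        Arrays.sort(sorted);
        List<String> splits = new ArrayList<>();
        float step = sorted.length / (float) numPartitions;
        int last = -1; // index of the previously chosen split point
        for (int i = 1; i < numPartitions; i++) {
            int k = Math.round(step * i);
            // Skip keys that are not strictly greater than the previous split.
            while (k < sorted.length && last >= 0 && sorted[k].compareTo(sorted[last]) <= 0) {
                k++;
            }
            if (k >= sorted.length) {
                break; // ran out of distinct sampled keys
            }
            splits.add(sorted[k]);
            last = k;
        }
        return splits;
    }

    public static void main(String[] args) {
        String[] samples = {
            "0.14.USA.IL.602.ELMHURST.", "0.245.USA.ME.500.PORTLAND.",
            "0.34.USA.FL.Rollup.Rollup.", "0.443.USA.CA.803.LOS.ANGELES.1.1.0",
            "0.8.USA.CO.751.CASTLE.ROCK.1.1.0", "1.14.DZA.Rollup.Rollup.Rollup.",
            "1.159.SWE.AB.Rollup.Rollup.", "1.17.USA.TN.659.CLARKSVILLE.",
            "1.220.USA.MI.505.SOUTHFIELD." };
        // Two reducers need one split point.
        System.out.println(chooseSplitPoints(samples, 2)); // prints [1.14.DZA.Rollup.Rollup.Rollup.]
    }
}
```

Requiring each split to be strictly greater than the last keeps the boundaries usable for binary search even when the sample contains many duplicate keys; the cost is that fewer partitions than requested may be produced.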

Test case:
# hadoop jar keyvalue-poc.jar MyHFilesWriter -in /test_hbase/my_sample_log_1k.txt -out /test_hbase/hfiles/01/ -r 1
# hadoop jar keyvalue-poc.jar MyHFilesWriter -in /test_hbase/my_sample_log_1k.txt -out /test_hbase/hfiles/02/ -r 2
# hbase> create 'tst_hfiles_01', {NAME => 'vals'}
# hbase> create 'tst_hfiles_02', {NAME => 'vals'}
# hbase org.jruby.Main /usr/lib/hbase-0.20/bin/loadtable.rb tst_hfiles_01 /test_hbase/hfiles/01
# hbase org.jruby.Main /usr/lib/hbase-0.20/bin/loadtable.rb tst_hfiles_02 /test_hbase/hfiles/02
# check values for keys

For example:
hbase(main):006:0* count 'tst_hfiles_01', 100 
Current count: 100, row: 0.14.USA.IL.602.ELMHURST.                                
Current count: 200, row: 0.245.USA.ME.500.PORTLAND.                               
Current count: 300, row: 0.34.USA.FL.Rollup.Rollup.                               
Current count: 400, row: 0.443.USA.CA.803.LOS.ANGELES.1.1.0                              
Current count: 500, row: 0.8.USA.CO.751.CASTLE.ROCK.1.1.0                                
Current count: 600, row: 1.14.DZA.Rollup.Rollup.Rollup.                           
Current count: 700, row: 1.159.SWE.AB.Rollup.Rollup.                              
Current count: 800, row: 1.17.USA.TN.659.CLARKSVILLE.                             
Current count: 900, row: 1.220.USA.MI.505.SOUTHFIELD.                             
999 row(s) in 0.0930 seconds
hbase(main):007:0> count 'tst_hfiles_02', 100
Current count: 100, row: 0.231.USA.GA.524.BUFORD.                                 
Current count: 200, row: 0.4.USA.VA.573.Rollup.                                   
Current count: 300, row: 0.9.ROU.B.-1.BUCHAREST.                                  
Current count: 400, row: 1.16.USA.IA.679.Rollup.                                  
Current count: 500, row: 1.245.NOR.03.-1.OSLO.                                    
Current count: 600, row: 0.245.GBR.ENG.826005.BEXLEY.                             
Current count: 700, row: 0.48.GBR.ENG.826027.Rollup.                              
Current count: 800, row: 1.14.SWE.Rollup.Rollup.Rollup.                           
Current count: 900, row: 1.201.GBR.ENG.826005.LONDON.                             
999 row(s) in 0.1630 seconds
hbase(main):008:0> get 'tst_hfiles_01', '0.14.USA.IL.602.ELMHURST.'
COLUMN                       CELL                                                        
 vals:key0                   timestamp=1269542753914, value=0                            
 vals:key1                   timestamp=1269542753914, value=14                           
 vals:key2                   timestamp=1269542753914, value=USA                          
 vals:key3                   timestamp=1269542753914, value=IL                           
 vals:key4                   timestamp=1269542753914, value=602                          
 vals:key5                   timestamp=1269542753914, value=ELMHURST                     
 vals:key6                   timestamp=1269542753914, value=1                            
 vals:key7                   timestamp=1269542753914, value=1                            
 vals:key8                   timestamp=1269542753914, value=0                            
 vals:key9                   timestamp=1269542753914, value=0                            
 vals:val0                   timestamp=1269542753914, value=2                            
11 row(s) in 0.0160 seconds
hbase(main):009:0> get 'tst_hfiles_02', '0.14.USA.IL.602.ELMHURST.'
COLUMN                       CELL                                                        
0 row(s) in 0.0220 seconds
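A scan is expected to return rows in ascending key order, but in the tst_hfiles_02 output the row printed at count 600 (0.245.GBR.ENG.826005.BEXLEY.) sorts before the row at count 500 (1.245.NOR.03.-1.OSLO.). A small, hypothetical helper (`RowOrderCheck` and `firstOutOfOrder` are illustrative names, not part of HBase) that finds the first such break in a list of row keys copied from the shell, assuming ASCII keys so that `String` order matches HBase's byte order:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for spot-checking scanned row keys; not part of HBase.
class RowOrderCheck {

    // Returns the index of the first row that is not strictly greater than
    // the row before it, or -1 if the list is strictly ascending.
    static int firstOutOfOrder(List<String> rows) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i).compareTo(rows.get(i - 1)) <= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Rows printed every 100 rows by: count 'tst_hfiles_02', 100
        List<String> rows = Arrays.asList(
            "0.231.USA.GA.524.BUFORD.",
            "0.4.USA.VA.573.Rollup.",
            "0.9.ROU.B.-1.BUCHAREST.",
            "1.16.USA.IA.679.Rollup.",
            "1.245.NOR.03.-1.OSLO.",
            "0.245.GBR.ENG.826005.BEXLEY.",
            "0.48.GBR.ENG.826027.Rollup.",
            "1.14.SWE.Rollup.Rollup.Rollup.",
            "1.201.GBR.ENG.826005.LONDON.");
        System.out.println(firstOutOfOrder(rows)); // prints 5 (the row at count 600)
    }
}
```

The same check on the rows printed for tst_hfiles_01 returns -1. Because the shell prints only every 100th row, this is a spot check; finding every break would require the full list of scanned keys.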

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
