hbase-user mailing list archives

From Paul van Hoven <paul.van.ho...@googlemail.com>
Subject Map Reduce with multiple scans
Date Tue, 26 Feb 2013 13:41:59 GMT
My rowkeys look something like this:

md5( date ) + md5( ip address )

So an example would be
md5( "2013-02-08") + md5( "")

For any one date there are several rows. Now I'd like to query
several different dates, for example "2013-01-01", "2013-02-01" and
some others. Additionally, I'd like to run this scan (or these scans)
in a map reduce job.

Currently my map reduce job looks like this:

Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ToyJob");
job.setJarByClass( PlayWithMapReduce.class );

byte[] md5Key = Utils.md5( "2013-01-07" );
int md5Length = 16;
int longLength = 8;

byte[] startRow = Bytes.padTail( md5Key, longLength ); // append "0 0 0 0 0 0 0 0"
byte[] endRow = Bytes.padTail( md5Key, longLength );
endRow[md5Length-1]++; // last byte of the md5 prefix gets counted up

Scan scan = new Scan( startRow, endRow );

Filter f = new SingleColumnValueFilter( Bytes.toBytes("CF"),
    Bytes.toBytes("creativeId"), CompareOp.EQUAL, Bytes.toBytes("100") );
scan.setFilter( f ); // attach the filter to the scan; otherwise it is never applied

String tableName = "ToyDataTable";
TableMapReduceUtil.initTableMapperJob( tableName, scan, Mapper.class,
null, null, job);

This map reduce job works fine, but it runs only a single scan. What
do I have to do to pass multiple scans? Or do you have any other
suggestions on how to achieve that goal? The constraint is that it
must be possible to combine it with map reduce.
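
To make the question concrete, here is a minimal sketch of how the start and stop rows for several dates could be computed, assuming the row key really begins with the 16-byte md5 of the date string. The class DateRowRanges and its methods are hypothetical names used only for illustration, and the code uses only the JDK, not HBase. Unlike the endRow[md5Length-1]++ line above, the stop row is computed with a carry, so a prefix whose last byte is 0xFF does not wrap around. Each resulting range could then back one Scan. How a list of scans is handed to the job depends on the HBase version (newer releases have an initTableMapperJob overload that accepts a List<Scan>), so that part is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DateRowRanges {

    /** Start row (inclusive) and stop row (exclusive) for one date prefix. */
    public static final class Range {
        public final byte[] start;
        public final byte[] stop;

        Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Assumption: stands in for the Utils.md5 helper from the post. */
    public static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Returns the smallest key that sorts after every key starting with
     * the given prefix. Trailing 0xFF bytes are dropped and the byte
     * before them is incremented. If the prefix consists only of 0xFF
     * bytes, an empty array is returned (HBase treats an empty stop row
     * as "scan to the end of the table").
     */
    public static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0];
    }

    /** Builds one [start, stop) range per date string. */
    public static List<Range> rangesFor(String... dates) {
        List<Range> ranges = new ArrayList<>();
        for (String date : dates) {
            byte[] prefix = md5(date);
            ranges.add(new Range(prefix, stopRowForPrefix(prefix)));
        }
        return ranges;
    }
}
```

Calling DateRowRanges.rangesFor("2013-01-01", "2013-02-01") yields two ranges whose start and stop arrays could be passed to new Scan( startRow, endRow ) in the same way as in the job above.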
