hadoop-hdfs-user mailing list archives

From Ian Brooks <i.bro...@sensewhere.com>
Subject Streaming a subset of HBase data
Date Thu, 13 Mar 2014 13:48:31 GMT

I'm trying to use hadoop-streaming-2.2.0.jar to export a subset of
data (a time range) to a mapper and reducer application written in another language. However,
I have been unable to get anything but the full contents of the HBase table.

Looking at the code and forums, it seems that because hadoop-streaming doesn't support the new
mapreduce API, it isn't possible to give it scan parameters to set the time range or other filters. I found
some classes online (http://cp1985chenpeng.iteye.com/blog/1315076) that implement the functionality
of the newer API in a way that hadoop-streaming seems to be OK with, but when it gets to
the mapreduce.Job part of processing, it still just returns the whole table rather than the
rows within the time range I am specifying.

Is there a known way that I should be able to do this?
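
For reference, one fallback that avoids scan parameters entirely is to apply the time range inside the streaming mapper itself. The following is only a minimal sketch, not a known solution: it assumes a hypothetical input layout of one cell per line as "row_key<TAB>timestamp_ms<TAB>value" (the real layout depends on the input format in use), and the function names are made up for illustration. The table is still scanned in full; only the rows passed on to the reducer are limited.

```python
import sys


def in_time_range(ts_ms, start_ms, end_ms):
    """Half-open interval [start_ms, end_ms), the same convention HBase
    uses for scan time ranges."""
    return start_ms <= ts_ms < end_ms


def filter_lines(lines, start_ms, end_ms):
    """Yield only the lines whose timestamp field lies inside the range.

    Assumed (hypothetical) layout per line: row_key \t timestamp_ms \t value
    Lines that do not match this layout are skipped.
    """
    for line in lines:
        record = line.rstrip("\n")
        fields = record.split("\t", 2)
        if len(fields) < 3:
            continue  # not in the assumed three-field layout
        try:
            ts_ms = int(fields[1])
        except ValueError:
            continue  # timestamp field is not an integer
        if in_time_range(ts_ms, start_ms, end_ms):
            yield record


# Run as a mapper only when both bounds are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    start, end = int(sys.argv[1]), int(sys.argv[2])
    for out in filter_lines(sys.stdin, start, end):
        print(out)
```

If saved under a hypothetical name such as filter_mapper.py, the script could be passed to hadoop-streaming through its -mapper option with the two bounds (in milliseconds) as arguments. This trades efficiency for simplicity, since every row is still read from HBase before being discarded.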

-Ian Brooks
