hbase-user mailing list archives

From "Jaeyun Noh" <metal...@gmail.com>
Subject Re: map reduce range of records from hbase table
Date Wed, 08 Oct 2008 23:04:58 GMT
Hi,

May I ask another question?

I'm running HBase/Hadoop on a Linux server and implementing a business
application in Java, which runs on a separate Windows machine.

It looks like a MapReduce job runs on the server nodes. Can I submit a
MapReduce job built on the Windows client to the existing Linux server? And
how can we get the result produced by the MapReduce job back from the server?

e.g. scanning a specific table with some filter conditions and returning the
sum of specific columns...

Regards,
Jaeyun Noh.


On Wed, Oct 8, 2008 at 2:13 PM, stack <stack@duboce.net> wrote:

> Cedric Ho wrote:
>
>> Hi all,
>>
>> I am using 0.18.0 and have successfully used data from hbase table as
>> input to my map/reduce job.
>>
>> I wonder how to specify a subset of records from a table as input
>> instead of taking all records, such as a range of row keys or
>> specific values of certain columns.
>>
>>
> You'll have to subclass the TableInputFormat.
>
> There is an example in the javadoc on subclassing TIF:
> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> (Sorry, the example is mangled.  Do a get of the html source to see
> non-garbled code).
>
> The example shows you how to set a filter.  Filters can filter on rows and
> values.
>
> To work against a subset, you'd probably need to play with getSplits in
> your subclass.  By default, it basically returns as many splits as there are
> regions in your table, so it's always the whole table.  Filters could stop
> unwanted rows being returned, but maybe it's better if the rows weren't
> considered in the first place; hence the need to override getSplits.
>
> St.Ack
>
>
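To make the getSplits idea above concrete, here is a rough, HBase-free sketch
of the pruning step only: given one key range per region, keep just the
regions that overlap the row range you want. SplitPruning, KeyRange and
pruneSplits are hypothetical names invented for this illustration, not part of
the HBase API, and the sketch assumes region boundaries are half-open
[start, end) ranges where an empty key means "unbounded". A real subclass
would apply the same test inside its own getSplits override.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitPruning {

    /** Hypothetical stand-in for a region: [start, end), "" means unbounded. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        /** True if the two half-open ranges share at least one row key. */
        boolean overlaps(KeyRange other) {
            boolean startsBeforeOtherEnds =
                other.end.isEmpty() || start.compareTo(other.end) < 0;
            boolean otherStartsBeforeEnd =
                end.isEmpty() || other.start.compareTo(end) < 0;
            return startsBeforeOtherEnds && otherStartsBeforeEnd;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Keep only the regions that overlap the wanted row range. */
    static List<KeyRange> pruneSplits(List<KeyRange> regions, KeyRange wanted) {
        List<KeyRange> kept = new ArrayList<>();
        for (KeyRange region : regions) {
            if (region.overlaps(wanted)) {
                kept.add(region);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Four made-up regions covering the whole key space.
        List<KeyRange> regions = Arrays.asList(
            new KeyRange("", "g"),
            new KeyRange("g", "n"),
            new KeyRange("n", "t"),
            new KeyRange("t", ""));

        // Only rows with keys in [h, p) are of interest.
        System.out.println(pruneSplits(regions, new KeyRange("h", "p")));
    }
}
```

Note that the kept regions can still contain rows outside the wanted range (for
example keys between "g" and "h" here), so a row filter, or narrowing the
first and last split to the requested bounds, would still be needed.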
