hbase-user mailing list archives

From Dhaval Shah <prince_mithi...@yahoo.co.in>
Subject Re: Controlling TableMapReduceUtil table split points
Date Sun, 06 Jan 2013 17:29:00 GMT

Another option to avoid the timeout/OOME issues is to use scan.setBatch(), which caps the
number of cells returned in each Result. The scanner would then function normally for small
rows but would break up large rows into multiple Result objects. You can use this in
conjunction with scan.setCaching() to control how much data you get back.

This approach would not need a change in your schema design and would ensure that a single
mapper processes the entire row (but in multiple calls to the map function).
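
To make the effect of the two settings concrete, here is a rough sketch in plain Java. It
does not call HBase at all: it only models the behaviour described above, where a batch
limit splits a wide row into several chunks and the caching value bounds how many of those
chunks come back per fetch. The class and method names (BatchSketch, splitRow,
maxCellsPerFetch) are made up for illustration and are not part of any HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of scan batching; no HBase classes are used here.
class BatchSketch {

    // Splits one row's cells into chunks of at most `batch` cells each,
    // mimicking a wide row being returned as several Result objects.
    static List<List<String>> splitRow(List<String> cells, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<List<String>> results = new ArrayList<>();
        for (int i = 0; i < cells.size(); i += batch) {
            int end = Math.min(i + batch, cells.size());
            results.add(new ArrayList<>(cells.subList(i, end)));
        }
        return results;
    }

    // Upper bound on cells per fetch when each fetch carries at most
    // `caching` results and each result holds at most `batch` cells.
    static long maxCellsPerFetch(int caching, int batch) {
        return (long) caching * batch;
    }

    public static void main(String[] args) {
        List<String> wideRow = Arrays.asList("c1", "c2", "c3", "c4", "c5", "c6", "c7");
        List<String> smallRow = Arrays.asList("c1", "c2");

        // The wide row is broken up; the small row stays in one piece.
        System.out.println("wide row chunks:  " + splitRow(wideRow, 3));
        System.out.println("small row chunks: " + splitRow(smallRow, 3));
        System.out.println("max cells per fetch: " + maxCellsPerFetch(10, 3));
    }
}
```

In a real job the same idea would be expressed by calling scan.setBatch() and
scan.setCaching() on the Scan handed to the table mapper; the map function then has to
cope with seeing the same row key in several consecutive calls.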



------------------------------
On Sun 6 Jan, 2013 10:07 PM IST David Koch wrote:

>Hi Ted,
>
>Thank you for your response. I will take a look.
>
>With regards to the timeouts: I think changing the key design as outlined
>above would ameliorate the situation since each map call only requests a
>small amount of data as opposed to what could be a large chunk. I remember
>that simply doing a get on one of the large outlier rows (~500mb) brought
>down the region server involved.
>
>/David
>
>On Sun, Jan 6, 2013 at 5:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> If events for one user are processed by a single mapper, I think you would
>>

