hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: OOM when fetching all versions of single row
Date Fri, 31 Oct 2014 11:17:50 GMT
Here’s the simple answer. 

Don’t do it. 

The way you are abusing versioning is bad design. 

Redesign your schema. 

On Oct 30, 2014, at 10:20 AM, Andrejs Dubovskis <dubis.lv@gmail.com> wrote:

> Hi!
> We have a bunch of rows in HBase which store varying sizes of data
> (1-50 MB). We use HBase versioning and keep up to 10000 column
> versions. Typically each column has only a few versions, but in rare
> cases it may have thousands of versions.
> The MapReduce algorithm uses a full scan and requires all
> versions to produce the result, so we call scan.setMaxVersions().
> In the worst case the Region Server returns only one row, but a huge one. The
> size is unpredictable and cannot be controlled, because the scan
> parameters let us control the row count only. The MR task can throw
> an OOME even if it has a 50 GB heap.
> Is it possible to handle this situation? For example, the RS should not
> send the row to the client if the latter does not have enough memory to
> handle it. In this case the client could handle the error and fetch each
> of the row's versions in a separate get request.
> Best wishes,
> --
> Andrejs Dubovskis
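
On the point above that only the row count can be controlled: the HBase client also has Scan.setBatch(int), which caps the number of cells (column versions included) returned in a single Result, so one wide row arrives as several smaller Results. The sketch below is a plain-Java model of that idea only. It uses no HBase classes, and WideRowBatcher and forEachBatch are hypothetical names chosen for illustration. It assumes the caller can process a row's versions in order, a batch at a time, without holding all of them in memory at once.

```java
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of batched delivery of one wide row. Not HBase API.
final class WideRowBatcher {
    private WideRowBatcher() {}

    /**
     * Hands the cells (versions) of a single row to the handler in
     * consecutive batches of at most batchSize cells, preserving order.
     * Only one batch is exposed to the handler at a time.
     */
    static <T> void forEachBatch(List<T> rowCells, int batchSize,
                                 Consumer<? super List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int size = rowCells.size();
        int start = 0;
        while (start < size) {
            // Written this way to avoid int overflow for very large batchSize.
            int end = start + Math.min(batchSize, size - start);
            handler.accept(rowCells.subList(start, end));
            start = end;
        }
    }
}
```

With cells of up to 50 MB each, a batch size of 1 would bound each delivered batch to roughly one cell. Whether the algorithm can work on partial rows is a separate question, and it does not change the advice to redesign the schema.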
