hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Mapping Over Cells
Date Mon, 11 May 2015 22:52:16 GMT
How large is the max file size? How large are your regions? How much memory are you allocating
to your region server?
How many rows are so large that they cause the OOM error?

The key is trying to figure out how to help you with no more than a slight schema change:
add (Max Long - timestamp) to the row key and then count the number of column qualifiers
in the row. Once you hit N, you write to a new row with a new timestamp. When you want to
insert, you just fetch the first row key in a small range scan and count the current number
of column qualifiers. The difficult part is that you will have to manually merge the result
sets on read, and if you have two rows with the same column qualifier, the one in the latest
row wins.

That will solve your too-fat-row problem, if you can change schemas.
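The scheme above can be sketched in a few lines. This is a minimal illustration only, with an in-memory dict standing in for HBase; the names (SplitRowStore, MAX_QUALIFIERS, row_key, etc.) are assumptions for the sketch, not HBase API. It models both halves of the idea: rolling to a new row once a row holds N qualifiers, and merging rows on read with the latest row winning on duplicate qualifiers.

```python
# Sketch of the row-splitting scheme described above, using an in-memory
# dict in place of an HBase table. All names here are illustrative.

MAX_LONG = 2**63 - 1


def row_key(vertex_id, ts):
    # (Max Long - timestamp) makes newer rows sort first in a range scan.
    return (vertex_id, MAX_LONG - ts)


class SplitRowStore:
    def __init__(self, n):
        self.n = n          # "N": roll to a new row after this many qualifiers
        self.rows = {}      # row key -> {qualifier: value}

    def _current_row(self, vertex_id):
        # The smallest key in the vertex's range is the most recent row.
        keys = sorted(k for k in self.rows if k[0] == vertex_id)
        return keys[0] if keys else None

    def put(self, vertex_id, qualifier, value, ts):
        key = self._current_row(vertex_id)
        if key is None or len(self.rows[key]) >= self.n:
            key = row_key(vertex_id, ts)   # row is full: start a new row
            self.rows.setdefault(key, {})
        self.rows[key][qualifier] = value

    def get(self, vertex_id):
        # Merge all of the vertex's rows; iterate oldest to newest so that a
        # duplicate qualifier in a newer row overwrites the older value.
        merged = {}
        keys = sorted((k for k in self.rows if k[0] == vertex_id), reverse=True)
        for key in keys:
            merged.update(self.rows[key])
        return merged
```

On a real table the read side would be a small range scan starting at the vertex id, and the merge would be done client-side over the returned rows in the same oldest-to-newest order.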

> On May 11, 2015, at 11:04 AM, Webb, Ryan L. <Ryan.Webb@jhuapl.edu> wrote:
> We use the filtering on the family, but the resulting Result is still too large.
> Basically we have a super-vertex problem.
> RowID       ColF      ColQ      Value
> VertexID    InEdge    EdgeID    VertexID
> We are working with an existing codebase, so a schema rewrite would be painful, and we
were hoping there was a simple solution we just haven't found.
> A Cell input format would let us look at the table as an edge list instead of the vertex
list that the Result gives us.
> We are starting to look into a migration to a different schema because of all of the
other issues a super vertex gives.
> Ryan Webb
> -----Original Message-----
> From: Shahab Yunus [mailto:shahab.yunus@gmail.com] 
> Sent: Monday, May 11, 2015 11:51 AM
> To: user@hbase.apache.org
> Subject: Re: Mapping Over Cells
> You can specify the column family or column to read when you create the Scan object.
Have you tried that? Does it make sense? Or did I misunderstand your problem?
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addColumn(byte[],%20byte[])
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addFamily(byte[])
> Regards,
> Shahab
> On Mon, May 11, 2015 at 11:45 AM, Webb, Ryan L. <Ryan.Webb@jhuapl.edu>
> wrote:
>> Hello,
>> We have a table in HBase that has very large rows and it goes OOM when 
>> the table mapper attempts to read the entire row into a result.
>> We would like to be able to map over each Cell in the table as a 
>> solution and it is what we are doing in the map anyway.
>> Is this possible? Like the default behavior for Accumulo?
>> We looked at the settings on Scan and didn't really see anything and 
>> the source code of Result looks like it wraps an array of cells so the 
>> data is already loaded at that point.
>> We are using HBase 0.98.1 and the Hadoop 2 APIs.
>> Thanks
>> Ryan Webb
>> PS - Sorry if this is a duplicate, I sent the first one before 
>> subscribing so I don't know what the policy is with that.

The opinions expressed here are mine; while they may reflect a cognitive thought, that is
purely accidental.
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
