accumulo-user mailing list archives

From Josh Elser
Subject Re: Large Data Size in Row or Value?
Date Mon, 01 Apr 2013 14:55:38 GMT
Setting aside the actual size limit (I'm not entirely sure how that all 
adds up; it would be affected by concurrent query load and many other 
things), placing the large chunk into the Key will inflate the index 
inside the RFile (the file construct actually backing the data in your 
table). That will increase your access times just to find the offset in 
the file for the Key you're looking for.

Putting a chunk number in the Key and the actual data in the Value will 
probably net you much better results. Chunking into 128M should work 
with a 3G heap; however, I'd err on the cautious side and make many 
smaller chunks instead of a few very large ones.
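
As a rough illustration of the chunk-number-in-the-Key layout, here is a 
minimal sketch in plain Python. It does not call the Accumulo client API; 
the names chunk_entries and reassemble, the "chunk" column family, and the 
8-digit zero-padded qualifier are all assumptions made for the example. 
The padding is there so that lexicographic key order matches chunk order.

```python
from typing import Iterable, Iterator, Tuple

# (row, column family, column qualifier, value): a simplified stand-in
# for an Accumulo key/value pair, not the real client classes.
Entry = Tuple[bytes, bytes, bytes, bytes]


def chunk_entries(row: bytes, data: bytes, chunk_size: int) -> Iterator[Entry]:
    """Split data into chunks, keeping the Key small and the bulk in the Value."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        # Zero-padded chunk number (hypothetical format) so that sorted
        # keys come back in chunk order.
        qualifier = b"%08d" % index
        yield (row, b"chunk", qualifier, data[offset:offset + chunk_size])


def reassemble(entries: Iterable[Entry]) -> bytes:
    """Rebuild the original bytes by reading the chunks in key order."""
    ordered = sorted(entries, key=lambda e: e[:3])
    return b"".join(value for _, _, _, value in ordered)


if __name__ == "__main__":
    blob = bytes(range(256)) * 10  # small stand-in for the real 400M blob
    entries = list(chunk_entries(b"blob1", blob, 1000))
    print(len(entries), [len(e[3]) for e in entries])
    print(reassemble(entries) == blob)
```

In a real ingest each tuple would become one column update on a Mutation, 
and chunk_size would be chosen well below the tserver heap, in line with 
the "many smaller chunks" advice above.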

On 4/1/13 10:33 AM, David Medinets wrote:
> I have a chunk of data (let's say 400M) that I want to store in 
> Accumulo. I can store the chunk in the ColumnFamily or in the Value. 
> Does it make any difference to Accumulo which is used?
> My tserver is set up to use -Xmx3g. What is the largest size that seems 
> to work? I have much more that I can allocate.
> Or should I focus on breaking the data into smaller pieces ... say 
> 128M each?
> Thanks.
