accumulo-user mailing list archives

From "Slater, David M." <David.Sla...@jhuapl.edu>
Subject RE: Max tablet size & Pre-splitting
Date Mon, 28 Oct 2013 18:50:04 GMT
Thanks John, that helps. I checked Eric's reply as well, and I think I'm good.

From: John Vines [mailto:vines@apache.org]
Sent: Monday, October 28, 2013 2:11 PM
To: user@accumulo.apache.org
Subject: Re: Max tablet size & Pre-splitting

There is no hardcoded maximum for file size in Accumulo, so the split threshold is the only
thing that provides some sort of definition for tablet size. Please be aware that if you have
giant rows, you can have a tablet that exceeds the split threshold as well, which is why I refer
to it only loosely as the defining characteristic.
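To illustrate why a giant row can push a tablet past the threshold, here is a toy model (not Accumulo's actual split algorithm) in which splits may only fall between rows. The class name and the row sizes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model, NOT Accumulo's real split logic: tablets may only be split
 * between rows, so a single row larger than the threshold still ends up
 * in one (oversized) tablet.
 */
class RowBoundarySplitModel {

    /** Greedily groups consecutive row sizes into tablets, splitting only between rows. */
    static List<Long> tabletSizes(long[] rowSizes, long threshold) {
        List<Long> tablets = new ArrayList<>();
        long current = 0;
        for (long row : rowSizes) {
            // Close the current tablet if this row would push it over the
            // threshold, but never split inside the row itself.
            if (current > 0 && current + row > threshold) {
                tablets.add(current);
                current = 0;
            }
            current += row;
        }
        if (current > 0) {
            tablets.add(current);
        }
        return tablets;
    }

    public static void main(String[] args) {
        long threshold = 1_000;
        long[] rowSizes = {300, 400, 5_000, 200}; // one "giant" row of 5000 bytes
        System.out.println(tabletSizes(rowSizes, threshold)); // prints [700, 5000, 200]
    }
}
```

The middle tablet is five times the threshold because the 5000-byte row cannot be divided.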


As for tablet size, one option is to get that information from the !METADATA table.
Eric Newton wrote a reply on this mailing list within the past 2 weeks, I think, that
explained the entries there.
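A minimal sketch of summing per-tablet file sizes from !METADATA entries, assuming each tablet's row looks like "tableId;endRow" (or "tableId<" for the last tablet) and each "file" column family value looks like "sizeInBytes,numEntries". The table id "2" and all sizes below are made up; on a real cluster the row/value pairs would come from a Scanner over !METADATA restricted to the "file" column family. The result is a rough estimate only: data still in memory (not yet minor-compacted) is not in any file.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: estimate tablet sizes from !METADATA "file" entries.
 * Assumes values are formatted as "sizeInBytes,numEntries[,...]".
 */
class MetadataTabletSizes {

    /** Sums the first comma-separated field of each file value, grouped by metadata row. */
    static Map<String, Long> sizesByTablet(String[][] rowAndValue) {
        Map<String, Long> sizes = new TreeMap<>();
        for (String[] entry : rowAndValue) {
            String metadataRow = entry[0];                          // e.g. "2;m" or "2<"
            long fileSize = Long.parseLong(entry[1].split(",")[0]); // assumed "size,entries"
            sizes.merge(metadataRow, fileSize, Long::sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Hypothetical entries: two files in tablet "2;m", one in the last tablet "2<".
        String[][] entries = {
            {"2;m", "1048576,5000"},
            {"2;m", "524288,2000"},
            {"2<", "2097152,9000"},
        };
        System.out.println(sizesByTablet(entries)); // prints {2;m=1572864, 2<=2097152}
    }
}
```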

On Mon, Oct 28, 2013 at 1:59 PM, Slater, David M. <David.Slater@jhuapl.edu> wrote:
First, a quick question: For Accumulo 1.4.2, is there a maximum size that a tablet can have?
In other words, if I were to do something like table.split.threshold=1000G, would that actually
allow the tablet to grow to that size, or is there some static maximum, like 2G, that a tablet
can have?

The reason I ask is that I'm doing time-based presplitting of tables: I add a new set of
split points when I get to a new time range (or when one of the tablets reaches a certain
size), and then transfer all of my ingest to the new set of tablets created. This keeps me
from needing to do any table splits involving data. Therefore, I would like to set the table
split threshold arbitrarily high, so that my presplitting algorithm can do all the work.
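A minimal sketch of generating split points for a new time range, assuming a hypothetical row-key layout of "<timeBucket>_<shard>_<rest>" (for example "20131028_03_..."); this is not the actual scheme described above. Because a tablet's end row is inclusive, a split at "<bucket>_01" closes the tablet holding every "<bucket>_00_..." row. With Accumulo, each string would be wrapped in org.apache.hadoop.io.Text and the set passed to connector.tableOperations().addSplits(tableName, splits).

```java
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Sketch of time-based presplitting under an assumed row layout of
 * "<timeBucket>_<shard>_<rest>". Hypothetical helper, not the poster's code.
 */
class TimeBucketSplits {

    /** Split points that give each shard of the time bucket its own tablet. */
    static SortedSet<String> splitsFor(String timeBucket, int numShards) {
        SortedSet<String> splits = new TreeSet<>();
        // Boundary between the previous time range and this one: every
        // "<bucket>_..." row sorts after the bare bucket string.
        splits.add(timeBucket);
        // End rows are inclusive, so "<bucket>_01" closes the shard-00 tablet,
        // "<bucket>_02" closes the shard-01 tablet, and so on. Zero-padding
        // keeps lexicographic order equal to numeric order.
        for (int shard = 1; shard < numShards; shard++) {
            splits.add(String.format("%s_%02d", timeBucket, shard));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(splitsFor("20131028", 4));
        // prints [20131028, 20131028_01, 20131028_02, 20131028_03]
    }
}
```

The last shard's tablet stays open-ended until the next time range's splits are added, which matches switching all ingest to the new set of tablets.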

Second, is there a preferred way to estimate tablet sizes from the Java API? My ingest
application uses my split points and mutation.numBytes() to keep track of the number
of bytes per tablet. Should I be using mutation.memory() instead? Or is there a more direct
way, via connector.tableOperations() or some other mechanism, to determine the size of a tablet?
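A minimal sketch of the client-side bookkeeping described above: map each row to its tablet with the split points, then accumulate a byte count per tablet. The class is hypothetical; the byte count passed to record() could be the value of mutation.numBytes(), but it is just a long here so the example runs without Accumulo. It assumes ASCII row keys (so String ordering matches Accumulo's byte ordering), and such counts will generally differ from the compressed on-disk file sizes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical client-side tracker of bytes written per tablet. */
class TabletByteTracker {
    /** Label for the last tablet, which has no end row. */
    static final String LAST_TABLET = "<last>";

    private final TreeSet<String> splits;
    private final Map<String, Long> bytesByTablet = new HashMap<>();

    TabletByteTracker(Collection<String> splitPoints) {
        this.splits = new TreeSet<>(splitPoints);
    }

    /** End row of the tablet containing the row; tablet end rows are inclusive. */
    String tabletFor(String row) {
        String endRow = splits.ceiling(row); // smallest split >= row
        return endRow == null ? LAST_TABLET : endRow;
    }

    /** Adds a mutation's byte count to its tablet's running total. */
    void record(String row, long bytes) {
        bytesByTablet.merge(tabletFor(row), bytes, Long::sum);
    }

    long bytesFor(String tabletEndRow) {
        return bytesByTablet.getOrDefault(tabletEndRow, 0L);
    }

    public static void main(String[] args) {
        TabletByteTracker tracker = new TabletByteTracker(Arrays.asList("g", "p"));
        tracker.record("apple", 100);  // tablet ending at "g"
        tracker.record("melon", 30);   // tablet ending at "p"
        tracker.record("zebra", 20);   // last tablet
        System.out.println(tracker.bytesFor("g") + " " + tracker.bytesFor("p")
                + " " + tracker.bytesFor(LAST_TABLET)); // prints 100 30 20
    }
}
```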

Thanks,
David

