accumulo-user mailing list archives

From Dylan Hutchison <dhutc...@stevens.edu>
Subject Re: Determining tablets assigned to table splits, and the number of rows in each tablet
Date Sat, 04 Oct 2014 21:52:01 GMT
It should suffice to list the number of entries for a table, tablet and
tablet server.  No need to worry about number of unique rows, number of
unique column families, etc.  By entry I mean a (key, value) pair.

For load balancing, we care about how much physical data is on each tablet
/ tablet server.  This is directly proportional to the number of entries,
assuming that the key size and value size in bytes do not differ too
drastically.  If they do (say for raw documents of vastly different sizes),
the best measure is the *size of the data in bytes* for each tablet /
tablet server.  I didn't suggest it because it doesn't look like Accumulo
tracks it, so it would involve a lot of new implementation and book-keeping,
which could hamper performance.

Accumulo does already track the number of entries for tables, tablets and
tablet servers.  It's just hard to get to: you have to rely on the format of
the metadata table and access the non-public Monitor classes.  Bringing it
to the public API just looks like a matter of reworking the API and letting
the client gather the information that the Monitor already gathers by
connecting to each tablet server.  Does that sound reasonable?
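
To make the idea concrete, here is a rough sketch of the client-side
aggregation, assuming the per-tablet entry counts have already been fetched
somehow.  TabletStat and EntryCountSummary are hypothetical names made up for
illustration; they are not part of the Accumulo API, and nothing here talks
to a tablet server.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical holder for one tablet's statistics (not an Accumulo class).
final class TabletStat {
    final String table;        // table the tablet belongs to
    final String tabletServer; // server currently hosting the tablet
    final String endRow;       // split point ending the tablet; null for the last tablet
    final long entries;        // number of (key, value) pairs in the tablet

    TabletStat(String table, String tabletServer, String endRow, long entries) {
        this.table = table;
        this.tabletServer = tabletServer;
        this.endRow = endRow;
        this.entries = entries;
    }
}

// Hypothetical helper that rolls per-tablet counts up to tables and servers.
final class EntryCountSummary {
    // Total entries hosted by each tablet server.
    static Map<String, Long> entriesPerServer(List<TabletStat> stats) {
        Map<String, Long> totals = new TreeMap<>();
        for (TabletStat s : stats) {
            totals.merge(s.tabletServer, s.entries, Long::sum);
        }
        return totals;
    }

    // Total entries in each table.
    static Map<String, Long> entriesPerTable(List<TabletStat> stats) {
        Map<String, Long> totals = new TreeMap<>();
        for (TabletStat s : stats) {
            totals.merge(s.table, s.entries, Long::sum);
        }
        return totals;
    }

    // Ratio of the busiest server's entry count to the mean entry count.
    // 1.0 means perfectly even; an empty or all-zero input also returns 1.0.
    static double imbalance(Map<String, Long> perServer) {
        long max = 0;
        long sum = 0;
        for (long count : perServer.values()) {
            max = Math.max(max, count);
            sum += count;
        }
        if (sum == 0) {
            return 1.0;
        }
        double mean = (double) sum / perServer.size();
        return max / mean;
    }
}
```

A balancer or an admin script could call entriesPerServer on the gathered
stats and flag the cluster when imbalance rises well above 1.0.  The ratio is
only a stand-in for physical load under the assumption above that key and
value sizes are roughly uniform.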

Regards, Dylan

On Sat, Oct 4, 2014 at 4:11 PM, David Medinets <david.medinets@gmail.com>
wrote:

> Adding this functionality into Accumulo's API would reduce its efficiency
> for users that don't need this level of tracking. Let ingest procedures
> take the performance hit. There are synchronization issues that
> degrade performance. Also what would be the appropriate level of tracking -
> at the row, column-family, or every level? Whatever answer you give,
> someone else will ask for something different. And then there are the
> aggregation questions. Not to mention the additional storage requirements.
>


-- 
www.cs.stevens.edu/~dhutchis
