accumulo-notifications mailing list archives

From "Dylan Hutchison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3206) New Public API: Approximate data counts of Tablets and Tablet Servers
Date Sun, 05 Oct 2014 02:49:33 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159399#comment-14159399 ]

Dylan Hutchison commented on ACCUMULO-3206:
-------------------------------------------

Copying from user list:

It should suffice to report the number of entries for a table, tablet, and tablet server.
There is no need to worry about the number of unique rows, unique column families, etc.
By "entry" I mean a (key, value) pair.

For load balancing, we care about how much physical data is on each tablet / tablet server.
This is directly proportional to the number of entries, assuming that the key and value
sizes in bytes do not differ too drastically.  If they do (say, for raw documents of vastly
different sizes), the **best measure is the size of the data in bytes** for each tablet /
tablet server.  I didn't suggest it because it doesn't look like Accumulo tracks it, so it
would involve a lot of new implementation and bookkeeping, which could hamper performance.
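
As a rough illustration of what "verifying load balancing" could mean numerically, here is
a minimal sketch that compares the largest per-tablet count to the mean.  The class and
method names are hypothetical and nothing here is Accumulo API; the input could be entry
counts or, where sizes vary a lot, byte sizes.

```java
import java.util.List;

public class SplitBalance {
    /**
     * Ratio of the largest per-tablet measure to the mean measure.
     * 1.0 means perfectly even; larger values mean one tablet holds
     * disproportionately more data. The measure may be entries or bytes.
     */
    static double imbalance(List<Long> perTablet) {
        if (perTablet.isEmpty()) {
            throw new IllegalArgumentException("no tablets");
        }
        long max = 0;
        long total = 0;
        for (long n : perTablet) {
            max = Math.max(max, n);
            total += n;
        }
        if (total == 0) {
            return 1.0; // an empty table is trivially balanced
        }
        double mean = (double) total / perTablet.size();
        return max / mean;
    }

    public static void main(String[] args) {
        // Hypothetical entry counts for four tablets of one table.
        System.out.println(imbalance(List.of(1000L, 1100L, 900L, 1000L)));
    }
}
```

The same function works for tablet servers by passing per-server totals instead of
per-tablet counts.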

Accumulo already tracks the number of entries for tables, tablets, and tablet servers.
It's just hard to get to: doing so relies on the format of the metadata table and on
accessing the non-public Monitor classes.  Bringing it to the public API looks like a matter
of reworking the API and letting the client gather the information that the Monitor already
gathers by connecting to each tablet server.  Does that sound reasonable?
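
To make the client-side gathering step concrete, here is a minimal sketch of the
cross-referencing involved, assuming the tablet-to-server assignment and the per-tablet
entry counts have already been fetched.  Both input maps, the tablet and server names, and
the class itself are hypothetical; this does not call any Accumulo API.

```java
import java.util.Map;
import java.util.TreeMap;

public class EntriesPerServer {
    /**
     * Sums per-tablet entry counts into per-tablet-server totals.
     *
     * @param tabletToServer   tablet identifier -> hosting tablet server (assumed input)
     * @param entriesPerTablet tablet identifier -> number of entries (assumed input)
     */
    static Map<String, Long> sumByServer(Map<String, String> tabletToServer,
                                         Map<String, Long> entriesPerTablet) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, String> e : tabletToServer.entrySet()) {
            // A tablet with no reported count contributes zero.
            long n = entriesPerTablet.getOrDefault(e.getKey(), 0L);
            totals.merge(e.getValue(), n, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up assignment and counts for illustration only.
        Map<String, String> assignment = Map.of("t1", "ts1", "t2", "ts1", "t3", "ts2");
        Map<String, Long> counts = Map.of("t1", 10L, "t2", 20L, "t3", 5L);
        System.out.println(sumByServer(assignment, counts));
    }
}
```

A public API could return something shaped like the result map so that clients do not need
to do this join themselves.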

> New Public API: Approximate data counts of Tablets and Tablet Servers
> ---------------------------------------------------------------------
>
>                 Key: ACCUMULO-3206
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3206
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client, monitor
>    Affects Versions: 1.6.1
>            Reporter: Dylan Hutchison
>            Priority: Minor
>
> The broader picture is public programmatic access to information in the Accumulo monitor.
> Specifically I'm looking to obtain the number of entries per tablet and per tablet server
> for a given table.  The use case is to verify that manually set (or automatically set I
> suppose) table splits are effectively dividing Accumulo data among many tablets, that is,
> verifying load balancing.
>
> I wrote Accumulo 1.5 code which uses non-public API to obtain this information in the
> same way the Monitor does via TabletStats.  The tricky part was cross-referencing the
> Metadata table to find the assignment of tablets to tablet servers for a given table.  I
> rewrote that code for 1.6, switching the name of the Metadata table to "accumulo.metadata"
> and other associated changes, but it would be great to make this part of the public API so
> that people don't have to use non-public methods to obtain data that Accumulo has in the
> Monitor and Metadata table anyway.
>
> We could approach this by adding to the TableOperations class or something similar.
> A request could go to an Accumulo master which gathers the necessary information from the
> tablet servers just as the Monitor does, so that the client does not have to do it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
