accumulo-user mailing list archives

From Dylan Hutchison <dhutc...@stevens.edu>
Subject Re: Determining tablets assigned to table splits, and the number of rows in each tablet
Date Sat, 04 Oct 2014 19:57:31 GMT
David, thanks for the pointer to the articles.  I read them a few months
ago but forgot.  Will need to read the HyperLogLog paper
<https://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40671.pdf>.

*The number of unique rows within a tablet are not explicitly tracked.*


Right Josh, I misspoke.  For load balancing, we're interested in the *number
of entries in each tablet*, not the number of unique rows.  Only counting
the number of unique rows doesn't distinguish between really big rows and
singleton rows, and as David pointed out, we need client-controlled means
of doing unique row counting/estimation.

We can see the number of entries in a table, and the number of that
table's entries held by a particular tablet server, because both are
listed in the monitor.
[image: Inline image 2]

David, you may recognize the name of this tablet server.  Just got Accumulo
Vagrant <https://github.com/medined/Accumulo_1_5_0_By_Vagrant> working last
week, thanks ;)

[image: Inline image 1]

However, there could be multiple Tablets assigned to the same Tablet
Server.  Here is an outline of the procedure I followed to read the
*TabletStats.numEntries*
<https://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/core/tabletserver/thrift/TabletStats.html#numEntries>
for the correct Tablet that holds a split range.

Given a table name:

   - Get a list of all tablet servers by connecting to the Master and
     referencing the MasterMonitorInfo
     <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/master/thrift/MasterClientService.Client.html#getMasterStats(org.apache.accumulo.trace.thrift.TInfo,%20org.apache.accumulo.core.security.thrift.TCredentials)>
   - Get the internal table ID via Tables.getNameToIdMap
     <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/impl/Tables.html#getNameToIdMap(org.apache.accumulo.core.client.Instance)>
   - Connect to each tablet server and get the TabletStats
     <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/tabletserver/thrift/TabletStats.html>
     of the tablets that are on that tablet server under the given internal
     table ID.
   - Scan the Metadata table, starting at the {tableName converted to
     internal table ID} and ending at {internal table ID}’<’ (the last
     entry for this table in the Metadata table).
      - Example row: 1<  (if the internal table ID is 1 and this is the
        last tablet of the table)
   - Look at the column for the previous row:  ~tab:~pr
      - Example row-col-val:   1< ~tab:~pr []    \x00
      - (this table has no table splits: no end row and no previous
        end row)
   - Create an extent from the value using KeyExtent
     <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/data/KeyExtent.html>
      - (a shortcut for parsing the Metadata table entry and getting the
        previous and current end row)
   - Among the list of TabletStats, find the one whose previous end row and
     end row match the result from the Metadata table.

Take that tabletStat.numEntries
<https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/tabletserver/thrift/TabletStats.html#numEntries>
to get the number of entries in this table split range.
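
The final matching step can be sketched on its own, without any Accumulo
dependency.  This is only an illustration, not the code I actually ran:
Extent and Stat are hypothetical stand-ins for KeyExtent and TabletStats,
and a null row is assumed to mean "no previous end row" or "no end row".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class TabletEntryLookup {
    /** Hypothetical stand-in for a tablet extent: table ID plus (prevEndRow, endRow]. */
    static final class Extent {
        final String tableId;
        final String prevEndRow; // null = tablet starts at the beginning of the table
        final String endRow;     // null = tablet runs to the end of the table

        Extent(String tableId, String prevEndRow, String endRow) {
            this.tableId = tableId;
            this.prevEndRow = prevEndRow;
            this.endRow = endRow;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Extent)) return false;
            Extent e = (Extent) o;
            return tableId.equals(e.tableId)
                && Objects.equals(prevEndRow, e.prevEndRow)
                && Objects.equals(endRow, e.endRow);
        }

        @Override
        public int hashCode() {
            return Objects.hash(tableId, prevEndRow, endRow);
        }
    }

    /** Hypothetical stand-in for the per-tablet stats reported by a tablet server. */
    static final class Stat {
        final Extent extent;
        final long numEntries;

        Stat(Extent extent, long numEntries) {
            this.extent = extent;
            this.numEntries = numEntries;
        }
    }

    /** Returns the entry count of the tablet whose extent equals target, or -1 if none matches. */
    static long numEntriesFor(Extent target, List<Stat> stats) {
        for (Stat s : stats) {
            if (s.extent.equals(target)) {
                return s.numEntries;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stats gathered from all tablet servers for table ID "1", split at "m".
        List<Stat> stats = Arrays.asList(
            new Stat(new Extent("1", null, "m"), 1200L),
            new Stat(new Extent("1", "m", null), 300L));
        // Extent reconstructed from the Metadata table row "1<" with previous end row "m".
        Extent fromMetadata = new Extent("1", "m", null);
        System.out.println(numEntriesFor(fromMetadata, stats)); // prints 300
    }
}
```

Returning -1 for "no match" is just a choice made for the sketch; real
code might prefer to throw or to retry, since a tablet can migrate between
the Metadata scan and the stats request.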

Later this information is combined into a method that returns an array of
triples

(tablet_split_range, tablet_num_entries, tablet_server_list_for_this_tablet)
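
One possible shape for those triples, as a rough sketch only: TabletInfo
and combine are hypothetical names, the split range is represented as a
plain string, and the two input maps are assumed to have been filled by
the procedure above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TabletTriples {
    /** Hypothetical triple: split range, entry count, tablet servers hosting the tablet. */
    static final class TabletInfo {
        final String splitRange;
        final long numEntries;
        final List<String> servers;

        TabletInfo(String splitRange, long numEntries, List<String> servers) {
            this.splitRange = splitRange;
            this.numEntries = numEntries;
            this.servers = servers;
        }

        @Override
        public String toString() {
            return "(" + splitRange + ", " + numEntries + ", " + servers + ")";
        }
    }

    /** Joins per-range entry counts with per-range server lists, keeping the order of the counts map. */
    static TabletInfo[] combine(Map<String, Long> entriesByRange,
                                Map<String, List<String>> serversByRange) {
        List<TabletInfo> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : entriesByRange.entrySet()) {
            List<String> servers = serversByRange.get(e.getKey());
            if (servers == null) {
                servers = Collections.emptyList(); // no server known for this range
            }
            out.add(new TabletInfo(e.getKey(), e.getValue(), servers));
        }
        return out.toArray(new TabletInfo[0]);
    }

    public static void main(String[] args) {
        Map<String, Long> entries = new LinkedHashMap<>();
        entries.put("(-inf, m]", 1200L);
        entries.put("(m, +inf)", 300L);

        Map<String, List<String>> servers = new LinkedHashMap<>();
        servers.put("(-inf, m]", Arrays.asList("tserver-a:9997"));
        servers.put("(m, +inf)", Arrays.asList("tserver-b:9997"));

        for (TabletInfo t : combine(entries, servers)) {
            System.out.println(t);
        }
    }
}
```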


I recommend adding the ability to get the number of entries for tables,
tablet servers and tablets to the public API.  It would be nice to
reference any of the data from the Accumulo monitor programmatically; in
this case we cross-reference monitor data with the Metadata table.  Josh,
is JIRA the place to file those kinds of suggestions?

Regards,
Dylan

-- 
www.cs.stevens.edu/~dhutchis
