accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Write or Ingest bottleneck
Date Tue, 06 Dec 2016 18:10:47 GMT


hujs wrote:
> Hello, I have a few questions.
> 1. Suppose I insert data into table 'a', every tserver in the cluster hosts
> at least one tablet of table 'a', and I use letters such as j and k as split
> points. If I have four tservers A, B, C, and D, where A, B, and C can reach an
> ingest rate of 90k but D can only reach 50k, will tserver D affect the
> cluster's ingest performance?

I don't think I understand this. For a table, tablet ranges are
disjoint. If you split the table on letters (e.g. 'a', 'f', 'j'), the
Key-Values that have a key starting with 'a' would only reside in one
tablet and thus only on one tabletserver.

> 2. If my rowid is auto-incrementing, such as 1, 2, 3, 4, ..., N, how do I
> choose split points? Can I use the remainder of an integer as a split point,
> such as n % 3 = 0, n % 3 = 1, n % 3 = 2, so that rowid = 3 is written to the
> n % 3 = 0 tablet and rowid = 5 is written to the n % 3 = 2 tablet? What can I
> do?

Remember that Accumulo is only dealing with bytes and has no context
that, in your case, the bytes are actually stringified numbers. For
example, to create 10 tablets, it's easy: use the nine split points
[1, 2, 3, 4, 5, 6, 7, 8, 9]. Since a split point is the inclusive end
row of its tablet, this creates ten tablets: (-inf, 1], (1, 2],
(2, 3], ... (9, +inf).

To create 20 tablets, you can do the following: [05, 1, 15, 2, 25, 3,
35, 4, 45, 5, 55, 6, 65, 7, 75, 8, 85, 9, 95]. These 19 split points
would create 20 tablets: (-inf, 05], (05, 1], (1, 15], ... (95, +inf).
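
As a rough illustration of the two lists above, here is a minimal Python
sketch that generates split points of this shape. The function name
decimal_split_points and its width/step parameters are made up for this
example and are not part of Accumulo. It also assumes the row IDs are
plain ASCII digit strings, so that Python's string ordering matches the
byte ordering Accumulo uses.

```python
def decimal_split_points(width, step):
    """Return evenly spaced split points over decimal prefixes.

    width: number of leading digits to split on (hypothetical parameter)
    step:  spacing between consecutive prefixes (hypothetical parameter)
    """
    points = []
    for n in range(step, 10 ** width, step):
        prefix = str(n).zfill(width)       # e.g. 5 -> "05" when width is 2
        points.append(prefix.rstrip("0"))  # e.g. "10" -> "1", as in the list above
    return points


if __name__ == "__main__":
    nine = decimal_split_points(1, 1)      # the 9 split points: 10 tablets
    nineteen = decimal_split_points(2, 5)  # the 19 split points: 20 tablets
    print(len(nine) + 1, nine)
    print(len(nineteen) + 1, nineteen)

    # Rows sort as bytes, not as numbers: "10" comes before "2".
    print(sorted(["1", "2", "10"]))
```

The last line is the reason the split points are chosen on leading
digits: a row such as "10" sorts between "1" and "2", not after "9".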

You can extend this to create more split points if necessary for 
"numbers", but it also applies to alphabetical data as you described 
earlier. Another common trick is to temporarily reduce the split
threshold for your table, ingest a corpus of data until you get a
desired number of split points, and then copy the current split points
and re-add them later (the addsplits command in the shell can read the
split points, one per line, from a file).
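
For the "one per line" file, a small Python sketch like the following
could be used to save a list of split points and read it back. The
helper names write_splits_file and read_splits_file are invented for
this example. It assumes the split points are printable text with no
embedded newlines. On the shell side, my understanding is that the
split threshold is the table.split.threshold property, and that
getsplits prints a table's current split points while addsplits takes
the file via its -sf option, but please check the shell's help output
for your version.

```python
import os
import tempfile
from pathlib import Path


def write_splits_file(splits, path):
    """Write unique split points in sorted order, one per line."""
    ordered = sorted(set(splits))
    Path(path).write_text("".join(s + "\n" for s in ordered), encoding="utf-8")
    return ordered


def read_splits_file(path):
    """Read split points back from the file, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "splits.txt")
        # Example split points only; in practice these would be copied
        # from the table after the trial ingest.
        write_splits_file(["2", "15", "1", "05"], path)
        print(read_splits_file(path))
```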
