accumulo-dev mailing list archives

From z11373 <z11...@outlook.com>
Subject Re: question on data block cache
Date Fri, 22 Jan 2016 20:35:06 GMT
Thanks Josh!
OK, I have added the Hosted Tablets and Entries columns to the table below for
additional information.
As we can see, the tablets are distributed evenly across all tablet servers.
The one with the highest load also has the highest number of entries (> 1B),
but a few other tablet servers have > 700M entries, which is not that far off.
I admit the data distribution is likely not great, because the URL is used as
the row ID (so many rows share the same prefix), and it is almost impossible to
choose pre-split points unless we know what the data values will be.
Instead of specifying split point strings, I wish Accumulo had a feature that
let us specify x tablets and then automatically split the y entries across
those x tablets :-)

Follow up questions:
1. The test queries are generated randomly, so in theory the likelihood of
most requests going to one tablet server should be slim, but since the URL is
used as the row ID, that may be possible. What does the number in the Query
column indicate? Is it the number of entries returned, or the number of reads?
2. Looking at the sample table below, is there a way to find out the ranges of
all tablets hosted on TServer14? I am thinking of writing a small program to
scan all row IDs from that tablet server and find the values that would make
good split points. I could then add those splits to the table and re-run my
tests to see if that resolves the issue.
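
The small program described in question 2 could pick its split points roughly as follows. This is only a sketch under stated assumptions: the function name `split_points` is hypothetical, the row IDs are assumed to arrive distinct and in sorted order (as a scan would return them), and the total row count is assumed to be known from an earlier counting pass. Reading the rows from Accumulo and adding the splits to the table are left out.

```python
def split_points(sorted_rows, total, num_tablets):
    """Pick num_tablets - 1 row IDs that cut `total` sorted rows into near-equal runs.

    Each returned row ID is the last row of its run, on the assumption that a
    split point marks the inclusive end of a tablet.
    """
    if num_tablets < 2 or total < num_tablets:
        return []
    # 0-based positions of the last row in each of the first num_tablets - 1 runs.
    targets = {total * k // num_tablets - 1 for k in range(1, num_tablets)}
    # One pass over the rows; nothing but the chosen split points is kept in memory.
    return [row for i, row in enumerate(sorted_rows) if i in targets]


# Hypothetical row IDs that share a long prefix, as URL row IDs would.
rows = ["http://example.com/%02d" % n for n in range(8)]
points = split_points(iter(rows), total=len(rows), num_tablets=4)
print(points)
```

With 8 rows and 4 tablets the function picks the 2nd, 4th, and 6th row IDs, so each resulting range holds two rows. Because it works from row positions instead of string prefixes, a shared URL prefix does not affect how evenly the rows are divided.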

Regarding your other question: yes, on several occasions when refreshing the
page I saw that the number of active scans was not 16 and yet there were
waiting scans, so it was not just once or twice.

Server | Hosted Tablets | Entries | Query | Running Scans 
==================================== 
TServer1 | 47 | 548.43M | 24 | 0 (0) 
TServer2 | 47 | 708.70M | 37 | 0 (0) 
TServer3 | 47 | 597.88M | 40 | 0 (0) 
TServer4 | 47 | 382.72M | 1 | 0 (0) 
TServer5 | 47 | 756.77M | 0 | 0 (0) 
TServer6 | 47 | 654.38M | 57 | 0 (0) 
TServer7 | 47 | 695.09M | 5 | 0 (0) 
TServer8 | 47 | 637.94M | 4 | 0 (0) 
TServer9 | 47 | 541.74M | 7 | 0 (0) 
TServer10 | 46 | 625.12M | 0 | 0 (0) 
TServer11 | 46 | 248.75M | 56 | 0 (0) 
TServer12 | 46 | 368.87M | 124 | 0 (0) 
TServer13 | 46 | 292.73M | 25 | 0 (0) 
TServer14 | 46 | 1.05B | 121 | 16 (435) 
TServer15 | 46 | 442.23M | 36 | 0 (0) 
TServer16 | 46 | 800.67M | 21 | 0 (0) 
TServer17 | 46 | 689.81M | 3 | 0 (0) 
TServer18 | 46 | 351.86M | 107 | 0 (0) 
TServer19 | 47 | 941.17M | 21 | 0 (0) 
TServer20 | 47 | 257.99M | 92 | 0 (0) 
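
To put a number on how uneven the Entries column is, the values from the table above can be summarized as below. This is a minimal sketch: the dictionary simply copies the table (in millions of entries, with 1.05B written as 1050.0), and the helper name `skew_summary` is hypothetical.

```python
# Entries per tablet server in millions, copied from the table above.
ENTRIES_M = {
    "TServer1": 548.43, "TServer2": 708.70, "TServer3": 597.88,
    "TServer4": 382.72, "TServer5": 756.77, "TServer6": 654.38,
    "TServer7": 695.09, "TServer8": 637.94, "TServer9": 541.74,
    "TServer10": 625.12, "TServer11": 248.75, "TServer12": 368.87,
    "TServer13": 292.73, "TServer14": 1050.0, "TServer15": 442.23,
    "TServer16": 800.67, "TServer17": 689.81, "TServer18": 351.86,
    "TServer19": 941.17, "TServer20": 257.99,
}


def skew_summary(entries, threshold):
    """Return the mean, the fullest server, its ratio to the mean,
    and the servers holding more than `threshold` entries."""
    mean = sum(entries.values()) / len(entries)
    fullest = max(entries, key=entries.get)
    above = sorted(name for name, count in entries.items() if count > threshold)
    return {
        "mean": mean,
        "fullest": fullest,
        "ratio": entries[fullest] / mean,
        "above": above,
    }


summary = skew_summary(ENTRIES_M, threshold=700.0)
print("mean entries per server (M): %.2f" % summary["mean"])
print("fullest server: %s (%.2fx the mean)" % (summary["fullest"], summary["ratio"]))
print("servers above 700M:", ", ".join(summary["above"]))
```

The tablet counts are nearly identical across servers (46 or 47 each), so a ratio well above 1 here reflects uneven tablet sizes, not uneven tablet assignment.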


Thanks,
Z




--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/question-on-data-block-cache-tp15906p15937.html
Sent from the Developers mailing list archive at Nabble.com.
