accumulo-user mailing list archives

From "Ott, Charles H." <CHARLES.H....@saic.com>
Subject RE: Uneven distribute of Hosted Tablets?
Date Fri, 31 May 2013 14:33:00 GMT
For our biggest table, we are not adding data monotonically (by which I
assume you mean each key added is greater than the key before it).

 

The row keys are our indexed terms, so that we can scan on the row key.
So when a record is added, its row key could start with any ASCII
character ([\x00-\x7F]).

 

 

 

From: user-return-2640-CHARLES.H.OTT=saic.com@accumulo.apache.org
[mailto:user-return-2640-CHARLES.H.OTT=saic.com@accumulo.apache.org] On
Behalf Of William Slacum
Sent: Friday, May 31, 2013 10:14 AM
To: user@accumulo.apache.org
Subject: Re: Uneven distribute of Hosted Tablets?

 

You could also lower the split threshold (run `config -t <table>` in the
shell and look for `table.split.threshold`) and then compact the table.

How are you ingesting data? I believe that adding monotonically
increasing keys can lead to a pattern where only the last tablet is
being added to and split (not 100% on this). If you know some
distribution for the keys you're adding, it might be a good idea to add
split points to the table to increase parallelism. 
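
A rough sketch of the split-point idea, in case it helps: given a sample of row keys, pick split points at evenly spaced quantiles so each tablet gets a similar share of the data. The function names (`splits_from_sample`, `tablet_index`) and the sample keys are made up for illustration and are not part of Accumulo. It assumes ASCII row keys, so Python's string ordering matches Accumulo's byte ordering, and it assumes a tablet holds keys up to and including its end row.

```python
from bisect import bisect_left


def splits_from_sample(sample_keys, num_tablets):
    """Pick up to num_tablets - 1 split points at evenly spaced quantiles
    of the sampled row keys (hypothetical helper, not an Accumulo API)."""
    keys = sorted(sample_keys)
    if num_tablets < 2 or not keys:
        return []
    n = len(keys)
    # Duplicated quantiles collapse to one split point.
    return sorted({keys[(i * n) // num_tablets] for i in range(1, num_tablets)})


def tablet_index(key, splits):
    """Index of the tablet a key would land in, assuming each tablet
    covers (previous split, split] and the last tablet is open-ended."""
    return bisect_left(splits, key)


sample = ["apple", "banana", "cherry", "date", "egg", "fig", "grape", "kiwi"]
splits = splits_from_sample(sample, 4)
print(splits)  # ['cherry', 'egg', 'grape']

# Keys that keep increasing past the last split all route to the last tablet.
print([tablet_index(k, splits) for k in ["zz1", "zz2", "zz3"]])  # [3, 3, 3]
```

The resulting points could then be handed to the shell's `addsplits` command or to `TableOperations.addSplits`. The second print illustrates the monotonic-key pattern mentioned above: every new key sorts after the last split, so only the last tablet receives writes.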

 

On Fri, May 31, 2013 at 10:00 AM, Ott, Charles H. <
CHARLES.H.OTT@saic.com> wrote:

I performed a clean shutdown and startup of all the processes using the 
start-all.sh/stop-all.sh scripts.

 

The systems have only been online for about 5 minutes and everything is
working.  But I see the following Recent WARN in the Logs:

 

time              application            count  level  message
31 09:37:57,0774  tserver:1620-accumulo  1      WARN   Future location is not to this server for the root tablet

 

Hosted tablet distribution seems to be worse:

 

(Image of the hosted tablet distribution; not preserved in the archive)

 

I am able to log in and scans seem to be responsive.  I noticed that
when our entry count reached ~20 M, our batch scans were taking much
longer.  I was hoping that by distributing the tablets evenly, and
splitting some of the bigger tables, we could get better performance.

As for splitting the bigger table, I received a message from a peer.  He
mentioned that I could create a new table and split it on the values I
want, then use a MapReduce job to move the data from the single-tablet
table to the split table.

 

From: user-return-2638-CHARLES.H.OTT=saic.com@accumulo.apache.org
[mailto:user-return-2638-CHARLES.H.OTT=saic.com@accumulo.apache.org] On
Behalf Of John Vines
Sent: Thursday, May 30, 2013 5:30 PM
To: user@accumulo.apache.org
Cc: Lahr-Vivaz, Emilio F.


Subject: Re: Uneven distribute of Hosted Tablets?

 

Your distribution is cause for concern. I thought we had resolved a lot
of the balancer issues in 1.4.1 or 1.4.2. Are you seeing any errors from
the master in your logs? Worst case scenario is you just have to kill
the master process and start it back up and you should see things
balancing out.

 

On Thu, May 30, 2013 at 4:40 PM, Ott, Charles H. <
CHARLES.H.OTT@saic.com> wrote:

Thanks for the feedback.  I will keep what you said in mind.

 

From: user-return-2636-CHARLES.H.OTT=saic.com@accumulo.apache.org
[mailto:user-return-2636-CHARLES.H.OTT=saic.com@accumulo.apache.org] On
Behalf Of David Medinets
Sent: Thursday, May 30, 2013 4:34 PM
To: accumulo-user
Subject: Re: Uneven distribute of Hosted Tablets?

 

Don't worry about splits until you have a few billion entries and a lot
more servers. What you're seeing now is just a bad signal to noise
ratio.

 

On Thu, May 30, 2013 at 11:22 AM, Ott, Charles H. <
CHARLES.H.OTT@saic.com> wrote:

First I want to say thanks to you all.  The information provided by
this mailing list has been invaluable to me and I appreciate it.

 

My newest concern is the uneven allocation of hosted tablets across my
tablet servers:

 

(Image of the hosted tablet distribution; not preserved in the archive)

 

I have been reading about pre-splitting tables in the Accumulo guide.
But I am not sure if that would be the 'fix' for this.  (Or even if this
needs fixing.)

 

I have 3 tables that could potentially grow to n number of records.
Currently all of those tables (and their single tablets) reside on the
1620-accumulo server (hosting 24 tablets).

 

Since there are already several entries in those tables, would splitting
them be appropriate?  Does splitting guarantee that the new tablets will
be allocated to Node 1 instead of Node 3? Or perhaps could I "re-balance"
the cluster so that all of the tablet servers host an approximately
equal number of tablets?

 

These tablet servers were all brought up at separate times and I have
not performed any optimizations or custom operations on them.

 

 

Thanks,

Charles

 

 

 

 

 

