accumulo-user mailing list archives

From Billie Rinaldi <billie.rina...@gmail.com>
Subject Re: Uneven distribute of Hosted Tablets?
Date Fri, 31 May 2013 17:31:37 GMT
Hmm.  Anything on the one that reported assignment failed?

Billie


On Fri, May 31, 2013 at 9:53 AM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:

> 2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: Got
> unloadTablet message from user: !SYSTEM
>
> 2013-05-31 09:49:53,471 [tabletserver.Tablet] DEBUG:
> initiateClose(saveState=true queueMinC=false disableWrites=false) !0;!0<<
>
> 2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: Failed to
> unload tablet !0;!0<<... it was alread closing or closed : Tablet !0;!0<<
> already closing
>
> The timestamp is 12 minutes off because the clocks are out of sync, but
> there seem to be the same number of debug statements above as there were
> errors in the master.
>
> *From:* user-return-2646-CHARLES.H.OTT=saic.com@accumulo.apache.org[mailto:
> user-return-2646-CHARLES.H.OTT=saic.com@accumulo.apache.org] *On Behalf
> Of *Billie Rinaldi
> *Sent:* Friday, May 31, 2013 12:47 PM
>
> *To:* user@accumulo.apache.org
> *Subject:* Re: Uneven distribute of Hosted Tablets?
>
> Can you go to one of those servers that is reporting unload / assignment
> failed and check its tserver log to see why it failed?
>
> Billie
>
> On Fri, May 31, 2013 at 9:39 AM, Ott, Charles H. <CHARLES.H.OTT@saic.com>
> wrote:
>
> I am not sure if I am using one of the balancers that comes with
> Accumulo.  There are some errors in my logs for the master since I did the
> clean shutdown/startup this morning:
>
> 2013-05-31 09:37:57,592 [master.Master] ERROR: 10.35.56.92:9997 reports
> unload failed for tablet !0;!0<< (A lot of these errors showed up)
>
> 2013-05-31 09:37:57,795 [master.Master] ERROR: 10.35.58.81:9997 reports
> assignment failed for tablet !0;!0<< (only one of these)
>
> 2013-05-31 09:37:05,784 [master.Master] ERROR: master:
> 1620-accumulo.dhcp.saic.com 10.35.56.92:9997 reports unload failed for
> tablet !0;!0<< (a lot of these)
>
> The entire batch of errors occurred within 1 minute.  They have not
> occurred since.
>
> *From:* user-return-2644-CHARLES.H.OTT=saic.com@accumulo.apache.org[mailto:
> user-return-2644-CHARLES.H.OTT=saic.com@accumulo.apache.org] *On Behalf
> Of *Billie Rinaldi
> *Sent:* Friday, May 31, 2013 12:14 PM
> *To:* user@accumulo.apache.org
> *Subject:* Re: Uneven distribute of Hosted Tablets?
>
> So (at the risk of stating the obvious) it seems like your cluster is in a
> funny state.  I would expect the counts in the "Hosted Tablets" column to
> all be roughly the same, especially after restarting the master, assuming
> you're using one of the balancers that comes with Accumulo.  It's possible
> the cluster has gotten into this state due to the clock differences.
> Accumulo has a mechanism called "logical time" to deal with clock
> differences, but it is not enabled by default.  You can enable it when you
> create a table.  If you don't enable it, it is recommended that you use
> NTP to synchronize the clocks on your cluster.  The !METADATA table has
> logical time by default, but your other tables might not contain what you
> expect them to if you haven't enabled logical time.
>
> That said, I'm not sure why the clock issue would be affecting the
> balancing.  You mentioned the new warnings you saw on the monitor page
> after you restarted the system.  Could you see if there are any older
> errors in your log files?
>
> Billie
>
> On Fri, May 31, 2013 at 8:10 AM, Ott, Charles H. <CHARLES.H.OTT@saic.com>
> wrote:
>
> -bash-4.1$ ssh 1620-accumulo
> -bash-4.1$ date
> Fri May 31 10:52:49 EDT 2013
>
> -bash-4.1$ ssh 1620-Node1
> -bash-4.1$ date
> Fri May 31 11:05:48 EDT 2013
>
> -bash-4.1$ ssh 1620-Node2
> -bash-4.1$ date
> Fri May 31 11:05:58 EDT 2013
>
> -bash-4.1$ ssh 1620-Node3
> -bash-4.1$ date
> Fri May 31 11:05:58 EDT 2013
>
> Looks like the master (1620-accumulo) and its tablet server are 12-13
> minutes behind the nodes.  I’m not sure my
> zookeeper+Hadoop+Accumulo+storm+Kafka stack will appreciate moving forward
> in time 12 minutes.
>
> *From:* user-return-2642-CHARLES.H.OTT=saic.com@accumulo.apache.org[mailto:
> user-return-2642-CHARLES.H.OTT=saic.com@accumulo.apache.org] *On Behalf
> Of *Billie Rinaldi
> *Sent:* Friday, May 31, 2013 11:02 AM
> *To:* user@accumulo.apache.org
> *Subject:* Re: Uneven distribute of Hosted Tablets?
>
> Those last contact times are concerning as well.  Have they always looked
> like that?  I notice they were roughly the same on your first screenshot.
> Are your server clocks not in sync?
>
> Billie
>
> On Fri, May 31, 2013 at 7:00 AM, Ott, Charles H. <CHARLES.H.OTT@saic.com>
> wrote:
>
> I performed a clean shutdown and startup of all the processes using the
> start-all.sh/stop-all.sh scripts.
>
> The systems have only been online for about 5 minutes and everything is
> working.  But I see the following Recent WARN in the Logs:
>
> time              application            count  level  message
> 31 09:37:57,0774  tserver:1620-accumulo  1      WARN   Future location is not to this server for the root tablet
>
> Hosted tablet distribution seems to be worse:
>
> (Image omitted)
>
> I am able to log in, and scans seem to be responsive.  I noticed that when
> our entry count reached ~20 M, our batch scans were taking much longer.  I
> was hoping that by distributing the tablets evenly, and splitting some of
> the bigger tables, we could get better performance.
>
> As for splitting the bigger table, I received a message from a peer.  He
> mentioned that I could create a new table and split it on the values I
> want, then use a MapReduce job to move the data from the single-tablet
> table to the split table.
>
> *From:* user-return-2638-CHARLES.H.OTT=saic.com@accumulo.apache.org[mailto:
> user-return-2638-CHARLES.H.OTT=saic.com@accumulo.apache.org] *On Behalf
> Of *John Vines
> *Sent:* Thursday, May 30, 2013 5:30 PM
> *To:* user@accumulo.apache.org
> *Cc:* Lahr-Vivaz, Emilio F.
> *Subject:* Re: Uneven distribute of Hosted Tablets?
>
> Your distribution is cause for concern. I thought we had resolved a lot of
> the balancer issues in 1.4.1 or 1.4.2. Are you seeing any errors from the
> master in your logs? Worst case scenario is you just have to kill the
> master process and start it back up and you should see things balancing out.
>
> On Thu, May 30, 2013 at 4:40 PM, Ott, Charles H. <CHARLES.H.OTT@saic.com>
> wrote:
>
> Thanks for the feedback.  I will keep what you said in mind.
>
> *From:* user-return-2636-CHARLES.H.OTT=saic.com@accumulo.apache.org[mailto:
> user-return-2636-CHARLES.H.OTT=saic.com@accumulo.apache.org] *On Behalf
> Of *David Medinets
> *Sent:* Thursday, May 30, 2013 4:34 PM
> *To:* accumulo-user
> *Subject:* Re: Uneven distribute of Hosted Tablets?
>
> Don't worry about splits until you have a few billion entries and a lot
> more servers. What you're seeing now is just a bad signal-to-noise ratio.
>
> On Thu, May 30, 2013 at 11:22 AM, Ott, Charles H. <CHARLES.H.OTT@saic.com>
> wrote:
>
> First I want to say thanks to you all.  The information provided by
> this mailing list has been invaluable to me and I appreciate it.
>
> My newest concern is the uneven allocation of hosted tablets across my
> tablet servers:
>
> (Image omitted)
>
> I have been reading about pre-splitting tables in the Accumulo guide.  But
> I am not sure if that would be the ‘fix’ for this.  (Or even if this needs
> fixing.)
>
> I have 3 tables that could potentially grow to *n* number of records.
> Currently all of those tables (and their single tablets) reside on the
> 1620-accumulo server (hosting 24 tablets).
>
> Since there are already several entries in those tables, would splitting
> them be appropriate?  Does splitting guarantee that the new tablets will be
> allocated to Node1 instead of Node3? Or perhaps could I “re-balance” the
> cluster so that all of the tablet servers host an approximately equal
> number of tablets?
>
> These tablet servers were all brought up at separate times and I have not
> performed any optimizations or custom operations on them.
>
> Thanks,
>
> Charles
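
The clock offsets Charles measured with `date` earlier in the thread can be worked out with a few lines of Python. This is only a sketch under some assumptions: the readings are retyped from the quoted `date` output in ISO form (the time zone is dropped, since every host reported EDT), and the function name `offsets_from` is made up for illustration. The four `date` commands were also run by hand one after another, so a few seconds of each difference is probably elapsed time rather than real skew.

```python
from datetime import datetime

# Readings retyped from the `date` output quoted in the thread.
# All four hosts reported EDT, so the zone is omitted here.
READINGS = {
    "1620-accumulo": "2013-05-31 10:52:49",
    "1620-Node1": "2013-05-31 11:05:48",
    "1620-Node2": "2013-05-31 11:05:58",
    "1620-Node3": "2013-05-31 11:05:58",
}


def offsets_from(reference, readings):
    """Return each host's reading minus the reference host's, in whole seconds."""
    fmt = "%Y-%m-%d %H:%M:%S"
    parsed = {host: datetime.strptime(text, fmt) for host, text in readings.items()}
    base = parsed[reference]
    return {host: int((when - base).total_seconds()) for host, when in parsed.items()}


offsets = offsets_from("1620-accumulo", READINGS)
for host, seconds in offsets.items():
    minutes, rest = divmod(seconds, 60)
    print(f"{host}: {minutes} min {rest} s ahead of 1620-accumulo")
```

The differences come out at roughly 13 minutes, which matches the "12-13 minutes behind" estimate in the thread.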

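To see which tablet servers the master is complaining about, the "reports unload failed" and "reports assignment failed" lines can be tallied per server. The sketch below uses only the three master log lines quoted in this thread, joined back onto single lines. The regular expression is an assumption based on those three lines alone, not on any documented Accumulo log format, and `tally_failures` is a hypothetical helper name.

```python
import re
from collections import Counter

# The three master log lines quoted in the thread, unwrapped.
LOG_LINES = [
    "2013-05-31 09:37:57,592 [master.Master] ERROR: 10.35.56.92:9997 "
    "reports unload failed for tablet !0;!0<<",
    "2013-05-31 09:37:57,795 [master.Master] ERROR: 10.35.58.81:9997 "
    "reports assignment failed for tablet !0;!0<<",
    "2013-05-31 09:37:05,784 [master.Master] ERROR: master: "
    "1620-accumulo.dhcp.saic.com 10.35.56.92:9997 reports unload failed "
    "for tablet !0;!0<<",
]

# Assumed shape: "<ip>:<port> reports <kind> failed for tablet <extent>".
FAILURE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3}:\d+) reports (unload|assignment) failed "
    r"for tablet (\S+)"
)


def tally_failures(lines):
    """Count failure reports, keyed by (server address, failure kind)."""
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            server, kind, _tablet = match.groups()
            counts[(server, kind)] += 1
    return counts


counts = tally_failures(LOG_LINES)
for (server, kind), n in sorted(counts.items()):
    print(f"{server}: {n} x {kind} failed")
```

Run against a full master log instead of `LOG_LINES`, the same tally would show whether the failures cluster on one server, which is the server whose tserver log is worth reading first.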