From: Keith Turner
To: user@accumulo.apache.org
Date: Fri, 20 Jun 2014 10:32:56 -0400
Subject: Re: BatchWriter woes

On Thu, Jun 19, 2014 at 11:57 PM, William Slacum <wilhelm.von.cloud@accumulo.net> wrote:

> I'm finding some ingest jobs I have running in a bit of a sticky sitch:
>
> I have a MapReduce job that reads a table, transforms the entries, creates
> an inverted index, and writes out mutations to two tables. The cluster size
> is in the tens of nodes, and I usually have 32 mappers running.
>
> The batch writer configs are:
> - memory buffer: 128MB
> - max latency: 5 minutes
> - threads: 32
> - timeout: default Long.MAX_VALUE
>
> I know we're on Accumulo 1.5.0 and I believe using CDH 4.5.0, Zookeeper
> 3.3.6.
>
> I'm noticing an ingest pattern of usually ok rates for the cluster (in the
> 100K+ entries per second), but after some time they start to drop off to
> ~10K E/s. Sometimes this happens when a round of compactions kicks off
> (usually major, not minor), sometimes not. Eventually, the mappers will
> timeout. We have them set to timeout after 10 minutes of not reporting
> status.
>
> I added a bit of probing/profiling, and noticed that there's an
> exponential growth in per entry processing time in the mapper. They're of
> pretty uniform size, so there should not be much variance in the times. The
> times go from single milliseconds, to hundreds of milliseconds, to seconds,
> to minutes.
>
> If I jstack a mapper, it's sitting in TabletServerBatchWriter#waitRTE. It
> should only enter that method if the batch writer has (a) too much data
> buffered or (b) the user requested a flush. I'm inferring that (a) is the
> case, because there is no explicit TabletServerBatchWriter#flush() call.
>
> We did notice that there was a send thread trying to send to a dead
> server. We can't ssh to the IP it was trying to send to, and have verified
> manually that it's not listed in the current tablet servers. We did notice
> that the master log is reporting that a recovery on a WAL associated with
> that IP is under way. Looking back, the master had been reporting that
> message for about a day and a half. The message was similar to the one
> described in https://issues.apache.org/jira/browse/ACCUMULO-1364 . I do
> not know the significance of this as it relates to my jobs.
>

Do you think it's trying to write to a half-dead server? Does that server
still have locations in the metadata table?

>
> I did some digging in TabletServerBatchWriter, and the only thing I can
> kind of see happening is that if SendTask#sendMutationsToTabletServer
> receives a TException, it rethrows it as an IOException, then SendTask#send
> will catch that exception and add the mutations to the failures collection.
> Since the timeout is Long.MAX_VALUE, I think it's possible this loop can
> continue forever or until some outside force kills the entire process.
>
> Does this seem coherent? Is there anything else that could cause this?
>
> I'm on the track of converting the code over to using bulk ingest, but I
> think there's an issue with a vanilla BatchWriter that I would just be
> getting around instead of actually fixing.
>
> Also, I'd love to provide logs, but there's a high amount of friction in
> getting them, so I won't be able to deliver on that front.
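
To make the suspected loop concrete, here is a rough model of it in plain Java. This is only a sketch under stated assumptions, not the actual TabletServerBatchWriter code: the class RetryLoopSketch, the methods sendToDeadServer and drain, the simulated per-attempt cost, and the safety cap are all hypothetical names and parameters invented for illustration. It assumes every send to the unreachable server fails and that each failed mutation is put back on the failures queue, as described in the quoted message.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class RetryLoopSketch {

    // Hypothetical stand-in for a send to the unreachable server: always fails.
    static boolean sendToDeadServer(String mutation) {
        return false;
    }

    // Retries queued mutations until the queue drains, the simulated elapsed
    // time exceeds timeoutMillis, or safetyCap attempts have been made.
    // The cap exists only so this model terminates; it is an assumption of
    // the sketch, not something the batch writer is known to have.
    static String drain(Deque<String> failures, long timeoutMillis,
                        long costPerAttemptMillis, int safetyCap) {
        long elapsed = 0;   // simulated clock, so the result is deterministic
        int attempts = 0;
        while (!failures.isEmpty()) {
            if (attempts >= safetyCap) {
                return "still retrying after " + attempts + " attempts";
            }
            if (elapsed > timeoutMillis) {
                return "timed out after " + attempts + " attempts";
            }
            String mutation = failures.poll();
            attempts++;
            elapsed += costPerAttemptMillis;
            if (!sendToDeadServer(mutation)) {
                failures.add(mutation); // failed send goes back on the queue
            }
        }
        return "drained after " + attempts + " attempts";
    }

    public static void main(String[] args) {
        // Timeout of Long.MAX_VALUE: only the safety cap stops the loop.
        Deque<String> unbounded = new ArrayDeque<>(Arrays.asList("m1", "m2"));
        System.out.println(drain(unbounded, Long.MAX_VALUE, 1000, 1000));

        // Finite timeout of 5 simulated seconds: the loop gives up.
        Deque<String> bounded = new ArrayDeque<>(Arrays.asList("m1", "m2"));
        System.out.println(drain(bounded, 5000, 1000, 1000));
    }
}
```

In this model the Long.MAX_VALUE case never reaches the timeout check and stops only at the artificial cap, while the finite timeout ends the loop with the mutations still queued. If the real writer behaves similarly, a finite timeout in the batch writer config would at least turn the hang into a visible error, but that is an inference from the model and not something verified against the 1.5.0 source.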