From: Adam Fuchs
To: user@accumulo.apache.org
Date: Thu, 19 Sep 2013 04:07:26 -0400
Subject: Re: BatchWriter performance on 1.4

The addMutations method blocks when the client-side buffer fills up, so you may see a lot of time spent in that method due to a bottleneck downstream. There are a number of things you could try to speed that up. Here are a few:

1. Increase the BatchWriter's buffer size. This can smooth out the network utilization and increase efficiency.
2. Increase the number of threads that the BatchWriter uses to process mutations. This is particularly useful if you have more tablet servers than ingest clients.
3. Use a more efficient encoding. The more data you put through the BatchWriter, the longer it will take, even if that data compresses well at rest.
4. If you are seeing hold time show up on your tablet servers (displayed through the monitor page) you can increase the memory.maps.max to make minor compactions more efficient.

Cheers,
Adam

On Sep 18, 2013 10:08 PM, "Slater, David M." wrote:
> Hi, I'm running a single-threaded ingestion program that takes data from
> an input source, parses it into mutations, and then writes those mutations
> (sequentially) to four different BatchWriters (all on different tables).
> Most of the time (95%) taken is on adding mutations, e.g.
> batchWriter.addMutations(mutations); I am wondering how to reduce the time
> taken by these methods.
>
> 1) For the method batchWriter.addMutations(Iterable<Mutation>), does it
> matter for performance whether the mutations returned by the iterator are
> sorted in lexicographic order?
>
> 2) If the Iterable<Mutation> that I pass to the BatchWriter is very large,
> will I need to wait for a number of Batches to be written and flushed
> before it will finish iterating, or does it transfer the elements of the
> Iterable to a different intermediate list?
>
> 3) If that is the case, would it then make sense to spawn off short
> threads for each time I make use of addMutations?
>
> At a high level, my code looks like this:
>
> BatchWriter bw1 = connector.createBatchWriter(…)
> BatchWriter bw2 = …
> …
> while(true) {
>     String[] data = input.getData();
>     List<Mutation> mutations1 = parseData1(data);
>     List<Mutation> mutations2 = parseData2(data);
>     …
>     bw1.addMutations(mutations1);
>     bw2.addMutations(mutations2);
>     …
> }
>
> Thanks,
> David
