Subject: Re: Increasing Ingest Rate
From: Jimmy Lin <jimmys.email@gmail.com>
To: user@accumulo.apache.org
Date: Thu, 4 Apr 2013 15:26:17 -0400

On Thu, Apr 4, 2013 at 2:25 PM, Eric Newton <eric.newton@gmail.com> wrote:

> Have you pre-split your table to spread the load out to all the machines?

Yes. We are using splits from loading the whole dataset previously.

> Does the data distribution match your splits?

Yes. See above.

> Is the ingest data already sorted (that is, it always writes to the last tablet)?

No. The data writes to multiple tablets concurrently. We set up a queue parameter and divide the data into multiple queues.

> How much memory and how many threads are you using in your BatchWriters?

I believe we have 16 GB of memory for the Java writer, with 18 threads running per server.

> Check the ingest rates on the tablet server monitor page and look for hot spots.

There are certain servers that have higher ingest rates, and the server that is busiest changes over time, but the overall ingestion rate will not go up.
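For reference, a minimal sketch of how a BatchWriter's memory and thread limits can be set explicitly. This assumes the BatchWriterConfig API from Accumulo 1.5 (on 1.4 the same memory/latency/thread values are passed directly to Connector.createBatchWriter); the instance, ZooKeeper quorum, table, and credentials below are placeholders, not our actual configuration:

    import java.util.concurrent.TimeUnit;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class IngestWriterSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details -- substitute the real instance,
            // ZooKeeper quorum, user, and password.
            ZooKeeperInstance instance =
                new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181");
            Connector connector =
                instance.getConnector("ingestUser", new PasswordToken("secret"));

            // Give the client-side buffer enough memory and enough send threads
            // to keep every tablet server busy at once.
            BatchWriterConfig cfg = new BatchWriterConfig()
                .setMaxMemory(256L * 1024 * 1024)      // 256 MB write buffer
                .setMaxLatency(2, TimeUnit.MINUTES)    // flush at least this often
                .setMaxWriteThreads(18);               // concurrent sends to tservers

            BatchWriter writer = connector.createBatchWriter("myTable", cfg);

            Mutation m = new Mutation(new Text("row_0001"));
            m.put(new Text("cf"), new Text("cq"), new Value("payload".getBytes()));
            writer.addMutation(m);

            writer.close();   // flushes anything still buffered
        }
    }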
On Thu, Apr 4, 2013 at 2:01 PM, Jimmy Lin <jimmys.email@gmail.com> wrote:

>> Hello,
>> I am fairly new to Accumulo and am trying to figure out what is preventing my system from ingesting data at a faster rate. We have 15 nodes running a simple Java program that reads from and writes to Accumulo and then indexes some data into Solr. The rate of ingest is not scaling linearly with the number of nodes that we start up. I have tried increasing several parameters, including:
>> - limit on file descriptors in Linux
>> - max ZooKeeper connections
>> - tserver.memory.maps.max
>> - TSERVER_OPTS memory size
>> - tserver.mutation.queue.max
>> - tserver.scan.files.open.max
>> - tserver.walog.max.size
>> - tserver.cache.data.size
>> - tserver.cache.index.size
>> - HDFS setting for xceivers
>> No matter what changes we make, we cannot get the ingest rate to go over 100k entries/s and about 6 MB/s. I know Accumulo should be able to ingest faster than this.
>>
>> Thanks in advance,
>> Jimmy Lin
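As a footnote for anyone tuning the same knobs: most of the tserver.* properties listed above can be changed from the Accumulo shell with config -s <property>=<value>, or programmatically through InstanceOperations. A minimal sketch under the same assumptions as the writer sketch above; the values are placeholders rather than recommendations, and some properties are only read when the tablet servers start:

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.InstanceOperations;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class TserverTuningSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details -- substitute real values.
            ZooKeeperInstance instance =
                new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181");
            Connector connector =
                instance.getConnector("root", new PasswordToken("secret"));

            InstanceOperations ops = connector.instanceOperations();

            // Example values only -- size these for your own hardware.
            ops.setProperty("tserver.memory.maps.max", "4G");      // in-memory map; needs tserver restart
            ops.setProperty("tserver.mutation.queue.max", "4M");   // per-session mutation queue
            ops.setProperty("tserver.walog.max.size", "1G");       // write-ahead log roll size
            ops.setProperty("tserver.cache.data.size", "512M");    // RFile data block cache
            ops.setProperty("tserver.cache.index.size", "512M");   // RFile index block cache
        }
    }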