From: Josh Elser
Date: Tue, 29 Jul 2014 13:42:52 -0400
To: user@accumulo.apache.org
Subject: Re: Request for Configuration Help for basic test. Tservers dying and only one tablet being used
Message-ID: <53D7DD1C.8040503@gmail.com>

Some comments inline

On 7/29/14, 1:07 PM, Pelton, Aaron A. wrote:
> Hi All,
>
> I am new to Accumulo and I apologize if the answers to my questions are
> already posted somewhere. I've done a fair amount of googling and poking
> around the manuals etc.
>
> I am just doing a simple test with two machines, one producing about 600
> threads on the network to stream simultaneous writes to a rest service,
> and the other producing about 300 threads on the network to perform
> simultaneous queries to a rest service. The rest service has Accumulo
> API calls in it to write out and query data.
>
> I have inherited the following configuration
>
> -Squirrel Bundle distribution of Accumulo 1.5.0
>
> -1 Master machine to start and stop Accumulo services on
>
> -12 data nodes running tservers. The first three of these also running
> the zookeeper instances.
> And, nodes 4-6 running tracers.
>
> I have noticed the following issues with configuration and changed them
> as follows
>
> -Changed swappiness to 0 on all nodes
>
> -Was getting OutOfMemoryExceptions after the above still, and after
> running test for long duration. Thus, increased Java Heap size from 1g
> to 4g, which is still far below the physical ram on the nodes.
>
> -Increased java heap from 1g to 2g on master node
>
> -I also increased the following properties
>
> o tserver.memory.maps.max = 2G
>
> o tserver.cache.data.size = 512M
>
> o tserver.cache.index.size = 512M
>
> -Changed the ulimit for virtual memory to unlimited
>
> -Changed the ulimit for files opened to 65536
>
> -Changed the ulimit for max user processes to 1024

These all look good. Just keep in mind that tserver.cache.data.size and
tserver.cache.index.size will be on the JVM heap while
tserver.memory.maps.max is off heap (assuming you're using the native
maps, which you very well should be -- I assume Sqrrl's distro set this
up for you).

> -A tomcat instance with a server socket accepting up to 1,000 threads /
> user connections to a rest service that eventually makes a read / write
> out to an Accumulo connector instance.
>
> -Changed the zookeeper connection limit max to 0 since this is just a
> test environment
>
> -Noticed that code I had inherited didn't have close calls on the
> scanner objects in the rest service b/c it was originally designed for
> Accumulo 1.4 in which there wasn't such an API.

Scanners can clean up after themselves, whereas BatchScanners don't. A
close method was added to ScannerBase (the parent interface of Scanner
and BatchScanner) to let you seamlessly swap out a Scanner with a
BatchScanner (and vice versa) while not leaking any resources. In short,
you can call Scanner#close, but it's just a no-op.
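The close-on-every-path pattern described above can be sketched roughly as
follows. To keep the snippet runnable without Accumulo on the classpath,
ScannerLike, StubScanner and readAll are hypothetical stand-ins, not part
of the Accumulo API; in real code the variable would be typed as
ScannerBase and come from Connector#createScanner or
Connector#createBatchScanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ScannerCloseSketch {

    /** Hypothetical stand-in for Accumulo's ScannerBase: iterable, with close(). */
    interface ScannerLike extends Iterable<String> {
        void close();
    }

    /** Hypothetical in-memory scanner that records whether close() was called. */
    static class StubScanner implements ScannerLike {
        private final List<String> rows;
        boolean closed = false;

        StubScanner(String... rows) {
            this.rows = Arrays.asList(rows);
        }

        @Override
        public Iterator<String> iterator() {
            return rows.iterator();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Reads every entry and closes the scanner even if iteration throws. */
    static List<String> readAll(ScannerLike scanner) {
        List<String> results = new ArrayList<String>();
        try {
            for (String row : scanner) {
                results.add(row);
            }
        } finally {
            // Harmless for a plain Scanner; required for a BatchScanner.
            scanner.close();
        }
        return results;
    }

    public static void main(String[] args) {
        StubScanner scanner = new StubScanner("bk1234:8876", "qz5678:9012");
        System.out.println(readAll(scanner) + " closed=" + scanner.closed);
    }
}
```

Because the caller only depends on close() existing, the same method body
keeps working if the Scanner is later swapped for a BatchScanner.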
> -This may be wrong, but in an effort to see my ~900 connections
> simultaneously get as much access to db writes/reads for servicing, I
> up'd some thread counts for
>
> o tserver.server.threads.minimum = 75
>
> o master.server.threads.minimum = 300
>
> I have a couple of problems to note:
>
> 1. Ingest speeds seem kinda slow. I would anticipate network overhead but
> not enough to reduce writes to 125 records / sec when each record is
> only a few kB.

What do you actually do when you receive an HTTP request to write to
Accumulo? It sounds like you're reading data and then writing? Is each
HTTP request creating its own BatchWriter? More insight into what a
"write" looks like in your system (in terms of Accumulo API calls) would
help us make recommendations about more efficient things you can do.

> a. I believe this is due to the fact that I'm only seeing one tserver
> primarily active at ingesting, with one tablet in particular for the
> table receiving the bulk of the data.
>
> b. I have added pre-splits upon table creation for each letter of the
> alphabet, plus the digits 0-9. As this is a test with a simple loop
> creating ID values, I throw 2 alpha chars randomly in front of the
> generated number in my loop and use that as the ID to distribute
> hopefully the IDs across tablets for this table. A record ID ingested
> might look like "bk1234:8876", whereby it has random 2 chars, orig ID
> value, colon, and a timestamp. Sample pre-splitting: (Granted the array
> could be constructed more gracefully, but for a quick test, meh).
>
> try
> {
>     conn.tableOperations().create(TABLE_NAME);
>     // One split point per leading character of the row ID.
>     final SortedSet<Text> sortedSplits = new TreeSet<Text>();
>     for (String binPrefix : new String[] { "a", "b", "c", "d", "e", "f",
>             "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r",
>             "s", "t", "u", "v", "w", "x", "y", "z", "1", "2", "3", "4",
>             "5", "6", "7", "8", "9", "0" })
>     {
>         sortedSplits.add(new Text(binPrefix));
>     }
>     conn.tableOperations().addSplits(TABLE_NAME, sortedSplits);
> }
> catch (TableExistsException | TableNotFoundException exception)
> {
>     LOGGER.warn("Could not create table or sorted splits", exception);
> }

Good, pre-splitting your table should help with random data, but if
you're only writing data to one tablet, you're stuck (very similar to
hot-spotting reducers in MapReduce jobs).

> 2. Tservers running on the data node halt after about 4 hours in of
> processing. I'm attempting to ingest into the billions, hopefully
> trillions of records range. Generally it is the ones that aren't under
> load in the beginning, until finally the one that is handling the bulk
> of the load crashes typically last. In the beginning, I noticed in the
> tserver logs the OutOfMemoryException, but haven't seen that in the past
> few runs after the memory adjustments. In fact the tserver log doesn't
> say anything about why it stopped. Also didn't notice anything unusual
> in the zookeeper log other than the occasional CancelledKeyException.

Make sure you check all three of the tserver_hostname.debug.log,
tserver_hostname.out and tserver_hostname.err files. OOMEs sometimes
don't make it to the log file because the JVM is already tearing down.
You should be able to find something as to why the tserver stopped.

> 3. Lastly can anyone approximate with the 12 nodes that I have, what kind
> of ingest speed should I see if things were configured correctly in
> number of records per second based on small record sizes of a few kB.
> And, is anything obviously wrong with the configurations mentioned above
> that would improve throughput?

Generally, a "normal" machine will be able to ingest about 200k records
per second at 150 bytes each, for ~30MB/s. You might also want to try
increasing tserver.mutation.queue.max to 1M in accumulo-site.xml
(restart required). You can find some extra information about that in
the release notes:
http://accumulo.apache.org/release_notes/1.5.1.html#known-issues. Not
sure if Sqrrl's distribution has done this already for you.

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> --
>
> Sincerely,
>
> Aaron Pelton
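
As a rough cross-check of the throughput figures in this thread, the
arithmetic can be worked through as below. The class and method names are
made up for illustration, and two things are assumptions rather than facts
from the thread: that "a few kB" means about 3,000 bytes per record, and
that the ~30MB/s byte rate carries over unchanged to larger records.

```java
public class IngestEstimate {

    /** Byte rate implied by a record rate and a record size. */
    static long bytesPerSecond(long recordsPerSecond, long bytesPerRecord) {
        return recordsPerSecond * bytesPerRecord;
    }

    /** Record rate that a fixed byte rate allows at a given record size. */
    static long recordsPerSecondAt(long byteRate, long bytesPerRecord) {
        return byteRate / bytesPerRecord;
    }

    static double toMegabytes(long bytes) {
        return bytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Reference point from the reply: 200k records/s at 150 bytes each.
        long reference = bytesPerSecond(200_000L, 150L); // 30,000,000 bytes/s

        // Observed rate from the question: 125 records/s.
        // ASSUMPTION: "a few kB" is taken to be 3,000 bytes.
        long assumedRecordBytes = 3_000L;
        long observed = bytesPerSecond(125L, assumedRecordBytes); // 375,000 bytes/s

        // ASSUMPTION: the reference byte rate also holds for 3,000-byte records.
        long expectedRecords = recordsPerSecondAt(reference, assumedRecordBytes);

        System.out.println("reference MB/s: " + toMegabytes(reference));
        System.out.println("observed MB/s:  " + toMegabytes(observed));
        System.out.println("records/s at reference byte rate: " + expectedRecords);
    }
}
```

Under those assumptions a single machine would be expected to absorb on
the order of 10,000 such records per second, so 125 records per second
points at the write path or tablet distribution rather than at hardware
limits.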