From: Josh Elser
To: dev@accumulo.apache.org
Date: Tue, 05 May 2015 11:49:34 -0400
Subject: Re: Ingest speed

On a single node, you can easily achieve tens of thousands of key-value inserts per second. Depending on how many columns are in each row, 600 a second is rather slow :)

Your loop looks good. Using a single BatchWriter and letting it amortize sending data from your client to the servers will be the most efficient approach.

If the JSON parsing is the slowest part, you could have a single thread read the file and hand each line to a thread pool, where workers parse the lines and add the results to some concurrent data structure. A consumer on that data structure would then read each parsed object and send it to Accumulo.

Alternatively, this is where MapReduce is a clear win, as it's very good at parallelizing these types of problems.
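That single-reader/parser-pool pipeline might look roughly like the following, using only java.util.concurrent. This is a sketch, not a drop-in implementation: there is no live cluster here, so the BatchWriter.addMutation() call is stubbed with a counter, and the `put(line.trim())` is a placeholder for your real JSON parsing.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// One reader feeds raw JSON lines to a pool of parser workers; parsed
// records land on a bounded queue that a single consumer drains, which is
// where the shared BatchWriter would send mutations to Accumulo.
public class ParallelIngest {

    // Sentinel marking end-of-input on the queue.
    private static final String POISON = "\u0000EOF";

    public static int ingest(List<String> lines) throws Exception {
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(1024);
        ExecutorService parsers = Executors.newFixedThreadPool(4);
        AtomicInteger written = new AtomicInteger();

        // Single consumer: in real code this would build a Mutation from
        // each parsed record and call batchWriter.addMutation(m).
        Thread consumer = new Thread(() -> {
            try {
                String rec;
                while (!(rec = parsed.take()).equals(POISON)) {
                    written.incrementAndGet(); // stand-in for addMutation()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Reader: hand each raw line to the parser pool.
        CountDownLatch done = new CountDownLatch(lines.size());
        for (String line : lines) {
            parsers.submit(() -> {
                try {
                    parsed.put(line.trim()); // stand-in for real JSON parsing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();          // all lines parsed and queued
        parsed.put(POISON);    // tell the consumer to stop
        consumer.join();
        parsers.shutdown();
        return written.get();
    }

    public static void main(String[] args) throws Exception {
        int n = ingest(List.of("{\"id\":1}", "{\"id\":2}", "{\"id\":3}"));
        System.out.println(n + " records written"); // prints "3 records written"
    }
}
```

The bounded queue gives you back-pressure for free: if the single consumer (and the BatchWriter behind it) can't keep up, the parser threads block on put() instead of exhausting memory.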
You could use the FileInputFormat and the AccumuloOutputFormat to accomplish this task.

Andrea Leoni wrote:
> Thank you for your answer.
> Today I tried to create a big command file and push it to the shell (about
> 300k inserts per file). As you said, it is too slow for me (about 600
> inserted rows/sec).
>
> I've been on Accumulo for just one week. I'm a noob, but I'm learning.
>
> My app has to store a large amount of data.
>
> The row is the timestamp and the family/qualifier are the columns... I read
> my data from a JSON file, so my app scans it for new records, parses them,
> and for each record creates a mutation and pushes it to Accumulo with a
> BatchWriter...
>
> Maybe I'm doing something wrong that limits the speed of my inserts.
>
> Currently I:
>
> LOOP
> 1) read a JSON line
> 2) parse it
> 3) create a mutation
> 4) put the line's information into the mutation
> 5) use the BatchWriter to insert the mutation into Accumulo
> END LOOP
>
> Is that all right? I know that steps 1) and 2) are slow, but they are
> necessary, and I use the fastest JSON parser I've found online.
>
> Thank you so much again!
> (and sorry again for my bad English!)
>
> -----
> Andrea Leoni
> Italy
> Computer Engineering
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Ingest-speed-tp14005p14013.html
> Sent from the Developers mailing list archive at Nabble.com.
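For what it's worth, the five-step loop quoted above can be sketched like this. There's no cluster available here, so a tiny in-memory class stands in for Accumulo's Mutation/BatchWriter (with the real client you'd get a writer from the connector and call addMutation() the same way), and the "timestamp,family,qualifier,value" line format is purely an assumption for illustration in place of real JSON.

```java
import java.util.ArrayList;
import java.util.List;

// One pass per input line: parse, build a mutation keyed by timestamp,
// hand it to the (stubbed) writer. The writer buffers client-side, which
// is exactly what lets a single BatchWriter amortize the network cost.
public class IngestLoop {

    // Minimal stand-in for an Accumulo Mutation: row plus one column update.
    static final class FakeMutation {
        final String row, family, qualifier, value;
        FakeMutation(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Stand-in for BatchWriter.addMutation(): just buffers in memory.
    static final List<FakeMutation> written = new ArrayList<>();

    // Assumed line format: "timestamp,family,qualifier,value".
    public static void ingestLine(String line) {
        String[] f = line.split(",", 4);               // step 2: parse
        FakeMutation m = new FakeMutation(
                f[0],                                  // step 3: row = timestamp
                f[1], f[2],                            // step 4: column family/qualifier
                f[3]);                                 //         and value
        written.add(m);                                // step 5: addMutation()
    }

    public static void main(String[] args) {
        for (String line : new String[] {
                "1430840977,meta,source,sensor-a",
                "1430840978,meta,source,sensor-b" }) {
            ingestLine(line);
        }
        System.out.println(written.size() + " mutations buffered");
        // prints "2 mutations buffered"
    }
}
```

The key point from the thread applies regardless of the stub: create the BatchWriter once outside the loop and let it batch, rather than flushing (or worse, creating a writer) per record.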