Subject: Re: Bulk-loader performance
From: Nick Dimiduk
To: dev@phoenix.apache.org
Date: Thu, 5 Mar 2015 07:53:18 -0800

Also: how large is your cluster? You can make things go faster by
increasing the number of mappers.

What changes did you make to the map() method? Increased logging,
performance enhancements, plugging in custom logic, something else?

On Thursday, March 5, 2015, Gabriel Reid wrote:

> Hi Tulasi,
>
> Answers (and questions) inlined below:
>
> On Thu, Mar 5, 2015 at 2:41 AM Tulasi Paradarami <
> tulasi.krishna.p@gmail.com> wrote:
>
> > Hi,
> >
> > Here are the details of our environment:
> > Phoenix 4.3
> > HBase 0.98.6
> >
> > I'm loading data to a Phoenix table using the CSV bulk-loader (after
> > making some changes to the map(...) method), and it is processing
> > about 16,000-20,000 rows/sec. I noticed that the bulk-loader spends
> > up to 40% of the execution time in the following steps.
> >
> > //...
> > csvRecord = csvLineParser.parse(value.toString());
> > csvUpsertExecutor.execute(ImmutableList.of(csvRecord));
> > Iterator<Pair<byte[], List<KeyValue>>> uncommittedDataIterator =
> >     PhoenixRuntime.getUncommittedDataIterator(conn, true);
> > //...
>
> The non-code translation of those steps is:
> 1. Parse the CSV record
> 2. Convert the contents of the CSV record into KeyValues
>
> Although it may look as though data is being written over the wire to
> Phoenix, the execution of the upsert and the retrieval of the
> uncommitted KeyValues are all local (in memory). The code is
> implemented this way because JDBC is the general API used within
> Phoenix -- there isn't a direct "convert fields to Phoenix encoding"
> API, although this code performs the equivalent operation.
>
> Could you give some more information on your performance numbers? For
> example, is this the throughput that you're getting in a single
> process, or over a number of processes? If the latter, how many
> processes? Also, how many columns are in the records that you're
> loading?
>
> > We plan to load up to 100 TB of data, and the overall performance of
> > the bulk-loader is not satisfactory.
>
> How many records are in that 100 TB? What is the current (projected)
> time required to load the data? What is the minimum ingest speed that
> would be considered satisfactory?
>
> - Gabriel
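
Below is a rough, standalone sketch of the in-memory round trip Gabriel
describes. It is not the actual mapper code; the class, the method name
rowToKeyValues, and the generic upsert handling are illustrative. The
pattern: execute an UPSERT through a local Phoenix connection with
auto-commit disabled, drain the client-side mutation buffer as
KeyValues, then roll back so nothing is ever sent to the server.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.util.Pair;
    import org.apache.phoenix.util.PhoenixRuntime;

    public class LocalKeyValueSketch {

        // Upsert one row through a local Phoenix connection (auto-commit
        // must be off, which is the Phoenix default) and return the
        // KeyValues buffered on the client, without writing to HBase.
        static List<KeyValue> rowToKeyValues(Connection conn,
                String upsertSql, Object[] values) throws Exception {
            try (PreparedStatement stmt = conn.prepareStatement(upsertSql)) {
                for (int i = 0; i < values.length; i++) {
                    stmt.setObject(i + 1, values[i]);
                }
                stmt.execute(); // buffered client-side; nothing hits the wire
            }
            List<KeyValue> keyValues = new ArrayList<KeyValue>();
            Iterator<Pair<byte[], List<KeyValue>>> it =
                    PhoenixRuntime.getUncommittedDataIterator(conn, true);
            while (it.hasNext()) {
                keyValues.addAll(it.next().getSecond());
            }
            conn.rollback(); // discard the buffer; we only wanted the KVs
            return keyValues;
        }
    }

This is the same trick the bulk-loader's mapper relies on: each map()
call turns an input row into KeyValues entirely in memory, so aggregate
throughput scales with the number of mappers, as Nick notes above.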