From: Todd Lipcon
Date: Sun, 4 Feb 2018 20:01:01 -0800
Subject: Re: Using Kudu to Handle Huge amount of Data
To: user@kudu.apache.org

Hi JP,

Answers inline...

On Thu, Feb 1, 2018 at 9:45 PM, Jp Gupta wrote:

> Hi,
> As an existing HBase user, we handle close to 20TB of data every day.

What does "handle" mean in this case? Are you inserting 20TB of new data each day, so that your total dataset grows by that amount? How much data do you retain? How many nodes are in your cluster? (I would guess many hundreds?)

> While we are contemplating moving to Kudu to take advantage of the new
> technology, I am yet to hear of a real industry use case where Kudu is
> being used to handle a huge amount of data.

If you are seeing Kudu as an "improved HBase", that isn't really accurate. Of course there are some things we can do better than HBase, but there are some things HBase can do better than Kudu.

As for Kudu data sizes, I am aware of some organizations storing several hundred TB in a Kudu cluster, but I have not yet heard of a use case with 1PB+. If you are looking to run at that scale you may hit some issues, but we stand ready to help you overcome them. I don't see any fundamental problems that would prevent it, and I have run some basic smoke tests of Kudu on ~800 nodes before.

> Looking forward to your inputs on any organisation using Kudu where data
> volumes of more than 10 TB are ingested every day.

Hope some other users can chime in.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera