Mailing-List: user@hbase.apache.org
Date: Wed, 11 May 2016 21:01:36 -0700
Subject: Re: Hbase scaning for couple Terabytes data
From: Ted Yu
To: "user@hbase.apache.org"

TableInputFormatBase is abstract. Most likely you would use
TableInputFormat for the scan.

See javadoc of getSplits():

   * Calculates the splits that will serve as input for the map tasks. The
   * number of splits matches the number of regions in a table.

FYI

On Wed, May 11, 2016 at 6:05 PM, Yi Jiang wrote:

> Hi, Guys
> Recently we have been debating the use of HBase as the destination for our
> data pipeline job.
> Basically, we want to save our logs into HBase, and our pipeline can
> generate 2-4 terabytes of data every day, but our IT department thinks it is
> not a good idea to scan that much data in HBase; it will cause performance
> and memory issues. They ask us to keep just 15 minutes' worth of data in
> HBase for real-time analysis.
> For now, I am using a Hive external table on top of HBase, but what I am
> wondering is: for a MapReduce job, what kind of mapper is used to scan the
> data from HBase? Is it TableInputFormatBase? And how many mappers will Hive
> use to scan HBase? Is it efficient or not? Will it cause performance issues
> if we have a couple of terabytes or more of data?
> I am also trying to index some columns that we might use to query. But I
> am not sure if it is a good idea to keep so much history data in HBase
> for querying.
> Thank you
> Jacky
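
To make the getSplits() javadoc quoted above concrete, here is a minimal, self-contained sketch of "one split per region" and of what that implies for the mapper count. It does not call the HBase API: the class RegionSplitSketch, the Split type, both helper methods, the sample region start keys, and the 10 GiB region size are all assumptions made for illustration. Real regions vary in size, and compression and retained history would change the numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; none of these names come from HBase. */
public class RegionSplitSketch {

    /** Hypothetical stand-in for one input split: the key range of one region. */
    static final class Split {
        final String startKey;
        final String endKey;

        Split(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /**
     * Builds one split per region from the sorted region start keys.
     * An empty string stands for an open boundary (start of first region,
     * end of last region).
     */
    static List<Split> splitsForRegions(List<String> regionStartKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new Split(start, end));
        }
        return splits;
    }

    /** Rough region count, assuming every region holds exactly bytesPerRegion. */
    static long estimateRegions(long totalBytes, long bytesPerRegion) {
        if (totalBytes < 0 || bytesPerRegion <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (totalBytes + bytesPerRegion - 1) / bytesPerRegion; // ceiling division
    }

    public static void main(String[] args) {
        // Assumed table with four regions: one split, and so one map task, per region.
        List<Split> splits = splitsForRegions(Arrays.asList("", "g", "n", "t"));
        System.out.println("map tasks for 4 regions: " + splits.size());

        long gib = 1L << 30;
        long tib = 1L << 40;
        long regionSize = 10 * gib; // assumed region size, not taken from the thread

        // The 2-4 terabytes per day from the question, treated as TiB for simplicity.
        System.out.println("regions for 2 TiB: " + estimateRegions(2 * tib, regionSize));
        System.out.println("regions for 4 TiB: " + estimateRegions(4 * tib, regionSize));
    }
}
```

Under these assumptions, a single day of data would already spread over a few hundred regions, so a full-table scan through a TableInputFormat-style job would start roughly that many map tasks. The count follows the table's region layout, not a setting in the job itself.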