From: jeff saremi
To: user@hbase.apache.org
Subject: Re: Writing/Importing large number of records into HBase
Date: Sat, 28 Jan 2017 18:57:33 +0000

No, I had not. I will take a look. Thanks, Ted.

________________________________
From: Ted Yu
Sent: Friday, January 27, 2017 7:41 PM
To: user@hbase.apache.org
Subject: Re: Writing/Importing large number of records into HBase

Have you looked at the hbase-spark module (currently in the master branch)?

See hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/datasources/AvroSource.scala
and hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/DefaultSourceSuite.scala
for examples.

There may be other options.

FYI

On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi wrote:

> Hi
> I'm seeking some pointers/guidance on what we could do to insert billions
> of records that we already have in Avro files in Hadoop into HBase.
>
> I read some articles online and one of them recommended using the HFile
> format. I took a cursory look at the documentation for that. Given its
> complexity, I think that may be the last resort we want to pursue,
> unless some library is out there that easily helps us write our files into
> that format. I didn't see any.
> Assuming that the HBase native client may be our best bet, is there any
> advice around pre-partitioning our records or similar techniques that we
> could use?
> thanks
>
> Jeff
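
On the pre-partitioning question in the quoted message, here is a minimal sketch of one common approach: computing evenly spaced split keys so the table can be created with its regions already in place before the load starts. It assumes the row keys begin with a fixed-width lowercase hex prefix (for example, a hash of the record id), which is not stated anywhere in this thread. The function name hex_split_points and the table and column family names are hypothetical; the only HBase feature relied on is the SPLITS option of the shell's create command.

```python
from typing import List


def hex_split_points(num_regions: int, width: int = 8) -> List[str]:
    """Return num_regions - 1 evenly spaced split keys.

    Assumes row keys start with a fixed-width lowercase hex prefix of
    `width` characters (an assumption made for this illustration).
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    key_space = 16 ** width
    if num_regions > key_space:
        raise ValueError("more regions requested than distinct key prefixes")
    # Integer division keeps the boundaries evenly spaced and strictly increasing.
    return [
        format(i * key_space // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    splits = hex_split_points(4, width=2)
    quoted = ", ".join("'{}'".format(s) for s in splits)
    # 'records' and 'd' are placeholder table and column family names.
    print("create 'records', 'd', SPLITS => [{}]".format(quoted))
```

The printed statement can be pasted into the HBase shell. Whether evenly spaced hex boundaries suit the data depends on how the row keys are actually distributed, so treat the region count and prefix width as starting points only.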