From: jeff saremi
To: "user@hbase.apache.org"
Subject: Re: Writing/Importing large number of records into HBase
Date: Sat, 28 Jan 2017 18:56:18 +0000

Thank you Chetan

________________________________
From: Chetan Khatri
Sent: Friday, January 27, 2017 8:15 PM
To: user@hbase.apache.org
Subject: Re: Writing/Importing large number of records into HBase

Oh. Sorry.

https://github.com/apache/hbase/blob/master/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkPutExample.java

On Sat, Jan 28, 2017 at 9:27 AM, Ted Yu wrote:

> Chetan:
> The link you posted was from a personal repo.
>
> There hasn't been a commit for at least a year.
>
> Meanwhile, the hbase-spark module in the hbase repo is being actively
> maintained.
>
> FYI
>
> > On Jan 27, 2017, at 7:47 PM, Chetan Khatri wrote:
> >
> > Adding to @Ted Check Bulk Put Example -
> > https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExampleFromFile.scala
> >
> >> On Sat, Jan 28, 2017 at 9:11 AM, Ted Yu wrote:
> >>
> >> Have you looked at hbase-spark module (currently in master branch) ?
> >>
> >> See hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/datasources/AvroSource.scala
> >> and hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/DefaultSourceSuite.scala
> >> for examples.
> >>
> >> There may be other options.
> >>
> >> FYI
> >>
> >> On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi wrote:
> >>
> >>> Hi
> >>> I'm seeking some pointers/guidance on what we could do to insert
> >>> billions of records that we already have in avro files in hadoop
> >>> into HBase.
> >>>
> >>> I read some articles online and one of them recommended using HFile
> >>> format. I took a cursory look at the documentation for that. Given
> >>> the complexity of that I think that may be the last resort we want
> >>> to pursue. Unless some library is out there that easily helps us
> >>> write our files into that format. I didn't see any.
> >>> Assuming that the HBase native client may be our best bet, is there
> >>> any advice around pre-partitioning our records or such techniques
> >>> that we could use?
> >>> thanks
> >>>
> >>> Jeff
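
On the pre-partitioning question in the original post: nobody in the thread answers it directly, so what follows is only a minimal sketch of one common approach, not advice from the participants. It assumes each row key is prefixed with a fixed-width hex hash of the record id, so that keys spread evenly over the key space, and it computes evenly spaced region boundaries for that space. The function names (`hex_split_keys`, `salted_row_key`, `region_for_key`), the 8-character prefix width, and the use of MD5 are all illustrative choices.

```python
import bisect
import hashlib
from typing import List


def hex_split_keys(num_regions: int, width: int = 8) -> List[str]:
    """Return num_regions - 1 evenly spaced boundaries over a hex key space.

    Assumption: row keys start with `width` lowercase hex characters.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    key_space = 16 ** width
    return [
        format(i * key_space // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


def salted_row_key(record_id: str, width: int = 8) -> str:
    """Prefix the record id with a hex hash so sequential ids do not hotspot."""
    prefix = hashlib.md5(record_id.encode("utf-8")).hexdigest()[:width]
    return prefix + "_" + record_id


def region_for_key(row_key: str, splits: List[str]) -> int:
    """Index of the region that would hold row_key.

    Each region covers [start, end), so a key equal to a split key
    belongs to the region that starts at that split key.
    """
    return bisect.bisect_right(splits, row_key)


if __name__ == "__main__":
    splits = hex_split_keys(4)
    print(splits)
    counts = [0] * 4
    for n in range(10000):
        counts[region_for_key(salted_row_key("record-%d" % n), splits)] += 1
    print(counts)
```

In the Java client, boundaries like these would be converted to bytes and passed as the `splitKeys` argument of `Admin.createTable`, so that writes are spread across regions from the first put. HBase also ships a `RegionSplitter` utility with a `HexStringSplit` algorithm that produces similar boundaries. The trade-off of a hashed prefix is that range scans over the original id order are no longer contiguous.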