From: Jerry He
Date: Mon, 26 Dec 2016 15:25:13 -0800
Subject: Re: Approach: Incremental data load from HBASE
To: dev@hbase.apache.org

There is no magic in the Sqoop incremental import: you need a key column or a
timestamp column to tell Sqoop where each incremental run should start.

HBase has a built-in timestamp on every cell. Please look at the MR tool
bundled with HBase, Export: https://hbase.apache.org/book.html#tools
It has options that let you specify a starttime and an endtime, so each run
exports only the cells written in that window (see the command sketch at the
end of this message).

You can also write your own MR or Spark job to do an incremental export of
HBase data, by setting the timestamps on the Scan or by providing a filter (a
Spark sketch is also appended below).

Jerry

On Sat, Dec 24, 2016 at 7:24 PM, Chetan Khatri wrote:

> Hello HBase Community,
>
> What is the suggested approach for an incremental import from HBase to
> HDFS? For RDBMS-to-HDFS imports, Sqoop provides support with a script such
> as the one below:
>
> sqoop job --create myssb1 -- import --connect
>   jdbc:mysql://<host>:<port>/sakila --username admin --password admin
>   --driver=com.mysql.jdbc.Driver --query "SELECT address_id, address,
>   district, city_id, postal_code, alast_update, cityid, city, country_id,
>   clast_update FROM (SELECT a.address_id as address_id, a.address as
>   address, a.district as district, a.city_id as city_id, a.postal_code as
>   postal_code, a.last_update as alast_update, c.city_id as cityid, c.city
>   as city, c.country_id as country_id, c.last_update as clast_update FROM
>   sakila.address a INNER JOIN sakila.city c ON a.city_id = c.city_id) as
>   sub WHERE $CONDITIONS" --incremental lastmodified --check-column
>   alast_update --last-value 1900-01-01 --target-dir /user/cloudera/ssb7
>   --hive-import --hive-table test.sakila -m 1 --hive-drop-import-delims
>   --map-column-java address=String
>
> Thanks.
>
> On Wed, Dec 21, 2016 at 3:58 PM, Chetan Khatri <
> chetan.opensource@gmail.com> wrote:
>
> > Hello Guys,
> >
> > I would like to understand the different approaches for a distributed
> > incremental load from HBase. Is there any *tool / incubator tool* that
> > satisfies this requirement?
> >
> > *Approach 1:*
> >
> > Write a Kafka producer, maintain a flag column for events manually, and
> > ingest with LinkedIn Gobblin to HDFS / S3.
> >
> > *Approach 2:*
> >
> > Run a scheduled Spark job: read from HBase, do the transformations, and
> > maintain the flag column at the HBase level.
> >
> > In both approaches above I need to maintain column-level flags, such as
> > 0 = default, 1 = sent, 2 = sent and acknowledged, so that the next time
> > the producer can take another batch of 1000 rows where the flag is 0 or
> > 1.
> >
> > I am looking for a best-practice approach with any distributed tool.
> >
> > Thanks.
> >
> > - Chetan Khatri
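
For reference, an incremental run of the bundled Export tool that Jerry points
to might look like the following. The table name, output directory, and the
epoch-millisecond window are placeholders; the positional arguments (versions,
then starttime and endtime) follow the usage described in the HBase book:

  hbase org.apache.hadoop.hbase.mapreduce.Export \
      my_table /user/cloudera/hbase-export/2016-12-26 \
      1 1482710400000 1482796800000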
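
And a minimal sketch of the do-it-yourself Spark job, using the HBase 1.x
TableInputFormat with a time range set on the Scan. The table name, output
path, and the way the last-run timestamp is passed in are illustrative
assumptions, not anything specified in the thread:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.JavaSparkContext;

  public class IncrementalHBaseExport {
    public static void main(String[] args) throws Exception {
      // Start of the window = end of the previous run (assumed to be
      // passed in by whatever scheduler drives the job).
      long lastRunTs = Long.parseLong(args[0]);
      long nowTs = System.currentTimeMillis();

      Configuration conf = HBaseConfiguration.create();
      conf.set(TableInputFormat.INPUT_TABLE, "my_table"); // placeholder

      // Restrict the scan to cells written in [lastRunTs, nowTs); this is
      // what makes the export incremental, with no flag columns needed.
      Scan scan = new Scan();
      scan.setTimeRange(lastRunTs, nowTs);
      conf.set(TableInputFormat.SCAN,
          TableMapReduceUtil.convertScanToString(scan));

      JavaSparkContext sc = new JavaSparkContext(
          new SparkConf().setAppName("hbase-incremental-export"));
      JavaPairRDD<ImmutableBytesWritable, Result> delta =
          sc.newAPIHadoopRDD(conf, TableInputFormat.class,
              ImmutableBytesWritable.class, Result.class);

      // Result objects are not Java-serializable, so map each changed row
      // to something serializable (here just the row key) before writing.
      delta.map(kv -> Bytes.toStringBinary(kv._1().get()))
          .saveAsTextFile("/user/cloudera/hbase-delta/" + nowTs);
      sc.stop();
    }
  }

Persisting the end of each window and feeding it back in as the next run's
start stands in for the per-row flag columns described in Approach 1 and 2.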