From: Mich Talebzadeh
Date: Mon, 10 Oct 2016 22:46:34 +0100
Subject: Re: reading Hbase table in Spark
To: user@hbase.apache.org
Thanks Ted,

So basically this involves Java programming, much like JDBC-style retrieval. Writing to Hbase is pretty fast.

Now I have views in both Phoenix and Hive on the underlying Hbase tables. I am looking for flexibility here, so I gather I should use Spark on Hive tables with a view on the Hbase table. I also like tools like Zeppelin that work with both SQL and Spark functional programming.

Sounds like reading data from an Hbase table is best done through some form of SQL. What are your views on this approach?

Dr Mich Talebzadeh

LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


On 10 October 2016 at 22:13, Ted Yu wrote:

> For org.apache.hadoop.hbase.client.Result, there is this method:
>
>     public byte[] getValue(byte [] family, byte [] qualifier) {
>
> which allows you to retrieve the value for a designated column.
>
> FYI
>
> On Mon, Oct 10, 2016 at 2:08 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am trying to do some operations on an Hbase table that is being
> > populated by Spark Streaming.
> >
> > Now this is just Spark on Hbase, as opposed to Spark on Hive -> view on
> > Hbase etc. I also have a Phoenix view on this Hbase table.
> >
> > This is sample code:
> >
> > scala> val tableName = "marketDataHbase"
> > scala> val conf = HBaseConfiguration.create()
> > conf: org.apache.hadoop.conf.Configuration = Configuration:
> > core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
> > yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
> > hbase-default.xml, hbase-site.xml
> > scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
> >
> > scala> // create rdd
> > scala> val hBaseRDD = sc.newAPIHadoopRDD(conf,
> >   classOf[TableInputFormat],
> >   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
> >   classOf[org.apache.hadoop.hbase.client.Result])
> > hBaseRDD: org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
> > org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at
> > newAPIHadoopRDD at <console>:64
> >
> > scala> hBaseRDD.count
> > res11: Long = 22272
> >
> > scala> // transform (ImmutableBytesWritable, Result) tuples into an RDD
> > of Results
> > scala> val resultRDD = hBaseRDD.map(tuple => tuple._2)
> > resultRDD: org.apache.spark.rdd.RDD[org.apache.hadoop.hbase.client.Result]
> > = MapPartitionsRDD[8] at map at <console>:41
> >
> > scala> // transform into an RDD of (RowKey, ColumnValue)s; the RowKey
> > has the time removed
> > scala> val keyValueRDD = resultRDD.map(result =>
> >   (Bytes.toString(result.getRow()).split(" ")(0),
> >    Bytes.toString(result.value)))
> > keyValueRDD: org.apache.spark.rdd.RDD[(String, String)] =
> > MapPartitionsRDD[9] at map at <console>:43
> >
> > scala> keyValueRDD.take(2).foreach(kv => println(kv))
> > (000055e2-63f1-4def-b625-e73f0ac36271,43.89760813529593664528)
> > (000151e9-ff27-493d-a5ca-288507d92f95,57.68882040742382868990)
> >
> > OK, above I am only getting the rowkey (the UUID above) and the last
> > attribute (price).
> > However, I have the rowkey and 3 more columns in the Hbase table!
> >
> > scan 'marketDataHbase', "LIMIT" => 1
> > ROW                                   COLUMN+CELL
> >  000055e2-63f1-4def-b625-e73f0ac36271 column=price_info:price, timestamp=1476133232864, value=43.89760813529593664528
> >  000055e2-63f1-4def-b625-e73f0ac36271 column=price_info:ticker, timestamp=1476133232864, value=S08
> >  000055e2-63f1-4def-b625-e73f0ac36271 column=price_info:timecreated, timestamp=1476133232864, value=2016-10-10T17:12:22
> > 1 row(s) in 0.0100 seconds
> >
> > So how can I get the other columns?
> >
> > Thanks
> >
> > Dr Mich Talebzadeh
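To answer the final question above, Ted's getValue hint can be applied directly in the same spark-shell session. The following is only a minimal sketch, assuming the hBaseRDD defined earlier in the thread and the price_info family and qualifiers shown in the scan output:

```scala
import org.apache.hadoop.hbase.util.Bytes

// Column family and qualifier names taken from the scan output above.
val cf = Bytes.toBytes("price_info")

// result.value returns only a single cell's value, which is why just the
// price appeared; getValue(family, qualifier) addresses each column explicitly.
val rowsRDD = hBaseRDD.map { case (_, result) =>
  val rowKey  = Bytes.toString(result.getRow).split(" ")(0)
  val ticker  = Bytes.toString(result.getValue(cf, Bytes.toBytes("ticker")))
  val created = Bytes.toString(result.getValue(cf, Bytes.toBytes("timecreated")))
  val price   = Bytes.toString(result.getValue(cf, Bytes.toBytes("price")))
  (rowKey, ticker, created, price)
}

rowsRDD.take(2).foreach(println)
```

Note this assumes a live cluster with the table populated as above; getValue returns null when a cell is absent from the row, so wrapping each lookup in Option(...) may be prudent in practice.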