hbase-issues mailing list archives

From "Ted Malaska (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets
Date Wed, 11 Nov 2015 23:07:10 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001301#comment-15001301 ]

Ted Malaska commented on HBASE-14796:
-------------------------------------

I agree with point one. But the use case I'm thinking about is one like this.

An HBase table with 100 million or a billion records (the exact number does not matter much, just make it a lot).

Then the select looks like this:

Select * from hbase_table where rowkey = "foobar"

I can see this being very common: not optimal, but common.
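
To make the contrast concrete, here is a minimal Python sketch of how a planner might choose between a point Get and a full table scan. It is not the hbase-spark module's actual code: the `Predicate` class, the `plan_access` function, and the `rowkey` column name are hypothetical, and the predicates are assumed to be OR-ed together.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Predicate:
    """One comparison from a WHERE clause (hypothetical helper type)."""
    column: str
    op: str      # comparison operator, e.g. "=", ">", "<"
    value: str


def plan_access(predicates: List[Predicate],
                rowkey_column: str = "rowkey") -> Tuple[str, List[str]]:
    """Decide how to read the table for a list of OR-ed predicates.

    Returns ("get", keys) when every predicate is an equality on the
    row key, so the query can be served by point lookups.
    Returns ("scan", []) otherwise, including when there is no predicate.
    """
    only_rowkey_equalities = bool(predicates) and all(
        p.column == rowkey_column and p.op == "=" for p in predicates
    )
    if only_rowkey_equalities:
        return ("get", [p.value for p in predicates])
    return ("scan", [])


# The select above: a single equality on the row key.
plan = plan_access([Predicate("rowkey", "=", "foobar")])
```

For the select above the sketch picks a single Get, so the size of the table does not matter. A predicate on any other column falls back to a scan.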

> Provide an alternative spark-hbase SQL implementations for Gets
> ---------------------------------------------------------------
>
>                 Key: HBASE-14796
>                 URL: https://issues.apache.org/jira/browse/HBASE-14796
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Malaska
>            Assignee: Zhan Zhang
>            Priority: Minor
>
> Currently the Spark-Module Spark SQL implementation gets records from HBase from the driver if there is something like the following found in the SQL.
> rowkey = 123
> The reason for this originally was that normal SQL will not have many equal operations in a single where clause.
> Zhan had brought up two points that have value.
> 1. The SQL may be generated and may have many, many equal statements in it, so moving the work to an executor protects the driver from load.
> 2. In the current implementation the driver is connecting to HBase, and exceptions may cause trouble with the Spark application and not just with a single task execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
