hbase-issues mailing list archives

From "Zhan Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets
Date Wed, 11 Nov 2015 22:20:10 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001233#comment-15001233
] 

Zhan Zhang commented on HBASE-14796:
------------------------------------

In theory, I think performing the gets in executors should have lower latency when they are
piggybacked on other scans, because:
1. Task serialization overhead is minimal, since the gets are piggybacked on existing scan tasks.
(Getting from the driver does not have this overhead.)
2. Executors have better data locality, since each get runs in the partition co-located with
its region server. (The driver has no data locality and has to fetch the data remotely.)
3. We don't need to redistribute the get results to the executors. (The driver has to parallelize
the records to the executors to form an RDD.)
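
To make the piggybacking idea concrete, here is a minimal sketch in plain Python. It does not
use the real HBase or Spark APIs; the region start keys, the row keys, and the helper names
(region_index, group_gets_by_region) are all hypothetical and chosen only for illustration. It
assumes each scan partition corresponds to one region, and shows how point gets could be
grouped by the region that owns each row key so they can ride along with that region's task.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(region_start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    region_start_keys must be sorted, and the first region is assumed
    to start at the empty key b"". Start keys are inclusive.
    """
    return bisect_right(region_start_keys, row_key) - 1


def group_gets_by_region(region_start_keys, row_keys):
    """Group row keys by owning region so that each executor-side task
    only issues the gets for its own, co-located region."""
    grouped = defaultdict(list)
    for key in row_keys:
        grouped[region_index(region_start_keys, key)].append(key)
    return dict(grouped)


# Hypothetical table with three regions: [b"", b"g"), [b"g", b"p"), [b"p", end)
starts = [b"", b"g", b"p"]
gets = [b"apple", b"kiwi", b"zebra", b"grape"]
plan = group_gets_by_region(starts, gets)
```

Under these assumptions, each entry in plan is the batch of gets that the task for that region
would execute locally, instead of the driver fetching every row itself and parallelizing the
results afterwards.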

We should definitely get some performance numbers.

> Provide an alternative spark-hbase SQL implementations for Gets
> ---------------------------------------------------------------
>
>                 Key: HBASE-14796
>                 URL: https://issues.apache.org/jira/browse/HBASE-14796
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Malaska
>            Assignee: Zhan Zhang
>            Priority: Minor
>
> Currently, the Spark module's Spark SQL implementation gets records from HBase in the driver
if something like the following is found in the SQL:
> rowkey = 123
> The original reason for this was that normal SQL will not have many equality operations in a
single WHERE clause.
> Zhan brought up two points that have value:
> 1. The SQL may be generated and may have many equality predicates in it, so moving the
work to an executor protects the driver from load.
> 2. In the current implementation the driver connects to HBase, so exceptions may
cause trouble for the whole Spark application and not just for a single task execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
