hbase-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject HBase + Spark join
Date Mon, 13 Mar 2017 12:12:09 GMT
Hi

I want to join a Spark RDD with an HBase table. I'm familiar with the
different connectors available but couldn't find this functionality in
any of them.

My idea is to first sort the RDD by a byte[] key [1] and then use
rdd.mapPartitions, so that each partition contains a unique,
sequentially sorted range of keys that lines up with the key order in
HBase.
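If each partition does hold one contiguous, sorted key range, the join inside a partition reduces to a sort-merge join against an HBase scan over the same range, done in a single forward pass over both sides. Below is a minimal sketch of that inner loop. It assumes unique keys on both sides and both inputs sorted in unsigned lexicographic byte order. The HBase scan is simulated with an in-memory list, and the names PartitionMergeJoin, mergeJoin, compareKeys and row are hypothetical. In a real job the loop would run inside mapPartitions, with the scan side coming from an HBase Scan limited to the partition's first and last key.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public final class PartitionMergeJoin {

    // Unsigned lexicographic order (assumed to match HBase row-key order)
    public static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length; // shorter prefix sorts first
    }

    // Inner join of two iterators, both sorted by compareKeys with unique keys.
    // Returns (rddValue, hbaseValue) pairs; one forward pass, O(n + m).
    public static <A, B> List<Map.Entry<A, B>> mergeJoin(
            Iterator<Map.Entry<byte[], A>> rddSide,
            Iterator<Map.Entry<byte[], B>> scanSide) {
        List<Map.Entry<A, B>> out = new ArrayList<>();
        Map.Entry<byte[], A> left = rddSide.hasNext() ? rddSide.next() : null;
        Map.Entry<byte[], B> right = scanSide.hasNext() ? scanSide.next() : null;
        while (left != null && right != null) {
            int c = compareKeys(left.getKey(), right.getKey());
            if (c == 0) {
                out.add(new SimpleImmutableEntry<>(left.getValue(), right.getValue()));
                left = rddSide.hasNext() ? rddSide.next() : null;
                right = scanSide.hasNext() ? scanSide.next() : null;
            } else if (c < 0) {
                // key only present on the RDD side: skip it
                left = rddSide.hasNext() ? rddSide.next() : null;
            } else {
                // row only present in the (simulated) HBase scan: skip it
                right = scanSide.hasNext() ? scanSide.next() : null;
            }
        }
        return out;
    }

    // Hypothetical helper: build a (UTF-8 key, value) entry
    public static <V> Map.Entry<byte[], V> row(String key, V value) {
        return new SimpleImmutableEntry<>(key.getBytes(StandardCharsets.UTF_8), value);
    }

    public static void main(String[] args) {
        List<Map.Entry<byte[], Integer>> rdd = List.of(row("a", 1), row("b", 2), row("d", 4));
        List<Map.Entry<byte[], String>> scan = List.of(row("a", "row-a"), row("c", "row-c"), row("d", "row-d"));
        for (Map.Entry<Integer, String> e : mergeJoin(rdd.iterator(), scan.iterator())) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // prints "1 -> row-a" then "4 -> row-d"
    }
}
```

Since the RDD is expected to cover almost every key in the table, scanning the whole range per partition costs little extra over point lookups, and it avoids one Get per key.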

I should mention that the RDD will always contain almost all the keys
that are stored in HBase, so full table scans are fine.

Unfortunately, Spark cannot sort native Java byte[] keys. I'm also not
sure whether mapPartitions really maintains the total sort order of the
original RDD.
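Here is a minimal sketch of the missing ordering, assuming the RDD is keyed by raw byte[]. It is a Serializable Comparator<byte[]> that compares bytes as unsigned values and sorts a key before any longer key it is a prefix of. That is the order HBase uses for row keys and the order Guava's UnsignedBytes.lexicographicalComparator() produces. The class name UnsignedBytesComparator is hypothetical. Spark's Java API has JavaPairRDD.sortByKey(Comparator<K> comp, boolean ascending). The comparator is shipped to executors, which is why it implements Serializable.

On the second doubt: sortByKey range-partitions its output, so every key in partition i sorts before every key in partition i + 1. mapPartitions is a narrow transformation. It keeps the partition indices and hands each partition's iterator to the function in that sorted order.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Unsigned, lexicographic byte[] ordering (same order as HBase row keys)
public final class UnsignedBytesComparator implements Comparator<byte[]>, Serializable {
    private static final long serialVersionUID = 1L;

    public static final UnsignedBytesComparator INSTANCE = new UnsignedBytesComparator();

    @Override
    public int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so 0x80..0xFF sort after 0x00..0x7F
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        // Equal common prefix: the shorter key sorts first
        return a.length - b.length;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>(Arrays.asList(
            new byte[] {(byte) 0xFF},
            new byte[] {0x01, 0x02},
            new byte[] {0x01},
            new byte[] {(byte) 0x80, 0x00}));
        keys.sort(INSTANCE);
        // prints, one per line: [1] [1, 2] [-128, 0] [-1]
        // (Arrays.toString shows signed values; the order itself is unsigned)
        for (byte[] k : keys) {
            System.out.println(Arrays.toString(k));
        }
    }
}
```

In a Spark job this would be passed as, for example, pairs.sortByKey(UnsignedBytesComparator.INSTANCE, true), where pairs is a hypothetical JavaPairRDD<byte[], V>.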

Any suggestions?

Cheers,
-Kristoffer

[1] Ordered with Guava's UnsignedBytes.lexicographicalComparator()
