kudu-user mailing list archives

From Grant Henke <ghe...@cloudera.com>
Subject Re: How to load kudu RDD with correct partitioner
Date Wed, 14 Nov 2018 15:57:17 GMT
Unfortunately, I am not sure of a simple way to provide the partitioner
information with the existing implementation. Currently the KuduRDD does
not override the RDD partitioner
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L141>,
though it probably could as an improvement.
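In the meantime, one possible workaround is to re-key the rows yourself and attach a Spark partitioner after loading. A minimal sketch, assuming a table named "my_table", a master at "kudu-master:7051", and a long `record_id` column (all hypothetical names); note that Spark's HashPartitioner does not use the same hash function as Kudu's HASH partitioning, so this incurs a shuffle rather than reusing Kudu's physical layout:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession
import org.apache.kudu.spark.kudu.KuduContext

val spark = SparkSession.builder().appName("kudu-partitioner").getOrCreate()
val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

// N should match the PARTITIONS N in the table's HASH definition.
val N = 4
val rows = kuduContext.kuduRDD(spark.sparkContext, "my_table", Seq("record_id"))

// Key by record_id and repartition. The resulting RDD's `partitioner`
// is Some(HashPartitioner(N)), which co-partitioned operations such as
// joins and reduceByKey can then reuse without a further shuffle.
val keyed = rows
  .map(row => (row.getLong(0), row))
  .partitionBy(new HashPartitioner(N))
```

This only restores partitioner metadata on the Spark side; it does not recover the mapping to Kudu's original tablets.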

Would you like to file a Kudu jira to track the work? Would you be
interested in contributing the improvement?

I am curious: how are you planning to use the knowledge of the
original Kudu partitioning, and how would it help your Spark workflow?

Thanks,
Grant



On Wed, Nov 14, 2018 at 2:41 AM Dmitry Pavlov <dm.pavlov@inbox.ru> wrote:

> Hi guys
>
> I have a question about Kudu with Spark.
>
> For example, there is a table in Kudu with a field record_id and the
> following partitioning:
> HASH (record_id) PARTITIONS N
>
> Is it possible to load records from such a table in a key-value fashion with
> the correct partitioner information in the RDD? For example, RDD[(record_id, row)].
> When I try to use kuduRDD in Spark, the partitioner is None,
> so I'm losing the information about the original (Kudu) partitioning.
>
> Thanks



-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
