Hi Darshan,
You should be able to use Kudu as an additional store alongside HDFS and Phoenix. Your data scientists should then be able to do joins across HDFS, HBase, and Kudu using Spark. You could also use Apache Impala (incubating) to do those joins, but as far as I know Impala does not support accessing Phoenix.
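
For the Spark route, here is a rough PySpark sketch of what such a join might look like. It assumes the kudu-spark and phoenix-spark connectors are on the Spark classpath and that the HDFS data is stored as Parquet. All host names, table names, paths, and the join key are made up for illustration. The Kudu data source name also depends on your Kudu version (older releases used org.kududb.spark.kudu), so please check it against your install.

```python
# Sketch only: hosts, tables, paths, and the join key are hypothetical.

def kudu_options(master, table):
    """Build the options expected by the kudu-spark data source."""
    return {"kudu.master": master, "kudu.table": table}


def phoenix_options(zk_url, table):
    """Build the options expected by the phoenix-spark data source."""
    return {"table": table, "zkUrl": zk_url}


def join_across_stores(spark, hdfs_path, kudu_opts, phoenix_opts, key):
    """Inner-join a Parquet dataset on HDFS with a Kudu table and a Phoenix table.

    `spark` is an existing SparkSession; `key` is a column present in all three.
    """
    # Assumption: the HDFS data is Parquet; use another reader for other formats.
    hdfs_df = spark.read.parquet(hdfs_path)
    # Assumption: Kudu 1.x package name; older releases used org.kududb.spark.kudu.
    kudu_df = (spark.read.format("org.apache.kudu.spark.kudu")
               .options(**kudu_opts)
               .load())
    phoenix_df = (spark.read.format("org.apache.phoenix.spark")
                  .options(**phoenix_opts)
                  .load())
    return hdfs_df.join(kudu_df, on=key).join(phoenix_df, on=key)
```

You would call it with something like join_across_stores(spark, "hdfs:///data/events", kudu_options("kudu-master:7051", "events_kudu"), phoenix_options("zk-host:2181", "EVENTS"), "event_id"), where every one of those values is a placeholder.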

You can also access Kudu from R if you go through Impala using RImpala: http://blog.cloudera.com/blog/2013/12/how-to-do-statistical-analysis-with-impala-and-r/ ... though I have never used R myself.

Hope this helps!
Mike

On Wed, Aug 3, 2016 at 11:02 PM, Darshan Shah <dashah@tibco.com> wrote:

Following is our current architecture:

We have a huge amount of data residing in HDFS, and we do not want to change that.

Using Impala select queries, we take that data and load it into HBase through Phoenix. Data scientists then analyze it using R and Spark.

Each data set creates new schemas and tables in HBase, so it's fast for data scientists to do analysis.

We want to move to Kudu for its obvious advantages in this space.

Can you tell me where we can fit it in?


Thanks,

Darshan...