spark-dev mailing list archives

From Michael Segel <msegel_had...@hotmail.com>
Subject Indexing of RDDs and DF in 2.0?
Date Tue, 17 May 2016 19:48:10 GMT
Hi, 

I saw a replay of a talk about what’s coming in Spark 2.0 and the performance improvements…


I am curious about indexing of data sets. 
In HBase/MapR-DB you can build ordered indexes with an inverted table, where each indexed value maps to the row keys that contain it.
You can then take the intersection of the index entries to find the result set of rows.
(Or intersect/null if you have left outer joins…) 
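
To make the inverted-table idea concrete, here is a minimal sketch in plain Python (no Spark or HBase API involved). The index contents, the column names, and the helper rows_matching are all made up for illustration; each index simply maps a column value to the ordered row keys holding that value.

```python
# Hypothetical inverted indexes: column value -> sorted row keys.
# In HBase/MapR-DB these would live in separate index tables.
index_by_city = {
    "Chicago": ["row1", "row4", "row7"],
    "Boston": ["row2", "row3"],
}
index_by_status = {
    "active": ["row1", "row2", "row7"],
    "closed": ["row3", "row4"],
}


def rows_matching(*key_lists):
    """Intersect row-key lists taken from several indexes; return sorted keys."""
    if not key_lists:
        return []
    result = set(key_lists[0])
    for keys in key_lists[1:]:
        result &= set(keys)
    return sorted(result)


# Rows where city == "Chicago" AND status == "active".
matches = rows_matching(index_by_city["Chicago"], index_by_status["active"])
print(matches)
```

Only the row keys that survive the intersection need to be fetched from the base table, which is the benefit I am asking about.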

AFAIK, there was an IndexedRDD project, but I'm not sure how far it got.

I realize that some of the improvements are based on using hash joins, which would make
indexing a bit harder… or am I missing something?
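
For reference, this is roughly what I mean by a hash join, as a plain-Python sketch of the general build/probe technique. It is not Spark's actual join implementation, and the function name hash_join and the sample data are hypothetical. The point is that the hash table here is built fresh for each join and keeps no key order, unlike a persistent ordered index.

```python
from collections import defaultdict


def hash_join(left, right):
    """Inner-join two lists of (key, value) pairs on key using a hash table."""
    # Build phase: hash the left side by key.
    table = defaultdict(list)
    for key, value in left:
        table[key].append(value)
    # Probe phase: look up each right-side key in the hash table.
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]


users = [(1, "ann"), (2, "bob")]
orders = [(2, "book"), (1, "pen"), (3, "ink")]
joined = hash_join(users, orders)
print(joined)
```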

Thx


