spark-user mailing list archives

From Gourav Sengupta <>
Subject Re: Secondary Indexing?
Date Mon, 30 May 2016 18:00:40 GMT

Have you tried using partitioning and the Parquet format? It works super fast.
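
To illustrate why partitioning can stand in for a coarse index, here is a minimal, standard-library-only sketch of partition pruning over a Hive-style `key=value` directory layout. It is a toy model, not Spark's implementation: the helper names (`write_partitioned`, `read_with_pruning`), the sample `events` data, and the use of CSV in place of Parquet are all assumptions made for the example.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, root, key):
    """Write rows into Hive-style directories: <root>/<key>=<value>/part.csv.

    The partition column is encoded in the directory name and is not
    stored inside the file (hypothetical helper for illustration).
    """
    for row in rows:
        part_dir = Path(root) / f"{key}={row[key]}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part.csv"
        is_new = not path.exists()
        fields = [k for k in row if k != key]
        with path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            if is_new:
                writer.writeheader()
            writer.writerow({k: row[k] for k in fields})


def read_with_pruning(root, key, value):
    """Read only the directory that matches key=value.

    Returns (rows, files_opened) so the amount of skipped I/O is visible.
    """
    part_dir = Path(root) / f"{key}={value}"
    rows, files_opened = [], 0
    if part_dir.is_dir():
        for path in sorted(part_dir.glob("*.csv")):
            files_opened += 1
            with path.open(newline="") as f:
                for rec in csv.DictReader(f):
                    rec[key] = value  # restore the partition column
                    rows.append(rec)
    return rows, files_opened


# Made-up sample data: four events spread over three countries.
events = [
    {"country": "US", "user": "a", "amount": "10"},
    {"country": "DE", "user": "b", "amount": "20"},
    {"country": "IN", "user": "c", "amount": "30"},
    {"country": "DE", "user": "d", "amount": "40"},
]

root = tempfile.mkdtemp()
write_partitioned(events, root, "country")

# A filter on the partition column touches one directory out of three.
rows, files_opened = read_with_pruning(root, "country", "DE")
total_partitions = len(list(Path(root).iterdir()))
```

In Spark itself the corresponding calls would be along the lines of `df.write.partitionBy("country").parquet(path)` followed by a read that filters on `country`, which lets Spark skip the non-matching directories. Parquet adds column pruning and per-row-group statistics on top of that, which the toy above does not model.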


On Mon, May 30, 2016 at 5:08 PM, Michael Segel <> wrote:

> I’m not sure where to post this since it’s a bit of a philosophical
> question in terms of design and vision for Spark.
> If we look at SparkSQL and performance… where does Secondary indexing fit
> in?
> The reason this is a bit awkward is that if you view Spark as querying
> RDDs which are temporary, indexing doesn’t make sense until you consider
> your use case and how long is ‘temporary’.
> Then if you consider your RDD result set could be based on querying
> tables… and you could end up with an inverted table as an index… then
> indexing could make sense.
> Does it make sense to discuss this in user or dev email lists? Has anyone
> given this any thought in the past?
> Thx
> -Mike
