spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Secondary Indexing?
Date Mon, 30 May 2016 18:00:40 GMT
Hi,

Have you tried using partitioning and the Parquet format? It works very fast
in Spark.
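
As a rough illustration of why partitioning helps, here is a stdlib-only Python sketch (not Spark code) of Hive-style directory partitioning and partition pruning. Spark's `DataFrameWriter.partitionBy` writes one `column=value` directory per distinct value. A filter on that column then lets the reader skip non-matching directories entirely. In this sketch CSV files stand in for Parquet, and the helper names `write_partitioned` and `read_with_pruning` are hypothetical.

```python
import csv
import os
import tempfile
from collections import defaultdict


def write_partitioned(rows, root, key):
    """Write rows into Hive-style dirs: <root>/<key>=<value>/part-0.csv (CSV stands in for Parquet)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    for value, group in groups.items():
        part_dir = os.path.join(root, f"{key}={value}")
        os.makedirs(part_dir, exist_ok=True)
        # The partition column is encoded in the directory name, not stored in the file.
        fields = [f for f in group[0] if f != key]
        with open(os.path.join(part_dir, "part-0.csv"), "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields)
            writer.writeheader()
            for row in group:
                writer.writerow({f: row[f] for f in fields})


def read_with_pruning(root, key, wanted):
    """Return (rows, files_scanned) for key == wanted, opening only matching partition dirs."""
    rows, scanned = [], 0
    for entry in sorted(os.listdir(root)):
        name, _, value = entry.partition("=")
        if name != key or value != str(wanted):
            continue  # pruned: this directory is never opened
        part_dir = os.path.join(root, entry)
        for fname in sorted(os.listdir(part_dir)):
            scanned += 1
            with open(os.path.join(part_dir, fname), newline="") as fh:
                for rec in csv.DictReader(fh):
                    rec[key] = value  # restore the partition column from the path
                    rows.append(rec)
    return rows, scanned


if __name__ == "__main__":
    events = [
        {"date": "2016-05-29", "user": "a", "amount": "3"},
        {"date": "2016-05-30", "user": "b", "amount": "5"},
        {"date": "2016-05-30", "user": "c", "amount": "7"},
    ]
    with tempfile.TemporaryDirectory() as root:
        write_partitioned(events, root, "date")
        rows, scanned = read_with_pruning(root, "date", "2016-05-30")
        print(len(rows), scanned)  # 2 1
```

In PySpark the real counterpart is roughly `df.write.partitionBy("date").parquet(path)` followed by reading the path back and filtering on `date`. Parquet also stores per-row-group column statistics, which lets the reader skip data inside the files it does open.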


Regards,
Gourav

On Mon, May 30, 2016 at 5:08 PM, Michael Segel <msegel_hadoop@hotmail.com>
wrote:

> I’m not sure where to post this, since it’s a bit of a philosophical
> question in terms of design and vision for Spark.
>
> If we look at Spark SQL and performance… where does secondary indexing fit
> in?
>
> The reason this is a bit awkward is that if you view Spark as querying
> RDDs, which are temporary, indexing doesn’t make sense until you consider
> your use case and how long ‘temporary’ is.
> But if you consider that your RDD result set could come from querying
> tables… and that you could end up with an inverted table as an index…
> then indexing could make sense.
>
> Does it make sense to discuss this on the user or dev mailing list? Has
> anyone given this any thought in the past?
>
> Thx
>
> -Mike
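
Mike's "inverted table as an index" can be illustrated outside Spark with a plain Python sketch. A secondary index maps each value of a non-key column to the positions of the rows holding it, so a lookup touches only matching rows instead of scanning the whole table. The functions `build_inverted_index`, `lookup` and `full_scan` are hypothetical names, not a Spark API. In a Spark setting the mapping would itself be a table that has to be persisted and kept in sync with the base data, which is where the question of how long "temporary" lasts comes in.

```python
from collections import defaultdict


def build_inverted_index(rows, column):
    """Map each value of `column` to the list of row positions that hold it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return dict(index)


def lookup(rows, index, value):
    """Fetch matching rows through the index instead of scanning every row."""
    return [rows[pos] for pos in index.get(value, [])]


def full_scan(rows, column, value):
    """Baseline without an index: examine every row."""
    return [row for row in rows if row[column] == value]


if __name__ == "__main__":
    table = [
        {"id": 1, "city": "London"},
        {"id": 2, "city": "Paris"},
        {"id": 3, "city": "London"},
    ]
    city_index = build_inverted_index(table, "city")
    print(city_index)  # {'London': [0, 2], 'Paris': [1]}
    print([r["id"] for r in lookup(table, city_index, "London")])  # [1, 3]
```

The index trades extra storage and maintenance on every write for cheaper selective reads, which only pays off if the underlying data lives long enough to be queried repeatedly.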
