spark-dev mailing list archives

From andy petrella <andy.petre...@gmail.com>
Subject Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?
Date Fri, 01 Aug 2014 23:15:31 GMT
Actually, for a click stream the user space wouldn't be a continuum, unless
the order of the users matters or the algo can exploit the fact that they
arrive in some kind of order.
The purpose of the break (or binning) function is to package elements into
a cluster whose properties we know, even though we don't know in advance
which elements, or how many, it will contain.
However, this would require extending the notion of continuum I had in
mind so that it also covers categorical spaces, which would in turn allow
a groupBy-style mapping to RDDs.
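
To make the binning idea concrete, here is a minimal sketch in plain Python
(no Spark involved). The names bin_by, break_fn and the sample clicks are
made up for illustration; they are not part of any Spark API. The point is
only that the break function fixes the property of each bin (here, the
user), while the contents and size of each bin are discovered as the data
arrives.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def bin_by(events: Iterable[T],
           break_fn: Callable[[T], Hashable]) -> Dict[Hashable, List[T]]:
    """Package events into bins keyed by break_fn (hypothetical helper).

    The key of each bin is known up front (it is whatever break_fn
    returns), but which events land in a bin, and how many, is not.
    """
    bins: Dict[Hashable, List[T]] = defaultdict(list)
    for event in events:
        bins[break_fn(event)].append(event)
    return dict(bins)


# Illustrative click stream over a categorical space (the users).
clicks = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/cart"},
    {"user": "alice", "page": "/cart"},
]

# Each bin would play the role of one RDD in the groupBy-style mapping.
by_user = bin_by(clicks, lambda click: click["user"])
```

In a real setting each bin would presumably become its own RDD; that
mapping is left out here on purpose.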
And actually, there would be a way to fall back to a continuum: if the
break function were dictated by a trained model that can cluster the
users, and the users had previously been shuffled according to that
clustering, they would form a sequence in which each cluster arrives as a
batch.
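
A rough sketch of that fallback, again in plain Python and under loud
assumptions: assign_cluster is a toy stand-in for the trained model (it
just looks at the first letter of the name), and the sort plays the role
of the shuffle. Once the users are ordered by cluster, the breaks fall
exactly where the cluster id changes, so the sequence can be cut like a
continuum.

```python
from itertools import groupby
from typing import List


def assign_cluster(user: str) -> int:
    """Hypothetical stand-in for a trained clustering model."""
    return 0 if user[0] < "m" else 1


users = ["zoe", "alice", "bob", "mike", "carl"]

# "Shuffle": order the users by cluster id (sorted() is stable, so
# arrival order is preserved inside each cluster).
ordered: List[str] = sorted(users, key=assign_cluster)

# Breaks now sit where the cluster id changes, so consecutive runs
# form the batches.
batches = [list(group) for _, group in groupby(ordered, key=assign_cluster)]
```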
Just thinking out loud (and trying hard to write this on a tablet, man...
How unfriendly this keyboard and small screen are ☺)
Cheers
Andy
