spark-user mailing list archives

From Michael Allman <mich...@videoamp.com>
Subject Re: window every n elements instead of time based
Date Wed, 08 Oct 2014 05:19:03 GMT
Hi Andrew,

The use case I have in mind is batch data serialization to HDFS, where sizing files to a certain
HDFS block size is desired. In my particular use case, I want to process 10GB batches of data
at a time. I'm not sure this is a sensible use case for Spark Streaming, and I was trying
to test it. However, I had trouble getting it working, and in the end I decided it was more
trouble than it was worth. So I decided to split my task into two: one streaming job on small,
time-defined batches of data, and a traditional Spark job aggregating the smaller files into
a larger whole. In retrospect, I think this is the right way to go, even if a count-based
window specification were possible. Therefore, I can't offer my use case as motivation for a
count-based window size.
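
A minimal sketch of the sizing arithmetic for the aggregation job described above: given the total size of the small files written by the streaming job, pick how many output files to write so that each is at most one HDFS block. The function name, the 128 MiB default block size, and reading "10GB" as 10 GiB are assumptions for illustration, not details from the job itself.

```python
# Hypothetical helper: choose an output file count for the compaction job.
# The default block size (128 MiB) is an assumption; use the cluster's actual value.
DEFAULT_BLOCK_BYTES = 128 * 1024 * 1024


def output_file_count(total_bytes: int, block_bytes: int = DEFAULT_BLOCK_BYTES) -> int:
    """Return the number of files needed so that each is at most block_bytes."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    # Integer ceiling division; always write at least one file.
    return max(1, -(-total_bytes // block_bytes))


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    # 10 GiB split into 128 MiB blocks
    print(output_file_count(ten_gib))  # prints 80
```

The resulting count is the kind of value one might pass as the partition count when the batch job rewrites the small files, assuming the data is spread roughly evenly across partitions.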

Cheers,

Michael

On Oct 5, 2014, at 4:03 PM, Andrew Ash <andrew@andrewash.com> wrote:

> Hi Michael,
> 
> I couldn't find anything in Jira for it -- https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20%22window%22%20AND%20component%20%3D%20Streaming
> 
> Could you or Adrian please file a Jira ticket explaining the functionality and maybe
> a proposed API?  This will help people interested in count-based windowing to understand the
> state of the feature in Spark Streaming.
> 
> Thanks!
> Andrew
> 
> On Fri, Oct 3, 2014 at 4:09 PM, Michael Allman <michael@videoamp.com> wrote:
> Hi,
> 
> I also have a use for count-based windowing. I'd like to process data
> batches by size as opposed to time. Is this feature on the development
> roadmap? Is there a JIRA ticket for it?
> 
> Thank you,
> 
> Michael
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/window-every-n-elements-instead-of-time-based-tp2085p15701.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 

