flink-user mailing list archives

From: Till Rohrmann <trohrm...@apache.org>
Subject: Re: Parallelizing DataStream operations on Array elements
Date: Fri, 04 Nov 2016 17:59:24 GMT

Hi Daniel,

I'm not sure whether I grasp the whole problem, but can't you split the
vector up into the different rows, group by the row index and then apply
some kind of continuous aggregation or window function?
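
To make the suggestion above concrete, here is a minimal sketch of the dataflow in plain Python. It is not Flink code: it only simulates the three steps (split each array into (row index, value) pairs, group by row index, apply a count window per group), which in the DataStream API would roughly correspond to flatMap, keyBy and countWindow. The function name per_row_count_windows, its parameters and the sample data are all made up for illustration.

```python
from collections import defaultdict


def per_row_count_windows(arrays, window_size, aggregate):
    """Simulate: split arrays into rows, group by row index, count-window each row.

    arrays      -- iterable of equally long lists, one list per time step
    window_size -- number of time steps collected per row before aggregating
    aggregate   -- function applied to the list of values in a full window
    """
    buffers = defaultdict(list)  # row index -> values in the current window
    results = []
    for array in arrays:  # one array arrives per time step
        for row, value in enumerate(array):  # split the vector into rows
            buffers[row].append(value)  # "group by" the row index
            if len(buffers[row]) == window_size:  # this row's window is full
                results.append((row, aggregate(buffers[row])))
                buffers[row] = []  # start a new window for this row
    return results


# Hypothetical input: 4 time steps, 2 voxels (rows) per step.
steps = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
means = per_row_count_windows(steps, 2, lambda w: sum(w) / len(w))
```

Each row keeps its own window, so the per-row aggregations are independent of each other. That independence is what would let a keyed stream distribute the rows across parallel operator instances.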

It might help if you shared some of your code with the community so that
we can discuss the implementation.

Cheers,
Till

On Fri, Nov 4, 2016 at 5:45 PM, Daniel Suo <dsuo@cs.princeton.edu> wrote:

> Hello!
>
> I have a data source that emits Arrays that I collect into windows via
> countWindow. Rather than parallelize my subsequent operations by groups of
> these arrays, I'd like to parallelize my operations across the elements of
> the array (rows rather than columns, if you will) within each window.
>
> Some context: I'm attempting a time series analysis across some number of
> voxels. Each time step, I receive an Array of voxel data, but I'd like to
> analyze the voxels across time.
>
> It sounds like this approach mixes DataStream and DataSet concepts (where
> each window is a DataSet), which I know is not supported. Perhaps there is
> some other way to accomplish this task?
>
> Thanks!
> Daniel
>
