apex-dev mailing list archives

From Bright Chen <bri...@datatorrent.com>
Subject Re: Sort Accumulation
Date Wed, 08 Mar 2017 17:54:41 GMT
Hi Ajay,
I think sorting in getOutput() will probably make that method stall because
of the very high volume of computation.
And since we still need to persist the data, it will not help much to
improve tuple-processing performance. We could probably bucket the data by
value range, for example:
- process tuple in one window: sort the current window's data in memory
- end window: merge the sorted in-memory data into the buckets.
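
A minimal sketch of the two steps above, in Java. The class name
RangeBucketSortSketch, the bucketWidth parameter, and the use of plain int
values are assumptions made for illustration; none of them come from Apex or
Malhar, and persistence of the buckets is left out. The sketch buffers tuples
during the window, sorts the buffer once at end window, and merges each
value-range run into its bucket:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

public class RangeBucketSortSketch {
    // Hypothetical parameter: width of the value range covered by one bucket.
    private final int bucketWidth;
    // Buckets keyed by range index; each bucket's list is kept sorted.
    private final TreeMap<Integer, List<Integer>> buckets = new TreeMap<>();
    // Tuples received during the current window.
    private final List<Integer> windowBuffer = new ArrayList<>();

    public RangeBucketSortSketch(int bucketWidth) {
        this.bucketWidth = bucketWidth;
    }

    // Process tuple: only buffer the value in memory.
    public void processTuple(int value) {
        windowBuffer.add(value);
    }

    // End window: sort the window's data, then merge each run into its bucket.
    public void endWindow() {
        Collections.sort(windowBuffer);
        int i = 0;
        while (i < windowBuffer.size()) {
            int key = Math.floorDiv(windowBuffer.get(i), bucketWidth);
            int j = i;
            while (j < windowBuffer.size()
                    && Math.floorDiv(windowBuffer.get(j), bucketWidth) == key) {
                j++;
            }
            // Copy the run so clearing the buffer does not affect the bucket.
            List<Integer> run = new ArrayList<>(windowBuffer.subList(i, j));
            buckets.merge(key, run, RangeBucketSortSketch::mergeSorted);
            i = j;
        }
        windowBuffer.clear();
    }

    // Standard two-way merge of two sorted lists.
    static List<Integer> mergeSorted(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    // Buckets are ordered by range, so concatenating them gives sorted output.
    public List<Integer> getOutput() {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> bucket : buckets.values()) {
            out.addAll(bucket);
        }
        return out;
    }
}
```

With this layout getOutput() only concatenates already-sorted buckets, so the
expensive work is spread across windows instead of landing in one call.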

thanks
Bright

On Wed, Mar 8, 2017 at 8:51 AM, AJAY GUPTA <ajaygit158@gmail.com> wrote:

> Hi Thomas,
>
> I looked at TopN. The accumulate() of TopN is O(n*k). Using a similar
> approach for Sort would lead to O(n^2) complexity.
> Since we have to sort all elements, we can do it in a single sort call in
> getOutput().
>
>
> On Wed, Mar 8, 2017 at 10:09 PM, Thomas Weise <thw@apache.org> wrote:
>
> > Look at the existing topN accumulation. It should be a generalization,
> > where you don't have a limit.
> >
> >
> > On Wed, Mar 8, 2017 at 8:05 AM, AJAY GUPTA <ajaygit158@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I would like to propose the Sort Accumulation. The accumulation will be
> > > responsible for sorting the input POJO stream. The accumulation will
> > > require a comparator to compare and sort the input tuples. Another boolean
> > > parameter "sortDesc" will be used to decide sorting order.
> > >
> > > Let me know your views.
> > >
> > > Thanks,
> > > Ajay
> > >
> >
>
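
For reference, the proposal at the bottom of the quoted thread (a comparator
plus a "sortDesc" flag) combined with the single sort call in getOutput() can
be sketched as follows. This is an illustration under assumptions: the class
SortAccumulationSketch is hypothetical and does not implement the actual
Malhar accumulation interface; only the method names accumulate() and
getOutput() are taken from the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortAccumulationSketch<T> {
    // Effective ordering: the supplied comparator, reversed when sortDesc is set.
    private final Comparator<T> comparator;

    public SortAccumulationSketch(Comparator<T> comparator, boolean sortDesc) {
        this.comparator = sortDesc ? comparator.reversed() : comparator;
    }

    // Hypothetical starting value: an empty list of tuples.
    public List<T> defaultAccumulatedValue() {
        return new ArrayList<>();
    }

    // accumulate() only appends, so it costs O(1) amortized per tuple.
    public List<T> accumulate(List<T> accumulated, T input) {
        accumulated.add(input);
        return accumulated;
    }

    // getOutput() performs the single O(n log n) sort on a copy.
    public List<T> getOutput(List<T> accumulated) {
        List<T> out = new ArrayList<>(accumulated);
        out.sort(comparator);
        return out;
    }
}
```

This keeps accumulate() cheap, at the cost of concentrating all sorting work
in getOutput(), which is the concern raised in the reply at the top.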
