apex-dev mailing list archives

From AJAY GUPTA <ajaygit...@gmail.com>
Subject Re: Sort Accumulation
Date Thu, 09 Mar 2017 06:54:18 GMT
Hi Bright,

I could not completely understand the bucketing approach you mentioned. How
would we bucket the data when we have no idea what the data will be?

How about using a TreeMultiSet?
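
To make the suggestion concrete, here is a minimal sketch of a sorted-multiset accumulation. It assumes "TreeMultiSet" refers to something like Guava's TreeMultiset; to stay dependency-free it uses java.util.TreeMap with per-element counts as a stand-in. The class name and the method names (defaultAccumulatedValue, accumulate, getOutput) are hypothetical and only loosely mirror an accumulation-style interface; the comparator and "sortDesc" parameters come from the original proposal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a sorted multiset kept as element -> count in a TreeMap.
// This is a stand-in for a TreeMultiset-style structure, not an existing Apex class.
class SortedMultisetAccumulation<T> {
    private final Comparator<T> comparator;

    SortedMultisetAccumulation(Comparator<T> comparator, boolean sortDesc) {
        // sortDesc simply reverses the supplied comparator.
        this.comparator = sortDesc ? comparator.reversed() : comparator;
    }

    // Empty accumulated value, ordered by the (possibly reversed) comparator.
    TreeMap<T, Integer> defaultAccumulatedValue() {
        return new TreeMap<>(comparator);
    }

    // Each insert costs O(log d), where d is the number of distinct elements.
    TreeMap<T, Integer> accumulate(TreeMap<T, Integer> accumulated, T input) {
        accumulated.merge(input, 1, Integer::sum);
        return accumulated;
    }

    // Expands the counts back into a sorted list; no extra sort is needed here.
    List<T> getOutput(TreeMap<T, Integer> accumulated) {
        List<T> out = new ArrayList<>();
        for (Map.Entry<T, Integer> entry : accumulated.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                out.add(entry.getKey());
            }
        }
        return out;
    }
}
```

One caveat of this sketch: tuples that compare as equal are collapsed into a single key with a count, so distinct POJOs that tie on the sort key would come out as repeated copies of the first one seen. The comparator would need a tie-breaker if that matters.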


Thanks,
Ajay

On Wed, Mar 8, 2017 at 11:24 PM, Bright Chen <bright@datatorrent.com> wrote:

> Hi Ajay,
> I think sorting in getOutput() will probably cause this method to get stuck
> due to the very high volume of computation.
> And as we still need to persist the data, it will not be very helpful for
> increasing the performance of tuple processing. We could probably bucket the
> data by range of value, such as the following:
> - process tuple in one window: sort data of current window in memory
> - end window: merge the sorted memory data into buckets.
>
> thanks
> Bright
>
> On Wed, Mar 8, 2017 at 8:51 AM, AJAY GUPTA <ajaygit158@gmail.com> wrote:
>
> > Hi Thomas,
> >
> > I looked at TopN. The accumulate() of TopN is O(n*k). Using a similar
> > approach for Sort will lead to O(n^2) complexity.
> > Since we have to sort all elements, we can do it in a single sort call in
> > getOutput().
> >
> >
> > On Wed, Mar 8, 2017 at 10:09 PM, Thomas Weise <thw@apache.org> wrote:
> >
> > > Look at the existing topN accumulation. It should be a generalization,
> > > where you don't have a limit.
> > >
> > >
> > > > On Wed, Mar 8, 2017 at 8:05 AM, AJAY GUPTA <ajaygit158@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to propose the Sort Accumulation. The accumulation will be
> > > > responsible for sorting the input POJO stream. The accumulation will
> > > > require a comparator to compare and sort the input tuples. Another boolean
> > > > parameter "sortDesc" will be used to decide sorting order.
> > > >
> > > > Let me know your views.
> > > >
> > > > Thanks,
> > > > Ajay
> > > >
> > >
> >
>
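
The two steps Bright outlines in the thread (sort the current window's tuples in memory, then merge the sorted data into value-range buckets at end window) could look roughly like the sketch below. It is only an illustration under stated assumptions: tuples are plain long values, buckets have a fixed width chosen up front, and the names BucketedSortSketch, processTuple, endWindow and bucketWidth are hypothetical. Persisting the buckets is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: per-window in-memory sort, merged into range buckets at end window.
class BucketedSortSketch {
    private final long bucketWidth;                        // assumed fixed value range per bucket
    private final List<Long> window = new ArrayList<>();   // tuples seen in the current window
    private final TreeMap<Long, List<Long>> buckets = new TreeMap<>(); // bucket index -> sorted values

    BucketedSortSketch(long bucketWidth) {
        this.bucketWidth = bucketWidth;
    }

    // Within a window, tuples are only buffered.
    void processTuple(long value) {
        window.add(value);
    }

    // At end window: sort the buffer once, then merge each contiguous run into its bucket.
    void endWindow() {
        Collections.sort(window);
        int start = 0;
        while (start < window.size()) {
            long index = Math.floorDiv(window.get(start), bucketWidth);
            int end = start;
            while (end < window.size() && Math.floorDiv(window.get(end), bucketWidth) == index) {
                end++;
            }
            List<Long> existing = buckets.getOrDefault(index, Collections.<Long>emptyList());
            buckets.put(index, mergeSorted(existing, window.subList(start, end)));
            start = end;
        }
        window.clear();
    }

    // Standard two-way merge of two sorted lists into a new sorted list.
    static List<Long> mergeSorted(List<Long> a, List<Long> b) {
        List<Long> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    // Buckets are keyed by range, so concatenating them in key order yields sorted output.
    List<Long> getOutput() {
        List<Long> out = new ArrayList<>();
        for (List<Long> bucket : buckets.values()) {
            out.addAll(bucket);
        }
        return out;
    }
}
```

The fixed bucketWidth is exactly the part that needs prior knowledge of the data, which is the open question raised in the reply at the top of this message. With a skewed value distribution, most tuples would land in a few buckets.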
