flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: [Gelly] Help with GSA compiler tests
Date Wed, 15 Jul 2015 12:27:08 GMT
Lady Kalamari,

The plan looks good.

To test whether the data is partitioned there: If you have the optimizer
plan, make sure the global properties have a partitioning property of
"PARTITIONED_HASH".
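
A minimal sketch of what such a check amounts to, assuming simplified
stand-in types: `Partitioning`, `GlobalProps`, `PlanNode`, and
`isHashPartitionedOn` are hypothetical names chosen for illustration, not
Flink's optimizer API. In a real compiler test the properties would be read
from the optimized plan node of the iteration rather than constructed by hand.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Flink's optimizer classes.
enum Partitioning { RANDOM, HASH, RANGE }

// The global (cross-partition) properties reported for a node's output.
final class GlobalProps {
    final Partitioning partitioning;
    final List<Integer> keyFields;

    GlobalProps(Partitioning partitioning, List<Integer> keyFields) {
        this.partitioning = partitioning;
        this.keyFields = keyFields;
    }
}

// A node of the optimized plan, reduced to a name and its output properties.
final class PlanNode {
    final String name;
    final GlobalProps globalProps;

    PlanNode(String name, GlobalProps globalProps) {
        this.name = name;
        this.globalProps = globalProps;
    }
}

class PartitioningCheckSketch {

    // True only if the output is known to be hash partitioned on exactly these key fields.
    static boolean isHashPartitionedOn(PlanNode node, List<Integer> keys) {
        return node.globalProps.partitioning == Partitioning.HASH
                && node.globalProps.keyFields.equals(keys);
    }

    public static void main(String[] args) {
        // An iteration result that kept its hash partitioning on field 0.
        PlanNode partitioned = new PlanNode("iteration",
                new GlobalProps(Partitioning.HASH, Arrays.asList(0)));
        // A result that was only re-sorted, so its partitioning is unknown.
        PlanNode resortedOnly = new PlanNode("resorted",
                new GlobalProps(Partitioning.RANDOM, Collections.emptyList()));

        System.out.println(isHashPartitionedOn(partitioned, Arrays.asList(0)));
        System.out.println(isHashPartitionedOn(resortedOnly, Arrays.asList(0)));
    }
}
```

The check compares both the partitioning kind and the key fields, since data
hash partitioned on a different field would not satisfy a join keyed on
field 0.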

Thanks,
Stephan


On Wed, Jul 15, 2015 at 2:07 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com> wrote:

> Hi,
>
> thank you Stephan!
>
> Here's the missing part of the plan: http://i.imgur.com/N861tg1.png
> There is one hash partition / sort. Is this what you're talking about?
>
> Regarding your second point, how can I test if the data is known to be
> partitioned at the end?
>
>
> -Vasia.
>
> On 15 July 2015 at 13:13, Stephan Ewen <sewen@apache.org> wrote:
>
> > Hey Vasia!
> >
> > Sorry for the late response... Thanks for pinging again!
> >
> > The optimizer is acting a little funky here - seems an artifact of the
> > "properties" optimization.
> >
> >   -> The initial join needs to be partitioned and sorted. Can you check
> > whether one partitioning and sorting happens before the iteration? That
> > part is cut off in the screenshot you sent. It must be either on the
> > input of the iteration, or the output.
> >
> >   -> The iteration needs to make sure it leaves the data partitioned and
> > sorted. There is a "re-sorting" operator at the end ("Rebuild Workset
> > Properties"), but it does not partition. The test should make sure the
> > data is known to be partitioned at the very end of the iteration (after
> > the "Rebuild Workset Properties" operator). This is probably true, if the
> > join has some forward field annotation.
> >
> > We can have a quick skype chat later, if you have more questions...
> >
> > Greetings,
> > Stephan
> >
> >
> >
> > On Wed, Jul 15, 2015 at 12:08 PM, Vasiliki Kalavri <
> > vasilikikalavri@gmail.com> wrote:
> >
> > > Hey,
> > >
> > > any input on this? or a hint? or where to look to figure this out by
> > > myself?
> > >
> > > Thanks!
> > > -Vasia.
> > >
> > > On 7 July 2015 at 15:20, Vasiliki Kalavri <vasilikikalavri@gmail.com>
> > > wrote:
> > >
> > > > Hello to my squirrels,
> > > >
> > > > I've started looking into FLINK-1943
> > > > <https://issues.apache.org/jira/browse/FLINK-1943> and I need some help
> > > > to understand what to test and how to do it properly.
> > > >
> > > > In the corresponding Spargel compiler test, the following functionality
> > > > is checked:
> > > >
> > > > 1. sink: the ship strategy is FORWARD and the parallelism is correct
> > > > 2. iteration: degree of parallelism
> > > > 3. solution set join: parallelism and input1 ship strategy is
> > > > PARTITION_HASH
> > > > 4. workset join: parallelism, input1 (edges) ship strategy is
> > > > PARTITION_HASH and cached, input2 (workset) ship strategy is FORWARD
> > > > 5. check that the initial partitioning is pushed out of the loop
> > > > 6. check that the initial workset sort is outside the loop
> > > >
> > > > I have been able to verify 1-4 of the above for the GSA iteration plan,
> > > > but I'm not sure how to check (5) and (6) or whether they are expected
> > > > to hold in the GSA case.
> > > >
> > > > In [1] you can see what the GSA iteration operators look like and in
> > > > [2] you can see what the visualizer tool generates for the GSA
> > > > connected components.
> > > >
> > > > Any pointers would be greatly appreciated!
> > > >
> > > > Cheers,
> > > > Vasia.
> > > >
> > > > [1]: https://docs.google.com/drawings/d/1tiNQeOphWtkNXTGlnDJ3Ipanh0Tm2R8sHe8XNyTnf98/edit?usp=sharing
> > > > [2]: http://imgur.com/GQZ48ZI
> > > >
> > >
> >
>
