flink-dev mailing list archives

From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: [Gelly] Help with GSA compiler tests
Date Wed, 15 Jul 2015 12:07:47 GMT
Hi,

thank you Stephan!

Here's the missing part of the plan: http://i.imgur.com/N861tg1.png
There is one hash partition / sort. Is this what you're talking about?

Regarding your second point, how can I test if the data is known to be
partitioned at the end?
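
To make the question concrete, here is a rough sketch of how I picture the property being tracked. It is a toy model only: `Node`, `Ship` and `knownPartitionedOn` are made-up names, not optimizer classes, and it assumes the key field keeps the same position in every operator. The idea is that a hash partitioning on a key stays "known" downstream only while every operator after it forwards that key field unchanged.

```java
import java.util.Set;

// Toy model only: every type here is a hypothetical stand-in, not a Flink optimizer class.
final class PartitioningSketch {

    enum Ship { FORWARD, PARTITION_HASH }

    /** A plan node with a single relevant input. */
    static final class Node {
        final String name;
        final Node input;                   // null for a source
        final Ship ship;                    // how data is shipped from the input to this node
        final int shipKey;                  // key field when ship == PARTITION_HASH, else -1
        final Set<Integer> forwardedFields; // fields the operator passes through unchanged

        Node(String name, Node input, Ship ship, int shipKey, Set<Integer> forwardedFields) {
            this.name = name;
            this.input = input;
            this.ship = ship;
            this.shipKey = shipKey;
            this.forwardedFields = forwardedFields;
        }
    }

    /** True if the output of {@code node} is known to be hash-partitioned on {@code key}. */
    static boolean knownPartitionedOn(Node node, int key) {
        if (node == null) {
            return false; // walked past a source without finding a partitioning
        }
        if (!node.forwardedFields.contains(key)) {
            return false; // the operator may rewrite the key, so the property is lost
        }
        if (node.ship == Ship.PARTITION_HASH && node.shipKey == key) {
            return true; // partitioned on the way in, and the key survives the operator
        }
        // A forward connection keeps whatever the input already guarantees.
        return node.ship == Ship.FORWARD && knownPartitionedOn(node.input, key);
    }

    /** workset -> join (hash-partitioned on field 0) -> "Rebuild Workset Properties". */
    static Node plan(Set<Integer> joinForwardedFields) {
        Node source = new Node("workset", null, Ship.FORWARD, -1, Set.of(0, 1));
        Node join = new Node("join", source, Ship.PARTITION_HASH, 0, joinForwardedFields);
        return new Node("Rebuild Workset Properties", join, Ship.FORWARD, -1, Set.of(0, 1));
    }

    static boolean withAnnotation() {
        return knownPartitionedOn(plan(Set.of(0)), 0);
    }

    static boolean withoutAnnotation() {
        return knownPartitionedOn(plan(Set.of()), 0);
    }

    public static void main(String[] args) {
        System.out.println("join forwards the key:   " + withAnnotation());
        System.out.println("join forwards no fields: " + withoutAnnotation());
    }
}
```

If that picture is right, the compiler test would have to assert the equivalent of `knownPartitionedOn(lastNode, key)` on the real plan node that follows "Rebuild Workset Properties".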


-Vasia.

On 15 July 2015 at 13:13, Stephan Ewen <sewen@apache.org> wrote:

> Hey Vasia!
>
> Sorry for the late response... Thanks for pinging again!
>
> The optimizer is acting a little funky here - seems an artifact of the
> "properties" optimization.
>
>   -> The initial join needs to be partitioned and sorted. Can you check
> whether one partitioning and sorting happens before the iteration? That
> part is cut off in the screenshot you sent. It must be either on the input
> of the iteration, or the output.
>
>   -> The iteration needs to make sure it leaves the data partitioned and
> sorted. There is a "re-sorting" operator at the end ("Rebuild Workset
> Properties"), but it does not partition. The test should make sure the data
> is known to be partitioned at the very end of the iteration (after the
> "Rebuild Workset Properties" operator). This is probably true, if the join
> has some forward field annotation.
>
> We can have a quick skype chat later, if you have more questions...
>
> Greetings,
> Stephan
>
>
>
> On Wed, Jul 15, 2015 at 12:08 PM, Vasiliki Kalavri <
> vasilikikalavri@gmail.com> wrote:
>
> > Hey,
> >
> > any input on this? or a hint? or where to look to figure this out by
> > myself?
> >
> > Thanks!
> > -Vasia.
> >
> > On 7 July 2015 at 15:20, Vasiliki Kalavri <vasilikikalavri@gmail.com>
> > wrote:
> >
> > > Hello to my squirrels,
> > >
> > > > I've started looking into FLINK-1943
> > > > <https://issues.apache.org/jira/browse/FLINK-1943> and I need some help
> > > > to understand what to test and how to do it properly.
> > >
> > > > In the corresponding Spargel compiler test, the following functionality is
> > > > checked:
> > >
> > > 1. sink: the ship strategy is FORWARD and the parallelism is correct
> > > 2. iteration: degree of parallelism
> > > 3. solution set join: parallelism and input1 ship strategy is
> > > PARTITION_HASH
> > > 4. workset join: parallelism, input1 (edges) ship strategy is
> > > PARTITION_HASH and cached, input2 (workset) ship strategy is FORWARD
> > > 5. check that the initial partitioning is pushed out of the loop
> > > 6. check that the initial workset sort is outside the loop
> > >
> > > > I have been able to verify 1-4 of the above for the GSA iteration plan,
> > > > but I'm not sure how to check (5) and (6) or whether they are expected to
> > > > hold in the GSA case.
> > >
> > > > In [1] you can see what the GSA iteration operators look like, and in [2]
> > > > you can see the plan that the visualizer tool generates for GSA connected
> > > > components.
> > >
> > > Any pointers would be greatly appreciated!
> > >
> > > Cheers,
> > > Vasia.
> > >
> > > [1]:
> > >
> >
> https://docs.google.com/drawings/d/1tiNQeOphWtkNXTGlnDJ3Ipanh0Tm2R8sHe8XNyTnf98/edit?usp=sharing
> > > [2]: http://imgur.com/GQZ48ZI
> > >
> >
>
