flink-dev mailing list archives

From "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
Subject Re: Tuple
Date Tue, 04 Aug 2015 12:51:36 GMT
I set parallelism of map to 4 (and I double checked, that the 4 mappers
are running on different machines). Furthermore, fromElements() source
has parallelism of 1. Thus, some data is going over the network for sure.
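
For reference, the marker-byte idea Stephan describes further down the thread can be sketched in plain Java, independent of Flink's actual serializers. The class name, method names, and marker value below are hypothetical and chosen only for illustration. The point is that each empty tuple occupies one byte on the wire, so the reader makes progress on every record and can tell how many records were sent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, not Flink's serializer API.
public class Tuple0MarkerSketch {

    // Arbitrary marker value (an assumption); any fixed byte would do.
    public static final byte MARKER = 42;

    // An empty tuple has no fields, so only the marker byte is written.
    public static void serialize(DataOutputStream out) throws IOException {
        out.writeByte(MARKER);
    }

    // Reading consumes exactly one byte per empty tuple.
    public static void deserialize(DataInputStream in) throws IOException {
        byte b = in.readByte();
        if (b != MARKER) {
            throw new IOException("Unexpected byte: " + b);
        }
    }

    // Writes n empty tuples to a buffer and counts how many can be read back.
    public static int roundTrip(int n) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (int i = 0; i < n; i++) {
                serialize(out);
            }
            out.flush();

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            int count = 0;
            while (in.available() > 0) {
                deserialize(in);
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(3));
    }
}
```

Without the marker byte the buffer would stay empty no matter how many tuples were written, and the read loop could not recover the count.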


On 08/04/2015 02:31 PM, Chesnay Schepler wrote:
> i think this job would be chained completely and never do any
> serialization.
> 
> On 04.08.2015 14:25, Matthias J. Sax wrote:
>> Works for batch job, too. See enclosed.
>>
>> On 08/04/2015 01:34 PM, Matthias J. Sax wrote:
>>> Yes, that is what the program does. However, streaming is not lazy, so
>>> deserialization should have happened.
>>>
>>> I will try a batch job, later today.
>>>
>>> On 08/04/2015 01:27 PM, Chesnay Schepler wrote:
>>>> so I'm not too much into the streaming API, but as i see it this program
>>>> creates an infinite number of tuples and then counts them, right?
>>>>
>>>> The problem with serialization as i understand it is that the receiver
>>>> can't tell how many Tuple0 are sent, since you never actually read any
>>>> data when deserializing a tuple. it's even more likely that it's not
>>>> even attempted.
>>>>
>>>> As such, I'd be curious to see what happens when you create a batch job
>>>> with a limited number of starting tuples.
>>>>
>>>> On 04.08.2015 13:08, Matthias J. Sax wrote:
>>>>> Hi,
>>>>>
>>>>> I just opened a PR for this. https://github.com/apache/flink/pull/983
>>>>>
>>>>> However, I was not able to "reproduce" serialization issues... I tested
>>>>> Tuple0 (see enclosed code) in a cluster, and the program worked. Am I
>>>>> missing anything?
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>>
>>>>> On 08/03/2015 01:01 AM, Matthias J. Sax wrote:
>>>>>> Thanks for the advice about Tuple0.
>>>>>>
>>>>>> I personally don't see any advantage in having a "flink-tuple" project.
>>>>>> Am I missing anything about it? Furthermore, I am not sure if it is a good
>>>>>> idea to have too many overly small projects.
>>>>>>
>>>>>>
>>>>>> On 08/03/2015 12:48 AM, Stephan Ewen wrote:
>>>>>>> Tuple0 would need special serialization and comparator logic. If that is
>>>>>>> given, I see no reason not to support it.
>>>>>>>
>>>>>>> There is, BTW, the request to create a dedicated "flink-tuple" project that
>>>>>>> only contains the tuple classes. Any opinions on that?
>>>>>>>
>>>>>>> On Mon, Aug 3, 2015 at 12:45 AM, Matthias J. Sax <
>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>
>>>>>>>> Thanks for the explanation!
>>>>>>>>
>>>>>>>> As I mentioned before, Tuple0 might also be helpful for streaming. And I
>>>>>>>> guess I will need it for the Storm compatibility layer, too. (I need to
>>>>>>>> double check, but Storm supports zero-attribute tuples, too.)
>>>>>>>>
>>>>>>>> With regard to the information I collected during the discussion, I vote
>>>>>>>> for keeping Tuple0 in Flink core and fixing the serialization problem.
>>>>>>>> Should we have another JIRA for this? Or should I extend the existing
>>>>>>>> JIRA? (https://issues.apache.org/jira/browse/FLINK-2457)
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/03/2015 12:22 AM, Chesnay Schepler wrote:
>>>>>>>>> First of all, it was a really good idea to start a discussion
>>>>>>>>> about this.
>>>>>>>>>
>>>>>>>>> So the general idea behind Tuple0 was this:
>>>>>>>>>
>>>>>>>>> The Python API maps Python tuples to Flink tuples. Python can have empty
>>>>>>>>> tuples, so i thought "well duh, let's make a Tuple0 class!". What i did
>>>>>>>>> not wanna do is create some non-Tuple object to represent empty tuples,
>>>>>>>>> I'd rather have them treated the same, because it's less work and
>>>>>>>>> creates simpler code.
>>>>>>>>>
>>>>>>>>> When transferring the plan to Java, certain parameters for operations
>>>>>>>>> are tuples, which can be empty as well.
>>>>>>>>> This is where the Tuple0 class is really useful, because these empty
>>>>>>>>> tuples go through the same logic as other tuples.
>>>>>>>>> This is also why i want to keep the class, at least in the python
>>>>>>>>> project, for now.
>>>>>>>>>
>>>>>>>>> For the actual program execution, I need a new solution. Funny story,
>>>>>>>>> while writing this reply i noticed that the Python API can't handle
>>>>>>>>> Tuple0 at runtime either. ha...ha... -.-
>>>>>>>>>
>>>>>>>>> Guess I now know what I'm working on next.
>>>>>>>>>
>>>>>>>>> On 02.08.2015 21:24, Matthias J. Sax wrote:
>>>>>>>>>> Can you elaborate on how and why Python uses Tuple0? If it cannot be
>>>>>>>>>> serialized like regular Tuples, what is its usage in Python? Right
>>>>>>>>>> now it seems there is no special serialization code for Tuple0.
>>>>>>>>>>
>>>>>>>>>> I just want to understand the topic in detail.
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 08/01/2015 03:38 PM, Stephan Ewen wrote:
>>>>>>>>>>> I think a Tuple0 cannot be implemented like the current tuples, at least
>>>>>>>>>>> with respect to runtime serialization.
>>>>>>>>>>>
>>>>>>>>>>> The system makes the assumption that it makes progress in consuming bytes
>>>>>>>>>>> when deserializing values. If a Tuple0 never consumes data from the byte
>>>>>>>>>>> stream, this assumption is broken. It would need at least one marker
>>>>>>>>>>> byte.
>>>>>>>>>>> Then it effectively is a Tuple1<Byte> disguising itself as a Tuple0.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Aug 1, 2015 at 1:38 PM, Matthias J. Sax <
>>>>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I just double checked. Scala does not have a Tuple0 type. IMHO, it would
>>>>>>>>>>>> be best to remove Tuple0 for consistency. Having Tuple types is for
>>>>>>>>>>>> consistency with Scala in the first place, right? Please give
>>>>>>>>>>>> feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 08/01/2015 01:04 PM, Matthias J. Sax wrote:
>>>>>>>>>>>>> I see.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that it might be useful to have Tuple0, because in rare cases
>>>>>>>>>>>>> you only want to "notify" a downstream operator (talking about
>>>>>>>>>>>>> streaming) that something happened but there is no actual data to be
>>>>>>>>>>>>> processed. Furthermore, if Flink cannot deal with Tuple0, it should be
>>>>>>>>>>>>> removed completely for consistency IMHO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will open a JIRA for it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/31/2015 10:44 PM, Chesnay Schepler wrote:
>>>>>>>>>>>>>> also, I'm not sure if I ever sent a Tuple0 through a program, it could
>>>>>>>>>>>>>> be that the system freaks out.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 31.07.2015 22:40, Chesnay Schepler wrote:
>>>>>>>>>>>>>>> there's no specific reason. it was added fairly recently by me
>>>>>>>>>>>>>>> (mid-April), and you're most likely the second person to use it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> i didn't integrate it into all our tuple related stuff because, well, i
>>>>>>>>>>>>>>> never thought anyone would actually need it, so i saved myself the
>>>>>>>>>>>>>>> trouble.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is there any specific reason why Tuple.getTupleClass(int arity) does
>>>>>>>>>>>>>>>> not support arity zero? There is a class Tuple0, but it cannot be
>>>>>>>>>>>>>>>> generated by Tuple.getTupleClass(...). Is it a missing feature? (I would
>>>>>>>>>>>>>>>> like to have it.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
> 

