flink-dev mailing list archives

From "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
Subject Re: Tuple
Date Tue, 04 Aug 2015 11:34:28 GMT
Yes, that is what the program does. However, streaming is not lazy, so
deserialization should have happened.

I will try a batch job, later today.

On 08/04/2015 01:27 PM, Chesnay Schepler wrote:
> so I'm not too much into the streaming API, but as I see it this program
> creates an infinite number of tuples and then counts them, right?
> The problem with serialization as I understand it is that the receiver
> can't tell how many Tuple0 are sent, since you never actually read any
> data when deserializing a tuple. it's even more likely that it's not
> even attempted.
> As such, I'd be curious to see what happens when you create a batch job
> with a limited number of starting tuples.
> On 04.08.2015 13:08, Matthias J. Sax wrote:
>> Hi,
>> I just opened a PR for this. https://github.com/apache/flink/pull/983
>> However, I was not able to "reproduce" serialization issues... I tested
>> Tuple0 (see enclosed code) in a cluster, and the program worked. Am I
>> missing anything?
>> -Matthias
>> On 08/03/2015 01:01 AM, Matthias J. Sax wrote:
>>> Thanks for the advice about Tuple0.
>>> I personally don't see any advantage in having a "flink-tuple" project.
>>> Am I missing anything about it? Furthermore, I am not sure it is a good
>>> idea to have too many small projects.
>>> On 08/03/2015 12:48 AM, Stephan Ewen wrote:
>>>> Tuple0 would need special serialization and comparator logic. If that
>>>> is given, I see no reason not to support it.
>>>> There is, BTW, a request to create a dedicated "flink-tuple" project
>>>> that only contains the tuple classes. Any opinions on that?
>>>> On Mon, Aug 3, 2015 at 12:45 AM, Matthias J. Sax <
>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>> Thanks for the explanation!
>>>>> As I mentioned before, Tuple0 might also be helpful for streaming. And
>>>>> I guess I will need it for the Storm compatibility layer, too. (I need
>>>>> to double check, but Storm supports zero-attribute tuples, too.)
>>>>> Given the information I collected during the discussion, I vote for
>>>>> keeping Tuple0 in Flink core and fixing the serialization problem.
>>>>> Should we have another JIRA for this? Or should I extend the existing
>>>>> JIRA? (https://issues.apache.org/jira/browse/FLINK-2457)
>>>>> -Matthias
>>>>> On 08/03/2015 12:22 AM, Chesnay Schepler wrote:
>>>>>> First of all, it was a really good idea to start a discussion about
>>>>>> this.
>>>>>> So the general idea behind Tuple0 was this:
>>>>>> The Python API maps Python tuples to Flink tuples. Python can have
>>>>>> empty tuples, so I thought "well duh, let's make a Tuple0 class!".
>>>>>> What I did not wanna do is create some non-Tuple object to represent
>>>>>> empty tuples. I'd rather have them treated the same, because it's
>>>>>> less work and creates simpler code.
>>>>>> When transferring the plan to Java, certain parameters for operations
>>>>>> are tuples, which can be empty as well.
>>>>>> This is where the Tuple0 class is really useful, because these empty
>>>>>> tuples go through the same logic as other tuples.
>>>>>> This is also why I want to keep the class, at least in the Python
>>>>>> project, for now.
>>>>>> For the actual program execution, I need a new solution. Funny story,
>>>>>> while writing this reply I noticed that the Python API can't handle
>>>>>> Tuple0 at runtime either. ha...ha... -.-
>>>>>> Guess I now know what I'm working on next.
>>>>>> On 02.08.2015 21:24, Matthias J. Sax wrote:
>>>>>>> Can you elaborate how and why Python uses Tuple0? If it cannot be
>>>>>>> serialized like regular tuples, what is its use in Python? Right
>>>>>>> now it seems as if there is no special serialization code for Tuple0.
>>>>>>> I just want to understand the topic in detail.
>>>>>>> -Matthias
>>>>>>> On 08/01/2015 03:38 PM, Stephan Ewen wrote:
>>>>>>>> I think a Tuple0 cannot be implemented like the current tuples, at
>>>>>>>> least with respect to runtime serialization.
>>>>>>>> The system makes the assumption that it makes progress in bytes
>>>>>>>> when deserializing values. If a Tuple0 never consumes data from the
>>>>>>>> byte stream, this assumption is broken. It would need at least a
>>>>>>>> marker byte.
>>>>>>>> Then it effectively is a Tuple1<Byte> disguising itself as a Tuple0.
>>>>>>>> On Sat, Aug 1, 2015 at 1:38 PM, Matthias J. Sax <
>>>>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>>>>> I just double checked. Scala does not have a type Tuple0. [...] it
>>>>>>>>> would be best to remove Tuple0 for consistency. Having Tuple types
>>>>>>>>> is for consistency with Scala in the first place, right? Please
>>>>>>>>> give feedback.
>>>>>>>>> -Matthias
>>>>>>>>> On 08/01/2015 01:04 PM, Matthias J. Sax wrote:
>>>>>>>>>> I see.
>>>>>>>>>> I think that it might be useful to have Tuple0, because in rare
>>>>>>>>>> cases, you only want to "notify" downstream operators (talking
>>>>>>>>>> about streaming) that something happened but there is no actual
>>>>>>>>>> data to be processed. Furthermore, if Flink cannot deal with
>>>>>>>>>> Tuple0, it should be removed completely for consistency IMHO.
>>>>>>>>>> I will open a JIRA for it.
>>>>>>>>>> -Matthias
>>>>>>>>>> On 07/31/2015 10:44 PM, Chesnay Schepler wrote:
>>>>>>>>>>> also, I'm not sure if I ever sent a Tuple0 through a program, it
>>>>>>>>>>> could be that the system freaks out.
>>>>>>>>>>> On 31.07.2015 22:40, Chesnay Schepler wrote:
>>>>>>>>>>>> there's no specific reason. it was added fairly recently by me
>>>>>>>>>>>> (mid-April), and you're most likely the second person to use it.
>>>>>>>>>>>> I didn't integrate it into all our tuple-related stuff because,
>>>>>>>>>>>> well, I never thought anyone would actually need it, so I saved
>>>>>>>>>>>> myself the trouble.
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> is there any specific reason why Tuple.getTupleClass(int arity)
>>>>>>>>>>>>> does not support arity zero? There is a class Tuple0, but it
>>>>>>>>>>>>> cannot be generated by Tuple.getTupleClass(...). Is it a missing
>>>>>>>>>>>>> feature? (I would like to have it.)
>>>>>>>>>>>>> -Matthias
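
Stephan's point about "making progress in bytes" can be illustrated with a
minimal sketch in plain Java. It does not use Flink's serializers or reflect
how Flink actually frames records; the class name, the method names, and the
MARKER value are all hypothetical and exist only for this illustration. The
sketch shows that a tuple with no fields writes zero bytes, so the byte stream
alone cannot say how many tuples were sent, while one marker byte per tuple
makes the count recoverable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch; none of these names are Flink classes.
class EmptyTupleSerializationSketch {

    /** Hypothetical marker value, written once per empty tuple. */
    static final byte MARKER = 0;

    /** Serializes `count` empty tuples without a marker: nothing is written. */
    static byte[] serializeWithoutMarker(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                // An empty tuple has no fields, so there is nothing to write.
            }
        }
        return bytes.toByteArray();
    }

    /** Serializes `count` empty tuples, writing one marker byte for each. */
    static byte[] serializeWithMarker(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                out.writeByte(MARKER);
            }
        }
        return bytes.toByteArray();
    }

    /** Counts records by consuming one marker byte per record until EOF. */
    static int countRecordsWithMarker(byte[] data) throws IOException {
        int records = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                try {
                    in.readByte(); // progress: exactly one byte per record
                } catch (EOFException endOfStream) {
                    return records;
                }
                records++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Three empty tuples without a marker are indistinguishable from none.
        System.out.println(serializeWithoutMarker(3).length); // prints 0
        // With a marker byte, each tuple occupies one byte and can be counted.
        System.out.println(serializeWithMarker(3).length); // prints 3
        System.out.println(countRecordsWithMarker(serializeWithMarker(3))); // prints 3
    }
}
```

With the marker, the empty tuple is in effect the "Tuple1<Byte> disguising
itself as a Tuple0" described in the thread. Whether a real runtime needs the
marker depends on how it frames records, which this sketch does not model.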
