flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: combineGroup get false results
Date Thu, 22 Aug 2019 09:54:33 GMT

If all key fields are primitive types (long) or String, their hash values
should be deterministic.
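
To illustrate what "deterministic" means here, a small plain-Java sketch (no Flink involved; the class name KeyHashDemo and the helper valueHash are made up for this example). The hash codes of String and Long are defined by the JDK purely in terms of the value, whereas an array used directly as a key inherits the identity-based Object.hashCode().

```java
import java.util.Arrays;

// Illustration only: plain Java, not Flink code. All names are hypothetical.
class KeyHashDemo {

    // Hash of a (String, long) key computed from the field values only,
    // so it is identical in every JVM and on every worker.
    static int valueHash(String s, long l) {
        return 31 * s.hashCode() + Long.hashCode(l);
    }

    public static void main(String[] args) {
        // Value-based hashes: the same on every run.
        System.out.println("ab".hashCode());     // 3105
        System.out.println(Long.hashCode(42L));  // 42
        System.out.println(valueHash("ab", 42L));

        // Arrays do not override hashCode(): equal content, but the
        // identity-based hashes can differ between instances and runs.
        Object[] k1 = {"ab", 42L};
        Object[] k2 = {"ab", 42L};
        System.out.println(k1.hashCode() + " vs " + k2.hashCode());

        // Arrays.hashCode() is content-based and therefore stable.
        System.out.println(Arrays.hashCode(k1) == Arrays.hashCode(k2)); // true
    }
}
```

If a key field were of a type with an identity-based hash, the same record could land in different groups from one run to the next.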

There are two things that can go wrong:
1) Records are assigned to the wrong group.
2) The computation of a group is buggy.

I'd first rule out 1), i.e., check that records end up in the right groups.
Can you replace the sum function with a simple count and check whether the
counts for each group are the same for parallelism 1 and 8 (p=1 and p=8)?
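
Once you have the per-group counts of both runs, comparing them could look roughly like the sketch below. This is plain Java, not Flink API; the class GroupCountDiff, the "A|1"-style key strings, and the sample numbers are invented for illustration. It assumes the counts of each run have been collected into a map from group key to count.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustration only: compares per-group counts of two runs. Names are hypothetical.
class GroupCountDiff {

    // Returns key -> {count in run a, count in run b} for every group key
    // whose counts differ. A key missing from one run counts as 0 there.
    static Map<String, long[]> diff(Map<String, Long> a, Map<String, Long> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        Map<String, long[]> out = new TreeMap<>(); // sorted for readable output
        for (String k : keys) {
            long ca = a.getOrDefault(k, 0L);
            long cb = b.getOrDefault(k, 0L);
            if (ca != cb) {
                out.put(k, new long[] {ca, cb});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up counts standing in for the output of the p=1 and p=8 runs.
        Map<String, Long> p1 = Map.of("A|1", 3L, "B|2", 5L);
        Map<String, Long> p8 = Map.of("A|1", 3L, "B|2", 4L, "B|3", 1L);
        diff(p1, p8).forEach((k, c) ->
            System.out.println(k + ": p=1 -> " + c[0] + ", p=8 -> " + c[1]));
    }
}
```

An empty result would mean the grouping is consistent and point towards 2); any differing key points towards 1).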

On Thu, 22 Aug 2019 at 11:45, anissa moussaoui wrote:

> Hi Fabian,
> My GroupReduce function sums one column of the input rows of each group.
> My key fields are an array of multiple types, in this case String and long.
> The result that I posted is just a sample of the output dataset.
> Thank you in advance !
> Anissa
> On Thu, 22 Aug 2019 at 11:24, Fabian Hueske <fhueske@gmail.com> wrote:
>> Hi Anissa,
>> This looks strange. If I understand your code correctly, your GroupReduce
>> function is summing up a field.
>> Looking at the results that you posted, it seems as if there is some data
>> missing (the total sum does not seem to match).
>> For groupReduce it is important that the grouping keys are deterministic.
>> Since you provide a String array as key definition, there is no
>> KeyExtractor function.
>> However, one thing that can cause random results is key attributes with
>> nondeterministic hash values.
>> What is the type of your key fields?
>> Another thing you might want to check is if the input (inputTable) to the
>> groupReduce function is the same with both parallelism settings.
>> Best, Fabian
