beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ahmet Altay (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (BEAM-4518) Errors when running Python Game stats with a low fixed_window_duration in the DirectRunner
Date Wed, 03 Oct 2018 23:43:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ahmet Altay updated BEAM-4518:
------------------------------
    Fix Version/s:     (was: 2.8.0)

> Errors when running Python Game stats with a low fixed_window_duration in the DirectRunner
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-4518
>                 URL: https://issues.apache.org/jira/browse/BEAM-4518
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Ahmet Altay
>            Priority: Major
>
> Using the injector and the following command to start the DirectRunner pipeline:
> python -m apache_beam.examples.complete.game.game_stats \
> --project=google.com:clouddfe \
> --topic projects/google.com:clouddfe/topics/leader_board-$USER-topic-1 \
> --dataset ${USER}_test --fixed_window_duration 1
> Fails with:
> ValueError: PCollection of size 2 with more than one element accessed as a singleton
view. First two elements encountered are "13.93", "11.7777777778". [while running 'CalculateSpammyUsers/ProcessAndFilter']
> Offending code is here:
> global_mean_score = (
>  sum_scores
>  | beam.Values()
>  | beam.CombineGlobally(beam.combiners.MeanCombineFn())\
>  .as_singleton_view())
>  # Filter the user sums using the global mean.
>  filtered = (
>  sum_scores
>  # Use the derived mean total score (global_mean_score) as a side input.
>  | 'ProcessAndFilter' >> beam.Filter(
>  lambda key_score, global_mean:\
>  key_score[1] > global_mean * self.SCORE_WEIGHT,
>  global_mean_score))
> Since global_mean_score is the result of CombineGlobally, this is either an issue with CombineGlobally
or side inputs implementation in DirectRunner. The latter is more likely since it works on
DataflowRunner.
> cc: [~mariagh]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message