Date: Mon, 27 Nov 2017 22:52:01 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: commits@beam.apache.org
Subject: [jira] [Commented] (BEAM-3247) Sample.any memory constraint

[ https://issues.apache.org/jira/browse/BEAM-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267735#comment-16267735 ]

ASF GitHub Bot commented on BEAM-3247:
--------------------------------------

jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153348529

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
##########
@@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder builder) {
   }
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
    */
-  private static class SampleAnyDoFn extends DoFn {
-    long limit;
-    final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {

Review comment: Oh sorry, I looked in the wrong place in the code.
Yeah, this implementation is wrong: the main element (null in this case) is in the global window, so of course it's not allowed to access a side input that is windowed non-globally, because it's ambiguous which window should be accessed. It plays even more poorly with triggering. See https://beam.apache.org/documentation/programming-guide/#side-inputs-and-windowing . All the more reason to fix this!

> Sample.any memory constraint
> ----------------------------
>
>                 Key: BEAM-3247
>                 URL: https://issues.apache.org/jira/browse/BEAM-3247
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 2.1.0
>            Reporter: Neville Li
>            Assignee: Neville Li
>            Priority: Minor
>
> Right now {{Sample.any}} converts the collection to an iterable view and takes the first n elements in a side input. This may require materializing the entire collection to disk and is potentially inefficient.
> https://github.com/apache/beam/blob/v2.1.0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java#L74
> It can be fixed by applying a truncating `DoFn` first, then a combine into a `List` that caps the list size at n, and finally flattening the list.
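A minimal sketch of the three steps proposed in the issue description (truncating `DoFn`, combine into a size-capped `List`, flatten), written in plain Java without the Beam SDK so it runs standalone. Assumptions: a `PCollection` is modeled as a `List<List<T>>` of bundles, and `LimitedListCombine` only mirrors the `createAccumulator`/`addInput`/`mergeAccumulators`/`extractOutput` shape of a Beam `CombineFn`. The class and method names are hypothetical; this is not the code under review in PR #4175. What it illustrates: every accumulator is capped at `limit`, so no partial result ever holds more than `limit` elements, whereas the side-input approach may materialize the whole collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the fix proposed in BEAM-3247 (no Beam dependency).
// All names here are hypothetical; this is not the code from PR #4175.
final class BoundedSampleSketch {

  // Mirrors the method shape of a Beam CombineFn<T, List<T>, List<T>>,
  // but is an ordinary class, not a Beam CombineFn subclass.
  static final class LimitedListCombine<T> {
    private final int limit;

    LimitedListCombine(int limit) {
      this.limit = limit;
    }

    List<T> createAccumulator() {
      return new ArrayList<>();
    }

    // Keeps the element only while the accumulator is below the limit.
    List<T> addInput(List<T> acc, T input) {
      if (acc.size() < limit) {
        acc.add(input);
      }
      return acc;
    }

    // Merges partial lists; the result never exceeds the limit.
    List<T> mergeAccumulators(Iterable<List<T>> accs) {
      List<T> merged = createAccumulator();
      for (List<T> acc : accs) {
        for (T t : acc) {
          if (merged.size() >= limit) {
            return merged;
          }
          merged.add(t);
        }
      }
      return merged;
    }

    List<T> extractOutput(List<T> acc) {
      return acc;
    }
  }

  // Step 1: the "truncating DoFn" emits at most `limit` elements per bundle.
  static <T> List<T> truncateBundle(List<T> bundle, int limit) {
    return new ArrayList<>(bundle.subList(0, Math.min(limit, bundle.size())));
  }

  // Runs all three steps over input that is already split into bundles.
  static <T> List<T> sampleAny(List<List<T>> bundles, int limit) {
    if (limit < 0) {
      throw new IllegalArgumentException("limit must be >= 0");
    }
    LimitedListCombine<T> combine = new LimitedListCombine<>(limit);
    // Step 2a: one capped partial accumulator per bundle.
    List<List<T>> partials = new ArrayList<>();
    for (List<T> bundle : bundles) {
      List<T> acc = combine.createAccumulator();
      for (T t : truncateBundle(bundle, limit)) {
        acc = combine.addInput(acc, t);
      }
      partials.add(acc);
    }
    // Step 2b: merge the partials into a single list of at most `limit` elements.
    List<T> combined = combine.extractOutput(combine.mergeAccumulators(partials));
    // Step 3: "flatten" the one list back into individual output elements.
    List<T> out = new ArrayList<>();
    for (T t : combined) {
      out.add(t);
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Integer>> bundles =
        Arrays.asList(Arrays.asList(1, 2, 3, 4), Arrays.asList(5, 6), Arrays.asList(7, 8, 9));
    System.out.println(sampleAny(bundles, 3)); // prints [1, 2, 3]
  }
}
```

Which `limit` elements survive depends on bundle order, which a real runner does not guarantee; the sketch is deterministic only because it walks the bundles in list order. In Beam itself, `Combine.globally` combines per window and `Flatten.iterables()` could perform step 3, so a version built from those transforms should not run into the side-input windowing ambiguity raised in the review. The sketch does not model windows at all.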