Date: Thu, 31 Aug 2017 04:09:00 +0000 (UTC)
From: "Kenneth Knowles (JIRA)"
To: commits@beam.apache.org
Reply-To: dev@beam.apache.org
Subject: [jira] [Commented] (BEAM-2516) User reports 4 minutes to process 1 million line CSV in DirectRunner

    [ https://issues.apache.org/jira/browse/BEAM-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148418#comment-16148418 ]

Kenneth Knowles commented on BEAM-2516:
---------------------------------------

I think for 2.2.0 it is best to remove the translation to/from a proto by hiding it behind PipelineOptions. There's a lot of overhead right now because of the impedance mismatch between the parts that are still Java-specific and the parts that are SDK-agnostic.

In the full story for the portability framework, the runner can't even deserialize the DoFns and other UDFs; they are shipped to the SDK harness instead. The harness will own the caching, so it probably doesn't make sense to add caching to the DirectRunner unless there's one silly repeated deserialization we can eliminate. Based on the profiling results, perhaps there is, but there is no need to block anything on it.

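To make the "one silly repeated deserialization" point concrete, here is a minimal sketch of what eliminating it could look like. This is not Beam code: `DeserializationCache` is a hypothetical helper, `pickle` stands in for whatever serialization the runner uses, and the small dict stands in for a serialized DoFn. It only shows the general idea of memoizing a deserialized object by its serialized bytes so that repeated use does not pay the decoding cost again.

```python
import pickle


class DeserializationCache:
    """Hypothetical helper: memoizes deserialized objects by their serialized bytes."""

    def __init__(self):
        self._cache = {}
        self.misses = 0  # counts how many times we actually deserialized

    def get(self, payload: bytes):
        # Deserialize only the first time a given payload is seen.
        if payload not in self._cache:
            self.misses += 1
            self._cache[payload] = pickle.loads(payload)
        return self._cache[payload]


# Stand-in for a serialized user function (assumption for illustration only).
payload = pickle.dumps({"name": "parse_csv_line"})

cache = DeserializationCache()
for _ in range(1000):  # e.g. once per bundle or per element
    fn = cache.get(payload)

print(cache.misses)  # 1
```

Without the cache, the loop above would call `pickle.loads` 1000 times. Whether anything like this is worthwhile in the DirectRunner depends on what the profiling actually shows.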
> User reports 4 minutes to process 1 million line CSV in DirectRunner
> --------------------------------------------------------------------
>
>                 Key: BEAM-2516
>                 URL: https://issues.apache.org/jira/browse/BEAM-2516
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-direct
>            Reporter: Kenneth Knowles
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> https://stackoverflow.com/questions/44736414/simple-apache-beam-manipulations-work-very-slow
> I don't know what the expectations are here, so I wasn't ready to say this is WAI. Low priority since it isn't what the runner is for anyhow, but this seems like the scale of data that should be snappy. Worth investigating, or maybe you can quickly indicate why it is expected?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)