Date: Wed, 19 Jul 2017 17:56:00 +0000 (UTC)
From: "Eugene Kirpichov (JIRA)"
To: commits@beam.apache.org
Reply-To: dev@beam.apache.org
Subject: [jira] [Created] (BEAM-2641) Improve discoverability of TextIO.readAll() as a replacement of TextIO.read() for large globs

Eugene Kirpichov created BEAM-2641:
--------------------------------------

    Summary: Improve discoverability of TextIO.readAll() as a replacement of TextIO.read() for large globs
    Key: BEAM-2641
    URL: https://issues.apache.org/jira/browse/BEAM-2641
    Project: Beam
    Issue Type: Improvement
    Components: sdk-java-core
    Reporter: Eugene Kirpichov
    Assignee: Eugene Kirpichov

TextIO.readAll() dramatically outperforms TextIO.read() when reading very large numbers of files (hundreds of thousands or millions or more). However, it is not obvious that this is what you should use if you have such a filepattern in TextIO.read().

We should take a variety of measures to make it more discoverable, e.g.:
* Add a parameter to TextIO.read(), like "withHintManyFiles()"
* Log something suggesting the use of that hint when splitting TextIO if the filepattern is very large
* Improve documentation
* Post something on StackOverflow about this
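
The logging measure proposed above could look roughly like the following. This is only a sketch in plain Java, not Beam code: the class name ManyFilesHint, the method suggestReadAll, the threshold of 100,000 files, and the example filepattern are all hypothetical and chosen for illustration. The issue does not say where in TextIO such a check would live or what the cutoff should be.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the Beam SDK. */
final class ManyFilesHint {
    /** Assumed cutoff; the issue only says "hundreds of thousands or millions". */
    static final long DEFAULT_THRESHOLD = 100_000L;

    private ManyFilesHint() {}

    /**
     * Returns a suggestion to log when a filepattern matched at least
     * {@code threshold} files, or an empty Optional otherwise.
     */
    static Optional<String> suggestReadAll(String filepattern, long matchedFiles, long threshold) {
        if (matchedFiles < threshold) {
            return Optional.empty();
        }
        return Optional.of(String.format(
            "Filepattern %s matched %d files; consider TextIO.readAll() "
                + "instead of TextIO.read() for filepatterns this large.",
            filepattern, matchedFiles));
    }

    public static void main(String[] args) {
        // Example filepattern and file count are made up for the demonstration.
        suggestReadAll("gs://example-bucket/logs/*", 2_000_000L, DEFAULT_THRESHOLD)
            .ifPresent(System.out::println);
    }
}
```

Returning an Optional rather than logging directly keeps the check independent of any particular logging framework, so the caller can decide how and where to surface the message.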