Date: Wed, 3 May 2017 20:25:04 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: commits@beam.apache.org
Subject: [jira] [Commented] (BEAM-2150) Support for recursive wildcards in GcsPath

[ https://issues.apache.org/jira/browse/BEAM-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995591#comment-15995591 ]

ASF GitHub Bot commented on BEAM-2150:
--------------------------------------

GitHub user meunierd opened a pull request:

    https://github.com/apache/beam/pull/2866

[BEAM-2150] Relax regex to support wildcard globbing for GCS

Be sure to do all of the following to help us incorporate your contribution quickly and easily:

 - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue number, if there is one.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---

Something I've noticed is that Beam's usage of the GCS API doesn't leverage delimiters, so we always iterate over the full set of objects under the prefix, which is why this PR is so tiny.
Ideally, we could specify the delimiter `/` when not using recursive wildcards (`**`), for some efficiency gains.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/meunierd/beam BEAM-2150-gcs-recursive-wildcards

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2866

----
commit 6d0d6257040a9725b22f4c94bdaa2de388fd2e65
Author: Devon Meunier
Date:   2017-05-03T20:22:16Z

    [BEAM-2150] Relax regex to support wildcard globbing for GCS

----


> Support for recursive wildcards in GcsPath
> ------------------------------------------
>
>                 Key: BEAM-2150
>                 URL: https://issues.apache.org/jira/browse/BEAM-2150
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core, sdk-java-gcp
>            Reporter: Devon Meunier
>            Assignee: Devon Meunier
>            Priority: Minor
>
> When working with heavily nested folder structures in Google Cloud Storage, it's great to make use of recursive wildcards, which the current API explicitly does not support.
> This code hasn't been touched in two years, so it's likely that no one has gotten around to it yet.
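The discussion above rests on two ideas. First, a glob is accepted even when it contains `**`. Second, a `/` delimiter is passed to the listing call only when the pattern cannot span more than one directory level. The following is a minimal, hypothetical sketch of both in plain Java. `GlobSketch`, `globToRegex`, `literalPrefix` and `canUseDelimiter` are illustrative names, not Beam's `GcsPath`/`GcsUtil` code or any GCS client API. The sketch assumes that only `*`, `**` and `?` are wildcards. Its delimiter check is deliberately more conservative than "no `**`", because a single `*` in a non-final segment (e.g. `logs/*/a.json`) also reaches below the first listing level.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Beam's GcsPath/GcsUtil code): glob handling where
 * "*" and "?" stay inside one path segment and "**" may cross '/' boundaries.
 */
final class GlobSketch {

  private GlobSketch() {}

  /** Translates a glob into an anchored Java regex; only *, ** and ? are wildcards. */
  static String globToRegex(String glob) {
    StringBuilder sb = new StringBuilder("^");
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      if (c == '*') {
        if (i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
          sb.append(".*"); // "**": any run of characters, including '/'
          i++; // skip the second '*'
        } else {
          sb.append("[^/]*"); // "*": any run of characters except '/'
        }
      } else if (c == '?') {
        sb.append("[^/]"); // "?": exactly one character other than '/'
      } else {
        if ("\\.[]{}()+^$|".indexOf(c) >= 0) {
          sb.append('\\'); // escape regex metacharacters so they match literally
        }
        sb.append(c);
      }
    }
    return sb.append('$').toString();
  }

  /** The literal part of the glob before the first wildcard, usable as a listing prefix. */
  static String literalPrefix(String glob) {
    int i = 0;
    while (i < glob.length() && glob.charAt(i) != '*' && glob.charAt(i) != '?') {
      i++;
    }
    return glob.substring(0, i);
  }

  /**
   * Conservative check for whether a "/" delimiter could be used when listing:
   * everything after the literal prefix must stay within one path segment,
   * so neither '/' nor "**" may appear there.
   */
  static boolean canUseDelimiter(String glob) {
    String rest = glob.substring(literalPrefix(glob).length());
    return !rest.contains("/") && !rest.contains("**");
  }

  public static void main(String[] args) {
    String[] globs = {"logs/*.json", "logs/**", "logs/*/a.json"};
    String object = "logs/2017/05/a.json";
    for (String glob : globs) {
      boolean matches = Pattern.compile(globToRegex(glob)).matcher(object).matches();
      System.out.printf("%-14s matches=%-5b delimiter=%b%n", glob, matches, canUseDelimiter(glob));
    }
  }
}
```

Against the nested object `logs/2017/05/a.json`, only `logs/**` matches. Only `logs/*.json` qualifies for a delimiter. With `logs/**`, the whole prefix still has to be listed, which matches the PR's observation that recursive wildcards need the full iteration.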