Message-ID: <1452897377.2787.17.camel@gmail.com>
Subject: Re: Custom ReadableSource implementations that reuse a PType?
From: Evan McClain
To: dev@crunch.apache.org
Date: Fri, 15 Jan 2016 17:36:17 -0500

The PCollection that is being cached is the result of a parallelDo (so
it is a DoCollection). The parent InputCollection has the source I
defined. Since the first read is triggered by that .cache() call on the
DoCollection, the file source given in the InputCollection is never
actually used.

Stepping through with a debugger shows that even after using
pipeline.read(...).cache(), my ReadableSource.read() doesn't get called,
even though I see my source in
outputTargetsToMaterialize.value.source.

I only see AvroFileSourceTarget performing the reads, so I'm probably
missing something (I'm new to Crunch). Any pointers would be
appreciated.

Evan

On Thu, 2016-01-14 at 15:29 -0800, Josh Wills wrote:
> Hey Evan,
>
> So I must be missing some context here-- it looks to me like
> MRPipeline.getMaterializeSourceTarget first checks to see if you
> passed in an input collection type and then looks at its Source to
> see if it implements ReadableSource and uses that if it's available
> before defaulting to the ptype's default file source later on in the
> file via createIntermediateOutput. Is the PCollection you're trying
> to materialize not an input collection? e.g., is it an input
> collection that has had parallelDo called on it one or more times?
>
> J
>
> On Thu, Jan 14, 2016 at 9:08 AM, Evan McClain wrote:
>
> > Hi list,
> >
> > I am trying to implement a custom ReadableSource by extending
> > FileSourceImpl, but it doesn't look like my source is actually
> > being used since I am reusing AvroTypes (basically just trying to
> > pass through some avro getMeta() values).
> >
> > It looks like cache() -> materialize() ->
> > ptype.getDefaultFileSource() is being used to perform the actual
> > read (even though pipeline.read() is being passed my custom
> > Source).
> >
> > Is there any way to do this without also implementing a custom
> > PType?
> >
> > Thanks
> > --
> > Evan McClain
> > https://keybase.io/aeroevan
> >

-- 
Evan McClain
https://keybase.io/aeroevan
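
For readers following the thread, the lookup order described in the quoted reply can be modeled with a toy sketch. This is not Crunch's code: MaterializeSketch, PColl, CustomSource, chooseReader and the string names are hypothetical stand-ins, and the only behavior modeled is what the thread states, namely that a ReadableSource is used when the collection being materialized is itself an input collection, and that anything else falls back to the PType's default file source.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: every type below is a hypothetical stand-in, not a Crunch class.
class MaterializeSketch {

    interface Source {
        String name();
    }

    // Stand-in for a source that can be read directly on the client.
    interface ReadableSource extends Source {
        List<String> read();
    }

    // Stand-in for the custom source passed to pipeline.read(...).
    static class CustomSource implements ReadableSource {
        public String name() { return "custom-source"; }
        public List<String> read() { return Arrays.asList("record-1", "record-2"); }
    }

    static abstract class PColl {
        // Name of the file source the collection's PType would fall back to.
        String defaultFileSource() { return "ptype-default-file-source"; }
    }

    // A collection created directly from a source.
    static class InputCollection extends PColl {
        final Source source;
        InputCollection(Source source) { this.source = source; }
    }

    // A collection produced by applying a function to a parent collection.
    static class DoCollection extends PColl {
        final PColl parent;
        DoCollection(PColl parent) { this.parent = parent; }
    }

    // Models the order described in the thread: only an input collection's
    // own ReadableSource is consulted; the parent of a derived collection is not.
    static String chooseReader(PColl c) {
        if (c instanceof InputCollection) {
            Source s = ((InputCollection) c).source;
            if (s instanceof ReadableSource) {
                return s.name();
            }
        }
        return c.defaultFileSource();
    }

    public static void main(String[] args) {
        PColl input = new InputCollection(new CustomSource());
        PColl derived = new DoCollection(input);
        System.out.println(chooseReader(input));
        System.out.println(chooseReader(derived));
    }
}
```

In this model, materializing the derived collection never reaches the custom source, which matches the symptom reported for .cache() on a DoCollection. Whether the real pipeline behaves the same way in every version is an assumption here, not something the sketch establishes.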