Date: Wed, 24 Feb 2016 00:17:58 -0800
Subject: Best way to pass GenericData.Record from one fn to the next one
From: Marcin Michalski
To: user@crunch.apache.org

Hi,

Is there an easy way to pass GenericData.Record between Fns in Crunch without explicitly stating the schema? I want to pass multiple Avro files with various schemas as input to a single DoFn, which will enhance the data into a Pair. Later I want to run an aggregation (deduping) Fn on that data, but I don't want to specify the Schema in between; I just want to work with GenericData.Record instances. Here is an example:

PCollection<Record> messages = pipeline.read(From.avroFile("/events/*/20160223/"));
// I don't want to pass the schema instance but rather just work with GenericData.Record. Is that possible? Or do I need to use Avros.bytes instead and then reconstruct the Record later in the next Fn?
messages.parallelDo(new EventEnhancerDoFn(), Avros.generics(messageSchema)).groupByKey...

Thanks,
Marcin