crunch-user mailing list archives

From David Ortiz
Subject Re: Sequential Processing
Date Thu, 28 Apr 2016 19:16:02 GMT
I think I'm confused about what you're going for.  A parallelDo over the
PGroupedTable should do exactly what you described.  The DoFn receives the
key and an Iterable<DataRecord> holding the values for that single key, at
which point you can do whatever you want with them, sorting included.
That's exactly what I had to do on a flow at work: I do a groupByKey on a
PTable, then in the ensuing parallelDo I create a List out of the
Iterable<Record> and run some aggregate functions over it.  The parallelDo
returns a new PCollection, so you end up with a new dataset rather than
just iterating locally.
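
As a rough sketch of the per-key work described above, here is the
copy-sort-aggregate step in plain Java with no Crunch dependency.  The
DataRecord fields (timestamp, amount), the KeySummary output type, and the
summarize helper are all made up for illustration; the real DataRecord is
only known to be a bean with getters and setters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the DataRecord bean from the question.
final class DataRecord {
    private final long timestamp;
    private final double amount;

    DataRecord(long timestamp, double amount) {
        this.timestamp = timestamp;
        this.amount = amount;
    }

    long getTimestamp() { return timestamp; }
    double getAmount() { return amount; }
}

// Hypothetical per-key result; one of these would be emitted per key.
final class KeySummary {
    final String key;
    final int count;
    final double total;
    final double lastAmount;

    KeySummary(String key, int count, double total, double lastAmount) {
        this.key = key;
        this.count = count;
        this.total = total;
        this.lastAmount = lastAmount;
    }
}

final class PerKeyProcessing {
    // Mirrors what the body of the DoFn would do for one (key, values) pair.
    static KeySummary summarize(String key, Iterable<DataRecord> values) {
        // The grouped Iterable is single-use, so copy it into a List first.
        List<DataRecord> records = new ArrayList<>();
        for (DataRecord r : values) {
            records.add(r);
        }

        // Sort the records for this key (assumed here: by timestamp).
        records.sort(Comparator.comparingLong(DataRecord::getTimestamp));

        // Simple aggregates over the now-ordered records.
        double total = 0.0;
        for (DataRecord r : records) {
            total += r.getAmount();
        }
        double last = records.isEmpty()
                ? 0.0
                : records.get(records.size() - 1).getAmount();

        return new KeySummary(key, records.size(), total, last);
    }
}
```

In an actual pipeline this would presumably sit inside a
DoFn<Pair<String, Iterable<DataRecord>>, KeySummary>, with process() calling
emitter.emit(summarize(input.first(), input.second())).  One caveat worth
checking for your PType: the framework may reuse value objects while
iterating, in which case each record needs to be copied before it goes into
the List.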

On Thu, Apr 28, 2016 at 2:59 PM Robinson, Landon wrote:

> Crunch Gurus,
> We need to process some data in order, so parallelDo shouldn’t work for
> this approach. We’ve looked at SequentialDo, but not sure how exactly to
> make it work…(Not much documentation on it).
> *DataRecord is a java object with getters and setters.*
> Right now, we have a PGroupedTable<String, DataRecord> where the String
> keys in the PGT are linked to multiple DataRecord objects (standard PGT
> behavior).
> What we need to do now is loop through all records for a particular key,
> sort them, and do some simple calculations.
> *What is the best way/standard way to process a PGroupedTable so that
> records corresponding to the same key are all kept together and processed?*
> Right now we know how to crack open a PGT in the local code and flip
> through it (the SingleUseIterable), but we need to make a new dataset out
> of it, not just play with it.
> Any direction or guidance would be appreciated!
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
