crunch-user mailing list archives

From Igor Bernstein <igorbernst...@spotify.com>
Subject Re: Sequential Processing
Date Thu, 28 Apr 2016 19:43:24 GMT
You can also throw in a SecondarySort
(https://crunch.apache.org/user-guide.html#secsort) to get each group
presorted.
On Thu, Apr 28, 2016 at 2:16 PM David Ortiz <dpo5003@gmail.com> wrote:

> I think I am confused as to what you're going for.  A parallelDo over the
> PGroupedTable should do exactly what you described.  The DoFn receives one
> key together with an Iterable<DataRecord> of that key's values, at which
> point you can do whatever you want with them.  That's exactly what I had to
> do on a flow at work, where I do a groupByKey on a PTable, then in the
> ensuing parallelDo, create a List out of the Iterable<Record> and do some
> aggregate functions over it.
>
> On Thu, Apr 28, 2016 at 2:59 PM Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> Crunch Gurus,
>>
>> We need to process some data in order, so we don't think parallelDo will
>> work for this. We've looked at SequentialDo, but we're not sure exactly how
>> to make it work (there isn't much documentation on it).
>> *DataRecord is a java object with getters and setters.*
>>
>> Right now, we have a PGroupedTable<String, DataRecord> where the String
>> keys in the PGT are linked to multiple DataRecord objects (standard PGT
>> behavior).
>> What we need to do now is loop through all records for a particular key,
>> sort them, and do some simple calculations.
>>
>> *What is the best way/standard way to process a PGroupedTable so that
>> records corresponding to the same key are all kept together and processed?*
>>
>> Right now we know how to crack open a PGT in the local code and flip
>> through it (the SingleUseIterable), but we need to make a new dataset out
>> of it, not just play with it.
>>
>> Any direction or guidance would be appreciated!
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
>
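
For readers following the thread, here is a minimal sketch of the per-key step David describes: copy the group's Iterable into a List, sort it, and compute a few aggregates. It is plain Java with no Crunch dependency so that it runs on its own. The `DataRecord` fields (`timestamp`, `amount`), the `GroupSummary` type, and the `summarize` method are all hypothetical names chosen for illustration; the thread only says that DataRecord is a Java object with getters and setters. In a Crunch pipeline, the body of `summarize` would presumably sit inside the `process` method of a DoFn passed to `parallelDo` on the PGroupedTable, with the result handed to the emitter so that a new dataset comes out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupProcessing {

    // Hypothetical stand-in for the thread's DataRecord; the real fields are unknown.
    static final class DataRecord {
        private final long timestamp;
        private final double amount;

        DataRecord(long timestamp, double amount) {
            this.timestamp = timestamp;
            this.amount = amount;
        }

        long getTimestamp() { return timestamp; }
        double getAmount() { return amount; }
    }

    // Hypothetical per-key output: one summary row per group.
    static final class GroupSummary {
        final String key;
        final int count;
        final double total;
        final double change; // amount of the latest record minus amount of the earliest

        GroupSummary(String key, int count, double total, double change) {
            this.key = key;
            this.count = count;
            this.total = total;
            this.change = change;
        }
    }

    // Handles all values of a single key, as a DoFn would for one grouped entry.
    static GroupSummary summarize(String key, Iterable<DataRecord> values) {
        // The grouped Iterable can only be traversed once, so copy it first.
        // Assumption: the records are safe to hold on to. In a real Crunch job
        // each value may need to be detached (copied) before it is stored.
        List<DataRecord> records = new ArrayList<>();
        for (DataRecord r : values) {
            records.add(r);
        }

        // Sort within the group, here by the hypothetical timestamp field.
        records.sort(Comparator.comparingLong(DataRecord::getTimestamp));

        double total = 0.0;
        for (DataRecord r : records) {
            total += r.getAmount();
        }
        double change = records.isEmpty()
                ? 0.0
                : records.get(records.size() - 1).getAmount() - records.get(0).getAmount();

        return new GroupSummary(key, records.size(), total, change);
    }

    public static void main(String[] args) {
        List<DataRecord> group = Arrays.asList(
                new DataRecord(3L, 30.0),
                new DataRecord(1L, 10.0),
                new DataRecord(2L, 25.0));
        GroupSummary s = summarize("store-42", group);
        System.out.println(s.key + " count=" + s.count + " total=" + s.total + " change=" + s.change);
    }
}
```

Copying into a List assumes every group fits in memory. If some keys have very many records, the SecondarySort route Igor mentions is likely the better fit, since the values then arrive already ordered and can be processed in a single pass.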
