crunch-user mailing list archives

From Nipur Patodi <er.nipur.pat...@gmail.com>
Subject Re: multiple output format with crunch pipeline
Date Wed, 05 Aug 2015 06:28:22 GMT
Hey Josh,

I want to write the output of a PGroupedTable<String, String> to multiple
files, where the output path for each group is the key of the PGroupedTable.

For example, given PGroupedTable<String, String> table =
  [/root/test,  {data1, data2}],
  [/root/test2, {data3, data4}]

the output should be:
$ hadoop fs -cat /root/test/part-m-00000
data1
data2

$ hadoop fs -cat /root/test2/part-m-00000
data3
data4
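
To make the expected layout concrete, here is a minimal plain-Java sketch of the behaviour described above. It is not Crunch code and does not use any Crunch target: it assumes the grouped data is already held in an in-memory map, writes to the local filesystem under a temporary directory instead of HDFS, and the class name PathPerKeySketch and the method write are hypothetical names chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: mimics "one output directory per key"
// on the local filesystem. It does not use the Crunch API.
class PathPerKeySketch {

    // Writes each key's values, one per line, to <root>/<key>/part-m-00000.
    static void write(Path root, Map<String, List<String>> grouped) throws IOException {
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            // Strip leading slashes so the key resolves beneath root
            // rather than replacing it with an absolute path.
            String relative = entry.getKey().replaceFirst("^/+", "");
            Path dir = root.resolve(relative);
            Files.createDirectories(dir);
            Files.write(dir.resolve("part-m-00000"), entry.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the PGroupedTable<String, String> from the example.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("/root/test", Arrays.asList("data1", "data2"));
        grouped.put("/root/test2", Arrays.asList("data3", "data4"));

        Path root = Files.createTempDirectory("path-per-key");
        write(root, grouped);

        // Print one of the files to check the layout.
        System.out.println(Files.readAllLines(root.resolve("root/test/part-m-00000")));
    }
}
```

What I am looking for is a Crunch target that produces the same per-key directory layout for plain String values when the pipeline runs.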


Thanks,

_Nipur



On Wed, Aug 5, 2015 at 11:27 AM, Josh Wills <jwills@cloudera.com> wrote:

> Hey Nipur,
>
> I'm not quite sure what you mean: do you want to output a PTable<String,
> String> via an AvroPathPerKeyTarget? Or a PTable<String, Pair<String,
> String>>?
>
> J
>
> On Tue, Aug 4, 2015 at 10:49 PM, Nipur Patodi <er.nipur.patodi@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am trying to write the contents of a PGroupedTable to multiple output
>> files based on the key of the PGroupedTable. I know we have
>> AvroPathPerKeyTarget for Avro objects.
>> But do we have something equivalent for Pair<String, String>?
>>
>> Please suggest.
>>
>> Thanks,
>>
>> _Nipur
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
