crunch-user mailing list archives

From Nipur Patodi <er.nipur.pat...@gmail.com>
Subject Re: multiple output format with crunch pipeline
Date Wed, 05 Aug 2015 06:34:23 GMT
Hey Josh,

Apologies for adding confusion.

The flow should be something like this:

        Pipeline pipeline = new MRPipeline(AggregatorDriver.class, conf);

        PCollection<String> record = pipeline.read(
            From.textFile(inputPath, WritableTypeFamily.getInstance().strings()));

        PTable<String, String> outputTable = record.parallelDo(
            new Processor(),
            Writables.tableOf(Writables.strings(), Writables.strings()));

        PGroupedTable<String, String> groupTable = outputTable.groupByKey();

        // Need to implement
        FilePathPerKeyTarget target = new FilePathPerKeyTarget(path);

        pipeline.write(groupTable, target, WriteMode.APPEND);
        PipelineResult result = pipeline.done();
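
To make the intended layout concrete, here is a minimal plain-Java sketch of what I would expect such a target to produce. It does not use any Crunch API: PathPerKeySketch and writePerKey are hypothetical names for illustration only, and the grouped table is stood in for by an ordinary Map on the local file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PathPerKeySketch {

    // Hypothetical helper (not part of Crunch): for every key, write its
    // values one per line to <root>/<key>/part-m-00000.
    public static void writePerKey(Path root, Map<String, List<String>> grouped)
            throws IOException {
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            // Drop leading slashes so the key resolves underneath root
            // instead of replacing it as an absolute path.
            String relativeKey = entry.getKey().replaceFirst("^/+", "");
            Path dir = root.resolve(relativeKey);
            Files.createDirectories(dir);
            Files.write(dir.resolve("part-m-00000"), entry.getValue(),
                    StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the grouped table: key = output path, values = records.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("/root/test", Arrays.asList("data1", "data2"));
        grouped.put("/root/test2", Arrays.asList("data3", "data4"));

        Path root = Files.createTempDirectory("path-per-key");
        writePerKey(root, grouped);
        System.out.println("Wrote per-key files under " + root);
    }
}
```

The sketch only shows the key-to-directory mapping; a real target would have to do the equivalent inside the job's output format rather than on the local file system.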




On Wed, Aug 5, 2015 at 11:58 AM, Nipur Patodi <er.nipur.patodi@gmail.com>
wrote:

> hey Josh,
>
> I want to write the output of a PGroupedTable<String, String> to multiple
> files, where the file path is the key of the PGroupedTable.
> For example, given PGroupedTable<String, String> table =
>     [/root/test,  {data1, data2}],
>     [/root/test2, {data3, data4}]
>
> the output should be
> $ hadoop fs -cat /root/test/part-m-00000
> data1
> data2
>
> $ hadoop fs -cat /root/test2/part-m-00000
> data3
> data4
>
>
> Thanks,
>
> _Nipur
>
>
>
> On Wed, Aug 5, 2015 at 11:27 AM, Josh Wills <jwills@cloudera.com> wrote:
>
>> Hey Nipur,
>>
>> I'm not quite sure what you mean: do you want to output a PTable<String,
>> String> via an AvroPathPerKeyTarget? Or a PTable<String, Pair<String,
>> String>>?
>>
>> J
>>
>> On Tue, Aug 4, 2015 at 10:49 PM, Nipur Patodi <er.nipur.patodi@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I am trying to write PGroupedTable contents to multiple output files
>>> based on the key of the PGroupedTable. I know we have AvroPathPerKeyTarget
>>> for Avro objects.
>>> But do we have something equivalent for Pair<String, String>?
>>>
>>> Please suggest.
>>>
>>> Thanks,
>>>
>>> _Nipur
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>
>
