crunch-user mailing list archives

From Nipur Patodi <er.nipur.pat...@gmail.com>
Subject Re: multiple output format with crunch pipeline
Date Wed, 05 Aug 2015 06:38:03 GMT
Hey Josh,

Thanks, AvroPathPerKeyTarget works just fine. But I am looking for
something equivalent to it for Writable objects.

Thanks much,

_Nipur

On Wed, Aug 5, 2015 at 12:03 PM, Josh Wills <jwills@cloudera.com> wrote:

> So that seems fine, although we just now added the support for creating
> child directories in the keys:
> https://issues.apache.org/jira/browse/CRUNCH-543
>
> Are you running into a problem using the AvroPathPerKeyTarget as the
> output of that table once you've called ungroup() on it?
>
> On Tue, Aug 4, 2015 at 11:28 PM, Nipur Patodi <er.nipur.patodi@gmail.com>
> wrote:
>
>> Hey Josh,
>>
>> I want to write the output of a PGroupedTable<String, String> to multiple
>> files, where the file path is the key of the PGroupedTable.
>> Example: PGroupedTable<String, String> table =
>>
>>  [ /root/test, { data1,data2}],
>>
>>  [/root/test2,{data3,data4}]
>>
>> The output should be:
>> $ hadoop fs -cat /root/test/part-m-00000
>> data1
>> data2
>>
>> $ hadoop fs -cat /root/test2/part-m-00000
>> data3
>> data4
>>
>>
>> Thanks,
>>
>> _Nipur
>>
>>
>>
>> On Wed, Aug 5, 2015 at 11:27 AM, Josh Wills <jwills@cloudera.com> wrote:
>>
>>> Hey Nipur,
>>>
>>> I'm not quite sure what you mean: do you want to output a PTable<String,
>>> String> via an AvroPathPerKeyTarget? Or a PTable<String, Pair<String,
>>> String>>?
>>>
>>> J
>>>
>>> On Tue, Aug 4, 2015 at 10:49 PM, Nipur Patodi <er.nipur.patodi@gmail.com
>>> > wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am trying to write the contents of a PGroupedTable to multiple output
>>>> files based on the key of the PGroupedTable. I know we have
>>>> AvroPathPerKeyTarget for Avro objects.
>>>> But do we have something equivalent for Pair<String, String>?
>>>>
>>>> Please suggest.
>>>>
>>>> Thanks,
>>>>
>>>> _Nipur
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>
>>
>>
>
>
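
The layout requested in the thread (one directory per key, with that key's values written one per line to part-m-00000) can be illustrated outside Crunch with plain Java. This is only a local-filesystem sketch of the expected result, not Crunch API and not a replacement for a path-per-key target. The class name PathPerKeyLayout and the helper writePerKey are hypothetical, and the leading slash of each key is stripped (an assumption) so that the directories land under a temporary base directory instead of the filesystem root.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PathPerKeyLayout {

    // Hypothetical helper: writes each key's values, one per line, to
    // <baseDir>/<key>/part-m-00000 on the local filesystem.
    public static void writePerKey(Path baseDir, Map<String, List<String>> grouped)
            throws IOException {
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            // Assumption: strip leading slashes so "/root/test" resolves under baseDir.
            String relative = entry.getKey().replaceAll("^/+", "");
            Path dir = baseDir.resolve(relative);
            Files.createDirectories(dir);
            Files.write(dir.resolve("part-m-00000"), entry.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Same grouped data as the example in the thread.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("/root/test", Arrays.asList("data1", "data2"));
        grouped.put("/root/test2", Arrays.asList("data3", "data4"));

        Path base = Files.createTempDirectory("per-key-demo");
        writePerKey(base, grouped);

        // Print the lines written for the first key.
        System.out.println(Files.readAllLines(base.resolve("root/test/part-m-00000")));
    }
}
```

In an actual Crunch job the grouping and file placement would be done by the pipeline and its output target; the sketch only pins down what "file path is the key" means for the data in the example.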
