crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: multiple output format with crunch pipeline
Date Wed, 05 Aug 2015 06:39:19 GMT
Yeah, that we don't have right now. Writing custom Targets for this is
do-able (https://issues.apache.org/jira/browse/CRUNCH-555 ) but it isn't
super-fun.

J

On Tue, Aug 4, 2015 at 11:38 PM, Nipur Patodi <er.nipur.patodi@gmail.com>
wrote:

> Hey Josh,
>
> Thanks, AvroPathPerKeyTarget works just fine. But I am looking for
> something equivalent to it for Writable objects.
>
> Thanks much,
>
> _Nipur
>
> On Wed, Aug 5, 2015 at 12:03 PM, Josh Wills <jwills@cloudera.com> wrote:
>
>> So that seems fine, although we just now added the support for creating
>> child directories in the keys:
>> https://issues.apache.org/jira/browse/CRUNCH-543
>>
>> Are you running into a problem using the AvroPathPerKeyTarget as the
>> output of that table once you've called ungroup() on it?
>>
>> On Tue, Aug 4, 2015 at 11:28 PM, Nipur Patodi <er.nipur.patodi@gmail.com>
>> wrote:
>>
>>> hey Josh,
>>>
>>> I want to write the output of a PGroupedTable<String, String> to multiple
>>> files, where the file path is the key of the PGroupedTable.
>>> For example, given PGroupedTable<String, String> table =
>>>
>>>  [/root/test, {data1, data2}],
>>>
>>>  [/root/test2, {data3, data4}]
>>>
>>> the output should be
>>> $ hadoop fs -cat /root/test/part-m-00000
>>> data1
>>> data2
>>>
>>> $ hadoop fs -cat /root/test2/part-m-00000
>>> data3
>>> data4
>>>
>>>
>>> Thanks,
>>>
>>> _Nipur
>>>
>>>
>>>
>>> On Wed, Aug 5, 2015 at 11:27 AM, Josh Wills <jwills@cloudera.com> wrote:
>>>
>>>> Hey Nipur,
>>>>
>>>> I'm not quite sure what you mean: do you want to output a
>>>> PTable<String, String> via an AvroPathPerKeyTarget? Or a PTable<String,
>>>> Pair<String, String>>?
>>>>
>>>> J
>>>>
>>>> On Tue, Aug 4, 2015 at 10:49 PM, Nipur Patodi <
>>>> er.nipur.patodi@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am trying to write the contents of a PGroupedTable to multiple output
>>>>> files based on the key of the PGroupedTable. I know we have
>>>>> AvroPathPerKeyTarget for Avro objects.
>>>>> But do we have something equivalent for Pair<String, String>?
>>>>>
>>>>> Please suggest.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> _Nipur
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Director of Data Science
>>>> Cloudera <http://www.cloudera.com>
>>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>>
>>>
>>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
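
For readers following the thread, the layout Nipur asks for (one directory per key, with that key's values written one per line to a part file) can be illustrated with a small local-filesystem sketch. This is not Crunch code and not a Crunch Target: the class PathPerKeySketch and the helper writePerKey are hypothetical names made up for illustration, the in-memory Map stands in for the grouped table, and the part file name part-m-00000 is simply copied from the example in the thread. Within Crunch itself, the approach suggested above is to call ungroup() and write through AvroPathPerKeyTarget for Avro data; for Writables a custom Target would be needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PathPerKeySketch {

    // Hypothetical helper (not part of Crunch): writes each key's values,
    // one per line, to <root>/<key>/part-m-00000 on the local filesystem.
    public static void writePerKey(Path root, Map<String, List<String>> grouped)
            throws IOException {
        Path base = root.toAbsolutePath().normalize();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            // Drop leading slashes so a key like "/root/test" resolves under base.
            String relative = entry.getKey().replaceFirst("^/+", "");
            Path dir = base.resolve(relative).normalize();
            // Refuse keys that would escape the output root (e.g. "../x").
            if (!dir.startsWith(base)) {
                throw new IllegalArgumentException("Key escapes output root: " + entry.getKey());
            }
            Files.createDirectories(dir);
            Files.write(dir.resolve("part-m-00000"), entry.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the grouped table from the thread's example.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("/root/test", Arrays.asList("data1", "data2"));
        grouped.put("/root/test2", Arrays.asList("data3", "data4"));

        Path root = Files.createTempDirectory("path-per-key");
        writePerKey(root, grouped);

        // Read the files back to show the per-key layout.
        System.out.println(Files.readAllLines(root.resolve("root/test/part-m-00000")));
        System.out.println(Files.readAllLines(root.resolve("root/test2/part-m-00000")));
    }
}
```

The sketch only shows the target directory structure; a real Crunch Target would additionally have to configure a Hadoop OutputFormat and handle multiple writers per task, which is the part described above as not super-fun.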
