crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Scrunch example project with SBT?
Date Sat, 21 Jun 2014 23:57:08 GMT
Ha! Not the prettiest thing, but it'll do. The CrunchTool trait also has a
done() method, so you can write this instead:

pcol.write(to.textFile(outputPath))
done()
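
For anyone wondering why the write() alone produced nothing: the pipeline only
plans work until it is told to execute. Below is a minimal sketch of that
behavior, assuming a toy LazyPipeline class and a wordCounts helper. Both names
are hypothetical and are not part of the Crunch or Scrunch API; the sketch only
models deferred execution with an in-memory map standing in for the file system.

```scala
import scala.collection.mutable

// Toy stand-in for a lazily planned pipeline. Hypothetical; NOT the Crunch/Scrunch API.
final class LazyPipeline {
  // Writes that have been planned but not yet executed.
  private val pending = mutable.ListBuffer.empty[() => Unit]

  // Simulated file system: output path -> lines written there.
  val outputs = mutable.Map.empty[String, Seq[String]]

  // Registers a write. The by-name `data` is not evaluated until run() or done().
  def write(data: => Seq[String], path: String): Unit = {
    pending += (() => outputs(path) = data)
    ()
  }

  // Executes every pending write, then clears the plan.
  def run(): Unit = {
    pending.foreach(job => job())
    pending.clear()
  }

  // Executes outstanding work; a real pipeline would also clean up here.
  def done(): Unit = run()
}

object LazyPipelineDemo {
  // Hypothetical helper: counts words and formats each as "word<TAB>count".
  def wordCounts(lines: Seq[String]): Seq[String] =
    lines
      .flatMap(_.split("\\W+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .toSeq
      .sortBy(_._1)
      .map { case (word, hits) => s"$word\t${hits.size}" }

  def main(args: Array[String]): Unit = {
    val pipeline = new LazyPipeline
    pipeline.write(wordCounts(Seq("a b a", "b c")), "wordcounts")
    // Prints false: the write is planned, but nothing has been executed yet.
    println(pipeline.outputs.contains("wordcounts"))
    pipeline.done()
    // The output exists only after done() has run the pending write.
    pipeline.outputs("wordcounts").foreach(println)
  }
}
```

Materializing plays the same role as done() in this model: it forces the
planned work to run because the caller needs the values right away.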


On Fri, Jun 20, 2014 at 2:32 PM, Daniel Siegmann <daniel.siegmann@velos.io>
wrote:

> Got it to work like so:
>
>
> read(from.textFile(inputPath)).write(to.textFile(outputPath)).native.getPipeline().done()
>
> Is that the correct way?
>
> Thanks for the help, I have a running word count example now. :-)
>
>
>
> On Fri, Jun 20, 2014 at 4:34 PM, Josh Wills <jwills@cloudera.com> wrote:
>
>> You need to manually call run() or done() to execute the pipeline if
>> you're not materializing the output. The user guide will be useful for the
>> basic concepts, even though it focuses on the Java API.
>> On Jun 20, 2014 1:27 PM, "Daniel Siegmann" <daniel.siegmann@velos.io>
>> wrote:
>>
>>> Thanks Josh! The thrift and protobuf defs were what I was missing. I'm
>>> able to compile and run the code now. I also updated to Scrunch 0.10.0.
>>>
>>> Any idea why it might not write the output? If I have
>>>
>>> countWords(args(0)).materialize.foreach(line => println(s"**** $line"))
>>>
>>> I get all my output, but
>>>
>>> countWords(args(0)).write(to.textFile(args(1)))
>>>
>>> doesn't even create the output directory, even though I see this in my
>>> logs:
>>>
>>> 14/06/20 16:17:47 INFO impl.FileTargetImpl: Will write output files to
>>> new path:
>>> /var/folders/th/7vf9rjqd1955jnwnzg3x9ym40000gn/T/1403295466563-1/wordcounts
>>>
>>> No exceptions or anything. I'm probably missing something obvious. :-(
>>>
>>>
>>> On Thu, Jun 19, 2014 at 6:03 PM, Josh Wills <jwills@cloudera.com> wrote:
>>>
>>>> Here you go: https://github.com/jwills/scrunch-demo
>>>>
>>>> Did this w/Maven; you'll have to forgive me as my SBT-fu isn't great.
>>>> It looks like vanilla Hadoop 1.x doesn't include any thrift/protobuf
>>>> dependencies that Scrunch expects to be present at compile-time; I added
>>>> them as provided dependencies in this example and then verified that I
>>>> could run the -job.jar that I built w/mvn package under Hadoop 1.0.3.
>>>>
>>>> J
>>>>
>>>>
>>>> On Thu, Jun 19, 2014 at 2:33 PM, Daniel Siegmann <
>>>> daniel.siegmann@velos.io> wrote:
>>>>
>>>>> Hi Josh, thanks for the reply.
>>>>>
>>>>>> Which version of Hadoop are you looking to compile against?
>>>>>
>>>>> I think any 1.x version will suffice (our production cluster is MapR).
>>>>>
>>>>> The Spotify comparison is interesting. Too bad they didn't evaluate
>>>>> Scoobi as well. Thanks for the info.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Director of Data Science
>>>> Cloudera <http://www.cloudera.com>
>>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>>
>>>
>>>
>>>
>>> --
>>> Daniel Siegmann, Software Developer
>>> Velos
>>> Accelerating Machine Learning
>>>
>>> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
>>> E: daniel.siegmann@velos.io W: www.velos.io
>>>
>>
>
>
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
