spark-user mailing list archives

From Gurvinder Singh <gurvinder.si...@uninett.no>
Subject Re: reading compress lzo files
Date Sun, 06 Jul 2014 06:22:36 GMT
On 07/06/2014 05:19 AM, Nicholas Chammas wrote:
> On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
> <gurvinder.singh@uninett.no> wrote:
> 
>     csv =
>     sc.newAPIHadoopFile(opts.input,"com.hadoop.mapreduce.LzoTextInputFormat","org.apache.hadoop.io.LongWritable","org.apache.hadoop.io.Text").count()
> 
> Does anyone know what the rough equivalent of this would be in the Scala
> API?
> 
I am not sure; I haven't tested it using Scala.
The com.hadoop.mapreduce.LzoTextInputFormat class is from this package:
https://github.com/twitter/hadoop-lzo

I installed it from the Cloudera "hadoop-lzo" package, together with the
liblzo2-2 Debian package, on all of my workers. Make sure hadoop-lzo.jar
is on the classpath for Spark.
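
For what it is worth, here is a rough, untested sketch of what the Scala
call might look like once the jar is on the classpath. It assumes the
Scala API takes the input format, key, and value as classOf[...] values
rather than as class-name strings. The LzoCount object name and the input
path argument are placeholders of mine, and the first import will only
resolve if hadoop-lzo.jar is visible to the driver and the workers.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.{LongWritable, Text}
// Provided by hadoop-lzo.jar; fails with "object hadoop is not a member
// of package com" if the jar is missing from the classpath.
import com.hadoop.mapreduce.LzoTextInputFormat

// Hypothetical driver object, for illustration only.
object LzoCount {
  def main(args: Array[String]): Unit = {
    // Placeholder: path to the LZO-compressed input, passed as an argument.
    val inputPath = args(0)

    val sc = new SparkContext(new SparkConf().setAppName("LzoCount"))

    // Scala counterpart of the PySpark call: classes instead of strings.
    val records = sc.newAPIHadoopFile(
      inputPath,
      classOf[LzoTextInputFormat],
      classOf[LongWritable],
      classOf[Text])

    // Each record is a (byte offset, line) pair; count them as in PySpark.
    println(records.count())

    sc.stop()
  }
}
```

In spark-shell the SparkContext already exists as sc, so only the imports
and the newAPIHadoopFile call should be needed there.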

- Gurvinder

> I am trying the following, but the first import yields an error on my
> |spark-ec2| cluster:
> 
> |import com.hadoop.mapreduce.LzoTextInputFormat
> import org.apache.hadoop.io.LongWritable
> import org.apache.hadoop.io.Text
> 
> sc.newAPIHadoopFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
> LzoTextInputFormat, LongWritable, Text)
> |
> 
> |scala> import com.hadoop.mapreduce.LzoTextInputFormat
> <console>:12: error: object hadoop is not a member of package com
>        import com.hadoop.mapreduce.LzoTextInputFormat
> |
> 
> Nick
> 