spark-user mailing list archives

From "Bui, Tri" <Tri....@VerizonWireless.com.INVALID>
Subject RE: hdfs streaming context
Date Mon, 01 Dec 2014 23:02:31 GMT
For the streaming example I am working on, it accepted "hdfs:///user/data" without the localhost info.

Let me dig through my hdfs config.
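
For reference, a scheme-only URI like "hdfs:///user/data" resolves against the cluster's default filesystem, so whether it works depends on the HDFS client config. A hypothetical core-site.xml fragment (host and port are assumptions, not taken from this thread) that would make the scheme-only form resolve:

```xml
<!-- core-site.xml: sets the default filesystem so "hdfs:///path"
     resolves to "hdfs://localhost:8020/path" (example values). -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```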





-----Original Message-----
From: Sean Owen [mailto:sowen@cloudera.com] 
Sent: Monday, December 01, 2014 4:50 PM
To: Benjamin Cuthbert
Cc: user@spark.apache.org
Subject: Re: hdfs streaming context

Yes, in fact, that's the only way it works. You need "hdfs://localhost:8020/user/data", I believe.

(No, it's not correct to write "hdfs:///...".)
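
A minimal sketch of the corrected job, assuming the Spark 1.x streaming API and a NameNode at localhost:8020 (both taken from the thread below): textFileStream monitors a directory for new files, so the directory itself is passed rather than a glob, and the result of filter() is assigned before printing.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Pass the directory with full scheme and authority; FileInputDStream
    // watches the directory for newly created files, so no "/*" glob is needed.
    val lines = ssc.textFileStream("hdfs://localhost:8020/user/data")

    // filter() returns a new DStream; assign it, otherwise print() on the
    // original stream shows the unfiltered lines.
    val geLines = lines.filter(_.contains("GE"))
    geLines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```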

On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert <cuthbert.ben@gmail.com> wrote:
> All,
>
> Is it possible to stream on HDFS directory and listen for multiple files?
>
> I have tried the following
>
> val sparkConf = new SparkConf().setAppName("HdfsWordCount")
> val ssc = new StreamingContext(sparkConf, Seconds(2))
> val lines = ssc.textFileStream("hdfs://localhost:8020/user/data/*")
> lines.filter(line => line.contains("GE"))
> lines.print()
> ssc.start()
>
> But I get
>
> 14/12/01 21:35:42 ERROR JobScheduler: Error generating jobs for time 1417469742000 ms
> java.io.FileNotFoundException: File hdfs://localhost:8020/user/data/* does not exist.
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:408)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
>         at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
>         at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
