spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: sc.textFile() on windows cannot access UNC path
Date Thu, 12 Mar 2015 06:05:32 GMT
That sounds like the way to do it. Could you try accessing a file on the UNC
path with native Java NIO code, and make sure it is able to access it with the
URI file:////10.196.119.230/folder1/abc.txt?
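A minimal sketch of such a check, independent of Spark (the helper name readWithNio is illustrative, not from this thread; the UNC path in the comment is the one under discussion):

```scala
import java.nio.file.{Files, Path, Paths}

// Read a file through java.nio (JDK 7+). On Windows, a UNC path can be
// passed directly, e.g.:
//   readWithNio(Paths.get("""\\10.196.119.230\folder1\abc.txt"""))
def readWithNio(path: Path): Seq[String] = {
  val lines = Files.readAllLines(path) // java.util.List[String]
  (0 until lines.size).map(lines.get)
}

// Demonstrate with a temp file so the sketch runs anywhere.
val tmp = Files.createTempFile("abc", ".txt")
Files.write(tmp, "hello\nworld\n".getBytes("UTF-8"))
println(readWithNio(tmp))
```

If this reads the file through the UNC path but sc.textFile() does not, the problem is confined to Hadoop's Path handling rather than the JVM or the share itself.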

Thanks
Best Regards

On Wed, Mar 11, 2015 at 7:45 PM, Wang, Ningjun (LNG-NPV) <
ningjun.wang@lexisnexis.com> wrote:

>  Thanks for the reference. Is the following procedure correct?
>
>
>
> 1. Copy the Hadoop source code
> org.apache.hadoop.mapreduce.lib.input.TextInputFormat.java as my own
> class, e.g. UncTextInputFormat.java
>
> 2. Modify UncTextInputFormat.java to handle the UNC path
>
> 3. Call sc.newAPIHadoopFile(…) with:
>
>
>
> sc.newAPIHadoopFile[LongWritable, Text, UncTextInputFormat](
>   "file:////10.196.119.230/folder1/abc.txt",
>   classOf[UncTextInputFormat],
>   classOf[LongWritable],
>   classOf[Text], conf)
>
>
>
> Ningjun
>
>
>
> From: Akhil Das [mailto:akhil@sigmoidanalytics.com]
> Sent: Wednesday, March 11, 2015 2:40 AM
> To: Wang, Ningjun (LNG-NPV)
> Cc: java8964; user@spark.apache.org
>
> Subject: Re: sc.textFile() on windows cannot access UNC path
>
>
>
>
> I don't have a complete example for your use case, but you can find a lot of
> code showing how to use sc.newAPIHadoopFile here
> <https://github.com/search?q=sc.newAPIHadoopFile&type=Code&utf8=%E2%9C%93>
>
>
>   Thanks
>
> Best Regards
>
>
>
> On Tue, Mar 10, 2015 at 7:37 PM, Wang, Ningjun (LNG-NPV) <
> ningjun.wang@lexisnexis.com> wrote:
>
> This sounds like the right approach. Is there any sample code showing how
> to use sc.newAPIHadoopFile  ? I am new to Spark and don’t know much about
> Hadoop. I just want to read a text file from UNC path into an RDD.
>
>
>
> Thanks
>
>
>
>
>
> From: Akhil Das [mailto:akhil@sigmoidanalytics.com]
> Sent: Tuesday, March 10, 2015 9:14 AM
> To: java8964
> Cc: Wang, Ningjun (LNG-NPV); user@spark.apache.org
> Subject: Re: sc.textFile() on windows cannot access UNC path
>
>
>
> You can create your own input reader (using java.nio.*) and pass it to
> sc.newAPIHadoopFile when reading.
>
>
>
>
>   Thanks
>
> Best Regards
>
>
>
> On Tue, Mar 10, 2015 at 6:28 PM, java8964 <java8964@hotmail.com> wrote:
>
> I think the workaround is clear.
>
>
>
> Use JDK 7 and implement your own saveAsRemoteWinText() using
> java.nio.file.
>
>
>
> Yong
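A minimal, Spark-free sketch of that workaround (the function name saveAsRemoteWinText and the collect-then-write approach are illustrative assumptions, not code from this thread):

```scala
import java.nio.file.{Files, Path, Paths}

// Write lines with java.nio.file (JDK 7+), which understands UNC paths on
// Windows. With Spark, one would collect a small RDD on the driver first, e.g.
//   saveAsRemoteWinText(rdd.collect().toSeq,
//     Paths.get("""\\10.196.119.230\folder1\out.txt"""))
def saveAsRemoteWinText(lines: Seq[String], target: Path): Unit = {
  val bytes = (lines.mkString("\n") + "\n").getBytes("UTF-8")
  Files.write(target, bytes)
}

// Demonstrate with a temp file so the sketch runs anywhere.
val tmp = Files.createTempFile("save", ".txt")
saveAsRemoteWinText(Seq("a", "b"), tmp)
```

Collecting on the driver only makes sense for data that fits in driver memory; the point of the sketch is that the actual file I/O goes through java.nio.file rather than Hadoop's Path.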
>  ------------------------------
>
> From: ningjun.wang@lexisnexis.com
> To: java8964@hotmail.com; user@spark.apache.org
> Subject: RE: sc.textFile() on windows cannot access UNC path
> Date: Tue, 10 Mar 2015 03:02:37 +0000
>
>
>
> Hi Yong
>
>
>
> Thanks for the reply. Yes, it works with a local drive letter. But I really
> need to use a UNC path because the path is provided as input at runtime. I
> cannot dynamically assign a drive letter to an arbitrary UNC path at runtime.
>
>
>
> Is there any workaround so that I can use a UNC path with sc.textFile(…)?
>
>
>
>
>
> Ningjun
>
>
>
>
>
> From: java8964 [mailto:java8964@hotmail.com]
> Sent: Monday, March 09, 2015 5:33 PM
> To: Wang, Ningjun (LNG-NPV); user@spark.apache.org
> Subject: RE: sc.textFile() on windows cannot access UNC path
>
>
>
> This is a Java problem, not really a Spark one.
>
>
>
> From this page:
> http://stackoverflow.com/questions/18520972/converting-java-file-url-to-file-path-platform-independent-including-u
>
>
>
> You can see that using java.nio.* on JDK 7 will fix this issue. But the Path
> class in Hadoop uses java.io.* instead of java.nio.
>
>
>
> You need to manually mount your Windows remote share as a local drive, like
> "Z:"; then it should work.
>
>
>
> Yong
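For reference, mapping the share from this thread to a drive letter on Windows would look like this (a sketch; credentials and options omitted):

```shell
net use Z: \\10.196.119.230\folder1
```

After mapping, a path such as file:///Z:/abc.txt should then be usable with sc.textFile, since it no longer needs Hadoop to understand the UNC form.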
>  ------------------------------
>
> From: ningjun.wang@lexisnexis.com
> To: user@spark.apache.org
> Subject: sc.textFile() on windows cannot access UNC path
> Date: Mon, 9 Mar 2015 21:09:38 +0000
>
> I am running Spark on Windows 2008 R2. I use sc.textFile() to load a text
> file using a UNC path, but it does not work.
>
>
>
> sc.textFile(raw"file:////10.196.119.230/folder1/abc.txt", 4).count()
>
>
>
> Input path does not exist: file:/10.196.119.230/folder1/abc.txt
>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> file:/10.196.119.230/tar/Enron/enron-207-short.load
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>     at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
>     at org.apache.spark.rdd.RDD.count(RDD.scala:910)
>     at ltn.analytics.tests.IndexTest$$anonfun$3.apply$mcV$sp(IndexTest.scala:104)
>     at ltn.analytics.tests.IndexTest$$anonfun$3.apply(IndexTest.scala:103)
>     at ltn.analytics.tests.IndexTest$$anonfun$3.apply(IndexTest.scala:103)
>     at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>     at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>     at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>     at org.scalatest.Transformer.apply(Transformer.scala:22)
>     at org.scalatest.Transformer.apply(Transformer.scala:20)
>     at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>     at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
>     at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
>     at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>     at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>     at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>     at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>     at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>     at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>     at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>     at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>     at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>     at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>     at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>     at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>     at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>     at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>     at org.scalatest.Suite$class.run(Suite.scala:1424)
>     at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>     at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>     at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>     at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>     at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>     at ltn.analytics.tests.IndexTest.org$scalatest$BeforeAndAfterAll$$super$run(IndexTest.scala:15)
>     at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>     at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>     at ltn.analytics.tests.IndexTest.run(IndexTest.scala:15)
>     at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
>     at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
>     at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
>     at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
>     at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
>     at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
>     at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
>     at org.scalatest.tools.Runner$.run(Runner.scala:883)
>     at org.scalatest.tools.Runner.run(Runner.scala)
>     at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:137)
>     at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
>
>
>
>
>
> The path is correct; I can open Windows Explorer and enter the following
> path to open the text file:
>
> \\10.196.119.230\folder1\abc.txt
>
>
>
> I have tried using 3 slashes and 2 slashes, and always got the same error
>
>
>
> sc.textFile(raw"file:///10.196.119.230/folder1/abc.txt", 4).count()
>
> sc.textFile(raw"file://10.196.119.230/folder1/abc.txt", 4).count()
>
>
>
> Please advise.
>
> Ningjun
>
>
>
>
>
