spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Recent spark sc.textFile needs hadoop for folders?!?
Date Fri, 26 Jun 2015 08:07:42 GMT
Yes, Spark Core depends on Hadoop libs, and there is this unfortunate
twist on Windows. You'll still need HADOOP_HOME set appropriately
since Hadoop needs some special binaries to work on Windows.
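
As a rough illustration of the point above, here is a minimal Scala sketch that
checks whether winutils.exe can be found. It assumes Hadoop's usual lookup
order (the hadoop.home.dir system property first, then the HADOOP_HOME
environment variable) and the bin\winutils.exe layout seen in the error
message. WinutilsCheck and its methods are hypothetical names made up for this
example; they are not part of Spark or Hadoop.

```scala
import java.io.File

// Hypothetical helper for illustration only; not a Spark or Hadoop API.
object WinutilsCheck {
  // Resolve the Hadoop home directory: system property first, then the
  // environment variable. Blank values are treated as unset.
  def hadoopHome(props: String => Option[String],
                 env: String => Option[String]): Option[String] =
    props("hadoop.home.dir").orElse(env("HADOOP_HOME")).map(_.trim).filter(_.nonEmpty)

  // Build the <home>/bin/winutils.exe path that the error message refers to.
  def winutilsPath(home: String): File =
    new File(new File(home, "bin"), "winutils.exe")

  def main(args: Array[String]): Unit = {
    val home = hadoopHome(k => Option(System.getProperty(k)),
                          k => Option(System.getenv(k)))
    home match {
      case None =>
        println("Neither hadoop.home.dir nor HADOOP_HOME is set")
      case Some(h) =>
        val exe = winutilsPath(h)
        println(s"${exe.getPath} exists: ${exe.isFile}")
    }
  }
}
```

In an application, an alternative to the environment variable is to call
System.setProperty("hadoop.home.dir", ...) with your own Hadoop directory
before the SparkContext is created. The lookup runs in the static initializer
of org.apache.hadoop.util.Shell, so it has to be set before that class loads.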

On Fri, Jun 26, 2015 at 11:06 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
> You just need to set HADOOP_HOME, which appears to be null in the
> stack trace. If you don't have winutils.exe, you can download it
> and put it in the bin folder under HADOOP_HOME.
>
> Thanks
> Best Regards
>
> On Thu, Jun 25, 2015 at 11:30 PM, Ashic Mahtab <ashic@live.com> wrote:
>>
>> Hello,
>> Just trying out spark 1.4 (we're using 1.1 at present). On Windows, I've
>> noticed the following:
>>
>> * On 1.4, sc.textFile("D:\\folder\\").collect() fails from both
>> spark-shell.cmd and when running a scala application referencing the
>> spark-core package from maven.
>> * sc.textFile("D:\\folder\\file.txt").collect() succeeds.
>> * On 1.1, both succeed.
>> * When referencing the binaries in the scala application, this is the
>> error:
>>
>> 15/06/25 18:30:13 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
>> java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
>> at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
>> at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
>> at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
>> at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
>> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
>> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:978)
>> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:978)
>> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
>> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
>>
>> This seems quite strange... Is this a known issue? Worse, is this a
>> feature? I shouldn't have to be using Hadoop at all... I just want to
>> process some files and data in Cassandra.
>>
>> Regards,
>> Ashic.
>>
>
