spark-user mailing list archives

From Evan Chan <velvia.git...@gmail.com>
Subject [Tachyon] Error reading from Parquet files in HDFS
Date Thu, 21 Aug 2014 19:22:29 GMT
Spark 1.0.2, Tachyon 0.4.1, Hadoop 1.0  (standard EC2 config)

scala> val gdeltT =
sqlContext.parquetFile("tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/")
14/08/21 19:07:14 INFO :
initialize(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005,
Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, hdfs-default.xml, hdfs-site.xml). Connecting to
Tachyon: tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005
14/08/21 19:07:14 INFO : Trying to connect master @ /172.31.42.40:19998
14/08/21 19:07:14 INFO : User registered at the master
ip-172-31-42-40.us-west-2.compute.internal/172.31.42.40:19998 got
UserId 14
14/08/21 19:07:14 INFO : Trying to get local worker host :
ip-172-31-42-40.us-west-2.compute.internal
14/08/21 19:07:14 INFO : No local worker on
ip-172-31-42-40.us-west-2.compute.internal
14/08/21 19:07:14 INFO : Connecting remote worker @
ip-172-31-47-74/172.31.47.74:29998
14/08/21 19:07:14 INFO : tachyon://172.31.42.40:19998
tachyon://172.31.42.40:19998
hdfs://ec2-54-213-113-173.us-west-2.compute.amazonaws.com:9000
14/08/21 19:07:14 INFO :
getFileStatus(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005):
HDFS Path: hdfs://ec2-54-213-113-173.us-west-2.compute.amazonaws.com:9000/gdelt-parquet/1979-2005
TPath: tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005
14/08/21 19:07:14 INFO : tachyon.client.TachyonFS@4b05b3ff
hdfs://ec2-54-213-113-173.us-west-2.compute.amazonaws.com:9000
/gdelt-parquet/1979-2005 tachyon.PrefixList@636c50d3
14/08/21 19:07:14 WARN : tachyon.home is not set. Using
/mnt/tachyon_default_home as the default value.
14/08/21 19:07:14 INFO : Get: /gdelt-parquet/1979-2005/_SUCCESS
14/08/21 19:07:14 INFO : Get: /gdelt-parquet/1979-2005/_metadata
14/08/21 19:07:14 INFO : Get: /gdelt-parquet/1979-2005/part-r-1.parquet

....

14/08/21 19:07:14 INFO :
getFileStatus(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/_metadata):
HDFS Path: hdfs://ec2-54-213-113-173.us-west-2.compute.amazonaws.com:9000/gdelt-parquet/1979-2005/_metadata
TPath: tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/_metadata
14/08/21 19:07:14 INFO :
open(tachyon://172.31.42.40:19998/gdelt-parquet/1979-2005/_metadata,
65536)
14/08/21 19:07:14 ERROR : The machine does not have any local worker.
14/08/21 19:07:14 ERROR : Reading from HDFS directly
14/08/21 19:07:14 ERROR : Reading from HDFS directly
java.io.IOException: can not read class parquet.format.FileMetaData: null
at parquet.format.Util.read(Util.java:50)
at parquet.format.Util.readFileMetaData(Util.java:34)
at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:310)
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:296)


I'm not sure why it reports that the machine has no local worker, since
the Tachyon UI shows all 8 nodes as up.
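
Since the exception comes out of ParquetFileReader.readFooter, one thing
that could be checked is whether the last bytes of _metadata still look
like a Parquet trailer: a 4-byte little-endian footer length followed by
the magic "PAR1". The following is only a minimal sketch, assuming the
file has first been copied to local disk. parquetFooterLength is a
hypothetical helper, not part of Spark, Tachyon, or Parquet.

```scala
import java.io.RandomAccessFile

// Hypothetical helper (an assumption for illustration only).
// Returns Some(footerLength) if the local file ends with a Parquet-style
// trailer (4-byte little-endian length + "PAR1"), otherwise None.
def parquetFooterLength(localPath: String): Option[Int] = {
  val file = new RandomAccessFile(localPath, "r")
  try {
    val size = file.length()
    // Smallest possible file: leading magic (4) + length (4) + trailing magic (4).
    if (size < 12) None
    else {
      val tail = new Array[Byte](8)
      file.seek(size - 8)
      file.readFully(tail)
      val magic = new String(tail, 4, 4, "US-ASCII")
      // Decode the footer length as a little-endian 32-bit integer.
      val len = (tail(0) & 0xff) |
        ((tail(1) & 0xff) << 8) |
        ((tail(2) & 0xff) << 16) |
        ((tail(3) & 0xff) << 24)
      // The footer must fit between the leading magic and the trailer.
      if (magic == "PAR1" && len > 0 && len <= size - 12) Some(len) else None
    }
  } finally {
    file.close()
  }
}
```

If this returned None for a copy fetched through Tachyon but Some(n) for
a copy fetched straight from HDFS, that would point at the read path
rather than at the file itself.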


