spark-user mailing list archives

From Wei Wei <>
Subject invalidate caching for hadoopFile input?
Date Tue, 21 Apr 2015 03:15:28 GMT
Hey folks,

I am trying to load a directory of avro files like this in spark-shell:

val data = sqlContext.avroFile("hdfs://path/to/dir/*").cache

This works fine, but after more files are uploaded to that directory,
re-running these two lines yields the same result. I suspect there is
some metadata caching in HadoopRDD, so the new files are ignored.

Does anyone know why this is happening? Is there a way to force a
reload of the whole directory without restarting spark-shell?
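[Archive note: a minimal sketch of the usual workaround, assuming the spark-avro sqlContext.avroFile API and the `data` val from the question above. Unpersisting the cached copy and building a fresh RDD forces the input directory to be listed again, since the file splits are computed when the new RDD is constructed, not taken from the old cached one.]

```scala
// Sketch only, assuming the spark-avro package's sqlContext.avroFile API.
// Evict the stale cached partitions first:
data.unpersist()

// Constructing a new RDD re-lists the directory, picking up new files.
// The path is the same wildcard path from the original question.
val refreshed = sqlContext.avroFile("hdfs://path/to/dir/*").cache()

// Run an action to materialize the refreshed cache:
refreshed.count()
```

Note that calling .cache on the same RDD reference again would not help: caching is keyed to the RDD, and the RDD's input splits are fixed when it is created.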


