spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <>
Subject [jira] [Commented] (SPARK-20394) Replication factor value Not changing properly
Date Mon, 10 Jul 2017 23:01:00 GMT


Marcelo Vanzin commented on SPARK-20394:

Have you tried setting the replication to 1 in your {{hdfs-site.xml}}?

IIRC Spark 1.6 doesn't propagate the HiveContext configuration to the Hive library in some cases.
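For reference, a minimal sketch of the suggested setting, assuming a standard client-side {{hdfs-site.xml}} (the property name is the stock HDFS one; the value shown is the replication factor the reporter wants):

```xml
<!-- hdfs-site.xml on the client classpath; sketch only -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

Because this file is read by the HDFS client itself, it applies regardless of whether Spark forwards the HiveContext configuration to the embedded Hive library.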

> Replication factor value Not changing properly
> ----------------------------------------------
>                 Key: SPARK-20394
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Submit
>    Affects Versions: 1.6.0
>            Reporter: Kannan Subramanian
> To save a SparkSQL DataFrame to a persistent Hive table, I used the steps below:
> a) registerTempTable on the DataFrame as a tempTable
> b) create table <table name> (cols....) partitioned by (col1, col2) stored as parquet
> c) insert into <table name> partition(col1, col2) select * from tempTable
> I have set dfs.replication to "1" on the hiveContext object, but it did not work
properly. That is, the replica count is 1 for 80% of the generated parquet files on HDFS, while
the remaining 20% keep the default replication of 3. I am not sure why the replication factor
is not applied to all of the generated parquet files. Please let me know if you have any
suggestions or solutions.
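The quoted steps can be sketched in Spark 1.6 Scala roughly as follows. This is a sketch only: the table and column names are hypothetical, `df` stands for the reporter's DataFrame, and the `setConf` call is the approach the reporter describes (which, per the comment above, may not reach the Hive write path in 1.6):

```scala
// Assumes a running Spark 1.6 application with a HiveContext named
// hiveContext and an existing DataFrame named df. Names are hypothetical.
hiveContext.setConf("dfs.replication", "1") // the setting the reporter tried

// a) register the DataFrame as a temporary table
df.registerTempTable("tempTable")

// b) create the partitioned Hive table stored as Parquet
hiveContext.sql(
  """CREATE TABLE my_table (value STRING)
    |PARTITIONED BY (col1 STRING, col2 STRING)
    |STORED AS PARQUET""".stripMargin)

// c) insert from the temp table using dynamic partitioning
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
hiveContext.sql(
  """INSERT INTO TABLE my_table PARTITION (col1, col2)
    |SELECT * FROM tempTable""".stripMargin)
```

The split replication the reporter sees (1 for most files, 3 for the rest) is consistent with some files being written through a code path that honors the HiveContext setting and others through the Hive library, which only sees the cluster default.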

This message was sent by Atlassian JIRA

