carbondata-issues mailing list archives

From ravipesala <...@git.apache.org>
Subject [GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...
Date Thu, 18 Jan 2018 04:46:20 GMT
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1825
  
    @xuchuanyin There is a reason why we copy the file instead of writing it directly to HDFS:
    1. We make sure that one complete carbondata file goes into exactly one HDFS block: when
copying it from local disk to HDFS, we specify the block size. Otherwise query performance
suffers a lot.
    2. It removes the overhead of writing to HDFS directly (which internally also writes to
the replicas), and the copy runs in a separate thread so it does not block the main loading flow.
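A minimal stdlib-only Java sketch of the two points above. It is not CarbonData's actual code. The class `LocalThenCopyWriter` and its methods are hypothetical names. A local directory stands in for HDFS, and the rule that the block size is rounded up to a multiple of 512 bytes is an assumption based on HDFS's default checksum chunk size. Each finished data file is handed to a background thread, which picks a block size large enough for the whole file to fit in one block and then copies it to the store. The loading thread does not wait for the copy.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch: write data files locally, then copy them to the store in a
 * background thread. A local directory stands in for HDFS; a real implementation
 * would create the target file through Hadoop's FileSystem API and pass the
 * computed block size at creation time.
 */
class LocalThenCopyWriter implements AutoCloseable {
    // Assumption: HDFS block sizes must be a multiple of the checksum chunk (512 bytes by default).
    static final long CHECKSUM_CHUNK = 512L;

    // Single background thread so copies never block the loading thread.
    private final ExecutorService copier = Executors.newSingleThreadExecutor();
    private final Path storeDir;

    public LocalThenCopyWriter(Path storeDir) {
        this.storeDir = storeDir;
    }

    /** Smallest chunk-aligned block size >= both the configured size and the file size. */
    static long blockSizeForSingleBlock(long configuredBlockSize, long fileSize) {
        long size = Math.max(configuredBlockSize, fileSize);
        // Round up to the next multiple of the checksum chunk.
        return ((size + CHECKSUM_CHUNK - 1) / CHECKSUM_CHUNK) * CHECKSUM_CHUNK;
    }

    /** Queue a finished local file for copying; returns the block size that was chosen. */
    public Future<Long> copyAsync(Path localFile, long configuredBlockSize) {
        return copier.submit(() -> {
            long fileSize = Files.size(localFile);
            long blockSize = blockSizeForSingleBlock(configuredBlockSize, fileSize);
            // With HDFS, blockSize would be passed when creating the target file here,
            // so the whole file lands in a single block.
            Path target = storeDir.resolve(localFile.getFileName());
            Files.copy(localFile, target, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(localFile); // local copy is no longer needed
            return blockSize;
        });
    }

    @Override
    public void close() {
        // Let already-queued copies finish, but accept no new ones.
        copier.shutdown();
    }
}
```

The loader calls `copyAsync` as soon as a file is closed and keeps writing the next file. Only the final commit step needs to wait on the returned `Future`s.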


---
