Sorry, Prabhu, for hijacking this discussion a bit. I wonder what the best practice is for loading data into HDFS in general. Given the size of the data (often gigabytes or terabytes), how are storage and time constraints handled?
If anybody can share their experiences or best practices, it would be great!
If the data is not a single file, you can upload the files to HDFS in parallel using multiple threads.
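A minimal sketch of that idea, assuming the `hdfs` command-line client is on the PATH and that each file is copied with `hdfs dfs -put`. The helper names (`build_put_command`, `upload_all`), the worker count, and the example paths are hypothetical and only for illustration; the right number of threads depends on your network and cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_put_command(local_path, hdfs_dir):
    # `hdfs dfs -put <localsrc> <dst>` copies one local file into HDFS.
    return ["hdfs", "dfs", "-put", str(local_path), hdfs_dir]


def upload_all(local_dir, hdfs_dir, workers=4, runner=subprocess.run):
    # Collect regular files only; subdirectories are skipped in this sketch.
    files = sorted(p for p in Path(local_dir).iterdir() if p.is_file())
    commands = [build_put_command(p, hdfs_dir) for p in files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker thread blocks on its own `hdfs dfs -put` process,
        # so up to `workers` uploads run at the same time.
        return list(pool.map(lambda cmd: runner(cmd, check=True), commands))
```

For example, `upload_all("/local/export", "/user/me/incoming", workers=8)` would start up to eight concurrent uploads. The `runner` parameter exists only so the function can be tried without a cluster by passing a stand-in for `subprocess.run`.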
On Wed, Sep 5, 2012 at 7:21 AM, prabhu K <email@example.com> wrote:
Please clarify the questions below.
1. To load one petabyte of data into HDFS/Hive within 10 minutes, how many slave machines (DataNodes) are required?
2. To load one petabyte of data into HDFS/Hive within 10 minutes, what configuration setup is needed for cloud computing?
Please advise and help me with this.
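
As a rough back-of-envelope check on question 1, the sketch below estimates a lower bound on the DataNode count from disk write throughput alone. The 100 MB/s sustained write rate per node is an assumption for illustration, not a measured figure; the replication factor of 3 is the HDFS default (`dfs.replication`). Network bandwidth, NameNode load, and any Hive-side processing are ignored, so a real cluster would likely need more. The function name `required_datanodes` is hypothetical.

```python
def required_datanodes(data_bytes, seconds, per_node_write_bytes_per_sec, replication=3):
    # Every block is stored `replication` times, so the cluster as a whole
    # must write data_bytes * replication bytes within the time window.
    total_written = data_bytes * replication
    # Bytes a single node can write in the window (assumed sustained rate).
    per_node_capacity = seconds * per_node_write_bytes_per_sec
    # Integer ceiling division: round up to whole machines.
    return (total_written + per_node_capacity - 1) // per_node_capacity


PB = 10**15          # one petabyte, decimal
TEN_MINUTES = 600    # seconds
ASSUMED_RATE = 100 * 10**6  # 100 MB/s per node, an assumption

print(required_datanodes(PB, TEN_MINUTES, ASSUMED_RATE))  # 50000
```

Under these assumptions the cluster has to absorb 5 TB/s of replicated writes, which is why the estimate lands in the tens of thousands of nodes.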