hadoop-mapreduce-user mailing list archives

From Saravanan Nagarajan <saravanan.nagarajan...@gmail.com>
Subject Need help on Hadoop cluster capacity and hardware specification
Date Thu, 29 Jan 2015 12:53:45 GMT

Need help on Hadoop cluster capacity and hardware specification:


We are planning to migrate our existing Enterprise Data Warehouse / Business
Intelligence system to a Hadoop-based solution.

The current system uses Teradata for storage, Ab Initio for ETL, and
MicroStrategy for reporting. We would like to replace it with a Hadoop-based
solution that stores all raw CDRs in HDFS and processes those CDRs with
Hive/Spark (or any other SQL-on-Hadoop tool).

In the current system, Teradata has 128TB of storage, plus 100TB+ of CDR files.


1. How many nodes are needed to store and process 228TB (128TB + 100TB) of data?

2. What hardware configuration is required for each slave node and for the master?

3. Which are the best SQL-on-Hadoop tools for writing ETL jobs?

We are considering Hive, Spark, Cassandra, and Cascading for evaluation. Please
suggest any other tools we should look at.

Please provide your valuable input; thanks for your support.
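As a rough sketch of the capacity math behind question 1: HDFS stores each
block replication-factor times, and MapReduce/Spark jobs need extra space for
intermediate data. The replication factor of 3, ~25% temp-space overhead,
12 x 4TB disks per DataNode, and 75% usable fraction below are all
illustrative assumptions, not figures from this thread:

```python
import math

def estimate_datanodes(raw_tb, replication=3, temp_overhead=0.25,
                       disks_per_node=12, disk_tb=4.0, usable_fraction=0.75):
    """Estimate the number of DataNodes needed for raw_tb of input data.

    All defaults are illustrative assumptions; tune them to your hardware.
    """
    # Total HDFS space: raw data x replication, plus temp/intermediate space.
    needed_tb = raw_tb * replication * (1 + temp_overhead)
    # Usable capacity per node after reserving space for OS and non-HDFS use.
    per_node_tb = disks_per_node * disk_tb * usable_fraction
    # Round up to whole nodes.
    nodes = math.ceil(needed_tb / per_node_tb)
    return nodes, needed_tb, per_node_tb

nodes, needed, per_node = estimate_datanodes(228)
print(f"~{needed:.0f} TB HDFS space needed, {per_node:.0f} TB usable per node "
      f"-> about {nodes} DataNodes")
```

Under these assumptions 228TB of raw data needs about 855TB of HDFS space and
roughly two dozen DataNodes; CPU/RAM sizing then follows from the processing
workload rather than storage alone.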



