hadoop-common-user mailing list archives

From Srinivas Surasani <hivehadooplearn...@gmail.com>
Subject Hadoop with Sharded MySql
Date Thu, 31 May 2012 22:02:46 GMT
All,

We are trying to implement Sqoop in our environment, which has 30 sharded
MySQL servers. Each server hosts around 30 databases, with 150 tables in
each database, all horizontally sharded (that is, the data is partitioned
across all of these tables in MySQL).

The problem is that we have a total of around 70K tables that need to be
pulled from MySQL into HDFS.

So, my question is: is it feasible to generate 70K Sqoop commands and run
them in parallel?
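To give an idea of the brute-force version we had in mind: generate the
commands from a manifest and cap the concurrency instead of launching all
70K at once. A rough sketch (tables.tsv, creds.opts, the hosts, and the
HDFS paths are all placeholders, not our actual setup):

  # tables.tsv: one "host<TAB>database<TAB>table" line per table;
  # creds.opts: a Sqoop options file holding --username / --password
  # (both files and every name below are placeholders)
  while IFS=$'\t' read -r host db table; do
    echo "sqoop import --options-file creds.opts" \
         "--connect jdbc:mysql://${host}/${db} --table ${table}" \
         "--num-mappers 1 --target-dir /data/raw/${host}/${db}/${table}"
  done < tables.tsv > sqoop_cmds.txt

  # run at most 8 imports at a time rather than all 70K at once
  xargs -I CMD -P 8 sh -c 'CMD' < sqoop_cmds.txt

Even throttled like this, every import is a separate JVM launch plus a
MapReduce job, which is the part we are unsure about.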

Also, doing incremental updates would mean invoking another 70K Sqoop
jobs, which in turn kick off MapReduce jobs.
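Per table, the incremental side would presumably be a saved Sqoop job, so
that the metastore tracks the high-water mark between runs. A sketch,
assuming each table has a monotonically increasing id column (the job
name, hosts, and credential handling are made up):

  # create one saved job per table; the Sqoop metastore remembers
  # --last-value between runs (all names here are placeholders,
  # password handling omitted)
  sqoop job --create shard01_db01_orders -- import \
    --connect jdbc:mysql://shard01/db01 \
    --username etl \
    --table orders \
    --target-dir /data/raw/shard01/db01/orders \
    --incremental append --check-column id --last-value 0

  # each scheduled run then continues from the previous high-water mark
  sqoop job --exec shard01_db01_orders

That would mean 70K saved jobs sitting in the metastore.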

The main problem is monitoring and managing this huge number of jobs.

Can anyone suggest the best way of doing this, or say whether Sqoop is a
good candidate for this type of scenario?

Currently the same process is done by generating TSV files on the MySQL
servers, dumping them onto a staging server, and from there issuing hdfs
put statements.
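Roughly, per table, that pipeline looks like this (hosts, paths, and the
orders table are illustrative):

  # 1. dump the table as TSV; INTO OUTFILE writes on the MySQL server host
  mysql -h shard01 -u etl -p db01 \
    -e "SELECT * FROM orders INTO OUTFILE '/tmp/orders.tsv'"

  # 2. copy the dump over to the staging server
  scp shard01:/tmp/orders.tsv staging:/stage/shard01/db01/orders.tsv

  # 3. push it into HDFS from the staging server
  hadoop fs -put /stage/shard01/db01/orders.tsv /data/raw/shard01/db01/

This is exactly the script sprawl we were hoping Sqoop would replace.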

Appreciate your suggestions!


Thanks,
Srinivas Surasani
