hadoop-common-user mailing list archives

From Srinivas Surasani <hivehadooplearn...@gmail.com>
Subject Re: Hadoop with Sharded MySql
Date Fri, 01 Jun 2012 16:29:42 GMT
All,

I'm trying to get data into HDFS directly from the sharded databases and
expose it to our existing Hive infrastructure.

(We are currently doing it this way: mysql -> staging server -> hdfs put
commands -> HDFS, which is taking a lot of time.)

If we had a way of running a single Sqoop job across all shards for a single
table, I believe it would make life easier in terms of monitoring and
exception handling.
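
Today each shard needs its own import for each table; roughly something like
this per shard (host, credentials, and paths below are only placeholders, not
our real setup):

    # one Sqoop import per shard for a single table
    sqoop import \
      --connect jdbc:mysql://shard01.example.com/salesdb \
      --username etl -P \
      --table orders \
      --target-dir /data/raw/orders/shard01 \
      --num-mappers 4

So for one table that is one such job per shard, and the same again for every
incremental load, which is what we would like to collapse.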

Thanks,
Srinivas

On Fri, Jun 1, 2012 at 1:27 AM, anil gupta <anilgupta84@gmail.com> wrote:

> Hi Sujith,
>
> Srinivas is asking how to import data into HDFS using Sqoop. I believe he
> must have thought it out well before designing the entire
> architecture/solution. He has not specified whether he would like to modify
> the data or not. Whether to use Hive or HBase is a different question
> altogether and depends on his use case.
>
> Thanks,
> Anil
>
>
> On Thu, May 31, 2012 at 9:52 PM, Sujit Dhamale <sujitdhamale89@gmail.com> wrote:
>
> > Hi,
> > Instead of pulling 70K tables from MySQL into HDFS,
> > take a dump of all 30 tables and put them into an HBase database.
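> >
> > (Sqoop can also load straight into HBase; a rough sketch with placeholder
> > names, not tested:)
> >
> >     sqoop import \
> >       --connect jdbc:mysql://shard01.example.com/salesdb \
> >       --username etl -P \
> >       --table orders \
> >       --hbase-table orders \
> >       --column-family d \
> >       --hbase-create-table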
> >
> > If you pull 70K tables from MySQL into HDFS, you will need to use Hive,
> > but modification will not be possible in Hive :(
> >
> > *@ common-user:* please correct me if I am wrong.
> >
> > Kind Regards
> > Sujit Dhamale
> > (+91 9970086652)
> > On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> > > Maybe you can use some VIEWs, UNIONs, or MERGE tables on the MySQL
> > > side to avoid launching so many Sqoop jobs.
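> > >
> > > For example, if the shard schemas are visible from one MySQL instance,
> > > something roughly like this might work (all names below are placeholders,
> > > untested):
> > >
> > >     # union the shard copies of one table behind a view
> > >     mysql -h dbhost -u etl -p -e "
> > >       CREATE VIEW reporting.orders_all AS
> > >         SELECT * FROM shard01.orders
> > >         UNION ALL
> > >         SELECT * FROM shard02.orders"
> > >
> > >     # then one Sqoop import against the view instead of one per shard
> > >     sqoop import \
> > >       --connect jdbc:mysql://dbhost/reporting \
> > >       --username etl -P \
> > >       --table orders_all \
> > >       --split-by order_id \
> > >       --target-dir /data/raw/orders
> > >
> > > A view cannot span servers though, so this only cuts the job count where
> > > shards share an instance.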
> > >
> > > On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
> > > <hivehadooplearning@gmail.com> wrote:
> > > > All,
> > > >
> > > > We are trying to implement Sqoop in our environment, which has 30
> > > > sharded MySQL servers; each server has around 30 databases with
> > > > 150 tables in each database, all horizontally sharded (meaning the
> > > > data is divided across all of these tables in MySQL).
> > > >
> > > > The problem is that we have a total of around 70K tables which need
> > > > to be pulled from MySQL into HDFS.
> > > >
> > > > So, my question is: is generating 70K Sqoop commands and running them
> > > > in parallel feasible or not?
> > > >
> > > > Also, doing incremental updates is going to mean invoking another 70K
> > > > Sqoop jobs, which in turn kick off map-reduce jobs.
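> > > >
> > > > (For each table, an incremental run would be roughly along these lines;
> > > > the check column and last value are only illustrative:)
> > > >
> > > >     sqoop import \
> > > >       --connect jdbc:mysql://shard01.example.com/salesdb \
> > > >       --username etl -P \
> > > >       --table orders \
> > > >       --target-dir /data/raw/orders/shard01 \
> > > >       --incremental append \
> > > >       --check-column order_id \
> > > >       --last-value 1000000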
> > > >
> > > > The main problem is monitoring and managing this huge number of jobs.
> > > >
> > > > Can anyone suggest the best way of doing this, or whether Sqoop is a
> > > > good candidate for this type of scenario?
> > > >
> > > > Currently the same process is done by generating TSV files on the MySQL
> > > > servers, dumping them onto a staging server, and from there generating
> > > > hdfs put statements.
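> > > >
> > > > (The put step is basically just this per file, with placeholder paths:)
> > > >
> > > >     hadoop fs -put /staging/tsv/orders_shard01.tsv /data/raw/orders/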
> > > >
> > > > Appreciate your suggestions !!!
> > > >
> > > >
> > > > Thanks,
> > > > Srinivas Surasani
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Regards,
-- Srinivas
Srinivas@cloudwick.com
