incubator-chukwa-user mailing list archives

From Jerome Boulon <jbou...@netflix.com>
Subject Re: Seeking a little advice
Date Tue, 24 Aug 2010 20:00:03 GMT
If the data is on one machine, then there's probably no need to move it.
So the question is really:

 *   Do you need more than one machine to do your ETL?
 *   Would you ever need more than one machine?

So if you need more than one machine, then Chukwa could be the right answer.
I have a tool that I could publish to transform any input file into a Chukwa compressed dataSink
file. That could be a first step.
Also, Hadoop has JDBC input/output formats (DBInputFormat/DBOutputFormat), so you may want to take a look at those.
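To make the JDBC-reading idea concrete, here is a minimal sketch (in Python, for illustration) of the range-splitting logic that Hadoop's DBInputFormat applies when it divides a keyed table among map tasks: each mapper gets a contiguous slice of the key range, which it reads with a `WHERE id BETWEEN lo AND hi` query. The function name and parameters below are illustrative, not part of any real API.

```python
def jdbc_splits(min_id, max_id, num_mappers):
    """Divide the key range [min_id, max_id] into num_mappers
    contiguous, non-overlapping (lo, hi) ranges -- the same idea
    DBInputFormat uses to hand each map task its own slice of a
    JDBC table via a range predicate in the query."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)  # spread the remainder
    splits, lo = [], min_id
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        splits.append((lo, hi))
        lo = hi + 1
    return splits

# e.g. 10 rows across 3 mappers:
print(jdbc_splits(1, 10, 3))  # [(1, 4), (5, 7), (8, 10)]
```

The point of the even split is that no mapper becomes a straggler on a skew-free key; with a skewed key you would want to split on something closer to uniform.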

Could you give more info on your data (size, and the ETL you have in mind)?

/Jerome.

On 8/24/10 12:39 PM, "hdev ml" <hdevml@gmail.com> wrote:

Hi all,

This question is related partly to hadoop and partly to chukwa.

We have a huge amount of logged information sitting on one machine. I am not sure whether it
is stored in multiple files or in a database.

What we want to do is take that log information, transform it, and store it in some database
for data mining / data warehousing / reporting purposes.

1. Since the data is on one machine, is Chukwa the right kind of framework for this ETL process?

2. I understand that Hadoop generally works on large files. But assuming the data sits in a
database, what if we somehow partition the data for Hadoop/Chukwa? Is that the right strategy?

Any help will be appreciated.

Thanks,

Harshad

