hadoop-common-user mailing list archives

From daemeon reiydelle <daeme...@gmail.com>
Subject Re: Can I configure multiple M/Rs and normal processes to one workflow?
Date Wed, 04 Feb 2015 22:21:16 GMT
I see this frequently as a long-running output phase to relational DBs, so
your experience is reasonable. Sometimes it is possible to partition the
MySQL table, but if you need aggregates over the whole, you are sort of
stuck.

(Good luck,  may your business case never require you to run a single long
query ;{)
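For the shape described below (one init step, three parallel M/R aggregations, one finalize step), a minimal sketch of the orchestration might look like the following. This uses plain Java threads as stand-ins for real M/R job submissions; all class, method, and CP names here are hypothetical, not from the thread.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkflowSketch {

    // Stand-in for one CP's M/R aggregation (in a real cluster this would
    // submit a job and wait for it to complete).
    static String aggregate(String cp) {
        return "aggregated:" + cp;
    }

    public static void main(String[] args) throws Exception {
        // Step 1: single init process (create MySQL tables, delete rows, ...).
        System.out.println("init MySQL");

        // Step 2: run the three CP aggregations concurrently. invokeAll
        // blocks until all tasks finish, and Future.get rethrows any
        // failure, so step 3 never runs on partial results.
        List<Callable<String>> tasks = List.of(
                () -> aggregate("cp1"),
                () -> aggregate("cp2"),
                () -> aggregate("cp3"));
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        for (Future<String> result : pool.invokeAll(tasks)) {
            System.out.println(result.get());
        }
        pool.shutdown();

        // Step 3: single finalize process (rollup query, rolling the table, ...).
        System.out.println("finalize MySQL");
    }
}
```

Within Hadoop itself, Apache Oozie expresses the same fork/join shape declaratively in a workflow definition, mixing M/R actions with shell or Java actions for the init/finalize steps.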



*.......*

*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Wed, Feb 4, 2015 at 2:15 PM, 임정택 <kabhwan@gmail.com> wrote:

> Yes, it takes more than 10 hours per CP.
> And we don't have enough resources to run all regions concurrently, so it
> takes about one day to complete.
> On Thu, Feb 5, 2015 at 4:51 AM daemeon reiydelle <daemeonr@gmail.com>
> wrote:
>
>> Null map step (at a guess?), three-step reduce. No problem. Suspect step
>> 3 may be rather long-running?
>>
>>
>>
>> On Tue, Feb 3, 2015 at 6:44 PM, 임정택 <kabhwan@gmail.com> wrote:
>>
>>> Hello all.
>>>
>>> We periodically scan HBase tables to aggregate statistics,
>>> and store them in MySQL.
>>>
>>> We have 3 kinds of CP (a kind of data source), each with one Channel and
>>> one Article table.
>>> (Channel : Article is a 1:N relation.)
>>>
>>> Each CP's table schema differs a bit, so to aggregate we have to apply
>>> different logic when joining Channel and Article.
>>>
>>> I've thought about a workflow like this, but I wonder whether it makes sense.
>>>
>>> 1. run a single process that initializes MySQL by creating tables,
>>> deleting rows, etc.
>>> 2. run 3 M/Rs simultaneously to aggregate statistics for each
>>> CP, and insert rows per Channel into MySQL.
>>> 3. run a single process that finalizes the whole aggregation - runs an
>>> aggregation query in MySQL to insert new rows, rolls the table, etc.
>>>
>>> Definitely, steps 1, 2, and 3 should run in sequence.
>>>
>>> Any help is really appreciated!
>>> Thanks.
>>>
>>> Regards.
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
>>
