hive-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: Reflect MySQL updates into Hive
Date Wed, 26 Dec 2012 14:52:52 GMT
Hello Ibrahim,

Sorry for the late response. Those replies were for Kshiva. I saw his
question (exactly the same as this one) multiple times on the Pig mailing
list as well, so I thought of giving him some pointers on how to use the
list. I should have made that clear. Apologies for the confusion.

Coming back to the actual point: yes, the flow is fine. Normally people do
it like this. But I was looking for an alternative, so that we don't have
to go through this long process for the updates. I'll let you know once I
find something useful. So far I haven't found anything better than what
Dean sir has suggested. Please do let me know if you find something before
me.

Many thanks.


Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Wed, Dec 26, 2012 at 7:24 PM, Ibrahim Yakti <iyakti@souq.com> wrote:

> After more reading, a suggested scenario looks like:
>
> MySQL ---(Extract / Load)---> HDFS ---> Load into HBase --> Read as
> external in Hive ---(Transform Data & Join Tables)--> Use hive for Joins &
> Queries ---> Update HBase as needed & Reload in Hive.
>
> What do you think please?
>
>
>
> --
> Ibrahim
>
>
> On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <iyakti@souq.com> wrote:
>
>> Mohammad, I am not sure if the answers & the link were to me or to
>> Kshiva's question.
>>
>> If I have partitioned my data by status, for example, then when I run
>> the update query it will add the updated data to a new partition (success
>> or shipped, for example) and it will keep the old data (confirmed or paid,
>> for example), right?
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <dontariq@gmail.com> wrote:
>>
>>> Also, have a look at this :
>>> http://www.catb.org/~esr/faqs/smart-questions.html
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>>
>>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <dontariq@gmail.com> wrote:
>>>
>>>> Have a look at Beeswax.
>>>>
>>>> BTW, do you have access to Google at your station? The same question
>>>> is on the Pig mailing list as well, and twice at that.
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>>
>>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <kshivakps@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Are there any Hive editors in which we can write 100 to 150 Hive
>>>>> scripts? I believe it is not easy to do all the scripts in CLI mode.
>>>>> Something like an IDE for Java or TOAD for SQL. Please advise, many
>>>>> thanks.
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <
>>>>> dean.wampler@thinkbiganalytics.com> wrote:
>>>>>
>>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>>> incremental query against your MySQL database. Then you can write the
>>>>>> results to new files in the HDFS directory for the table and Hive will
>>>>>> see them immediately. Yes, even though Hive doesn't support updates, it
>>>>>> doesn't care how many files are in the directory. The trick is to avoid
>>>>>> lots of little files.
>>>>>>
>>>>>> As others have suggested, you should consider partitioning the data,
>>>>>> perhaps by time. Say you import a few HDFS blocks' worth of data each
>>>>>> day, then use year/month/day partitioning to speed up your Hive queries.
>>>>>> You'll need to add the partitions to the table as you go, but actually,
>>>>>> you can add those once a month, for example, for all partitions. Hive
>>>>>> doesn't care if the partition directories don't exist yet or the
>>>>>> directories are empty. I also recommend using an external table, which
>>>>>> gives you more flexibility on directory layout, etc.
>>>>>>
>>>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>>>> even generate a Hive table schema from the original MySQL table.
>>>>>> However, that feature may not be useful in this case, as you already
>>>>>> have the table.
>>>>>>
>>>>>> I think Oozie is horribly complex to use and overkill for this
>>>>>> purpose. A simple bash script triggered periodically by cron is all you
>>>>>> need. If you aren't using a partitioned table, you have a single sqoop
>>>>>> command to run. If you have partitioned data, you'll also need a hive
>>>>>> statement in the script to create the partition, unless you do those in
>>>>>> batch once a month, etc., etc.
>>>>>>
>>>>>> Hope this helps,
>>>>>> dean
>>>>>>
>>>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <iyakti@souq.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>>> analytical queries, and we are using Sqoop to import data into Hive.
>>>>>>> In our RDBMS the data is updated very frequently, and this needs to be
>>>>>>> reflected in Hive. Hive does not support update/delete, but there are
>>>>>>> many workarounds for this task.
>>>>>>>
>>>>>>> What we have in mind is importing all the tables into Hive as is,
>>>>>>> then building the required tables for reporting.
>>>>>>>
>>>>>>> My questions are:
>>>>>>>
>>>>>>>    1. What is the best way to reflect MySQL updates into Hive with
>>>>>>>    minimal resources?
>>>>>>>    2. Is Sqoop the right tool to do the ETL?
>>>>>>>    3. Is Hive the right tool for this kind of query, or should we
>>>>>>>    search for alternatives?
>>>>>>>
>>>>>>> Any hint will be useful, thanks in advance.
>>>>>>>
>>>>>>> --
>>>>>>> Ibrahim
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Dean Wampler, Ph.D.*
>>>>>> thinkbiganalytics.com
>>>>>> +1-312-339-1330
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
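
For readers of the archive, the periodic job Dean describes in the quoted thread (an import of changed rows into a year/month/day directory, plus a Hive statement to register the partition) could look roughly like the sketch below. It is written in Python rather than bash only so that it is easy to dry-run, and it only builds and prints the commands. The JDBC URL, the table name `orders`, the column `updated_at`, and the HDFS path are all hypothetical. Filtering with `--where` on a timestamp column is one possible way to get an incremental query, not necessarily the one Dean had in mind.

```python
from datetime import date

# Hypothetical names, for illustration only; none of them come from the thread.
JDBC_URL = "jdbc:mysql://dbhost/shop"
TABLE = "orders"
CHECK_COLUMN = "updated_at"          # assumed "last modified" timestamp column
TABLE_DIR = "/data/external/orders"  # assumed location of the external Hive table


def partition_dir(day):
    """HDFS directory that holds the files for one year/month/day partition."""
    return f"{TABLE_DIR}/year={day.year}/month={day.month:02d}/day={day.day:02d}"


def sqoop_import_command(day, since):
    """Argument list for a Sqoop import of rows modified at or after `since`."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--table", TABLE,
        "--where", f"{CHECK_COLUMN} >= '{since}'",
        "--target-dir", partition_dir(day),
        "--num-mappers", "1",  # a single mapper writes one file, avoiding many small files
    ]


def add_partition_statement(day):
    """HiveQL that registers the day's directory as a partition of the table."""
    return (
        f"ALTER TABLE {TABLE} ADD IF NOT EXISTS "
        f"PARTITION (year={day.year}, month={day.month}, day={day.day}) "
        f"LOCATION '{partition_dir(day)}';"
    )


if __name__ == "__main__":
    today = date(2012, 12, 24)
    # Dry run: print what a cron-triggered script would execute.
    print(" ".join(sqoop_import_command(today, "2012-12-23 00:00:00")))
    print("hive -e", repr(add_partition_statement(today)))
```

To run the commands for real, the two printed lines could be handed to `subprocess.run`, with the script scheduled from cron as Dean suggests. Note that this only appends new versions of rows; older versions imported earlier stay in their original files.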
