hbase-user mailing list archives

From shashwat shriparv <dwivedishash...@gmail.com>
Subject Re: Confusing questions ! Hadoop Beginner
Date Thu, 10 May 2012 15:35:39 GMT
Use HBase to store the data, then map the HBase tables into Hive to run SQL
queries (which Hive executes as MapReduce jobs). In both cases the data
ultimately lives on HDFS.

You have two options. First, write MapReduce jobs that fetch data directly from HBase.

Or, second, map the HBase tables into Hive (create external tables, which
gives you many built-in operations such as joins, AVG, and more) and query
the data through Hive. Either way, Hive also runs MapReduce internally to
fetch the data.
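As a rough sketch of the mapping step described above (table and column names here are hypothetical, not from the original thread), an existing HBase table can be exposed to Hive as an external table via the HBase storage handler:

```sql
-- Hypothetical example: expose an existing HBase table "weblogs"
-- (with a column family "cf") to Hive as an external table.
CREATE EXTERNAL TABLE weblogs_hive (
  rowkey STRING,
  url    STRING,
  ts     STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:url,cf:ts"
)
TBLPROPERTIES ("hbase.table.name" = "weblogs");

-- Hive then compiles ordinary SQL over this table into MapReduce jobs:
SELECT url, COUNT(*) AS hits
FROM weblogs_hive
GROUP BY url;
```

Once the external table exists, joins, aggregates, and the other built-in functions mentioned above work over the HBase data as if it were a regular Hive table.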

It also depends on what kind of data, and how much of it, you need to store.

Regards

∞
Shashwat Shriparv

On Thu, May 10, 2012 at 8:03 PM, yavuz gokirmak <ygokirmak@gmail.com> wrote:

> Hi,
>
> I am not an expert but can give some ideas. (Correct me if I am wrong
> please :) )
>
> Regardless of whether you use HBase or Hive, data is stored in HDFS at the
> end of the day.
>
> What Hive provides is an SQL interface over raw data. When you load data into
> Hive, you define its fields, columns, parsing strategy, and so on. Your data
> is stored as-is in HDFS, but Hive maintains metadata tables about this raw
> data, so you can write SQL queries over log data and Hive returns the
> results. Use Hive for DWH-like operations: when you want to transform data
> into a different format or produce an analysis report over it. As far as I
> know, Hive is not suitable for real-time queries.
>
> HBase is a columnar database that uses HDFS as its underlying filesystem. It
> stores data in its own format, and you have to use the HBase API to insert a
> row into the database; it does not provide an SQL interface. HBase is
> suitable when you want real-time insert/select. In your case, you can insert
> weblogs into HBase in real time, and then you can query a user's clicks over
> HBase, which returns all clicks with their timestamps.
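[Editor's sketch, not part of the original reply.] The "query a user's clicks" pattern above hinges on HBase keeping rows sorted by row key. The idea can be illustrated in plain Python, with a sorted list standing in for an HBase table; all names here are hypothetical, and no HBase API is used:

```python
import bisect

# Conceptual sketch (plain Python, no HBase): HBase keeps rows sorted by
# row key, so a key like "<user>|<reversed_ts>" lets a prefix scan return
# one user's clicks, newest first.
MAX_TS = 10**13  # larger than any epoch-millis timestamp we expect

def row_key(user, ts_millis):
    # Reverse the timestamp so newer clicks sort first within a user prefix.
    return f"{user}|{MAX_TS - ts_millis:013d}"

store = []  # sorted list of (row_key, url), standing in for an HBase table

def put(user, ts_millis, url):
    bisect.insort(store, (row_key(user, ts_millis), url))

def scan_user(user):
    # Prefix scan: everything between "user|" and "user~" ('|' < '~' in ASCII).
    lo = bisect.bisect_left(store, (user + "|",))
    hi = bisect.bisect_left(store, (user + "~",))
    return [url for _, url in store[lo:hi]]

put("u1", 1000, "/home")
put("u1", 2000, "/cart")
put("u2", 1500, "/home")
print(scan_user("u1"))  # newest first: ['/cart', '/home']
```

In real HBase the same effect comes from choosing the row key carefully and using a prefix `Scan`; the timestamp reversal trick is a common row-key design, not something HBase does for you.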
>
> Regarding clickstream analysis, what I prefer is to write a couple of
> MapReduce jobs that analyze the log data and fill a data model in a
> relational database for further analysis and queries. You can execute the
> MapReduce jobs periodically. Once you decide on a "good" data model,
> generating further reports will be easy.
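[Editor's sketch, not part of the original reply.] The periodic aggregation step described above can be sketched in miniature, with plain Python functions standing in for the mapper and reducer; the log-line layout (timestamp, user, URL) is an assumption for illustration:

```python
from collections import defaultdict

# Minimal sketch of the map/reduce aggregation described above, in plain
# Python. The log format (timestamp, user, url) is an assumption.
log_lines = [
    "2012-05-10T10:00:00 u1 /home",
    "2012-05-10T10:00:05 u1 /cart",
    "2012-05-10T10:00:07 u2 /home",
]

def map_phase(lines):
    # Emit (user, 1) for every click, as a MapReduce mapper would.
    for line in lines:
        _, user, _url = line.split()
        yield user, 1

def reduce_phase(pairs):
    # Sum the counts per user, as a MapReduce reducer would.
    counts = defaultdict(int)
    for user, n in pairs:
        counts[user] += n
    return dict(counts)

clicks_per_user = reduce_phase(map_phase(log_lines))
print(clicks_per_user)  # {'u1': 2, 'u2': 1}
```

A real job would write `clicks_per_user` into the relational data model mentioned above rather than printing it.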
>
>
> Kind regards,
>
>
> > I have a very basic question regarding HBase, HDFS, and Hive.
> >
> > If HBase, HDFS, and Hive can all be used to store log data, which is the
> > best place to store it: HDFS, HBase, or Hive? And are there any benefits
> > associated with each?
> >
>
> > Also, do we need to follow any design principles in terms of data
> > modelling? I could not find anything on this subject.
> >
> > I am trying to learn Hadoop by implementing a clickstream analysis use
> > case.
> >
> > Thanks,
> > Kuldeep
> >
> >
>



