hbase-user mailing list archives

From viva v <vivamail...@gmail.com>
Subject Re: HBase, Hive, Hive over HBase or Pig over HBase
Date Thu, 27 Oct 2011 18:59:10 GMT
Thanks Amandeep.

The read/writes are random.

Random writes - The plan is to use "crowdsourcing" as the data source for these
updates, so writes (primarily "updates" of values of a particular rowkey)
will be random.
Random reads - Primarily for ad hoc querying only.

Yes, I could perform the updates periodically.

Do you have any suggestions on the right approach to use?

On the data size, 30 million is the starting size; it would grow by about
1 million per week (a conservative estimate).

-Vivek



On Thu, Oct 27, 2011 at 1:56 AM, Amandeep Khurana <amansk@gmail.com> wrote:

> Vivek,
>
> Can you elaborate on 4? Storing data in HDFS directly does not give you the
> option of updating it. However, that's not a good enough reason to use
> HBase. Do you need random reads/writes outside of just the selective
> increments? Can you store the increments in a separate file and then do a
> resolution in the final results and periodically collapse all the updates
> and make a new base table?
>
> Hive over HBase is not yet ready. Pig - HBase integration is relatively
> more mature.
>
> Also, like Doug said, 30m records can be handled by an RDBMS. Does that not
> solve your purpose? What are the challenges you faced, if any?
>
> -Amandeep
>
>
> On Wed, Oct 26, 2011 at 12:31 PM, viva v <vivamailers@gmail.com> wrote:
>
> > Hi,
> >
> > I am working on a use case that has the following characteristics.
> > 1) Data volume is on the order of 30 million records
> > 2) Data schema is known & is fixed (for the application we are building)
> > 3) Data is NOT multi format. A single key will have integer data for
> > different aspects of that key
> > 4) Data will be incrementally updated (some column values will be updated
> > at different points of time)
> > 5) There is a need to support ad hoc (queries are not known ahead of time)
> > querying of data (without writing map reduce jobs)
> > 6) Queries are likely to have a lot of joins & aggregations
> >
> > Could you please help me with suggestions on whether I should use
> > 1) Hive
> > 2) HBase
> > 3) Hive over HBase
> > 4) Pig over HBase
> >
> > Thanks
> > Vivek
> >
>
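
For illustration only, here is a minimal in-memory sketch of the approach Amandeep describes: keep the updates separate from the base data, resolve them at read time, and periodically collapse them into a new base table. It is not HBase, Hive, or Pig code. The names (Update, resolve, collapse), the sequence-number ordering, and the "latest update wins" rule are all assumptions made for the example; the thread does not specify how conflicting updates should be resolved.

```python
from typing import Dict, List, NamedTuple, Tuple

Row = Dict[str, int]      # column name -> integer value
Table = Dict[str, Row]    # rowkey -> row


class Update(NamedTuple):
    """One pending column update (hypothetical record layout)."""
    seq: int       # assumed monotonically increasing sequence number
    rowkey: str
    column: str
    value: int


def resolve(base: Table, updates: List[Update]) -> Table:
    """Return the base table with pending updates applied in seq order.

    The base table itself is not modified; the latest update per
    (rowkey, column) wins, which is an assumption of this sketch.
    """
    merged: Table = {key: dict(row) for key, row in base.items()}
    for u in sorted(updates, key=lambda u: u.seq):
        merged.setdefault(u.rowkey, {})[u.column] = u.value
    return merged


def collapse(base: Table, updates: List[Update]) -> Tuple[Table, List[Update]]:
    """Periodic step: fold all updates into a new base and clear the log."""
    return resolve(base, updates), []


base = {"k1": {"a": 1, "b": 2}, "k2": {"a": 5}}
updates = [
    Update(2, "k1", "a", 10),
    Update(1, "k1", "a", 7),   # older update to the same cell, superseded
    Update(3, "k3", "b", 4),   # update that creates a new rowkey
]

view = resolve(base, updates)             # what a query would see right now
new_base, pending = collapse(base, updates)
```

In a file-based setup the same idea would apply to a base file plus a separate file of updates, with the collapse step run as a periodic batch job.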
