hbase-user mailing list archives

From Henning Blohm <henning.bl...@zfabrik.de>
Subject Lighter Map/Reduce on HBase
Date Wed, 09 Apr 2014 09:55:32 GMT
We operate a solution that stores large amounts of data in HBase, and that
data needs to be available for online access.

For efficient scanning, there are three pieces of data encoded in row keys
(in particular a time dimension) and for other reasons some columns hold
JSON encoded data.

Currently, analytics data is created in two ways:

a) a non-trivial M/R job that computes pre-aggregated data sets and
offloads them into an analytical database for interactive reporting
b) other M/R jobs that create specialized reports (heuristics) that cannot
be computed from pre-aggregated data

In particular for b), but possibly also for variations of a), I would like
to find more "user friendly" ways than M/R jobs implemented in Java, at
least for some cases.

So this is not about interactive querying of data directly from HBase
tables. It is rather about pre-processing HBase stored (large) data sets
into either input to interactive query engines (some other DB, Phoenix,...)
or into some other specialized format.

I spent some time with Hive but found that the HBase integration simply
doesn't cut it (parsing a row key, mapping JSON column content). I know
there are more tools out there, but before spending an eternity trying out
various methods, I am shamelessly trying to benefit from your expertise by
asking for some good pointers.
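
To make the two sticking points (parsing a row key, mapping JSON column
content) concrete, here is a minimal Python sketch of the per-row decoding
that any friendlier tool would have to express. The key layout
"<entity>|<yyyyMMddHH>|<event id>", the separator, and the function names are
purely assumptions for illustration; the actual key format and column schema
are not described in this message.

```python
import json
from datetime import datetime, timezone

# Assumed (hypothetical) row key layout: "<entity>|<yyyyMMddHH>|<event id>".
# The real layout is not given in this message.
KEY_SEPARATOR = "|"


def parse_row_key(row_key: bytes) -> dict:
    """Split the composite row key into its three assumed components."""
    entity, hour, event_id = row_key.decode("utf-8").split(KEY_SEPARATOR)
    # The time dimension is assumed to be an hour bucket in UTC.
    timestamp = datetime.strptime(hour, "%Y%m%d%H").replace(tzinfo=timezone.utc)
    return {"entity": entity, "hour": timestamp, "event_id": event_id}


def flatten_row(row_key: bytes, json_cell: bytes) -> dict:
    """Combine key components and top-level JSON fields into one flat record."""
    record = parse_row_key(row_key)
    payload = json.loads(json_cell.decode("utf-8"))
    for name, value in payload.items():
        # Prefix JSON fields so they cannot collide with the key components.
        record["payload_" + name] = value
    return record
```

A flat record of this shape is what would then be handed to the aggregation
or export step, whichever framework ends up running it.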

