hive-user mailing list archives

From Jörn Franke <>
Subject Re: Performance for hive external to hbase with several terabytes or more data
Date Thu, 12 May 2016 05:52:25 GMT
Why don't you export the data from HBase to Hive, e.g. in ORC format? You should not use MR
with Hive, but Tez. Also use a recent Hive version (at least 1.2). You can then run queries
there. For large log-file processing in real time, one alternative, depending on your needs,
could be Solr on Hadoop.
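The export described above can be sketched in a few lines of HiveQL. This is a minimal, hypothetical example: it assumes a Hive external table named `hbase_logs` already mapped onto the HBase table, and the target table name `orc_logs` is made up for illustration.

```sql
-- Run on Tez rather than MapReduce, as suggested above.
SET hive.execution.engine=tez;

-- Copy the HBase-backed external table into a native ORC table once,
-- then query the ORC copy instead of repeatedly scanning HBase.
CREATE TABLE orc_logs STORED AS ORC
AS SELECT * FROM hbase_logs;
```

For a daily 2-4 TB feed this would typically be run as a periodic incremental load (e.g. with a WHERE clause on a timestamp column) rather than a full copy each time.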

> On 12 May 2016, at 03:06, Yi Jiang <> wrote:
> Hi, guys
> Recently we have been debating the use of HBase as the destination for our data pipeline
> job. Basically, we want to save our logs into HBase, and our pipeline can generate 2-4
> terabytes of data every day, but our IT department thinks it is not a good idea to scan
> HBase at that scale; it will cause performance and memory issues. They have asked us to
> keep only 15 minutes' worth of data in HBase for real-time analysis.
> For now, I am using a Hive external table over HBase, but what I am wondering is: for a
> MapReduce job, what kind of mapper does it use to scan the data from HBase? Is it
> TableInputFormatBase? And how many mappers will Hive use to scan HBase? Is it efficient
> or not? Will it cause performance issues if we have a couple of terabytes or more of data?
> I am also trying to index some columns that we might use to query. But I am not sure it
> is a good idea to keep so much history data in HBase for querying.
> Thank you
> Jacky
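For context on the mapper question: a Hive external table over HBase is declared with the HBaseStorageHandler, roughly as in the hypothetical sketch below (the table and column names are invented for illustration). With this setup, Hive reads HBase through an input format built on TableInputFormatBase, which typically produces one map task per HBase region, each performing a scan of that region, so a multi-terabyte table with many regions means many full region scans per query.

```sql
-- Hypothetical Hive external table mapped onto an existing HBase table 'logs'.
-- ':key' binds the HBase row key; 'd:ts' and 'd:msg' are columns in family 'd'.
CREATE EXTERNAL TABLE hbase_logs (
  rowkey STRING,
  ts     BIGINT,
  msg    STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':key,d:ts,d:msg'
)
TBLPROPERTIES ('hbase.table.name' = 'logs');
```

Predicates on the row key can be pushed down to narrow the scan, but queries filtering only on non-key columns still scan every region, which is why exporting to ORC is usually the better fit for historical analysis.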
