hive-user mailing list archives

From venkatesh b <venkateshmailingl...@gmail.com>
Subject Is it worth storing in ORC for one time read. And can replace hive with HBase?
Date Thu, 06 Aug 2015 19:56:45 GMT
Hi, I have two questions.

Column sizes in our Hive tables:
Each record is of ordinary size (around 20 columns, made up of int columns
and string columns of length 50 characters; there are no very long
columns).

FIRST:
In our project we use Hive.
We receive new data daily. We need to process this new data only once and
then send the processed data to an RDBMS. The processing relies mostly on
complex queries with joins, WHERE conditions, and grouping functions.
Around 50 intermediate tables are generated during processing. Until now we
have used text format for storage. We recently came across the ORC file
format. Since each table is queried only once, I would like to know whether
it is worth storing the data in ORC format. We scan the full table during
processing.
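
To make the question concrete, here is a rough sketch of the two storage
options for one intermediate table. The table and column names
(raw_daily_events, stage_daily_text, stage_daily_orc, customer_id,
event_date) are hypothetical and only for illustration; our real queries are
more complex.

```sql
-- Hypothetical names throughout; this only illustrates the storage choice.

-- Current approach: intermediate table stored as plain text.
CREATE TABLE stage_daily_text
STORED AS TEXTFILE
AS
SELECT customer_id, COUNT(*) AS event_count
FROM raw_daily_events
WHERE event_date = '2015-08-06'
GROUP BY customer_id;

-- Option in question: the same intermediate table stored as ORC.
CREATE TABLE stage_daily_orc
STORED AS ORC
AS
SELECT customer_id, COUNT(*) AS event_count
FROM raw_daily_events
WHERE event_date = '2015-08-06'
GROUP BY customer_id;
```

Each such table would be written once and then read once by the next step.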

SECOND:
I came to know about HBase, which I understand is faster.
Can I replace Hive with HBase to make the daily processing faster?
Currently it takes 15 hours per day with Hive.

We have two use cases: one processes 5 to 10 million new records per day,
and the other processes 2 billion new records.


Please let me know if any other information is needed.

Thanks & regards
Venkatesh
