hive-user mailing list archives

From Elliot West <tea...@gmail.com>
Subject Re: Is it ok to build an entire ETL/ELT data flow using HIVE queries?
Date Tue, 16 Feb 2016 10:40:53 GMT
I'd say that so long as you can achieve a quality of engineering similar to
what is possible in other software development domains, then 'yes, it is ok'.

Specifically, our Hive projects are packaged as RPMs, built and released
with Maven, covered by suites of unit tests developed with HiveRunner, and
part of the same Jenkins CI process as our other Java-based projects.
Decomposing large processes into sensible units is not as easy as with
other frameworks, so this may require more thought and care.

More information here:
https://cwiki.apache.org/confluence/display/Hive/Unit+testing+HQL
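
To make the point about decomposing a large process into units a little more concrete, here is a minimal sketch in plain Java. It is not taken from our projects and is not part of Hive or HiveRunner: the class name `EtlFlow`, the method `executionOrder`, and the unit names are all hypothetical. It assumes each unit is one small HQL script that can be tested on its own, and that the only thing the flow needs to know is which units must run before which. The ordering is a simple depth-first topological sort.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: each key names one small HQL script (a "unit"),
// and its value lists the units that must complete before it runs.
final class EtlFlow {

    // Returns the units in an order where every unit follows its prerequisites.
    static List<String> executionOrder(Map<String, List<String>> dependsOn) {
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String unit : dependsOn.keySet()) {
            visit(unit, dependsOn, done, inProgress);
        }
        return new ArrayList<>(done);
    }

    // Depth-first visit: schedule all prerequisites, then the unit itself.
    private static void visit(String unit, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(unit)) {
            return;
        }
        if (!inProgress.add(unit)) {
            // Seeing a unit again while it is still being visited means a cycle.
            throw new IllegalStateException("Cyclic dependency at unit: " + unit);
        }
        for (String prerequisite : dependsOn.getOrDefault(unit, List.of())) {
            visit(prerequisite, dependsOn, done, inProgress);
        }
        inProgress.remove(unit);
        done.add(unit);
    }

    public static void main(String[] args) {
        // Assumed example flow: two load steps feeding one warehouse table.
        Map<String, List<String>> flow = new LinkedHashMap<>();
        flow.put("build_fact_table", List.of("load_transactions", "load_reference"));
        flow.put("load_transactions", List.of());
        flow.put("load_reference", List.of());
        System.out.println(executionOrder(flow));
    }
}
```

Keeping each unit small is what makes it practical to cover with a unit test; the ordering logic itself stays trivial and can live in whatever scheduler or build tooling you already use.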

You have many potential alternatives depending on which languages you are
comfortable using: Pig, Flink, Cascading, Spark, Crunch, Scrunch, Scalding,
etc.

Elliot.

On Tuesday, 16 February 2016, Ramasubramanian <
ramasubramanian.narayanan@gmail.com> wrote:

> Hi,
>
> Is it ok to build an entire ETL/ELT data flow using HIVE queries?
>
> Data is stored in HIVE. We have transactional and reference data. We need
> to build a small warehouse.
>
> Need suggestion on alternatives too.
>
> Regards,
> Rams
