hive-dev mailing list archives

From "Jeff Hammerbacher (JIRA)" <>
Subject [jira] Commented: (HIVE-449) Automatic memoization of intermediate data tables
Date Mon, 27 Apr 2009 03:56:30 GMT


Jeff Hammerbacher commented on HIVE-449:
----------------------------------------

would be another, potentially more elegant, approach to this problem.

> Automatic memoization of intermediate data tables
> -------------------------------------------------
>                 Key: HIVE-449
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Venky Iyer
> Processing data with Hive encourages you to specify your data transformation as fairly
> complex nested joins, cluster bys, group bys, etc., supplementing functionality with custom
> transforms where necessary. This has the disadvantage that it is hard to inspect the output
> of intermediate phases. It is also inconvenient when your custom TRANSFORM script at the end
> of a long chain of mapreduce jobs fails with syntax errors or bugs, because you then need to
> rerun all the previous steps before you can check whether you fixed the bugs in the custom
> script. This can be alleviated by providing functionality to capture specific steps in
> intermediate tables automatically, allowing me to be expressive in HiveQL without having to
> bookkeep all the intermediate tables.
> You may need a way to name queries and phases, so that you can identify which intermediate
> tables belong to which queries' phases.
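
The naming idea in the description above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration and none of it is Hive functionality: the `memo_` prefix, the function `intermediate_table_name`, the class `MemoCatalog`, and the choice to key each table on a digest of the phase's SQL text are all hypothetical.

```python
import hashlib
import re


def intermediate_table_name(query_name: str, phase: str, phase_sql: str) -> str:
    """Build a table name from a query name, a phase name, and the phase's SQL.

    Hypothetical scheme, not Hive behavior: the short digest changes whenever
    the phase's SQL text changes, so an outdated table is not matched by name.
    """
    def clean(part: str) -> str:
        # Keep only characters that are safe in a simple table identifier.
        return re.sub(r"[^A-Za-z0-9_]", "_", part).lower()

    digest = hashlib.sha1(phase_sql.encode("utf-8")).hexdigest()[:8]
    return f"memo_{clean(query_name)}_{clean(phase)}_{digest}"


class MemoCatalog:
    """In-memory stand-in for the bookkeeping of intermediate tables."""

    def __init__(self) -> None:
        self._tables = {}  # table name -> SQL that produced it

    def lookup_or_register(self, query_name: str, phase: str, phase_sql: str):
        """Return (table_name, reused).

        reused is True when this exact phase was registered before, meaning
        its stored output could be read instead of rerunning the phase.
        """
        name = intermediate_table_name(query_name, phase, phase_sql)
        reused = name in self._tables
        if not reused:
            self._tables[name] = phase_sql
        return name, reused


# Usage: the second call for an unchanged phase reports a reusable table.
catalog = MemoCatalog()
sql = "SELECT u.id, e.ts FROM users u JOIN events e ON (u.id = e.uid)"
first_name, first_reused = catalog.lookup_or_register("daily_report", "join_users", sql)
second_name, second_reused = catalog.lookup_or_register("daily_report", "join_users", sql)
```

Keying on the SQL text means that fixing a bug in a late TRANSFORM phase changes only that phase's table name, so under this scheme the earlier phases would still resolve to their existing tables.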

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
