hadoop-hive-dev mailing list archives

From "Venky Iyer (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-449) Automatic memoization of intermediate data tables
Date Mon, 27 Apr 2009 01:08:30 GMT
Automatic memoization of intermediate data tables

                 Key: HIVE-449
                 URL: https://issues.apache.org/jira/browse/HIVE-449
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Venky Iyer

Processing data with Hive encourages you to express a data transformation as a fairly complex
nest of joins, CLUSTER BYs, GROUP BYs and so on, supplemented with custom transforms where
necessary. The disadvantage is that the output of intermediate phases is hard to inspect. It is
also inconvenient when a custom TRANSFORM script at the end of a long chain of mapreduce jobs
fails because of a syntax error or bug: you then have to rerun all the previous steps before
you can check whether you fixed the script. This can be alleviated by providing functionality
to capture specific steps in intermediate tables automatically, allowing you to be expressive
in HiveQL without having to bookkeep all the intermediate tables yourself.

This probably needs a way to name queries and phases, so that each intermediate table can be
traced back to the query and phase that produced it.
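
To make the idea concrete, here is a rough client-side sketch in Python of what such
bookkeeping could look like. It is not part of Hive and is not a proposed implementation: the
names intermediate_table_name and PhaseMemoizer, the memo_ table prefix, the execute callback
and the example tables clicks and users are all hypothetical, and it assumes that
CREATE TABLE ... AS SELECT is available and that query and phase names are valid identifiers.

```python
import hashlib


def intermediate_table_name(query_name, phase_name, phase_sql):
    """Build a deterministic table name for one phase of a named query.

    The name combines the query name, the phase name and a short hash of the
    phase's HiveQL text, so editing a phase yields a different table name.
    """
    digest = hashlib.sha1(phase_sql.encode("utf-8")).hexdigest()[:8]
    return "memo_{}_{}_{}".format(query_name, phase_name, digest)


class PhaseMemoizer:
    """Materialize each phase at most once and remember which tables exist."""

    def __init__(self, execute, existing_tables=()):
        # `execute` is a hypothetical callable that submits one HiveQL
        # statement; `existing_tables` lists tables already materialized.
        self.execute = execute
        self.tables = set(existing_tables)

    def materialize(self, query_name, phase_name, phase_sql):
        table = intermediate_table_name(query_name, phase_name, phase_sql)
        if table not in self.tables:
            # Assumes CREATE TABLE ... AS SELECT is supported.
            self.execute("CREATE TABLE {} AS {}".format(table, phase_sql))
            self.tables.add(table)
        return table


# Usage: collect the statements instead of sending them to a cluster.
submitted = []
memo = PhaseMemoizer(submitted.append)

joined = memo.materialize(
    "daily_report", "join",
    "SELECT c.userid, u.country FROM clicks c JOIN users u ON (c.userid = u.userid)",
)
counts_sql = "SELECT country, COUNT(1) FROM {} GROUP BY country".format(joined)
memo.materialize("daily_report", "count", counts_sql)

# Asking for the join phase again submits nothing new.
memo.materialize(
    "daily_report", "join",
    "SELECT c.userid, u.country FROM clicks c JOIN users u ON (c.userid = u.userid)",
)
```

Because a later phase refers to the earlier phase's table by name, changing the text of an
earlier phase also changes the hash of every phase that reads from it, so stale downstream
tables are not reused. The sketch does not notice changes in the underlying input data; that
would need separate handling.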

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
