hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/Roadmap" by ZhengShao
Date Wed, 23 Dec 2009 23:52:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/Roadmap" page has been changed by ZhengShao.
http://wiki.apache.org/hadoop/Hive/Roadmap?action=diff&rev1=20&rev2=21

--------------------------------------------------

  #pragma section-numbers 2
- 
  Before adding to the list below, please check [[https://issues.apache.org/jira/browse/HADOOP/component/12312455|JIRA]] to see if a ticket has already been opened for the feature. If not, please open a ticket on the [[http://issues.apache.org/jira/browse/HADOOP|Hadoop JIRA]], select "contrib/hive" as the component, and update the following list.
  
+ = Features to be added =
+ == Features recently done ==
+  * ODBC driver [[Hive/HiveODBC]]
+  * [[http://issues.apache.org/jira/browse/HIVE-870|semijoin]]
+  * [[http://issues.apache.org/jira/browse/HIVE-655|UDTF]]
+  * [[http://issues.apache.org/jira/browse/HIVE-31|Support for Create Table as Select (is available on trunk and versions later than 0.4.0)]]
  
- = Features to be added =
- 
- == Features actively being worked on ==
-   * ODBC driver
-   * semijoin
-   * UDTF
-   * [[Hive/ViewDev|Views]] in Hive so that data flows can be composed
- 
+ == Features working on now ==
+  * Hive CLI improvement/Error messages:
+   * Compile-time error message: Better error message for keyword, etc. [big]
+   * Execution-time error messages: categorize most popular errors and show easy-to-understand messages.
+  * Debuggability / Resumability:
+   * Show users the last portion of the data that caused the task to fail
+   * Restart a job with a particular mapper (that failed earlier, for debugging purposes)
+   * Resume at map-reduce job level. This should also work for databee. [big]
+  * Ease-of-use:
+   * Select without map-reduce [big]
+   * Bucketed Median/Percentile
+   * GraphViz for graphing operator tree
+   * Multiple-partition inserts [big]
+   * [[Hive/ViewDev|Views]] for changing table names/columns without breaking existing queries [big]
+   * GenericUDTF
+  * Performance
+   * TypedBytes for user scripts
+  * Hive Freeway
+   * Allow Hive partition locations to be file/files.
  
  == Short-term Features ==
-   * Support for various statistical functions like Median, Standard Deviation, Variance etc.
+  * Support for various statistical functions like Median, Standard Deviation, Variance etc.
-   * Data variables (possible followup to views)
+  * Data variables (possible followup to views)
-   * Integration with dumbo or map_reduce.py so that python code can be easily embedded in Hive
+  * Integration with dumbo or map_reduce.py so that python code can be easily embedded in Hive
  
- == More long term Features(yet to be prioritized) ==
+ == More long-term Features (yet to be prioritized) ==
-   * Support for Create Table as Select (is available on trunk and versions later than 0.4.0)
-   * Support for Inserts without listing the partitioning columns explicitly - the query should be able to derive that
-   * Support for Indexes
+  * Support for Indexes
-   * UNIQUE JOINS - that support a different semantics than the outer joins
+  * UNIQUE JOINS - that support a different semantics than the outer joins
-   * Support for Insert Appends
+  * Support for Insert Appends
-   * Using sort and bucketing properties to optimize queries
+  * Using sort and bucketing properties to optimize queries
-   * Support for IN, exists and correlated subqueries
+  * Support for IN, exists and correlated subqueries
-   * More native types - Enums, timestamp
+  * More native types - Enums, timestamp
-   * Passing schema to scripts through an environment variable
+  * Passing schema to scripts through an environment variable
-   * HAVING clause support
+  * HAVING clause support
-   * Counters for streaming
+  * Counters for streaming
-   * Error Reporting Improvements.  - Make error reporting for parse errors better
+  * Error Reporting Improvements.  - Make error reporting for parse errors better
  
  == Others ==
-   * Support for Column Alias
+  * Support for Column Alias
-   * Support for Statistics. - These stats are needed to make optimization decisions
+  * Support for Statistics. - These stats are needed to make optimization decisions
-   * Join Optimizations. - FRJ techniques etc to do the join faster
+  * Join Optimizations. - FRJ techniques etc to do the join faster
-   * Transformations in LOAD. - LOAD currently does not transform the input data if it is not in the format expected by the destination table.
+  * Transformations in LOAD. - LOAD currently does not transform the input data if it is not in the format expected by the destination table.
-   * Help on CLI.  - add help to the CLI
+  * Help on CLI.  - add help to the CLI
-   * Explode and Collect Operators. - Explode and collect operators to convert collections to individual items and vice versa.
-   * Multiple group-by inserts
+  * Multiple group-by inserts
-     * Generate multiple group-by results by scanning the source table only once
+   * Generate multiple group-by results by scanning the source table only once
-     * Example:
+   * Example:
-       * FROM src
+    * FROM src
-       * SELECT src.adid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.adid
+    * SELECT src.adid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.adid
-       * SELECT src.pageid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.pageid
+    * SELECT src.pageid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.pageid
-   * Let the user register UDF and UDAF
+  * Let the user register UDF and UDAF
-     * Expose register functions in UDFRegistry and UDAFRegistry
+   * Expose register functions in UDFRegistry and UDAFRegistry
-     * Provide commands in HiveCli to call those register functions
+   * Provide commands in HiveCli to call those register functions
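
The "Multiple group-by inserts" example above (one FROM src feeding two SELECT ... GROUP BY clauses) can be illustrated outside Hive with a minimal Python sketch. It is not how Hive implements or plans the feature; it only shows the idea of producing several group-by results from a single scan of the source rows. The function name multi_group_by and the sample rows are hypothetical, chosen for illustration.

```python
def multi_group_by(rows, group_keys):
    """Scan `rows` once and, for every column in `group_keys`, compute
    COUNT(userid) and COUNT(DISTINCT userid) per group value.

    Hypothetical helper for illustration only; not part of Hive.
    """
    # results[key][group_value] = [row_count, set_of_distinct_userids]
    results = {key: {} for key in group_keys}
    for row in rows:  # the source is read exactly once
        userid = row["userid"]
        for key in group_keys:
            agg = results[key].setdefault(row[key], [0, set()])
            if userid is not None:  # SQL COUNT(col) skips NULL values
                agg[0] += 1
                agg[1].add(userid)
    # Collapse each set into its size to get the distinct count.
    return {
        key: {group: (count, len(seen)) for group, (count, seen) in groups.items()}
        for key, groups in results.items()
    }


# Made-up sample rows standing in for the `src` table.
src = [
    {"adid": 1, "pageid": 10, "userid": "u1"},
    {"adid": 1, "pageid": 11, "userid": "u1"},
    {"adid": 2, "pageid": 10, "userid": "u2"},
    {"adid": 1, "pageid": 10, "userid": "u3"},
]
out = multi_group_by(src, ["adid", "pageid"])
```

Here out["adid"] plays the role of the first SELECT and out["pageid"] the second; both are filled during the same loop over src, which is the saving the roadmap item describes.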
  
