hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/Roadmap" by AshishThusoo
Date Wed, 08 Jul 2009 03:03:58 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AshishThusoo:
http://wiki.apache.org/hadoop/Hive/Roadmap

The comment on the change is:
Incorporating the new features that are being built

------------------------------------------------------------------------------
  Before adding to the list below, please check [https://issues.apache.org/jira/browse/HADOOP/component/12312455 JIRA] to see if a ticket has already been opened for the feature.
  If not, please open a ticket on the [http://issues.apache.org/jira/browse/HADOOP Hadoop JIRA], select "contrib/hive" as the component, and also update the following list.
  
  
- = Roadmap/call to add more features =
- The following is the list of useful features that are on the Hive Roadmap:
-   * HAVING clause support
+ = Features to be added =
+ == Features actively being worked on ==
+   * ODBC driver
+ 
+ == Short-term Features ==
    * Support for various statistical functions like Median, Standard Deviation, Variance, etc. (see the sketch after this list)
+   * Views and data variables in Hive so that data flows can be composed
+   * Integration with dumbo or map_reduce.py so that Python code can be easily embedded in Hive
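
A hypothetical sketch of how the statistical aggregates above might be used once supported; the STDDEV and VARIANCE names and the page_views table are assumptions, not committed syntax:

    SELECT pageid, AVG(time_spent), STDDEV(time_spent), VARIANCE(time_spent)
    FROM page_views
    GROUP BY pageid;

AVG is already available in Hive; the other two aggregates are the roadmap items.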
+ 
+ == Longer-term Features (yet to be prioritized) ==
    * Support for Create Table as Select
-   * Support for views
-   * Support for Insert Appends
    * Support for Inserts without listing the partitioning columns explicitly - the query should be able to derive that
    * Support for Indexes
-   * Support for IN
+   * UNIQUE JOINS - joins with different semantics from the outer joins
+   * Support for Insert Appends
+   * Using sort and bucketing properties to optimize queries
+   * Support for IN, EXISTS and correlated subqueries
+   * More native types - Enums, timestamp
+   * Passing schema to scripts through an environment variable
+   * HAVING clause support (see the sketch after this list)
+   * Counters for streaming
+   * Error Reporting Improvements.  - Make error reporting for parse errors better
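
To make the HAVING item concrete, a minimal sketch assuming the standard SQL semantics (the impressions table and its columns are hypothetical):

    SELECT adid, COUNT(userid)
    FROM impressions
    GROUP BY adid
    HAVING COUNT(userid) > 100;

Until then, the same result requires wrapping the GROUP BY in a FROM-clause subquery and filtering in the outer query.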
+ 
+ == Others ==
    * Support for Column Alias
    * Support for Statistics. - These stats are needed to make optimization decisions
-   * Join Optimizations. - Mapside joins, semi join techniques etc to do the join faster
+   * Join Optimizations. - Semi join, FRJ techniques etc. to do the join faster
-   * Optimizations to reduce the number of map files created by filter operations.
    * Transformations in LOAD. - LOAD currently does not transform the input data if it is not in the format expected by the destination table.
-   * Schemaless map/reduce.  - TRANSFORM needs schema while map/reduce is schema less.
-   * Improvements to TRANSFORM.  - Make this more intuitive to map/reduce developers - evaluate some other keywords etc.
-   * Error Reporting Improvements.  - Make error reporting for parse errors better
    * Help on CLI.  - add help to the CLI
    * Explode and Collect Operators. - Explode and collect operators to convert collections to individual items and vice versa.
-   * Propagating sort properties to destination tables. - If the query produces sorted output, we want to capture that in the destination table's metadata so that downstream optimizations can be enabled.
-   * Propagating bucketing properties to destination tables.
    * Multiple group-by inserts
      * Generate multiple group-by results by scanning the source table only once
      * Example (see also the fuller multi-insert sketch after this list):
        * FROM src
        * SELECT src.adid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.adid
        * SELECT src.pageid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.pageid
-   * SerDe refactoring, and DynamicSerDe
-     * Refactor SerDe library to make sure we can serialize/deserialize and let UDF handle complex objects.
-     * We will be able to write a Hive Query to write data into a table that uses Thrift serialization.
    * Let the user register UDF and UDAF
      * Expose register functions in UDFRegistry and UDAFRegistry
      * Provide commands in HiveCli to call those register functions
-   * ODBC/JDBC driver
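
As a companion to the multiple group-by item above, the single-scan intent can already be written down with Hive's existing multi-table insert syntax; a sketch, with the destination tables adid_counts and pageid_counts assumed:

    FROM src
    INSERT OVERWRITE TABLE adid_counts
    SELECT src.adid, COUNT(src.userid), COUNT(DISTINCT src.userid)
    GROUP BY src.adid
    INSERT OVERWRITE TABLE pageid_counts
    SELECT src.pageid, COUNT(src.userid), COUNT(DISTINCT src.userid)
    GROUP BY src.pageid;

The roadmap item is about executing such a statement with a single scan of src instead of one scan per insert.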
  
