hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/Roadmap" by JohnSichi
Date Tue, 23 Nov 2010 22:08:53 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/Roadmap" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/Roadmap?action=diff&rev1=29&rev2=30

--------------------------------------------------

   * [[https://issues.apache.org/jira/browse/HIVE-1293|Concurrency]]
   * [[https://issues.apache.org/jira/browse/HIVE-1642|Conversion to Map-Join at Runtime]]
   * [[https://issues.apache.org/jira/browse/HIVE-474|Support for Multiple Distincts]]
-  * [[https://issues.apache.org/jira/browse/HIVE-1750|Remove Partition Filtering Conditions]]
+  * [[https://issues.apache.org/jira/browse/HIVE-1750|Remove Partition Filtering Conditions]]

-  
  
  == Current Projects ==
   * [[https://issues.apache.org/jira/browse/HIVE-1721|Bloom Filters]]
   * [[https://issues.apache.org/jira/browse/HIVE-78|Authorization]]
+  * [[https://issues.apache.org/jira/browse/HIVE-842|Authentication]]
   * [[https://issues.apache.org/jira/browse/HIVE-1538|Remove Duplicate Filters]]
+  * [[https://issues.apache.org/jira/browse/HIVE-1644|Use Filter Pushdown for Automatically Accessing Indexes]]
+  * [[https://issues.apache.org/jira/browse/HIVE-1803|Bitmap Index]]
+  * [[https://issues.apache.org/jira/browse/HIVE-1790|HAVING clause support]]
  
+ == Up For Grabs ==
- == (Old) Features recently done ==
-  * ODBC driver [[Hive/HiveODBC]]
-  * [[http://issues.apache.org/jira/browse/HIVE-870|semijoin]]
-  * [[http://issues.apache.org/jira/browse/HIVE-655|UDTF]]
-  * [[http://issues.apache.org/jira/browse/HIVE-31|Create Table as Select]]
-  * [[http://issues.apache.org/jira/browse/HIVE-931|Using sort and bucketing properties to optimize queries]]
-  * [[http://issues.apache.org/jira/browse/HIVE-591|UNIQUE JOINS - that support different semantics than the outer joins]]
-  * [[http://issues.apache.org/jira/browse/HIVE-1023|TypedBytes for user scripts]]
-  * [[Hive/ViewDev|Views]] for changing table names/columns without breaking existing queries [big]
-  * [[http://issues.apache.org/jira/browse/HIVE-917|Bucketed Map Join]]
-  * [[http://issues.apache.org/jira/browse/HIVE-74|Combine File Input Format]]
  
- 
- == Features working on now ==
+  * Cross-database queries
+  * View improvements
+  * Column-level statistics
+  * Heavy-duty test infrastructure
+  * Automated code coverage reports
-  * Hive CLI improvement/Error messages:
+  * Hive CLI improvement/Error messages
+  * HiveServer robustness
-   * Compile-time error message: Better error message for keyword, etc. [big]
-   * Execution-time error messages: categorize most popular errors and show easy-to-understand messages.
   * Debuggability / Resumability:
    * Show users the last portion of the data that caused the task to fail
    * Restart a job with a particular mapper (that failed earlier, for debugging purposes)
+   * Resume at map-reduce job level.
-   * Resume at map-reduce job level. This should also work for databee. [big]
-  * Ease-of-use:
-   * Select without map-reduce [big]
-   * Bucketed Median/Percentile
-   * GraphViz for graphing operator tree
-   * Multiple-partition inserts [big]
-   * GenericUDTF
-  * Performance
-   * [[http://issues.apache.org/jira/browse/HIVE-1194|Sort Merge Join]]
-  * Hive Freeway
-   * Allow Hive partition locations to be file/files.
-  * [[Hive/HBaseIntegration|HBase integration]]
- 
- == Short-term Features ==
-  * Support for various statistical functions like Median, Standard Deviation, Variance etc.
-  * Data variables (possible followup to views)
-  * Integration with dumbo or map_reduce.py so that Python code can be easily embedded in Hive
- 
- == More long-term Features (yet to be prioritized) ==
-  * Support for Indexes
   * Support for Insert Appends
   * Support for IN, exists and correlated subqueries
   * More native types - Enums, timestamp
-  * Passing schema to scripts through an environment variable
-  * HAVING clause support
-  * Counters for streaming
-  * Error Reporting Improvements.  - Make error reporting for parse errors better
+  * Persistent UDFs
+  * Cost-based Optimization
+  * SQL/OLAP
+  * Storage handler improvements
+  * System views
+  * JDBC/ODBC improvements
+  * mapred -> mapreduce transition
  
- == Others ==
-  * Support for Column Alias
-  * Support for Statistics. - These stats are needed to make optimization decisions
-  * Join Optimizations. - FRJ techniques etc to do the join faster
-  * Transformations in LOAD. - LOAD currently does not transform the input data if it is not in the format expected by the destination table.
-  * Help on CLI.  - add help to the CLI
-  * Multiple group-by inserts
-   * Generate multiple group-by results by scanning the source table only once
-   * Example:
-    * FROM src
-    * SELECT src.adid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.adid
-    * SELECT src.pageid, COUNT(src.userid), COUNT(DISTINCT src.userid) GROUP BY src.pageid
-  * Let the user register UDF and UDAF
-   * Expose register functions in UDFRegistry and UDAFRegistry
-   * Provide commands in HiveCli to call those register functions
- 
