hadoop-common-dev mailing list archives

From Ben Maurer <bmau...@andrew.cmu.edu>
Subject Re: [hive-users] Hive Roadmap (Some information)
Date Mon, 27 Oct 2008 20:07:52 GMT
Have you guys considered translating the query syntax tree into Java
bytecode? Bytecode works well for this kind of task because it is
extremely high level: the code generator mostly has to handle type
checking and name resolution, while the JIT still performs register
allocation and other low-level optimizations, so the generated code
runs fast.
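
A minimal sketch of the idea, assuming a hypothetical expression tree with column references, integer literals, addition, and multiplication (none of these classes are Hive's). It contrasts a tree-walking interpreter, which re-resolves column names on every row, with a compile step that resolves names once against the schema and returns a reusable closure. This is closure compilation, a lighter-weight stand-in for the bytecode generation suggested above; emitting real bytecode would need a generator such as the ASM library, but the split of work is the same: name resolution happens up front, and low-level optimization is left to the JIT.

```java
import java.util.List;
import java.util.function.ToLongFunction;

public class QueryCompileSketch {
    /** Hypothetical expression AST node (illustrative only, not Hive's API). */
    public interface Expr {
        /** Interpreted evaluation: resolves column names on every call. */
        long interpret(List<String> schema, long[] row);
        /** Compiles once against a schema; the returned closure is reused per row. */
        ToLongFunction<long[]> compile(List<String> schema);
    }

    public static final class Col implements Expr {
        final String name;
        public Col(String name) { this.name = name; }
        public long interpret(List<String> schema, long[] row) {
            return row[schema.indexOf(name)]; // name lookup repeated for each row
        }
        public ToLongFunction<long[]> compile(List<String> schema) {
            int idx = schema.indexOf(name); // name resolved once, at compile time
            if (idx < 0) throw new IllegalArgumentException("unknown column: " + name);
            return row -> row[idx];
        }
    }

    public static final class Lit implements Expr {
        final long value;
        public Lit(long value) { this.value = value; }
        public long interpret(List<String> schema, long[] row) { return value; }
        public ToLongFunction<long[]> compile(List<String> schema) {
            long v = value;
            return row -> v;
        }
    }

    public static final class Add implements Expr {
        final Expr left, right;
        public Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public long interpret(List<String> schema, long[] row) {
            return left.interpret(schema, row) + right.interpret(schema, row);
        }
        public ToLongFunction<long[]> compile(List<String> schema) {
            // Children are compiled once; no per-row dispatch on node type.
            ToLongFunction<long[]> l = left.compile(schema);
            ToLongFunction<long[]> r = right.compile(schema);
            return row -> l.applyAsLong(row) + r.applyAsLong(row);
        }
    }

    public static final class Mul implements Expr {
        final Expr left, right;
        public Mul(Expr left, Expr right) { this.left = left; this.right = right; }
        public long interpret(List<String> schema, long[] row) {
            return left.interpret(schema, row) * right.interpret(schema, row);
        }
        public ToLongFunction<long[]> compile(List<String> schema) {
            ToLongFunction<long[]> l = left.compile(schema);
            ToLongFunction<long[]> r = right.compile(schema);
            return row -> l.applyAsLong(row) * r.applyAsLong(row);
        }
    }

    public static void main(String[] args) {
        List<String> schema = List.of("price", "qty");
        // price * qty + 5
        Expr e = new Add(new Mul(new Col("price"), new Col("qty")), new Lit(5));
        ToLongFunction<long[]> compiled = e.compile(schema);
        long[] row = {3, 4};
        System.out.println(e.interpret(schema, row));  // prints 17
        System.out.println(compiled.applyAsLong(row)); // prints 17
    }
}
```

Both paths give the same answer; the difference is that the compiled closure does its name lookups and tree dispatch once per query instead of once per row.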

-b

On Mon, 27 Oct 2008, Ashish Thusoo wrote:

> Folks,
>
> Here are some of the things that we are working on internally at Facebook. We thought it would be a good idea to let everyone know what is going on with Hive development. We will put this up on the wiki as well.
>
> 1. Integrating Dynamic SerDe with the DDL. (Zheng/Pete) - This allows users to create typed tables, including columns of list and map types, from the DDL.
> 2. Support for Statistics. (Ashish) - These stats are needed to make optimization decisions
> 3. Join Optimizations. (Prasad) - Map-side joins, semi-join techniques, etc. to make joins faster.
> 4. Predicate Pushdown Optimizations. (Namit) - Pushing predicates down to just above the table scan in certain join situations, and ensuring that only the required columns are sent across map/reduce boundaries.
> 5. Group By Optimizations. (Joydeep) - various optimizations to make group by faster
> 6. Optimizations to reduce the number of map files created by filter operations. (Dhrubha) - Filters with a large number of mappers produce a lot of files, which slows down the following operations. This work tries to address that problem.
> 7. Transformations in LOAD. (Joydeep) - LOAD currently does not transform the input data if it is not in the format expected by the destination table.
> 8. Schemaless map/reduce. (Zheng) - TRANSFORM requires a schema, while map/reduce is schemaless.
> 9. Improvements to TRANSFORM. (Zheng) - Make this more intuitive for map/reduce developers; evaluate some other keywords, etc.
> 10. Error Reporting Improvements. (Pete) - Make error reporting for parse errors better
> 11. Help on CLI. (Joydeep) - add help to the CLI
> 12. Explode and Collect Operators. (Zheng) - Explode and collect operators to convert collections to individual items and vice versa.
> 13. Propagating sort properties to destination tables. (Prasad) - If the query produces sorted output, we want to capture that in the destination table's metadata so that downstream optimizations can be enabled.
>
> Other contributions from outside FB ...
> 1. JDBC driver (Michi Mutsuzaki @ stanford.edu, Raghu @ stanford.edu)
> 2. Fixes to CLI driver (Jeremy Huylebroeck)
> 3. Web interface...
>
> Most of these have an associated JIRA. A lot of the focus is on making Hive run faster, now that we have a good feature set...
>
> Comments/contributions are welcome. Please go to the JIRA and check out contrib/hive...
>
> Thanks,
> Ashish
> _______________________________________________
> hive-users mailing list
> hive-users@publists.facebook.com
> http://publists.facebook.com/mailman/listinfo/hive-users
>
>
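
Item 3 of the roadmap mentions map-side joins. As a rough sketch of that general technique (not Hive's implementation), the method below assumes the smaller table fits in each mapper's memory: it builds a hash table on the small side once, then streams the mapper's split of the large table and probes it, so the join itself needs no shuffle or reduce phase. The row layout (String arrays with the join key at a given index) and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {
    /**
     * Inner-joins one mapper's split of the big table against a small table
     * held entirely in memory (hypothetical row layout: String[] per row).
     */
    public static List<String[]> mapSideJoin(List<String[]> smallTable, int smallKey,
                                             Iterable<String[]> bigTableSplit, int bigKey) {
        // Build phase: hash the small table once per mapper.
        Map<String, List<String[]>> index = new HashMap<>();
        for (String[] row : smallTable) {
            index.computeIfAbsent(row[smallKey], k -> new ArrayList<>()).add(row);
        }
        // Probe phase: stream the big-table split and look up each key.
        List<String[]> out = new ArrayList<>();
        for (String[] big : bigTableSplit) {
            List<String[]> matches = index.get(big[bigKey]);
            if (matches == null) continue; // inner join: drop unmatched rows
            for (String[] small : matches) {
                // Output row = big-table columns followed by small-table columns.
                String[] joined = new String[big.length + small.length];
                System.arraycopy(big, 0, joined, 0, big.length);
                System.arraycopy(small, 0, joined, big.length, small.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> users = List.of(new String[]{"1", "alice"}, new String[]{"2", "bob"});
        List<String[]> clicks = List.of(
            new String[]{"1", "/home"}, new String[]{"3", "/about"}, new String[]{"1", "/faq"});
        for (String[] r : mapSideJoin(users, 0, clicks, 0)) {
            System.out.println(String.join(",", r));
        }
    }
}
```

Running `main` prints `1,/home,1,alice` and `1,/faq,1,alice`; the click for user id 3 has no match and is dropped, as in an inner join. The trade-off is memory: this only works when the small side fits in every mapper.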
