hive-dev mailing list archives

From "Ning Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 2.2.3
Date Thu, 31 Mar 2011 20:23:06 GMT


Ning Zhang commented on HIVE-2084:

@Namit, yeah, 2.2.3 supports filter push down for non-equality predicates. Even the older
version, 2.0.3, supports it too. Mac's patch actually supports range queries, but since range
queries can be complicated on multiple partition columns (what if the range is on a column
that is not the top partition column?), I didn't dig deep into it. The push down filtering
criteria can certainly be relaxed, though.
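
One possible illustration of the multi-column concern, as a minimal sketch rather than
anything taken from the patch: assume partition names of the form "ds=.../hr=..." (the
column names, the sample values, and both helper functions are hypothetical). A range on
the second column selects names that are not adjacent in sorted order, so it cannot be
answered with a simple prefix or range scan over the names.

```python
def parse_partition_name(name):
    """Split a name like 'ds=2011-03-30/hr=11' into {'ds': '2011-03-30', 'hr': '11'}."""
    return dict(part.split("=", 1) for part in name.split("/"))


def filter_by_range(names, column, low, high, cast=str):
    """Keep the partition names whose `column` value lies in [low, high]."""
    kept = []
    for name in names:
        value = cast(parse_partition_name(name)[column])
        if low <= value <= high:
            kept.append(name)
    return kept


# Hypothetical partition names, already in sorted order.
names = [
    "ds=2011-03-30/hr=09",
    "ds=2011-03-30/hr=11",
    "ds=2011-03-31/hr=10",
    "ds=2011-03-31/hr=23",
]

# Range on the non-leading column: the matches are scattered across the list.
hr_matches = filter_by_range(names, "hr", 10, 12, cast=int)

# Range on the leading column: the matches form one contiguous run.
ds_matches = filter_by_range(names, "ds", "2011-03-31", "2011-03-31")
```

Whether this is the exact difficulty meant above is an assumption; the sketch only shows
that ranges on the top partition column and on a lower one behave differently.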

Having said that, my test results show that JDO filter push down may not be the dominant
factor (compared to the patch in HIVE-2050). In the experiments I've done for HIVE-2050,
listing partition names and filtering partitions on the Hive client side may take 10 sec,
but retrieving all Partition objects takes about 10 mins in total. At best, pushing down
JDO filtering can only reduce the 10 sec to 0, but the 10 mins overhead is still there. We
need to find a way to optimize that away.
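
To put the numbers above side by side, here is a small back-of-the-envelope sketch. It
uses only the rough timings quoted in this comment (about 10 sec and about 10 mins); the
variable and function names are made up for illustration, and the model simply adds the
two phases together.

```python
# Rough timings quoted in the comment, in seconds.
list_and_filter_s = 10       # listing partition names + client-side filtering
fetch_objects_s = 10 * 60    # retrieving all Partition objects


def total_time(filter_s, fetch_s):
    """Total time under the simplifying assumption that the two phases just add up."""
    return filter_s + fetch_s


baseline = total_time(list_and_filter_s, fetch_objects_s)
# Best case for push down: the filtering phase costs nothing at all.
best_case_pushdown = total_time(0, fetch_objects_s)
saving_fraction = (baseline - best_case_pushdown) / baseline

print(f"baseline: {baseline} s, best-case push down: {best_case_pushdown} s")
print(f"maximum saving: {saving_fraction:.1%}")
```

Under these assumptions, even a perfect push down trims less than 2% of the total, which
is why the Partition object retrieval is the part worth optimizing.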

> Upgrade datanucleus from 2.0.3 to 2.2.3
> ---------------------------------------
>                 Key: HIVE-2084
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-2084.patch
> It seems that datanucleus 2.2.3 does a better job in caching. Getting the same set of
partition objects a second time takes about 1/4 of the time it took the first time, while
with 2.0.3 the second execution took almost the same amount of time. We should retest the
test cases mentioned in HIVE-1853 and HIVE-1862.

This message is automatically generated by JIRA.