hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] Commented: (HIVE-170) map-side aggregations does not work properly
Date Fri, 12 Dec 2008 23:20:44 GMT


Namit Jain commented on HIVE-170:

I agree with the flushing optimization - but that can be a follow-up.

Default value of Runtime.getRuntime().maxMemory() - 512m is also OK.

I was using the number of rows as an escape value.

But I don't think invoking the garbage collector explicitly is a good idea.

A third category where developers often mistakenly think they are helping the garbage collector
is the use of System.gc(), which triggers a garbage collection (actually, it merely suggests
that this might be a good time for a garbage collection). Unfortunately, System.gc() triggers
a full collection, which includes tracing all live objects in the heap and sweeping and compacting
the old generation. This can be a lot of work. In general, it is better to let the system
decide when it needs to collect the heap, and whether or not to do a full collection. Most
of the time, a minor collection will do the job. Worse, calls to System.gc() are often deeply
buried where developers may be unaware of their presence, and where they might get triggered
far more often than necessary. If you are concerned that your application might have hidden
calls to System.gc() buried in libraries, you can invoke the JVM with the -XX:+DisableExplicitGC
option to prevent calls to System.gc() from triggering a garbage collection.

I think it is too expensive to use that.
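
For illustration, heap usage can be read without calling System.gc() at all. The sketch below is not code from the patch; the class and method names are hypothetical. It uses only Runtime.maxMemory(), totalMemory() and freeMemory(). Note that the "used" figure still counts garbage that has not been collected yet, so it is an upper bound on live data.

```java
// Hypothetical helper, not part of Hive: estimates heap headroom
// without forcing a garbage collection.
final class HeapHeadroom {

    // Bytes currently occupied in the allocated heap (live objects plus
    // garbage that has not been collected yet).
    static long usedBytes(long totalBytes, long freeBytes) {
        return totalBytes - freeBytes;
    }

    // Bytes the heap could still grow into before reaching the JVM maximum.
    // freeMemory() alone is relative to the currently allocated heap
    // (totalMemory()), not to the maximum heap size.
    static long headroomBytes(long maxBytes, long totalBytes, long freeBytes) {
        return maxBytes - usedBytes(totalBytes, freeBytes);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long headroom = headroomBytes(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
        System.out.println("Approximate headroom in bytes: " + headroom);
    }
}
```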

Instead of asking the user to specify the Hive hash map memory directly, the user can specify
it as a fraction of total task memory. The default will be close to 1, i.e. all task memory will be
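
A minimal sketch of the fraction idea, combined with the row-count estimate from the issue description. This is an assumption about how it could look, not the patch itself: the class name, method names, the 0.9 fraction and the 256-byte row size are all made up for illustration.

```java
// Hypothetical helper, not part of Hive: derives a hash table memory
// budget from a user-supplied fraction of the task's maximum heap.
final class HashMemoryBudget {

    // Bytes the hash table may use, as a fraction (0, 1] of the maximum heap.
    static long budgetBytes(long maxHeapBytes, double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        return (long) (maxHeapBytes * fraction);
    }

    // Number of rows that fit in the budget, given an estimated size per row.
    // Once the hash table holds this many rows it would be flushed.
    static long maxRows(long budgetBytes, long estimatedBytesPerRow) {
        if (estimatedBytesPerRow <= 0) {
            throw new IllegalArgumentException("estimatedBytesPerRow must be positive");
        }
        return budgetBytes / estimatedBytesPerRow;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long budget = budgetBytes(maxHeap, 0.9);   // assumed fraction
        long rows = maxRows(budget, 256L);         // assumed bytes per row
        System.out.println("Flush after about " + rows + " rows");
    }
}
```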

> map-side aggregations does not work properly
> --------------------------------------------
>                 Key: HIVE-170
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: 170.patch, patch2
> map-side aggregation depends on Runtime.freeMemory(), which is not guaranteed to return
> the freeable memory - it depends on when the garbage collector was last invoked.
> It might be a good idea to estimate the number of rows that can fit in the hash table
> and then flush the hash table based on that.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
