hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates
Date Sun, 20 Sep 2009 07:38:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757715#action_12757715 ]

Joydeep Sen Sarma commented on HIVE-224:
----------------------------------------

No, I guess we didn't, although it's an easy one. Fallout of reading the SOSP paper?

Ridiculous: they are reporting 'accumulator partial-hash' as something new (never reported
in the literature) when reference #1 in their paper implements exactly that. So much for research.


> implement lfu based flushing policy for map side aggregates
> -----------------------------------------------------------
>
>                 Key: HIVE-224
>                 URL: https://issues.apache.org/jira/browse/HIVE-224
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>
> Currently we flush some random set of rows when the map-side hash table approaches memory
> limits.
> We have discussed a strategy of flushing the hash table entries that have been seen the
> least number of times (effectively an LFU flushing strategy). This will be very effective at
> reducing the amount of data sent from the map step to the reduce step, and will also reduce
> the chance of skew.
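A minimal sketch of the LFU flushing idea described above, written in Python for brevity. It is not Hive's map-side aggregation code. It assumes a SUM aggregate and a fixed entry cap (`max_entries`) standing in for the memory limit; the class `LfuMapSideAggregator` and its `flush_fraction` parameter are hypothetical. Each entry counts how many rows have hit it, and when the table is full the least-frequently-seen entries are emitted as partial aggregates, so hot keys stay in memory and keep absorbing rows.

```python
import heapq
from typing import Dict, Hashable, List, Tuple


class LfuMapSideAggregator:
    """Hypothetical map-side SUM aggregator with an LFU flushing policy."""

    def __init__(self, max_entries: int, flush_fraction: float = 0.5) -> None:
        self.max_entries = max_entries        # stand-in for the memory limit
        self.flush_fraction = flush_fraction  # share of entries flushed when full
        # key -> [partial_sum, hit_count]
        self.table: Dict[Hashable, List[int]] = {}
        # partial aggregates sent on to the reduce step
        self.emitted: List[Tuple[Hashable, int]] = []

    def add(self, key: Hashable, value: int) -> None:
        entry = self.table.get(key)
        if entry is not None:
            # Existing key: fold the row into the partial aggregate, bump its frequency.
            entry[0] += value
            entry[1] += 1
            return
        if len(self.table) >= self.max_entries:
            self._flush_least_frequent()
        self.table[key] = [value, 1]

    def _flush_least_frequent(self) -> None:
        # Emit the entries with the lowest hit counts instead of a random subset.
        # nsmallest behaves like sorted(...)[:n], so ties go to the older entry.
        n = max(1, int(len(self.table) * self.flush_fraction))
        victims = heapq.nsmallest(n, self.table.items(), key=lambda kv: kv[1][1])
        for key, (partial, _hits) in victims:
            self.emitted.append((key, partial))
            del self.table[key]

    def close(self) -> List[Tuple[Hashable, int]]:
        # End of map input: flush whatever is still buffered.
        for key, (partial, _hits) in self.table.items():
            self.emitted.append((key, partial))
        self.table.clear()
        return self.emitted
```

For example, feeding keys a, a, a, b, c, a (each with value 1) into a two-entry table flushes b, seen once, rather than a, so a reaches the reducer as a single partial sum of 4. Picking victims with `nsmallest` costs O(n log k) per flush; a production version might keep per-frequency buckets to make LFU eviction cheaper.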

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

