pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (PIG-5357) BagFactory interface should support creating a distinct bag from a set
Date Mon, 08 Oct 2018 22:27:00 GMT

     [ https://issues.apache.org/jira/browse/PIG-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reassigned PIG-5357:
---------------------------------------

         Assignee: Jacob Tolar
     Hadoop Flags: Reviewed
    Fix Version/s: 0.18.0

{quote}All of the internal code now uses InternalDistinctBag instead of DistinctDataBag.
{quote}
The difference is that InternalDistinctBag spills proactively, based on memory usage and the
configured caching limit. It also spills when spill() is called, provided a read has not already
started. DistinctDataBag does not spill proactively, but it handles spilling even when spill() is
called in the middle of a read. So it is fine to still use it.
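
To make the contrast concrete, here is a rough toy model of the two spill policies described above. It is only a sketch under stated assumptions: ToyDistinctBag, ProactiveToyBag, ReactiveToyBag and the integer cacheLimit are hypothetical names, strings stand in for tuples, a list stands in for spill files, and none of this is taken from Pig's actual classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical classes, not Pig's implementation.
// De-duplication across spilled and in-memory data is not modeled.
abstract class ToyDistinctBag {
    protected final Set<String> inMemory = new HashSet<>();
    protected final List<String> spilled = new ArrayList<>(); // stands in for spill files
    protected boolean readStarted = false;

    void add(String tuple) { inMemory.add(tuple); }
    void startRead() { readStarted = true; }
    int inMemorySize() { return inMemory.size(); }
    int spilledSize() { return spilled.size(); }

    // Moves everything currently held in memory to the simulated disk.
    protected void moveToDisk() {
        spilled.addAll(inMemory);
        inMemory.clear();
    }

    // Called from outside under memory pressure; returns true if data was spilled.
    abstract boolean spill();
}

// Models the behavior attributed to InternalDistinctBag in the comment above.
final class ProactiveToyBag extends ToyDistinctBag {
    private final int cacheLimit; // assumed: a simple count of in-memory tuples

    ProactiveToyBag(int cacheLimit) { this.cacheLimit = cacheLimit; }

    @Override
    void add(String tuple) {
        super.add(tuple);
        // Proactive: spill on its own once the limit is reached.
        if (inMemory.size() >= cacheLimit) moveToDisk();
    }

    @Override
    boolean spill() {
        // An external spill request is honored only before reading starts.
        if (readStarted) return false;
        moveToDisk();
        return true;
    }
}

// Models the behavior attributed to DistinctDataBag in the comment above.
final class ReactiveToyBag extends ToyDistinctBag {
    @Override
    boolean spill() {
        // No proactive spilling, but a request is honored even mid-read.
        moveToDisk();
        return true;
    }
}

class SpillPolicySketch {
    public static void main(String[] args) {
        ProactiveToyBag proactive = new ProactiveToyBag(2);
        ReactiveToyBag reactive = new ReactiveToyBag();
        for (String t : new String[] {"a", "b", "c"}) {
            proactive.add(t);
            reactive.add(t);
        }
        proactive.startRead();
        reactive.startRead();
        System.out.println("proactive spill mid-read: " + proactive.spill());
        System.out.println("reactive spill mid-read:  " + reactive.spill());
    }
}
```

In this toy, the proactive bag ignores an external spill request once reading has begun, while the reactive bag still moves its in-memory data out. That mirrors the distinction drawn above and nothing more.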

 

+1. Committed to trunk. Thanks [~jtolar] for this enhancement.

> BagFactory interface should support creating a distinct bag from a set
> ----------------------------------------------------------------------
>
>                 Key: PIG-5357
>                 URL: https://issues.apache.org/jira/browse/PIG-5357
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jacob Tolar
>            Assignee: Jacob Tolar
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: PIG-5357-1.patch, PIG-5357-2.patch
>
>
> It would be nice if BagFactory supported creating a distinct bag from a set of tuples, similar to:
> {code:java}
> newDefaultBag(List<Tuple> listOfTuples);
> {code}
> [https://github.com/apache/pig/blob/trunk/src/org/apache/pig/data/BagFactory.java]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
