hadoop-pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler
Date Fri, 21 Aug 2009 06:54:15 GMT

    [ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745830#action_12745830 ]

Daniel Dai commented on PIG-890:
--------------------------------

Comments:
1. Can you include a unit test?
2. PoissonSampleLoader.java
{noformat}
		try {
			numSplits = Integer.valueOf(pcProps.getProperty(MAPSPLITS_COUNT));
		} catch (NumberFormatException e) {
			numSplits = 1;
		}
{noformat}
We should throw an exception here rather than silently continue with a default value.
The same applies to:
{noformat}
		try {
			float f = (Runtime.getRuntime().maxMemory() * heapPerc) / (float) (FileLocalizer.getSize(fname) * convFactor);
			baseNumSamples = (long) Math.ceil(1.0 / f);
		} catch (IOException e) {
			baseNumSamples = 1; // default value 
		}
{noformat}
3. Are PoissonSampleLoader.next and PoissonSampleLoader.bindTo the same as in RandomSampleLoader?
If so, we should move them into a base class rather than copy them.
4. For DEFAULT_SAMPLE_RATE, can you list some other values in the comment, such as those for
90% and 85% confidence, and also add a link showing how to derive these magic numbers? I know
this comes from the Poisson CDF, but it is better to have something we can check quickly.
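
For point 2, a minimal sketch of what failing fast could look like. The class name, the property key string, and the choice of IOException are hypothetical and not taken from the patch; the point is only that a missing or malformed split count surfaces as an error instead of quietly becoming 1.

```java
import java.io.IOException;
import java.util.Properties;

public class SplitCountReader {
    // Hypothetical key: the real value of MAPSPLITS_COUNT is not shown in the patch excerpt.
    static final String MAPSPLITS_COUNT = "example.map.splits.count";

    /** Reads the split count, throwing instead of falling back to a default. */
    static int readNumSplits(Properties props) throws IOException {
        String raw = props.getProperty(MAPSPLITS_COUNT);
        if (raw == null) {
            throw new IOException("Missing property " + MAPSPLITS_COUNT);
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Keep the original exception as the cause so the bad value is traceable.
            throw new IOException("Invalid value for " + MAPSPLITS_COUNT + ": " + raw, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty(MAPSPLITS_COUNT, "4");
        System.out.println(readNumSplits(props));
    }
}
```

The same pattern would apply to the baseNumSamples computation: let the IOException from the file-size lookup propagate, or wrap it with context, rather than assigning a default.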

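For point 4, a small helper along these lines could serve as the "quick check" for the values in the comment. This is a sketch only: the class and method names are made up, and the mean of 10 used in main is an assumption for illustration, not a value confirmed by the patch. It computes the Poisson CDF directly and finds the smallest count whose cumulative probability reaches a given confidence.

```java
public class PoissonCheck {
    /** P(X <= k) for X ~ Poisson(mean), built up from successive pmf terms. */
    static double cdf(double mean, int k) {
        double term = Math.exp(-mean); // P(X = 0)
        double sum = term;
        for (int i = 1; i <= k; i++) {
            term *= mean / i;          // P(X = i) derived from P(X = i - 1)
            sum += term;
        }
        return sum;
    }

    /** Smallest k such that P(X <= k) >= confidence. */
    static int quantile(double mean, double confidence) {
        if (mean <= 0 || confidence <= 0 || confidence >= 1) {
            throw new IllegalArgumentException("need mean > 0 and 0 < confidence < 1");
        }
        int k = 0;
        double term = Math.exp(-mean);
        double sum = term;
        while (sum < confidence) {
            k++;
            term *= mean / k;
            sum += term;
        }
        return k;
    }

    public static void main(String[] args) {
        double mean = 10.0; // assumed mean, for illustration only
        for (double c : new double[] {0.85, 0.90, 0.95}) {
            System.out.printf("confidence %.2f -> k = %d%n", c, quantile(mean, c));
        }
    }
}
```

Substituting the mean that the patch's derivation actually uses would let the comment list the 85%, 90%, and 95% values side by side.
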
> Create a sampler interface and improve the skewed join sampler
> --------------------------------------------------------------
>
>                 Key: PIG-890
>                 URL: https://issues.apache.org/jira/browse/PIG-890
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Sriranjan Manjunath
>         Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a better sampling
> interface. The design of the same is described here: http://wiki.apache.org/pig/PigSampler

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

