pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3645) Move FileLocalizer.setR() calls to unit tests
Date Sun, 29 Dec 2013 23:01:50 GMT

     [ https://issues.apache.org/jira/browse/PIG-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Cheolsoo Park updated PIG-3645:

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to trunk. I removed FiliLocalizer.getR().

Thank you Rohini for all the help with this jira!

> Move FileLocalizer.setR() calls to unit tests
> ---------------------------------------------
>                 Key: PIG-3645
>                 URL: https://issues.apache.org/jira/browse/PIG-3645
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>            Priority: Minor
>             Fix For: 0.13.0
>         Attachments: PIG-3645-1.patch, PIG-3645-2.patch, PIG-3645-3.patch, PIG-3645-4.patch,
> Currently, temporary paths are generated by FileLocalizer using Random.nextInt(). To
provide strong randomness, MapReduceLauncher resets the Random object every time when compiling
physical plan to MR plan:
> {code}
> MRCompiler comp = new MRCompiler(php, pc); 
> comp.randomizeFileLocalizer(); // This in turn calls FileLocalizer.setR(new Random()).
> {code}
> Besides, there are a couple of places calling FileLocalizer.setR() (e.g. MRCompiler)
with some random seed.
> I think-
> # Randomizing Random seed is unnecessary if we switch to UUID.
> # Setting Random objects in code like this is error-prone because it can be easily broken
by having or missing a FileLocalizer.setR() somewhere else. See an example [here|http://search-hadoop.com/m/2nxTzQXfHw1].
> So I propose that we remove all this "randomizing Random seed" code and use UUID instead
in temporary paths.
> For unit tests that compare the results against gold files, we should still allow to
set Random seed through FileLocalizer.setR(). But this method will be annotated as "VisibleForTesting"
to ensure it is not used nowhere else other than in unit tests.
> Regarding the existing gold files, they can be easily regenerated by TestMRCompiler as
> {code}
> FileOutputStream fos = new FileOutputStream(expectedFile + "_new");
> PrintWriter pw = new PrintWriter(fos);
> pw.write(compiledPlan);
> {code}
> I assume there won't be any kind of regressions due to this change. But please let me
know if I am wrong.

This message was sent by Atlassian JIRA

View raw message