hadoop-pig-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-240) Support launching concurrent Pig jobs from one VM
Date Fri, 16 May 2008 15:25:55 GMT

    [ https://issues.apache.org/jira/browse/PIG-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597512#action_12597512 ]

Tom White commented on PIG-240:

The following classes have shared state in (non-final) static fields. (I used FindBugs to
find these; it would be nice if it were run automatically, as it is for Hadoop.)

BagFactory has a static SpillableMemoryManager. Since BagFactory is a singleton, SpillableMemoryManager
can just be an instance variable.
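
A minimal sketch of that change, using simplified stand-in classes rather than the actual
Pig sources (BagFactorySketch, MemoryManagerSketch and their methods are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SpillableMemoryManager; it only counts registrations.
class MemoryManagerSketch {
    private int registered = 0;

    void register(Object spillable) { registered++; }

    int registeredCount() { return registered; }
}

// Hypothetical stand-in for BagFactory.
class BagFactorySketch {
    private static final BagFactorySketch INSTANCE = new BagFactorySketch();

    // Instance field instead of a non-final static field: the manager is
    // owned by the singleton object rather than by the class itself.
    private final MemoryManagerSketch memoryManager = new MemoryManagerSketch();

    private BagFactorySketch() {}

    static BagFactorySketch getInstance() { return INSTANCE; }

    List<Object> newBag() {
        List<Object> bag = new ArrayList<>();
        memoryManager.register(bag); // each new bag is registered with the manager
        return bag;
    }

    MemoryManagerSketch memoryManager() { return memoryManager; }
}
```

The only remaining static is the final INSTANCE reference, so nothing can reassign the
shared state.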

MapReduceLauncher has several static fields and associated setters. POMapreduce has a static
instance of MapReduceLauncher. This can be fixed by making HExecutionEngine create a MapReduceLauncher
instance, set its non-static fields and set the instance on POMapreduce.
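
One way to read this wiring, sketched with hypothetical simplified classes (the real
HExecutionEngine, MapReduceLauncher and POMapreduce have different members, and the
jobTracker field here is only an example of former static state):

```java
// Hypothetical stand-in for MapReduceLauncher.
class LauncherSketch {
    private final String jobTracker; // assumed example of a former static field

    LauncherSketch(String jobTracker) { this.jobTracker = jobTracker; }

    String describeLaunch(String jobName) { return jobName + "@" + jobTracker; }
}

// Hypothetical stand-in for POMapreduce.
class MapReduceOperatorSketch {
    private LauncherSketch launcher; // instance field, set by the engine

    void setLauncher(LauncherSketch launcher) { this.launcher = launcher; }

    String execute(String jobName) { return launcher.describeLaunch(jobName); }
}

// Hypothetical stand-in for HExecutionEngine: it creates the launcher and
// hands it to every operator it builds.
class EngineSketch {
    private final LauncherSketch launcher;

    EngineSketch(String jobTracker) { this.launcher = new LauncherSketch(jobTracker); }

    MapReduceOperatorSketch newOperator() {
        MapReduceOperatorSketch op = new MapReduceOperatorSketch();
        op.setLauncher(launcher);
        return op;
    }
}
```

With this shape, two engines in the same VM can hold differently configured launchers
without overwriting each other's settings.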

The static field totalHadoopTimeSpent on MapReduceLauncher cannot be dealt with in this way
since it is used by PigServer to accumulate the time spent on jobs. This can be fixed by keeping
it static (but accessing through a method rather than the field) and using AtomicLong for
thread-safety. Longer term it would be better to have MapReduceLauncher.launchPig return a
result object that PigServer gets the time from.
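
Roughly, both the short-term and the longer-term options could look like the following.
This is a sketch under assumptions: LauncherTimingSketch, LaunchResultSketch and the
method names are hypothetical, and only the field name totalHadoopTimeSpent comes from
the existing code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Short term: keep the counter static, but hide it behind methods and make
// updates atomic so concurrent jobs cannot lose increments.
class LauncherTimingSketch {
    private static final AtomicLong totalHadoopTimeSpent = new AtomicLong();

    static void addHadoopTime(long millis) { totalHadoopTimeSpent.addAndGet(millis); }

    static long getTotalHadoopTimeSpent() { return totalHadoopTimeSpent.get(); }
}

// Longer term: an immutable result object returned by the launch call, so the
// caller reads the elapsed time from the result instead of from static state.
final class LaunchResultSketch {
    private final boolean success;
    private final long hadoopTimeMillis;

    LaunchResultSketch(boolean success, long hadoopTimeMillis) {
        this.success = success;
        this.hadoopTimeMillis = hadoopTimeMillis;
    }

    boolean isSuccess() { return success; }

    long getHadoopTimeMillis() { return hadoopTimeMillis; }
}
```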

LogicalPlanBuilder has a static classloader field which is set by PigContext.addJar() and
Main.main(). This field is widely used, in particular by the static method resolveClassName()
on PigContext, which is itself widely used via instantiateFuncFromSpec(). I think the proper
approach is to make the classloader field an instance variable of PigContext, and make the
PigContext available as needed.
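
A sketch of what a per-context classloader might look like. ContextSketch is a hypothetical
stand-in for PigContext; the method names mirror addJar() and resolveClassName(), but the
signatures and bodies are assumptions, not the existing implementation.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical stand-in for PigContext.
class ContextSketch {
    // Per-context loader instead of one static field shared by the whole VM.
    private ClassLoader classLoader = ContextSketch.class.getClassLoader();

    // Adding a jar chains a new loader onto this context's current loader only.
    void addJar(URL jar) {
        classLoader = new URLClassLoader(new URL[] { jar }, classLoader);
    }

    // Now an instance method, so callers need a context to resolve a class.
    Class<?> resolveClassName(String name) throws ClassNotFoundException {
        return Class.forName(name, true, classLoader);
    }

    ClassLoader getClassLoader() { return classLoader; }
}
```

The cost of this design is that every caller of the formerly static method needs a
context reference, which is why the context has to be passed around.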

PigMapReduce has two static fields: reporter and pigContext. DataBag.reportProgress uses reporter;
DataBags should be constructed with a PigContext so they can get its non-static reporter.
HadoopExecutableManager.configure uses pigContext, but it could just be given a JobConf in
its constructor.
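
The reporter half of this could be sketched as plain constructor injection. All three
types below are hypothetical stand-ins (for Hadoop's reporter, PigContext and DataBag),
and the assumption that the context carries the reporter is taken from the suggestion
above, not from the current code.

```java
// Hypothetical stand-in for the progress reporter.
interface ProgressReporterSketch {
    void progress();
}

// Hypothetical stand-in for PigContext, holding a non-static reporter.
class TaskContextSketch {
    private final ProgressReporterSketch reporter;

    TaskContextSketch(ProgressReporterSketch reporter) { this.reporter = reporter; }

    ProgressReporterSketch getReporter() { return reporter; }
}

// Hypothetical stand-in for DataBag: it receives its context at construction
// time and never touches static state.
class BagSketch {
    private final TaskContextSketch context;

    BagSketch(TaskContextSketch context) { this.context = context; }

    void reportProgress() { context.getReporter().progress(); }
}
```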

PigInputFormat has a static activeSplit field. Making the RecordReader hold a reference to
the activeSplit would help here, but that might not work everywhere the static field is
used (e.g. FileLocalizer.openDFSFile, although that might not matter since it doesn't
work locally anyway).

> Support launching concurrent Pig jobs from one VM
> -------------------------------------------------
>                 Key: PIG-240
>                 URL: https://issues.apache.org/jira/browse/PIG-240
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Tom White
> For some applications it would be convenient to launch concurrent Pig jobs from a single
> VM. This is currently not possible since Pig has static mutable state.

