pig-dev mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4168) Initial implementation of unit tests for Pig on Spark
Date Wed, 24 Sep 2014 02:05:34 GMT

    [ https://issues.apache.org/jira/browse/PIG-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145775#comment-14145775 ]

liyunzhang_intel commented on PIG-4168:
---------------------------------------

Hi [~rohini],
Thanks very much for your comments!
I have updated the patch (PIG-4168_1.patch) and addressed the points you raised:
1. *Previous*: New ExecTypes are pluggable using ServiceLoader. Please do not add them to the ExecType class.
   *In this patch*: TestSpark now constructs the PigServer with a SparkExecType instance rather than modifying the ExecType class:
{code}
// TestSpark#setUp
public void setUp() throws Exception {
    pigServer = new PigServer(new SparkExecType(), cluster.getProperties());
}
{code}
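For context, here is a minimal sketch of how a ServiceLoader-based lookup of pluggable ExecTypes can work. This is an illustration only: the ExecTypeLookup class is hypothetical, and the name() accessor and the META-INF/services provider file are assumptions based on the standard ServiceLoader convention, not details taken from the patch.
{code}
// Hypothetical sketch: discovering pluggable ExecType implementations via
// ServiceLoader. Providers would be listed in a classpath file such as
// META-INF/services/org.apache.pig.ExecType.
import java.util.ServiceLoader;

public class ExecTypeLookup {
    // Returns the first registered ExecType whose name matches, or null.
    public static org.apache.pig.ExecType find(String name) {
        for (org.apache.pig.ExecType type :
                ServiceLoader.load(org.apache.pig.ExecType.class)) {
            if (type.name().equalsIgnoreCase(name)) { // assumes a name() accessor
                return type;
            }
        }
        return null;
    }
}
{code}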
2. *Previous*:
{code:xml}<copy file="${basedir}/test/core-site.xml" tofile="${test.build.classes}/core-site.xml"/>{code}
Why do you have to create an empty core-site.xml and copy it to the build dir?
   *In this patch*: I created a SparkMiniCluster class that generates build/classes/hadoop-site.xml
programmatically instead of copying core-site.xml via build.xml. The file is needed to satisfy the
classpath check in HExecutionEngine#getExecConf.
{code}
// SparkMiniCluster#setupMiniDfsAndMrClusters
private static final File CONF_DIR = new File("build/classes");
private static final File CONF_FILE = new File(CONF_DIR, "hadoop-site.xml");

@Override
protected void setupMiniDfsAndMrClusters() {
    try {
        // Write a minimal hadoop-site.xml into build/classes so that it is
        // visible on the test classpath.
        CONF_DIR.mkdirs();
        if (CONF_FILE.exists()) {
            CONF_FILE.delete();
        }
        m_conf = new Configuration();
        m_conf.set("io.sort.mb", "1");
        m_conf.writeXml(new FileOutputStream(CONF_FILE));
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
{code}
{code}
// HExecutionEngine#getExecConf
public JobConf getExecConf(Properties properties) throws ExecException {
    JobConf jc = null;
    // Check existence of user provided configs
    String isHadoopConfigsOverriden = properties.getProperty("pig.use.overriden.hadoop.configs");
    if (isHadoopConfigsOverriden != null && isHadoopConfigsOverriden.equals("true")) {
        jc = new JobConf(ConfigurationUtil.toConfiguration(properties));
    } else {
        // Check existence of hadoop-site.xml or core-site.xml in
        // classpath if user provided confs are not being used
        Configuration testConf = new Configuration();
        ClassLoader cl = testConf.getClassLoader();
        URL hadoop_site = cl.getResource(HADOOP_SITE);
        URL core_site = cl.getResource(CORE_SITE);

        if (hadoop_site == null && core_site == null) {
            throw new ExecException(
                    "Cannot find hadoop configurations in classpath "
                            + "(neither hadoop-site.xml nor core-site.xml was found in the classpath)."
                            + " If you plan to use local mode, please put -x local option in command line",
                    4010);
        }
        jc = new JobConf();
    }
    jc.addResource("pig-cluster-hadoop-site.xml");
    jc.addResource(YARN_SITE);
    return jc;
}
{code}
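
To see why the generated file satisfies this check: build/classes is on the test classpath, so the same resource lookup that getExecConf performs finds the hadoop-site.xml written by SparkMiniCluster. Below is a minimal, self-contained sketch of that lookup; the ConfLookupCheck class and its main() harness are hypothetical, for illustration only.
{code}
// Hypothetical sketch of the classpath lookup performed by getExecConf.
// Assumes build/classes (where SparkMiniCluster wrote hadoop-site.xml)
// is on the classpath when the tests run.
import java.net.URL;

public class ConfLookupCheck {
    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        URL hadoopSite = cl.getResource("hadoop-site.xml");
        URL coreSite = cl.getResource("core-site.xml");
        if (hadoopSite == null && coreSite == null) {
            System.out.println("getExecConf would throw ExecException 4010 here");
        } else {
            System.out.println("Found config at: "
                    + (hadoopSite != null ? hadoopSite : coreSite));
        }
    }
}
{code}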

> Initial implementation of unit tests for Pig on Spark
> -----------------------------------------------------
>
>                 Key: PIG-4168
>                 URL: https://issues.apache.org/jira/browse/PIG-4168
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Praveen Rachabattuni
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4168.patch
>
>
> 1. ant clean jar; this command generates pig-0.14.0-SNAPSHOT-core-h1.jar
> 2. export SPARK_PIG_JAR=$PIG_HOME/pig-0.14.0-SNAPSHOT-core-h1.jar
> 3. Build the hadoop1 and spark environment; spark runs in local mode.
>    jps:
> 	11647 Master     #spark master runs
> 	6457 DataNode    #hadoop datanode runs
> 	22374 Jps
> 	11705 Worker     #spark worker runs
> 	27009 JobTracker #hadoop jobtracker runs
> 	26602 NameNode   #hadoop namenode runs
> 	29486 org.eclipse.equinox.launcher_1.3.0.v20120522-1813.jar
> 	19692 Main
>
> 4. ant test-spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
