crunch-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Williams,Jacob" <Jacob.Willi...@Cerner.com>
Subject using multiple HBase sources
Date Wed, 05 Dec 2012 16:28:38 GMT
When attempting to join or cogroup collections that originate from two different HBaseSourceTargets,
I get the error below. HBaseSourceTarget does not call CrunchInputs.addInputPath, so when
CrunchInputs.getFormatNodeMap is called, RuntimeParameters.MULTI_INPUTS is null in the configuration.
Any thoughts on how to fix or work around this?

Thanks,
Jacob Williams


Error:

12/12/05 09:56:02 INFO exec.CrunchJob: java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
at com.google.common.base.Splitter.split(Splitter.java:371)
at org.apache.crunch.impl.mr.run.CrunchInputs.getFormatNodeMap(CrunchInputs.java:51)
at org.apache.crunch.impl.mr.run.CrunchInputFormat.getSplits(CrunchInputFormat.java:44)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:331)
at org.apache.crunch.impl.mr.exec.CrunchJob.submit(CrunchJob.java:126)
at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:247)
at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.run(CrunchJobControl.java:275)
at java.lang.Thread.run(Thread.java:662)


Example which produces this error:

import java.io.IOException;

import org.apache.crunch.MapFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.io.hbase.HBaseSourceTarget;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.hbase.HBaseSourceTarget;
import org.apache.crunch.types.writable.Writables;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;

public class Main {
  public static void main(String[] args) throws IOException {
    Configuration config = HBaseConfiguration.create();

    Pipeline pipeline = new MRPipeline(Main.class, "join-problem", config);

    PTable<ImmutableBytesWritable, Result> results1 = pipeline.read(new HBaseSourceTarget("some_table",
new Scan()));
    PTable<ImmutableBytesWritable, Result> results2 = pipeline.read(new HBaseSourceTarget("some_table",
new Scan()));
    pipeline.writeTextFile(results1.join(results2), "join-test");
    pipeline.run();
  }
}

CONFIDENTIALITY NOTICE This message and any included attachments are from Cerner Corporation
and are intended only for the addressee. The information contained in this message is confidential
and may constitute inside or non-public information under international, federal, or state
securities laws. Unauthorized forwarding, printing, copying, distribution, or use of such
information is strictly prohibited and may be unlawful. If you are not the addressee, please
promptly delete this message and notify the sender of the delivery error by e-mail or you
may call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1) (816)221-1024.

Mime
View raw message