beam-commits mailing list archives

From "Davor Bonaci (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1859) sorter extension depends on hadoop but does not declare as such in repository artifact
Date Mon, 03 Apr 2017 04:46:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952988#comment-15952988 ]

Davor Bonaci commented on BEAM-1859:
------------------------------------

"beam-sdks-java-extensions-sorter" depends on the Hadoop codebase in the "provided" scope:

{code}
  <properties>
    <hadoop.version>2.7.1</hadoop.version>
  </properties>
[...]
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>${hadoop.version}</version>
      <scope>provided</scope>
    </dependency>
    
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
      <scope>provided</scope>
    </dependency>
{code}

The caller is intended to provide these dependencies; adding them manually, as [~jbonofre] suggested,
should solve the problem.
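For illustration, the manual fix might look like the following in the user's own pom.xml. This is a sketch, not taken from Beam's build: the version simply mirrors the sorter's hadoop.version property and should be adjusted to the Hadoop version actually in use. Leaving out <scope> gives the default "compile" scope, so both artifacts end up on the classpath of a local run.

{code:xml}
<!-- Sketch for the USER's pom.xml (not the sorter's POM). -->
<properties>
  <!-- Assumption: 2.7.1 copied from the sorter's hadoop.version; match your cluster. -->
  <hadoop.version>2.7.1</hadoop.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-extensions-sorter</artifactId>
    <version>0.6.0</version>
  </dependency>

  <!-- No <scope> element: defaults to "compile", so these are on the local classpath. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
  </dependency>

  <!-- hadoop-common contains org.apache.hadoop.conf.Configuration, the missing class. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
{code}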

Many Hadoop dependencies are marked as "provided" because they are usually available on a Hadoop
cluster by default; it is recommended practice in the Hadoop ecosystem not to bundle such
dependencies with user code, to avoid version conflicts. When running locally with the Direct Runner,
however, these dependencies are usually not on the classpath, which causes the NoClassDefFoundError
you saw.
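If the same project must run both locally and on a cluster, one possible approach (an assumption here, not an established Beam convention) is a Maven profile that adds the Hadoop artifacts only for local runs. The profile id direct-runner is hypothetical. Inside the profile, "runtime" scope puts the artifacts on the runtime and test classpaths, while builds without the profile keep relying on the cluster's own copies.

{code:xml}
<!-- Sketch for the USER's pom.xml; "direct-runner" is a hypothetical profile id. -->
<profiles>
  <profile>
    <id>direct-runner</id>
    <properties>
      <!-- Assumption: mirrors the sorter's hadoop.version; adjust as needed. -->
      <hadoop.version>2.7.1</hadoop.version>
    </properties>
    <dependencies>
      <!-- "runtime": not needed to compile user code, but loaded when the pipeline runs. -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>${hadoop.version}</version>
        <scope>runtime</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
        <scope>runtime</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
{code}

A local run would then activate the profile with mvn -Pdirect-runner, while the cluster build simply omits the flag.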

Therefore, this specific issue in the sorter extension is "Working as Intended". Separately,
it is debatable whether the Direct Runner should mimic a real cluster more closely; the
answer is not clear to me yet, and I think this is worth a dev@ discussion.

> sorter extension depends on hadoop but does not declare as such in repository artifact
> --------------------------------------------------------------------------------------
>
>                 Key: BEAM-1859
>                 URL: https://issues.apache.org/jira/browse/BEAM-1859
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>    Affects Versions: 0.6.0
>            Reporter: Wesley Tanaka
>            Assignee: Davor Bonaci
>
> When SortValues is used via {{org.apache.beam:beam-sdks-java-extensions-sorter:0.6.0}}, this exception is raised:
> {noformat}
> Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> 	at org.apache.beam.sdk.extensions.sorter.BufferedExternalSorter.create(BufferedExternalSorter.java:98)
> 	at org.apache.beam.sdk.extensions.sorter.SortValues$SortValuesDoFn.processElement(SortValues.java:153)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at org.apache.beam.sdk.extensions.sorter.BufferedExternalSorter.create(BufferedExternalSorter.java:98)
> 	at org.apache.beam.sdk.extensions.sorter.SortValues$SortValuesDoFn.processElement(SortValues.java:153)
> 	at org.apache.beam.sdk.extensions.sorter.SortValues$SortValuesDoFn$auxiliary$uK25yOmK.invokeProcessElement(Unknown Source)
> 	at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:198)
> 	at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:159)
> 	at org.apache.beam.runners.core.PushbackSideInputDoFnRunner.processElement(PushbackSideInputDoFnRunner.java:111)
> 	at org.apache.beam.runners.core.PushbackSideInputDoFnRunner.processElementInReadyWindows(PushbackSideInputDoFnRunner.java:77)
> 	at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:134)
> 	at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:51)
> 	at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
> 	at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I think the issue is that beam-sdks-java-extensions-sorter should declare that it depends on that hadoop library but does not?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
