spark-dev mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: Calling external classes added by sc.addJar needs to be through reflection
Date Sun, 18 May 2014 16:46:29 GMT
Btw, I tried

rdd.map { i =>
  System.getProperty("java.class.path")
}.collect()

but didn't see the jars added via "--jars" on the executor classpath.
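
A rough sketch for looking beyond java.class.path. It assumes (not confirmed in this thread) that jars added at runtime are served by a separate URLClassLoader rather than the JVM's startup classpath, in which case java.class.path would never list them. The names LoaderChain and describe are made up for illustration.

```scala
import java.net.URLClassLoader

object LoaderChain {
  // Walk a class loader and its parents; for every URLClassLoader, list its URLs.
  def describe(start: ClassLoader): List[String] =
    Iterator.iterate(start)(_.getParent)
      .takeWhile(_ != null)
      .map {
        case u: URLClassLoader =>
          s"${u.getClass.getName}: ${u.getURLs.mkString(", ")}"
        case other =>
          other.getClass.getName
      }
      .toList

  def main(args: Array[String]): Unit = {
    // Local run: print the loader chain of the current thread.
    describe(Thread.currentThread.getContextClassLoader).foreach(println)
  }
}
```

On a cluster, the same helper could be called inside a task, for example from rdd.mapPartitions, with the result collected back to the driver.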

-Xiangrui

On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
> I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
> reflection approach mentioned by DB didn't work either. I checked the
> distributed cache on a worker node and found the jar there. It is also
> listed in the Environment tab of the WebUI. The workaround is to build
> an assembly jar.
>
> DB, could you create a JIRA and describe what you have found so far? Thanks!
>
> Best,
> Xiangrui
>
> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mridul@gmail.com> wrote:
>> Can you try moving your mapPartitions to another class/object which is
>> referenced only after sc.addJar?
>>
>> I suspect the ClassNotFoundException is thrown while loading the class
>> containing mapPartitions, before addJar is executed.
>>
>> In general, though, loading classes dynamically means instantiating them
>> through reflection, since the expectation is that you don't know which
>> implementation provides the interface. If you know it statically a priori,
>> you bundle it in your classpath.
>>
>> Regards
>> Mridul
>> On 17-May-2014 7:28 am, "DB Tsai" <dbtsai@stanford.edu> wrote:
>>
>>> I finally found a way out of the ClassLoader maze! It took me some time to
>>> understand how it works; I think it is worth documenting in a separate
>>> thread.
>>>
>>> We're trying to add an external utility.jar, which contains CSVRecordParser,
>>> and we added the jar to the executors through the sc.addJar API.
>>>
>>> If the instance of CSVRecordParser is created without reflection, it
>>> raises *ClassNotFoundException*.
>>>
>>> data.mapPartitions(lines => {
>>>     // Direct instantiation: the class is resolved without reflection.
>>>     val csvParser = new CSVRecordParser(delimiter.charAt(0))
>>>     lines.foreach(line => {
>>>       val lineElems = csvParser.parseLine(line)
>>>     })
>>>     ...
>>>     ...
>>> })
>>>
>>>
>>> If the instance of CSVRecordParser is created through reflection, it works.
>>>
>>> data.mapPartitions(lines => {
>>>     // Resolve the class at runtime through the thread's context class loader.
>>>     val loader = Thread.currentThread.getContextClassLoader
>>>     val CSVRecordParser =
>>>         loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
>>>
>>>     // Call the constructor that takes a primitive char.
>>>     val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
>>>         .newInstance(delimiter.charAt(0).asInstanceOf[Character])
>>>
>>>     val parseLine = CSVRecordParser
>>>         .getDeclaredMethod("parseLine", classOf[String])
>>>
>>>     lines.foreach(line => {
>>>        val lineElems = parseLine.invoke(csvParser, line)
>>>            .asInstanceOf[Array[String]]
>>>     })
>>>     ...
>>>     ...
>>> })
>>>
>>>
>>> This is the same problem as described in this question:
>>>
>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
>>>
>>> It's not intuitive for users to load external classes through reflection,
>>> but the couple of alternatives available are very tricky: 1) messing with
>>> the systemClassLoader by calling its addURL method through reflection, or
>>> 2) forking another JVM so that the jars are on the classpath before the
>>> bootstrap loader runs.
>>>
>>> Any thoughts on fixing it properly?
>>>
>>> @Xiangrui,
>>> the netlib-java jniloader is loaded by netlib-java through reflection, so
>>> this problem does not show up there.
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>
