spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Calling external classes added by sc.addJar needs to be through reflection
Date Sun, 18 May 2014 18:54:37 GMT
@xiangrui - we don't expect these to be present on the system
classpath, because they get dynamically added by Spark (e.g. your
application can call sc.addJar well after the JVMs have started).
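
To illustrate why java.class.path is not a reliable place to look, here is a
minimal sketch using only the JDK. It does not reproduce Spark's executor
class loader; the object name ClassPathVsLoader, the method check, and the
temporary directory standing in for a jar added later are all hypothetical.

```scala
import java.net.URLClassLoader
import java.nio.file.Files

object ClassPathVsLoader {
  /** Returns (loader knows the location, java.class.path mentions it). */
  def check(): (Boolean, Boolean) = {
    // Hypothetical stand-in for a jar that shows up after JVM start-up.
    val extraDir = Files.createTempDirectory("added-later")
    val extraUrl = extraDir.toUri.toURL

    // A child loader created at runtime that can see the extra location.
    val loader = new URLClassLoader(Array(extraUrl), getClass.getClassLoader)
    val knownToLoader = loader.getURLs.contains(extraUrl)

    // The system property is fixed at JVM launch and is not updated
    // when new class loaders are created later.
    val onClassPath = Option(System.getProperty("java.class.path"))
      .exists(_.contains(extraDir.toString))

    loader.close()
    Files.delete(extraDir)
    (knownToLoader, onClassPath)
  }

  def main(args: Array[String]): Unit = {
    val (knownToLoader, onClassPath) = check()
    println(s"known to runtime loader: $knownToLoader")
    println(s"listed in java.class.path: $onClassPath")
  }
}
```

Under these assumptions the runtime loader should report the location while
the system property does not, which is consistent with added jars not being
listed on the executor classpath.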

@db - I'm pretty surprised to see that behavior. It's definitely not
intended that users need reflection to instantiate their classes -
something odd is going on in your case. If you could create an
isolated example and post it to the JIRA, that would be great.

On Sun, May 18, 2014 at 9:58 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
> I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870
>
> DB, could you add more info to that JIRA? Thanks!
>
> -Xiangrui
>
> On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
>> Btw, I tried
>>
>> rdd.map { i =>
>>   System.getProperty("java.class.path")
>> }.collect()
>>
>> but didn't see the jars added via "--jars" on the executor classpath.
>>
>> -Xiangrui
>>
>> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>> I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
>>> reflection approach mentioned by DB didn't work either. I checked the
>>> distributed cache on a worker node and found the jar there. It is also
>>> in the Environment tab of the WebUI. The workaround is making an
>>> assembly jar.
>>>
>>> DB, could you create a JIRA and describe what you have found so far? Thanks!
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mridul@gmail.com> wrote:
>>>> Can you try moving your mapPartitions to another class/object which is
>>>> referenced only after sc.addJar ?
>>>>
>>>> I would suspect the ClassNotFoundException is thrown while loading the
>>>> class containing mapPartitions, before addJar is executed.
>>>>
>>>> In general though, dynamic loading of classes means you use reflection to
>>>> instantiate them, since the expectation is that you don't know which
>>>> implementation provides the interface ... If you statically know it a
>>>> priori, you bundle it in your classpath.
>>>>
>>>> Regards
>>>> Mridul
>>>> On 17-May-2014 7:28 am, "DB Tsai" <dbtsai@stanford.edu> wrote:
>>>>
>>>>> Finally found a way out of the ClassLoader maze! It took me some time to
>>>>> understand how it works; I think it is worth documenting in a separate
>>>>> thread.
>>>>>
>>>>> We're trying to add an external utility.jar, which contains CSVRecordParser,
>>>>> and we added the jar to the executors through the sc.addJar API.
>>>>>
>>>>> If the instance of CSVRecordParser is created without reflection, it
>>>>> raises a *ClassNotFoundException*.
>>>>>
>>>>> data.mapPartitions(lines => {
>>>>>     val csvParser = new CSVRecordParser(delimiter.charAt(0))
>>>>>     lines.foreach(line => {
>>>>>       val lineElems = csvParser.parseLine(line)
>>>>>     })
>>>>>     ...
>>>>>     ...
>>>>> })
>>>>>
>>>>>
>>>>> If the instance of CSVRecordParser is created through reflection, it works.
>>>>>
>>>>> data.mapPartitions(lines => {
>>>>>     // Look the class up through the thread's context class loader.
>>>>>     val loader = Thread.currentThread.getContextClassLoader
>>>>>     val CSVRecordParser =
>>>>>         loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
>>>>>
>>>>>     // Call the constructor that takes a primitive char.
>>>>>     val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
>>>>>         .newInstance(delimiter.charAt(0).asInstanceOf[Character])
>>>>>
>>>>>     val parseLine = CSVRecordParser
>>>>>         .getDeclaredMethod("parseLine", classOf[String])
>>>>>
>>>>>     lines.foreach(line => {
>>>>>        val lineElems = parseLine.invoke(csvParser, line)
>>>>>            .asInstanceOf[Array[String]]
>>>>>     })
>>>>>     ...
>>>>>     ...
>>>>> })
>>>>>
>>>>>
>>>>> This is identical to this question,
>>>>>
>>>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
>>>>>
>>>>> It's not intuitive for users to load external classes through reflection,
>>>>> but the couple of available alternatives are very tricky: 1) messing with
>>>>> the system class loader by calling its addURL method through reflection,
>>>>> or 2) forking another JVM with the jars added to the classpath before the
>>>>> bootstrap loader runs.
>>>>>
>>>>> Any thoughts on fixing it properly?
>>>>>
>>>>> @Xiangrui,
>>>>> netlib-java jniloader is loaded from netlib-java through reflection, so
>>>>> this problem will not be seen.
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> DB Tsai
>>>>> -------------------------------------------------------
>>>>> My Blog: https://www.dbtsai.com
>>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>>>
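
Mridul's point about dynamic loading can be sketched without Spark: the
interface is known statically, and only the implementation is looked up by
name through the context class loader. This is an illustration, not the fix
tracked in SPARK-1870. The object name DynamicLoad and the method instantiate
are hypothetical, java.util.ArrayList merely stands in for a class shipped in
an added jar, and a public no-argument constructor is assumed.

```scala
object DynamicLoad {
  /** Loads `className` by name and returns it typed as the known interface. */
  def instantiate[T](className: String, iface: Class[T]): T = {
    // Prefer the thread context class loader; fall back to this class's
    // loader if no context loader is set.
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    val cls = loader.loadClass(className)
    // Assumes a public no-argument constructor on the implementation.
    val instance = cls.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
    // The checked cast fails with ClassCastException if the loaded class
    // does not implement the interface.
    iface.cast(instance)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in: the implementation is named only as a string.
    val list = instantiate("java.util.ArrayList", classOf[java.util.List[String]])
    list.add("a,b,c")
    println(list.size)
  }
}
```

After the single reflective call, the rest of the code works against the
interface with ordinary typed method calls, so reflection stays confined to
one place.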
