flink-user mailing list archives

From Cliff Resnick <cre...@gmail.com>
Subject Re: classloading strangeness with Avro in Flink
Date Tue, 21 Aug 2018 12:02:19 GMT
Hi Aljoscha,

We need flink-shaded-hadoop2-uber.jar because there is no Hadoop distro on
the instance from which the Flink session and jobs are managed. Also, the
process that launches Flink is not a Java process; it execs a process that
calls the flink script.

-Cliff

On Tue, Aug 21, 2018 at 5:11 AM Aljoscha Krettek <aljoscha@apache.org>
wrote:

> Hi Cliff,
>
> Do you actually need the flink-shaded-hadoop2-uber.jar in lib? If you're
> running on YARN, you should be able to just remove it, because with YARN
> you will have Hadoop on the classpath anyway.
>
> Aljoscha
>
> On 21. Aug 2018, at 03:45, vino yang <yanghua1127@gmail.com> wrote:
>
> Hi Cliff,
>
> If so, you can explicitly exclude Avro from the dependencies that bring it
> in (using <exclude>) and then declare a direct dependency on the Avro
> version you need.
>
> Thanks, vino.
>
> On Tue, Aug 21, 2018 at 5:13 AM Cliff Resnick <cresny@gmail.com> wrote:
>
>> Hi Vino,
>>
>> Unfortunately, I'm still stuck here. By moving the avro dependency chain
>> to lib (and removing it from the user jar), my OCFs decode, but I get the
>> error described here:
>>
>> https://github.com/confluentinc/schema-registry/pull/509
>>
>> The Flink fix described in the PR above was to move the Avro dependency
>> to the user jar. However, since I'm using YARN, I'm required to have
>> flink-shaded-hadoop2-uber.jar loaded from lib -- and that has avro bundled
>> un-shaded. So I'm back where I started...
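
One way to confirm that a class such as Avro's `Schema` is provided by more than one jar is to ask a class loader for every location of its `.class` resource. The following is a minimal sketch, not something taken from this thread: the class name `DuplicateClassFinder` and its methods are hypothetical, and it assumes you can run it (or the same few lines) with the classpath you want to inspect.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class DuplicateClassFinder {

    /** Converts a fully qualified class name into its classpath resource name. */
    public static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every location through which the given loader can see the class file. */
    public static List<URL> locations(String className, ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(resourceName(className)));
    }

    public static void main(String[] args) throws IOException {
        // Default to Avro's Schema class; pass another class name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.avro.Schema";

        // Prefer the context class loader and fall back to the system loader if it is unset.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }

        List<URL> found = locations(name, loader);
        System.out.println(name + " is visible at " + found.size() + " location(s):");
        for (URL url : found) {
            System.out.println("  " + url);
        }
    }
}
```

More than one location in the output would suggest that the class is bundled in several jars, for example both an uber jar in lib and the user jar.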
>>
>> Any advice is welcome!
>>
>> -Cliff
>>
>>
>> On Mon, Aug 20, 2018 at 1:42 PM Cliff Resnick <cresny@gmail.com> wrote:
>>
>>> Hi Vino,
>>>
>>> You were right in your assumption -- unshaded avro was being added to
>>> our application jar via a third-party dependency. Excluding it in
>>> packaging fixed the issue. For the record, it looks like flink-avro must
>>> be loaded from lib, or there will be errors in checkpoint restores.
>>>
>>> On Mon, Aug 20, 2018 at 8:43 AM Cliff Resnick <cresny@gmail.com> wrote:
>>>
>>>> Hi Vino,
>>>>
>>>> Thanks for the explanation, but the job only ever uses the Avro (1.8.2)
>>>> pulled in by flink-formats/avro, so it's not a class version conflict
>>>> there.
>>>>
>>>> I'm using the default child-first loading. It might be a further
>>>> transitive dependency, though that's not clear from the stack trace or
>>>> from stepping through the process. When I get a chance I'll look further
>>>> into it, but in case anyone is experiencing similar problems, what is
>>>> clear is that classloader order does matter with Avro.
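
When neither the stack trace nor a debugger shows which copy of a class is in use, it can help to print the jar a class was loaded from and the loader that loaded it. The sketch below is illustrative only: `WhichJar` and `describe` are hypothetical names, and the assumption that the thread's context class loader is the user-code loader inside a running job should be checked for your setup.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Describes where a class was loaded from and which class loader defined it. */
    public static String describe(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes defined by the bootstrap loader usually have no code source.
        String location = (src == null || src.getLocation() == null)
                ? "<no code source>"
                : src.getLocation().toString();

        ClassLoader loader = cls.getClassLoader();
        // getClassLoader() returns null for classes defined by the bootstrap loader.
        String loaderName = (loader == null) ? "<bootstrap>" : loader.getClass().getName();

        return cls.getName() + " <- " + location + " via " + loaderName;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass e.g. org.apache.avro.Schema when Avro is on the classpath.
        String name = args.length > 0 ? args[0] : "java.lang.String";

        // Resolve through the context class loader without initializing the class.
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        System.out.println(describe(Class.forName(name, false, ctx)));
    }
}
```

Calling `WhichJar.describe(...)` on the same class from inside an operator and from a standalone test program would show whether the two environments resolve it from different jars.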
>>>>
>>>> On Sun, Aug 19, 2018, 11:36 PM vino yang <yanghua1127@gmail.com> wrote:
>>>>
>>>>> Hi Cliff,
>>>>>
>>>>> My personal guess is that this may be caused by a conflict between
>>>>> the job's Avro and the Avro that the Flink framework itself relies on.
>>>>> Flink provides some configuration parameters that allow you to
>>>>> determine the order of the classloaders yourself. [1]
>>>>> Alternatively, you can debug classloading by following the
>>>>> documentation. [2]
>>>>>
>>>>> [1]:
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html
>>>>> [2]:
>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html
>>>>>
>>>>> Thanks, vino.
>>>>>
>>>>> On Mon, Aug 20, 2018 at 10:40 AM Cliff Resnick <cresny@gmail.com> wrote:
>>>>>
>>>>>> Our Flink/YARN pipeline has been reading Avro from Kafka for a while
>>>>>> now. We just introduced a source of Avro OCF (Object Container Files)
>>>>>> read from S3. The Kafka Avro continued to decode without incident, but
>>>>>> the OCF files failed 100% with anomalous parse errors in the decoding
>>>>>> phase after the schema and codec were successfully read from them. The
>>>>>> pipeline would work on my laptop, and when I submitted a test Main
>>>>>> program to the Flink Session in YARN, that would also successfully
>>>>>> decode. Only the actual pipeline run from the TaskManager failed. At one
>>>>>> point I even remote debugged the TaskManager process and stepped through
>>>>>> what looked like a normal Avro decode (if you can describe Avro code as
>>>>>> normal!) -- until it abruptly failed with an int decode or what-have-you.
>>>>>>
>>>>>> This stumped me for a while, but I finally tried moving flink-avro.jar
>>>>>> from the lib to the application jar, and that fixed it. I'm not sure why
>>>>>> this is, especially since there were no typical classloader-type errors.
>>>>>> This issue was observed both on Flink 1.5 and 1.6 in Flip-6 mode.
>>>>>>
>>>>>> -Cliff
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>
