incubator-s4-user mailing list archives

From Dingyu Yang <yangdin...@gmail.com>
Subject Re: checkpoint problem
Date Tue, 26 Mar 2013 08:28:09 GMT
Yes. When execution reaches futureSerializedState.get(1000, TimeUnit.MILLISECONDS),
I get the error mentioned previously.
My program sets the checkpointing frequency as follows:
                wordSumPE.setCheckpointingConfig(
                        new CheckpointingConfig.Builder(CheckpointingMode.TIME)
                                .frequency(20).timeUnit(TimeUnit.SECONDS).build());

The adapter is very simple and sends only a few test words (10 words).
So I think the problem is related to futureSerializedState.
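
For illustration, here is a minimal sketch of why the failure shows up at the timed get() call even when the real error happens elsewhere. A Future only reports the outcome of the task that ran on another thread, so an exception thrown during serialization is rethrown by get() wrapped in an ExecutionException. The class name FutureGetSketch and the helper causeOfFailure are hypothetical, and the task simply throws a ConcurrentModificationException directly instead of going through Kryo; none of this is S4 code.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FutureGetSketch {

    /**
     * Runs the task on a worker thread and waits for it with a timeout.
     * Returns null on success, otherwise the exception that made it fail.
     * (Hypothetical helper, not part of S4.)
     */
    static Throwable causeOfFailure(Callable<byte[]> task, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<byte[]> future = pool.submit(task);
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return null;
            } catch (ExecutionException e) {
                // The task itself failed; get() only relays that failure.
                return e.getCause();
            } catch (TimeoutException e) {
                // The task did not finish in time.
                future.cancel(true);
                return e;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a serialization task that fails on the worker thread.
        Throwable cause = causeOfFailure(() -> {
            throw new ConcurrentModificationException();
        }, 1000);
        System.out.println(cause); // prints "java.util.ConcurrentModificationException"
    }
}
```

Under these assumptions, the place to look is the cause chain of the ExecutionException (here the Kryo serialization trace), not the get() call itself.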
I am not familiar with the JIRA system. Also, can I contribute to S4?
Thank you!
dingyu
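
A note on the serialization trace quoted below: read from bottom to top, it starts at the downStream field of example.WordSumPE and follows references through the stream and the app into a thread pool and finally the class loader. That suggests the checkpoint is trying to serialize framework objects reachable from the PE, not just the PE's own counters. One common way to keep such references out of a serialized state is to declare them transient (Kryo's FieldSerializer skips transient fields by default, as far as I know). The sketch below illustrates the idea with plain java.io serialization as a stand-in for Kryo; the class names are hypothetical, and whether this resolves the problem reported here is an assumption, not something confirmed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldSketch {

    /** Hypothetical stand-in for a framework object (stream, app, thread pool). */
    static class Framework {
        final Thread worker = new Thread(() -> { });
    }

    /** PE-like class whose framework reference is excluded from its state. */
    static class SafePE implements Serializable {
        int count = 3;
        transient Framework downStream = new Framework();
    }

    /** PE-like class whose framework reference is part of its state. */
    static class LeakyPE implements Serializable {
        int count = 3;
        Framework downStream = new Framework();
    }

    /** Returns true if the whole object graph can be written out. */
    static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // The serializer walked into an object that is not meant to be serialized.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new SafePE()));  // prints "true"
        System.out.println(canSerialize(new LeakyPE())); // prints "false"
    }
}
```

With java.io serialization the leaky case fails immediately, whereas Kryo would walk into the referenced objects, which matches the long trace in the report.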


2013/3/26 Matthieu Morel <mmorel@apache.org>

> Thanks for the feedback.
>
> When you write "cannot pass", what do you mean? Is the exception that you
> reported logged while the program continues, or is it something else?
>
> Besides, the standard tests that we run for the release pass and show that
> checkpointing works. The problem might be related to the speed of
> checkpointing and of sending events. Note that it might not be necessary to
> checkpoint for every single event, and checkpointing every n events (n
> relatively small) and losing at worst n-1 events per PE in case of failure
> might be ok.
>
> It would be good to know under which conditions exactly you encounter the
> issue, i.e. the frequency of checkpointing and the frequency of events
> sent/received. Reporting a bug on our JIRA system would be the best way
> to follow that conversation.
>
> Thanks and regards,
>
> Matthieu
>
>
>
>
> On Mar 26, 2013, at 08:56 , Dingyu Yang wrote:
>
> Hi Matthieu,
> I debugged the program and still have this problem.
> While debugging, I found that it occurs at:
> SaveStateTask.run -> futureSerializedState.get(1000, TimeUnit.MILLISECONDS).
> Execution cannot pass this point. I don't know what the problem is, even
> though I have just one PE instance. Is the problem in my program or in S4?
> Are you able to checkpoint?
>
> Waiting for your answer!
>
>
> 2013/3/26 Matthieu Morel <mmorel@apache.org>
>
>> This looks like a bug, from a race condition in the serializer.
>>
>> Can you file a bug? Also, are you able to reproduce it systematically?
>>
>> Thanks,
>>
>> Matthieu
>>
>> On Mar 23, 2013, at 07:33 , Dingyu Yang wrote:
>>
>> > Hi all,
>> > I ran a checkpointing example and ran into a problem.
>> > The version is S4 0.6 RC3.
>> > ./s4 deploy -a=example.wordcountApp -c=testCluster1 -appName=wordApp
>> -p=s4.checkpointing.filesystem.storageRootPath=/home/tmp/s4checkpoint
>> -emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule
>> >
>> > Then I get this error:
>> > 14:21:50.251 [Checkpointing-storage-0] WARN
>>  org.apache.s4.core.ft.SaveStateTask - Cannot save checkpoint :
>> [PROTO_ID];[KEY] --> [example.WordSumPE];[./s4]
>> > java.util.concurrent.ExecutionException:
>> com.esotericsoftware.kryo.KryoException:
>> java.util.ConcurrentModificationException
>> > Serialization trace:
>> > classes (sun.misc.Launcher$AppClassLoader)
>> > contextClassLoader (java.lang.Thread)
>> > thread (java.util.concurrent.ThreadPoolExecutor$Worker)
>> > workers (java.util.concurrent.ThreadPoolExecutor)
>> > fetchingThreadPool (org.apache.s4.core.ft.SafeKeeper)
>> > checkpointingFramework (example.wordcountApp)
>> > app (org.apache.s4.core.Stream)
>> > downStream (example.WordSumPE)
>> >     at
>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
>> ~[na:1.6.0_22]
>> >     at java.util.concurrent.FutureTask.get(FutureTask.java:91)
>> ~[na:1.6.0_22]
>> >     at org.apache.s4.core.ft.SaveStateTask.run(SaveStateTask.java:66)
>> ~[bin/:na]
>> >     at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> [na:1.6.0_22]
>> >     at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> [na:1.6.0_22]
>> >     at java.lang.Thread.run(Thread.java:662) [na:1.6.0_22]
>> > Caused by: com.esotericsoftware.kryo.KryoException:
>> java.util.ConcurrentModificationException
>> > Serialization trace:
>> > classes (sun.misc.Launcher$AppClassLoader)
>> > contextClassLoader (java.lang.Thread)
>> > thread (java.util.concurrent.ThreadPoolExecutor$Worker)
>> > workers (java.util.concurrent.ThreadPoolExecutor)
>> > fetchingThreadPool (org.apache.s4.core.ft.SafeKeeper)
>> > checkpointingFramework (example.wordcountApp)
>> > app (org.apache.s4.core.Stream)
>> > downStream (example.WordSumPE)
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:552)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:68)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:571)
>> ~[kryo-2.20.jar:na]
>> >     at
>> org.apache.s4.comm.serialize.KryoSerDeser.serialize(KryoSerDeser.java:91)
>> ~[bin/:na]
>> >     at
>> org.apache.s4.core.ProcessingElement.serializeState(ProcessingElement.java:802)
>> ~[bin/:na]
>> >     at org.apache.s4.core.ft.SerializeTask.call(SerializeTask.java:42)
>> ~[bin/:na]
>> >     at org.apache.s4.core.ft.SerializeTask.call(SerializeTask.java:1)
>> ~[bin/:na]
>> >     at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> ~[na:1.6.0_22]
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> ~[na:1.6.0_22]
>> >     ... 3 common frames omitted
>> > Caused by: java.util.ConcurrentModificationException: null
>> >     at
>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>> ~[na:1.6.0_22]
>> >     at java.util.AbstractList$Itr.next(AbstractList.java:343)
>> ~[na:1.6.0_22]
>> >     at
>> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:74)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
>> ~[kryo-2.20.jar:na]
>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>> ~[kryo-2.20.jar:na]
>> >     at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ~[kryo-2.20.jar:na]
>> >     ... 35 common frames omitted
>> >
>> >
>>
>>
>
>
