From: Mohan
Date: Sat, 11 Nov 2017 19:39:43 +0530
Subject: Re: Corrupted state file when Jobs are being run in parallel.
To: user@gobblin.incubator.apache.org

Could you please tell me how to perform a stress test on Gobblin, and please let me know how to enable the debug option in the log file. Thanks

On Nov 11, 2017 7:29 PM, "Vicky Kak" wrote:

> Hi Guys,
>
> I have been running the stress tests and am seeing the following errors.
>
> Error 1
> *********************************************************************
> 2017-11-11 11:20:56 UTC INFO  [pool-11-thread-421] org.apache.hadoop.fs.FSInputChecker 284 - Found checksum error: b[0, 512]=53455106196f72672e617061636865
> 2e6861646f6f702e696f2e5465787425676f62626c696e2e72756e74696d
> 652e4a6f6253746174652444617461736574537461746501012a6f72672e
> 6170616368652e6861646f6f702e696f2e636f6d70726573732e44656661
> 756c74436f6465630000000044e218b9e6aad3f1aa97f2210fb5c7f0ffff
> ffff44e218b9e6aad3f1aa97f2210fb5c7f00109789c6304000002000209
> 789c630000000100010b789cebb3d50200025100f68e0ab4789ced5b7b73
> dbc611971c3b8d5ff233b6d324ad861337e9d804013e4451692643d1924c
> 51a26489962da71ece013890270238f870904479fc1592ffdb4fd1e9f4b3
> 64a6dfa3ff770f0fbe244384eca6c998d2f081bbddc5deede2f6777bcbcf
> 974da275d8266ae1a543ce90c629dbf44cb3a9e48ae93daa3663fa9b4a41
> 9195ac5c2a147373a5a9a9e9e6df4adf3c0a3ee7ff39e5ff5da7172b1beb
> ebd54663097aa6a6c52b995c9923b7333e79530e15f93954e41f8122d7fe
> 0d6f8f6fe0805bb291855d0769f8aee14b961c102da17d4625576b630b5d
> 7ae561d6954c64b7ce75d81742098639b4f036c348772835250b1dbae408
> 4f672fba1c1a2d89e85f159031870d944fe7545d4be70b46313d5f9071ba
> 24e772459445322aea331479bc2df96f1e33bf6d73eeb80b998c4d74506c
> 1f3349a356c627ca4a72467c520637fa9e
> org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/Installable/gobblin-dist/working-dir/state-store/FlickrPageExtractorPull_137/current.jst at 0 exp: 36820587 got: 91149211
>         at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:322)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:278)
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:213)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:231)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:195)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at java.io.DataInputStream.readFully(DataInputStream.java:169)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
>         at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>         at gobblin.runtime.FsDatasetStateStore.getAll(FsDatasetStateStore.java:119)
>         at gobblin.runtime.FsDatasetStateStore.getLatestDatasetStatesByUrns(FsDatasetStateStore.java:173)
>         at gobblin.runtime.JobContext.<init>(JobContext.java:136)
>         at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:131)
>         at gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:62)
>         at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:80)
>         at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:59)
>         at com.bph.JobLauncherResource.search(JobLauncherResource.java:107)
>         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:186)
>         at com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:141)
>         at com.linkedin.restli.server.RestLiServer.handleResourceRequest(RestLiServer.java:286)
>         at com.linkedin.restli.server.RestLiServer.doHandleRequest(RestLiServer.java:167)
>         at com.linkedin.restli.server.BaseRestServer.handleRequest(BaseRestServer.java:56)
>         at com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:56)
>         at com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:81)
>         at com.linkedin.r2.filter.FilterChainImpl$RestRequestFilterAdapter.onRequest(FilterChainImpl.java:328)
>         at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:55)
>         at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>         at com.linkedin.r2.filter.compression.ServerCompressionFilter.onRestRequest(ServerCompressionFilter.java:126)
>         at com.linkedin.r2.filter.FilterChainImpl$RestRequestFilterAdapter.onRequest(FilterChainImpl.java:328)
>         at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:55)
>         at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>         at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:59)
>         at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>         at com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:103)
>         at com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:74)
>         at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:95)
>         at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:62)
>         at com.linkedin.r2.transport.http.server.HttpNettyServer$Handler.messageReceived(HttpNettyServer.java:171)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>         at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>         at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:316)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-11-11 11:20:56 UTC ERROR [pool-11-thread-421] com.bph.JobLauncherResource 110 - Job Id fk_137 failed while searching key beryls Failed to create job launcher: org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/Installable/gobblin-dist/working-dir/state-store/FlickrPageExtractorPull_137/current.jst at 0 exp: 36820587 got: 91149211
> 2017-11-11 11:20:56 UTC INFO  [pool-11-thread-402] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@6bce96a5[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
> 2017-11-11 11:20:56 UTC INFO  [pool-11-thread-402] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@6bce96a5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
> *********************************************************************
>
> Error 2:
> *********************************************************************
> 2017-11-10 10:24:10 UTC WARN  [pool-11-thread-13] org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker 154 - Problem opening checksum file: file:/home/Installable/gobblin-dist/working-dir/state-store/YoutubePageExtractorPull_138/current.jst. Ignoring exception:
> java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:197)
>         at java.io.DataInputStream.readFully(DataInputStream.java:169)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>         at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>         at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1832)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>         at gobblin.runtime.FsDatasetStateStore.getAll(FsDatasetStateStore.java:119)
>         at gobblin.runtime.FsDatasetStateStore.getLatestDatasetStatesByUrns(FsDatasetStateStore.java:173)
>         at gobblin.runtime.JobContext.<init>(JobContext.java:136)
>         at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:131)
>         at gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:62)
>         at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:80)
>         at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:59)
>         at com.bph.JobLauncherResource.search(JobLauncherResource.java:107)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:186)
>         at com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:141)
>         at com.linkedin.restli.server.RestLiServer.handleResourceRequest(RestLiServer.java:286)
>         at com.linkedin.restli.server.RestLiServer.doHandleRequest(RestLiServer.java:167)
>         at com.linkedin.restli.server.BaseRestServer.handleRequest(BaseRestServer.java:56)
>         at com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:56)
>         at com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:81)
>         at com.linkedin.r2.filter.FilterChainImpl$RestRequestFilterAdapter.onRequest(FilterChainImpl.java:328)
>         at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:55)
>         at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>         at com.linkedin.r2.filter.compression.ServerCompressionFilter.onRestRequest(ServerCompressionFilter.java:126)
>         at com.linkedin.r2.filter.FilterChainImpl$RestRequestFilterAdapter.onRequest(FilterChainImpl.java:328)
>         at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:55)
>         at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>         at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:59)
>         at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>         at com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:103)
>         at com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:74)
>         at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:95)
>         at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:62)
>         at com.linkedin.r2.transport.http.server.HttpNettyServer$Handler.messageReceived(HttpNettyServer.java:171)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>         at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>         at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:316)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-11-10 10:24:11 UTC ERROR [pool-11-thread-13] com.bph.JobLauncherResource 110 - Job Id yt_138 failed while searching key ostfold Failed to create job launcher: java.io.EOFException
> *********************************************************************
>
> Error 3
> *********************************************************************
> 2017-11-10 13:38:49 UTC ERROR [Commit-thread-0] gobblin.runtime.SafeDatasetCommit 118 - Failed to persist dataset state for dataset of job job_TwitterPageExtractorPull_135_1510321111647
> java.io.FileNotFoundException: Failed to rename /home/Installable/gobblin-dist/working-dir/state-store/TwitterPageExtractorPull_135/_tmp_/current.jst to /home/Installable/gobblin-dist/working-dir/state-store/TwitterPageExtractorPull_135/current.jst: src not found
>         at gobblin.util.HadoopUtils.renamePath(HadoopUtils.java:173)
>         at gobblin.util.HadoopUtils.renamePath(HadoopUtils.java:164)
>         at gobblin.util.HadoopUtils.copyFile(HadoopUtils.java:333)
>         at gobblin.metastore.FsStateStore.createAlias(FsStateStore.java:283)
>         at gobblin.runtime.FsDatasetStateStore.persistDatasetState(FsDatasetStateStore.java:221)
>         at gobblin.runtime.SafeDatasetCommit.persistDatasetState(SafeDatasetCommit.java:255)
>         at gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:115)
>         at gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:43)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> *********************************************************************
>
> These errors are seen during the stress tests for the same jobs. For our use case we can't afford to have jobs fail due to a system issue like the above. I did some basic investigation and found that the issue could be caused by non-atomic operations on the state file (extension .jst). It seems we could disable the state store; I looked at the following code in gobblin.runtime.JobContext::createStateStore:
> *********************************************************************
> if (jobProps.containsKey(ConfigurationKeys.STATE_STORE_ENABLED) &&
>     !Boolean.parseBoolean(jobProps.getProperty(ConfigurationKeys.STATE_STORE_ENABLED))) {
>   return new NoopDatasetStateStore(stateStoreFs, stateStoreRootDir);
> } else {
>   return new FsDatasetStateStore(stateStoreFs, stateStoreRootDir);
> }
> *********************************************************************
>
> It seems that by disabling the state store we may get past this issue, but in our case the source implementation is passing information to the Extractor via the WorkUnitStoreState.
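[Editor's note: the createStateStore snippet above keys off ConfigurationKeys.STATE_STORE_ENABLED. Assuming the property string behind that constant is state.store.enabled (an assumption; verify it against the ConfigurationKeys source of your Gobblin version), a job would opt into the NoopDatasetStateStore like this:]

```properties
# Assumed key name for ConfigurationKeys.STATE_STORE_ENABLED -- verify before use.
state.store.enabled=false
```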
>
> We don't want the Job Retry feature and hence disabled it as explained here:
> https://gobblin.readthedocs.io/en/latest/user-guide/Configuration-Properties-Glossary/#retry-properties
>
> I was expecting the disabling to happen by setting only
> workunit.retry.enabled=false
> but we also have to set
> task.maxretries=0
> Since we don't rely on retries, wouldn't it be good to have a flag that skips the following calls when workunit.retry.enabled=false is set:
>
> 1) Reading the initial value from the store
> 2) Committing the final state to the store
>
> As mentioned above, we can't disable the state store because we generate some data in the Source implementation and pass it to the corresponding Extractor implementation via the State.
>
> I anticipate these issues arising in GaaS too.
>
> I will be working to fix this issue as it is critical for us.
>
> Thanks,
> Vicky
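[Editor's note: Error 3 above ("Failed to rename .../_tmp_/current.jst ... src not found") is the signature of two jobs racing on one shared temp path. For comparison, here is a minimal local-filesystem sketch (illustrative class and file names, not Gobblin's implementation) of a commit that survives concurrency: each writer uses a uniquely named temp file plus an atomic rename, so no writer can observe a partial file or rename away another job's temp copy:]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

public class AtomicStateCommit {

    // Write the payload to a uniquely named temp file in the same directory,
    // then atomically rename it over the destination. Readers see either the
    // old complete file or the new complete file, never a partial write, and
    // concurrent committers cannot collide on the temp path because each
    // temp name is unique.
    static void commit(Path dest, byte[] payload) throws IOException {
        Path tmp = dest.resolveSibling(
                dest.getFileName() + "." + UUID.randomUUID() + ".tmp");
        Files.write(tmp, payload);
        Files.move(tmp, dest,
                StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("state-store");
        Path current = dir.resolve("current.jst");
        commit(current, "state-v1".getBytes());
        commit(current, "state-v2".getBytes()); // last committer wins; file is never partial
        System.out.println(new String(Files.readAllBytes(current)));
    }
}
```

The same shape carries over to HDFS, where rename is also atomic; the unique temp name is what removes the shared `_tmp_/current.jst` race the trace shows.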
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 at com.linkedin.r2.filter.ComposedFilter.onRequest(Com= posedFilter.java:59)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at com.linkedin.= r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:= 50)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at com.linkedin.r2.filter.Fi= lterChainImpl.onRestRequest(FilterChainImpl.java:103)
=C2=A0= =C2=A0 =C2=A0 =C2=A0 at com.linkedin.r2.filter.transport.FilterC= hainDispatcher.handleRestRequest(FilterChainDispatcher.java:74)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at com.linkedin.r2.transport.http= .server.HttpDispatcher.handleRequest(HttpDispatcher.java:95)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at com.linkedin.r2.transport.http.se= rver.HttpDispatcher.handleRequest(HttpDispatcher.java:62)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at com.linkedin.r2.transport.http.serve= r.HttpNettyServer$Handler.messageReceived(HttpNettyServer.java:17= 1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.jboss.netty.channel.Si= mpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHa= ndler.java:80)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.jboss.nett= y.channel.DefaultChannelPipeline.sendUpstream(DefaultChannel= Pipeline.java:545)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.jboss.= netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext= .sendUpstream(DefaultChannelPipeline.java:754)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 at org.jboss.netty.handler.execution.Cha= nnelEventRunnable.run(ChannelEventRunnable.java:69)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 at org.jboss.netty.handler.execution.Ordered= MemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwar= eThreadPoolExecutor.java:316)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at= java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExec= utor.java:1142)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at java.util.concurre= nt.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at java.lang.Thread.run(Thread.java= :745)
2017-11-10 10:24:11 UTC ERROR [pool-11-thread-13] com.bph.J= obLauncherResource=C2=A0 110 -=C2=A0 Job Id yt_138 failed while searching k= ey ostfold Failed to create job launcher: java.io.EOFException
<= div>
************************************************************************************************************************

Error 3
************************************************************************************************************************
2017-11-10 13:38:49 UTC ERROR [Commit-thread-0] gobblin.runtime.SafeDatasetCommit  118 - Failed to persist dataset state for dataset  of job job_TwitterPageExtractorPull_135_1510321111647
java.io.FileNotFoundException: Failed to rename /home/Installable/gobblin-dist/working-dir/state-store/TwitterPageExtractorPull_135/_tmp_/current.jst to /home/Installable/gobblin-dist/working-dir/state-store/TwitterPageExtractorPull_135/current.jst: src not found
        at gobblin.util.HadoopUtils.renamePath(HadoopUtils.java:173)
        at gobblin.util.HadoopUtils.renamePath(HadoopUtils.java:164)
        at gobblin.util.HadoopUtils.copyFile(HadoopUtils.java:333)
        at gobblin.metastore.FsStateStore.createAlias(FsStateStore.java:283)
        at gobblin.runtime.FsDatasetStateStore.persistDatasetState(FsDatasetStateStore.java:221)
        at gobblin.runtime.SafeDatasetCommit.persistDatasetState(SafeDatasetCommit.java:255)
        at gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:115)
        at gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:43)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
************************************************************************************************************************
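The "src not found" rename failure in Error 3 is consistent with a non-atomic temp-file promotion racing with another writer or reader. As a sketch of the pattern one would want (plain NIO on a local filesystem; hypothetical illustration, not Gobblin's actual HadoopUtils/FsStateStore code), the state is written to a temp file and promoted with an atomic move:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicStateWrite {

    // Write the new state to a temp file first, then atomically promote it,
    // so concurrent readers see either the old or the new current.jst --
    // never a half-written or missing file.
    static void writeState(Path storeDir, byte[] state) throws IOException {
        Path tmp = storeDir.resolve("_tmp_current.jst");
        Path dst = storeDir.resolve("current.jst");
        Files.write(tmp, state);
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException rather than
        // silently falling back if the filesystem cannot move atomically.
        Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("state-store");
        writeState(dir, "state-v1".getBytes());
        writeState(dir, "state-v2".getBytes()); // replaces the old state atomically
        System.out.println(new String(Files.readAllBytes(dir.resolve("current.jst"))));
    }
}
```

Note that HDFS rename semantics differ from local-filesystem rename, so this only illustrates the invariant the state store would need, not a drop-in fix.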

These errors are seen during stress tests of the same jobs. For our use case we can't afford jobs failing due to system issues like the above. After some basic investigation, the problem appears to stem from non-atomic operations on the state file (extension .jst). It seems we could disable the state store; I looked at the following code in gobblin.runtime.JobContext::createStateStore:
************************************************************************************************************************
if (jobProps.containsKey(ConfigurationKeys.STATE_STORE_ENABLED) &&
    !Boolean.parseBoolean(jobProps.getProperty(ConfigurationKeys.STATE_STORE_ENABLED))) {
  return new NoopDatasetStateStore(stateStoreFs, stateStoreRootDir);
} else {
  return new FsDatasetStateStore(stateStoreFs, stateStoreRootDir);
}
************************************************************************************************************************
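Going by that check, the state store can apparently be switched off per job with a single property. A minimal job-properties sketch, assuming the key behind ConfigurationKeys.STATE_STORE_ENABLED is `state.store.enabled` (worth verifying against your Gobblin version):

```properties
# Hypothetical .pull/.properties fragment; assumes
# ConfigurationKeys.STATE_STORE_ENABLED resolves to "state.store.enabled".
# With this set, createStateStore returns a NoopDatasetStateStore.
state.store.enabled=false
```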

It seems that by disabling the state store we may get past this issue, but in our case the Source implementation passes information to the Extractor via the WorkUnitState.


We don't want the job retry feature, and hence disabled it as explained here:
https://gobblin.readthedocs.io/en/latest/user-guide/Configuration-Properties-Glossary/#retry-properties

I was expecting to disable it by setting only the following:

workunit.retry.enabled=false

but it turns out we also have to set this:
task.maxretries=0
Since we don't rely on retries, would it not be good to have a flag that skips the following calls when workunit.retry.enabled=false:

1) Reading the initial value from the store
2) Committing the final state to the store
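A rough sketch of what such a guard could look like (hypothetical code, not Gobblin's actual API; the only real key assumed from the docs is `workunit.retry.enabled`):

```java
import java.util.Properties;

// Hypothetical sketch: a single predicate that would gate both state-store
// interactions (reading the initial value, committing the final state)
// whenever workunit.retry.enabled=false. Not actual Gobblin code.
public class RetryStateStoreGuard {

    static final String RETRY_ENABLED_KEY = "workunit.retry.enabled";

    // Returns true when the job should still read/commit run-level state.
    // Defaults to true so existing jobs keep their current behavior.
    static boolean shouldTouchStateStore(Properties jobProps) {
        return Boolean.parseBoolean(jobProps.getProperty(RETRY_ENABLED_KEY, "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(RETRY_ENABLED_KEY, "false");
        // With retries disabled, both store calls would be skipped:
        System.out.println(shouldTouchStateStore(props)); // prints "false"
    }
}
```

The point is simply that one existing flag could drive both code paths, rather than requiring a second property like task.maxretries.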

As mentioned above, we can't disable the state store, because we generate some data in the Source implementation and pass it to the corresponding Extractor implementation via State.

I anticipate having these issues in GAAS too.

I will be working on a fix, as this is a critical issue for us.

Thanks,
Vicky