Subject: Re: Corrupted state file when Jobs are being run in parallel.
From: Vicky Kak
Date: Sun, 12 Nov 2017 12:27:57 +0530
To: user@gobblin.incubator.apache.org

Hi Mohan,

I am not sure how you are using Gobblin; you can follow this email chain to find out how we are using it.

You can enable debug logging via the log4j properties. I tried it in a sample application some time back:
https://github.com/dallaybatta/gobblin-examples/blob/master/src/main/resources/log4j.properties
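For example, a minimal log4j.properties along these lines turns on debug output; the "gobblin" category below is illustrative, so point it at whichever packages you want to trace:

**********************************************************************
# Console appender; the root logger stays at INFO so output stays readable.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p [%t] %c %m%n
# Raise only the Gobblin loggers to DEBUG.
log4j.logger.gobblin=DEBUG
**********************************************************************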
You may also read this thread:
https://lists.apache.org/thread.html/3f73f7c32e6e92a9533bded39b51e22a03c513d86ae8813ba7097a12@%3Cuser.gobblin.apache.org%3E

Please post new queries in a separate thread; keeping unrelated questions in the same thread creates confusion and makes it harder to refer back to later.

Thanks,
Vicky

On Sat, Nov 11, 2017 at 7:39 PM, Mohan wrote:

> Could you please tell me how to perform a stress test on Gobblin?
> And please let me know how to enable the debug option in the log file. Thanks.
>
> On Nov 11, 2017 7:29 PM, "Vicky Kak" wrote:
>
>> Hi Guys,
>>
>> I have been running the stress tests and am seeing the following errors:
>>
>> Error 1
>> **********************************************************************
>> 2017-11-11 11:20:56 UTC INFO  [pool-11-thread-421] org.apache.hadoop.fs.FSInputChecker 284 - Found checksum error: b[0, 512]=53455106196f72672e6170616368652e6861646f6f702e696f2e5465787425676f62626c696e2e72756e74696d652e4a6f6253746174652444617461736574537461746501012a6f72672e6170616368652e6861646f6f702e696f2e636f6d70726573732e44656661756c74436f6465630000000044e218b9e6aad3f1aa97f2210fb5c7f0ffffffff44e218b9e6aad3f1aa97f2210fb5c7f00109789c6304000002000209789c630000000100010b789cebb3d50200025100f68e0ab4789ced5b7b73dbc611971c3b8d5ff233b6d324ad861337e9d804013e4451692643d1924c51a26489962da71ece013890270238f870904479fc1592ffdb4fd1e9f4b364a6dfa3ff770f0fbe244384eca6c998d2f081bbddc5deede2f6777bcbcf974da275d8266ae1a543ce90c629dbf44cb3a9e48ae93daa3663fa9b4a419195ac5c2a147373a5a9a9e9e6df4adf3c0a3ee7ff39e5ff5da7172b1bebebd54663097aa6a6c52b995c9923b7333e79530e15f93954e41f8122d7fe0d6f8f6fe0805bb291855d0769f8aee14b961c102da17d4625576b630b5d7ae561d6954c64b7ce75d81742098639b4f036c348772835250b1dbae4084f672fba1c1a2d89e85f159031870d944fe7545d4be70b46313d5f9071ba24e772459445322aea331479bc2df96f1e33bf6d73eeb80b998c4d74506c1f3349a356c627ca4a72467c520637fa9e
>> org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/Installable/gobblin-dist/working-dir/state-store/FlickrPageExtractorPull_137/current.jst at 0 exp: 36820587 got: 91149211
>>     at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:322)
>>     at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:278)
>>     at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:213)
>>     at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:231)
>>     at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:195)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:195)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:169)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
>>     at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>>     at gobblin.runtime.FsDatasetStateStore.getAll(FsDatasetStateStore.java:119)
>>     at gobblin.runtime.FsDatasetStateStore.getLatestDatasetStatesByUrns(FsDatasetStateStore.java:173)
>>     at gobblin.runtime.JobContext.<init>(JobContext.java:136)
>>     at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:131)
>>     at gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:62)
>>     at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:80)
>>     at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:59)
>>     at com.bph.JobLauncherResource.search(JobLauncherResource.java:107)
>>     at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:186)
>>     at com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:141)
>>     at com.linkedin.restli.server.RestLiServer.handleResourceRequest(RestLiServer.java:286)
>>     at com.linkedin.restli.server.RestLiServer.doHandleRequest(RestLiServer.java:167)
>>     at com.linkedin.restli.server.BaseRestServer.handleRequest(BaseRestServer.java:56)
>>     at com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:56)
>>     at com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:81)
>>     at com.linkedin.r2.filter.FilterChainImpl$RestRequestFilterAdapter.onRequest(FilterChainImpl.java:328)
>>     at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:55)
>>     at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>>     at com.linkedin.r2.filter.compression.ServerCompressionFilter.onRestRequest(ServerCompressionFilter.java:126)
>>     at com.linkedin.r2.filter.FilterChainImpl$RestRequestFilterAdapter.onRequest(FilterChainImpl.java:328)
>>     at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:55)
>>     at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>>     at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:59)
>>     at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>>     at com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:103)
>>     at com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:74)
>>     at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:95)
>>     at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:62)
>>     at com.linkedin.r2.transport.http.server.HttpNettyServer$Handler.messageReceived(HttpNettyServer.java:171)
>>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>>     at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>>     at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:316)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> 2017-11-11 11:20:56 UTC ERROR [pool-11-thread-421] com.bph.JobLauncherResource 110 - Job Id fk_137 failed while searching key beryls Failed to create job launcher: org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/Installable/gobblin-dist/working-dir/state-store/FlickrPageExtractorPull_137/current.jst at 0 exp: 36820587 got: 91149211
>> 2017-11-11 11:20:56 UTC INFO  [pool-11-thread-402] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@6bce96a5[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
>> 2017-11-11 11:20:56 UTC INFO  [pool-11-thread-402] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@6bce96a5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
>> **********************************************************************
>>
>> Error 2:
>> **********************************************************************
>> 2017-11-10 10:24:10 UTC WARN  [pool-11-thread-13] org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker 154 - Problem opening checksum file: file:/home/Installable/gobblin-dist/working-dir/state-store/YoutubePageExtractorPull_138/current.jst. Ignoring exception:
>> java.io.EOFException
>>     at java.io.DataInputStream.readFully(DataInputStream.java:197)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:169)
>>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1832)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>>     at gobblin.runtime.FsDatasetStateStore.getAll(FsDatasetStateStore.java:119)
>>     at gobblin.runtime.FsDatasetStateStore.getLatestDatasetStatesByUrns(FsDatasetStateStore.java:173)
>>     at gobblin.runtime.JobContext.<init>(JobContext.java:136)
>>     at gobblin.runtime.AbstractJobLauncher.<init>(AbstractJobLauncher.java:131)
>>     at gobblin.runtime.local.LocalJobLauncher.<init>(LocalJobLauncher.java:62)
>>     at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:80)
>>     at gobblin.runtime.JobLauncherFactory.newJobLauncher(JobLauncherFactory.java:59)
>>     at com.bph.JobLauncherResource.search(JobLauncherResource.java:107)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:186)
>>     at com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:141)
>>     at com.linkedin.restli.server.RestLiServer.handleResourceRequest(RestLiServer.java:286)
>>     at com.linkedin.restli.server.RestLiServer.doHandleRequest(RestLiServer.java:167)
>>     at com.linkedin.restli.server.BaseRestServer.handleRequest(BaseRestServer.java:56)
>>     at com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:56)
>>     at com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:81)
>>     at com.linkedin.r2.filter.FilterChainImpl$RestRequestFilterAdapter.onRequest(FilterChainImpl.java:328)
>>     at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:55)
>>     at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>>     at com.linkedin.r2.filter.compression.ServerCompressionFilter.onRestRequest(ServerCompressionFilter.java:126)
>>     at com.linkedin.r2.filter.FilterChainImpl$RestRequestFilterAdapter.onRequest(FilterChainImpl.java:328)
>>     at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:55)
>>     at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>>     at com.linkedin.r2.filter.ComposedFilter.onRequest(ComposedFilter.java:59)
>>     at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:50)
>>     at com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:103)
>>     at com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:74)
>>     at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:95)
>>     at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:62)
>>     at com.linkedin.r2.transport.http.server.HttpNettyServer$Handler.messageReceived(HttpNettyServer.java:171)
>>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
>>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
>>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
>>     at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>>     at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:316)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> 2017-11-10 10:24:11 UTC ERROR [pool-11-thread-13] com.bph.JobLauncherResource 110 - Job Id yt_138 failed while searching key ostfold Failed to create job launcher: java.io.EOFException
>> **********************************************************************
>>
>> Error 3
>> **********************************************************************
>> 2017-11-10 13:38:49 UTC ERROR [Commit-thread-0] gobblin.runtime.SafeDatasetCommit 118 - Failed to persist dataset state for dataset of job job_TwitterPageExtractorPull_135_1510321111647
>> java.io.FileNotFoundException: Failed to rename /home/Installable/gobblin-dist/working-dir/state-store/TwitterPageExtractorPull_135/_tmp_/current.jst to /home/Installable/gobblin-dist/working-dir/state-store/TwitterPageExtractorPull_135/current.jst: src not found
>>     at gobblin.util.HadoopUtils.renamePath(HadoopUtils.java:173)
>>     at gobblin.util.HadoopUtils.renamePath(HadoopUtils.java:164)
>>     at gobblin.util.HadoopUtils.copyFile(HadoopUtils.java:333)
>>     at gobblin.metastore.FsStateStore.createAlias(FsStateStore.java:283)
>>     at gobblin.runtime.FsDatasetStateStore.persistDatasetState(FsDatasetStateStore.java:221)
>>     at gobblin.runtime.SafeDatasetCommit.persistDatasetState(SafeDatasetCommit.java:255)
>>     at gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:115)
>>     at gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:43)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> **********************************************************************
>>
>> These errors are seen during the stress tests for the same jobs. For our use case we cannot afford jobs failing due to a system issue like the ones above. I did some basic investigation, and the problem appears to be caused by non-atomic operations on the state file, which has the .jst extension.
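>>
>> To make the race concrete, here is a minimal standalone sketch (plain java.nio, not the actual FsStateStore code; file names are illustrative) of two parallel commits sharing one staging path. It reproduces the same "src not found" failure mode as Error 3, and a reader that opens current.jst in the same window can likewise see a half-committed file:
>>
>> **********************************************************************
>> import java.nio.charset.StandardCharsets;
>> import java.nio.file.Files;
>> import java.nio.file.Path;
>> import java.nio.file.StandardCopyOption;
>>
>> // Both "jobs" stage their dataset state at the same staging path and then
>> // rename it over current.jst. The loser of each round finds the staged
>> // file already moved away, the analogue of Error 3's "src not found".
>> public class StateCommitRaceSketch {
>>   public static void main(String[] args) throws Exception {
>>     Path store = Files.createTempDirectory("state-store");
>>     Path tmp = store.resolve("tmp-current.jst");  // shared staging path
>>     Path current = store.resolve("current.jst");  // what job launch reads
>>
>>     Runnable commit = () -> {
>>       for (int i = 0; i < 500; i++) {
>>         try {
>>           Files.write(tmp, ("state-" + i).getBytes(StandardCharsets.UTF_8));
>>           Files.move(tmp, current, StandardCopyOption.REPLACE_EXISTING);
>>         } catch (Exception e) {
>>           // The other committer already moved tmp away between our
>>           // write and our move.
>>           System.out.println(Thread.currentThread().getName() + ": " + e);
>>         }
>>       }
>>     };
>>
>>     Thread a = new Thread(commit, "commit-a");
>>     Thread b = new Thread(commit, "commit-b");
>>     a.start(); b.start();
>>     a.join(); b.join();
>>   }
>> }
>> **********************************************************************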
>>
>> It seems we could disable the statestore. I looked at the following code in gobblin.runtime.JobContext::createStateStore:
>> **********************************************************************
>> if (jobProps.containsKey(ConfigurationKeys.STATE_STORE_ENABLED) &&
>>     !Boolean.parseBoolean(jobProps.getProperty(ConfigurationKeys.STATE_STORE_ENABLED))) {
>>   return new NoopDatasetStateStore(stateStoreFs, stateStoreRootDir);
>> } else {
>>   return new FsDatasetStateStore(stateStoreFs, stateStoreRootDir);
>> }
>> **********************************************************************
>>
>> Disabling the statestore might get us past this issue, but in our case the Source implementation passes information to the Extractor via the WorkUnitState, so we cannot simply turn it off.
>>
>> We don't want the job retry feature, and we disabled it as explained here:
>> https://gobblin.readthedocs.io/en/latest/user-guide/Configuration-Properties-Glossary/#retry-properties
>>
>> I was expecting to disable it by setting only
>> workunit.retry.enabled=false
>> but we also had to set
>> task.maxretries=0
>>
>> Since we don't rely on retries, would it not be good to have a flag that, when workunit.retry.enabled=false, skips the following calls:
>>
>> 1) reading the initial value from the store, and
>> 2) committing the final state to the store.
>>
>> As mentioned above, we can't disable the state store because we generate some data in the Source implementation and pass it to the corresponding Extractor implementation via the State.
>>
>> I anticipate these issues in GaaS too.
>>
>> I will be working to fix this issue, as it is critical for us.
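>>
>> For reference, this is the shape of the job configuration discussed above (a sketch; "state.store.enabled" is my reading of the key behind ConfigurationKeys.STATE_STORE_ENABLED, shown only to illustrate the Noop branch, and it is left commented out since we cannot use it):
>>
>> **********************************************************************
>> # Turn off both retry knobs (see the retry properties glossary link above).
>> workunit.retry.enabled=false
>> task.maxretries=0
>> # Assumed key for ConfigurationKeys.STATE_STORE_ENABLED; false would select
>> # NoopDatasetStateStore in createStateStore, but we depend on the state
>> # store to pass data from the Source to the Extractor.
>> # state.store.enabled=false
>> **********************************************************************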
>>
>> Thanks,
>> Vicky