gobblin-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rohit Kalhans <rohit.kalh...@gmail.com>
Subject Query about using gobblin streaming.
Date Mon, 29 Jan 2018 18:11:23 GMT
*Hello, Sorry if my last similar email was delivered, resending just to be
sure.*

*We have recently started evaluating gobblin for ingestion purpose. As it
turns out, we specifically hit these road-blocks.*

*1. When using Kafka to Kafka streaming, I keep hitting this error post
which the ingestion streaming stops. *

ERROR  [22:54:35.306] [kafka-producer-network-thread | gobblin]
o.a.k.c.u.KafkaThread [KafkaThread.java:30]  -  Uncaught exception in
kafka-producer-network-thread | gobblin:
java.lang.AssertionError: The acknowledgement counter for this watermark
went negative. Please file a bug!
at org.apache.gobblin.writer.AcknowledgableWatermark.ack(
AcknowledgableWatermark.java:42)
at org.apache.gobblin.stream.StreamEntity.ack(StreamEntity.java:82)
at org.apache.gobblin.writer.AsyncWriterManager$1.
onSuccess(AsyncWriterManager.java:321)
at org.apache.gobblin.writer.AsyncWriterManager$1.
onSuccess(AsyncWriterManager.java:316)
at org.apache.gobblin.kafka.writer.Kafka09DataWriter$2.onCompletion(
Kafka09DataWriter.java:124)
at org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.
java:97)
at org.apache.kafka.clients.producer.internals.Sender.
completeBatch(Sender.java:299)
at org.apache.kafka.clients.producer.internals.Sender.
handleProduceResponse(Sender.java:260)
at org.apache.kafka.clients.producer.internals.Sender.
access$100(Sender.java:56)
at org.apache.kafka.clients.producer.internals.Sender$1.
onComplete(Sender.java:342)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
at java.lang.Thread.run(Thread.java:748)


*Here are the key points:*
*1. Gobblin is running in the embedded mode.  It is a part of a  bigger
application.*
*2. here is the config for the job in Json format. the json is converted to
properties internally*
   "config":{
  "job.lock.enabled": "false",
  "task.executionMode": "STREAMING",
  "gobblin.streaming.kafka.topic.key.deserializer":
"org.apache.kafka.common.serialization.StringDeserializer",
  "gobblin.streaming.kafka.topic.value.deserializer":
"org.apache.kafka.common.serialization.ByteArrayDeserializer",
  "source.class": "org.apache.gobblin.source.extractor.extract.kafka.
KafkaSimpleStreamingSource",
  "gobblin.streaming.kafka.topic.singleton": "archive",
  "kafka.brokers": "localhost:9092",
  "streaming.watermarkStateStore.type": "mysql",
  "state.store.db.url": "jdbc:mysql://localhost:3306/Saboo",
  "state.store.db.user": "saboo_root",
  "state.store.db.password": "sabootest",
  "streaming.watermarkStateStore.config.state.store.zk.connectString":
"localhost:2181",
  "streaming.watermark.commitIntervalMillis": "2000",
  "converter.classes": "org.apache.gobblin.converter.SamplingConverter",
  "converter.sample.ratio": "0.10",
  "writer.builder.class": "org.apache.gobblin.kafka.
writer.KafkaDataWriterBuilder",
  "writer.kafka.topic": "archive2",
  "writer.kafka.producerConfig.bootstrap.servers": "localhost:9092",
  "writer.kafka.producerConfig.value.serializer": "org.apache.kafka.common.
serialization.ByteArraySerializer",
  "data.publisher.type": "org.apache.gobblin.publisher.NoopPublisher"
}


*2. When using zk for state store, I keep getting this error which
terminates the job*

o.a.g.r.AbstractJobLauncher [AbstractJobLauncher.java:468]  -  Failed to
launch and run job job_kafkaStreaming_1517248081731:
java.lang.NoClassDefFoundError:
Could not initialize class org.apache.log4j.LogManager
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.log4j.LogManager
at org.apache.log4j.Logger.getLogger(Logger.java:117)
at org.apache.helix.manager.zk.ZkCacheBaseDataAccessor.<clinit>(
ZkCacheBaseDataAccessor.java:50)
at org.apache.gobblin.metastore.ZkStateStore.<init>(ZkStateStore.java:89)
at org.apache.gobblin.metastore.ZkStateStoreFactory.createStateStore(
ZkStateStoreFactory.java:38)
at org.apache.gobblin.runtime.StateStoreBasedWatermarkStorage.<init>(
StateStoreBasedWatermarkStorage.java:101)
at org.apache.gobblin.runtime.TaskContext.getWatermarkStorage(
TaskContext.java:389)
at org.apache.gobblin.runtime.Task.<init>(Task.java:234)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.createTaskRunnable(
GobblinMultiTaskAttempt.java:363)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runWorkUnits(
GobblinMultiTaskAttempt.java:344)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.run(
GobblinMultiTaskAttempt.java:134)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.
runAndOptionallyCommitTaskAttempt(GobblinMultiTaskAttempt.java:369)
at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runWorkUnits(
GobblinMultiTaskAttempt.java:391)
at org.apache.gobblin.runtime.local.LocalJobLauncher.runWorkUnitStream(
LocalJobLauncher.java:142)
at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(
AbstractJobLauncher.java:443)
at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$
DriverRunnable.call(JobLauncherExecutionDriver.java:159)
at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$
DriverRunnable.call(JobLauncherExecutionDriver.java:147)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)



-- 
Cheerio!

*Rohit*

Mime
View raw message