gobblin-user mailing list archives

From Vicky Kak <vicky....@gmail.com>
Subject Re: Zero byte file, need help on Gobblin
Date Mon, 13 Nov 2017 07:25:42 GMT
I understand that com.mmk.gobblin.ParquetDataWriterBuilder creates the
writer implementation that writes to HDFS in Parquet format.
It seems that your custom writer implementation has an issue, but I can't
help much unless I have the full log details too.
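
One thing worth checking in the meantime: a complete Parquet file always
carries its magic bytes and a footer, so a zero-byte file usually means the
output file was created but the writer never wrote or closed it (for example,
a run that pulled no records). Below is a minimal sketch of one way to avoid
that, assuming the writer can defer file creation until the first record.
LazyFileWriter is a hypothetical stand-in written against plain java.nio; it
is not Gobblin's DataWriter API and not your ParquetDataWriterBuilder.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a custom writer; NOT Gobblin's DataWriter API. */
public class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private OutputStream out;      // stays null until the first record arrives
    private long recordsWritten;

    public LazyFileWriter(Path target) {
        this.target = target;
    }

    /** Writes one record as a UTF-8 line, creating the file on first use. */
    public void write(String record) throws IOException {
        if (out == null) {
            // Create the file only when there is something to write,
            // so a run with no records leaves no empty file behind.
            out = Files.newOutputStream(target);
        }
        out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
        recordsWritten++;
    }

    public long recordsWritten() {
        return recordsWritten;
    }

    /** Closes the stream if it was opened; a Parquet writer would emit its footer here. */
    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }
}
```

If a run closes this writer without writing anything, no file exists at the
target path; if it writes at least one record, the file is non-empty after
close().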

Have you had a chance to look at this pull request? You can refer to its
implementation, since it also writes data to HDFS in Parquet format.
https://github.com/apache/incubator-gobblin/pull/2106
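
To see how often this happens and in which partitions, it may help to list
the zero-byte files and match them against the job runs in the log. Here is a
rough sketch that assumes the output directory is reachable as a local or
mounted path. ZeroByteFileFinder is a hypothetical helper name; on HDFS
itself the same walk would have to go through Hadoop's FileSystem API rather
than java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists regular files of length zero under a directory. */
public class ZeroByteFileFinder {

    /** Walks the tree under root and returns every zero-length regular file, sorted by path. */
    public static List<Path> findZeroByteFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toFile().length() == 0L)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is a local or mounted path to the published output.
        for (Path p : findZeroByteFiles(Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

Running it with the published data directory as the only argument prints one
path per empty file.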

Thanks,
Vicky

On Mon, Nov 13, 2017 at 12:32 PM, Mohan <mohandoss.tr@gmail.com> wrote:

> We fetch data from a Kafka topic every 5 minutes and load it to HDFS.
> While loading, it sometimes creates a zero-byte file.
>
> bootstrap.with.offset=latest
> converter.classes=com.mmk.gobblin.LogMessageToAvroConverter
> data.publisher.final.dir=${env:DATA_DIR}
> data.publisher.permissions=775
> data.publisher.replace.final.dir=false
> data.publisher.type=gobblin.publisher.TimePartitionedDataPublisher
> extract.limit.enabled=true
> extract.limit.time.limit=3
> extract.limit.time.limit.timeunit=minutes
> extract.limit.type=time
> extract.namespace=mmk.extract.kafka
> job.description=Gobblin job to extract Hotel Avail logs
> job.lock.dir=${env:GOBBLIN_WORK_DIR}/${job.name}
> job.name=sample_job
> kafka.brokers=192.168.0.1:9092
> launcher.type=MAPREDUCE
> metrics.enabled=false
> metrics.report.interval=60000
> metrics.reporting.file.enabled=true
> metrics.log.dir=/app/gobblin/0.9.0/logs
> mr.job.root.dir=${env:GOBBLIN_WORK_DIR}/working
> reset.on.offset.out.of.range=nearest
> source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
> state.store.dir=${env:GOBBLIN_WORK_DIR}/${job.name}/statestore
> writer.builder.class=com.mmk.gobblin.ParquetDataWriterBuilder
> writer.destination.type=HDFS
> writer.dir.permissions=775
> writer.file.path=logs
> writer.file.permissions=644
> writer.output.dir=${env:GOBBLIN_WORK_DIR}/${job.name}/output
> writer.output.format=PARQUET
> writer.staging.dir=${env:GOBBLIN_WORK_DIR}/${job.name}/staging
> writer.partitioner.class=com.mmk.gobblin.writer.partitioner.MmkSchemaTimestampPartitioner
>
>
> On Nov 13, 2017 11:42 AM, "Vicky Kak" <vicky.kak@gmail.com> wrote:
>
>> Please explain your use case and attach the corresponding job
>> configuration and gobblin log file if possible.
>>
>> On Mon, Nov 13, 2017 at 11:02 AM, Mohan <mohandoss.tr@gmail.com> wrote:
>>
>>> Sometimes I get a zero-byte Parquet file. Could you please tell me
>>> whether there is a known reason for this, and whether it depends on the
>>> size of the data?
>>>
>>> What is the maximum range Gobblin can handle without any issue?
>>>
>>
>>
