flume-issues mailing list archives

From "Dio Jin (Jira)" <j...@apache.org>
Subject [jira] [Created] (FLUME-3377) file channel data file corrupted, file channel can't be started
Date Thu, 30 Jul 2020 06:53:00 GMT
Dio Jin created FLUME-3377:
------------------------------

             Summary: file channel data file corrupted, file channel can't be started
                 Key: FLUME-3377
                 URL: https://issues.apache.org/jira/browse/FLUME-3377
             Project: Flume
          Issue Type: Bug
          Components: File Channel
    Affects Versions: 1.9.0
         Environment: Runs in a Kubernetes cluster with 4 replicas, each with its own
separate persistent volume for the file channel; almost 95% of disk space was free when
the issue occurred.
            Reporter: Dio Jin
         Attachments: flume.conf, flume_exception.log

Hi, we use Flume 1.9.0 to ingest data from Kafka into HDFS; our config file is attached. It
ran smoothly for some time, but it has now stopped ingesting data and keeps logging
errors; the most important log lines are attached. According to the log, the file channel
fails to start because of a corrupted data file, and it retries repeatedly but always fails.
The Flume instance is hosted in Kubernetes with 4 replicas, each with its own separate
persistent volume for the file channel, and almost 95% of disk space was free when the
issue occurred.

So we have two questions:
 # What causes the corrupted data files? This is a production application and we rely on
Flume's robustness, so we did not expect to see a corrupted data file. Also, how can we
avoid such corruption?
 # How do we recover from this situation without losing any data in the channel? Removing
checkpointDir and dataDirs isn't acceptable.

Thanks very much.

Here are the key log lines; the full logs are in the attached file.

org.apache.flume.channel.file.FileChannel.start(FileChannel.java:295)] Failed to start the
file channel [channel=channel2HDFS1]
2020-07-29T07:15:31.640949847Z java.lang.RuntimeException: org.apache.flume.channel.file.CorruptEventException:
Could not parse event from data file. 

2020-07-29T07:15:31.638860323Z at org.apache.flume.channel.file.TransactionEventRecord.fromByteArray(TransactionEventRecord.java:212)

...

2020-07-29T07:15:31.64750767Z 2020-07-29 00:15:31,646 (SinkRunner-PollingRunner-DefaultSinkProcessor)
[ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:158)] Unable to deliver
event. Exception follows.
2020-07-29T07:15:31.647539686Z java.lang.IllegalStateException: Channel closed [channel=channel2HDFS1].
Due to java.lang.RuntimeException: org.apache.flume.channel.file.CorruptEventException: Could
not parse event from data file.
2020-07-29T07:15:31.647552984Z at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:358)
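
For triaging the full attached log, a minimal sketch along the following lines might help count how often each channel reports a failed start or a closed channel, and how many lines name `CorruptEventException`. It assumes each log record sits on a single line (the excerpts above are wrapped by the mail client), and the function name `summarize_corruption` and the sample list are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter

# Matches "[channel=NAME]" as it appears in the Flume messages quoted above.
CHANNEL_RE = re.compile(r"\[channel=([^\]]+)\]")
CORRUPT_MARKER = "CorruptEventException"
FAILURE_PHRASES = ("Failed to start", "Channel closed")


def summarize_corruption(lines):
    """Return (failures per channel, number of lines naming CorruptEventException).

    Hypothetical helper: assumes one log record per line.
    """
    failures = Counter()
    corrupt_lines = 0
    for line in lines:
        if CORRUPT_MARKER in line:
            corrupt_lines += 1
        match = CHANNEL_RE.search(line)
        if match and any(phrase in line for phrase in FAILURE_PHRASES):
            failures[match.group(1)] += 1
    return failures, corrupt_lines


# Sample records taken from the excerpts above, re-joined onto single lines.
SAMPLE = [
    "org.apache.flume.channel.file.FileChannel.start(FileChannel.java:295)] "
    "Failed to start the file channel [channel=channel2HDFS1]",
    "2020-07-29T07:15:31.640949847Z java.lang.RuntimeException: "
    "org.apache.flume.channel.file.CorruptEventException: "
    "Could not parse event from data file.",
    "2020-07-29T07:15:31.647539686Z java.lang.IllegalStateException: "
    "Channel closed [channel=channel2HDFS1]. Due to java.lang.RuntimeException: "
    "org.apache.flume.channel.file.CorruptEventException: "
    "Could not parse event from data file.",
]

if __name__ == "__main__":
    per_channel, corrupt = summarize_corruption(SAMPLE)
    for channel, count in per_channel.items():
        print(f"{channel}: {count} failure line(s)")
    print(f"lines naming {CORRUPT_MARKER}: {corrupt}")
```

Running the same function over the complete log, for example with `open("flume_exception.log")` as the line source, would show whether only `channel2HDFS1` is affected; it does not identify which data file is corrupted.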
