flume-user mailing list archives

From Jimmy <jimmyj...@gmail.com>
Subject hdfs.fileType = CompressedStream
Date Thu, 30 Jan 2014 18:51:03 GMT
I am running a few tests and would like to confirm whether the following is true.

hdfs.codeC = gzip
hdfs.fileType = CompressedStream
hdfs.writeFormat = Text
hdfs.batchSize = 100


Now let's assume I have a large number of transactions and I roll the file
every 10 minutes.

It seems the tmp file stays at 0 bytes and is flushed all at once after 10
minutes, whereas if I don't use compression, the file grows as data is
written to HDFS.

Is this correct?
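
To illustrate the kind of behaviour I mean, here is a rough sketch using Python's zlib module. It is not Flume or Hadoop code, so it only shows how a gzip/deflate compressor in general can hold data in an internal buffer until it is flushed; whether the HDFS sink behaves the same way is exactly what I am asking. The function name and the sample events are made up for the example.

```python
import zlib


def simulate_compressed_writes(events):
    """Feed events to a gzip-format compressor and track the emitted bytes.

    Returns (bytes_emitted_before_final_flush, complete_gzip_payload).
    Hypothetical helper for illustration only, unrelated to Flume internals.
    """
    # wbits=31 asks zlib to produce a gzip container (header + trailer).
    compressor = zlib.compressobj(wbits=31)
    chunks = []
    for event in events:
        # compress() may return very little (or nothing) while the data
        # is still sitting in the compressor's internal buffer.
        chunks.append(compressor.compress(event))
    emitted_before_flush = sum(len(chunk) for chunk in chunks)
    # flush() stands in for rolling/closing the file: the buffered data
    # is only emitted at this point.
    chunks.append(compressor.flush())
    return emitted_before_flush, b"".join(chunks)


if __name__ == "__main__":
    events = [b"txn id=%d status=ok\n" % i for i in range(100)]
    before, payload = simulate_compressed_writes(events)
    print("input bytes:", sum(len(e) for e in events))
    print("emitted before flush:", before)
    print("emitted in total:", len(payload))
```

With a small batch like this, most of the compressed output appears only at the final flush, which would match a tmp file that looks empty until the roll.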

Do you see any drawbacks to using CompressedStream with very large files?
In my case a 120 MB compressed file (one block) corresponds to roughly 10x
that size uncompressed.
