chukwa-user mailing list archives

From Ying Tang
Subject Sink file has omitted chunks?
Date Tue, 23 Nov 2010 07:39:04 GMT
Hi all,
    After reading the Chukwa docs, my understanding is that the log data flow
is:
    adaptor --> agent --> collector --> sink file --> ...
    The docs say, "Data in the sink may include duplicate and
omitted chunks."
    They also do not recommend writing MapReduce jobs that directly examine
the data sink, "because jobs will likely discard most of their input".

    Here are my questions:
    1. Why does data in the sink file include duplicate and omitted chunks? Is it
because of the distributed environment?
    2. How can this problem be solved? The Simple Archiver generates the
archive file with duplicates removed. So the Simple Archiver
only solves the duplicate data problem, right?

Best regards,

Ivy Tang
