incubator-chukwa-user mailing list archives

From Ariel Rabkin <asrab...@gmail.com>
Subject Re: Sink file has omitted chunks?
Date Tue, 23 Nov 2010 07:47:51 GMT
"Omitted chunks" is an error. By definition, if chunks are omitted
they won't be there. Duplicates and other peculiarities will happen in
the event of failures. As you say, it's a consequence of the
distributed environment.

SimpleArchiver should do the cleanup you want.
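A minimal sketch of the kind of duplicate removal described here. It assumes that each chunk can be identified by its source host, stream name, and sequence ID (its position within the stream). The `Chunk` type and `dedupe_chunks` function are hypothetical illustrations, not the actual SimpleArchiver code. Note that deduplication can only discard extra copies. It cannot restore data that never reached the sink.

```python
from typing import Iterable, List, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical stand-in for a Chukwa chunk; these fields are assumptions."""
    source: str       # host the data came from
    stream_name: str  # e.g. the log file an adaptor was reading
    seq_id: int       # assumed position of the chunk within its stream
    data: bytes


def dedupe_chunks(chunks: Iterable[Chunk]) -> List[Chunk]:
    """Keep the first chunk seen for each (source, stream_name, seq_id) key,
    then return the survivors sorted by stream and sequence ID."""
    seen = set()
    unique = []
    for chunk in chunks:
        key = (chunk.source, chunk.stream_name, chunk.seq_id)
        if key in seen:
            continue  # a repeated copy, e.g. one resent after a failure
        seen.add(key)
        unique.append(chunk)
    return sorted(unique, key=lambda c: (c.source, c.stream_name, c.seq_id))


# Usage: a sink containing one duplicated chunk.
sink = [
    Chunk("host1", "/var/log/app.log", 200, b"second"),
    Chunk("host1", "/var/log/app.log", 100, b"first"),
    Chunk("host1", "/var/log/app.log", 200, b"second"),  # duplicate
]
result = dedupe_chunks(sink)
print([c.seq_id for c in result])  # prints [100, 200]
```

Grouping on a key and keeping one record per key maps naturally onto a shuffle-and-reduce step. That is why this kind of cleanup is usually run as a batch job over the sink rather than by every analysis job that reads it.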

--Ari

On Mon, Nov 22, 2010 at 11:39 PM, Ying Tang <ivytang0812@gmail.com> wrote:
> Hi all,
>     After reading the Chukwa docs, my understanding is that the log data flow
> is:
>     adaptor-->agent-->collector-->sink file--->....
>     The doc says, "Data in the sink may include duplicate and omitted
> chunks."
>     And it is not recommended to write MapReduce jobs that directly examine
> the data sink, "because jobs will likely discard most of their input".
>
>     Here are my questions:
>     1. Why does data in the sink file include duplicate and omitted chunks?
> Is it because of the distributed environment?
>     2. How can the problem above be solved? The Simple Archiver generates the
> archive file, and duplicates have been removed. So the Simple Archiver can
> only solve the duplicate data problem, right?
>
> --
> Best regards,
> Ivy Tang



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department
