chukwa-dev mailing list archives

From Ariel Rabkin <asrab...@gmail.com>
Subject Re: ChukwaAgent and MD5 computation
Date Tue, 23 Jun 2009 22:40:01 GMT
In the current codebase, adaptor names are unique, and an attempt to
create a duplicate will just return the previous adaptor.  By default,
the adaptor name is an MD5 hash taken over the adaptor class name, data
type, and params.  This means you can have two different adaptors look
at a file, or two adaptors with different datatype tags, but not two
instances of the same adaptor with the same data type and params.
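
To make the default naming concrete, here is a rough sketch of a hash
over those three fields.  This is not the actual ChukwaAgent code: the
class name AdaptorIdSketch, the method synthesizeId, the zero-byte
separator and the hex encoding are all my assumptions for illustration.
The point is only that the offset never enters the digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AdaptorIdSketch {
    // Hypothetical helper: derives a default adaptor name from the
    // adaptor class name, data type, and params. Offset is deliberately
    // not a parameter, so it cannot influence the result.
    public static String synthesizeId(String adaptorClass, String dataType, String params) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String field : new String[] {adaptorClass, dataType, params}) {
                md.update(field.getBytes(StandardCharsets.UTF_8));
                // Assumed separator so ("ab", "c") and ("a", "bc") hash differently.
                md.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With a scheme like this, the same class, data type and params always
produce the same name, and changing only the datatype tag produces a
different one.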

Offset should NOT be included in that hash. If it is, it's a bug. And
a fairly subtle one, because the code doesn't, on its face, do any
such thing.  If you have a test case showing misbehavior, can you post
it?

Note, by the way, that anybody who creates an adaptor can specify any
name they like -- including the file name, or a hash thereof.  So
there's a really easy workaround, in the client library.
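
A sketch of that workaround, under the assumption that the client
picks the name itself.  AdaptorRegistrySketch, nameForFile and the
"file:" prefix are hypothetical names, not Chukwa API; the map just
mimics the "duplicate name returns the previous adaptor" behavior
described above.

```java
import java.util.HashMap;
import java.util.Map;

public class AdaptorRegistrySketch {
    // Maps adaptor name -> a description of the registered adaptor.
    private final Map<String, String> adaptorsByName = new HashMap<>();

    // Hypothetical naming rule: one name per file, regardless of offset.
    public static String nameForFile(String fileName) {
        return "file:" + fileName;
    }

    // Registers an adaptor under the given name. If the name is already
    // taken, the existing entry is kept and returned instead.
    public String add(String name, String description) {
        String existing = adaptorsByName.putIfAbsent(name, description);
        return existing != null ? existing : description;
    }

    public int size() {
        return adaptorsByName.size();
    }
}
```

Because the name depends only on the file name, a second request for
the same file gets the first adaptor back, whatever offset it asks for.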

On Tue, Jun 23, 2009 at 3:29 PM, Jerome Boulon<jboulon@yahoo-inc.com> wrote:
> Hi,
> I have some questions on the synthesizeAdaptorID method from ChukwaAgent.
> In previous version we used to have a check on fileName to avoid adding the
> same adaptor for the same file twice.
>
> This code is no longer there. Is this what we really want?
>
> Also current MD5 could not be used to replace that functionality since the
> offset is included in the MD5 computation. Is there any plan to fix this?
>
> Thanks,
>  Jerome.
>
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department
