chukwa-dev mailing list archives

From Jerome Boulon <>
Subject Re: ChukwaAgent and MD5 computation
Date Tue, 23 Jun 2009 23:34:33 GMT
It will be better only if the client knows enough about how the underlying implementation
works, and only if the client wants to take care of that.
It's good to be able to create your own naming convention for the adaptor, but I assume
that most developers will just use the implementation that does not specify the name, since
it is simple and they will not have to worry about duplicates and so on.

My proposal:

 *   add a method to the adaptor interface (isDuplicate, which takes the same parameters as
the start method)
 *   each adaptor will register with the factory class so AdaptorManager will be able to call
the isDuplicate method on the right Adaptor
 *   if there's no instance of a specific adaptor, then there can be no duplicate

At the AdaptorManager level:

 *   if name is present then do not check anything
 *   if name is not there, then rely on the adaptor to find out whether it's a duplicate or not.
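
A minimal Java sketch of the proposal above, to make the two rules concrete. All names here (DupAwareAdaptor, FileAdaptorSketch, AdaptorManagerSketch, and the simplified start signature) are hypothetical and are not the actual Chukwa API; the sketch also assumes that params holds only a file name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified adaptor interface; not the real Chukwa Adaptor API.
interface DupAwareAdaptor {
    void start(String dataType, String params, long offset);

    // Proposed method: takes the same parameters as start().
    boolean isDuplicate(String dataType, String params, long offset);
}

// Illustrative file-based adaptor; assumes params contains only the file name.
class FileAdaptorSketch implements DupAwareAdaptor {
    private String dataType;
    private String fileName;

    public void start(String dataType, String params, long offset) {
        this.dataType = dataType;
        this.fileName = params.trim();
    }

    public boolean isDuplicate(String dataType, String params, long offset) {
        // The offset is deliberately ignored: the same file at a different
        // offset still counts as the same adaptor.
        return dataType.equals(this.dataType) && params.trim().equals(this.fileName);
    }
}

class AdaptorManagerSketch {
    // Running adaptors, grouped by adaptor class name.
    private final Map<String, List<DupAwareAdaptor>> running = new HashMap<>();

    // Returns true if the adaptor was started, false if it was rejected as a duplicate.
    boolean add(String name, String adaptorClass, DupAwareAdaptor candidate,
                String dataType, String params, long offset) {
        List<DupAwareAdaptor> sameClass =
                running.computeIfAbsent(adaptorClass, k -> new ArrayList<>());
        if (name == null) {
            // No explicit name: rely on the running adaptors of this class.
            // An empty list means there cannot be a duplicate.
            for (DupAwareAdaptor a : sameClass) {
                if (a.isDuplicate(dataType, params, offset)) {
                    return false;
                }
            }
        }
        // Explicit name: do not check anything.
        candidate.start(dataType, params, offset);
        sameClass.add(candidate);
        return true;
    }
}
```

With this shape, the knowledge of which parts of the param string matter stays inside each adaptor, and the manager never has to parse params itself.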

>> How many places actually specify a non-zero offset at adaptor creation time?
The Log4j appender is the one that sends a non-zero start offset.


On 6/23/09 4:03 PM, "Ariel Rabkin" <> wrote:

A much better option would be for whoever starts the adaptor to
specify a name.  The control protocol already supports this.

You just say:
  add name = ...., and then the adaptor will be called "name".
So if you want to take an MD5 of some params but not others, that's possible.
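
As a rough illustration of that workaround, a client could compute the name itself from only the fields it cares about. The class and method names below (AdaptorNameSketch, md5Name) are made up for this sketch, and the choice of fields (adaptor class, data type, file name, no offset) is an assumption.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical client-side helper; not part of Chukwa.
class AdaptorNameSketch {
    // Builds a 32-character hex MD5 over the chosen fields only.
    // The offset is intentionally left out of the hash input.
    static String md5Name(String adaptorClass, String dataType, String fileName) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Newline separators keep ("ab", "c") distinct from ("a", "bc").
            String key = adaptorClass + "\n" + dataType + "\n" + fileName;
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The resulting string would then be supplied as the name when the adaptor is added, so restarting the same file at a different offset maps to the same adaptor name.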


On Tue, Jun 23, 2009 at 3:55 PM, Jerome Boulon<> wrote:
> Param actually contains an offset and the fileName, and assuming that we could have more
> parameters inside the param string, there's no way for
> the agent to build the correct MD5.
> So, given that, if we add a method to the adaptor, the adaptor will then be able to give
> you the correct MD5.
> /Jerome.
> On 6/23/09 3:40 PM, "Ariel Rabkin" <> wrote:
> In the current codebase, adaptor names are unique, and an attempt to
> create a duplicate will just return the previous adaptor.  By default,
> the adaptor name is the MD5 hash taken over the adaptor name, data
> type, and params.  This means you can have two different adaptors look
> at a file, or two adaptors with different datatype tags, but not two
> instances of the same adaptor.
> Offset should NOT be included in that hash. If it is, it's a bug. And
> a fairly subtle one, because the code doesn't, on its face, do any
> such thing.  If you have a test case showing misbehavior, can you post
> it?
> Note, by the way, that anybody who creates an adaptor can specify any
> name they like -- including the file name, or a hash thereof.  So
> there's a really easy workaround, in the client library.
> On Tue, Jun 23, 2009 at 3:29 PM, Jerome Boulon<> wrote:
>> Hi,
>> I have some questions on the synthesizeAdaptorID method from ChukwaAgent.
>> In previous version we used to have a check on fileName to avoid adding the
>> same adaptor for the same file twice.
>> This code is no longer there. Is this what we really want?
>> Also current MD5 could not be used to replace that functionality since the
>> offset is included in the MD5 computation. Is there any plan to fix this?
>> Thanks,
>>  Jerome.
> --
> Ari Rabkin
> UC Berkeley Computer Science Department

Ari Rabkin
UC Berkeley Computer Science Department
