flume-user mailing list archives

From Jeff Hansen <dsche...@gmail.com>
Subject Re: Restricted character in logical node name
Date Fri, 16 Sep 2011 15:00:13 GMT
Thanks -- I think I missed part of the point of logical sinks the
first time I went through the doc.  I was more interested in being
able to dynamically add or remove nodes from a flow as they came up
or powered down.  I got the impression that was what logical nodes
were intended for, but that they fell short of what was really needed,
so autoFailoverChains and flows were added as a quick fix.  (I'm not
familiar with the history; this is just the impression I get from the
fact that flows aren't supported by the multiconfig syntax, and that
neither solution is supported if you're using a distributed master.
That's a shame, because if you want complete fault tolerance you need
both a distributed master and autoFailoverChains.)  The approach I came
up with didn't rely much on logical nodes, so I didn't spend as much
time working with them.


I was just looking at the github repository (see links above) to see
if any history might explain when and why the inconsistency between
Identifier and Argument was introduced, but the repo only goes back to
2010 -- I'm not sure where to find older history.  At the time the code
was added, both terms were already defined as they currently are (with
the colon being the only difference in the allowed characters).  I'm
guessing part of the problem was introduced when Identifier was
overloaded to describe the syntax for both Java function names and
host ids.  As is, it allows the dot '.', which is necessary for host
names like hellofrom.somewhere.com but would lead to invalid Java code
if used in a function name.  Conversely, it allows the underscore,
which isn't valid in standards-conforming host names.  Personally I
think it would be a good idea to break out that one term so that
functions and sink names have one set of valid characters and host
names have a separate set.
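
To make the mismatch concrete, here is a rough Python sketch of the two
grammar rules as regular expressions, plus one possible way of splitting
Identifier in two.  It assumes Letter means an ASCII letter and
JavaIDDigit means an ASCII digit, which is probably narrower than the
real lexer fragments.  FUNCTION_NAME, HOST_NAME and the matches() helper
are made up for illustration; none of them exist in Flume.

```python
import re

# Assumption: Letter = ASCII letter, JavaIDDigit = ASCII digit.
LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"

# Identifier : Letter (Letter|JavaIDDigit|'.'|'-'|'_')*
IDENTIFIER = re.compile(rf"{LETTER}(?:{LETTER}|{DIGIT}|[._-])*")

# Argument : (Letter|JavaIDDigit|':'|'.'|'-'|'_')+
ARGUMENT = re.compile(rf"(?:{LETTER}|{DIGIT}|[:._-])+")

# Hypothetical split of Identifier (not part of Flume):
# a simplified Java-style name for functions and sinks ...
FUNCTION_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# ... and dot-separated labels of letters, digits and inner hyphens for hosts.
HOST_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOST_NAME = re.compile(rf"{HOST_LABEL}(?:\.{HOST_LABEL})*")


def matches(pattern, text):
    """Return True if the whole text matches the given pattern."""
    return pattern.fullmatch(text) is not None
```

With these patterns, matches(IDENTIFIER, "bac:host:accee.log") is false
while matches(ARGUMENT, "bac:host:accee.log") is true, and
"hellofrom.somewhere.com" passes HOST_NAME but not FUNCTION_NAME.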

The colon is a tricky one, especially in the situation where you want
to use it.  First off, if it were tacked onto the end of a host name, I
would normally think it was specifying a port.  Secondly, given the
multiconfig syntax of 'HOST:SOURCE|SINK', that line could get quite
confusing to parse if the host name were allowed to contain a colon.
You'd need to quote or escape it, and in the end it might be easiest
and least confusing just not to allow it.
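
As a rough illustration of the ambiguity, here is a toy Python splitter
for the 'HOST:SOURCE|SINK' shape.  This is not how Flume's ANTLR parser
works; split_multiconfig is a hypothetical helper that simply cuts at
the first colon and then at the first pipe, which is enough to show
where a colon inside the host name goes wrong.

```python
def split_multiconfig(line):
    """Split 'HOST:SOURCE|SINK' at the first ':' and the first '|' after it."""
    host, sep, flow = line.partition(":")
    if not sep:
        raise ValueError("missing ':' after host")
    source, sep, sink = flow.partition("|")
    if not sep:
        raise ValueError("missing '|' between source and sink")
    return host.strip(), source.strip(), sink.strip()


# Host name without a colon (collector1 is a made-up name): splits as intended.
plain = split_multiconfig(
    'collector1: collectorSource( 16006 ) | text("/tmp/test.log")'
)

# Host name with colons, as in Vic's example: the host is cut short and
# the rest of the name leaks into the source.
colons = split_multiconfig(
    'collector-sit:ets:txn:ord:test.log: collectorSource( 16006 )'
    ' | text("/tmp/test.log")'
)
```

In the second call the host comes back as just "collector-sit", and
there is no way to tell from the line alone which colon was meant as the
separator unless the name is quoted or escaped.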

On Thu, Sep 15, 2011 at 6:03 PM, Huang, Zijian(Victor)
<zijian.huang@etrade.com> wrote:
> Hi, Jeff:
>   You can look here for the use of logical nodes:
> http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_logical_nodes
> We use them to run many node threads in a single JVM, with each node
> streaming one file to a different collector. The other approach is to
> start more than one Java process using the flume cmd.
>   It looks like I can't use a colon, so for now I have to replace it with
> something else, but I think the Flume team needs to make the grammar
> more consistent.
> Vic
> -----Original Message-----
> From: Jeff Hansen [mailto:dscheffy@gmail.com]
> Sent: Thursday, September 15, 2011 3:53 PM
> To: flume-user@incubator.apache.org
> Subject: Re: Restricted character in logical node name
> Oh, I see.  I could be wrong, but I don't believe you can use logical node names in
> place of Hosts for configuration purposes.  I believe they're intended just for use with
> logicalSinks and logicalSources.
> Whether that's the case or not, when it comes to specifying the host name in your
> config or multiconfig, the antlr grammar files have the "host" name using the Identifier syntax
> I included earlier -- so from that perspective the colon is not allowed.
> On Thu, Sep 15, 2011 at 5:17 PM, Huang, Zijian(Victor) <zijian.huang@etrade.com>
>> Hi, Jeff:
>>  Thanks for the detailed explanation. I can map the logical node using the ":" in
>> the name, but I have a problem configuring it. I am using it this way:
>> ===
>> Exec map xxxx collector-sit:ets:txn:ord:test.log
>> submit multiconfig 'collector-sit:ets:txn:ord:test.log: collectorSource( 16006 ) | text("/tmp/test.log")'
>> ===
>> I get a syntax error when I try to do the multiconfig. I tried
>> quoting the node name, but it doesn't seem to work. If they allow us
>> to create a logical node with ":" in the name, they should provide us
>> a way to configure it as well. I will take a look at their grammar in
>> the meantime.
>> Thanks
>> Vic
>> -----Original Message-----
>> From: Jeff Hansen [mailto:dscheffy@gmail.com]
>> Sent: Thursday, September 15, 2011 8:18 AM
>> To: flume-user@incubator.apache.org
>> Subject: Re: Restricted character in logical node name
>> Are you by any chance using it somewhere in a config or multiconfig without quoting it?
>> Specifically, if you were to say
>>    config host someSource logicalSink(bac:host:accee.log)
>> the parser would treat the logical name as a function rather than a string literal,
>> and colons aren't allowed in function names.
>> Functions use identifiers:
>> Identifier
>>    :   Letter (Letter|JavaIDDigit|'.'|'-'|'_')*
>>    ;
>> However, when you're mapping the host to a logical name, config
>> arguments are allowed to have colons:
>> Argument
>>    : (Letter|JavaIDDigit|':'|'.'|'-'|'_')+
>>    ;
>> So I assume you'd be fine with a line like
>>    exec map somehost bac:host:accee.log
>> Without looking through the code I don't know if there are further constraints, but
>> digging through the antlr syntax in FlumeShell.g and FlumeDeploy.g helped me understand the
>> config grammar a lot better.
>> On Wed, Sep 14, 2011 at 6:56 PM, Huang, Zijian(Victor) <zijian.huang@etrade.com>
>>> Hi, Guys:
>>>    Is there a list of characters we can't use in the logical
>>> agent/collector's name? I tried "bac:host:accee.log", and it seems Flume
>>> has trouble dealing with ":".
>>> Thanks
>>> Victor Huang
