flume-user mailing list archives

From AD <straightfl...@gmail.com>
Subject Re: regex not matching 0 properly
Date Fri, 07 Oct 2011 02:43:57 GMT
Apologies, you are correct. HBase is fine.

column=f1:response_bytes, timestamp=1317954678177, value=0

Cheers,
AD

On Thu, Oct 6, 2011 at 2:46 AM, Mingjie Lai <mjlai09@gmail.com> wrote:

> AD.
>
> IMO, the issue only occurs with console sinks (maybe also text sinks, but I
> haven't tried), which call the Attributes.toString() method.
>
> The HBase sink should be fine. Have you verified what is actually written to
> HBase? I don't think I had the problem before.
>
> Thanks,
> Mingjie
>
>
> On 10/05/2011 06:23 AM, AD wrote:
>
>> I am pumping the results into HBase and the value is showing up as 48
>> and not 0, which is a bit of an issue.
>>
>> On Tue, Oct 4, 2011 at 11:41 PM, NerdyNick <nerdynick@gmail.com> wrote:
>>
>>    If you plan on using the attributes you extract in any of the
>>    escaped/formatted output paths or strings, they will be fine, because
>>    those decorators/sinks/sources actually convert the byte array. The fact
>>    that the console sink doesn't makes me think it should be flagged as a
>>    bug and fixed to reduce confusion. However, I do see it as beneficial for
>>    developers to have the raw byte values, so maybe we should also log a
>>    DEBUG-level message with that version of the output.
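A minimal Java sketch of the suggestion above, under stated assumptions: `ReadableAttrSketch.format` is a hypothetical formatter, not part of Flume. It decodes the attribute bytes as UTF-8 text for the normal, human-readable output and emits the raw byte values only at a debug level. `java.util.logging`'s `FINE` level stands in for log4j's DEBUG here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical formatter illustrating the "readable by default, raw bytes at DEBUG" idea.
// Not Flume code; java.util.logging's FINE level is used as a stand-in for DEBUG.
public class ReadableAttrSketch {
    private static final Logger LOG = Logger.getLogger(ReadableAttrSketch.class.getName());

    static String format(String name, byte[] value) {
        // Raw byte values are only logged when FINE (debug-level) logging is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(name + " raw bytes: " + Arrays.toString(value));
        }
        // Human-readable form: decode the bytes as text, so "0" stays "0" instead of 48.
        return name + " : " + new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(format("mynum", "0".getBytes(StandardCharsets.UTF_8)));
    }
}
```

With default logging settings `FINE` is disabled, so only the decoded form is printed.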
>>
>>    On Tue, Oct 4, 2011 at 7:19 PM, AD <straightflush@gmail.com> wrote:
>>     > Thanks, so is this a bug? My issue is that I am storing the number of
>>     > "bytes" served from my Apache log, and when it's 0, I will end up
>>     > storing 48 and skewing the reports.
>>     > Any thoughts?
>>     >
>>     > Thanks for the find.
>>     > _AD
>>     >
>>     > On Tue, Oct 4, 2011 at 2:56 PM, Mingjie Lai <mjlai09@gmail.com> wrote:
>>     >>
>>     >> AD.
>>     >>
>>     >> I noticed the issue before. It's actually not a regex problem, but the
>>     >> way Flume prints a byte array as a string on the collector side.
>>     >>
>>     >> You can also reproduce it with:
>>     >> # bin/flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>>     >> value("bb", "b") => console};'
>>     >>
>>     >> Below is the relevant piece of code (Attributes.java). It takes a byte
>>     >> array whose length is 1, 4, or 8 and prints it as an int or a long. For
>>     >> length 1, it prints only the numeric byte value.
>>     >>
>>     >> ---------------
>>     >>      // this is a hack that prints in int, string and double format when there
>>     >>      // are 8 bytes.
>>     >>      // TODO (jon) this gets grosser and grosser. make a final decision on how
>>     >>      // these attributes are going to be
>>     >>      if (bytes.length == 8) {
>>     >>
>>     >>        return "(long)" + readLong(e, attr).toString() + " (string) '"
>>     >>            + readString(e, attr) + "'" + " (double)"
>>     >>            + readDouble(e, attr).toString();
>>     >>      }
>>     >>
>>     >>      // this is a similar hack that prints in int and string format when there
>>     >>      // are 4 bytes.
>>     >>      if (bytes.length == 4) {
>>     >>        return readInt(e, attr).toString() + " '" + readString(e, attr) + "'";
>>     >>      }
>>     >>
>>     >>      if (bytes.length == 1) {
>>     >>        return "" + (((int) bytes[0]) & 0xff);
>>     >>      }
>>     >>
>>     >> ---------------
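The length-1 branch explains the 48: a one-character value such as "0" is a single byte, 0x30, and that branch prints the byte's unsigned numeric value rather than the character. The standalone sketch below mirrors the three branches so the behavior can be tried outside Flume. `AttrFormatSketch.describe` is a hypothetical name; decoding with `java.nio.ByteBuffer`'s default big-endian order is an assumption about what `readInt`/`readLong`/`readDouble` do, and the plain-string fallback for other lengths is assumed because the excerpt stops after the length-1 case.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the length-based branches quoted from Attributes.toString().
// Assumes big-endian decoding (ByteBuffer's default) and a plain-string fallback.
public class AttrFormatSketch {

    static String describe(byte[] bytes) {
        String asText = new String(bytes, StandardCharsets.UTF_8);
        if (bytes.length == 8) {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            return "(long)" + buf.getLong(0) + " (string) '" + asText + "'"
                + " (double)" + buf.getDouble(0);
        }
        if (bytes.length == 4) {
            // Four bytes are read as a big-endian int, then shown alongside the text.
            return ByteBuffer.wrap(bytes).getInt() + " '" + asText + "'";
        }
        if (bytes.length == 1) {
            // A single byte is printed as its unsigned value: '0' (0x30) becomes 48.
            return "" + (((int) bytes[0]) & 0xff);
        }
        return asText; // assumed fallback for all other lengths
    }

    public static void main(String[] args) {
        System.out.println(describe("0".getBytes(StandardCharsets.UTF_8)));    // 48
        System.out.println(describe("1234".getBytes(StandardCharsets.UTF_8))); // 825373492 '1234'
    }
}
```

Under these assumptions, a four-digit value would also be misreported, as the int formed from its four ASCII codes.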
>>     >>
>>     >> -mingjie
>>     >>
>>     >> On 10/03/2011 07:40 PM, AD wrote:
>>     >>>
>>     >>> Hello,
>>     >>>
>>     >>>  I noticed that when I use a regex to parse an integer from a file, a
>>     >>> value of 0 shows up as 48 in the output on the Flume command line.
>>     >>> Has anyone come across this before? Example below:
>>     >>>
>>     >>> bash-3.2# cat /tmp/integer
>>     >>> 0
>>     >>>
>>     >>> bash-3.2# cat parse.int
>>     >>> ./flume node_nowatch -1 -s -n dump -c 'dump: tail("/tmp/integer") | {
>>     >>> regexAll("^(\\d+)","mynum") => console }; '
>>     >>>
>>     >>> bash-3.2# ./parse.int 2>&1 | grep mynum
>>     >>>
>>     >>> 2011-10-03 22:37:49,526 [main] INFO agent.FlumeNode: System property sun.java.command=com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c dump: tail("/tmp/integer") | { regexAll("^(\\d+)","mynum") => console };
>>     >>> 2011-10-03 22:37:49,966 [main] INFO agent.FlumeNode: Loading spec from command line: 'dump: tail("/tmp/integer") | { regexAll("^(\\d+)","mynum") => console }; '
>>     >>> lilmac.home [INFO Mon Oct 03 22:37:50 EDT 2011] { *mynum : 48* } { tailSrcFile : integer } 0
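The capture path can be sketched as follows, assuming (not confirmed from Flume's source) that regexAll stores each matched group as the UTF-8 bytes of the matched text. `CaptureSketch.capture` is a hypothetical helper. The sketch also shows one way to recover the number for reporting: decode the stored bytes as text and parse that text, rather than reading the byte value itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of regex capture into a byte-array attribute.
// Assumes the matched group is stored as its UTF-8 bytes; not Flume's actual code.
public class CaptureSketch {
    private static final Pattern LEADING_DIGITS = Pattern.compile("^(\\d+)");

    static byte[] capture(String line) {
        Matcher m = LEADING_DIGITS.matcher(line);
        // No match yields an empty array in this sketch.
        return m.find() ? m.group(1).getBytes(StandardCharsets.UTF_8) : new byte[0];
    }

    // Recover the numeric value by decoding the text first, then parsing it.
    static long asNumber(byte[] value) {
        return Long.parseLong(new String(value, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] mynum = capture("0");
        System.out.println(mynum.length);     // 1: the text "0" is a single byte
        System.out.println(mynum[0] & 0xff);  // 48: the character code of '0'
        System.out.println(asNumber(mynum));  // 0: the value actually in the log
    }
}
```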
>>     >>>
>>     >>> Cheers,
>>     >>> AD
>>     >
>>     >
>>
>>
>>
>>    --
>>    Nick Verbeck - NerdyNick
>>    ----------------------------------------------------
>>    NerdyNick.com
>>    Coloco.ubuntu-rocks.org
>>
>>
>>
