hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command
Date Wed, 19 Jun 2013 18:06:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688231#comment-13688231 ]

Colin Patrick McCabe commented on HDFS-4829:
--------------------------------------------

[~tgrayson] I agree with Jing: this is behaving as designed.

hadoop fs -tail prints a fixed amount of data from the end of the file, not a fixed
number of lines. We could add an option for it to behave more like the UNIX tail command,
which prints the last 10 lines by default. But that is a feature request, not a bug.
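
To illustrate the difference between the two behaviors, here is a minimal Python sketch. It
is not the Hadoop implementation. It assumes a fixed window of 1024 bytes (the FileSystem
shell documentation describes -tail as showing the last kilobyte), and the names WINDOW,
tail_bytes, tail_lines and make_log, as well as the generated log content, are hypothetical
and exist only for this example.

```python
import io

WINDOW = 1024  # assumed size of the fixed window, in bytes


def tail_bytes(stream, length, window=WINDOW):
    """Return the last `window` bytes of a seekable stream, ignoring line boundaries."""
    stream.seek(max(0, length - window))
    return stream.read()


def tail_lines(data, count=10):
    """Return the last `count` complete lines of `data`, like UNIX tail."""
    return b"".join(data.splitlines(keepends=True)[-count:])


def make_log(records=20, pad=90):
    """Build a hypothetical access log with fixed-width records."""
    return b"".join(
        f"10.190.174.142 - - req{i:02d} ".encode() + b"x" * pad + b"\n"
        for i in range(records)
    )


log = make_log()
by_bytes = tail_bytes(io.BytesIO(log), len(log))
by_lines = tail_lines(log)

# The byte window starts wherever the offset lands, usually mid-line.
print(by_bytes.splitlines()[0][:12])
# The line-based tail always starts at the beginning of a record.
print(by_lines.splitlines()[0][:12])
```

With a byte window, the first line printed is whatever remains of the record that the
offset happens to land in, so how much of it is missing depends only on the record lengths,
not on whether the data contains IP addresses or names.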
                
> Strange loss of data displayed in hadoop fs -tail command
> ---------------------------------------------------------
>
>                 Key: HDFS-4829
>                 URL: https://issues.apache.org/jira/browse/HDFS-4829
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>         Environment: OS CentOS 6.3 (on Intel Core2 Duo, VMware Player VM running under Windows 7)
> Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
>            Reporter: Todd Grayson
>            Priority: Minor
>
> Strange behavior of the hadoop fs -tail command: its default output seems to be 9 lines, versus 10 lines in the OS version of the command (minor issue). The stranger thing (a bug?) is that it appears to drop the initial octet from an IP address when examining a file over HDFS.
> [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
> .190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
> *When looking at the original log data outside of HDFS with the OS version of the tail command, we see the following:*
> [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
> [training@localhost hands-on]$ tail access_log 
> 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
> *When using non-IP data separated by periods, it gets even worse and even more data is masked (same data, substituting names for IP octets). Note that we lose the first line well into the URI string.*
> [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
> s/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
> *Verifying that what we see with the normal tail matches: note that the first line is not represented in the hadoop fs -tail output, as it is only grabbing 9 lines instead of 10, as I mentioned before. Align the two text-based examples along the javascript_combined line.*
> [training@localhost hands-on]$ tail test_log
> larry.billy.will.amy - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
