hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command
Date Fri, 17 May 2013 21:43:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661072#comment-13661072
] 

Jing Zhao commented on HDFS-4829:
---------------------------------

I think the reason for this behavior is that "hadoop fs -tail" only shows the last 1KB of data.
Its description says "Show the last 1KB of the file", and the content shown in the two examples
above is exactly 1KB in size in both cases.
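
To illustrate why a fixed byte window can start in the middle of a line, here is a minimal
sketch in Python. It is not the Hadoop implementation; the names tail_window and
starts_mid_line are hypothetical, and the 1024-byte window is an assumption taken from the
"Show the last 1KB of the file" description.

```python
TAIL_BYTES = 1024  # assumption: window size taken from the "last 1KB" help text


def tail_window(data: bytes, count: int = TAIL_BYTES) -> bytes:
    """Return the last `count` bytes of `data` (all of it if it is shorter)."""
    if count <= 0:
        return b""
    return data[-count:]


def starts_mid_line(data: bytes, count: int = TAIL_BYTES) -> bool:
    """Return True if the byte window begins partway through a line."""
    start = max(0, len(data) - count)
    if start == 0:
        # The window covers the whole input, so it starts on a line boundary.
        return False
    # The window starts on a line boundary only if the previous byte is a newline.
    return data[start - 1:start] != b"\n"


if __name__ == "__main__":
    # Hypothetical sample data: 20 identical log lines, longer than 1KB in total.
    line = b'10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /a.js HTTP/1.1" 200 20404\n'
    data = line * 20
    window = tail_window(data)
    print(len(window), starts_mid_line(data))
    print(window.splitlines()[0])  # first line of the window, possibly truncated
```

A byte-based tail counts backwards from the end of the file without looking for newlines, so
the first line it prints is cut wherever the 1024-byte boundary happens to fall. On the local
copy, "tail -c 1024 access_log" should show a comparable byte-based view for comparison.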
                
> Strange loss of data displayed in hadoop fs -tail command
> ---------------------------------------------------------
>
>                 Key: HDFS-4829
>                 URL: https://issues.apache.org/jira/browse/HDFS-4829
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>         Environment: OS CentOS 6.3 (on Intel Core2 Duo, VMware Player VM running under
Windows 7)
> Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
>            Reporter: Todd Grayson
>            Priority: Minor
>
> Strange behavior of the hadoop fs -tail command - its default output seems to be
9 lines vs 10 lines in the OS version of the command (minor issue).  The
strange thing (a bug?) is that it appears to drop the initial octet from an IP address when
examining a file over HDFS.
> [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
> .190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js
HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1"
200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg
HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg
HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg
HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg
HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg
HTTP/1.1" 200 161657
> *When looking at the original log data outside of HDFS with the OS version of the tail
command, we see the following:*
> [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
> [training@localhost hands-on]$ tail access_log 
> 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js
HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1"
200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg
HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg
HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg
HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg
HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg
HTTP/1.1" 200 161657
> *When using non-IP data separated by periods, it gets even worse and even more data is
masked (same data, substituting names for IP octets).  Note that we lose the first line well
into the URI string.*
> [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
> s/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png
HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg
HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg
HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg
HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg
HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg
HTTP/1.1" 200 161657
> *Verifying that what we are looking at matches the normal tail output: note that the first
line is not represented in the hadoop fs -tail output, as it's only grabbing 9 lines instead of
10, as I mentioned before. Align the two text-based examples along the javascript_combined line.*
> [training@localhost hands-on]$ tail test_log
> larry.billy.will.amy - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js
HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png
HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg
HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg
HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg
HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg
HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg
HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg
HTTP/1.1" 200 161657

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
