Date: Wed, 19 Jun 2013 18:06:21 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4829) Strange loss of data displayed in hadoop fs -tail command

    [ https://issues.apache.org/jira/browse/HDFS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688231#comment-13688231 ]

Colin Patrick McCabe commented on HDFS-4829:
--------------------------------------------

[~tgrayson] I agree with Jing; this is behaving as designed. We could add an option to make hadoop fs -tail behave more like the UNIX tail command, which prints the last 10 lines by default rather than a fixed amount of data. But that is a feature request, not a bug.

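To illustrate the difference described above, here is a minimal Python sketch, not the Hadoop implementation. It assumes the "fixed amount of data" is the last 1024 bytes, which is what the FileSystem shell documentation describes for -tail. The function names and the sample log lines are hypothetical and are not taken from the reporter's file.

```python
def tail_bytes(data: bytes, size: int = 1024) -> bytes:
    """Return the last `size` bytes, ignoring line boundaries.

    The 1024-byte default is an assumption based on the documented
    behaviour of `hadoop fs -tail` (last kilobyte of the file).
    """
    return data[-size:] if size > 0 else b""


def tail_lines(data: bytes, count: int = 10) -> bytes:
    """Return the last `count` lines, like the UNIX tail default."""
    if count <= 0:
        return b""
    lines = data.splitlines(keepends=True)
    return b"".join(lines[-count:])


# Hypothetical sample: 20 access-log-style lines (illustration only).
sample = b"".join(
    b'10.190.174.142 - - [03/Dec/2011:13:28:%02d -0800] '
    b'"GET /images/example/%03d.jpg HTTP/1.1" 200 12345\n' % (i, i)
    for i in range(20)
)

by_bytes = tail_bytes(sample)
by_lines = tail_lines(sample)

# The byte-based tail usually begins in the middle of a line.
print(by_bytes.splitlines()[0])
# The line-based tail always begins at a line boundary.
print(by_lines.splitlines()[0])
```

Because a byte-based cut does not look for newlines, the first printed line is whatever remains of the line that straddles the cut. That would be consistent with the truncated first lines in the report quoted below, where the missing text reflects where the byte boundary fell and not the content of the file.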
> Strange loss of data displayed in hadoop fs -tail command
> ---------------------------------------------------------
>
>                 Key: HDFS-4829
>                 URL: https://issues.apache.org/jira/browse/HDFS-4829
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>         Environment: OS CentOS 6.3 (on Intel Core2 Duo, VMware Player VM running under Windows 7)
>                      Testing on both 2.0.0-cdh4.1.1 and 2.0.0-cdh4.1.2
>            Reporter: Todd Grayson
>            Priority: Minor
>
> Strange behavior of the hadoop fs -tail command: its default seems to be 9 lines of output vs 10 lines of output in the OS version of the command (minor issue). The strange part (bug behavior?) is that it appears to drop the initial octet from an IP address when examining a file over HDFS.
> [training@localhost hands-on]$ hadoop fs -tail weblog/access_log
> .190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
>
> *When looking at the original log data outside of HDFS with the OS version of the tail command we see the following*
> [training@localhost hands-on]$ hadoop fs -get weblog/access_log ./
> [training@localhost hands-on]$ tail access_log
> 10.190.174.142 - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> 10.190.174.142 - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> 10.190.174.142 - - [03/Dec/2011:13:28:10 -0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 109379
> 10.190.174.142 - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
>
> *When using non-IP data separated by periods, it gets even worse and even more data is masked? (Same data, substituting names for IP octets.) Note we lose the first line well into the URI string?*
> [training@localhost hands-on]$ hadoop fs -tail weblog/test_log
> s/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657
>
> *And verifying that what we are looking at in normal tail matches: note the first line is not represented in the hadoop fs -tail output, as it's only grabbing 9 lines instead of 10, as I mentioned before. Align the two text-based examples along the javascript_combined line.*
> [training@localhost hands-on]$ tail test_log
> larry.billy.will.amy - - [03/Dec/2011:13:28:06 -0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:08 -0800] "GET /assets/js/javascript_combined.js HTTP/1.1" 200 20404
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /assets/img/home-logo.png HTTP/1.1" 200 3892
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/019.jpg HTTP/1.1" 200 74446
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/g_still_04.jpg HTTP/1.1" 200 761555
> larry.billy.will.amy - - [03/Dec/2011:13:28:09 -0800] "GET /images/filmmediablock/360/07082218.jpg HTTP/1.1" 200 154609
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmpics/0000/2229/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 184976
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000163.jpg HTTP/1.1" 200 60117
> larry.billy.will.amy - - [03/Dec/2011:13:28:larry.-0800] "GET /images/filmmediablock/360/Chacha.jpg HTTP/1.1" 200 larry.379
> larry.billy.will.amy - - [03/Dec/2011:13:28:11 -0800] "GET /images/filmmediablock/360/GOEMON-NUKI-000159.jpg HTTP/1.1" 200 161657

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira