hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout
Date Sat, 27 Oct 2007 22:52:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538266 ]

chris.douglas edited comment on HADOOP-2113 at 10/27/07 3:52 PM:
-----------------------------------------------------------------

(core tests failed HADOOP-2112; I assume the contrib tests are unrelated)

Each of those seems like a valuable operation, but piping the command's output through one's
favorite text-processing utility seems very usable. Unless the keys contain tabs, I would expect
1-4 in your list to be pretty straightforward. I agree that the framework could be far more
efficient for most operations (particularly for sorted data, which is almost certainly the most
common case), and it could also help express "for keys matching this regexp in their string
representation, emit them as their native type" (which this cannot), but isn't mapred the correct
tool for that job, anyway? The intent was merely to provide an aid to people hoping to check the
first few values, or some subset of them, from a given SequenceFile; it aspires to sanity checks,
not processing.
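
As a rough illustration of the piping approach, here is a minimal sketch of a filter that keeps
records whose key matches a regexp. It assumes the command writes one record per line as the key's
string form, a tab, then the value's string form; that layout is inferred from the tab caveat
above, and the function name filter_by_key is hypothetical.

```python
import re
import sys


def filter_by_key(lines, pattern):
    """Yield (key, value) pairs whose key matches the given regexp.

    Assumes each line is "<key>\t<value>". Only the first tab is treated
    as the separator, so a key that itself contains a tab is split in the
    wrong place (the caveat mentioned in the comment above).
    """
    key_re = re.compile(pattern)
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("\t")
        # Skip lines with no tab at all; they do not look like records.
        if sep and key_re.search(key):
            yield key, value


if __name__ == "__main__":
    # The regexp is taken from the first command-line argument.
    for key, value in filter_by_key(sys.stdin, sys.argv[1]):
        print(f"{key}\t{value}")
```

Saved under a hypothetical name such as filter_keys.py, it could sit at the end of a pipeline fed
by the new command, with head placed in front of it to limit the check to the first few records.
The output is strings only, so this does not recover the keys' native types.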

I could see extending stat to support more info, re: (5), though. By "a more general set of
tools", what did you have in mind?

[edit - unintended text effects ]

      was (Author: chris.douglas):
    (core tests failed HADOOP-2112; I assume the contrib tests are unrelated)

Each of those seem like valuable operations, but piping the output of "-text" through one's
favorite text-processing utility seems very usable. Unless the keys contain tabs, I would
expect 1-4 in your list to be pretty straightforward. I agree that the framework could be
far more efficient for most operations- particularly for sorted data, which is almost certainly
the most common case- and it could also help express "for keys matching this regexp in their
string representation, emit them as their native type" (which this cannot), but isn't mapred
the correct tool for that job, anyway? The intent was merely to provide an aid to people hoping
to check the first few/some subset of values from a given SequenceFile; it aspires to sanity
checks, not processing.

I could see extending -stat to support more info, re: (5), though. By "a more general set
of tools", what did you have in mind?
  
> Add "-text" command to FsShell to decode SequenceFile to stdout
> ---------------------------------------------------------------
>
>                 Key: HADOOP-2113
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2113
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: 2113-0.patch
>
>
> FsShell should provide a command to examine SequenceFiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

