hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2113) Add "-text" command to FsShell to decode SequenceFile to stdout
Date Mon, 29 Oct 2007 18:51:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538557 ]

Chris Douglas commented on HADOOP-2113:

I think I've explained this command poorly. It attempts to render whatever exists at a given
path as human-readable text. Right now, it handles SequenceFile and gzip formats; it's not
trying to stuff a framework for computation on SequenceFiles into FsShell. I agree that such
a toolchain should be independent, but this aims at something else.
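
A minimal sketch of how such a command might pick a decoder, assuming it sniffs the first bytes of the file: gzip streams begin with the magic bytes 0x1f 0x8b, and SequenceFiles begin with the ASCII bytes "SEQ" followed by a version byte. The class and method names (FormatSniffer, detect) are hypothetical and are not taken from the attached patch.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical helper: guesses a file's format from its first few bytes. */
class FormatSniffer {
    enum Format { SEQUENCE_FILE, GZIP, PLAIN }

    static Format detect(byte[] header) {
        // gzip magic number: 0x1f 0x8b (RFC 1952)
        if (header.length >= 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b) {
            return Format.GZIP;
        }
        // SequenceFile header: the bytes 'S', 'E', 'Q' followed by a version byte
        if (header.length >= 3
                && header[0] == 'S' && header[1] == 'E' && header[2] == 'Q') {
            return Format.SEQUENCE_FILE;
        }
        // Anything else is assumed to be plain text and passed through as-is
        return Format.PLAIN;
    }

    /** Peeks at the stream without consuming it, so the caller can still decode from byte 0. */
    static Format detect(BufferedInputStream in) throws IOException {
        byte[] header = new byte[3];
        in.mark(header.length);
        int n = 0;
        while (n < header.length) {
            int r = in.read(header, n, header.length - n);
            if (r < 0) {
                break; // file is shorter than the header
            }
            n += r;
        }
        in.reset();
        return detect(Arrays.copyOf(header, n));
    }
}
```

Sniffing content rather than trusting a file extension fits the "render whatever exists at a given path" goal, since SequenceFiles usually carry no extension at all.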

While we're on the subject, though, I'm not sure I fully understand the motivation for this
command-line tool. Aren't all of those commands easily implemented in map/reduce? As I see
it, there are two ways to generalize the operations Enis suggests, since any WritableComparable
is fair game. Either a) everything is first converted to a string, or b) the framework can
understand that a user-specified InputFormat, creating a RecordReader whose key type is comparable
to IntWritable, should select a comparator for its keys such that the user-supplied "70" is
greater than "9" (unless the user actually intends a lexicographic ordering). Not to reveal
my opinion. ;)
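
The difference between options a) and b) comes down to which comparator is applied to the keys. A small sketch, with hypothetical names and plain Java strings standing in for Writable keys, shows that comparing the rendered text puts "70" before "9", while a type-aware numeric comparator (the kind option b would have to select for IntWritable keys) puts it after. This is not Hadoop's comparator code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of the two key orderings discussed above. */
class KeyOrdering {
    // a) Everything converted to a string first: plain lexicographic order, so "70" < "9".
    static final Comparator<String> LEXICOGRAPHIC = Comparator.naturalOrder();

    // b) Type-aware comparison, standing in for an IntWritable comparator: 70 > 9.
    static final Comparator<String> NUMERIC =
            (a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b));

    /** Returns a sorted copy, leaving the input list untouched. */
    static List<String> sorted(List<String> keys, Comparator<String> cmp) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(cmp);
        return copy;
    }
}
```

With keys "70", "9", "100", the lexicographic order is "100", "70", "9" and the numeric order is "9", "70", "100". Option b) needs to know the key type before it can choose the second comparator, which is the "working out the types" effort mentioned below.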

In the latter case, code like this belongs in mapred, since merely working out the types is
going to be either a hack or a significant effort. In the former case, for more than a single
SequenceFile, such code still seems to belong in mapred; that said, piping the output of "text"
(as implemented) through a general text-processing utility is a reasonable hack for some purposes.
For my purposes, I only needed to check the first few records of some of the output, and
this suffices. I don't know why a comparable utility like HADOOP-175 never got committed. It
would be a good base, though 1) it relies on UTF8 keys, which are now deprecated, and
2) it solves some problems outside the limited domain of this issue. Still, the fact that no
similar utility has been written in the last year makes me wary of over-complicating this.
It's for human readability, not processing.
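
For the "check the first few records" use case, the gzip half of the command amounts to decompressing the stream and stopping after N lines, much like piping the output through head. A sketch using java.util.zip, with hypothetical class and method names; it is not the code in 2113-0.patch, and it assumes the decompressed bytes are UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: decodes gzip data and keeps only the first few lines. */
class GzipPeek {
    static List<String> firstLines(byte[] compressed, int maxLines) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            String line;
            // Stop early: only the head of the data is decompressed and read
            while (lines.size() < maxLines && (line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Builds gzip input for the usage example. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

For example, GzipPeek.firstLines(GzipPeek.gzip("a\nb\nc\nd\n"), 2) yields the lines "a" and "b". Because reading stops after maxLines, a large file is not decompressed in full just to inspect its head.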

> Add "-text" command to FsShell to decode SequenceFile to stdout
> ---------------------------------------------------------------
>                 Key: HADOOP-2113
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2113
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>             Fix For: 0.16.0
>         Attachments: 2113-0.patch
> FsShell should provide a command to examine SequenceFiles.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
