cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (CASSANDRA-1497) Add input support for Hadoop Streaming
Date Thu, 21 Oct 2010 03:00:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923292#action_12923292 ]

Stu Hood edited comment on CASSANDRA-1497 at 10/20/10 11:00 PM:
----------------------------------------------------------------

contrib/hadoop_streaming_input/bin/mapper.py
* Mentions the original source multiple times, and claims to be both a mapper and reducer
* I suspect that extract_text can be turned into a one-liner somehow

contrib/hadoop_streaming_input/bin/reducer.py
* Needs an Apache header

contrib/hadoop_streaming_input/[input/]README.txt
* Mentions "-input": {{bin/streaming}} should fake the input, and explain why
* There is an extra copy of README.txt in an unused 'input' subdirectory

.../hadoop/ColumnFamilyRecordReader.java
* Indentation

.../hadoop/streaming/AvroResolver.java
* Updated javadoc

I looked a little bit into the immediate runtime failure, but didn't come to any conclusions.
One suspicious aspect is that Streaming appears to use the result of Resolver.getInputWriterClass
to write to both the mapper and reducer scripts: see http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java?view=markup#l783

> Add input support for Hadoop Streaming
> --------------------------------------
>
>                 Key: CASSANDRA-1497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>             Fix For: 0.7.1
>
>         Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch
>
>
> related to CASSANDRA-1368 - create similar functionality for input streaming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.