hadoop-common-user mailing list archives

From Romeo Kienzler <ro...@ormium.de>
Subject Question on Hadoop Streaming
Date Tue, 06 Dec 2011 08:59:05 GMT
Hi,

I've got the following setup for NGS read alignment:


A script that reads data from stdin and writes to stdout:
------------------------------------------------------------
cat /root/bowtiestreaming.sh
cd /home/streamsadmin/crossbow-1.1.2/bin/linux32/
/home/streamsadmin/crossbow-1.1.2/bin/linux32/bowtie -m 1 -q e_coli --12 - 2> /root/bowtie.log



A file copied to HDFS:
------------------------------------------------------------
hadoop fs -put \
  SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 \
  SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000

A streaming job invoked with only the mapper:
------------------------------------------------------------
hadoop jar hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
  -input SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 \
  -output SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned \
  -mapper '/root/bowtiestreaming.sh' -jobconf mapred.reduce.tasks=0

The output file cannot be found, even though it is displayed:
------------------------------------------------------------
hadoop fs -cat /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
11/12/06 09:07:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/12/06 09:07:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
cat: File does not exist: /user/root/SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000.aligned
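
For reference, Hadoop normally treats the -output path of a job as a directory and writes each task's result into a part file inside it, so the path itself may not be readable as a plain file. The sketch below is only a local stand-in for a map-only streaming job, not Hadoop itself: the function name simulate_map_only is hypothetical, and the part file name part-00000 is an assumption about the naming convention that may differ between Hadoop versions.

```shell
# Local stand-in for a map-only streaming job (an illustration, not Hadoop).
# Usage: simulate_map_only INPUT_FILE OUTPUT_DIR MAPPER_COMMAND
simulate_map_only() {
    input=$1
    outdir=$2
    mapper=$3
    # Like a real job, refuse to run if the output path already exists.
    mkdir "$outdir" || return 1
    # The mapper gets the input records on stdin; its stdout becomes the
    # content of a part file inside the output directory.
    # The name part-00000 is an assumption, see the note above.
    sh -c "$mapper" < "$input" > "$outdir/part-00000"
}
```

Under that assumption, the real job's result would be inspected by listing the output directory with hadoop fs -ls and then reading the part files below it, rather than reading the output path directly.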


The input file looks like this (tab-separated):
head SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000

@SRR014475.1 :1:1:108:111 length=36	GAGTTTTACGTCGTCCTAAAACAGTACATAAAAATA	I3IIIII+I(%BH43%III7I(5IIIIIII*<&II+
@SRR014475.2 :1:1:112:26 length=36	GNNNNNNTTCCCTTTTCAACTTCCAAATCACCTAAC	I!!!!!!II=I<IIII@II5II)/$;%+*/&%%#&#
@SRR014475.3 :1:1:101:937 length=36	GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA	IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G
@SRR014475.4 :1:1:124:64 length=36	GAACACATAGAACAACAGGATTCGCCAGAACACCTG	IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;
@SRR014475.5 :1:1:108:897 length=36	GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT	I0I:I'IIII+IG3II46II0>C@=III()+:+2&$
@SRR014475.6 :1:1:106:14 length=36	GNNNNNNNNNNNNNNNTNTAGCATTAAGTAATTGGT	I!!!!!!!!!!!!!!!I!I6I*+III:%IB0+I.%?
@SRR014475.7 :1:1:118:934 length=36	GGTTACTACTCTGCGACTCCTCGCAGAAGAGACGCT	III0%%)&%I.I&I;III.(I@E&2>*'+1;;#;&'
@SRR014475.8 :1:1:123:8 length=36	GNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNTNN	I!!!!!!!!!!!!!!!!!!!!!!!!!!!$!!!!(!!
@SRR014475.9 :1:1:118:88 length=36	GGAAACAAAATGGCGCGCTACCAGGTAACGCGCCAC	IIIIIIIIIIIIIIIGIAA4;1+16*;*+)'$%#$%
@SRR014475.10 :1:1:92:122 length=36	ATTTGCTGCCAATGGCGAGATTAAAAACGAATAATA	IIIIIIIIIIIIIICII;CGIDI?%$I:%6)C*;#;
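
A file in this one-read-per-line layout (name, sequence, quality, separated by tabs) could be derived from a standard four-line FASTQ file roughly as sketched below. This is only an illustration: the function name fastq_to_tab is hypothetical, and it assumes every record spans exactly four lines and that read names contain no tab characters.

```shell
# Hypothetical helper: collapse 4-line FASTQ records into
# "name<TAB>sequence<TAB>quality", one read per line.
fastq_to_tab() {
    # paste joins every four input lines with tabs;
    # cut then drops field 3, the "+" separator line.
    paste - - - - | cut -f 1,2,4
}
```

It would be used as a filter, for example fastq_to_tab < reads.fastq > reads.tab, where both file names are placeholders.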


and the result looks like this when the script is run locally:

cat SRR014475.lite.nodoublequotewithendsnocommas.fastq.received.1-read-per-line-format.10000 | ./bowtiestreaming.sh | head
@SRR014475.3 :1:1:101:937 length=36	+	gi|110640213|ref|NC_008253.1|	3393863	GAAGATCCGGTACAACAAAACCTGATGTAAATGGTA	IIIIIIIIIIIIIIIIIAIIIIIIAII%I<IIII0G	0	7:T>C,27:G>T
@SRR014475.4 :1:1:124:64 length=36	+	gi|110640213|ref|NC_008253.1|	2288633	GAACACATAGAACAACAGGATTCGCCAGAACACCTG	IIIIIIIIIIIIIII><CI+@5+)'(-'&;&%$;+;	0	30:T>C
@SRR014475.5 :1:1:108:897 length=36	+	gi|110640213|ref|NC_008253.1|	4389356	GGAAGAGATGAAGTGGGTCGTTGTGGTGTGTTTGTT	I0I:I'IIII+IG3II46II0>C@=III()+:+2&$	0	5:C>A,28:G>T,29:C>G,30:A>T,34:C>T
@SRR014475.9 :1:1:118:88 length=36	-	gi|110640213|ref|NC_008253.1|	3598410	GTGGCGCGTTACCTGGTAGCGCGCCATTTTGTTTCC	%$#%$')+*;*61+1;4AAIGIIIIIIIIIIIIIII	0
@SRR014475.15 :1:1:87:967 length=36	+	gi|110640213|ref|NC_008253.1|	4474247	GACTACACGATCGCCTGCCTTAATATTCTTTACACC	IIIIIIIIIIIIA27II7CIII*I5I+FIIII?II'	0	6:G>A,26:G>T
@SRR014475.20 :1:1:108:121 length=36	-	gi|110640213|ref|NC_008253.1|	37761	AAAAAATGCATATTGTTTTAGAGTGTGATTATTAGC	I<D4II'2I<IIC/;B?FIIIIIIIIIIIIIIIIII	0	12:C>T
@SRR014475.23 :1:1:75:54 length=36	+	gi|110640213|ref|NC_008253.1|	2465453	GGTTTCTTTCTGCGCAGATGCCAGACGGTCTTTATA	IIIIIIIIIIIICII<III;';29=9I.4%EE2)*'	0
@SRR014475.24 :1:1:89:904 length=36	-	gi|110640213|ref|NC_008253.1|	3216193	ATTAGTGTTAAGATTTCTATATTGTTGTTTTAGGCC	#%);%;$EI-;$%8%&I%I/+IIIIIIIIIIIIIII	0	18:C>T,21:G>T,30:C>T,31:T>G,34:A>T
@SRR014475.27 :1:1:74:887 length=36	-	gi|110640213|ref|NC_008253.1|	540567	AAACGTGGCGTTTCAGGGATCGTTTGCCTGCATTAC	*&(%9%0F3.@4;&?4I3I6%:9AI0HIIIIIIIII	0	34:C>A,35:C>A
@SRR014475.30 :1:1:123:73 length=36	+	gi|110640213|ref|NC_008253.1|	3391697	AAAAGATTGCGACTGACGGCGCAAATGCCCTCCGTT	IIIIIIIIICI:II3*<4.*'+%'&)&$;+;%;%;;	0	30:C>T,34:G>T


Any ideas?

Best regards,

Romeo


-------------
Romeo Kienzler
r o m e o @ o r m i u m . d e

