hive-user mailing list archives

From 周梦想 <abloz...@gmail.com>
Subject why using mapreduce python scripts print more NULLs
Date Tue, 26 Mar 2013 09:54:53 GMT
Hive version: 0.10.0

hive> from testpoker select
transform(ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon)
using 'calcpoker.py' as
ldate,gameid,userid,pid,win,fold,allin,cardtype,cards ;

03/13/13 1009 185690475 8639 0 1 0 -1 NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL
03/13/13 1009 187270278 92030 0 1 0 -1 NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL
03/13/13 1009 184151687 8639 0 1 0 -1 NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL
03/13/13 1009 186012530 8593 1 0 1 7 8|21|16|42|39	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL
03/13/13 1009 180286243 92041 0 1 0 -1 NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL

The last eight NULLs in each row are unexpected. I don't know where they
come from or how to get rid of them. Please give me some advice. Thanks.
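The output above is consistent with Hive's default TRANSFORM row format, in which the script's stdout is split into columns on TAB characters, `\N` is read back as NULL, and columns in the AS list that receive no value come back as NULL. Because every line calcpoker.py prints is joined with spaces, each line would be read as a single field: the whole line lands in `ldate` and the other eight columns (`gameid` through `cards`) are NULL. The sketch below imitates that parsing rule under this assumption; `parse_transform_output` and `COLUMNS` are hypothetical names, and this is not Hive's actual code.

```python
# Hypothetical helper imitating, under the assumption stated above, how
# Hive's default TRANSFORM format turns one line of script output into a row.
COLUMNS = ["ldate", "gameid", "userid", "pid", "win",
           "fold", "allin", "cardtype", "cards"]


def parse_transform_output(line, columns):
    """Split `line` on TAB and return one value per column in `columns`.

    Fields equal to \\N become None (NULL); columns with no matching field
    are padded with None; surplus fields are ignored in this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    row = []
    for i in range(len(columns)):
        if i < len(fields) and fields[i] != r"\N":
            row.append(fields[i])
        else:
            row.append(None)
    return row


if __name__ == "__main__":
    space_line = "03/13/13 1009 185690475 8639 0 1 0 -1 NULL"
    tab_line = "\t".join(space_line.split(" "))
    # Whole line ends up in ldate, followed by eight None values.
    print(parse_transform_output(space_line, COLUMNS))
    # Nine separate values, one per column.
    print(parse_transform_output(tab_line, COLUMNS))
```

Running it shows the space-joined line collapsing into the first column with eight trailing `None` values, which matches the eight NULLs per row reported above.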

Andy

-----

Python script:
[hbase@h46 hive-0.10.0]$ cat calcpoker.py
#!/usr/bin/env python
# coding:utf8

import sys
import datetime

def calcwin():
    for line in sys.stdin:
        #line = line.strip()
        # 12 input columns, split on any run of whitespace
        (ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon)=line.strip().split()
        win = '0'
        if fold=='1':
            # folded hand: never a win; output fields are joined with single spaces
            print('%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards))
            continue
        cw = []
        if chipwon == "NULL":
            print('%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards))
            continue
        #print "userid win ",userid
        # chipwon looks like "0:73250|1:60500|2:100135"; sum the amounts
        cw=chipwon.split('|')
        chipwonv=0
        roundbetv=int(roundbet)
        for v in cw:
            chipwonv += int(v.split(':')[1])

        #print "chipwonv:%d,roundbet:%d"%(chipwonv,roundbetv)
        if chipwonv > roundbetv:
            win = '1'

        #print ' '.join([ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon])
        print(' '.join([ldate,gameid,userid,pid,win,fold,allin,cardtype,cards]))
        #print '%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards)


calcwin()
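If Hive's default tab delimiter is indeed the cause, one possible fix is to join the output fields with `\t` instead of a space. Below is a minimal sketch of the same per-row logic in Python 3; the names `calc_row`, `run` and `NULL_MARKERS` are illustrative and not from the original script. It also treats `\N` like the string `NULL`, on the assumption that Hive's default format passes SQL NULL values to the script as `\N`; whether the table stores real NULLs or the literal string "NULL" is worth checking.

```python
# Sketch of calcpoker.py's per-row logic emitting TAB-separated output
# (assumption: the TRANSFORM uses Hive's default tab delimiter).

# Input values treated as "no chips won": the literal string seen in the
# data, plus "\N", assumed to be how Hive sends SQL NULL to the script.
NULL_MARKERS = ("NULL", r"\N")


def calc_row(fields):
    """Map the 12 input columns to the 9 output columns, computing `win`."""
    (ldate, ltime, threadid, gameid, userid, pid,
     roundbet, fold, allin, cardtype, cards, chipwon) = fields
    win = "0"
    if fold != "1" and chipwon not in NULL_MARKERS:
        # chipwon looks like "0:73250|1:60500|2:100135"; sum the amounts.
        chipwonv = sum(int(part.split(":")[1]) for part in chipwon.split("|"))
        if chipwonv > int(roundbet):
            win = "1"
    return [ldate, gameid, userid, pid, win, fold, allin, cardtype, cards]


def run(instream, outstream):
    """Read tab-separated rows and write tab-separated results."""
    for line in instream:
        fields = line.rstrip("\n").split("\t")
        outstream.write("\t".join(calc_row(fields)) + "\n")
```

To use this as calcpoker.py, the script would end with `run(sys.stdin, sys.stdout)` after an `import sys`. Because it splits input on tabs, a local test has to feed tab-separated lines rather than the space-separated sample used below.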

I tested the script outside of MapReduce and it works:
[hbase@h46 hive-0.10.0]$ ./calcpoker.py
03/13/13 14:59:51 00000ab4 1009 185690475 8639 240 1 0 -1 NULL NULL
03/13/13 14:59:51 00000cb4 1009 187270278 92030 600 1 0 -1 NULL NULL
03/13/13 14:59:52 000003d8 1009 184151687 8639 600 1 0 -1 NULL NULL
03/13/13 14:59:52 00000ba8 1009 186012530 8593 154135 0 1 7 8|21|16|42|39 0:73250|1:60500|2:100135
03/13/13 14:59:52 00000a88 1009 180286243 92041 100 1 0 -1 NULL NULL
03/13/13 1009 185690475 8639 0 1 0 -1 NULL
03/13/13 1009 187270278 92030 0 1 0 -1 NULL
03/13/13 1009 184151687 8639 0 1 0 -1 NULL
03/13/13 1009 186012530 8593 1 0 1 7 8|21|16|42|39
03/13/13 1009 180286243 92041 0 1 0 -1 NULL

The first five lines are the source data on HDFS.
The last five lines are the output printed by calcpoker.py.
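The local test can look correct because spaces and tabs are indistinguishable in a terminal. One way to catch this before running the job is to count the tab-separated fields in each output line. Here is a minimal sketch with a hypothetical `bad_lines` helper, assuming the nine-column AS list from the query above:

```python
# Hypothetical checker: report output lines that do not split into the
# expected number of TAB-separated fields.
EXPECTED_COLUMNS = 9  # ldate, gameid, userid, pid, win, fold, allin, cardtype, cards


def bad_lines(lines, expected=EXPECTED_COLUMNS):
    """Yield (line_number, field_count) for each line whose TAB-split
    field count differs from `expected`."""
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split("\t"))
        if count != expected:
            yield number, count


if __name__ == "__main__":
    sample = [
        "03/13/13 1009 185690475 8639 0 1 0 -1 NULL\n",              # space-separated
        "03/13/13\t1009\t185690475\t8639\t0\t1\t0\t-1\tNULL\n",      # tab-separated
    ]
    for number, count in bad_lines(sample):
        # prints: line 1: 1 field(s), expected 9
        print("line %d: %d field(s), expected %d" % (number, count, EXPECTED_COLUMNS))
```

Applied to the five output lines above, a check like this would flag every line as having one field instead of nine.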
