spark-user mailing list archives

From fahad shah <>
Subject pyspark groupbykey throwing error: unpack requires a string argument of length 4
Date Mon, 19 Oct 2015 05:42:52 GMT

I am trying to build pair RDDs, grouping by a key and assigning an id based on that key.
I am using PySpark with Spark 1.3, and for some reason I am getting an
error that I am unable to figure out - any help much appreciated.

Things I tried (to no effect):

1. made sure I am not doing any conversions on the strings
2. made sure that the fields used in the key are all present and not
empty strings (otherwise I toss the row out)

My code is along the following lines (split uses StringIO to parse the
CSV line, isHeader removes the header row, and parse_train puts the 54
fields into a named tuple after whitespace/quote removal):
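The helper functions themselves are not shown in the message; here is a minimal sketch of what they might look like based on the description above. The field names and column count are assumptions (only srch_children_count and srch_room_count appear in the original code), and isHeader/split/parse_train are reconstructed, not the poster's actual code:

```python
import csv
from io import StringIO
from collections import namedtuple

# Assumed 54-column schema; only the two named fields come from the
# original message, the rest are illustrative placeholders.
FIELDS = ["srch_children_count", "srch_room_count"] + ["f%d" % i for i in range(52)]
Row = namedtuple("Row", FIELDS)

def isHeader(line):
    # The header row presumably starts with the first column name.
    return line.startswith(FIELDS[0])

def split(line):
    # Wrapping the line in StringIO lets csv.reader handle quoted
    # commas inside a single line.
    return next(csv.reader(StringIO(line)))

def parse_train(fields):
    # Strip whitespace and stray quotes before building the named
    # tuple; return None for malformed rows so they can be filtered.
    cleaned = [f.strip().strip('"') for f in fields]
    return Row(*cleaned) if len(cleaned) == len(FIELDS) else None
```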

# The "unpack requires a string argument of length 4" error is thrown
# on BB.take(1), where the groupByKey is evaluated

A = (sc.textFile("train.csv")
       .filter(lambda x: not isHeader(x))
       .map(split)
       .map(parse_train)
       .filter(lambda x: x is not None))

B = A.map(lambda k: ((...,  # earlier key fields truncated in the archive
                      k.srch_children_count, k.srch_room_count), k[0:54]))
BB = B.groupByKey()
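The error message comes from Python's struct module, which PySpark uses internally when reading back serialized shuffle data, so it surfaces only when groupByKey forces the shuffle at BB.take(1). One Spark-free way to narrow the problem down is to run the same key-building pattern locally and group with a plain dict, which fails loudly if any key tuple is malformed or unhashable. This is a diagnostic sketch with made-up sample pairs, not the original data:

```python
from collections import defaultdict

# Sample (key, value) pairs mimicking B's shape:
# key = (srch_children_count, srch_room_count), value = the row fields.
pairs = [
    (("0", "1"), ("rowA",)),
    (("0", "1"), ("rowB",)),
    (("2", "1"), ("rowC",)),
]

grouped = defaultdict(list)
for key, value in pairs:
    hash(key)  # raises TypeError if a key is unhashable (e.g. contains a list)
    grouped[key].append(value)
```

Running this over a small sample of the real parsed rows (e.g. the output of parse_train on a few lines taken locally) would confirm whether the keys themselves are well-formed before involving the shuffle.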

best fahad
