Date: Sun, 18 Oct 2015 22:42:52 -0700
Subject: pyspark groupbykey throwing error: unpack requires a string argument of length 4
From: fahad shah
To: user@spark.apache.org

Hi,

I am trying to build a pair RDD, group it by key, and assign an id based on the key. I am using PySpark with Spark 1.3. For some reason I am getting the error in the subject line ("unpack requires a string argument of length 4"), and I am unable to figure out why. Any help is much appreciated.

Things I tried (but to no effect):

1. Make sure I am not doing any conversions on the strings.
2. Make sure that the fields used in the key are all present and not empty strings (otherwise I toss the row out).

My code is along the following lines. split uses StringIO to parse each CSV line, isHeader identifies the header row so that it can be filtered out, and parse_train puts the 54 fields into a named tuple after whitespace/quote removal.

    # The "string argument" error is thrown at BB.take(1),
    # where the groupByKey is evaluated.
    A = (sc.textFile("train.csv")
           .filter(lambda x: not isHeader(x))
           .map(split)
           .map(parse_train)
           .filter(lambda x: x is not None))
    A.count()

    # Key: the six search fields. Value: all 54 fields of the row.
    B = A.map(lambda k: ((k.srch_destination_id, k.srch_length_of_stay,
                          k.srch_booking_window, k.srch_adults_count,
                          k.srch_children_count, k.srch_room_count),
                         k[0:54]))
    BB = B.groupByKey()
    BB.take(1)

Best,
fahad
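
The helpers split, isHeader and parse_train are described in the message but not shown. As a reference point for anyone trying to reproduce the problem, here is a minimal sketch of how such helpers could look. Everything in it is an assumption rather than the original code: the schema is cut down to seven columns instead of 54, "row_id" is a placeholder column name, to_pair is a hypothetical helper that mirrors the lambda passed to A.map, and the imports are written for Python 3 (StringIO and csv handling differ under Python 2).

```python
import csv
from collections import namedtuple
from io import StringIO

# Hypothetical, shortened schema: the real file has 54 columns. Only the six
# srch_* names come from the original message; "row_id" is a placeholder.
FIELDS = [
    "row_id",
    "srch_destination_id",
    "srch_length_of_stay",
    "srch_booking_window",
    "srch_adults_count",
    "srch_children_count",
    "srch_room_count",
]
KEY_FIELDS = FIELDS[1:]
TrainRow = namedtuple("TrainRow", FIELDS)


def isHeader(line):
    """Assume the header row is the one that starts with the first column name."""
    return line.startswith(FIELDS[0])


def split(line):
    """Parse one CSV line into a list of strings, honouring quoted commas."""
    # An empty line yields no CSV record, so fall back to an empty list.
    return next(csv.reader(StringIO(line)), [])


def parse_train(values):
    """Strip whitespace and quotes; return None for rows that cannot be keyed."""
    cleaned = [v.strip().strip('"').strip() for v in values]
    if len(cleaned) != len(FIELDS):
        return None  # wrong number of columns
    row = TrainRow(*cleaned)
    if any(getattr(row, name) == "" for name in KEY_FIELDS):
        return None  # a key field is empty, so the row is tossed out
    return row


def to_pair(row):
    """Build the (key, value) pair that groupByKey would receive."""
    key = tuple(getattr(row, name) for name in KEY_FIELDS)
    return key, tuple(row)
```

Because these functions do not depend on Spark, they can be run on a handful of lines from train.csv in a plain Python session to confirm that every surviving row produces a complete key before the pairs are handed to groupByKey.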