From: Nikolay Korovaiko
Date: Thu, 15 Jul 2010 18:18:07 -0700
Subject: string conversion problems
To: common-user@hadoop.apache.org

Hi everyone,

I hope this is the right place for my question. If not, please feel free to ignore it ;) and I'm sorry for any inconvenience :(

I'm writing a simple program for enumerating triangles in directed graphs for my project. First, for each input arc (e.g. "a b", "b c", "c a"; note: a tab character serves as the delimiter) I want my map function to output the following pairs: [a, to_b], [b, from_a], [a_b, -1].

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        // Input lines are tab-delimited: "source<TAB>target".
        String[] tokens = line.split("\t");
        // The source node has an outgoing arc to the target.
        output.collect(new Text(tokens[0]), new Text("to_" + tokens[1]));
        // The target node has an incoming arc from the source.
        output.collect(new Text(tokens[1]), new Text("from_" + tokens[0]));
        // Marker record for the arc itself, keyed "source_target".
        output.collect(new Text(tokens[0] + "_" + tokens[1]), new Text("-1"));
    }

Now my reduce function is supposed to cross join all pairs that have both to_'s and from_'s, and to simply propagate any other pairs whose keys contain "_".
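For a single node key, the intended cross join can be sketched in plain Java, independent of Hadoop, which makes it easy to test locally. The class CrossJoinSketch and its method crossJoin are hypothetical names used only for illustration; the sketch assumes every value is tagged as "to_<node>" or "from_<node>" exactly as the map function emits it, and it fails with a clear message instead of an ArrayIndexOutOfBoundsException when a value is untagged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrossJoinSketch {
    // Hypothetical helper: groups tagged values such as "to_b" or "from_c" by tag,
    // then pairs every "to" target with every "from" source as "target_source".
    static List<String> crossJoin(List<String> values) {
        Map<String, List<String>> lists = new HashMap<>();
        for (String v : values) {
            int sep = v.indexOf('_');
            if (sep < 0) {
                // This is the case where tokens[1] would throw in the original reducer.
                throw new IllegalArgumentException("untagged value: \"" + v + "\"");
            }
            lists.computeIfAbsent(v.substring(0, sep), k -> new ArrayList<>())
                 .add(v.substring(sep + 1));
        }
        // A node with no incoming or no outgoing arcs yields no pairs.
        List<String> tos = lists.getOrDefault("to", Collections.emptyList());
        List<String> froms = lists.getOrDefault("from", Collections.emptyList());
        List<String> out = new ArrayList<>();
        for (String t : tos) {
            for (String f : froms) {
                out.add(t + "_" + f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values that key "a" receives for the arcs a -> b, b -> c, c -> a.
        System.out.println(crossJoin(Arrays.asList("to_b", "from_c")));
    }
}
```

For key "a" in the example graph the values are "to_b" and "from_c", so the single result is "b_c": the path c -> a -> b, which is closed by the arc b -> c. Because the map function also emits a "b_c" record for that arc, both records presumably meet under the same key in a later step.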
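Here is my reduce function:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String key_s = key.toString();
        if (key_s.indexOf("_") > 0) {
            // Arc keys such as "a_b": just pass them through.
            output.collect(key, new Text("completed"));
        } else {
            // Node keys: group the neighbour ids by tag ("to" or "from").
            HashMap<String, ArrayList<String>> lists =
                new HashMap<String, ArrayList<String>>();
            while (values.hasNext()) {
                String line = values.next().toString();
                String[] tokens = line.split("_");
                if (!lists.containsKey(tokens[0])) {
                    lists.put(tokens[0], new ArrayList<String>());
                }
                // Throws ArrayIndexOutOfBoundsException when line contains no "_".
                lists.get(tokens[0]).add(tokens[1]);
            }
            // Without this check, lists.get(...) returns null (and the loop throws
            // a NullPointerException) for nodes lacking incoming or outgoing arcs.
            if (lists.containsKey("to") && lists.containsKey("from")) {
                for (String t : lists.get("to"))
                    for (String f : lists.get("from"))
                        output.collect(new Text(t + "_" + f), key);
            }
        }
    }

And this is where the most exciting stuff happens: tokens[1] throws an ArrayIndexOutOfBoundsException. Judging by the map function, by this point the iterator should yield values like "to_a", "from_b", "to_b", etc. When I just output these values, everything looks OK and I see "to_a", "from_b". But split() doesn't work at all; moreover, line.length() is always 1 and indexOf("_") returns -1! The very same indexOf works perfectly for the keys, which contain "_" and look like "a_b", "b_c".

I'm really puzzled by all this. MapReduce is supposed to save lives by making everything simple; instead, I spent several hours just to spot this... I'd really appreciate your help, guys!

Thanks in advance!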
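The symptoms described above (length() of 1, indexOf("_") of -1, tokens[1] throwing) are exactly what Java's String methods produce for a one-character value with no underscore, such as a bare node id like "a". The stand-alone check below uses a hypothetical helper, parseTagged (not part of the original code), to print what the parsing step sees for a few sample values and to return null instead of throwing. It is a debugging sketch under that assumption, not an explanation of where such values come from.

```java
import java.util.Arrays;

public class SplitCheck {
    // Hypothetical helper mirroring the reducer's parsing step: returns {tag, node}
    // for values like "to_a", or null when the value contains no '_'.
    // The limit of 2 keeps any further '_' inside the node id.
    static String[] parseTagged(String value) {
        String[] tokens = value.split("_", 2);
        return tokens.length == 2 ? tokens : null;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"to_a", "from_b", "a"}) {
            String[] parsed = parseTagged(v);
            System.out.println(v + " (length " + v.length() + ") -> "
                + (parsed == null
                    ? "no '_' found, indexOf = " + v.indexOf('_')
                    : Arrays.toString(parsed)));
        }
    }
}
```

Logging each value together with its length in this way, inside the real reducer, would show directly whether the values reaching the split are the tagged strings or something shorter.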