hadoop-common-user mailing list archives

From Nikolay Korovaiko <korovai...@gmail.com>
Subject string conversion problems
Date Fri, 16 Jul 2010 01:18:07 GMT
Hi everyone,

I hope this is the right place for my question. If not, please feel free to
ignore it ;) and I'm sorry for any inconvenience :(

I'm writing a simple program for enumerating triangles in directed graphs
for my project. First, for each input arc (e.g. a b, b c, c a; note: a tab
character serves as the delimiter) I want my map function to output the
following pairs (shown here for the arc a b): ([a, to_b], [b, from_a], [a_b, -1]):

 public void map(LongWritable key, Text value,
                 OutputCollector<Text, Text> output,
                 Reporter reporter) throws IOException {
   String line = value.toString();
   // the two endpoints of an arc are separated by a tab
   String[] tokens = line.split("\t");
   output.collect(new Text(tokens[0]), new Text("to_"+tokens[1]));
   output.collect(new Text(tokens[1]), new Text("from_"+tokens[0]));
   output.collect(new Text(tokens[0]+"_"+tokens[1]), new Text("-1"));
 }
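To make the intended map output concrete, here is a minimal plain-Java sketch of the same three emissions, with no Hadoop classes involved. The class name MapOutputSketch and the method emit are hypothetical names used only for this illustration; pairs are returned as two-element arrays instead of being passed to an OutputCollector.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the map step; names are hypothetical, not Hadoop API.
class MapOutputSketch {

    // Returns the three (key, value) pairs emitted for one "src<TAB>dst" line.
    static List<String[]> emit(String line) {
        String[] tokens = line.split("\t");
        List<String[]> out = new ArrayList<>();
        out.add(new String[] {tokens[0], "to_" + tokens[1]});
        out.add(new String[] {tokens[1], "from_" + tokens[0]});
        out.add(new String[] {tokens[0] + "_" + tokens[1], "-1"});
        return out;
    }

    public static void main(String[] args) {
        // The three example arcs from the message: a->b, b->c, c->a.
        for (String arc : new String[] {"a\tb", "b\tc", "c\ta"}) {
            for (String[] pair : emit(arc)) {
                System.out.println("[" + pair[0] + ", " + pair[1] + "]");
            }
        }
    }
}
```

Running it on the three example arcs lists every pair the reducer would be expected to receive, grouped by key after the shuffle.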


Now my reduce function is supposed to cross join all pairs that have both
to_'s and from_'s and to simply propagate any other pairs whose keys contain
"_" and look like "a_b", "b_c":

 public void reduce(Text key, Iterator<Text> values,
                    OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
   String key_s = key.toString();
   if (key_s.indexOf("_")>0)
       // keys like "a_b" are passed through
       output.collect(key, new Text("completed"));
   else {
       HashMap<String, ArrayList<String>> lists =
           new HashMap<String, ArrayList<String>>();
       while (values.hasNext()) {
           String line = values.next().toString();
           String[] tokens = line.split("_");
           if (!lists.containsKey(tokens[0])) {
               lists.put(tokens[0], new ArrayList<String>());
           }
           // assumed: this statement was lost from the message; the text
           // below refers to tokens[1] being accessed at this point
           lists.get(tokens[0]).add(tokens[1]);
       }
       // cross join every "to" neighbour with every "from" neighbour
       for (String t : lists.get("to"))
           for (String f : lists.get("from"))
               output.collect(new Text(t+"_"+f), key);
   }
 }
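The cross join described above can be sketched outside Hadoop as follows. This is only an illustration under stated assumptions: CrossJoinSketch and its reduce method are hypothetical names, values arrive as a plain List<String>, and two defensive choices are added that are not in the original code (a missing "to" or "from" list is treated as empty, and a value without "_" is rejected with a message that shows the offending value).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the reduce step; names are hypothetical, not Hadoop API.
class CrossJoinSketch {

    static List<String[]> reduce(String key, List<String> values) {
        List<String[]> out = new ArrayList<>();
        if (key.indexOf("_") > 0) {
            // Keys such as "a_b" are passed through unchanged.
            out.add(new String[] {key, "completed"});
            return out;
        }
        Map<String, List<String>> lists = new HashMap<>();
        for (String value : values) {
            String[] tokens = value.split("_", 2);
            if (tokens.length < 2) {
                // Fail with the actual value instead of an index error.
                throw new IllegalArgumentException(
                    "unexpected value for key " + key + ": '" + value + "'");
            }
            lists.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(tokens[1]);
        }
        // A vertex with only incoming or only outgoing arcs joins to nothing.
        List<String> to = lists.getOrDefault("to", new ArrayList<>());
        List<String> from = lists.getOrDefault("from", new ArrayList<>());
        for (String t : to) {
            for (String f : from) {
                out.add(new String[] {t + "_" + f, key});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Vertex "a" has an arc to "b" and an arc from "c".
        for (String[] pair : reduce("a", Arrays.asList("to_b", "from_c"))) {
            System.out.println("[" + pair[0] + ", " + pair[1] + "]"); // prints [b_c, a]
        }
    }
}
```

Reporting the raw value in the exception message is a design choice aimed at debugging: it shows what the reducer was actually handed rather than only where the access failed.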



And this is where the most exciting stuff happens. tokens[1] yields an
ArrayIndexOutOfBoundsException. If you scroll up, you can see that by this
point the iterator should give values like "to_a", "from_b", "to_b", etc.
When I just output these values, everything looks ok and I have "to_a",
"from_b". But split() doesn't work at all; moreover, line.length() is always
1 and indexOf("_") returns -1! The very same indexOf WORKS PERFECTLY for
keys, i.e. for the pairs whose keys contain "_" and look like "a_b", "b_c".
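For comparison, this small standalone sketch shows what String.split and indexOf report for a value of the expected shape and for a one-character value. SplitDiagnosticSketch and describe are hypothetical names introduced only for this illustration. The symptoms above (length 1, indexOf returning -1, no tokens[1]) match the one-character case, which suggests that the strings reaching this loop are not the "to_..." and "from_..." values; where such values would come from is not established by this message.

```java
// Standalone check of String behaviour; names are hypothetical.
class SplitDiagnosticSketch {

    // Summarises the three properties mentioned in the message for one value.
    static String describe(String value) {
        String[] tokens = value.split("_");
        return "length=" + value.length()
            + " indexOf=" + value.indexOf("_")
            + " tokens=" + tokens.length;
    }

    public static void main(String[] args) {
        System.out.println(describe("to_a")); // prints length=4 indexOf=2 tokens=2
        System.out.println(describe("a"));    // prints length=1 indexOf=-1 tokens=1
    }
}
```

Logging the same three properties for each value inside reduce() would show directly which case the job is hitting.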

I'm really puzzled by all this. MapReduce is supposed to save lives by making
everything simple. Instead I spent several hours just to spot this...

I'd really appreciate your help, guys!!! Thanks in advance!
