giraph-user mailing list archives

From Young Han <young....@uwaterloo.ca>
Subject Re: ConnectedComponents example
Date Mon, 31 Mar 2014 20:34:47 GMT
Weird, inputs with tabs work for me right out of the box. Either the "\t"
is not the cause, or it's some Java-version-specific issue. Try this toy
program:


import java.util.regex.Pattern;

public class Test {
  public static void main(String[] args) {
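    // Separator under discussion: a single tab or a single space.
    Pattern SEPARATOR = Pattern.compile("[\t ]");
    // Explicit \t escapes so the input is guaranteed to contain real tabs.
    String[] tokens = SEPARATOR.split("3 4\t5\t6\t7");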

    for (int i = 0; i < tokens.length; i++) {
      System.out.println("--" + tokens[i] + "--");
    }
  }
}


Does it split the tabs properly for your Java?
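
For reference, here is a minimal sketch comparing the two spellings of the separator discussed in this thread. It is my own illustration, not Giraph code: the class name SeparatorCheck, its split helper, and the sample lines are all made up. As far as I know, both spellings give a regex that matches a tab: with "[\t ]" the Java compiler puts a real tab character into the string, and with "[\\t ]" the regex engine interprets the \t escape itself. The last call shows that a one-character class splits on every separator, so two spaces in a row leave an empty token in between, which a parser expecting integers may not handle.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only; not part of Giraph.
class SeparatorCheck {
  // Compile the given regex and split the line with it.
  static String[] split(String regex, String line) {
    return Pattern.compile(regex).split(line);
  }

  public static void main(String[] args) {
    // "\t" escapes guarantee the sample line contains real tab characters.
    String line = "3 4\t5\t6";

    // "[\t ]": the compiler turns \t into a tab before the regex engine sees it.
    System.out.println(Arrays.toString(split("[\t ]", line)));
    // "[\\t ]": the regex engine receives backslash-t and reads it as a tab.
    System.out.println(Arrays.toString(split("[\\t ]", line)));

    // Two consecutive spaces: the class matches each one separately,
    // so an empty token appears between them.
    System.out.println(Arrays.toString(split("[\t ]", "1  2")));
  }
}
```

If the first two lines print the same four tokens, the single versus double backslash is not the problem, and stray or repeated separators in the input file would be the next thing to check.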

Young


On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1malik@gmail.com> wrote:

> Yep, you're right, I believe it is a bug in all the InputFormats. I just
> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
> and the example ConnectedComponents class, and it worked like a charm with
> just the normal spacing.
>
>
> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>
>> Huh, it might be a bug in the code. Could it be that Pattern.compile has
>> to take "[\\t ]" (note the double backslash) to properly match tabs? If so,
>> that bug is in all the input formats...
>>
>> Happy to help :)
>>
>> Young
>>
>>
>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I removed the spaces and it worked! I don't understand, though. I'm sure
>>> the separator pattern means that it splits it by tab spaces?
>>>
>>> Thanks for all your help, though. Somewhat relieved now!
>>>
>>> Kind regards,
>>>
>>> Ghufran
>>>
>>>
>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>
>>>> Hi,
>>>>
>>>> That looks like an error with the algorithm... What do the Hadoop
>>>> userlogs say?
>>>>
>>>> And just to rule out weirdness, what happens if you use spaces instead
>>>> of tabs (for your input graph)?
>>>>
>>>> Young
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> No, even after I added the .txt it gets to map 100%, then drops back
>>>>> down to 50% and gives me the error:
>>>>>
>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format
>>>>> specified. Ensure your InputFormat does not require one.
>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format
>>>>> vertex index type is not known
>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format
>>>>> vertex value type is not known
>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format
>>>>> edge value type is not known
>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is
>>>>> disabled (default), do not allow any task retries (setting
>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job:
>>>>> job_201403311622_0004
>>>>> 14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete:
>>>>> job_201403311622_0004
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1238858
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all
>>>>> reduces waiting after reserving slots (ms)=0
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all
>>>>> maps waiting after reserving slots (ms)=0
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1
>>>>>
>>>>>
>>>>> I did a check to make sure the graph was being stored correctly by
>>>>> doing:
>>>>>
>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat
>>>>> input/*
>>>>> 1 2
>>>>> 2 1 3 4
>>>>> 3 2
>>>>> 4 2
>>>>>
>>>>
>>>>
>>>
>>
>
