giraph-user mailing list archives

From ghufran malik <ghufran1ma...@gmail.com>
Subject Re: ConnectedComponents example
Date Mon, 31 Mar 2014 20:50:30 GMT
Hey,

Yes, when I was originally debugging the code I wanted to check what \t actually
splits on, so I created my own test class:

import java.util.regex.Pattern;

class App {
  private static final Pattern SEPARATOR = Pattern.compile("[\t ]");

  public static void main(String[] args) {
    String line = "1 0 2";
    String[] tokens = SEPARATOR.split(line);

    System.out.println(SEPARATOR);
    System.out.println(tokens.length);

    for (String token : tokens) {
      System.out.println(token);
    }
  }
}

and the pattern worked as I expected: it split the line on tabs and spaces.

I'll try your test as well to double check.
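
For reference, here is a minimal sketch comparing the two spellings discussed in this thread, "[\t ]" and "[\\t ]". As far as I know both match a tab in Java: in the first the compiler puts a real tab character into the pattern, and in the second the regex engine reads \t as its own tab escape. The class name, the helper method, and the input strings are made up for illustration; they are not from the Giraph source. The last line shows that two separators in a row produce an empty token, which may or may not be related to the failure reported in this thread.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class TabSeparatorCheck {
  // "[\t ]": the Java compiler replaces \t with a real tab before the regex is compiled.
  static final Pattern LITERAL_TAB = Pattern.compile("[\t ]");
  // "[\\t ]": the regex engine receives backslash + t and treats it as a tab escape.
  static final Pattern ESCAPED_TAB = Pattern.compile("[\\t ]");

  // Hypothetical helper so both patterns are exercised the same way.
  static String[] split(Pattern pattern, String line) {
    return pattern.split(line);
  }

  public static void main(String[] args) {
    String line = "1\t0 2"; // illustrative input: one tab, then one space

    System.out.println(Arrays.toString(split(LITERAL_TAB, line))); // [1, 0, 2]
    System.out.println(Arrays.toString(split(ESCAPED_TAB, line))); // [1, 0, 2]

    // Two spaces in a row give an empty token between "1" and "2".
    System.out.println(Arrays.toString(split(LITERAL_TAB, "1  2"))); // [1, , 2]
  }
}
```

If both of the first two lines print the same three tokens, the single versus double backslash is unlikely to be the cause, and empty tokens from repeated or stray separators would be the next thing to check.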


On Mon, Mar 31, 2014 at 9:34 PM, Young Han <young.han@uwaterloo.ca> wrote:

> Weird, inputs with tabs work for me right out of the box. Either the "\t"
> is not the cause or it's some Java-version specific issue. Try this toy
> program:
>
>
> import java.util.regex.Pattern;
>
> public class Test {
>   public static void main(String[] args) {
>     Pattern SEPARATOR = Pattern.compile("[\t ]");
>     String[] tokens = SEPARATOR.split("3 4    5    6    7");
>
>     for (int i = 0; i < tokens.length; i++) {
>       System.out.println("--" + tokens[i] + "--");
>     }
>   }
> }
>
>
> Does it split the tabs properly for your Java?
>
> Young
>
>
> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>
>> Yep, you're right, I believe it is a bug in all the InputFormats. I just
>> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
>> and the example ConnectedComponents class, and it worked like a charm with
>> just the normal spacing.
>>
>>
>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>
>>> Huh, it might be a bug in the code. Could it be that Pattern.compile has
>>> to take "[\\t ]" (note the double backslash) to properly match tabs? If so,
>>> that bug is in all the input formats...
>>>
>>> Happy to help :)
>>>
>>> Young
>>>
>>>
>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I removed the spaces and it worked! I don't understand, though. I'm sure
>>>> the separator pattern means that it splits by tabs or spaces?
>>>>
>>>> Thanks for all your help, though. Somewhat relieved now!
>>>>
>>>> Kind regards,
>>>>
>>>> Ghufran
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>> userlogs say?
>>>>>
>>>>> And just to rule out weirdness, what happens if you use spaces instead
>>>>> of tabs (for your input graph)?
>>>>>
>>>>> Young
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <
>>>>> ghufran1malik@gmail.com> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> No, even after I added the .txt it gets to map 100%, then drops back
>>>>>> down to 50% and gives me the error:
>>>>>>
>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format
>>>>>> specified. Ensure your InputFormat does not require one.
>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>> format vertex index type is not known
>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>> format vertex value type is not known
>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>> format edge value type is not known
>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is
>>>>>> disabled (default), do not allow any task retries (setting
>>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job:
>>>>>> job_201403311622_0004
>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete:
>>>>>> job_201403311622_0004
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1238858
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all
>>>>>> reduces waiting after reserving slots (ms)=0
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all
>>>>>> maps waiting after reserving slots (ms)=0
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1
>>>>>>
>>>>>>
>>>>>> I did a check to make sure the graph was being stored correctly by
>>>>>> doing:
>>>>>>
>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat
>>>>>> input/*
>>>>>> 1 2
>>>>>> 2 1 3 4
>>>>>> 3 2
>>>>>> 4 2
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
