giraph-user mailing list archives

From Young Han <young....@uwaterloo.ca>
Subject Re: ConnectedComponents example
Date Mon, 31 Mar 2014 20:52:06 GMT
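Ah yeah, I found the answer to the "\t" versus "\\t" question:
https://stackoverflow.com/questions/3762347/

So I don't think that bit is a bug: in a Java string literal, "\t" is already a
tab character, and inside a regex "\\t" is the escape for a tab, so "[\t ]"
and "[\\t ]" match the same characters. I'm not really sure why inputs with
tabs don't work for you. I'm using Hadoop 1.0.4 and jdk1.6.0_30 on Ubuntu
12.04 x64, if that helps you.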
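A minimal sketch for confirming this on a given JVM. The class name TabEscapeCheck and the sample line are illustrative, not taken from the thread. It compiles the separator both ways and checks that the two patterns split a line mixing tabs and spaces identically.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class TabEscapeCheck {
    public static void main(String[] args) {
        // "\t" in Java source is turned into a literal tab before the regex engine sees it
        Pattern literalTab = Pattern.compile("[\t ]");
        // "\\t" hands the regex engine the two characters '\' and 't', which it reads as a tab
        Pattern escapedTab = Pattern.compile("[\\t ]");

        String line = "2\t1 3\t4"; // tabs and single spaces mixed
        String[] a = literalTab.split(line);
        String[] b = escapedTab.split(line);

        System.out.println(Arrays.toString(a)); // [2, 1, 3, 4]
        System.out.println(Arrays.toString(b)); // [2, 1, 3, 4]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

If the last line printed false on some JDK, escaping would be worth revisiting; if it prints true, the tab problem most likely lies elsewhere.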

Young


On Mon, Mar 31, 2014 at 4:50 PM, ghufran malik <ghufran1malik@gmail.com> wrote:

> Hey,
>
> Yes, when I was originally debugging the code I thought to check what \t
> actually splits on, so I created my own test class:
>
> import java.util.regex.Pattern;
>
> class App
> {
>     // Matches exactly one tab or one space
>     private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
>
>     public static void main(String[] args)
>     {
>         String line = "1 0 2";
>         String[] tokens = SEPARATOR.split(line);
>
>         System.out.println(SEPARATOR);     // the pattern being used
>         System.out.println(tokens.length); // how many tokens the split produced
>
>         for (String token : tokens) {
>             System.out.println(token);
>         }
>     }
> }
>
> and the pattern split on tabs as I thought it should.
>
> I'll try your test as well to double-check.
>
>
> On Mon, Mar 31, 2014 at 9:34 PM, Young Han <young.han@uwaterloo.ca> wrote:
>
>> Weird, inputs with tabs work for me right out of the box. Either the "\t"
>> is not the cause or it's some Java-version-specific issue. Try this toy
>> program:
>>
>>
>> import java.util.regex.Pattern;
>>
>> public class Test {
>>   public static void main(String[] args) {
>>     // Split on a single tab or a single space
>>     Pattern separator = Pattern.compile("[\t ]");
>>     String[] tokens = separator.split("3 4    5    6    7");
>>
>>     // The "--" markers make any empty tokens visible
>>     for (int i = 0; i < tokens.length; i++) {
>>       System.out.println("--" + tokens[i] + "--");
>>     }
>>   }
>> }
>>
>>
>> Does it split on tabs properly with your Java version?
>>
>> Young
>>
>>
>> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>
>>> Yep, you're right; I believe it's a bug in all the InputFormats. I just
>>> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
>>> and the example ConnectedComponents class, and it worked like a charm with
>>> plain spaces.
>>>
>>>
>>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>
>>>> Huh, it might be a bug in the code. Could it be that Pattern.compile
>>>> has to take "[\\t ]" (note the double backslash) to properly match tabs?
>>>> If so, that bug is in all the input formats...
>>>>
>>>> Happy to help :)
>>>>
>>>> Young
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I replaced the tabs with spaces and it worked! I don't understand,
>>>>> though. Doesn't the separator pattern split on tabs?
>>>>>
>>>>> Thanks for all your help anyway; I'm somewhat relieved now!
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Ghufran
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>>> userlogs say?
>>>>>>
>>>>>> And just to rule out weirdness, what happens if you use spaces
>>>>>> instead of tabs (for your input graph)?
>>>>>>
>>>>>> Young
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> No, even after I added the .txt it gets to map 100%, then drops back
>>>>>>> down to 50% and gives me this error:
>>>>>>>
>>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
>>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
>>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job: job_201403311622_0004
>>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete: job_201403311622_0004
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1238858
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1
>>>>>>>
>>>>>>>
>>>>>>> I did a check to make sure the graph was being stored correctly by
>>>>>>> doing:
>>>>>>>
>>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat input/*
>>>>>>> 1 2
>>>>>>> 2 1 3 4
>>>>>>> 3 2
>>>>>>> 4 2
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
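The thread never settles why the tab-separated file failed while the space-separated one worked. One possible explanation, offered only as a hypothesis: "[\t ]" matches exactly one character, so two separators in a row (two tabs, or a tab followed by a space) yield an empty token, and if the input format parses each token with Integer.parseInt (an assumption here, not checked against Giraph's source), that empty token throws a NumberFormatException, which could be enough to fail a map task. In this sketch the class EmptyTokenCheck, the helper parseIds and the sample line are hypothetical, and "\\s+" is a swapped-in alternative separator that treats any run of whitespace as a single break.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EmptyTokenCheck {
    // The single-character separator discussed in the thread
    private static final Pattern SINGLE = Pattern.compile("[\t ]");
    // Hypothetical alternative: any run of whitespace counts as one separator
    private static final Pattern RUN = Pattern.compile("\\s+");

    // Hypothetical parser in the style of a simple text input format (not Giraph's code)
    static int[] parseIds(Pattern separator, String line) {
        String[] tokens = separator.split(line.trim());
        int[] ids = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            ids[i] = Integer.parseInt(tokens[i]); // throws NumberFormatException on an empty token
        }
        return ids;
    }

    public static void main(String[] args) {
        String line = "2\t\t1 3 4"; // two tabs in a row
        System.out.println(Arrays.toString(SINGLE.split(line))); // [2, , 1, 3, 4]
        try {
            parseIds(SINGLE, line);
            System.out.println("single: parsed");
        } catch (NumberFormatException e) {
            System.out.println("single: " + e.getMessage());
        }
        System.out.println("run: " + Arrays.toString(parseIds(RUN, line))); // run: [2, 1, 3, 4]
    }
}
```

Pattern.split already drops trailing empty strings, so trailing whitespace alone would not trigger this; on an untrimmed line, leading whitespace would, which is why the sketch trims first.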
