giraph-user mailing list archives

From ghufran malik <ghufran1ma...@gmail.com>
Subject Re: ConnectedComponents example
Date Mon, 31 Mar 2014 21:05:07 GMT
Hmm, yeah, the only difference between my system and yours is the Hadoop
you're using and maybe the JDK. I think it's most likely something to do with
the JDK in this respect.


On Mon, Mar 31, 2014 at 10:01 PM, ghufran malik <ghufran1malik@gmail.com> wrote:

> The output your code produced is:
>
> --3--
> --4--
> ----
> ----
> ----
> --5--
> ----
> ----
> ----
> --6--
> ----
> ----
> ----
> --7--
>
> it's because of the space between the \t and the closing ] in [\t ]: the
> character class then splits on a space as well as a tab, so runs of
> consecutive spaces produce the empty tokens above. Whereas if you just have
> [\t], it splits only on tabs.
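The distinction can be checked outside Giraph with a tiny standalone class (a toy sketch, not Giraph code; `SplitDemo` and its input string are made up for illustration). It also shows where the bare `----` lines come from: `split()` keeps the empty tokens that appear between consecutive separators.

```java
import java.util.regex.Pattern;

// Toy demonstration of why "[\t ]" and "[\t]" behave differently.
// The character class [\t ] splits on tab OR space; [\t] splits on
// tabs only.
public class SplitDemo {
    public static void main(String[] args) {
        String line = "3 4\t5"; // mixed separators

        String[] tabOrSpace = Pattern.compile("[\t ]").split(line);
        String[] tabOnly = Pattern.compile("[\t]").split(line);

        System.out.println(tabOrSpace.length); // 3: "3", "4", "5"
        System.out.println(tabOnly.length);    // 2: "3 4", "5"

        // Consecutive separators yield empty interior tokens, which
        // is what produces the bare "----" lines in the output above.
        String[] runs = Pattern.compile("[\t ]").split("3  4");
        System.out.println(runs.length);       // 3: "3", "", "4"
    }
}
```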
>
> Thanks for clearing that up for me!
>
> Ghufran
>
>
> On Mon, Mar 31, 2014 at 9:50 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>
>> Hey,
>>
>> Yes, when originally debugging the code I thought to check what \t
>> actually splits by, and created my own test class:
>>
>> import java.util.regex.Pattern;
>>
>> class App
>> {
>>     private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
>>
>>     public static void main(String[] args)
>>     {
>>         String line = "1 0 2";
>>         String[] tokens = SEPARATOR.split(line);
>>
>>         System.out.println(SEPARATOR);
>>         System.out.println(tokens.length);
>>
>>         for (String token : tokens) {
>>             System.out.println(token);
>>         }
>>     }
>> }
>>
>> and the pattern worked as I thought it should, splitting on tab spaces.
>>
>> I'll try your test as well to double-check.
>>
>>
>> On Mon, Mar 31, 2014 at 9:34 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>
>>> Weird, inputs with tabs work for me right out of the box. Either the
>>> "\t" is not the cause or it's some Java-version specific issue. Try this
>>> toy program:
>>>
>>>
>>> import java.util.regex.Pattern;
>>>
>>> public class Test {
>>>   public static void main(String[] args) {
>>>     Pattern SEPARATOR = Pattern.compile("[\t ]");
>>>     String[] tokens = SEPARATOR.split("3 4    5    6    7");
>>>
>>>     for (int i = 0; i < tokens.length; i++) {
>>>       System.out.println("--" + tokens[i] + "--");
>>>     }
>>>   }
>>> }
>>>
>>>
>>> Does it split the tabs properly for your Java?
>>>
>>> Young
>>>
>>>
>>> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>
>>>> Yep, you're right, it is a bug with all the InputFormats, I believe. I just
>>>> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
>>>> and the example ConnectedComponents class, and it worked like a charm with
>>>> just the normal spacing.
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>>
>>>>> Huh, it might be a bug in the code. Could it be that Pattern.compile
>>>>> has to take "[\\t ]" (note the double backslash) to properly match tabs?
>>>>> If so, that bug is in all the input formats...
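For what it's worth, the double backslash shouldn't matter here: in the literal "[\t ]" the \t is already a real tab character, while in "[\\t ]" the regex engine receives the escape \t, and both match a tab. A quick standalone check (plain JDK, no Giraph involved; `TabEscapeCheck` is a made-up name):

```java
import java.util.regex.Pattern;

public class TabEscapeCheck {
    public static void main(String[] args) {
        // "\t" in a Java string literal is the tab character itself;
        // "\\t" hands the two characters backslash + t to the regex
        // engine, which interprets them as the tab escape. Both
        // patterns therefore split on a tab.
        System.out.println(Pattern.compile("[\t ]").split("a\tb").length);  // 2
        System.out.println(Pattern.compile("[\\t ]").split("a\tb").length); // 2
    }
}
```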
>>>>>
>>>>> Happy to help :)
>>>>>
>>>>> Young
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <
>>>>> ghufran1malik@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I removed the spaces and it worked! I don't understand, though. I'm
>>>>>> sure the separator pattern means that it splits on tab spaces?
>>>>>>
>>>>>> Thanks for all your help, though; somewhat relieved now!
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Ghufran
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>>>> userlogs say?
>>>>>>>
>>>>>>> And just to rule out weirdness, what happens if you use spaces
>>>>>>> instead of tabs (for your input graph)?
>>>>>>>
>>>>>>> Young
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <
>>>>>>> ghufran1malik@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> No, even after I added the .txt it gets to map 100%, then drops back
>>>>>>>> down to 50%, and gives me the error:
>>>>>>>>
>>>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
>>>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
>>>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job: job_201403311622_0004
>>>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete: job_201403311622_0004
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1238858
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1
>>>>>>>>
>>>>>>>> I did a check to make sure the graph was being stored correctly by
>>>>>>>> doing:
>>>>>>>>
>>>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat input/*
>>>>>>>> 1 2
>>>>>>>> 2 1 3 4
>>>>>>>> 3 2
>>>>>>>> 4 2
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
