giraph-user mailing list archives

From Young Han <young....@uwaterloo.ca>
Subject Re: ConnectedComponents example
Date Mon, 31 Mar 2014 21:12:03 GMT
That's pretty interesting. Forgot to mention: the output I get is

--3--
--4--
--5--
--6--
--7--

So it does look like something is up with Java.

Young


On Mon, Mar 31, 2014 at 5:05 PM, ghufran malik <ghufran1malik@gmail.com> wrote:

> Hmm, yeah, the only differences between my system and yours are the Hadoop
> version you're using and maybe the JDK. I think it's most likely something to
> do with the JDK in this respect.
>
>
> On Mon, Mar 31, 2014 at 10:01 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>
>> the output your code produced is:
>>
>> --3--
>> --4--
>> ----
>> ----
>> ----
>> --5--
>> ----
>> ----
>> ----
>> --6--
>> ----
>> ----
>> ----
>> --7--
>>
>> It's because of the space between the \t and the closing ] in [\t ]: with
>> it, the pattern also splits on every single space, so a run of spaces
>> produces empty tokens. If you just have [\t], it splits only on tabs.
>>
>> Thanks for clearing that up for me!
>>
>> Ghufran
>>
>>
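The splitting behaviour described above can be checked with a small standalone program. This is a sketch, not Giraph code: the class name `SeparatorDemo` and the `[\t ]+` variant are assumptions added for illustration. The input `"3 4    5"` stands in for a line whose tabs were turned into spaces, which would plausibly explain why the same test printed `----` lines on one machine and not on the other.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SeparatorDemo {
  // "\t" inside a Java string literal is already a TAB character, so this
  // class matches exactly one tab OR one space (the pattern quoted in this thread).
  static final Pattern TAB_OR_SPACE = Pattern.compile("[\t ]");
  // Matches exactly one tab; spaces are never separators.
  static final Pattern TAB_ONLY = Pattern.compile("[\t]");
  // Assumed alternative (not from the thread): a run of tabs/spaces is one separator.
  static final Pattern WHITESPACE_RUN = Pattern.compile("[\t ]+");

  public static void main(String[] args) {
    String tabbed = "3\t4\t5";
    String spaced = "3 4    5"; // four spaces, as if a tab had been expanded

    System.out.println(Arrays.toString(TAB_OR_SPACE.split(tabbed)));   // [3, 4, 5]
    System.out.println(Arrays.toString(TAB_OR_SPACE.split(spaced)));   // [3, 4, , , , 5]
    System.out.println(Arrays.toString(TAB_ONLY.split(spaced)));       // [3 4    5]
    System.out.println(Arrays.toString(WHITESPACE_RUN.split(spaced))); // [3, 4, 5]
  }
}
```

The empty strings on the second line correspond to the `----` lines: each single space counts as a separator, and `split` keeps interior empty tokens (only trailing empty strings are dropped).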
>> On Mon, Mar 31, 2014 at 9:50 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> Yes, when originally debugging the code I thought to check what \t
>>> actually splits on, so I created my own test class:
>>>
>>> import java.util.regex.Pattern;
>>>
>>> class App {
>>>   private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
>>>
>>>   public static void main(String[] args) {
>>>     String line = "1 0 2";
>>>     String[] tokens = SEPARATOR.split(line);
>>>
>>>     // Print the pattern, the number of tokens, then each token.
>>>     System.out.println(SEPARATOR);
>>>     System.out.println(tokens.length);
>>>
>>>     for (String token : tokens) {
>>>       System.out.println(token);
>>>     }
>>>   }
>>> }
>>>
>>> and the pattern split on tabs as I thought it should.
>>>
>>> I'll try your test as well to double-check.
>>>
>>>
>>> On Mon, Mar 31, 2014 at 9:34 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>
>>>> Weird, inputs with tabs work for me right out of the box. Either the
>>>> "\t" is not the cause or it's some Java-version specific issue. Try this
>>>> toy program:
>>>>
>>>>
>>>> import java.util.regex.Pattern;
>>>>
>>>> public class Test {
>>>>   public static void main(String[] args) {
>>>>     Pattern SEPARATOR = Pattern.compile("[\t ]");
>>>>     String[] tokens = SEPARATOR.split("3 4    5    6    7");
>>>>
>>>>     for (int i = 0; i < tokens.length; i++) {
>>>>       System.out.println("--" + tokens[i] + "--");
>>>>     }
>>>>   }
>>>> }
>>>>
>>>>
>>>> Does it split the tabs properly for your Java?
>>>>
>>>> Young
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>>
>>>>> Yep, you're right, I believe it is a bug in all the InputFormats. I just
>>>>> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
>>>>> and the example ConnectedComponents class, and it worked like a charm with
>>>>> just the normal spacing.
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>>>
>>>>>> Huh, it might be a bug in the code. Could it be that Pattern.compile
>>>>>> has to take "[\\t ]" (note the double backslash) to properly match tabs?
>>>>>> If so, that bug is in all the input formats...
>>>>>>
>>>>>> Happy to help :)
>>>>>>
>>>>>> Young
>>>>>>
>>>>>>
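Whether the double backslash matters can be tested directly. A minimal sketch, with a hypothetical class name `EscapeDemo`: with a single backslash the Java compiler turns `\t` into a real tab before the regex engine sees it; with a double backslash the regex engine receives `\t` and interprets it as its own tab escape. Both patterns should therefore accept exactly a tab or a space and reject everything else.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
  // Compiler converts "\t" to a TAB; the regex is "[<TAB><SPACE>]".
  static final Pattern SINGLE = Pattern.compile("[\t ]");
  // Regex engine receives backslash + 't' and treats it as the TAB escape.
  static final Pattern DOUBLE = Pattern.compile("[\\t ]");

  public static void main(String[] args) {
    String[] inputs = {"\t", " ", "t", "\\"};
    for (String s : inputs) {
      boolean single = SINGLE.matcher(s).matches();
      boolean dbl = DOUBLE.matcher(s).matches();
      System.out.printf("%-6s single=%b double=%b%n", describe(s), single, dbl);
    }
  }

  // Hypothetical helper: makes whitespace visible in the printed table.
  static String describe(String s) {
    if (s.equals("\t")) {
      return "TAB";
    }
    return s.equals(" ") ? "SPACE" : "'" + s + "'";
  }
}
```

If both columns agree for every input, the escaping is not what makes the input format misbehave.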
>>>>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I removed the spaces and it worked! I don't understand, though.
>>>>>>> Doesn't the separator pattern mean that it splits by tabs?
>>>>>>>
>>>>>>> Thanks for all your help, though; somewhat relieved now!
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Ghufran
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>>>>> userlogs say?
>>>>>>>>
>>>>>>>> And just to rule out weirdness, what happens if you use spaces
>>>>>>>> instead of tabs (for your input graph)?
>>>>>>>>
>>>>>>>> Young
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> No, even after I added the .txt it gets to map 100%, then drops back
>>>>>>>>> down to 50% and gives me the error:
>>>>>>>>>
>>>>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
>>>>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
>>>>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job: job_201403311622_0004
>>>>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete: job_201403311622_0004
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1238858
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I did a check to make sure the graph was being stored correctly by
>>>>>>>>> doing:
>>>>>>>>>
>>>>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat input/*
>>>>>>>>> 1 2
>>>>>>>>> 2 1 3 4
>>>>>>>>> 3 2
>>>>>>>>> 4 2
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
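For checking a run against this four-vertex graph, the expected connected-components labelling can be computed outside Hadoop. The sketch below is a single-process minimum-label propagation loop in the spirit of the ConnectedComponents example, which, as far as I know, labels each vertex with the smallest vertex id in its component. It does not use the Giraph API; the class and method names are hypothetical, and it assumes every edge is listed in both directions, as in this input. It splits on runs of tabs or spaces, so either separator parses the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class MinLabelComponents {
  // Assumption: any run of tabs and/or spaces is one separator.
  static final Pattern SEPARATOR = Pattern.compile("[\t ]+");

  // Parses "vertexId neighborId neighborId ..." lines into an adjacency map.
  static Map<Integer, List<Integer>> parse(List<String> lines) {
    Map<Integer, List<Integer>> graph = new TreeMap<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank lines
      }
      String[] tokens = SEPARATOR.split(trimmed);
      List<Integer> neighbors = new ArrayList<>();
      for (int i = 1; i < tokens.length; i++) {
        neighbors.add(Integer.parseInt(tokens[i]));
      }
      graph.put(Integer.parseInt(tokens[0]), neighbors);
    }
    return graph;
  }

  // Repeats "supersteps" in which every vertex takes the smallest label among
  // itself and its listed neighbors, until no label changes.
  // Assumes undirected input (each edge listed from both endpoints).
  static Map<Integer, Integer> components(Map<Integer, List<Integer>> graph) {
    Map<Integer, Integer> label = new TreeMap<>();
    for (Integer v : graph.keySet()) {
      label.put(v, v); // each vertex starts in its own component
    }
    boolean changed = true;
    while (changed) {
      changed = false;
      Map<Integer, Integer> next = new TreeMap<>(label);
      for (Map.Entry<Integer, List<Integer>> e : graph.entrySet()) {
        int best = label.get(e.getKey());
        for (Integer n : e.getValue()) {
          Integer neighborLabel = label.get(n); // null if n has no input line
          if (neighborLabel != null && neighborLabel < best) {
            best = neighborLabel;
          }
        }
        if (best < next.get(e.getKey())) {
          next.put(e.getKey(), best);
          changed = true;
        }
      }
      label = next;
    }
    return label;
  }

  public static void main(String[] args) {
    // The graph printed by `hadoop dfs -cat input/*` in the message above.
    List<String> lines = Arrays.asList("1 2", "2 1 3 4", "3 2", "4 2");
    System.out.println(components(parse(lines))); // {1=1, 2=1, 3=1, 4=1}
  }
}
```

Since vertices 2, 3 and 4 are all connected to 1 through vertex 2, every vertex should end up with label 1; a Giraph run on the same input would be expected to agree.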
