MIME-Version: 1.0
Date: Mon, 31 Mar 2014 22:01:41 +0100
Subject: Re: ConnectedComponents example
From: ghufran malik
To: Young Han, user@giraph.apache.org
Content-Type: multipart/alternative; boundary=f46d04428a6cb246c504f5ed5bb9

--f46d04428a6cb246c504f5ed5bb9
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
the output your code produced is:

--3--
--4--
----
----
----
--5--
----
----
----
--6--
----
----
----
--7--

It's because of the space between the \t and the closing ] in [\t ]. With that space in the character class, the pattern also splits on every single space. Whereas if you just have [\t], it splits on tabs only.
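To make the difference concrete, here is a minimal sketch that splits the same line with both character classes. The class and method names are made up for illustration, and the input string is only an assumption modelled on the toy program quoted below.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Giraph.
class SeparatorDemo {
  // Split a line with the given regex and return the resulting tokens.
  static String[] tokens(String regex, String line) {
    return Pattern.compile(regex).split(line);
  }

  public static void main(String[] args) {
    // One space after the 3, then a run of four spaces after the 4.
    String line = "3 4    5";

    // Tab or space: every single space is its own separator,
    // so the run of four spaces yields three empty tokens.
    System.out.println(Arrays.toString(tokens("[\t ]", line)));

    // Tab only: the line contains no tab, so it comes back as one token.
    System.out.println(Arrays.toString(tokens("[\t]", line)));
  }
}
```

The empty tokens in the first result correspond to the `----` lines in the output above.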

Thanks for clearing that up for me!

Ghufran


On Mon, Mar 31, 2014 at 9:50 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
Hey,

Yes, when originally debugging the code I thought to check what \t actually splits by, and created my own test class:

import java.util.regex.Pattern;

class App
{
    // Matches a single tab or a single space.
    private static final Pattern SEPARATOR = Pattern.compile("[\t ]");

    public static void main( String[] args )
    {
        String line = "1 0 2";
        String[] tokens = SEPARATOR.split(line);

        // Print the pattern, the token count, then each token on its own line.
        System.out.println(SEPARATOR);
        System.out.println(tokens.length);

        for (String token : tokens) {
            System.out.println(token);
        }
    }
}

and the pattern worked as I thought it should, splitting by tab spaces.

I'll try your test as well to double check.


On Mon, Mar 31, 2014 at 9:34 PM, Young Han <young.han@uwaterloo.ca> wrote:
Weird, inputs with tabs work for me right out of the box. Either the "\t" is not the cause or it's some Java-version specific issue. Try this toy program:


import java.util.regex.Pattern;

public class Test {
  public static void main(String[] args) {
    // "[\t ]" matches a single tab or a single space.
    Pattern SEPARATOR = Pattern.compile("[\t ]");
    String[] tokens = SEPARATOR.split("3 4    5    6    7");

    for (int i = 0; i < tokens.length; i++) {
      System.out.println("--" + tokens[i] + "--");
    }
  }
}


Does it split the tabs properly for your Java?

Young


On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
Yep, you're right, it is a bug with all the InputFormats, I believe. I just checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat and the example ConnectedComponents class, and it worked like a charm with just the normal spacing.


On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
Huh, it might be a bug in the code. Could it be that Pattern.compile has to take "[\\t ]" (note the double backslash) to properly match tabs? If so, that bug is in all the input formats...
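As a quick way to test the double-backslash question, a small sketch along these lines compares the two spellings. The class name, method name, and sample line are made up for illustration. In Java, "\t" in a string literal is already a tab character by the time the regex is compiled, while "\\t" hands the regex engine the escape \t, which it also reads as a tab, so both spellings should behave the same.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical check; not taken from the Giraph sources.
class BackslashCheck {
  // Split a line with the given regex and return the resulting tokens.
  static String[] split(String regex, String line) {
    return Pattern.compile(regex).split(line);
  }

  public static void main(String[] args) {
    // A real tab between 1 and 2, a space between 2 and 3.
    String line = "1\t2 3";

    // Single backslash: the compiler inserts a literal tab into the class.
    String[] single = split("[\t ]", line);
    // Double backslash: the regex engine sees \t and interprets it as a tab.
    String[] doubled = split("[\\t ]", line);

    System.out.println(Arrays.toString(single));
    System.out.println(Arrays.toString(doubled));
    System.out.println(Arrays.equals(single, doubled));
  }
}
```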

Happy to help :)

Young


On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
Hi,

I removed the spaces and it worked! I don't understand, though. I'm sure the separator pattern means that it splits by tab spaces?

Thanks for all your help though, somewhat relieved now!

Kind regards,

Ghufran

On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young.han@uwaterloo.ca> wrote:
Hi,

That looks like an error with the algorithm... What do the Hadoop userlogs say?

And just to rule out weirdness, what happens if you use spaces instead of tabs (for your input graph)?

Young


On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <ghufran1malik@gmail.com> wrote:
Hey,

No, even after I added the .txt it gets to map 100%, then drops back down to 50% and gives me the error:

14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/03/31 18:22:57 INFO mapred.JobClient: Running job: job_201403311622_0004
14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
14/03/31 18:33:30 INFO mapred.JobClient: Job complete: job_201403311622_0004
14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1238858
14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1


I did a check to make sure the graph was being stored correctly by doing:

ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat input/*
1 2
2 1 3 4
3 2
4 2
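Since the replies above report that removing stray spaces from the input made the job succeed, a more forgiving line parser may be worth sketching. This is an illustration only, not the code Giraph's input formats use. It assumes each line is a vertex id followed by neighbour ids, and that a run of tabs or spaces should count as one separator. The class name, method name, and the "[\t ]+" pattern are all assumptions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not Giraph's input format code.
class AdjacencyLine {
  // One or more tabs/spaces are treated as a single separator.
  private static final Pattern SEPARATOR = Pattern.compile("[\t ]+");

  // Parse "id neighbour neighbour ..." into ints, ignoring
  // leading and trailing whitespace on the line.
  static int[] parse(String line) {
    String[] tokens = SEPARATOR.split(line.trim());
    int[] ids = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      // Throws NumberFormatException if a token is not an integer.
      ids[i] = Integer.parseInt(tokens[i]);
    }
    return ids;
  }

  public static void main(String[] args) {
    // Trailing space and doubled separators are both tolerated.
    System.out.println(Arrays.toString(parse("2 1 3 4 ")));
    System.out.println(Arrays.toString(parse("4  2   ")));
  }
}
```

A completely blank line would still fail in this sketch, because the single empty token cannot be parsed as an integer; a caller would need to skip such lines first.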







--f46d04428a6cb246c504f5ed5bb9--