hama-dev mailing list archives

From Suraj Menon <surajsme...@apache.org>
Subject Re: Error with fastgen input
Date Tue, 05 Mar 2013 23:06:13 GMT
No, the partitions we write locally need not be sorted. Sorry for the
confusion. Superstep injection is possible with the Superstep API. A few
enhancements are still needed to make it simpler than it was when I last
worked on it. We can then look into executing the partitioning superstep
before the setup of the first superstep of the submitted job. I think it is
feasible.

On Tue, Mar 5, 2013 at 5:48 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:

> > spilling queue and sorted spilling queue, can we inject the partitioning
> > superstep as the first superstep and use local memory?
>
> Actually, I wanted to add something before the call to BSP.setup()
> to avoid executing an additional BSP job. But, in my opinion, the
> current approach is enough. I think we need to collect more experience
> with input partitioning in large environments. I'll do that.
>
> BTW, I still don't know why it needs to be sorted. MR-like?
>
> On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <surajsmenon@apache.org>
> wrote:
> > Sorry, I am increasing the scope here to outside the graph module. When we
> > have a spilling queue and a sorted spilling queue, can we inject the
> > partitioning superstep as the first superstep and use local memory?
> > Today we have a partitioning job within a job and are creating two copies
> > of the data on HDFS. This could be really costly. Is it possible to create
> > or redistribute the partitions in local memory and initialize the record
> > reader there?
> > The user can run a separate job, given in the examples area, to explicitly
> > repartition the data on HDFS. The deployment question is how much disk
> > space gets allocated for local memory usage. Would it be a safe approach
> > with the limitations?
> >
> > -Suraj
> >
> > On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> > <thomas.jungblut@gmail.com> wrote:
> >
> >> Yes. Once Suraj has added merging of sorted files, we can add this to the
> >> partitioner pretty easily.
> >>
> >> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> >>
> >> > Eh... btw, does the re-partitioned data really need to be sorted?
> >> >
> >> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> >> > <thomas.jungblut@gmail.com> wrote:
> >> > > Now I get how the partitioning works. Obviously, if you merge n sorted
> >> > > files by just appending them to each other, this will result in totally
> >> > > unsorted data ;-)
> >> > > Why didn't you solve this via messaging?
> >> > >
> >> > > 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> > >
> >> > >> Seems that they are not correctly sorted:
> >> > >>
> >> > >> vertexID: 50
> >> > >> vertexID: 52
> >> > >> vertexID: 54
> >> > >> vertexID: 56
> >> > >> vertexID: 58
> >> > >> vertexID: 61
> >> > >> ...
> >> > >> vertexID: 78
> >> > >> vertexID: 81
> >> > >> vertexID: 83
> >> > >> vertexID: 85
> >> > >> ...
> >> > >> vertexID: 94
> >> > >> vertexID: 96
> >> > >> vertexID: 98
> >> > >> vertexID: 1
> >> > >> vertexID: 10
> >> > >> vertexID: 12
> >> > >> vertexID: 14
> >> > >> vertexID: 16
> >> > >> vertexID: 18
> >> > >> vertexID: 21
> >> > >> vertexID: 23
> >> > >> vertexID: 25
> >> > >> vertexID: 27
> >> > >> vertexID: 29
> >> > >> vertexID: 3
> >> > >>
> >> > >> So this won't work correctly then...
> >> > >>
> >> > >>
> >> > >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> > >>
> >> > >>> sure, have fun on your holidays.
> >> > >>>
> >> > >>>
> >> > >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> >> > >>>
> >> > >>>> Sure, but if you can fix it quickly, please do. March 1 is a holiday[1],
> >> > >>>> so I'll appear next week.
> >> > >>>>
> >> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> >> > >>>>
> >> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
> >> > >>>> <thomas.jungblut@gmail.com> wrote:
> >> > >>>> > Maybe 50 is missing from the file; I didn't check whether all items
> >> > >>>> > were added.
> >> > >>>> > As far as I remember, I copy/pasted the logic of the ID into the
> >> > >>>> > fastgen. Want to have a look into it?
> >> > >>>> >
> >> > >>>> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> >> > >>>> >
> >> > >>>> >> I guess it's a bug in fastgen when it generates the adjacency matrix
> >> > >>>> >> into multiple files.
> >> > >>>> >>
> >> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> >> > >>>> >> <thomas.jungblut@gmail.com> wrote:
> >> > >>>> >> > You have two files, are they partitioned correctly?
> >> > >>>> >> >
> >> > >>>> >> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> >> > >>>> >> >
> >> > >>>> >> >> It looks like a bug.
> >> > >>>> >> >>
> >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
> >> > >>>> >> >> total 44
> >> > >>>> >> >> drwxrwxr-x  3 edward edward  4096 Feb 28 18:03 .
> >> > >>>> >> >> drwxrwxrwt 19 root   root   20480 Feb 28 18:04 ..
> >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243 Feb 28 18:01 part-00000
> >> > >>>> >> >> -rw-rw-r--  1 edward edward    28 Feb 28 18:01 .part-00000.crc
> >> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251 Feb 28 18:01 part-00001
> >> > >>>> >> >> -rw-rw-r--  1 edward edward    28 Feb 28 18:01 .part-00001.crc
> >> > >>>> >> >> drwxrwxr-x  2 edward edward  4096 Feb 28 18:03 partitions
> >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
> >> > >>>> >> >> total 24
> >> > >>>> >> >> drwxrwxr-x 2 edward edward 4096 Feb 28 18:03 .
> >> > >>>> >> >> drwxrwxr-x 3 edward edward 4096 Feb 28 18:03 ..
> >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932 Feb 28 18:03 part-00000
> >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32 Feb 28 18:03 .part-00000.crc
> >> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955 Feb 28 18:03 part-00001
> >> > >>>> >> >> -rw-rw-r-- 1 edward edward   32 Feb 28 18:03 .part-00001.crc
> >> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> >> > >>>> >> >>
> >> > >>>> >> >>
> >> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
> >> > >>>> >> >> > yes i'll check again
> >> > >>>> >> >> >
> >> > >>>> >> >> > Sent from my iPhone
> >> > >>>> >> >> >
> >> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut
> >> > >>>> >> >> > <thomas.jungblut@gmail.com> wrote:
> >> > >>>> >> >> >
> >> > >>>> >> >> >> Can you verify an observation for me please?
> >> > >>>> >> >> >>
> >> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and part-00001, both
> >> > >>>> >> >> >> ~2.2kb sized.
> >> > >>>> >> >> >> In the below partition directory, there is only a single 5.56kb file.
> >> > >>>> >> >> >>
> >> > >>>> >> >> >> Is it intended for the partitioner to write a single file if you
> >> > >>>> >> >> >> configured two?
> >> > >>>> >> >> >> It even reads it as two files, strange huh?
> >> > >>>> >> >> >>
> >> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> > >>>> >> >> >>
> >> > >>>> >> >> >>> Will have a look into it.
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
> >> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>> did work for me the last time I profiled, maybe the partitioning
> >> > >>>> >> >> >>> doesn't partition correctly with the input or something else.
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>>> Fastgen input does not seem to work for graph examples.
> >> > >>>> >> >> >>>>
> >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 /tmp/randomgraph 2
> >> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: SUPERSTEPS=0
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: LAUNCHED_TASKS=2
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: TASK_OUTPUT_RECORDS=100
> >> > >>>> >> >> >>>> Job Finished in 3.212 seconds
> >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT
> >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> >> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> >> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank /tmp/randomgraph /tmp/pageour
> >> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> >> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: SUPERSTEPS=1
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: LAUNCHED_TASKS=2
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: SUPERSTEP_SUM=4
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: IO_BYTES_READ=4332
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=14
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=100
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
> >> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
> >> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
> >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> >> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> >> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
> >> > >>>> >> >> >>>>
> >> > >>>> >> >> >>>>
> >> > >>>> >> >> >>>> --
> >> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
> >> > >>>> >> >> >>>> @eddieyoon
> >> > >>>> >> >> >>>
> >> > >>>> >> >> >>>
> >> > >>>> >> >>
> >> > >>>> >> >>
> >> > >>>> >> >>
> >> > >>>> >> >> --
> >> > >>>> >> >> Best Regards, Edward J. Yoon
> >> > >>>> >> >> @eddieyoon
> >> > >>>> >> >>
> >> > >>>> >>
> >> > >>>> >>
> >> > >>>> >>
> >> > >>>> >> --
> >> > >>>> >> Best Regards, Edward J. Yoon
> >> > >>>> >> @eddieyoon
> >> > >>>> >>
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> --
> >> > >>>> Best Regards, Edward J. Yoon
> >> > >>>> @eddieyoon
> >> > >>>>
> >> > >>>
> >> > >>>
> >> > >>
> >> >
> >> >
> >> >
> >> > --
> >> > Best Regards, Edward J. Yoon
> >> > @eddieyoon
> >> >
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
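
The ordering problem described in the quoted thread (appending n sorted files
yields unsorted data, whereas a real merge keeps the order) can be sketched in
a few lines of Java. This is only an illustration under stated assumptions: the
class SortedRunMerge and its methods are hypothetical names and are not part of
Hama, and the vertex IDs are compared as plain strings, which is an assumption
that happens to match the order seen in the listing (1, 10, 12, ... 29, 3).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names exist in Hama.
class SortedRunMerge {

  // Cursor over one sorted run: the run plus the index of its next unread ID.
  private static final class Cursor {
    final List<String> run;
    int next;

    Cursor(List<String> run) {
      this.run = run;
    }

    String head() {
      return run.get(next);
    }
  }

  // Appends the runs one after another, as the thread describes.
  // Each run stays sorted internally, but the whole result usually is not.
  static List<String> concatenate(List<List<String>> runs) {
    List<String> out = new ArrayList<>();
    for (List<String> run : runs) {
      out.addAll(run);
    }
    return out;
  }

  // K-way merge: repeatedly emits the smallest head among all runs.
  // Assumes every input run is already sorted by String order.
  static List<String> merge(List<List<String>> runs) {
    Comparator<Cursor> byHead = Comparator.comparing(Cursor::head);
    PriorityQueue<Cursor> heap = new PriorityQueue<>(byHead);
    for (List<String> run : runs) {
      if (!run.isEmpty()) {
        heap.add(new Cursor(run));
      }
    }
    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      out.add(c.head());
      c.next++;
      if (c.next < c.run.size()) {
        heap.add(c); // re-insert so the heap sees this run's new head
      }
    }
    return out;
  }

  // True if the IDs are in non-decreasing String order.
  static boolean isSorted(List<String> ids) {
    for (int i = 1; i < ids.size(); i++) {
      if (ids.get(i - 1).compareTo(ids.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Two small runs shaped like the vertexID listing in the thread.
    List<List<String>> runs = Arrays.asList(
        Arrays.asList("50", "52", "54"),
        Arrays.asList("1", "10", "12", "3"));
    System.out.println(concatenate(runs) + " sorted=" + isSorted(concatenate(runs)));
    System.out.println(merge(runs) + " sorted=" + isSorted(merge(runs)));
  }
}
```

With the two sample runs, concatenation puts "1" after "54", the same kind of
jump as "98" followed by "1" in the listing, while the merge produces one
globally ordered sequence. Whether the partitioner should merge at all is the
open question in the thread, since the top reply says the locally written
partitions need not be sorted.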
