hama-dev mailing list archives

From Suraj Menon <surajsme...@apache.org>
Subject Re: Error with fastgen input
Date Thu, 28 Feb 2013 14:20:32 GMT
Sorry, I am widening the scope here beyond the graph module. When we have
the spilling queue and the sorted spilling queue, can we inject the
partitioning superstep as the first superstep and use local memory?
Today we run a partitioning job within a job and create two copies of the
data on HDFS. This could be really costly. Is it possible to create or
redistribute the partitions in local memory and initialize the record
reader there?
The user can run a separate job, given in the examples area, to explicitly
repartition the data on HDFS. The deployment question is how much disk
space gets allocated for local memory usage. Would it be a safe approach
given those limitations?

-Suraj
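
The proposal above (partition in a first superstep, in local memory, instead of writing a second copy to HDFS) can be pictured with a small simulation. This is only a sketch under stated assumptions: it uses plain Java collections, not the Hama API, and the names `PartitionSuperstepSketch`, `ownerOf` and `partition` are hypothetical. Each peer scans its raw split, routes every record to the peer that owns it, and after the barrier each peer sorts what it received.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical simulation; none of these names come from the Hama API. */
public class PartitionSuperstepSketch {

    /** Picks the owning peer for a vertex ID with a plain hash partitioner. */
    static int ownerOf(String vertexId, int numPeers) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(vertexId.hashCode(), numPeers);
    }

    /**
     * Simulates one partitioning superstep. Every peer scans its raw input
     * split and "sends" each record to its owner. After the simulated
     * barrier, each peer sorts the records it received, all in memory.
     */
    static List<List<String>> partition(List<List<String>> rawSplits) {
        int numPeers = rawSplits.size();
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ArrayList<>());
        }
        // "Messaging" phase: route each record to the inbox of its owner.
        for (List<String> split : rawSplits) {
            for (String vertexId : split) {
                inboxes.get(ownerOf(vertexId, numPeers)).add(vertexId);
            }
        }
        // After the barrier: each peer sorts its own partition locally.
        for (List<String> inbox : inboxes) {
            Collections.sort(inbox);
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("50", "3", "98"),
            Arrays.asList("1", "10", "61"));
        System.out.println(partition(splits));
    }
}
```

In this picture nothing is written back to HDFS; the open question from the message above, how much local memory or disk the inboxes may use, is exactly what the sketch leaves out.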

On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
<thomas.jungblut@gmail.com> wrote:

> Yes. Once Suraj has added merging of sorted files, we can add this to the
> partitioner pretty easily.
>
> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>
> > Eh,..... btw, does re-partitioned data really need to be sorted?
> >
> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
> > <thomas.jungblut@gmail.com> wrote:
> > > Now I get how the partitioning works. Obviously, if you merge n sorted
> > > files by just appending them to each other, this will result in totally
> > > unsorted data ;-)
> > > Why didn't you solve this via messaging?
> > >
> > > 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> > >
> > >> Seems that they are not correctly sorted:
> > >>
> > >> vertexID: 50
> > >> vertexID: 52
> > >> vertexID: 54
> > >> vertexID: 56
> > >> vertexID: 58
> > >> vertexID: 61
> > >> ...
> > >> vertexID: 78
> > >> vertexID: 81
> > >> vertexID: 83
> > >> vertexID: 85
> > >> ...
> > >> vertexID: 94
> > >> vertexID: 96
> > >> vertexID: 98
> > >> vertexID: 1
> > >> vertexID: 10
> > >> vertexID: 12
> > >> vertexID: 14
> > >> vertexID: 16
> > >> vertexID: 18
> > >> vertexID: 21
> > >> vertexID: 23
> > >> vertexID: 25
> > >> vertexID: 27
> > >> vertexID: 29
> > >> vertexID: 3
> > >>
> > >> So this won't work correctly then...
> > >>
> > >>
> > >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> > >>
> > >>> sure, have fun on your holidays.
> > >>>
> > >>>
> > >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> > >>>
> > >>>> Sure, but if you can fix it quickly, please do. March 1 is a holiday[1],
> > >>>> so I'll be back next week.
> > >>>>
> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
> > >>>>
> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
> > >>>> <thomas.jungblut@gmail.com> wrote:
> > >>>> > Maybe 50 is missing from the file; I didn't check whether all items were
> > >>>> > added.
> > >>>> > As far as I remember, I copy/pasted the logic of the ID into the fastgen,
> > >>>> > want to have a look into it?
> > >>>> >
> > >>>> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> > >>>> >
> > >>>> >> I guess it's a bug in fastgen when it generates the adjacency matrix
> > >>>> >> into multiple files.
> > >>>> >>
> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
> > >>>> >> <thomas.jungblut@gmail.com> wrote:
> > >>>> >> > You have two files, are they partitioned correctly?
> > >>>> >> >
> > >>>> >> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> > >>>> >> >
> > >>>> >> >> It looks like a bug.
> > >>>> >> >>
> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
> > >>>> >> >> total 44
> > >>>> >> >> drwxrwxr-x  3 edward edward  4096 Feb 28 18:03 .
> > >>>> >> >> drwxrwxrwt 19 root   root   20480 Feb 28 18:04 ..
> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243 Feb 28 18:01 part-00000
> > >>>> >> >> -rw-rw-r--  1 edward edward    28 Feb 28 18:01 .part-00000.crc
> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251 Feb 28 18:01 part-00001
> > >>>> >> >> -rw-rw-r--  1 edward edward    28 Feb 28 18:01 .part-00001.crc
> > >>>> >> >> drwxrwxr-x  2 edward edward  4096 Feb 28 18:03 partitions
> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
> > >>>> >> >> total 24
> > >>>> >> >> drwxrwxr-x 2 edward edward 4096 Feb 28 18:03 .
> > >>>> >> >> drwxrwxr-x 3 edward edward 4096 Feb 28 18:03 ..
> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932 Feb 28 18:03 part-00000
> > >>>> >> >> -rw-rw-r-- 1 edward edward   32 Feb 28 18:03 .part-00000.crc
> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955 Feb 28 18:03 part-00001
> > >>>> >> >> -rw-rw-r-- 1 edward edward   32 Feb 28 18:03 .part-00001.crc
> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
> > >>>> >> >>
> > >>>> >> >>
> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
> > >>>> >> >> > yes i'll check again
> > >>>> >> >> >
> > >>>> >> >> > Sent from my iPhone
> > >>>> >> >> >
> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
> > >>>> >> >> >
> > >>>> >> >> >> Can you verify an observation for me please?
> > >>>> >> >> >>
> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and part-00001, both
> > >>>> >> >> >> ~2.2kb sized.
> > >>>> >> >> >> In the below partition directory, there is only a single 5.56kb file.
> > >>>> >> >> >>
> > >>>> >> >> >> Is it intended for the partitioner to write a single file if you
> > >>>> >> >> >> configured two?
> > >>>> >> >> >> It even reads it as two files, strange huh?
> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
> > >>>> >> >> >>
> > >>>> >> >> >>> Will have a look into it.
> > >>>> >> >> >>>
> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
> > >>>> >> >> >>>
> > >>>> >> >> >>> did work for me the last time I profiled, maybe the partitioning doesn't
> > >>>> >> >> >>> partition correctly with the input or something else.
> > >>>> >> >> >>>
> > >>>> >> >> >>>
> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
> > >>>> >> >> >>>
> > >>>> >> >> >>> Fastgen input does not seem to work for the graph examples.
> > >>>> >> >> >>>>
> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 /tmp/randomgraph 2
> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
> > >>>> >> >> >>>> Job Finished in 3.212 seconds
> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT
> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank /tmp/randomgraph /tmp/pageour
> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
> > >>>> >> >> >>>>
> > >>>> >> >> >>>>
> > >>>> >> >> >>>> --
> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
> > >>>> >> >> >>>> @eddieyoon
> > >>>> >> >> >>>
> > >>>> >> >> >>>
> > >>>> >> >>
> > >>>> >> >>
> > >>>> >> >>
> > >>>> >> >> --
> > >>>> >> >> Best Regards, Edward J. Yoon
> > >>>> >> >> @eddieyoon
> > >>>> >> >>
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >> --
> > >>>> >> Best Regards, Edward J. Yoon
> > >>>> >> @eddieyoon
> > >>>> >>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best Regards, Edward J. Yoon
> > >>>> @eddieyoon
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
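
The sorting problem quoted in this thread (appending n sorted files yields unsorted data) can be reproduced in a few lines. The following is a minimal sketch, not Hama's partitioner code: the class and method names are hypothetical, and it assumes vertex IDs are compared as text, which is what the quoted listing (1, 10, 12, ..., 29, 3) suggests. It contrasts plain appending with a k-way merge that always emits the smallest head among the sorted runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration; not taken from the Hama code base. */
public class SortedMergeSketch {

    /** Naive "merge": append the sorted runs one after another. */
    static List<String> append(List<List<String>> runs) {
        List<String> out = new ArrayList<>();
        for (List<String> run : runs) {
            out.addAll(run);
        }
        return out;
    }

    /** K-way merge: repeatedly emit the smallest head among all runs. */
    static List<String> merge(List<List<String>> runs) {
        // Each heap entry is {run index, position inside that run},
        // ordered by the ID it points to.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heads.add(new int[] {r, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            out.add(runs.get(head[0]).get(head[1]));
            // Advance within the run the emitted element came from.
            if (head[1] + 1 < runs.get(head[0]).size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }

    /** Checks non-decreasing order under String comparison. */
    static boolean isSorted(List<String> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i - 1).compareTo(ids.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two runs, each sorted as text, shaped like the quoted listing.
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("50", "52", "54", "98"),
            Arrays.asList("1", "10", "12", "29", "3"));
        System.out.println(isSorted(append(runs))); // false
        System.out.println(isSorted(merge(runs)));  // true
        System.out.println(merge(runs));
    }
}
```

With the two runs above, appending puts "98" directly before "1", which is the kind of ordering that triggers the "Messages must never be behind the vertex in ID!" check, while the merge yields 1, 10, 12, 29, 3, 50, 52, 54, 98.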
