hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Error with fastgen input
Date Tue, 05 Mar 2013 10:48:25 GMT
> spilling queue and sorted spilling queue, can we inject the partitioning
> superstep as the first superstep and use local memory?

Actually, I wanted to add something before the BSP.setup() method is
called, to avoid executing an additional BSP job. But in my opinion, the
current approach is enough. I think we need to collect more experience with
input partitioning in large environments. I'll do that.

BTW, I still don't know why it needs to be sorted. MR-like?
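
For context on the sorting question: the exception quoted further down
("Messages must never be behind the vertex in ID!") suggests that the graph
runner expects vertices and incoming messages in the same ID order, and
Thomas points out that appending n sorted files to each other does not give
sorted data. The following is only a minimal sketch of that difference, not
Hama code: the class SortedRunMerge and its methods are hypothetical names,
and plain String comparison is an assumption, chosen because the quoted
vertex listing (1, 10, 12, ..., 29, 3) looks text-sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Hama.
final class SortedRunMerge {

  /** Appends the runs one after another, as a naive "merge" would. */
  static List<String> concatenate(List<List<String>> runs) {
    List<String> out = new ArrayList<>();
    for (List<String> run : runs) {
      out.addAll(run);
    }
    return out;
  }

  /** One cursor per run: the current head element plus the rest of the run. */
  private static final class Cursor implements Comparable<Cursor> {
    String head;
    final Iterator<String> rest;

    Cursor(Iterator<String> it) {
      this.rest = it;
      this.head = it.next();
    }

    @Override
    public int compareTo(Cursor other) {
      return head.compareTo(other.head);
    }
  }

  /** K-way merge: repeatedly emits the smallest head among all runs. */
  static List<String> kWayMerge(List<List<String>> runs) {
    PriorityQueue<Cursor> heap = new PriorityQueue<>();
    for (List<String> run : runs) {
      if (!run.isEmpty()) {
        heap.add(new Cursor(run.iterator()));
      }
    }
    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      out.add(c.head);
      if (c.rest.hasNext()) {
        c.head = c.rest.next();
        heap.add(c); // re-insert the cursor with its new head
      }
    }
    return out;
  }

  /** True if every element is >= its predecessor in String order. */
  static boolean isSorted(List<String> ids) {
    for (int i = 1; i < ids.size(); i++) {
      if (ids.get(i - 1).compareTo(ids.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Two runs, each sorted as text, loosely modeled on the listing in this thread.
    List<List<String>> runs = Arrays.asList(
        Arrays.asList("50", "52", "54", "98"),
        Arrays.asList("1", "10", "12", "3"));
    System.out.println(concatenate(runs) + " sorted? " + isSorted(concatenate(runs)));
    System.out.println(kWayMerge(runs) + " sorted? " + isSorted(kWayMerge(runs)));
  }
}
```

The merge keeps only one cursor per run in the heap, so it never needs to
hold more than one element per file in memory at a time.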

On Thu, Feb 28, 2013 at 11:20 PM, Suraj Menon <surajsmenon@apache.org> wrote:
> Sorry, I am increasing the scope here to outside graph module. When we have
> spilling queue and sorted spilling queue, can we inject the partitioning
> superstep as the first superstep and use local memory?
> Today we have a partitioning job within a job and are creating two copies of
> the data on HDFS. This could be really costly. Is it possible to create or
> redistribute the partitions in local memory and initialize the record
> reader there?
> The user can run a separate job, given in the examples area, to explicitly
> repartition the data on HDFS. The deployment question is how much disk
> space gets allocated for local memory usage. Would it be a safe approach
> with the limitations?
>
> -Suraj
>
> On Thu, Feb 28, 2013 at 7:05 AM, Thomas Jungblut
> <thomas.jungblut@gmail.com> wrote:
>
>> Yes. Once Suraj has added merging of sorted files, we can add this to the
>> partitioner pretty easily.
>>
>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>>
>> > Eh... btw, does re-partitioned data really need to be sorted?
>> >
>> > On Thu, Feb 28, 2013 at 7:48 PM, Thomas Jungblut
>> > <thomas.jungblut@gmail.com> wrote:
>> > > Now I get how the partitioning works. Obviously, if you merge n sorted
>> > > files by just appending them to each other, this will result in totally
>> > > unsorted data ;-)
>> > > Why didn't you solve this via messaging?
>> > >
>> > > 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
>> > >
>> > >> Seems that they are not correctly sorted:
>> > >>
>> > >> vertexID: 50
>> > >> vertexID: 52
>> > >> vertexID: 54
>> > >> vertexID: 56
>> > >> vertexID: 58
>> > >> vertexID: 61
>> > >> ...
>> > >> vertexID: 78
>> > >> vertexID: 81
>> > >> vertexID: 83
>> > >> vertexID: 85
>> > >> ...
>> > >> vertexID: 94
>> > >> vertexID: 96
>> > >> vertexID: 98
>> > >> vertexID: 1
>> > >> vertexID: 10
>> > >> vertexID: 12
>> > >> vertexID: 14
>> > >> vertexID: 16
>> > >> vertexID: 18
>> > >> vertexID: 21
>> > >> vertexID: 23
>> > >> vertexID: 25
>> > >> vertexID: 27
>> > >> vertexID: 29
>> > >> vertexID: 3
>> > >>
>> > >> So this won't work correctly then...
>> > >>
>> > >>
>> > >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
>> > >>
>> > >>> sure, have fun on your holidays.
>> > >>>
>> > >>>
>> > >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>> > >>>
>> > >>>> Sure, but if you can fix it quickly, please do. March 1 is a holiday[1],
>> > >>>> so I'll be back next week.
>> > >>>>
>> > >>>> 1. http://en.wikipedia.org/wiki/Public_holidays_in_South_Korea
>> > >>>>
>> > >>>> On Thu, Feb 28, 2013 at 6:36 PM, Thomas Jungblut
>> > >>>> <thomas.jungblut@gmail.com> wrote:
>> > >>>> > Maybe 50 is missing from the file; I didn't check whether all items
>> > >>>> > were added.
>> > >>>> > As far as I remember, I copy/pasted the logic of the ID into the
>> > >>>> > fastgen. Want to have a look into it?
>> > >>>> >
>> > >>>> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>> > >>>> >
>> > >>>> >> I guess it's a bug in fastgen when it generates the adjacency matrix
>> > >>>> >> into multiple files.
>> > >>>> >>
>> > >>>> >> On Thu, Feb 28, 2013 at 6:29 PM, Thomas Jungblut
>> > >>>> >> <thomas.jungblut@gmail.com> wrote:
>> > >>>> >> > You have two files, are they partitioned correctly?
>> > >>>> >> >
>> > >>>> >> > 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>> > >>>> >> >
>> > >>>> >> >> It looks like a bug.
>> > >>>> >> >>
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/
>> > >>>> >> >> total 44
>> > >>>> >> >> drwxrwxr-x  3 edward edward  4096  2월 28 18:03 .
>> > >>>> >> >> drwxrwxrwt 19 root   root   20480  2월 28 18:04 ..
>> > >>>> >> >> -rwxrwxrwx  1 edward edward  2243  2월 28 18:01 part-00000
>> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00000.crc
>> > >>>> >> >> -rwxrwxrwx  1 edward edward  2251  2월 28 18:01 part-00001
>> > >>>> >> >> -rw-rw-r--  1 edward edward    28  2월 28 18:01 .part-00001.crc
>> > >>>> >> >> drwxrwxr-x  2 edward edward  4096  2월 28 18:03 partitions
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$ ls -al /tmp/randomgraph/partitions/
>> > >>>> >> >> total 24
>> > >>>> >> >> drwxrwxr-x 2 edward edward 4096  2월 28 18:03 .
>> > >>>> >> >> drwxrwxr-x 3 edward edward 4096  2월 28 18:03 ..
>> > >>>> >> >> -rwxrwxrwx 1 edward edward 2932  2월 28 18:03 part-00000
>> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00000.crc
>> > >>>> >> >> -rwxrwxrwx 1 edward edward 2955  2월 28 18:03 part-00001
>> > >>>> >> >> -rw-rw-r-- 1 edward edward   32  2월 28 18:03 .part-00001.crc
>> > >>>> >> >> edward@udanax:~/workspace/hama-trunk$
>> > >>>> >> >>
>> > >>>> >> >>
>> > >>>> >> >> On Thu, Feb 28, 2013 at 5:27 PM, Edward <edward@udanax.org> wrote:
>> > >>>> >> >> > yes i'll check again
>> > >>>> >> >> >
>> > >>>> >> >> > Sent from my iPhone
>> > >>>> >> >> >
>> > >>>> >> >> > On Feb 28, 2013, at 5:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
>> > >>>> >> >> >
>> > >>>> >> >> >> Can you verify an observation for me please?
>> > >>>> >> >> >>
>> > >>>> >> >> >> 2 files are created from fastgen, part-00000 and part-00001, both
>> > >>>> >> >> >> ~2.2kb sized.
>> > >>>> >> >> >> In the below partition directory, there is only a single 5.56kb file.
>> > >>>> >> >> >>
>> > >>>> >> >> >> Is it intended for the partitioner to write a single file if you
>> > >>>> >> >> >> configured two?
>> > >>>> >> >> >> It even reads it as two files, strange huh?
>> > >>>> >> >> >>
>> > >>>> >> >> >> 2013/2/28 Thomas Jungblut <thomas.jungblut@gmail.com>
>> > >>>> >> >> >>
>> > >>>> >> >> >>> Will have a look into it.
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> gen fastgen 100 10 /tmp/randomgraph 1
>> > >>>> >> >> >>> pagerank /tmp/randomgraph /tmp/pageout
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> did work for me the last time I profiled, maybe the partitioning
>> > >>>> >> >> >>> doesn't partition correctly with the input or something else.
>> > >>>> >> >> >>>
>> > >>>> >> >> >>> 2013/2/28 Edward J. Yoon <edwardyoon@apache.org>
>> > >>>> >> >> >>>
>> > >>>> >> >> >>>> Fastgen input does not seem to work for graph examples.
>> > >>>> >> >> >>>>
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar gen fastgen 100 10 /tmp/randomgraph 2
>> > >>>> >> >> >>>> 13/02/28 10:32:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:03 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Current supersteps number: 0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: The total number of supersteps: 0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient: Counters: 3
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     SUPERSTEPS=0
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:06 INFO bsp.BSPJobClient:     TASK_OUTPUT_RECORDS=100
>> > >>>> >> >> >>>> Job Finished in 3.212 seconds
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT
>> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT-javadoc.jar
>> > >>>> >> >> >>>> hama-examples-0.7.0-SNAPSHOT.jar
>> > >>>> >> >> >>>> edward@edward-virtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.7.0-SNAPSHOT.jar pagerank /tmp/randomgraph /tmp/pageour
>> > >>>> >> >> >>>> 13/02/28 10:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:29 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:30 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Current supersteps number: 1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: The total number of supersteps: 1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Counters: 6
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEPS=1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=2
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=4
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     IO_BYTES_READ=4332
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=14
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=100
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.FileInputFormat: Total input paths to process : 2
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO bsp.LocalBSPRunner: Setting up a new barrier for 2 tasks!
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:1
>> > >>>> >> >> >>>> 13/02/28 10:32:33 INFO graph.GraphJobRunner: 50 vertices are loaded into local:0
>> > >>>> >> >> >>>> 13/02/28 10:32:33 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>> > >>>> >> >> >>>> java.lang.IllegalArgumentException: Messages must never be behind the vertex in ID! Current Message ID: 1 vs. 50
>> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:279)
>> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:225)
>> > >>>> >> >> >>>>        at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:129)
>> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>> > >>>> >> >> >>>>        at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >>>> >> >> >>>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >>>> >> >> >>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> > >>>> >> >> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > >>>> >> >> >>>>        at java.lang.Thread.run(Thread.java:722)
>> > >>>> >> >> >>>>
>> > >>>> >> >> >>>>
>> > >>>> >> >> >>>> --
>> > >>>> >> >> >>>> Best Regards, Edward J. Yoon
>> > >>>> >> >> >>>> @eddieyoon
>> > >>>> >> >> >>>
>> > >>>> >> >> >>>
>> > >>>> >> >>
>> > >>>> >> >>
>> > >>>> >> >>
>> > >>>> >> >> --
>> > >>>> >> >> Best Regards, Edward J. Yoon
>> > >>>> >> >> @eddieyoon
>> > >>>> >> >>
>> > >>>> >>
>> > >>>> >>
>> > >>>> >>
>> > >>>> >> --
>> > >>>> >> Best Regards, Edward J. Yoon
>> > >>>> >> @eddieyoon
>> > >>>> >>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> --
>> > >>>> Best Regards, Edward J. Yoon
>> > >>>> @eddieyoon
>> > >>>>
>> > >>>
>> > >>>
>> > >>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>> >
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
