It is a little unclear what you start with and where you want to end up.
Let us assume that you have a collection of triples of
person : place : time
and we might imagine this information stored one per line of text.
It simplifies the problem somewhat to assume that the number of places
visited by any one person is small enough to keep in memory.
Where you want to end up is something like this: for each place you have
Place : number of visitors  place1 : number  place2 : number ...
where place2 is a place someone visited after visiting place1, and the
number is the count of people who visited place2 after visiting place1.
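Assuming everything fits in memory, that end structure could look like this hypothetical Python dict (the place names and counts are invented for illustration):

```python
# Hypothetical target structure: each place maps to its visitor count
# plus a map of "place visited afterwards" -> number of people.
summary = {
    "park": {"visitors": 3, "after": {"museum": 2, "cafe": 1}},
    "cafe": {"visitors": 1, "after": {"museum": 1}},
}
# Sanity check: no later place can be reached by more people than
# visited the starting place itself.
for info in summary.values():
    assert all(n <= info["visitors"] for n in info["after"].values())
```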
Once again, it simplifies the problem if we assume that the number of places
(and the structure described above) can be kept in memory. It is not
necessary to assume that the number of visits or the number of people
cannot grow very large.
I would use two Hadoop jobs. The first passes each person : place : time
triple with the person as the key and lets the reducer construct a structure of
person : list of places in the order visited.
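That first job can be sketched as a local Python simulation of the map and shuffle steps (not actual Hadoop code; the record layout and all names here are assumptions):

```python
from collections import defaultdict

def visits_per_person(records):
    """Simulate job 1: map each (person, place, time) record under the
    person key, then let the 'reducer' sort each person's visits by time."""
    grouped = defaultdict(list)
    for person, place, time in records:      # mapper: key on person
        grouped[person].append((time, place))
    # reducer: order each person's places by visit time, drop the times
    return {person: [place for _, place in sorted(visits)]
            for person, visits in grouped.items()}

records = [("alice", "park", 1), ("alice", "museum", 2),
           ("bob", "park", 5), ("bob", "cafe", 7)]
print(visits_per_person(records))
# {'alice': ['park', 'museum'], 'bob': ['park', 'cafe']}
```

In real Hadoop the mapper would emit (person, (time, place)) pairs and the sort would happen in the reducer; a secondary sort on time would avoid buffering an unsorted list.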
The second job takes person : list of places and, for each place P in
the list, constructs
P : 1  place after P : 1  next place : 1 ...
making and emitting one of these for every place in the list.
Then you can write a simple routine to combine these structures: probably
write a combiner and follow the word-count model.
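The second job plus the word-count-style combining might look like this sketch (again a local simulation with invented data, not real Hadoop code):

```python
from collections import Counter

def pair_counts(person_to_places):
    """Simulate job 2: for each person's ordered visit list, emit a count
    of 1 for every (place, later place) pair, then sum the counts exactly
    as the word-count example sums word occurrences."""
    counts = Counter()
    for places in person_to_places.values():
        for i, place in enumerate(places):
            for later in places[i + 1:]:     # every place visited after
                counts[(place, later)] += 1
    return counts

routes = {"alice": ["park", "museum"],
          "bob":   ["park", "cafe", "museum"]}
counts = pair_counts(routes)
print(counts[("park", "museum")])   # 2: both alice and bob went there after park
```

In Hadoop the inner loops live in the mapper, the summing is the combiner/reducer, and the per-place structure is assembled from the summed (place, later place) counts.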
On Mon, Jul 18, 2011 at 10:03 AM, Em <mailformailinglists@yahoo.de> wrote:
> Hello list,
>
> as a newbie I have a tricky use case in mind which I want to implement
> with Hadoop to train my skills. There is no real scenario behind it,
> so I can extend or shrink the problem to the extent I like.
>
> I create random lists of person IDs and places, plus a time value.
>
> The result of my MapReduce operations should be something like this:
> the key is a place and the value is a list of places that were visited
> by persons after they visited the key place.
> Additionally, the value list should be sorted using some time/count-biased
> metric, so that it reflects which place was, for example, the most
> popular second station on a tour.
>
> I think this is a complex, almost real-world scenario.
>
> In pseudocode it will be something like this:
> for every place p
>     for every person m that visited p
>         select the list l of all places that m visited after p
>         write a key-value pair p => l to disk, where l is in visit order
>
> for every key k in the list of key-value pairs
>     get the value list of places v for k
>     create another key-value pair pv where the key is the place
>     and the value is its index in v (for each place p in v)
>
> for every k
>     get all pv
>     for every pv, aggregate the key-value pairs by key and sum up
>     the index i for every place p, so that it becomes the kv pair opv
>     sort opv in ascending order by its value
>
> The result would be what I wanted, no?
>
> It looks like I need multiple MR phases; however, I do not even know
> how to start.
>
> My first guess is: create an MR job where I invert my list so that I get
> a place as the key and, as the value, all persons that visited it.
> The next phase needs to iterate over the persons in the value and join
> with the original data to find out when each person visited this place
> and what places came next.
> And now the problems arise:
>  First: What happens with places that are so popular that the number of
> persons that visited them is too large to pass the whole KV pair to a
> single node to iterate over?
>  Second: I need to re-join the original data. Without a database this
> would be extremely slow, wouldn't it?
>
> I hope that you guys can give me some ideas and input to help me make
> my first serious steps in Hadoop land.
>
> Regards,
> Em
>

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
