I assumed the problem was to count the number of people visiting Moscow after
London, without considering any intermediate stops. This leads to a data
structure which is easy to combine. The structure you propose adds more
information and is difficult to combine. I doubt it could handle a billion
people, and I recommend trying it with a hundred people visiting 5 out of 20
destinations in random order to see how bad it gets.

My schema can handle billions of combinations, assuming only that the total
number of destinations in any node can be handled; i.e. a billion people can
visit any of a thousand cities in random order, and in the worst case a record
holds a thousand cities and a thousand counts. I doubt that the schema you
propose, with the added order information, will scale to those levels.
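To make the suggested experiment concrete, here is a minimal in-memory sketch in Python (not Hadoop code) of a hundred people each visiting 5 of 20 destinations in random order. It builds both the unordered "after" counts and a positional variant that prefixes each later city with its offset, as in `1_Moscow`, and reports the largest per-city record for each. The city names, the fixed seed and the reading of the prefix as "offset after the source city" are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def after_counts(trips):
    """Unordered scheme: for each city, count every city visited later."""
    table = defaultdict(Counter)
    for trip in trips:
        for i, src in enumerate(trip):
            for dest in trip[i + 1:]:
                table[src][dest] += 1
    return table

def positional_counts(trips):
    """Positional scheme (assumed reading): key later cities by offset, e.g. '1_Moscow'."""
    table = defaultdict(Counter)
    for trip in trips:
        for i, src in enumerate(trip):
            for offset, dest in enumerate(trip[i + 1:], start=1):
                table[src][f"{offset}_{dest}"] += 1
    return table

rng = random.Random(42)  # fixed seed so the run is repeatable
cities = [f"City{n:02d}" for n in range(20)]  # hypothetical city names
trips = [rng.sample(cities, 5) for _ in range(100)]  # 5 distinct stops each

plain = after_counts(trips)
ordered = positional_counts(trips)
print("largest unordered record:", max(len(c) for c in plain.values()))
print("largest positional record:", max(len(c) for c in ordered.values()))
```

Whatever the seed, an unordered record can never exceed 19 entries (every other city), while a positional record can reach 19 * 4 = 76 entries, since a 5-stop trip puts a later city at offset 1 to 4.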
On Tue, Jul 19, 2011 at 10:39 AM, Em <mailformailinglists@yahoo.de> wrote:
> Thanks!
>
> So you invert the data and then walk through each inverted result.
> Good point!
> What do you think about prefixing each city name with the index in the
> list?
>
> This way you can say:
> London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
> 3_Berlin:1...
>
> From this list you can see that people are likely to visit Moscow shortly
> after London, as the first or second stop. This would maintain a
> strong order (whether that's good or bad depends on the real-world scenario).
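A minimal sketch of how one itinerary could be turned into such prefixed keys, assuming the number is the offset of the city after the source (1 = the very next stop) and that a person visits each city at most once; `positional_record` is a hypothetical helper, not part of any library.

```python
from collections import Counter

def positional_record(trip, source):
    """Offset-prefixed counts for the cities visited after `source` in one trip."""
    start = trip.index(source)  # assumes `source` appears once in the trip
    return Counter(
        f"{offset}_{city}" for offset, city in enumerate(trip[start + 1:], start=1)
    )

rec = positional_record(["Washington", "London", "Paris", "Moscow"], "London")
print(dict(rec))  # {'1_Paris': 1, '2_Moscow': 1}
```

In a MapReduce job the map side would emit one such record per source city, and the reduce side would sum the counts per key.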
>
> Since your ideas gave me a good starting point for realizing this job
> (I'll practice it), we can make the problem more heavyweight, if you like.
>
> What happens to records that are too big to be processed by one node?
> Let's say that, in my example above of a strongly-ordered list, one gets a
> billion combinations, way too many for one node (let's assume that).
> What possibilities does Hadoop offer to deal with such cases?
>
> Regards and many thanks for the insights,
> Em
>
>
> On 19.07.2011 19:15, Steve Lewis wrote:
> > Assume Joe visits Washington, London, Paris and Moscow
> >
> > You start with records like
> > Joe:Washington:20Jan2011
> > Joe:London:14Feb2011
> > Joe:Paris:9Mar2011
> >
> > You want
> > Joe: Washington, London, Paris and Moscow
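The first job, turning date-stamped visits into one ordered itinerary per person, can be sketched in memory as follows. This is Python rather than Hadoop Java; in a real job the mapper would presumably emit the person as the key and (date, place) as the value, and the reducer would do the sorting shown here. The `person:place:date` layout and the `20Jan2011` date format come from the example records; `itineraries` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime

def itineraries(lines):
    """Group 'person:place:date' records per person, ordered by visit date."""
    visits = defaultdict(list)
    for line in lines:
        # strip() tolerates stray spaces such as 'Paris :9Mar2011'
        person, place, date = (field.strip() for field in line.split(":"))
        visits[person].append((datetime.strptime(date, "%d%b%Y"), place))
    return {person: [place for _, place in sorted(v)] for person, v in visits.items()}

# the example records, deliberately out of order
records = ["Joe:London:14Feb2011", "Joe:Washington:20Jan2011", "Joe:Paris:9Mar2011"]
print(itineraries(records))  # {'Joe': ['Washington', 'London', 'Paris']}
```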
> >
> > For the next step the person is irrelevant, so
> > you want
> >
> >
> > Washington: London:1, Paris:1, Moscow:1
> > London: Paris:1, Moscow:1
> > Paris: Moscow:1
> > The first says that after a visit to Washington there was one visit to
> > London, one to Paris and one to Moscow.
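A sketch of this step for a single itinerary, assuming (as described) that the person is dropped and each place is paired with counts of every place visited after it; `after_records` is a hypothetical helper.

```python
from collections import Counter

def after_records(trip):
    """For each place except the last, count each place visited later in the trip."""
    return {place: Counter(trip[i + 1:]) for i, place in enumerate(trip[:-1])}

for place, later in after_records(["Washington", "London", "Paris", "Moscow"]).items():
    print(place, dict(later))
# Washington {'London': 1, 'Paris': 1, 'Moscow': 1}
# London {'Paris': 1, 'Moscow': 1}
# Paris {'Moscow': 1}
```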
> >
> >
> > These records can be combined with those from other people.
> >
> >
> > Now suppose Bill visits London and Moscow
> > So he generates
> > London: Moscow:1
> >
> > This can be combined with the one from Joe saying
> > London: Paris:1, Moscow:1
> > to give
> >
> > London: Paris:1, Moscow:2
> >
> > Now suppose Sue visits London, Riga and Paris.
> > So she generates
> > London: Paris:1, Riga:1
> >
> > This can be combined with London: Paris:1, Moscow:2 to give
> >
> > London: Paris:2, Moscow:2, Riga:1
> >
> > Note that I can keep places in alphabetical order in the result.
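Combining is just summing counts per destination, which is associative and commutative, so the same function could serve as both combiner and reducer. A minimal sketch using the London records from Joe, Bill and Sue, with the result sorted alphabetically; `combine` is a hypothetical name.

```python
from collections import Counter

def combine(*records):
    """Sum per-destination counts for one source city; order of inputs is irrelevant."""
    total = Counter()
    for record in records:
        total.update(record)  # Counter.update adds counts rather than replacing them
    return dict(sorted(total.items()))  # alphabetical by destination

joe = {"Paris": 1, "Moscow": 1}
bill = {"Moscow": 1}
sue = {"Paris": 1, "Riga": 1}
print(combine(joe, bill, sue))  # {'Moscow': 2, 'Paris': 2, 'Riga': 1}
```

Because the result does not depend on how the inputs are grouped, partial sums can be computed on each node and merged later.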
> >
> >
> >
> > On Tue, Jul 19, 2011 at 9:53 AM, Em <mailformailinglists@yahoo.de> wrote:
> >
> > Hi Steven,
> >
> > thanks for your response! For ease of use we can make the
> > assumptions you made; maybe this makes it much easier to help. Those
> > little extras are something for after solving the "easy" version of the
> > task. :)
> >
> > What do you mean by the following?
> >
> > > The second job takes Person : list of places and, for each place
> > > in the list, constructs
> > > place : 1, place after P : 1, next place : 1 ...
> >
> > You mean something like this?
> >
> > Washington DC:1
> > New York after Washington DC:1
> > Miami after New York:1
> >
> > I do not see how this helps with the result I'd like to get.
> >
> > The end result should be something like this:
> > Washington DC => New York, Miami, Los Angeles
> > New York => Chicago, Seattle, San Francisco
> >
> > The point is that one can see that people who visited Washington DC
> > are likely to visit New York as the next place, Miami as the second and
> > L.A. as the third.
> > However, if I choose New York as my starting point, I can see that
> > people who start their journey in New York (and maybe weren't in DC
> > before) are likely to visit Chicago, Seattle and San Francisco. Maybe
> > Los Angeles comes in 10th position.
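One way to read off that kind of end result, sketched under the assumption that positional information is kept: for each offset after the starting city, pick the most frequently visited place. The itineraries below are invented for illustration, and `likely_sequence` is a hypothetical helper; ties go to the city encountered first, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter, defaultdict

def likely_sequence(trips, source, depth=3):
    """For offsets 1..depth after `source`, return the most common city at each."""
    by_offset = defaultdict(Counter)
    for trip in trips:
        if source in trip:
            start = trip.index(source)
            for offset, city in enumerate(trip[start + 1:start + 1 + depth], start=1):
                by_offset[offset][city] += 1
    return [by_offset[k].most_common(1)[0][0] for k in sorted(by_offset)]

trips = [  # hypothetical itineraries, for illustration only
    ["Washington DC", "New York", "Miami", "Los Angeles"],
    ["Washington DC", "New York", "Miami", "Chicago"],
    ["Washington DC", "Boston", "Miami", "Los Angeles"],
    ["New York", "Chicago", "Seattle", "San Francisco"],
]
print(likely_sequence(trips, "Washington DC"))  # ['New York', 'Miami', 'Los Angeles']
```

This counts every occurrence of the starting city; restricting it to trips that start there, as in the New York case above, would only need a check that `trip[0] == source`.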
> >
> > Regards,
> > Em
> >
> >
> >
> >
> > 
> > Steven M. Lewis PhD
> > 4221 105th Ave NE
> > Kirkland, WA 98033
> > 2063841340 (cell)
> > Skype lordjoe_com
> >
> >
>