hadoop-mapreduce-user mailing list archives

From Steve Lewis <lordjoe2...@gmail.com>
Subject Re: How would you translate this into MapReduce?
Date Tue, 19 Jul 2011 19:40:35 GMT
If a record is too big to be processed by a node, you probably need to
re-architect around a different record, one that scales better and
combines cleanly.
You also need to ask at the start what data you need to retrieve and how you
intend to retrieve it. At some point a database may start to look like a good
solution, although in this case I might track the order of trips only up to,
say, 16, and use a comma-delimited list for the counts.
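
One possible reading of the capped, comma-delimited idea above, as a minimal
sketch rather than the author's actual code: each (origin, destination) pair
keeps a fixed list of 16 counts, where slot k counts how often the destination
was the (k+1)-th stop after the origin. The names `position_counts`, `merge`,
`encode` and `decode` are hypothetical, and dropping stops beyond the cap is an
assumption.

```python
MAX_POSITION = 16  # cap suggested in the message; stops beyond it are dropped (assumption)


def position_counts(itinerary, max_position=MAX_POSITION):
    """Map (origin, destination) -> fixed-size list of counts, one slot per
    position after the origin (slot 0 = the very next stop)."""
    records = {}
    for i, origin in enumerate(itinerary):
        following = itinerary[i + 1:i + 1 + max_position]
        for offset, destination in enumerate(following):
            slots = records.setdefault((origin, destination), [0] * max_position)
            slots[offset] += 1
    return records


def combine(a, b):
    """Element-wise sum of two count lists; the order of combining does not matter."""
    return [x + y for x, y in zip(a, b)]


def merge(records_a, records_b):
    """Merge two record maps without modifying either input."""
    merged = {key: list(slots) for key, slots in records_a.items()}
    for key, slots in records_b.items():
        merged[key] = combine(merged[key], slots) if key in merged else list(slots)
    return merged


def encode(slots):
    """Serialize a count list as the comma-delimited value of a record."""
    return ",".join(str(n) for n in slots)


def decode(text):
    """Parse a comma-delimited value back into a count list."""
    return [int(n) for n in text.split(",")]


joe = position_counts(["Washington", "London", "Paris", "Moscow"])
bill = position_counts(["London", "Moscow"])
merged = merge(joe, bill)
print("London->Moscow", encode(merged[("London", "Moscow")]))
```

Because every value has the same fixed length, a record never grows with the
number of travellers, only with the number of distinct city pairs.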

On Tue, Jul 19, 2011 at 11:14 AM, Em <mailformailinglists@yahoo.de> wrote:

> Of course it won't scale, or at least not as well as your suggested
> model. Chances are good that my idea is not an option for a
> production system and not as useful as the less complex variant. So you
> are right!
>
> The reason I asked was to get an idea of what should be done if a
> record is too big to be processed by a node.
>
> Regards,
> Em
>
> On 19.07.2011 19:54, Steve Lewis wrote:
> > I assumed the problem was to count the number of people visiting Moscow
> > after London without considering any intermediate stops. This leads to
> > a data structure which is easy to combine. The structure you propose
> > adds more information and is difficult to combine. I doubt it could
> > handle a billion people, and I recommend trying it with a hundred people
> > visiting 5 out of 20 destinations in random order to see how bad it
> > gets.
> >
> > My schema can handle billions of combinations, assuming only that the
> > total destinations in any node can be handled, i.e. a billion people
> > can visit any of a thousand cities in random order and in the worst case
> > I need a thousand cities and a thousand counts. I doubt that the schema
> > you propose, with added order information, will scale to those levels.
> >
> > On Tue, Jul 19, 2011 at 10:39 AM, Em <mailformailinglists@yahoo.de> wrote:
> >
> >     Thanks!
> >
> >     So you invert the data and then walk through each inverted result.
> >     Good point!
> >     What do you think about prefixing each city-name with the index in
> >     the list?
> >
> >     This way you can say:
> >     London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
> >     3_Berlin:1...
> >
> >     From this list you can see that people are likely to visit Moscow
> >     right after London on their first or second journey. This would
> >     maintain a strong order (whether that's good or bad depends on the
> >     real-world scenario).
> >
> >     Since your ideas gave me a good starting-point for realizing this job
> >     (I'll practice it), we can make the problem more heavy-weight, if
> >     you like?
> >
> >     What happens to records that are too big to be processed by one
> >     node? Let's say my strongly ordered list above yields a billion
> >     combinations, way too many for one node (we assume that).
> >     What possibilities does Hadoop offer to deal with such things?
> >
> >     Regards and many thanks for the insights,
> >     Em
> >
> >
> >     On 19.07.2011 19:15, Steve Lewis wrote:
> >     > Assume Joe visits Washington, London, Paris and Moscow
> >     >
> >     > You start with records like
> >     > Joe:Washington:20-Jan-2011
> >     > Joe:London:14-Feb-2011
> >     > Joe:Paris:9-Mar-2011
> >     >
> >     > You want
> >     > Joe: Washington, London, Paris and Moscow
> >     >
> >     > For the next step the person is irrelevant
> >     > you want
> >     >
> >     >
> >     > Washington: London:1, Paris:1, Moscow:1
> >     > London: Paris:1, Moscow:1
> >     > Paris: Moscow:1
> >     > The first says that after a visit to Washington there was one visit
> >     > to London, one to Paris and one to Moscow
> >     >
> >     >
> >     > This can be combined with the one from Joe
> >     >
> >     >
> >     > Now suppose Bill visits London and Moscow
> >     > So he generates
> >     > London: Moscow:1
> >     >
> >     > This can be combined with the one from Joe saying
> >     > London: Paris:1, Moscow:1
> >     > to give
> >     >
> >     > London: Paris:1, Moscow:2
> >     >
> >     > Now suppose Sue visits London, Riga and Paris
> >     > So she generates
> >     > London: Paris:1, Riga:1
> >     >
> >     > This can be combined with London: Paris:1, Moscow:2 to give
> >     >
> >     > London: Paris:2, Moscow:2, Riga:1
> >     >
> >     > Note I can keep places in alphabetical order in the result
> >     >
> >     >
> >     >
> >     > On Tue, Jul 19, 2011 at 9:53 AM, Em <mailformailinglists@yahoo.de> wrote:
> >     >
> >     >     Hi Steven,
> >     >
> >     >     thanks for your response! For ease of use we can make the
> >     >     assumptions you made - maybe this makes it much easier to help.
> >     >     Those little extras are something for after solving the "easy"
> >     >     version of the task. :)
> >     >
> >     >     What do you mean by the following?
> >     >
> >     >     > The second job takes Person : list of places and, for each
> >     >     > place in the list, constructs
> >     >     > place : 1 | place after P : 1 | next place : 1 ...
> >     >
> >     >     You mean something like that?
> >     >
> >     >     Washington DC:1
> >     >     New York after Washington DC:1
> >     >     Miami after New York:1
> >     >
> >     >     I do not see the benefit for the result I would like to get.
> >     >
> >     >     The end-result should be something like that:
> >     >     Washington DC => New York, Miami, Los Angeles
> >     >     New York => Chicago, Seattle, San Francisco
> >     >
> >     >     The point is that one can see that persons who visited
> >     >     Washington DC are likely to visit New York as the next place,
> >     >     Miami as the second and L.A. as the third.
> >     >     However, if I choose New York as my starting point, I can see
> >     >     that persons who start their journey in New York (and maybe
> >     >     weren't in DC before) are likely to visit Chicago, Seattle and
> >     >     San Francisco. Maybe Los Angeles comes at the 10th position.
> >     >
> >     >     Regards,
> >     >     Em
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > Steven M. Lewis PhD
> >     > 4221 105th Ave NE
> >     > Kirkland, WA 98033
> >     > 206-384-1340 (cell)
> >     > Skype lordjoe_com
> >     >
> >     >
> >
> >
> >
> >
> >
> >
>
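
The combine step walked through in the quoted 19:15 message (Joe, Bill and Sue)
could be sketched roughly as follows. This is plain Python rather than Hadoop
code, and the function names `successor_counts`, `combine` and `format_record`
are hypothetical; in a real job the same merge would presumably live in a
combiner and reducer. Sorting the destinations alphabetically follows the note
at the end of that message.

```python
from collections import Counter


def successor_counts(itinerary):
    """Map each place to a Counter of the places visited later in the same
    itinerary. The person is irrelevant at this stage."""
    result = {}
    for i, place in enumerate(itinerary[:-1]):
        result.setdefault(place, Counter()).update(itinerary[i + 1:])
    return result


def combine(a, b):
    """Merge two place -> Counter maps by adding counts per destination.
    Neither input is modified."""
    merged = {place: Counter(counts) for place, counts in a.items()}
    for place, counts in b.items():
        merged.setdefault(place, Counter()).update(counts)
    return merged


def format_record(place, counts):
    """Render one record with destinations in alphabetical order."""
    body = ", ".join(f"{dest}:{n}" for dest, n in sorted(counts.items()))
    return f"{place}: {body}"


joe = successor_counts(["Washington", "London", "Paris", "Moscow"])
bill = successor_counts(["London", "Moscow"])
sue = successor_counts(["London", "Riga", "Paris"])

total = combine(combine(joe, bill), sue)
print(format_record("London", total["London"]))
```

In the worst case a record holds one count per distinct destination, which is
the bound the thread relies on when arguing that this schema scales.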



