hadoop-mapreduce-user mailing list archives

From Em <mailformailingli...@yahoo.de>
Subject Re: How would you translate this into MapReduce?
Date Tue, 19 Jul 2011 19:59:34 GMT
Interesting to see the upper bound for Hadoop.
However, I guess this is a rare problem.

I'll try to implement what we discussed so far and train myself.

Regards,
Em

On 19.07.2011 21:40, Steve Lewis wrote:
> If a record is too big to be processed by a node, you probably need to
> re-architect around a different record that scales better and combines
> cleanly. You also need to ask at the start what data you need to retrieve
> and how you intend to retrieve it. At some point a database may start to
> look like a good solution, although in this case I might think about
> tracking the order of trips only up to, say, 16 and using a
> comma-delimited list for the counts.
> 
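
A minimal sketch of that bounded idea in plain Python (not Hadoop API code). It assumes the "order of trips" means how many stops after a given city another city was visited, and that visits beyond the cap are simply dropped; both are assumptions, as are the names `bounded_positional_counts`, `combine_slots` and `serialize`.

```python
MAX_OFFSET = 16  # cap suggested in the thread ("say 16")

def bounded_positional_counts(trip, max_offset=MAX_OFFSET):
    """For one ordered trip, return {place: {later_place: [counts]}} where
    counts[k] is how often later_place came k+1 stops after place.
    Stops further away than max_offset are ignored (an assumption)."""
    result = {}
    for i, place in enumerate(trip):
        window = trip[i + 1:i + 1 + max_offset]
        for offset, later in enumerate(window, start=1):
            slots = result.setdefault(place, {}).setdefault(later, [0] * max_offset)
            slots[offset - 1] += 1
    return result

def combine_slots(a, b):
    """Element-wise sum; the record keeps a fixed length, so it combines cleanly."""
    return [x + y for x, y in zip(a, b)]

def serialize(slots):
    """Comma-delimited list of counts, one value per position."""
    return ",".join(str(n) for n in slots)

# Small cap of 3 to keep the output short.
joe = bounded_positional_counts(["London", "Paris", "Moscow"], max_offset=3)
bill = bounded_positional_counts(["London", "Moscow"], max_offset=3)
moscow_after_london = combine_slots(joe["London"]["Moscow"], bill["London"]["Moscow"])
print("London -> Moscow:", serialize(moscow_after_london))
```

With a cap like this, each (city, later city) pair carries a fixed number of counts, so a record for one city stays bounded by the number of cities times the cap.
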
> On Tue, Jul 19, 2011 at 11:14 AM, Em <mailformailinglists@yahoo.de> wrote:
> 
>     Of course it won't scale, or at least not as well as your suggested
>     model. Chances are good that my idea is not an option for a
>     production system and not as useful as the less complex variant. So you
>     are right!
> 
>     The reason I asked was to get an idea of what should be done if a
>     record is too big to be processed by a node.
> 
>     Regards,
>     Em
> 
>     On 19.07.2011 19:54, Steve Lewis wrote:
>     > I assumed the problem was to count the number of people visiting Moscow
>     > after London without considering any intermediate stops. This leads to
>     > a data structure which is easy to combine. The structure you propose
>     > adds more information and is difficult to combine. I doubt it could
>     > handle a billion people, and I recommend trying it with a hundred people
>     > visiting 5 out of 20 destinations in random order to see how bad it
>     > gets.
>     >
>     > My schema can handle billions of combinations, assuming only that the
>     > total destinations in any node can be handled, i.e. a billion people
>     > can visit any of a thousand cities in random order, and in the worst
>     > case I need a thousand cities and a thousand counts. I doubt that the
>     > schema you propose, with the added order information, will scale to
>     > those levels.
>     >
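
The experiment suggested above (a hundred people, 5 out of 20 destinations, random order) can be sketched in plain Python to compare how large the per-city records get under both schemas. The city names, the fixed random seed and the helper names are made up for illustration, and the position prefix is assumed to be the number of stops after the key city.

```python
import random
from collections import Counter

def plain_key(offset, later):
    # Unordered schema: only the later city matters.
    return later

def positional_key(offset, later):
    # Ordered schema: later city prefixed with its distance in stops.
    return f"{offset}_{later}"

def build(trips, key):
    """Return {place: Counter} over all trips, using the given key scheme."""
    records = {}
    for trip in trips:
        for i, place in enumerate(trip):
            for offset, later in enumerate(trip[i + 1:], start=1):
                records.setdefault(place, Counter())[key(offset, later)] += 1
    return records

def largest(records):
    """Number of distinct entries in the biggest per-city record."""
    return max(len(counts) for counts in records.values())

rng = random.Random(0)  # fixed seed so runs are repeatable
cities = [f"city{n:02d}" for n in range(20)]
trips = [rng.sample(cities, 5) for _ in range(100)]

plain = build(trips, plain_key)
positional = build(trips, positional_key)

print("largest plain record:     ", largest(plain), "entries (at most 19)")
print("largest positional record:", largest(positional), "entries (at most 76)")
```

The bounds in the output follow from the setup: a city can be followed by at most 19 other cities, and with 5 stops per trip the offset ranges from 1 to 4, so the ordered record can hold up to 19 x 4 entries.
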
>     > On Tue, Jul 19, 2011 at 10:39 AM, Em <mailformailinglists@yahoo.de> wrote:
>     >
>     >     Thanks!
>     >
>     >     So you invert the data and then walk through each inverted result.
>     >     Good point!
>     >     What do you think about prefixing each city name with its index in
>     >     the list?
>     >
>     >     This way you can say:
>     >     London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
>     >     3_Berlin:1...
>     >
>     >     From this list you can see that people are likely to visit Moscow
>     >     right after London on their first or second journey. This would
>     >     maintain a strong order (whether that's good or bad depends on the
>     >     real-world scenario).
>     >
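
A minimal sketch of the prefixed variant in plain Python, assuming the index is the number of stops between the key city and the later city (1 means "visited right after"). That reading of the index and the function name `positional_follower_counts` are assumptions.

```python
from collections import Counter

def positional_follower_counts(trip):
    """For one ordered trip, map each place to a Counter keyed by
    '<offset>_<later place>', where offset counts stops after the place."""
    result = {}
    for i, place in enumerate(trip):
        for offset, later in enumerate(trip[i + 1:], start=1):
            result.setdefault(place, Counter())[f"{offset}_{later}"] += 1
    return result

counts = positional_follower_counts(["London", "Paris", "Moscow"])
for place in sorted(counts):
    entries = ", ".join(f"{k}:{v}" for k, v in sorted(counts[place].items()))
    print(f"{place}: {entries}")
```

Records built this way can still be merged by adding the counts per key, but each city now carries one entry per (offset, later city) pair instead of one per later city.
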
>     >     Since your ideas gave me a good starting point for realizing this
>     >     job (I'll practice it), we can make the problem more heavyweight, if
>     >     you like?
>     >
>     >     What happens to records that are too big to be processed by one
>     >     node? Let's say the strongly ordered list from my example above
>     >     yields a billion combinations, way too many for one node (we assume
>     >     that). What possibilities does Hadoop offer to deal with such things?
>     >
>     >     Regards and many thanks for the insights,
>     >     Em
>     >
>     >
>     >     On 19.07.2011 19:15, Steve Lewis wrote:
>     >     > Assume Joe visits Washington, London, Paris and Moscow
>     >     >
>     >     > You start with records like
>     >     > Joe:Washington:20-Jan-2011
>     >     > Joe:London:14-Feb-2011
>     >     > Joe:Paris:9-Mar-2011
>     >     >
>     >     > You want
>     >     > Joe: Washington, London, Paris and Moscow
>     >     >
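
A minimal sketch of this first step in plain Python: group the records by person and order each person's places by date. It assumes the `person:place:date` layout shown above with dates like `20-Jan-2011` and English month abbreviations; only the three records listed in the thread are used, deliberately shuffled, and the function names are made up.

```python
from collections import defaultdict
from datetime import datetime

records = [
    "Joe:Paris:9-Mar-2011",
    "Joe:Washington:20-Jan-2011",
    "Joe:London:14-Feb-2011",
]

def parse(record):
    """Split 'person:place:date' into (person, (date, place))."""
    person, place, date = record.split(":")
    when = datetime.strptime(date.strip(), "%d-%b-%Y")
    return person, (when, place.strip())

def ordered_trips(records):
    """Group visits by person (the map/shuffle part) and sort each
    person's visits by date (the reduce part)."""
    visits = defaultdict(list)
    for record in records:
        person, visit = parse(record)
        visits[person].append(visit)
    return {person: [place for _, place in sorted(v)] for person, v in visits.items()}

print(ordered_trips(records))
```
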
>     >     > For the next step the person is irrelevant.
>     >     > You want
>     >     >
>     >     > Washington: London:1, Paris:1, Moscow:1
>     >     > London: Paris:1, Moscow:1
>     >     > Paris: Moscow:1
>     >     >
>     >     > The first line says that after a visit to Washington there was one
>     >     > visit to London, one to Paris and one to Moscow.
>     >     >
>     >     >
>     >     > This can be combined with the one from Joe
>     >     >
>     >     >
>     >     > Now suppose Bill visits London and Moscow.
>     >     > So he generates
>     >     > London: Moscow:1
>     >     >
>     >     > This can be combined with the one from Joe, London: Paris:1, Moscow:1,
>     >     > to give
>     >     >
>     >     > London: Paris:1, Moscow:2
>     >     >
>     >     > Now suppose Sue visits London, Riga and Paris.
>     >     > So she generates
>     >     > London: Paris:1, Riga:1
>     >     >
>     >     > This can be combined with London: Paris:1, Moscow:2 to give
>     >     >
>     >     > London: Paris:2, Moscow:2, Riga:1
>     >     >
>     >     > Note I can keep places in alphabetical order in the result
>     >     >
>     >     >
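
The walkthrough above can be sketched in plain Python to check the numbers. This is not Hadoop code: `follower_counts` stands in for what a mapper might emit per person and `combine` for what a combiner or reducer might do; both names are made up for illustration.

```python
from collections import Counter

def follower_counts(trip):
    """Map each place in one ordered trip to a Counter of the places
    visited later in that trip (intermediate stops are ignored)."""
    result = {}
    for i, place in enumerate(trip):
        later = Counter(trip[i + 1:])
        if later:
            result.setdefault(place, Counter()).update(later)
    return result

def combine(a, b):
    """Merge two sets of per-place records by adding counts. Addition is
    associative and commutative, so the merge order does not matter."""
    merged = {place: Counter(counts) for place, counts in a.items()}
    for place, counts in b.items():
        merged.setdefault(place, Counter()).update(counts)
    return merged

trips = {
    "Joe": ["Washington", "London", "Paris", "Moscow"],
    "Bill": ["London", "Moscow"],
    "Sue": ["London", "Riga", "Paris"],
}

totals = {}
for trip in trips.values():
    totals = combine(totals, follower_counts(trip))

# Print places and followers in alphabetical order.
for place in sorted(totals):
    followers = ", ".join(f"{c}:{n}" for c, n in sorted(totals[place].items()))
    print(f"{place}: {followers}")
```

The London record ends up with Paris:2, Moscow:2 and Riga:1, matching the combined result in the walkthrough.
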
>     >     >
>     >     > On Tue, Jul 19, 2011 at 9:53 AM, Em <mailformailinglists@yahoo.de> wrote:
>     >     >
>     >     >     Hi Steven,
>     >     >
>     >     >     Thanks for your response! For ease of use we can make the
>     >     >     assumptions you made; maybe this makes it much easier to help.
>     >     >     Those little extras are something for after solving the "easy"
>     >     >     version of the task. :)
>     >     >
>     >     >     What do you mean by the following?
>     >     >
>     >     >     > The second job takes Person : list of places and for each place
>     >     >     > in the list constructs
>     >     >     > place : 1 | place after P : 1 | next place : 1 ...
>     >     >
>     >     >     Do you mean something like this?
>     >     >
>     >     >     Washington DC:1
>     >     >     New York after Washington DC:1
>     >     >     Miami after New York:1
>     >     >
>     >     >     I do not see how this helps with the result I would like to get.
>     >     >
>     >     >     The end result should be something like this:
>     >     >     Washington DC => New York, Miami, Los Angeles
>     >     >     New York => Chicago, Seattle, San Francisco
>     >     >
>     >     >     The point is that one can see that persons who visited
>     >     >     Washington DC are likely to visit New York as the next place,
>     >     >     Miami as the second and L.A. as the third.
>     >     >     However, if I choose New York as my starting point, I can see
>     >     >     that persons who start their journey in New York (and maybe
>     >     >     weren't in DC before) are likely to visit Chicago, Seattle and
>     >     >     San Francisco. Maybe Los Angeles comes in 10th position.
>     >     >
>     >     >     Regards,
>     >     >     Em
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > --
>     >     > Steven M. Lewis PhD
>     >     > 4221 105th Ave NE
>     >     > Kirkland, WA 98033
>     >     > 206-384-1340 (cell)
>     >     > Skype lordjoe_com
>     >     >
>     >     >
>     >
>     >
>     >
>     >
>     >
>     >
> 
> 
> 
> 
> 
> 
