mahout-user mailing list archives

From "Razon, Oren" <>
Subject Mahout beginner questions...
Date Thu, 22 Mar 2012 11:35:10 GMT
As a data mining developer who needs to build a recommender engine POC (Proof Of Concept) to
support several future use cases, I've found the Mahout framework an appealing place to start.
But as I'm new to Mahout and Hadoop in general, I have a couple of questions...

1.      In "Mahout in Action", section 3.2.5 (Database-based data) says: "...Several
classes in Mahout's recommender implementation will attempt to push computations into the
database for performance...". I've looked in the documentation and in the code itself, but
didn't find any reference to which calculations are actually pushed into the DB. Could you
please explain what can be done inside the DB?
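To make the question concrete, here is the kind of SQL I'd guess a JDBC-backed DataModel might issue so that aggregation happens in the database rather than in the JVM. The table and column names (taste_preferences, user_id, item_id, preference) are my assumption, not necessarily Mahout's actual defaults:

```java
// Sketch of how a JDBC-backed DataModel could push work into SQL.
// Table/column names are illustrative assumptions; check the JDBC data
// model classes of your Mahout version for what is really issued.
public class JdbcPushdownSketch {
    static final String TABLE = "taste_preferences";

    // getNumUsers(): the distinct-count runs in the DB, not in Java
    static String numUsersSql() {
        return "SELECT COUNT(DISTINCT user_id) FROM " + TABLE;
    }

    // getPreferenceValue(userID, itemID): a single-row lookup in the DB
    static String preferenceSql() {
        return "SELECT preference FROM " + TABLE
                + " WHERE user_id=? AND item_id=?";
    }

    // getItemIDsFromUser(userID): filtering and sorting done by the DB
    static String itemsForUserSql() {
        return "SELECT item_id FROM " + TABLE
                + " WHERE user_id=? ORDER BY item_id";
    }
}
```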
2.      My future use will include use cases with small-to-medium data volumes (where I guess
the non-distributed algorithms will do the job), but also use cases with huge amounts
of data (over 500,000,000 ratings), which from my understanding is where the distributed code
should come in handy. My question is: since I will need to use both the distributed and the
non-distributed code, what would a good design look like?
      Should I build two separate solutions on different machines? Could I do part of the
job distributed (for example, similarity calculation) and feed the output to the non-distributed
code? Is that a BKM (best known method)? Also, if I deploy the entire Mahout code on a Hadoop
environment, what does that mean for the non-distributed code? Will it all run as a separate
Java process on the name node?
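To make the hybrid idea concrete, here is a self-contained sketch (plain Java, deliberately not the Mahout API) of the pattern I have in mind: item-item similarities are precomputed offline by a distributed batch job, and only the cheap scoring step runs online, in memory:

```java
import java.util.*;

// Self-contained sketch of a hybrid design: similarities come from an
// offline (possibly distributed) job; scoring runs in-memory online.
// Illustrative only -- not Mahout classes.
public class HybridSketch {
    // precomputed similarity(itemA, itemB), loaded from the batch job's output
    final Map<Long, Map<Long, Double>> sims = new HashMap<>();

    void addSimilarity(long a, long b, double s) {
        sims.computeIfAbsent(a, k -> new HashMap<>()).put(b, s);
        sims.computeIfAbsent(b, k -> new HashMap<>()).put(a, s);
    }

    // score unseen items by a similarity-weighted sum over the user's ratings
    Map<Long, Double> recommend(Map<Long, Double> userRatings) {
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Double> r : userRatings.entrySet()) {
            Map<Long, Double> neighbors =
                    sims.getOrDefault(r.getKey(), Collections.emptyMap());
            for (Map.Entry<Long, Double> n : neighbors.entrySet()) {
                if (!userRatings.containsKey(n.getKey())) {
                    scores.merge(n.getKey(),
                            n.getValue() * r.getValue(), Double::sum);
                }
            }
        }
        return scores;
    }
}
```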
3.      As for now, besides the Hadoop cluster we are building, we have some strong SQL machines
(a Netezza appliance) that can handle big (structured) data and integrate well with 3rd-party
analytics providers and with development on the Java platform, but don't include a recommender
framework as rich as Mahout. I'm trying to understand how I could use both solutions
(Netezza & Mahout) to handle big-data recommender system use cases. I thought maybe to move
the data into Netezza, do all the data manipulation and transformation there, and in the end
prepare a file that contains the classic data model structure Mahout needs. But can you think
of a better solution \ architecture? Maybe keeping the data only inside Netezza and extracting
it to Mahout using JDBC when needed? I will be glad to hear your ideas :)
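To illustrate the file-export option: my understanding is that Mahout's FileDataModel reads a flat file with one "userID,itemID,preference" line per rating, which Netezza could produce as a plain CSV export. A small sketch of that layout (the parsing here is just for illustration; FileDataModel itself would do this, plus refresh handling, gzip support, etc.):

```java
import java.util.*;

// Sketch of the flat preference-file layout FileDataModel expects:
// one "userID,itemID,preference" line per rating. Parsing shown only
// to make the format explicit; Mahout handles this itself.
public class PreferenceFileSketch {
    static class Pref {
        final long userId;
        final long itemId;
        final float value;
        Pref(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    static List<Pref> parse(List<String> lines) {
        List<Pref> prefs = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            prefs.add(new Pref(Long.parseLong(f[0]),
                               Long.parseLong(f[1]),
                               Float.parseFloat(f[2])));
        }
        return prefs;
    }
}
```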


Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
