mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Running Taste Web example without the webserver
Date Thu, 23 Jul 2009 07:43:14 GMT
Aurora, did you see my last reply on the list?

On Wed, Jul 22, 2009 at 9:29 AM, Sean Owen<srowen@gmail.com> wrote:
> Yes, there are a few components here -- a few different purposes. All
> are built around the core library, which isn't specific to Hadoop or an
> HTTP server, but you've seen some of the components that adapt the
> core to these contexts. There are also components that can evaluate or
> load-test the code.
>
> The only piece you are interested in then is really the Hadoop
> integration -- see org.apache.mahout.cf.taste.hadoop. There you will
> find RecommenderJob which should be able to launch a
> pseudo-distributed recommender job. I say pseudo since these
> algorithms are not, in general, distributable, but one can of course
> run n instances of a recommender, each computing 1/nth of all
> recommendations. That is nice, though it means, say, that the amount
> of RAM each job can use is still limited by the size of a single machine.
>
> I just recently rewrote this package to be compatible with Hadoop
> 0.20's new APIs. I do not know whether it works, and have some reason to
> believe there are bugs in the API that will prevent it from working.
> So this piece is currently in flux.
>
> If you want to experiment and be a guinea pig for this latest
> revision, I can provide close support to work through the bugs on both
> sides. Or we can talk a bit more about your requirements to figure out
> whether this is feasible, what the best algorithm is, and whether you
> need Hadoop at all.
>
> How big is 'massive'? Could you reveal how many users, items, and
> user-item preferences you have, to an order of magnitude? What is the
> general nature of your input data, and what recommendations do you want out?
>
> On Wed, Jul 22, 2009 at 12:12 AM, Aurora
> Skarra-Gallagher<aurora@yahoo-inc.com> wrote:
>> Hi,
>>
>> I apologize if I've misunderstood the purpose of the Taste component of
>> Mahout. Our goal was to take a recommendation framework and use our own
>> recommendation algorithm within it. We need to process a massive amount
>> of data, and wanted it to be done on our Hadoop grid. I thought that
>> Taste was the right fit for the job. I'm not interested in the HTTP
>> service. I'm interested in the recommendation framework, particularly
>> from a back-end batch perspective. Does that help clarify? Thanks for
>> helping me sort through this.
>>
>> -Aurora
>>
>>
>> On 7/21/09 3:02 PM, "Sean Owen" <srowen@gmail.com> wrote:
>>
>> Hmm, lots going on here, it's confusing.
>>
>> Are you trying to run this on Hadoop intentionally? I ask because the web
>> app example is not intended to run on Hadoop. It's a component
>> intended to serve recommendations over HTTP in real time. It also
>> appears you are running an evaluation rather than a web app serving
>> requests. I realize you're trying to run this without Jetty, but
>> that's kind of like trying to run a web app without a web server.
>>
>> I think you'd have to clarify what you are trying to do, and then what
>> you are doing right now, to begin to assist.
>>
>> On Tue, Jul 21, 2009 at 9:20 PM, Aurora
>> Skarra-Gallagher<aurora@yahoo-inc.com> wrote:
>>> Hi,
>>>
>>> I'm trying to run the Taste web example without using Jetty. Our gateways
>>> aren't meant to be used as webservers. By poking around, I found that the
>>> following command worked:
>>> hadoop --config ~/hod-clusters/test jar /x/mahout-current/examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner
>>>
>>> The output is:
>>> 09/07/21 19:59:21 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
>>> 09/07/21 19:59:21 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.9 of GroupLensDataModel
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Reading file info...
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 100000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 200000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 300000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 400000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 500000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 600000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 700000 lines
>>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 800000 lines
>>> 09/07/21 19:59:23 INFO file.FileDataModel: Processed 900000 lines
>>> 09/07/21 19:59:23 INFO file.FileDataModel: Processed 1000000 lines
>>> 09/07/21 19:59:23 INFO file.FileDataModel: Read lines: 1000209
>>> 09/07/21 19:59:30 INFO slopeone.MemoryDiffStorage: Building average diffs...
>>> 09/07/21 19:59:42 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7035965559003973
>>> 09/07/21 19:59:42 INFO grouplens.GroupLensRecommenderEvaluatorRunner: 0.7035965559003973
>>>
>>> The job appears to write data to /tmp/ratings.txt and /tmp/movies.txt.
>>> I'm not sure if this is the correct way to run this example. I have a
>>> few questions:
>>>
>>>  1.  Is the output file /tmp/ratings.txt? If so, how do I interpret it?
>>>  2.  What does the Evaluation result mean?
>>>  3.  Is it even running on HDFS?
>>>  4.  Is it a map-reduce job?
>>>
>>> Any pointers on how to run this as a standalone job would be helpful.
>>>
>>> Thanks,
>>> Aurora
>>>
>>
>>
>

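The "run n instances, each computing 1/nth of all recommendations" idea from the earlier reply can be sketched roughly as follows. This is only an illustration, not the RecommenderJob implementation: the modulo partitioning rule and the names shard_for, recommend_for_shard and toy_recommend are assumptions made up for the example. It also assumes every instance holds the full data model, which would be why per-machine RAM remains the limit.

```python
def shard_for(user_id: int, num_shards: int) -> int:
    """Return the shard (0..num_shards-1) responsible for this user.

    Assumption: users are split by user ID modulo the number of instances.
    """
    return user_id % num_shards


def recommend_for_shard(user_ids, shard, num_shards, recommend):
    """Compute recommendations only for the users owned by `shard`."""
    return {
        u: recommend(u)
        for u in user_ids
        if shard_for(u, num_shards) == shard
    }


def toy_recommend(user_id):
    # Hypothetical stand-in for a real recommender. In this sketch each
    # instance is assumed to need the whole data model in memory to answer
    # this call; only the set of users it serves is partitioned.
    return [user_id + 1, user_id + 2]


users = range(10)
n = 3

# Each of the n instances handles roughly 1/n of the users, independently.
shards = [recommend_for_shard(users, s, n, toy_recommend) for s in range(n)]

# Merging the per-instance outputs gives recommendations for every user.
merged = {}
for part in shards:
    merged.update(part)
```

Because the instances never need to talk to each other, this kind of split is easy to run as independent tasks, but it divides only the work, not the memory footprint.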