mahout-dev mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: Mahout In Action
Date Fri, 23 Apr 2010 18:11:23 GMT
If you are making more changes, go ahead; you are more than welcome to. Just
settle on one convention. For example, in the clustering algorithms chapter it
was points and clusters-[0-n], like you said, and in Dirichlet it was state-n.
So it will be better if we stick to a single convention and the book follows
it (it shouldn't be the other way around).

Robin

On Fri, Apr 23, 2010 at 11:30 PM, Jeff Eastman
<jdog@windwardsolutions.com> wrote:

> The APIs did not change but the clustered points directory changed from
> "points" to "clusteredPoints" and the various clusters directories changed
> from (e.g. canopies, clusters, clusters-n, canopies-n, state-n) to just
> clusters-n, where clusters-0 is used for the initial clusters needed for
> kmeans and is produced by canopy output by default.
>
>
> On 4/23/10 10:25 AM, Robin Anil wrote:
>
>> It's not aimed at 0.3 per se. Right now it's evolving with the code. For
>> example, the quality factor is something that will go in there. I keep
>> updating the code with the latest changes and so does Sean. There isn't much
>> that got affected by your latest commit though (it compiles). I haven't fully
>> tested the code with the dataset after the commit, though; that is something
>> I plan to do soon.
>>
>> Robin
>>
>> On Fri, Apr 23, 2010 at 9:51 PM, Jeff Eastman <jdog@windwardsolutions.com>
>> wrote:
>>
>>> I also wonder how much my recent clustering changes have affected the
>>> examples in the clustering sections. I know the book is currently aimed at
>>> Mahout 0.3 but users trying the examples with trunk may be frustrated by
>>> the recent changes in file naming. Do the examples exist in an unannotated
>>> version somewhere that I could get working again on trunk?
>>>
>>> On 4/23/10 9:10 AM, Sean Owen wrote:
>>>
>>>> Good eye, this was fixed in the manuscript a while ago.
>>>>
>>>> I will ping Manning to re-publish Chapters 1-6 since a lot of small
>>>> updates have happened since then.
>>>>
>>>> On Fri, Apr 23, 2010 at 4:53 PM, Jeff Eastman
>>>> <jdog@windwardsolutions.com> wrote:
>>>>
>>>>> Section 4.5.1 says:
>>>>> "The third line shows how it is based on item-item similarities, not
>>>>> user-user similarities as before. The algorithms are similar, but not
>>>>> entirely symmetric. They do have notably different properties. For
>>>>> instance, the running time of an item-based recommender scales up as the
>>>>> number of items increases, whereas a user-based recommender’s running
>>>>> time goes up as the number of users increases.
>>>>>
>>>>> This suggests one reason that you might choose an item-based
>>>>> recommender: if the number of users is relatively low compared to the
>>>>> number of items, the performance advantage could be significant."
>>>>>
>>>>> Shouldn't the second paragraph be?
>>>>>
>>>>> "This suggests one reason that you might choose an item-based
>>>>> recommender: if the number of users is relatively *high* compared to the
>>>>> number of items, the performance advantage could be significant."
>>>>>
>
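
The directory renaming described in this thread (points becoming
clusteredPoints, and the per-algorithm cluster directories collapsing to
clusters-n) can be sketched as a small name mapping. This is only an
illustration of the convention as stated in Jeff's message: the helper
new_dir_name is hypothetical and is not part of Mahout, and leaving a bare
"clusters" directory unchanged is an assumption, since the message does not
say which index it would take.

```python
import re

# Old per-algorithm directory prefixes named in the message.
OLD_PREFIXES = ("canopies", "clusters", "state")
_NUMBERED = re.compile(r"^(%s)-(\d+)$" % "|".join(OLD_PREFIXES))


def new_dir_name(old_name):
    """Map an old output directory name to the new convention.

    Hypothetical helper for illustration; not a Mahout API.
    """
    if old_name == "points":
        # Clustered points directory was renamed.
        return "clusteredPoints"
    match = _NUMBERED.match(old_name)
    if match:
        # canopies-n, clusters-n and state-n all become clusters-n.
        return "clusters-" + match.group(2)
    if old_name == "canopies":
        # Canopy output is written to clusters-0 by default, per the message.
        return "clusters-0"
    # Assumption: anything else (including a bare "clusters") is left as-is.
    return old_name
```

For example, new_dir_name("state-3") gives "clusters-3" and
new_dir_name("points") gives "clusteredPoints".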
