Mailing-List: contact mahout-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mahout-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <001601ca26fa$ae69a8c0$0b3cfa40$@unisa.it>
References: <000a01ca2654$e9d7c1e0$bd8745a0$@unisa.it>
	 <e2e029610908260715n2945fb7ckd7f0ea63cd1b7564@mail.gmail.com>
	 <000801ca265c$51a44fd0$f4ecef70$@unisa.it>
	 <e2e029610908260757u5b10e120j72676d97d508705d@mail.gmail.com>
	 <000001ca26ed$f2c175f0$d84461d0$@unisa.it>
	 <e2e029610908270124i5dc7d2e3td4e0ec5cd750dbc3@mail.gmail.com>
	 <001201ca26f8$264c25f0$72e471d0$@unisa.it>
	 <e2e029610908270229s6d4b4dccm416730f3df47ceb2@mail.gmail.com>
	 <001601ca26fa$ae69a8c0$0b3cfa40$@unisa.it>
Date: Thu, 27 Aug 2009 10:58:36 +0100
Message-ID: <e2e029610908270258q3c49401cw43ab3d826b9d74d5@mail.gmail.com>
Subject: Re: R: R: R: R: Problems with evaluator.
From: Sean Owen <srowen@gmail.com>
To: mahout-user@lucene.apache.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Right, in that case you can pass any threshold less than 1. This will
cause the framework to pick the first "at" items it encounters as
relevant. Since there is no way of differentiating the items, this is
as good as anything. So if you are evaluating precision at 3, it will
pick some 3 items as relevant. Sounds like you already tried passing
Double.NEGATIVE_INFINITY, which is fine. If it is still not giving
results, my other guesses still stand -- low evaluation percentage?
too few preferences per user? I think you need to go in with a
debugger; it's hard to guess more from here. I can tell you for what
it's worth that I am using the same evaluator (in the main Java code
of course) to evaluate results in a similar data set and it appears to
work fine for me. There could be some issue in the port, or perhaps
there was a bug fix since the port happened? I don't remember a bug
fix here in a while but could be missing something.
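
To make the threshold behaviour concrete, here is a minimal standalone sketch. It is not the Mahout code: the class and method names (RelevanceSplitSketch, pickRelevant, trainingItems) are made up for illustration, and the cap at "at" items is written from the description above rather than copied from the evaluator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Mahout.
public class RelevanceSplitSketch {

  // Pick at most `at` items whose preference value meets the threshold,
  // in encounter order. With boolean data every value is 1.0, so any
  // threshold below 1 simply selects the first `at` items.
  public static Set<String> pickRelevant(Map<String, Double> prefs, double threshold, int at) {
    Set<String> relevant = new LinkedHashSet<>();
    for (Map.Entry<String, Double> pref : prefs.entrySet()) {
      if (relevant.size() >= at) {
        break;
      }
      if (pref.getValue() >= threshold) {
        relevant.add(pref.getKey());
      }
    }
    return relevant;
  }

  // Everything that was not held out as relevant stays in the training data.
  public static List<String> trainingItems(Map<String, Double> prefs, Set<String> relevant) {
    List<String> training = new ArrayList<>();
    for (String item : prefs.keySet()) {
      if (!relevant.contains(item)) {
        training.add(item);
      }
    }
    return training;
  }

  public static void main(String[] args) {
    // One user with five boolean preferences (all values are 1.0).
    Map<String, Double> prefs = new LinkedHashMap<>();
    for (String item : new String[] {"A", "B", "C", "D", "E"}) {
      prefs.put(item, 1.0);
    }
    Set<String> relevant = pickRelevant(prefs, Double.NEGATIVE_INFINITY, 3);
    System.out.println("relevant = " + relevant);                       // relevant = [A, B, C]
    System.out.println("training = " + trainingItems(prefs, relevant)); // training = [D, E]
  }
}
```

With precision at 3 and five preferences, two items remain for training. A user with three or fewer preferences would be left with nothing to train on in this sketch, which is one way "too few preferences per user" could show up.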

On Thu, Aug 27, 2009 at 10:42 AM, Claudia Grieco <grieco@crmpa.unisa.it> wrote:
> But since I'm using boolean preference data, all the preferences are
> relevant (score 1), so they are all erased from the training data:
>
> for (Preference pref : prefs) {
>   if (pref.getValue() >= theRelevanceThreshold) {
>     relevantItems.add(pref.getItem());
>   }
> }
>
> -----Original Message-----
> From: Sean Owen [mailto:srowen@gmail.com]
> Sent: Thursday, 27 August 2009 11:29
> To: mahout-user@lucene.apache.org
> Subject: Re: R: R: R: Problems with evaluator.
>
> Yes, that is correct. The framework splits the user's preferences into
> "relevant" and "not relevant" items. It then takes away the relevant
> items, and leaves the non-relevant items in the training data. Then,
> it sees how many of those relevant items are recommended back to the
> user, to compute precision and recall.
>
> On Thu, Aug 27, 2009 at 10:24 AM, Claudia Grieco <grieco@crmpa.unisa.it> wrote:
>> Notice this part:
>> for (Preference pref : prefs2) {
>>   if (!relevantItems.contains(pref.getItem())) {
>>     trainingPrefs.add(pref);
>>   }
>> }
>> It adds a preference only if it's NOT in the relevant items
>
>
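
For reference, the last step described in the quoted message (counting how many held-out relevant items are recommended back) can be sketched as below. This uses the textbook definitions of precision and recall, not the evaluator's actual code. The class name PrecisionRecallSketch and its methods are hypothetical, and returning 0.0 for an empty denominator is an assumption made here for simplicity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Mahout.
public class PrecisionRecallSketch {

  // Number of recommended items that were held out as relevant.
  // Assumes the recommendation list contains no duplicates.
  public static int hits(List<String> recommended, Set<String> relevant) {
    int count = 0;
    for (String item : recommended) {
      if (relevant.contains(item)) {
        count++;
      }
    }
    return count;
  }

  // Fraction of the recommended items that are relevant.
  public static double precision(List<String> recommended, Set<String> relevant) {
    if (recommended.isEmpty()) {
      return 0.0; // assumption: treat "no recommendations" as zero precision
    }
    return (double) hits(recommended, relevant) / recommended.size();
  }

  // Fraction of the relevant items that were recommended back.
  public static double recall(List<String> recommended, Set<String> relevant) {
    if (relevant.isEmpty()) {
      return 0.0; // assumption: treat "no relevant items" as zero recall
    }
    return (double) hits(recommended, relevant) / relevant.size();
  }

  public static void main(String[] args) {
    Set<String> relevant = new HashSet<>(Arrays.asList("A", "B", "C"));
    List<String> recommended = Arrays.asList("A", "X");
    // One of the two recommendations is relevant; one of the three relevant items came back.
    System.out.println("precision = " + precision(recommended, relevant));
    System.out.println("recall    = " + recall(recommended, relevant));
  }
}
```

If the recommender returns nothing for a user, both values are zero in this sketch, so an evaluator that reports no results may be failing earlier, at the recommendation step.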