Ted,
Thank you for the tip.
>
> rootLLR = signum(k11/k1* - k21/k2*) * sqrt(LLR)
>
I didn't get what k1* and k2* are. I used (k11+k12) and (k21+k22) in
the denominators. That gives the correct result.
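Here is a minimal Python sketch of the signed score, with k1* = k11 + k12 and k2* = k21 + k22 (the row totals). It assumes a cluster-labeling table layout (k11 = in-cluster docs containing the term, k12 = in-cluster docs without it, k21 and k22 the same counts outside the cluster). That layout, the helper names llr and root_llr, and the counts in the usage lines are all hypothetical. llr computes the standard G^2 statistic for a 2x2 table.

```python
import math


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 log-likelihood ratio for the 2x2 table [[k11, k12], [k21, k22]]."""
    cells = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    n = sum(rows)
    total = 0.0
    for i in range(2):
        for j in range(2):
            k = cells[i][j]
            if k > 0:  # empty cells contribute 0 (0 * log 0 -> 0)
                expected = rows[i] * cols[j] / n
                total += k * math.log(k / expected)
    return 2.0 * total


def root_llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Signed root LLR: signum(k11/k1* - k21/k2*) * sqrt(LLR)."""
    k1_star = k11 + k12  # row total: docs in the cluster (assumed layout)
    k2_star = k21 + k22  # row total: docs outside the cluster (assumed layout)
    if k1_star == 0 or k2_star == 0:
        return 0.0  # hypothetical choice: nothing to compare against
    diff = k11 / k1_star - k21 / k2_star
    sign = (diff > 0) - (diff < 0)  # signum: +1, 0 or -1
    # max() guards against a tiny negative LLR from floating-point rounding
    return sign * math.sqrt(max(llr(k11, k12, k21, k22), 0.0))


# Hypothetical counts: 1,000 docs in the cluster, 9,000 outside.
print(root_llr(100, 900, 100, 8900))  # term over-represented in the cluster: positive
print(root_llr(10, 990, 1800, 7200))  # term under-represented in the cluster: negative
```

Squaring root_llr gives back the raw LLR, so ranking by it only changes how under-represented terms are treated: they move to the bottom instead of the top.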
shashi
On Wed, Jan 13, 2010 at 12:50 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Raw LLR has a large value whenever there is an anomaly in either direction. In this case, Term2
> is rare in the cluster and common outside it, and is thus an anomaly.
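To illustrate why a rare-in-cluster term can outscore a common-in-cluster one, the following minimal sketch computes raw LLR (the G^2 statistic) for two tables. The counts are invented, not the data behind the 3569/3622 scores in the thread. The layout is assumed to be k11/k12 = cluster docs with/without the term and k21/k22 = other docs with/without the term, and the function name llr is hypothetical.

```python
import math


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Raw G^2 log-likelihood ratio for the 2x2 table [[k11, k12], [k21, k22]].

    It measures how far the table is from independence and is the same
    whether the term is over- or under-represented in the cluster.
    """
    cells = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    n = sum(rows)
    total = 0.0
    for i in range(2):
        for j in range(2):
            k = cells[i][j]
            if k > 0:  # empty cells contribute 0
                expected = rows[i] * cols[j] / n  # count expected under independence
                total += k * math.log(k / expected)
    return 2.0 * total


# Invented counts: 1,000 docs in the cluster, 9,000 outside.
common_inside = llr(100, 900, 100, 8900)   # in 10% of cluster docs, ~1.1% of the rest
common_outside = llr(10, 990, 1800, 7200)  # in 1% of cluster docs, 20% of the rest
print(f"{common_inside:.1f} {common_outside:.1f}")
```

With these invented counts the second table scores higher (roughly 339 versus 212) even though the term is rare in the cluster. This is the same pattern as in the original question: raw LLR reports the size of the anomaly, not its direction.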
>
> One thing that I do is to use a variant of the LLR score:
>
> rootLLR = signum(k11/k1* - k21/k2*) * sqrt(LLR)
>
> This score has two advantages over the basic LLR:
>
> a) it is positive where k11 is larger than expected and negative where it is
> smaller than expected. This resolves your current problem.
>
> b) if there is no difference, it is asymptotically standard normal.
> This allows people to talk about "number of standard deviations", which is a
> more common frame of reference than the chi^2 distribution.
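Property b) can be checked empirically. The sketch below is an illustration, not something from the thread. It draws hypothetical tables in which the term has the same rate p inside and outside the cluster, so there is no real difference, and then reports the mean and standard deviation of the signed scores. If the scores are approximately standard normal, these should come out near 0 and 1. The sample sizes, the rate p, the seed and the helper names are arbitrary, hypothetical choices.

```python
import math
import random
import statistics


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for the 2x2 table [[k11, k12], [k21, k22]]."""
    cells = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    n = sum(rows)
    total = 0.0
    for i in range(2):
        for j in range(2):
            k = cells[i][j]
            if k > 0:
                total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total


def root_llr(k11, k12, k21, k22):
    """signum(k11/k1* - k21/k2*) * sqrt(LLR), with k1* and k2* the row totals."""
    diff = k11 / (k11 + k12) - k21 / (k21 + k22)
    sign = (diff > 0) - (diff < 0)
    return sign * math.sqrt(max(llr(k11, k12, k21, k22), 0.0))


def simulate_null(trials=2000, n_in=200, n_out=800, p=0.2, seed=42):
    """rootLLR scores for tables where the term rate is p both inside and outside."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        k11 = sum(rng.random() < p for _ in range(n_in))   # cluster docs with the term
        k21 = sum(rng.random() < p for _ in range(n_out))  # other docs with the term
        scores.append(root_llr(k11, n_in - k11, k21, n_out - k21))
    return scores


scores = simulate_null()
print(f"mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
```

Under this null, a rootLLR of about 3 can therefore be read as roughly three standard deviations away from "no difference". Raw LLR would instead have to be compared against the chi^2 distribution with one degree of freedom.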
>
>
> On Tue, Jan 12, 2010 at 4:49 AM, Shashikant Kore <shashikant@gmail.com> wrote:
>
>> As far as I can see, Term1 is rarer outside the cluster but common in the
>> cluster (relatively speaking). But when I calculate LLR scores,
>> Term1's score (3569) is lower than that of Term2 (3622). This looks
>> counterintuitive to me. Is it the case that the LLR score is higher if a
>> term is common outside the cluster and rare inside? Can this be
>> "fixed"?
>>
>
> 
> Ted Dunning, CTO
> DeepDyve
>
