Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <AANLkTi=ZxWf48CiEOFDO7DZNMq5G2u10VTAw-5f2+AZT@mail.gmail.com>
References: <24931.12945.qm@web50301.mail.re2.yahoo.com>
 <531769.90099.qm@web113315.mail.gq1.yahoo.com>
 <AANLkTi=ZxWf48CiEOFDO7DZNMq5G2u10VTAw-5f2+AZT@mail.gmail.com>
From: ahmed algohary <algoharyalex@gmail.com>
Date: Sun, 22 Aug 2010 15:27:27 +0200
Message-ID: <AANLkTikmc5pE7P9Nm04kKzc11Ly2jLter8gQyHwsAWLq@mail.gmail.com>
Subject: Re: Calculate Term Co-occurrence Matrix
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=0016364ecd62689835048e697fe6

--0016364ecd62689835048e697fe6
Content-Type: text/plain; charset=ISO-8859-1

I think I got it.

In the CollectionIndexer class, I have added the co-occurrence score to the
index document:

 doc.add(new Field("score", String.valueOf(collocation.getScore()),
                 Field.Store.YES, Field.Index.NOT_ANALYZED));

Then, in the CollectionSearcher, the score can be read back from the stored
field:

 d.get("score")

Is that correct?
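
One thing to keep in mind with this approach: in Lucene 3.0.x a stored field
comes back from d.get("score") as a String (or null if the field is absent),
so it has to be parsed before it can be used as a number. A minimal sketch of
that round trip, using a plain Map as a stand-in for the Lucene document. The
class ScoreRoundTrip and its store/load methods are made up for illustration;
they are not part of Lucene or of the LUCENE-474 patch.

```java
import java.util.HashMap;
import java.util.Map;

public class ScoreRoundTrip {
    // Stand-in for adding a stored field: the value is kept as a string,
    // as it would be for a stored, NOT_ANALYZED field.
    public static void store(Map<String, String> doc, String name, double score) {
        doc.put(name, Double.toString(score));
    }

    // Stand-in for d.get(name): read the stored string back and parse it.
    public static double load(Map<String, String> doc, String name) {
        String raw = doc.get(name);
        if (raw == null) {
            throw new IllegalArgumentException("no stored field: " + name);
        }
        return Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>(); // hypothetical document
        store(doc, "score", 0.4);
        System.out.println(load(doc, "score")); // prints 0.4
    }
}
```

The null check matters because documents indexed before the "score" field was
added would not have a stored value to parse.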

On Sun, Aug 22, 2010 at 2:47 PM, ahmed algohary <algoharyalex@gmail.com> wrote:

> Thanks! It is exactly what I need. But isn't there a way to get the
> matching score?
>
> For example, that "damaged" co-occurs with "shipment" with a
> probability of 0.4?
>
>
> On Sun, Aug 22, 2010 at 5:35 AM, Ivan Provalov <iprovalo@yahoo.com> wrote:
>
>> Ahmed,
>>
>> FYI, I updated the term collocations package I mentioned earlier with a
>> few fixes and changes which will make it work for Lucene 3.0.2.  This may
>> help your task.
>>
>> See:
>> https://issues.apache.org/jira/browse/LUCENE-474
>>
>> Thanks,
>>
>> Ivan Provalov
>>
>>
>> --- On Sat, 8/21/10, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>>
>> > From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
>> > Subject: Re: Calculate Term Co-occurrence Matrix
>> > To: java-user@lucene.apache.org
>> > Date: Saturday, August 21, 2010, 8:05 AM
>> > Ahmed,
>> >
>> > That's what that KPE (link in my previous email, below) will do for
>> > you.  It's not open source at this time, but that is exactly one of
>> > the things it does.  I think Mahout collocations stuff might work for
>> > you, too.
>> >
>> > Otis
>> > ----
>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> > Lucene ecosystem search :: http://search-lucene.com/
>> >
>> >
>> >
>> > ----- Original Message ----
>> > > From: ahmed algohary <algoharyalex@gmail.com>
>> > > To: java-user@lucene.apache.org
>> > > Sent: Sat, August 21, 2010 7:20:03 AM
>> > > Subject: Re: Calculate Term Co-occurrence Matrix
>> > >
>> > > Thanks for all your answers!
>> > >
>> > > It seems I did not make my question clear. I have a text corpus and
>> > > I need to determine the pairs of words that occur together in many
>> > > documents. I need this to be able to measure the semantic proximity
>> > > between words. This method is described here:
>> > > <http://forums.searchenginewatch.com/showthread.php?t=48>
>> > > I hope to find some code that, given a text corpus, generates all
>> > > the word pairs with their probability of occurring together.
>> > >
>> > >
>> > > On Sat, Aug 21, 2010 at 1:46 AM, Otis Gospodnetic
>> > > <otis_gospodnetic@yahoo.com> wrote:
>> > >
>> > > > There is also a non-Mahout Key Phrase Extractor for Collocations,
>> > > > SIPs, and a few other things:
>> > > > http://sematext.com/products/key-phrase-extractor/index.html
>> > > >
>> > > > One of the demos that uses news data is at
>> > > > http://sematext.com/demo/kpe/index.html
>> > > >
>> > > > Otis
>> > > > ----
>> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> > > > Lucene ecosystem search :: http://search-lucene.com/
>> > > >
>> > > >
>> > > >
>> > > > ----- Original Message ----
>> > > > > From: Grant Ingersoll <gsingers@apache.org>
>> > > > > To: java-user@lucene.apache.org
>> > > > > Sent: Fri, August 20, 2010 8:52:17 AM
>> > > > > Subject: Re: Calculate Term Co-occurrence Matrix
>> > > > >
>> > > > > You might also be interested in Mahout's collocations package:
>> > > > > http://cwiki.apache.org/confluence/display/MAHOUT/Collocations
>> > > > >
>> > > > > -Grant
>> > > > > On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I need to know if there is a Lucene plug-in or a Lucene-based
>> > > > > > API for calculating the term co-occurrence matrix for a given
>> > > > > > text corpus.
>> > > > > >
>> > > > > > Thanks!
>> > > > > >
>> > > > > > --
>> > > > > > Ahmed
>> > > > >
>> > > > > --------------------------
>> > > > > Grant  Ingersoll
>> > > > > http://www.lucidimagination.com/
>> > > > >
>> > > > > Search the Lucene ecosystem using Solr/Lucene:
>> > > > > http://www.lucidimagination.com/search
>> > > > >
>> > > > >
>> > > > > ---------------------------------------------------------------------
>> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > > > >
>> > > > >
>> > >
>
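
For reference, the pair probabilities asked about in the quoted thread can
also be computed directly, without Lucene. The sketch below is only one
possible reading of "probability of occurring together": the fraction of
documents that contain both words. The class name CooccurrenceSketch, the
naive tokenizer, and the five-document toy corpus are all assumptions made up
for illustration; this is not the scoring used by the LUCENE-474 package or
by Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class CooccurrenceSketch {
    /**
     * For each unordered pair of distinct terms, keyed as "a b" with a < b,
     * returns the fraction of documents that contain both terms.
     */
    public static Map<String, Double> pairProbabilities(List<String> docs) {
        Map<String, Integer> pairCounts = new TreeMap<>();
        for (String doc : docs) {
            // Distinct terms of this document, in sorted order.
            TreeSet<String> terms = new TreeSet<>();
            for (String t : doc.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (!t.isEmpty()) {
                    terms.add(t);
                }
            }
            // Count every pair once per document.
            List<String> sorted = new ArrayList<>(terms);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairCounts.merge(sorted.get(i) + " " + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        // Turn document counts into fractions of the corpus.
        Map<String, Double> probs = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / (double) docs.size());
        }
        return probs;
    }

    public static void main(String[] args) {
        // Toy corpus, invented for this example.
        List<String> docs = Arrays.asList(
                "damaged shipment arrived",
                "shipment delayed",
                "damaged shipment returned",
                "invoice sent",
                "shipment arrived");
        Map<String, Double> probs = pairProbabilities(docs);
        // "damaged" and "shipment" share 2 of 5 documents.
        System.out.println(probs.get("damaged shipment")); // prints 0.4
    }
}
```

The number of pairs grows quadratically with the number of distinct terms per
document, so for a real corpus some pruning (stop words, a minimum document
frequency) would likely be needed before building the full matrix.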

--0016364ecd62689835048e697fe6--