lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Term Frequency for phrases
Date Fri, 08 Jan 2010 17:50:57 GMT
I'm not going to go into too much code level detail, however I'd index
the phrases using tri-gram shingles, and as uni-grams.  I think
this'll give you the results you're looking for.  You'll be able to
quickly recall the count of a given phrase aka tri-gram such as
"blue_shorts_burough"

On Fri, Jan 8, 2010 at 9:37 AM, hrishim <smarthrish@yahoo.co.in> wrote:
>
> @All : Elaborating the problem
>
> The phrase is being indexed as a single token ...
> I have a Gene tag in the xml document which is like
> <Gene>brain natriuretic peptide </Gene>
> This phrase is  present in the abstract text for the given document .
>
> Code is as :
>
> doc.add(new Field("Gene", geneName, Field.Store.YES,
> Field.Index.ANALYZED,Field.TermVector.YES));
>
> doc.add(new Field("Token", abstractText.toString().toLowerCase(),
> Field.Store.YES, Field.Index.ANALYZED,Field.TermVector.YES));
>
> When I retrieve all tokens as well as genes for a given doc and calculate
> the tf for each of these ,
> a null exception is thrown . Term = brain natriuretic peptide
>
> TermDocs termDocs = indexReader.termDocs(term);
> termDocs.next();
> double tf = termDocs.freq();
>
> Regards,
> Hrishi
>
>
> Grant Ingersoll-6 wrote:
>>
>> When do you detect that they are phrases?  During indexing or during
>> search?
>>
>> On Jan 8, 2010, at 5:16 AM, hrishim wrote:
>>
>>>
>>> Hi .
>>> I have phrases like brain natriuretic peptide indexed as a single token
>>> using Lucene.
>>> When I calculate the term frequency for the same  the count is 0 since
>>> the
>>> tokens from the text are indexed separately i.e. brain , natriuretic ,
>>> peptide.
>>> Is there a way to solve this problem and get the term frequency for the
>>> entire phrase ?
>>>
>>> Regards,
>>> Hrishi
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Term-Frequency-for-phrases-tp27073866p27073866.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> --
> View this message in context: http://old.nabble.com/Term-Frequency-for-phrases-tp27073866p27079648.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message