lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Get term id from dictionary
Date Wed, 31 Oct 2007 13:19:31 GMT
You can check out the file format of Lucene's term dictionary here: 
http://lucene.apache.org/java/docs/fileformats.html#Term%20Dictionary

That might give you some insight.

As far as I can tell, Lucene does not keep IDs for terms, only for
documents, and a document ID is really just an offset. Because you look
up the term you want and then follow an ID/offset to the docs that
contain it, I don't see a mechanism for anything like the reverse.

You can access the term dictionary with TermEnum:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/TermEnum.html

So you could count how many times you call next() to reach each term in
your doc and use that count as an ID. That would be pretty slow, though.
Others might have good ideas for this.
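
Here is a rough sketch of the counting idea in plain Java, with no
Lucene calls at all. A sorted set stands in for the term dictionary, and
walking it stands in for repeated next() calls on a TermEnum. The class
name TermOrdinals, the helper methods, and the lowercase/whitespace
tokenization are all assumptions made for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: assigns each term an id equal
// to its position in a sorted dictionary, i.e. the number of "next()"
// steps needed to reach it.
class TermOrdinals {

    // Assumed tokenization: lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect all terms into a sorted set (the stand-in dictionary), then
    // walk it once, counting steps; the count at each term is its id.
    static Map<String, Integer> buildIds(List<String> docs) {
        TreeSet<String> dictionary = new TreeSet<>();
        for (String doc : docs) {
            dictionary.addAll(tokenize(doc));
        }
        Map<String, Integer> ids = new HashMap<>();
        int steps = 0;
        for (String term : dictionary) {
            steps++;
            ids.put(term, steps);
        }
        return ids;
    }

    // Represent a document as the list of ids of its terms, in order.
    static List<Integer> encode(String doc, Map<String, Integer> ids) {
        List<Integer> encoded = new ArrayList<>();
        for (String term : tokenize(doc)) {
            encoded.add(ids.get(term));
        }
        return encoded;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("The sea is blue", "Sky is blue");
        Map<String, Integer> ids = buildIds(docs);
        for (String doc : docs) {
            System.out.println(doc + " -> " + encode(doc, ids));
        }
    }
}
```

Note that ids assigned this way follow sort order, not first-seen order,
and they shift whenever a term that sorts earlier is added. If you need
ids that never change, you would have to keep the term -> id map
yourself.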

- Mark

Ilias Flaounas wrote:
> I want to have IDs for the terms (words) not the documents!
> Also, I need the same ID for a word if it appears in more than one document.
>
> Example:
> Doc1: The sea is blue
> Doc2: Sky is blue
>
> For these two docs the dictionary would be [the]->1 [sea]->2 [is]->3
> [blue]->4 [sky]->5
>
> So I want to represent these docs by word-ids like this:
> Doc1: 1 2 3 4
> Doc2: 5 3 4
>
> Is there a way to use Lucene for this? I mean Lucene stores an
> internal dictionary. How can I access it?
>
> Thank you,
> Ilias
>
>
> On 10/31/07, Mark Miller <markrmiller@gmail.com> wrote:
>   
>> The id does change. You need to index your own "id" field with the document.
>>
>>
>> Ilias Flaounas wrote:
>>     
>>> Dear experts,
>>>
>>> I need to store and index a string of text into Lucene, and later I
>>> want to get the Id of each term inside this string. Is it possible?
>>> How can I do that?
>>>
>>> I want a unique association, term (in my case a word) -> Id. I know,
>>> that if I delete a document, the dictionary changes. Does the term id
>>> change?
>>>
>>>
>>> Thanks a lot
>>> Ilias
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
