lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Catalin Mititelu <catalinmitit...@yahoo.com>
Subject Re: get terms by positions
Date Tue, 03 Oct 2006 06:12:27 GMT
Hi,
I have the same problem.
This is useful when you try to extract the contexts (terms before and after) of a certain
term (for example).
I found a solution but it performs badly: when you try to retrieve those contexts you have
to re-tokenize the documents containing the given term (i.e. "soccer") and you have to keep
the before and after tokens using TokenStream and the current position. Re-tokenizing could
be ok on small files, but on large files it induces a bad performance of the applications.
Any different solution is welcome.

Catalin.

----- Original Message ----
From: NicolasLalevée <nicolas.lalevee@anyware-tech.com>
To: java-user@lucene.apache.org
Sent: Tuesday, October 3, 2006 1:04:20 AM
Subject: Re: get terms by positions

Le Lundi 02 Octobre 2006 23:06, Renzo Scheffer a écrit :
> Hi,
>
>
>
> can anybody be so kind to tell me if it is possible to search a Term by its
> position?
>
>
>
> I search a term (for excample "soccer") and get back the DocId's and
> positions as follows:
>
>
>
>
>
> TermPositions termPos = reader.termPositions(new
> Term("contents","soccer"));
>
> while(termPos.next()){
>
> int freq = termPos.freq();
>
> for(int i=0; i<freq; i++){
>
>
>
>       int docNumber = termPos.doc();
>
>       int position = termPos.nextPosition();
>
> System.out.println("DocId: "+docNumber+"; Pos:"+position);
>
> }
>
>
>
>
>
>
>
> Output:
>
>
>
> DocId: 0; Pos: 1
>
> DocId: 0; Pos: 4
>
> DocId: 0; Pos: 7
>
> DocId: 1; Pos: 3
>
> DocId: 1; Pos: 7
>
>
>
> Now I try to get back terms, one position before/after "soccer". I
> considered to take the
>
> Position and increase or decrease it. But I can't find a way to get back a
> term, according to the given Position.
>
> Can anybody help me?
>

I think this is a non-sense to try to find a term. In Lucene, you search with 
a term, you are not trying to get some. Basically, in Lucene, you have a list 
of term pointing on documents, not the reverse.

Maybe if you explain why you are trying to do that, we can find a better way 
to do it.

Nicolas

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org








Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message