lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Indexing/Querying Annotations and Fields for a document
Date Mon, 17 Mar 2008 16:33:25 GMT
I think there are a couple of ways you can approach this, although I  
have never used GATE.

If these annotations are marked inline in your content, then you can
either preprocess the files to pull the annotations out separately and
index them as you normally would, or you can use the relatively new
TeeTokenFilter and SinkTokenizer to extract them as you go for use in
other fields. I have done this successfully for some apps that I have
worked on, and I think it works quite nicely and beats preprocessing
IMO. Essentially, you set up a TeeTokenFilter that recognizes your
Person tokens and sets each one aside in the Sink. Then, when you
construct the Person field, you use the SinkTokenizer.
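
For comparison, here is a minimal sketch of the preprocessing route in plain Java. It assumes the annotations look like the [<Person> John </Person> loves fishing] example and are not nested. The class and method names (AnnotationSplitter, extractPersons, stripTags) are hypothetical and are not part of Lucene or GATE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits inline-annotated text into plain content
// and a list of Person values before anything is handed to the indexer.
final class AnnotationSplitter {
    // Matches <Person> ... </Person>, tolerating whitespace inside the tags.
    // Assumption: Person annotations are not nested.
    private static final Pattern PERSON =
        Pattern.compile("<\\s*Person\\s*>(.*?)<\\s*/\\s*Person\\s*>", Pattern.DOTALL);

    /** Returns the trimmed text of every Person annotation, in document order. */
    static List<String> extractPersons(String annotated) {
        List<String> persons = new ArrayList<>();
        Matcher m = PERSON.matcher(annotated);
        while (m.find()) {
            persons.add(m.group(1).trim());
        }
        return persons;
    }

    /** Returns the content with Person tags removed and whitespace normalized. */
    static String stripTags(String annotated) {
        return PERSON.matcher(annotated)
                     .replaceAll("$1")          // keep the annotated text, drop the tags
                     .replaceAll("\\s+", " ")   // collapse the gaps the tags leave behind
                     .trim();
    }

    public static void main(String[] args) {
        String doc = "<Person> John </Person> loves fishing";
        System.out.println(extractPersons(doc));
        System.out.println(stripTags(doc));
    }
}
```

The stripped text would then go into your main content field and each extracted value into a separate Person field, so the two can be searched independently. The Tee/Sink approach gets you the same split in a single analysis pass instead of a separate pass over the files.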


On Mar 17, 2008, at 8:54 AM, lucene-seme1 s wrote:

> Hello,
> I am a newbie here and still experimenting with Lucene. I have
> annotations and features generated by GATE for many documents and
> would like to index the original content of the documents in addition
> to the generated annotations. The annotations are in the form of
> [<Person> John </Person> loves fishing]. I would like to be able to
> search using the Person attribute.
> Any hint or suggestion is highly appreciated
> regards,
> JK

Grant Ingersoll
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
