lucene-dev mailing list archives

From Wolfgang Hoschek <wolfgang.hosc...@mac.com>
Subject Re: [jira] Commented: (LUCENE-794) Beginnings of a span based highlighter
Date Tue, 06 Feb 2007 07:38:35 GMT
>
>>> I need to read the TokenStream at least twice.
>>> I used the horribly hacky but quick-for-me method of adding a
>>> method to MemoryIndex that accepts a List of Tokens. Any ideas?
>
> I'm not sure about modifying MemoryIndex. It should be easy enough
> to create a subclass of TokenStream ("CachedTokenStream"
> perhaps?) which takes a real TokenStream in its constructor and
> delegates all "next" calls to it (and also records them in a List)
> for the first use. This can then be "rewound" and re-used to
> run through the same set of tokens held in the list from the first
> run.
>

Yes, as Mark points out, this can be done without an API change via the
existing MemoryIndex.addField(String fieldName, TokenStream stream).

The TokenStream could be constructed along similar lines to
MemoryIndex.keywordTokenStream(Collection), or perhaps along similar
lines to
org.apache.lucene.index.memory.AnalyzerUtil.getTokenCachingAnalyzer(Analyzer).
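
To make the caching idea concrete, here is a minimal sketch of a stream that records tokens on the first pass and replays them after a rewind. It deliberately does not use the Lucene classes: SimpleTokenStream is a hypothetical stand-in that only mimics the "next() returns null at end of stream" contract, tokens are plain Strings, and the names CachingTokenStream, ListTokenStream and rewind() are made up for illustration. A real version would presumably extend TokenStream and cache Token objects instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the TokenStream contract (not a Lucene type):
// next() returns the next token, or null once the stream is exhausted.
interface SimpleTokenStream {
    String next();
}

// Wraps a one-shot stream, records each token in a list the first time it
// is read, and replays tokens from that list after rewind().
final class CachingTokenStream implements SimpleTokenStream {
    private final SimpleTokenStream source;
    private final List<String> cache = new ArrayList<>();
    private boolean sourceExhausted = false;
    private int position = 0;

    CachingTokenStream(SimpleTokenStream source) {
        this.source = source;
    }

    @Override
    public String next() {
        // Replay from the cache while we are behind the recorded tokens.
        if (position < cache.size()) {
            return cache.get(position++);
        }
        if (sourceExhausted) {
            return null;
        }
        // Otherwise delegate to the wrapped stream and record the result.
        String token = source.next();
        if (token == null) {
            sourceExhausted = true;
            return null;
        }
        cache.add(token);
        position++;
        return token;
    }

    // Start over; subsequent next() calls are served from the cache first.
    void rewind() {
        position = 0;
    }
}

// Simple one-shot source over a fixed list, used only to exercise the sketch.
final class ListTokenStream implements SimpleTokenStream {
    private final Iterator<String> tokens;

    ListTokenStream(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    @Override
    public String next() {
        return tokens.hasNext() ? tokens.next() : null;
    }
}
```

With something along these lines, the stream could be consumed once, rewound, and then handed to MemoryIndex.addField(String fieldName, TokenStream stream) for the second pass. Rewinding before the first pass has finished is also handled: the cached tokens are replayed and the remaining ones are then pulled from the wrapped stream.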

If needed, an IndexReader can be created from a MemoryIndex via  
MemoryIndex.createSearcher().getIndexReader(), again without API change.

Wolfgang.


