lucene-dev mailing list archives

From: Wolfgang Hoschek <whosc...@lbl.gov>
Subject: Re: [Performance] Streaming main memory indexing of single strings
Date: Sat, 16 Apr 2005 01:50:38 GMT
On Apr 15, 2005, at 5:55 PM, Erik Hatcher wrote:

> On Apr 15, 2005, at 8:18 PM, Wolfgang Hoschek wrote:
>> The main issue is to enable handling arbitrary queries (anything  
>> derived from o.a.l.search.Query). Yes, there'd be an additional  
>> method Analyzer parameter (support any analyzer). The use case does  
>> not require field names. One could internally use "content" or  
>> anything else for the default field name, which is the one to  
>> implicitly be queried...
>
> To be general-purpose, though, passing in the default field name would  
> be needed.  All of Lucene's built-in analyzers ignore it, but that is  
> not the case everywhere.  For example, NutchAnalyzer does different  
> things for different fields.
>
> So, all the text analyzed is in a given field... that means that  
> anything in the Query not associated with that field has no bearing on  
> whether the text matches or not, correct?

Right, it has no bearing. A query wouldn't specify any fields; it just
uses the implicit default field name. Here's an example:

(: An XQuery that finds all books authored by James that have something  
to do with "fish", sorted by relevance :)
declare namespace lucene = "java:nux.xom.xquery.XQueryUtil";
declare variable $query := "fish*~"; (: any arbitrary fuzzy lucene  
query goes here :)

for $book in /books/book[author="James" and lucene:match(string(.),  
$query) > 0.0]
let $score := lucene:match(string($book), $query)
order by $score descending
return (<score>{$score}</score>, $book)

Some skeleton code:

	private static final String FIELD_NAME = "content"; // or whatever - it doesn't matter

	public Query parseQuery(String expression) throws ParseException {
		QueryParser parser = new QueryParser(FIELD_NAME, analyzer);
		return parser.parse(expression);
	}

	private Document createDocument(String content) {
		Document doc = new Document();
		doc.add(Field.UnStored(FIELD_NAME, content));
		return doc;
	}
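
A match method tying the two together would be essentially the same
RAMDirectory round trip as in the benchmark code quoted further below -
an untested sketch:

	public float match(String text, Query query) {
		try {
			Directory dir = new RAMDirectory();
			IndexWriter writer = new IndexWriter(dir, analyzer, true);
			writer.addDocument(createDocument(text)); // the one and only "document"
			writer.close();

			Searcher searcher = new IndexSearcher(dir);
			try {
				Hits hits = searcher.search(query);
				return hits.length() > 0 ? hits.score(0) : 0.0f;
			} finally {
				searcher.close();
			}
		} catch (IOException e) { // should never happen (RAMDirectory)
			throw new RuntimeException(e);
		}
	}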

>
> There is still some missing trickery in my StringIndexReader - it does  
> not currently handle phrase queries as an implementation of  
> termPositions() is needed.
>
> Wolfgang - will you take what I've done the extra mile and implement  
> what's left (frequency and term position)?  I might not revisit this  
> very soon.

I'm not sure I'll be able to pull it off, but I'll see what I can do.  
If someone more competent would like to help out, let me know... Thanks  
for all the help anyway, Erik and co., it is greatly appreciated!
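
In case it helps whoever picks this up, here is the rough shape I have
in mind for termPositions() - an untested sketch that treats the
String[] as the token sequence and uses a term's array index as its
position (the constructor's Collections.sort() would have to go for
real phrase semantics):

  public TermPositions termPositions() throws IOException {
    return new TermPositions() {
      private boolean done = false;
      private String currentTerm;

      public void seek(Term term) { currentTerm = term.text(); done = false; }
      public void seek(TermEnum termEnum) { seek(termEnum.term()); }
      public int doc() { return 0; }  // the one and only document
      public int freq() { return 1; } // assume each term occurs once
      public int nextPosition() {
        return terms.indexOf(currentTerm); // position = index in token order
      }
      public boolean next() {
        if (done || !terms.contains(currentTerm)) return false;
        done = true;
        return true;
      }
      public int read(int[] docs, int[] freqs) {
        if (!next()) return 0;
        docs[0] = 0;
        freqs[0] = freq();
        return 1;
      }
      public boolean skipTo(int target) { return next(); }
      public void close() {}
    };
  }
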
Wolfgang.



>
> 	Erik
>
>
>
>>
>> Wolfgang.
>>
>> On Apr 15, 2005, at 5:08 PM, Erik Hatcher wrote:
>>
>>> On Apr 15, 2005, at 6:15 PM, Wolfgang Hoschek wrote:
>>>> Cool! For my use case it would need to be able to handle arbitrary  
>>>> queries (previously parsed from a general lucene query string).  
>>>> Something like:
>>>>
>>>> 	float match(String text, Query query)
>>>>
>>>> it's fine with me if it also works for
>>>>
>>>> 	float[] match(String[] texts, Query query) or
>>>> 	float match(Document doc, Query query)
>>>>
>>>> but that isn't required by the use case.
>>>
>>> My implementation is nearly that.  The score is available as  
>>> hits.score(0).  You would also need an analyzer, I presume, passed  
>>> to your proposed match() method if you want the text broken into  
>>> terms.  My current implementation is passed a String[] where each  
>>> item is considered a term for the document.  match() would also need  
>>> a field name to be fully accurate - since the analyzer needs a field  
>>> name and terms used for searching need a field name.  The Query may  
>>> contain terms for any number of fields - how should that be handled?  
>>>  Should only a single field name be passed in and any term requests  
>>> for other fields be ignored?  Or should this utility morph to assume  
>>> any words in the text are in any field being asked of it?
>>>
>>> As for Doug's devil's advocate questions - I really don't know what  
>>> I'd use it for personally (other than the "match this single string  
>>> against a bunch of queries" case); I just thought it was clever that  
>>> it could be done.  Clever regexes could come close, but it'd be a lot  
>>> more effort than reusing good ol' QueryParser and this simplistic  
>>> IndexReader, along with an Analyzer.
>>>
>>> 	Erik
>>>
>>>>
>>>> Wolfgang.
>>>>
>>>>> I am intrigued by this and decided to mock up a quick-and-dirty  
>>>>> example of such an IndexReader.  After a little trial and error I  
>>>>> got it working at least for TermQuery and WildcardQuery.  I've  
>>>>> pasted my code below as an example, but there is much room for  
>>>>> improvement, especially in terms of performance and in keeping  
>>>>> track of term frequency; it would also be nicer if it handled the  
>>>>> analysis internally.
>>>>>
>>>>> I think something like this would make a handy addition to our  
>>>>> contrib area at least.  I'd be happy to receive improvements to  
>>>>> this and then add it to a contrib subproject.
>>>>>
>>>>> Perhaps this would be a handy way to handle situations where users  
>>>>> have queries saved in a system and need to be alerted whenever a  
>>>>> new document arrives matching the saved queries?
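>>>>>
>>>>> For example, something along these lines (sketch; savedQueries and
>>>>> alert() are hypothetical):
>>>>>
>>>>> IndexSearcher searcher = new IndexSearcher(
>>>>>     new StringIndexReader(termsOfNewDocument)); // a String[] of terms
>>>>> for (Iterator i = savedQueries.iterator(); i.hasNext(); ) {
>>>>>   Query query = (Query) i.next();
>>>>>   if (searcher.search(query).length() > 0)
>>>>>     alert(query); // notify whoever saved this query
>>>>> }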
>>>>>
>>>>> 	Erik
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Wolfgang Hoschek [mailto:whoschek@lbl.gov]
>>>>>> Sent: Thursday, April 14, 2005 4:04 PM
>>>>>> To: java-dev@lucene.apache.org
>>>>>> Subject: Re: [Performance] Streaming main memory indexing of  
>>>>>> single
>>>>>> strings
>>>>>>
>>>>>>
>>>>>> This seems to be a promising avenue worth exploring. My gut  
>>>>>> feeling is that this could easily be 10-100 times faster.
>>>>>>
>>>>>> The drawback is that it requires a fair amount of understanding of
>>>>>> intricate Lucene internals, pulling those pieces together and
>>>>>> adapting them as required for the seemingly simple
>>>>>> "float match(String text, Query query)".
>>>>>>
>>>>>> I might give it a shot but I'm not sure I'll be able to pull this
>>>>>> off!
>>>>>> Is there any similar code I could look at as a starting point?
>>>>>>
>>>>>> Wolfgang.
>>>>>>
>>>>>> On Apr 14, 2005, at 1:13 PM, Robert Engels wrote:
>>>>>>
>>>>>>> I think you are not approaching this the correct way.
>>>>>>>
>>>>>>> Pseudo code:
>>>>>>>
>>>>>>> Subclass IndexReader.
>>>>>>>
>>>>>>> Get tokens from String 'document' using Lucene analyzers.
>>>>>>>
>>>>>>> Build simple hash-map based data structures using tokens for terms,
>>>>>>> and term positions (rough sketch below).
>>>>>>>
>>>>>>> Reimplement termDocs() and termPositions() to use the structures
>>>>>>> from above.
>>>>>>>
>>>>>>> Run searches.
>>>>>>>
>>>>>>> Start again with next document.
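>>>>>>>
>>>>>>> Roughly, for the hash-map structures (untested, pre-generics sketch
>>>>>>> using the Lucene 1.4 Token API; the "content" field name is
>>>>>>> arbitrary):
>>>>>>>
>>>>>>> Map terms = new HashMap(); // term text -> List of Integer positions
>>>>>>> TokenStream stream =
>>>>>>>     analyzer.tokenStream("content", new StringReader(document));
>>>>>>> int pos = -1;
>>>>>>> for (Token t = stream.next(); t != null; t = stream.next()) {
>>>>>>>   pos += t.getPositionIncrement();
>>>>>>>   List positions = (List) terms.get(t.termText());
>>>>>>>   if (positions == null) {
>>>>>>>     positions = new ArrayList();
>>>>>>>     terms.put(t.termText(), positions);
>>>>>>>   }
>>>>>>>   positions.add(new Integer(pos)); // feeds termPositions()
>>>>>>> }
>>>>>>> stream.close();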
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Wolfgang Hoschek [mailto:whoschek@lbl.gov]
>>>>>>> Sent: Thursday, April 14, 2005 2:56 PM
>>>>>>> To: java-dev@lucene.apache.org
>>>>>>> Subject: Re: [Performance] Streaming main memory indexing of single
>>>>>>> strings
>>>>>>>
>>>>>>>
>>>>>>> Otis, this might be a misunderstanding.
>>>>>>>
>>>>>>> - I'm not calling optimize(). That piece is commented out if you
>>>>>>> look again at the code.
>>>>>>> - The *streaming* use case requires that for each query I add one
>>>>>>> (and only one) document (aka string) to an empty index:
>>>>>>>
>>>>>>> repeat N times (where N is millions or billions):
>>>>>>> 	add a single string (aka document) to an empty index
>>>>>>> 	query the index
>>>>>>> 	drop index (or delete its document)
>>>>>>>
>>>>>>> with the following API being called N times: float match(String
>>>>>>> text, Query query)
>>>>>>>
>>>>>>> So there's no possibility of adding many documents and thereafter
>>>>>>> running the query. This in turn seems to mean that the IndexWriter
>>>>>>> can't be kept open - unless I manually delete each document after
>>>>>>> each query to repeatedly reuse the RAMDirectory, which I've also
>>>>>>> tried before without any significant performance gain - deletion
>>>>>>> seems to have substantial overhead in itself. Perhaps it would be
>>>>>>> better if there were a Directory.deleteAllDocuments() or similar.
>>>>>>> Did you have some other approach in mind?
>>>>>>>
>>>>>>> As I said, Lucene's design doesn't seem to fit this streaming use
>>>>>>> case pattern well. In *this* scenario one could easily do without
>>>>>>> any locking, and without byte level organization in RAMDirectory
>>>>>>> and RAMFile, etc. because a single small string isn't a large
>>>>>>> persistent multi-document index.
>>>>>>>
>>>>>>> For some background, here's a small example of the kind of XQuery
>>>>>>> functionality Nux/Lucene integration enables:
>>>>>>>
>>>>>>> (: An XQuery that finds all books authored by James that have
>>>>>>> something to do with "fish", sorted by relevance :)
>>>>>>> declare namespace lucene = "java:nux.xom.xquery.XQueryUtil";
>>>>>>> declare variable $query := "fish*~";
>>>>>>>
>>>>>>> for $book in /books/book[author="James" and  
>>>>>>> lucene:match(string(.),
>>>>>>> $query) > 0.0]
>>>>>>> let $score := lucene:match(string($book), $query)
>>>>>>> order by $score descending
>>>>>>> return (<score>{$score}</score>, $book)
>>>>>>>
>>>>>>> More interestingly, one can use this for classifying and routing
>>>>>>> XML messages based on rules (i.e. queries) inspecting their
>>>>>>> content...
>>>>>>>
>>>>>>> Any other clues about potential improvements would be greatly
>>>>>>> appreciated.
>>>>>>>
>>>>>>> Wolfgang.
>>>>>>>
>>>>>>> On Apr 13, 2005, at 10:09 PM, Otis Gospodnetic wrote:
>>>>>>>
>>>>>>>> It looks like you are calling that IndexWriter code in some loops,
>>>>>>>> opening it and closing it in every iteration of the loop and also
>>>>>>>> calling optimize.  All of those things could be improved.
>>>>>>>> Keep your IndexWriter open, don't close it, and optimize the index
>>>>>>>> only once you are done adding documents to it.
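>>>>>>>>
>>>>>>>> In other words, something like this (sketch):
>>>>>>>>
>>>>>>>>   IndexWriter writer = new IndexWriter(dir, analyzer, true);
>>>>>>>>   for (int i = 0; i < texts.length; i++)  // your strings
>>>>>>>>     writer.addDocument(createDocument(texts[i]));
>>>>>>>>   writer.optimize(); // once, after all adds
>>>>>>>>   writer.close();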
>>>>>>>>
>>>>>>>> See the highlights and the snippets in the first hit:
>>>>>>>>   http://www.lucenebook.com/search?query=when+to+optimize
>>>>>>>>
>>>>>>>> Otis
>>>>>>>>
>>>>>>>>
>>>>>>>> --- Wolfgang Hoschek <whoschek@lbl.gov> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm wondering if anyone could let me know how to improve Lucene
>>>>>>>>> performance for "streaming main memory indexing of single
>>>>>>>>> strings". This would help to effectively integrate Lucene with
>>>>>>>>> the Nux XQuery engine.
>>>>>>>>>
>>>>>>>>> Below is a small microbenchmark simulating STREAMING XQuery
>>>>>>>>> fulltext search as typical for XML network routers, message
>>>>>>>>> queuing systems, P2P networks, etc. In this on-the-fly main
>>>>>>>>> memory indexing scenario, each individual string is immediately
>>>>>>>>> matched as soon as it becomes available without any persistence
>>>>>>>>> involved. This usage scenario and corresponding performance
>>>>>>>>> profile is quite different in comparison to fulltext search over
>>>>>>>>> persistent (read-mostly) indexes.
>>>>>>>>>
>>>>>>>>> The benchmark runs at some 3000 Lucene queries/sec
>>>>>>>>> (lucene-1.4.3), which is unfortunate news considering the XQuery
>>>>>>>>> engine can easily walk hundreds of thousands of XML nodes per
>>>>>>>>> second. Ideally I'd like to run at some 100000 queries/sec.
>>>>>>>>> Running this through the JDK 1.5 profiler, it seems that most
>>>>>>>>> time is spent in and below the following calls:
>>>>>>>>>
>>>>>>>>> writer = new IndexWriter(dir, analyzer, true);
>>>>>>>>> writer.addDocument(...);
>>>>>>>>> writer.close();
>>>>>>>>>
>>>>>>>>> I tried quite a few variants of the benchmark with various
>>>>>>>>> options, unfortunately with little or no effect. Lucene just
>>>>>>>>> does not seem to be designed to do this sort of "transient
>>>>>>>>> single string index" thing. All code paths related to opening,
>>>>>>>>> closing, reading, writing, querying and object creation seem to
>>>>>>>>> be designed for large persistent indexes.
>>>>>>>>>
>>>>>>>>> Any advice on what I'm missing or what could be done about it
>>>>>>>>> would be greatly appreciated.
>>>>>>>>>
>>>>>>>>> Wolfgang.
>>>>>>>>>
>>>>>>>>> P.S. the benchmark code is attached as a file below:
>>>>>>>>>
>>>>>>>>> package nux.xom.pool;
>>>>>>>>>
>>>>>>>>> import java.io.IOException;
>>>>>>>>> //import java.io.Reader;
>>>>>>>>>
>>>>>>>>> import org.apache.lucene.analysis.Analyzer;
>>>>>>>>> //import org.apache.lucene.analysis.LowerCaseTokenizer;
>>>>>>>>> //import org.apache.lucene.analysis.PorterStemFilter;
>>>>>>>>> //import org.apache.lucene.analysis.SimpleAnalyzer;
>>>>>>>>> //import org.apache.lucene.analysis.TokenStream;
>>>>>>>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>>>>>>> import org.apache.lucene.document.Document;
>>>>>>>>> import org.apache.lucene.document.Field;
>>>>>>>>> //import org.apache.lucene.index.IndexReader;
>>>>>>>>> import org.apache.lucene.index.IndexWriter;
>>>>>>>>> import org.apache.lucene.queryParser.ParseException;
>>>>>>>>> import org.apache.lucene.queryParser.QueryParser;
>>>>>>>>> import org.apache.lucene.search.Hits;
>>>>>>>>> import org.apache.lucene.search.IndexSearcher;
>>>>>>>>> import org.apache.lucene.search.Query;
>>>>>>>>> import org.apache.lucene.search.Searcher;
>>>>>>>>> import org.apache.lucene.store.Directory;
>>>>>>>>> import org.apache.lucene.store.RAMDirectory;
>>>>>>>>>
>>>>>>>>> public final class LuceneMatcher { // TODO: make non-public
>>>>>>>>>
>>>>>>>>> 	private final Analyzer analyzer;
>>>>>>>>> //	private final Directory dir = new RAMDirectory();
>>>>>>>>>
>>>>>>>>> 	public LuceneMatcher() {
>>>>>>>>> 		this(new StandardAnalyzer());
>>>>>>>>> //		this(new SimpleAnalyzer());
>>>>>>>>> //		this(new StopAnalyzer());
>>>>>>>>> //		this(new Analyzer() {
>>>>>>>>> //			public final TokenStream tokenStream(String fieldName,
>>>>>>>>> //					Reader reader) {
>>>>>>>>> //				return new PorterStemFilter(new LowerCaseTokenizer(reader));
>>>>>>>>> //			}
>>>>>>>>> //		});
>>>>>>>>> 	}
>>>>>>>>>
>>>>>>>>> 	public LuceneMatcher(Analyzer analyzer) {
>>>>>>>>> 		if (analyzer == null)
>>>>>>>>> 			throw new IllegalArgumentException("analyzer must not be null");
>>>>>>>>> 		this.analyzer = analyzer;
>>>>>>>>> 	}
>>>>>>>>>
>>>>>>>>> 	public Query parseQuery(String expression) throws ParseException {
>>>>>>>>> 		QueryParser parser = new QueryParser("content", analyzer);
>>>>>>>>> //		parser.setPhraseSlop(0);
>>>>>>>>> 		return parser.parse(expression);
>>>>>>>>> 	}
>>>>>>>>>
>>>>>>>>> 	/**
>>>>>>>>> 	 * Returns the relevance score by matching the given index
>>>>>>>>> 	 * against the given Lucene query expression. The index must not
>>>>>>>>> 	 * contain more than one Lucene "document" (aka string to be
>>>>>>>>> 	 * searched).
>>>>>>>>> 	 */
>>>>>>>>> 	public float match(Directory index, Query query) {
>>>>>>>>> 		Searcher searcher = null;
>>>>>>>>> 		try {
>>>>>>>>> 			searcher = new IndexSearcher(index);
>>>>>>>>> 			Hits hits = searcher.search(query);
>>>>>>>>> 			float score = hits.length() > 0 ? hits.score(0) : 0.0f;
>>>>>>>>> 			return score;
>>>>>>>>> 		} catch (IOException e) { // should never happen (RAMDirectory)
>>>>>>>>> 			throw new RuntimeException(e);
>>>>>>>>> 		} finally {
>>>>>>>>> 			try {
>>>>>>>>> 				if (searcher != null) searcher.close();
>>>>>>>>> 			} catch (IOException e) { // should never happen (RAMDirectory)
>>>>>>>>> 				throw new RuntimeException(e);
>>>>>>>>> 			}
>>>>>>>>> 		}
>>>>>>>>> 	}
>>>>>>>>>
>>>>>>>>> //	public float match(String text, Query query) {
>>>>>>>>> //		return match(createIndex(text), query);
>>>>>>>>> //	}
>>>>>>>>>
>>>>>>>>> 	public Directory createIndex(String text) {
>>>>>>>>> 		Directory dir = new RAMDirectory();
>>>>>>>>> 		IndexWriter writer = null;
>>>>>>>>> 		try {
>>>>>>>>> 			writer = new IndexWriter(dir, analyzer, true);
>>>>>>>>> //			writer.setUseCompoundFile(false);
>>>>>>>>> //			writer.mergeFactor = 2;
>>>>>>>>> //			writer.minMergeDocs = 1;
>>>>>>>>> //			writer.maxMergeDocs = 1;
>>>>>>>>>
>>>>>>>>> 			writer.addDocument(createDocument(text));
>>>>>>>>> //			writer.optimize();
>>>>>>>>> 			return dir;
>>>>>>>>> 		} catch (IOException e) { // should never happen (RAMDirectory)
>>>>>>>>> 			throw new RuntimeException(e);
>>>>>>>>> 		} finally {
>>>>>>>>> 			try {
>>>>>>>>> 				if (writer != null) writer.close();
>>>>>>>>> 			} catch (IOException e) { // should never happen (RAMDirectory)
>>>>>>>>> 				throw new RuntimeException(e);
>>>>>>>>> 			}
>>>>>>>>> 		}
>>>>>>>>> 	}
>>>>>>>>>
>>>>>>>>> 	private Document createDocument(String content) {
>>>>>>>>> 		Document doc = new Document();
>>>>>>>>> 		doc.add(Field.UnStored("content", content));
>>>>>>>>> //		doc.add(Field.Text("x", content));
>>>>>>>>> 		return doc;
>>>>>>>>> 	}
>>>>>>>>>
>>>>>>>>> 	/**
>>>>>>>>> 	 * Lucene microbenchmark simulating STREAMING XQuery fulltext
>>>>>>>>> 	 * search as typical for XML network routers, message queuing
>>>>>>>>> 	 * systems, P2P networks, etc. In this on-the-fly main memory
>>>>>>>>> 	 * indexing scenario, each individual string is immediately
>>>>>>>>> 	 * matched as soon as it becomes available without any
>>>>>>>>> 	 * persistence involved. This usage scenario and corresponding
>>>>>>>>> 	 * performance profile is quite different in comparison to
>>>>>>>>> 	 * fulltext search over persistent (read-mostly) indexes.
>>>>>>>>> 	 *
>>>>>>>>> 	 * Example XPath:
>>>>>>>>> 	 * count(/table/row[lucene:match(string(./firstname), "James") > 0.0])
>>>>>>>>> 	 */
>>>>>>>>> 	public static void main(String[] args) throws Exception {
>>>>>>>>> 		int k = -1;
>>>>>>>>> 		int runs = 5;
>>>>>>>>> 		if (args.length > ++k) runs = Integer.parseInt(args[k]);
>>>>>>>>>
>>>>>>>>> 		int nodes = 10000;
>>>>>>>>> 		if (args.length > ++k) nodes = Integer.parseInt(args[k]);
>>>>>>>>>
>>>>>>>>> 		String content = "James is out in the woods";
>>>>>>>>> 		if (args.length > ++k) content = args[k];
>>>>>>>>>
>>>>>>>>> 		String expression = "James";
>>>>>>>>> 		if (args.length > ++k) expression = args[k];
>>>>>>>>>
>>>>>>>>> 		LuceneMatcher matcher = new LuceneMatcher();
>>>>>>>>> 		Query query = matcher.parseQuery(expression); // to be reused N times
>>>>>>>>>
>>>>>>>>> 		for (int r = 0; r < runs; r++) {
>>>>>>>>> 			long start = System.currentTimeMillis();
>>>>>>>>> 			int matches = 0;
>>>>>>>>>
>>>>>>>>> 			for (int i = 0; i < nodes; i++) {
>>>>>>>>> //				if (LuceneUtil.match(content + i, expression) > 0.0f) {
>>>>>>>>> 				if (matcher.match(matcher.createIndex(content + i),
>>>>>>>>> 						query) > 0.0f) {
>>>>>>>>> 					matches++;
>>>>>>>>> 				}
>>>>>>>>> 			}
>>>>>>>>>
>>>>>>>>> 			long end = System.currentTimeMillis();
>>>>>>>>> 			System.out.println("matches=" + matches);
>>>>>>>>> 			System.out.println("secs=" + ((end-start) / 1000.0f));
>>>>>>>>> 			System.out.println("queries/sec=" + (nodes / ((end-start) / 1000.0f)));
>>>>>>>>> 			System.out.println();
>>>>>>>>> 		}
>>>>>>>>> 	}
>>>>>>>>> }
>>>>>
>>>>> import java.io.IOException;
>>>>> import java.util.Arrays;
>>>>> import java.util.Collection;
>>>>> import java.util.Collections;
>>>>> import java.util.Iterator;
>>>>> import java.util.List;
>>>>>
>>>>> import org.apache.lucene.document.Document;
>>>>> import org.apache.lucene.document.Field;
>>>>> import org.apache.lucene.index.IndexReader;
>>>>> import org.apache.lucene.index.Term;
>>>>> import org.apache.lucene.index.TermDocs;
>>>>> import org.apache.lucene.index.TermEnum;
>>>>> import org.apache.lucene.index.TermFreqVector;
>>>>> import org.apache.lucene.index.TermPositions;
>>>>> import org.apache.lucene.search.Hits;
>>>>> import org.apache.lucene.search.IndexSearcher;
>>>>> import org.apache.lucene.search.WildcardQuery;
>>>>>
>>>>> public class StringIndexReader extends IndexReader {
>>>>>   private List terms;
>>>>>   public StringIndexReader(String strings[]) {
>>>>>     super(null);
>>>>>     terms = Arrays.asList(strings);
>>>>>     Collections.sort(terms);
>>>>>   }
>>>>>
>>>>>   public TermFreqVector[] getTermFreqVectors(int docNumber) throws  
>>>>> IOException {
>>>>>     System.out.println("StringIndexReader.getTermFreqVectors");
>>>>>     return new TermFreqVector[0];
>>>>>   }
>>>>>
>>>>>   public TermFreqVector getTermFreqVector(int docNumber, String  
>>>>> field) throws IOException {
>>>>>     System.out.println("StringIndexReader.getTermFreqVector");
>>>>>     return null;
>>>>>   }
>>>>>
>>>>>   public int numDocs() {
>>>>>     System.out.println("StringIndexReader.numDocs");
>>>>>     return 1;
>>>>>   }
>>>>>
>>>>>   public int maxDoc() {
>>>>>     System.out.println("StringIndexReader.maxDoc");
>>>>>     return 1;
>>>>>   }
>>>>>
>>>>>   public Document document(int n) throws IOException {
>>>>>     System.out.println("StringIndexReader.document");
>>>>>     return null;
>>>>>   }
>>>>>
>>>>>   public boolean isDeleted(int n) {
>>>>>     System.out.println("StringIndexReader.isDeleted");
>>>>>     return false;
>>>>>   }
>>>>>
>>>>>   public boolean hasDeletions() {
>>>>>     System.out.println("StringIndexReader.hasDeletions");
>>>>>     return false;
>>>>>   }
>>>>>
>>>>>   public byte[] norms(String field) throws IOException {
>>>>>     // TODO: what value to use for this?
>>>>>     System.out.println("StringIndexReader.norms: " + field);
>>>>>     return new byte[] { 1 };
>>>>>   }
>>>>>
>>>>>   public void norms(String field, byte[] bytes, int offset) throws  
>>>>> IOException {
>>>>>     System.out.println("StringIndexReader.norms: " + field + "*");
>>>>>   }
>>>>>
>>>>>   protected void doSetNorm(int doc, String field, byte value)  
>>>>> throws IOException {
>>>>>     System.out.println("StringIndexReader.doSetNorm");
>>>>>
>>>>>   }
>>>>>
>>>>>   public TermEnum terms() throws IOException {
>>>>>     System.out.println("StringIndexReader.terms");
>>>>>     return terms(null);
>>>>>   }
>>>>>
>>>>>   public TermEnum terms(final Term term) throws IOException {
>>>>>     System.out.println("StringIndexReader.terms: " + term);
>>>>>
>>>>>     TermEnum termEnum = new TermEnum() {
>>>>>       private String currentTerm;
>>>>>       private Iterator iter;
>>>>>
>>>>>       public boolean next() {
>>>>>         System.out.println("TermEnum.next");
>>>>>         // advance to the next term; return true only if one exists
>>>>>         if (!iter.hasNext()) return false;
>>>>>         currentTerm = (String) iter.next();
>>>>>         return true;
>>>>>       }
>>>>>
>>>>>       public Term term() {
>>>>>         if (iter == null) {
>>>>>           iter = terms.iterator();
>>>>>           while (next()) {
>>>>>             if (currentTerm.startsWith(term.text()))
>>>>>               break;
>>>>>           }
>>>>>         }
>>>>>         System.out.println("TermEnum.term: " + currentTerm);
>>>>>         return new Term(term.field(), currentTerm);
>>>>>       }
>>>>>
>>>>>       public int docFreq() {
>>>>>         System.out.println("TermEnum.docFreq");
>>>>>         return 1;
>>>>>       }
>>>>>
>>>>>       public void close() {
>>>>>         System.out.println("TermEnum.close");
>>>>>       }
>>>>>     };
>>>>>     return termEnum;
>>>>>   }
>>>>>
>>>>>   public int docFreq(Term term) throws IOException {
>>>>>     System.out.println("StringIndexReader.docFreq: " + term);
>>>>>     return terms.contains(term.text()) ? 1 : 0;
>>>>>   }
>>>>>
>>>>>   public TermDocs termDocs() throws IOException {
>>>>>     System.out.println("StringIndexReader.termDocs");
>>>>>
>>>>>     TermDocs td = new TermDocs() {
>>>>>       private boolean done = false;
>>>>>       String currentTerm;
>>>>>
>>>>>       public void seek(Term term) {
>>>>>         System.out.println(".seek: " + term);
>>>>>         currentTerm = term.text();
>>>>>         done = false;
>>>>>       }
>>>>>
>>>>>       public void seek(TermEnum termEnum) {
>>>>>         seek(termEnum.term());
>>>>>       }
>>>>>
>>>>>       public int doc() {
>>>>>         System.out.println(".doc");
>>>>>         return 0;
>>>>>       }
>>>>>
>>>>>       public int freq() {
>>>>>         System.out.println(".freq");
>>>>>         return 1;
>>>>>       }
>>>>>
>>>>>       public boolean next() {
>>>>>         System.out.println(".next");
>>>>>         return false;
>>>>>       }
>>>>>
>>>>>       public int read(int[] docs, int[] freqs) {
>>>>>         System.out.println(".read: " + docs.length);
>>>>>
>>>>>         if (done) return 0;
>>>>>
>>>>>         done = true;
>>>>>         docs[0] = 0;
>>>>>         freqs[0] = freq();
>>>>>         return 1;
>>>>>       }
>>>>>
>>>>>       public boolean skipTo(int target) {
>>>>>         System.out.println(".skipTo");
>>>>>         return false;
>>>>>       }
>>>>>
>>>>>       public void close() {
>>>>>         System.out.println(".close");
>>>>>
>>>>>       }
>>>>>     };
>>>>>     return td;
>>>>>   }
>>>>>
>>>>>   public TermPositions termPositions() throws IOException {
>>>>>     System.out.println("StringIndexReader.termPositions");
>>>>>     return null;
>>>>>   }
>>>>>
>>>>>   protected void doDelete(int docNum) throws IOException {
>>>>>     System.out.println("StringIndexReader.doDelete");
>>>>>
>>>>>   }
>>>>>
>>>>>   protected void doUndeleteAll() throws IOException {
>>>>>     System.out.println("StringIndexReader.doUndeleteAll");
>>>>>
>>>>>   }
>>>>>
>>>>>   protected void doCommit() throws IOException {
>>>>>     System.out.println("StringIndexReader.doCommit");
>>>>>
>>>>>   }
>>>>>
>>>>>   protected void doClose() throws IOException {
>>>>>     System.out.println("StringIndexReader.doClose");
>>>>>
>>>>>   }
>>>>>
>>>>>   public Collection getFieldNames() throws IOException {
>>>>>     System.out.println("StringIndexReader.getFieldNames");
>>>>>     return null;
>>>>>   }
>>>>>
>>>>>   public Collection getFieldNames(boolean indexed) throws  
>>>>> IOException {
>>>>>     System.out.println("StringIndexReader.getFieldNames");
>>>>>     return null;
>>>>>   }
>>>>>
>>>>>   public Collection getIndexedFieldNames(Field.TermVector tvSpec) {
>>>>>     System.out.println("StringIndexReader.getIndexedFieldNames");
>>>>>     return null;
>>>>>   }
>>>>>
>>>>>   public Collection getFieldNames(FieldOption fldOption) {
>>>>>     System.out.println("StringIndexReader.getFieldNames");
>>>>>     return null;
>>>>>   }
>>>>>
>>>>>   public static void main(String[] args) {
>>>>>     IndexReader reader = new StringIndexReader(new String[]  
>>>>> {"foo", "bar", "baz"});
>>>>>     IndexSearcher searcher = new IndexSearcher(reader);
>>>>>
>>>>>     Hits hits = null;
>>>>>     try {
>>>>>       hits = searcher.search(new WildcardQuery(new  
>>>>> Term("field","ba*")));
>>>>>     } catch (IOException e) {
>>>>>       e.printStackTrace();
>>>>>     }
>>>>>     System.out.println("found " + hits.length());
>>>>>   }
>>>>> }
>>>>>
>>>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

