Subject: RE: [Performance] Streaming main memory indexing of single strings
From: "Vanlerberghe, Luc"
To: java-dev@lucene.apache.org
Date: Wed, 20 Apr 2005 15:10:56 +0200

One reason to choose the 'simplistic IndexReader' approach to this
problem over regexes is that the result should be 'bug-compatible' with
a standard search over all documents. Differences between the two
systems would be difficult to explain to an end user (let alone for the
developer to debug and find the reason for in the first place!)

Luc

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Saturday, April 16, 2005 2:09 AM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single strings

On Apr 15, 2005, at 6:15 PM, Wolfgang Hoschek wrote:

> Cool! For my use case it would need to be able to handle arbitrary
> queries (previously parsed from a general Lucene query string).
> Something like:
>
>     float match(String text, Query query)
>
> It's fine with me if it also works for
>
>     float[] match(String[] texts, Query query) or
>     float match(Document doc, Query query)
>
> but that isn't required by the use case.

My implementation is nearly that. The score is available as
hits.score(0). You would also need an analyzer, I presume, passed to
your proposed match() method if you want the text broken into terms.
My current implementation is passed a String[] where each item is
considered a term for the document.

match() would also need a field name to be fully accurate, since the
analyzer needs a field name and terms used for searching need a field
name. The Query may contain terms for any number of fields - how should
that be handled? Should only a single field name be passed in, and any
terms requested for other fields be ignored? Or should this utility
morph to assume any word in the text is in any field being asked of it?

As for Doug's devil's advocate questions - I really don't know what I'd
use it for personally (other than the "match this single string against
a bunch of queries" case); I just thought it was clever that it could be
done. Clever regexes could come close, but it'd be a lot more effort
than reusing good ol' QueryParser and this simplistic IndexReader,
along with an Analyzer.

Erik
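A rough sketch (invented for illustration, not code from this thread) of
what the proposed signature could look like against the Lucene 1.4 API
once the caller supplies the analyzer and a single field name; query
terms for any other field would then simply never match, which is one
possible answer to the question above. The class name MatchUtil is
hypothetical:

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.store.RAMDirectory;

public final class MatchUtil { // hypothetical helper, name invented

    private MatchUtil() {}

    /** Scores the given text against the given (pre-parsed) query. */
    public static float match(String text, String fieldName,
            Analyzer analyzer, Query query) throws IOException {
        // one transient single-document index per call,
        // as in the benchmark code later in this thread
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true);
        Document doc = new Document();
        doc.add(Field.UnStored(fieldName, text)); // analyzed, indexed, not stored
        writer.addDocument(doc);
        writer.close();

        Searcher searcher = new IndexSearcher(dir);
        try {
            Hits hits = searcher.search(query);
            return hits.length() > 0 ? hits.score(0) : 0.0f;
        } finally {
            searcher.close();
        }
    }
}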
> Wolfgang.
>
>> I am intrigued by this and decided to mock up a quick and dirty
>> example of such an IndexReader. After a little trial and error I got
>> it working, at least for TermQuery and WildcardQuery. I've pasted my
>> code below as an example, but there is much room for improvement,
>> especially in terms of performance and in keeping track of term
>> frequency; it would also be nicer if it handled the analysis
>> internally.
>>
>> I think something like this would make a handy addition to our
>> contrib area at least. I'd be happy to receive improvements to this
>> and then add it to a contrib subproject.
>>
>> Perhaps this would be a handy way to handle situations where users
>> have queries saved in a system and need to be alerted whenever a new
>> document arrives matching the saved queries?
>>
>> Erik
>>
>>> -----Original Message-----
>>> From: Wolfgang Hoschek [mailto:whoschek@lbl.gov]
>>> Sent: Thursday, April 14, 2005 4:04 PM
>>> To: java-dev@lucene.apache.org
>>> Subject: Re: [Performance] Streaming main memory indexing of single
>>> strings
>>>
>>> This seems to be a promising avenue worth exploring. My gut feeling
>>> is that this could easily be 10-100 times faster.
>>>
>>> The drawback is that it requires a fair amount of understanding of
>>> intricate Lucene internals, pulling those pieces together and
>>> adapting them as required for the seemingly simple "float
>>> match(String text, Query query)".
>>>
>>> I might give it a shot, but I'm not sure I'll be able to pull this
>>> off! Is there any similar code I could look at as a starting point?
>>>
>>> Wolfgang.
>>>
>>> On Apr 14, 2005, at 1:13 PM, Robert Engels wrote:
>>>
>>>> I think you are not approaching this the correct way.
>>>>
>>>> Pseudo code:
>>>>
>>>> Subclass IndexReader.
>>>>
>>>> Get tokens from String 'document' using Lucene analyzers.
>>>>
>>>> Build simple hash-map based data structures using tokens for terms,
>>>> and term positions.
>>>>
>>>> Reimplement termDocs() and termPositions() to use the structures
>>>> from above.
>>>>
>>>> Run searches.
>>>>
>>>> Start again with next document.
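The tokenization step of that pseudocode might look roughly like the
sketch below (illustrative only; the class name TermTable is invented,
and it is written against the Lucene 1.4 analysis API). A subclassed
IndexReader's termDocs()/termPositions() would then serve postings
straight out of such in-memory tables, with no Directory, locking, or
byte-level I/O involved:

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

/** Maps each term of a single-string "document" to its positions. */
public class TermTable {

    private final Map positions = new HashMap(); // String term -> List of Integer

    public TermTable(String fieldName, String text, Analyzer analyzer)
            throws IOException {
        TokenStream stream = analyzer.tokenStream(fieldName, new StringReader(text));
        try {
            int position = -1;
            for (Token token = stream.next(); token != null; token = stream.next()) {
                position += token.getPositionIncrement();
                List list = (List) positions.get(token.termText());
                if (list == null) {
                    list = new ArrayList();
                    positions.put(token.termText(), list);
                }
                list.add(new Integer(position));
            }
        } finally {
            stream.close();
        }
    }

    /** Term frequency within the single document (0 if absent). */
    public int freq(String term) {
        List list = (List) positions.get(term);
        return list == null ? 0 : list.size();
    }
}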
>>>> -----Original Message-----
>>>> From: Wolfgang Hoschek [mailto:whoschek@lbl.gov]
>>>> Sent: Thursday, April 14, 2005 2:56 PM
>>>> To: java-dev@lucene.apache.org
>>>> Subject: Re: [Performance] Streaming main memory indexing of single
>>>> strings
>>>>
>>>> Otis, this might be a misunderstanding.
>>>>
>>>> - I'm not calling optimize(). That piece is commented out if you
>>>> look again at the code.
>>>> - The *streaming* use case requires that for each query I add one
>>>> (and only one) document (aka string) to an empty index:
>>>>
>>>> repeat N times (where N is millions or billions):
>>>>     add a single string (aka document) to an empty index
>>>>     query the index
>>>>     drop the index (or delete its document)
>>>>
>>>> with the following API being called N times:
>>>>
>>>>     float match(String text, Query query)
>>>>
>>>> So there's no possibility of adding many documents and thereafter
>>>> running the query. This in turn seems to mean that the IndexWriter
>>>> can't be kept open - unless I manually delete each document after
>>>> each query to repeatedly reuse the RAMDirectory, which I've also
>>>> tried before without any significant performance gain - deletion
>>>> seems to have substantial overhead in itself. Perhaps it would be
>>>> better if there were a Directory.deleteAllDocuments() or similar.
>>>> Did you have some other approach in mind?
>>>>
>>>> As I said, Lucene's design doesn't seem to fit this streaming use
>>>> case pattern well. In *this* scenario one could easily do without
>>>> any locking, and without byte-level organization in RAMDirectory
>>>> and RAMFile, etc., because a single small string isn't a large
>>>> persistent multi-document index.
>>>>
>>>> For some background, here's a small example of the kind of XQuery
>>>> functionality the Nux/Lucene integration enables:
>>>>
>>>> (: An XQuery that finds all books authored by James that have
>>>>    something to do with "fish", sorted by relevance :)
>>>> declare namespace lucene = "java:nux.xom.xquery.XQueryUtil";
>>>> declare variable $query := "fish*~";
>>>>
>>>> for $book in /books/book[author="James" and
>>>>     lucene:match(string(.), $query) > 0.0]
>>>> let $score := lucene:match(string($book), $query)
>>>> order by $score descending
>>>> return ({$score}, $book)
>>>>
>>>> More interestingly, one can use this for classifying and routing
>>>> XML messages based on rules (i.e. queries) inspecting their
>>>> content...
>>>>
>>>> Any other clues about potential improvements would be greatly
>>>> appreciated.
>>>>
>>>> Wolfgang.
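For illustration, a poor man's version of the deleteAllDocuments()
Wolfgang wishes for might look like this against the Lucene 1.4 API
(not code from the thread; the helper name is invented). Deletions go
through an IndexReader, so each call still pays for an open/close
cycle, consistent with the deletion overhead he reports:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

public final class IndexUtil { // hypothetical helper

    private IndexUtil() {}

    /** Marks every document in the index as deleted so the Directory
     *  can be reused for the next single-string document. */
    public static void deleteAllDocuments(Directory dir) throws IOException {
        IndexReader reader = IndexReader.open(dir);
        try {
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (!reader.isDeleted(i)) reader.delete(i);
            }
        } finally {
            reader.close(); // commits the deletions
        }
    }
}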
>>>> On Apr 13, 2005, at 10:09 PM, Otis Gospodnetic wrote:
>>>>
>>>>> It looks like you are calling that IndexWriter code in some loops,
>>>>> opening it and closing it in every iteration of the loop, and also
>>>>> calling optimize. All of those things could be improved.
>>>>> Keep your IndexWriter open, don't close it, and optimize the index
>>>>> only once you are done adding documents to it.
>>>>>
>>>>> See the highlights and the snippets in the first hit:
>>>>> http://www.lucenebook.com/search?query=when+to+optimize
>>>>>
>>>>> Otis
>>>>>
>>>>> --- Wolfgang Hoschek wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm wondering if anyone could let me know how to improve Lucene
>>>>>> performance for "streaming main memory indexing of single
>>>>>> strings". This would help to effectively integrate Lucene with
>>>>>> the Nux XQuery engine.
>>>>>>
>>>>>> Below is a small microbenchmark simulating STREAMING XQuery
>>>>>> fulltext search as typical for XML network routers, message
>>>>>> queuing systems, P2P networks, etc. In this on-the-fly main
>>>>>> memory indexing scenario, each individual string is immediately
>>>>>> matched as soon as it becomes available, without any persistence
>>>>>> involved. This usage scenario and corresponding performance
>>>>>> profile is quite different from fulltext search over persistent
>>>>>> (read-mostly) indexes.
>>>>>>
>>>>>> The benchmark runs at some 3000 Lucene queries/sec (lucene-1.4.3),
>>>>>> which is unfortunate news considering the XQuery engine can
>>>>>> easily walk hundreds of thousands of XML nodes per second.
>>>>>> Ideally I'd like to run at some 100000 queries/sec. Running this
>>>>>> through the JDK 1.5 profiler, it seems that most time is spent
>>>>>> in and below the following calls:
>>>>>>
>>>>>>     writer = new IndexWriter(dir, analyzer, true);
>>>>>>     writer.addDocument(...);
>>>>>>     writer.close();
>>>>>>
>>>>>> I tried quite a few variants of the benchmark with various
>>>>>> options, unfortunately with little or no effect. Lucene just does
>>>>>> not seem to be designed to do this sort of "transient single
>>>>>> string index" thing. All code paths related to opening, closing,
>>>>>> reading, writing, querying and object creation seem to be
>>>>>> designed for large persistent indexes.
>>>>>>
>>>>>> Any advice on what I'm missing or what could be done about it
>>>>>> would be greatly appreciated.
>>>>>>
>>>>>> Wolfgang.
>>>>>>
>>>>>> P.S. the benchmark code is attached below:
>>>>>>
>>>>>> package nux.xom.pool;
>>>>>>
>>>>>> import java.io.IOException;
>>>>>> //import java.io.Reader;
>>>>>>
>>>>>> import org.apache.lucene.analysis.Analyzer;
>>>>>> //import org.apache.lucene.analysis.LowerCaseTokenizer;
>>>>>> //import org.apache.lucene.analysis.PorterStemFilter;
>>>>>> //import org.apache.lucene.analysis.SimpleAnalyzer;
>>>>>> //import org.apache.lucene.analysis.TokenStream;
>>>>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>>>> import org.apache.lucene.document.Document;
>>>>>> import org.apache.lucene.document.Field;
>>>>>> //import org.apache.lucene.index.IndexReader;
>>>>>> import org.apache.lucene.index.IndexWriter;
>>>>>> import org.apache.lucene.queryParser.ParseException;
>>>>>> import org.apache.lucene.queryParser.QueryParser;
>>>>>> import org.apache.lucene.search.Hits;
>>>>>> import org.apache.lucene.search.IndexSearcher;
>>>>>> import org.apache.lucene.search.Query;
>>>>>> import org.apache.lucene.search.Searcher;
>>>>>> import org.apache.lucene.store.Directory;
>>>>>> import org.apache.lucene.store.RAMDirectory;
>>>>>>
>>>>>> public final class LuceneMatcher { // TODO: make non-public
>>>>>>
>>>>>>     private final Analyzer analyzer;
>>>>>>     // private final Directory dir = new RAMDirectory();
>>>>>>
>>>>>>     public LuceneMatcher() {
>>>>>>         this(new StandardAnalyzer());
>>>>>>         // this(new SimpleAnalyzer());
>>>>>>         // this(new StopAnalyzer());
>>>>>>         // this(new Analyzer() {
>>>>>>         //     public final TokenStream tokenStream(String fieldName, Reader reader) {
>>>>>>         //         return new PorterStemFilter(new LowerCaseTokenizer(reader));
>>>>>>         //     }
>>>>>>         // });
>>>>>>     }
>>>>>>
>>>>>>     public LuceneMatcher(Analyzer analyzer) {
>>>>>>         if (analyzer == null)
>>>>>>             throw new IllegalArgumentException("analyzer must not be null");
>>>>>>         this.analyzer = analyzer;
>>>>>>     }
>>>>>>
>>>>>>     public Query parseQuery(String expression) throws ParseException {
>>>>>>         QueryParser parser = new QueryParser("content", analyzer);
>>>>>>         // parser.setPhraseSlop(0);
>>>>>>         return parser.parse(expression);
>>>>>>     }
>>>>>>
>>>>>>     /**
>>>>>>      * Returns the relevance score by matching the given index
>>>>>>      * against the given Lucene query expression. The index must
>>>>>>      * not contain more than one Lucene "document" (aka string to
>>>>>>      * be searched).
>>>>>>      */
>>>>>>     public float match(Directory index, Query query) {
>>>>>>         Searcher searcher = null;
>>>>>>         try {
>>>>>>             searcher = new IndexSearcher(index);
>>>>>>             Hits hits = searcher.search(query);
>>>>>>             float score = hits.length() > 0 ? hits.score(0) : 0.0f;
>>>>>>             return score;
>>>>>>         } catch (IOException e) { // should never happen (RAMDirectory)
>>>>>>             throw new RuntimeException(e);
>>>>>>         } finally {
>>>>>>             try {
>>>>>>                 if (searcher != null) searcher.close();
>>>>>>             } catch (IOException e) { // should never happen (RAMDirectory)
>>>>>>                 throw new RuntimeException(e);
>>>>>>             }
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     // public float match(String text, Query query) {
>>>>>>     //     return match(createIndex(text), query);
>>>>>>     // }
>>>>>>
>>>>>>     public Directory createIndex(String text) {
>>>>>>         Directory dir = new RAMDirectory();
>>>>>>         IndexWriter writer = null;
>>>>>>         try {
>>>>>>             writer = new IndexWriter(dir, analyzer, true);
>>>>>>             // writer.setUseCompoundFile(false);
>>>>>>             // writer.mergeFactor = 2;
>>>>>>             // writer.minMergeDocs = 1;
>>>>>>             // writer.maxMergeDocs = 1;
>>>>>>
>>>>>>             writer.addDocument(createDocument(text));
>>>>>>             // writer.optimize();
>>>>>>             return dir;
>>>>>>         } catch (IOException e) { // should never happen (RAMDirectory)
>>>>>>             throw new RuntimeException(e);
>>>>>>         } finally {
>>>>>>             try {
>>>>>>                 if (writer != null) writer.close();
>>>>>>             } catch (IOException e) { // should never happen (RAMDirectory)
>>>>>>                 throw new RuntimeException(e);
>>>>>>             }
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     private Document createDocument(String content) {
>>>>>>         Document doc = new Document();
>>>>>>         doc.add(Field.UnStored("content", content));
>>>>>>         // doc.add(Field.Text("x", content));
>>>>>>         return doc;
>>>>>>     }
>>>>>>
>>>>>>     /**
>>>>>>      * Lucene microbenchmark simulating STREAMING XQuery fulltext
>>>>>>      * search as typical for XML network routers, message queuing
>>>>>>      * systems, P2P networks, etc. In this on-the-fly main memory
>>>>>>      * indexing scenario, each individual string is immediately
>>>>>>      * matched as soon as it becomes available, without any
>>>>>>      * persistence involved. This usage scenario and corresponding
>>>>>>      * performance profile is quite different from fulltext search
>>>>>>      * over persistent (read-mostly) indexes.
>>>>>>      *
>>>>>>      * Example XPath:
>>>>>>      * count(/table/row[lucene:match(string(./firstname), "James") > 0.0])
>>>>>>      */
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>         int k = -1;
>>>>>>         int runs = 5;
>>>>>>         if (args.length > ++k) runs = Integer.parseInt(args[k]);
>>>>>>
>>>>>>         int nodes = 10000;
>>>>>>         if (args.length > ++k) nodes = Integer.parseInt(args[k]);
>>>>>>
>>>>>>         String content = "James is out in the woods";
>>>>>>         if (args.length > ++k) content = args[k];
>>>>>>
>>>>>>         String expression = "James";
>>>>>>         if (args.length > ++k) expression = args[k];
>>>>>>
>>>>>>         LuceneMatcher matcher = new LuceneMatcher();
>>>>>>         Query query = matcher.parseQuery(expression); // to be reused N times
>>>>>>
>>>>>>         for (int r = 0; r < runs; r++) {
>>>>>>             long start = System.currentTimeMillis();
>>>>>>             int matches = 0;
>>>>>>
>>>>>>             for (int i = 0; i < nodes; i++) {
>>>>>>                 // if (LuceneUtil.match(content + i, expression) > 0.0f) {
>>>>>>                 if (matcher.match(matcher.createIndex(content + i), query) > 0.0f) {
>>>>>>                     matches++;
>>>>>>                 }
>>>>>>             }
>>>>>>
>>>>>>             long end = System.currentTimeMillis();
>>>>>>             System.out.println("matches=" + matches);
>>>>>>             System.out.println("secs=" + ((end-start) / 1000.0f));
>>>>>>             System.out.println("queries/sec=" + (nodes / ((end-start) / 1000.0f)));
>>>>>>             System.out.println();
>>>>>>         }
>>>>>>     }
>>>>>> }
>>
>> import java.io.IOException;
>> import java.util.*;
>>
>> import org.apache.lucene.document.*;
>> import org.apache.lucene.index.*;
>> import org.apache.lucene.search.*;
>>
>> public class StringIndexReader extends IndexReader {
>>     private List terms;
>>
>>     public StringIndexReader(String[] strings) {
>>         super(null);
>>         terms = Arrays.asList(strings);
>>         Collections.sort(terms);
>>     }
>>
>>     public TermFreqVector[] getTermFreqVectors(int docNumber) throws IOException {
>>         System.out.println("StringIndexReader.getTermFreqVectors");
>>         return new TermFreqVector[0];
>>     }
>>
>>     public TermFreqVector getTermFreqVector(int docNumber, String field) throws IOException {
>>         System.out.println("StringIndexReader.getTermFreqVector");
>>         return null;
>>     }
>>
>>     public int numDocs() {
>>         System.out.println("StringIndexReader.numDocs");
>>         return 1;
>>     }
>>
>>     public int maxDoc() {
>>         System.out.println("StringIndexReader.maxDoc");
>>         return 1;
>>     }
>>
>>     public Document document(int n) throws IOException {
>>         System.out.println("StringIndexReader.document");
>>         return null;
>>     }
>>
>>     public boolean isDeleted(int n) {
>>         System.out.println("StringIndexReader.isDeleted");
>>         return false;
>>     }
>>
>>     public boolean hasDeletions() {
>>         System.out.println("StringIndexReader.hasDeletions");
>>         return false;
>>     }
>>
>>     public byte[] norms(String field) throws IOException {
>>         // TODO: what value to use for this?
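>>         // Note (added): Lucene decodes one norm byte per document via
>>         // Similarity.decodeNorm(); a raw byte of 1 decodes to a very
>>         // small norm, so scores from this reader will differ from a
>>         // real index. Something like Similarity.encodeNorm(1.0f) would
>>         // presumably be closer to a freshly indexed single document.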
>> System.out.println("StringIndexReader.norms: " + field); >> return new byte[] { 1 }; >> } >> >> public void norms(String field, byte[] bytes, int offset) throws=20 >> IOException { >> System.out.println("StringIndexReader.norms: " + field + "*"); >> } >> >> protected void doSetNorm(int doc, String field, byte value) throws=20 >> IOException { >> System.out.println("StringIndexReader.doSetNorm"); >> >> } >> >> public TermEnum terms() throws IOException { >> System.out.println("StringIndexReader.terms"); >> return terms(null); >> } >> >> public TermEnum terms(final Term term) throws IOException { >> System.out.println("StringIndexReader.terms: " + term); >> >> TermEnum termEnum =3D new TermEnum() { >> private String currentTerm; >> private Iterator iter; >> >> public boolean next() { >> System.out.println("TermEnum.next"); >> if (iter.hasNext()) >> currentTerm =3D (String) iter.next(); >> return iter.hasNext(); >> } >> >> public Term term() { >> if (iter =3D=3D null) { >> iter =3D terms.iterator(); >> while (next()) { >> if (currentTerm.startsWith(term.text())) >> break; >> } >> } >> System.out.println("TermEnum.term: " + currentTerm); >> return new Term(term.field(), currentTerm); >> } >> >> public int docFreq() { >> System.out.println("TermEnum.docFreq"); >> return 1; >> } >> >> public void close() { >> System.out.println("TermEnum.close"); >> } >> }; >> return termEnum; >> } >> >> public int docFreq(Term term) throws IOException { >> System.out.println("StringIndexReader.docFreq: " + term); >> return terms.contains(term.text()) ? 1 : 0; >> } >> >> public TermDocs termDocs() throws IOException { >> System.out.println("StringIndexReader.termDocs"); >> >> TermDocs td =3D new TermDocs() { >> private boolean done =3D false; >> String currentTerm; >> >> public void seek(Term term) { >> System.out.println(".seek: " + term); >> currentTerm =3D term.text(); >> done =3D false; >> } >> >> public void seek(TermEnum termEnum) { >> seek(termEnum.term()); >> } >> >> public int doc() { >> System.out.println(".doc"); >> return 0; >> } >> >> public int freq() { >> System.out.println(".freq"); >> return 1; >> } >> >> public boolean next() { >> System.out.println(".next"); >> return false; >> } >> >> public int read(int[] docs, int[] freqs) { >> System.out.println(".read: " + docs.length); >> >> if (done) return 0; >> >> done =3D true; >> docs[0] =3D 0; >> freqs[0] =3D freq(); >> return 1; >> } >> >> public boolean skipTo(int target) { >> System.out.println(".skipTo"); >> return false; >> } >> >> public void close() { >> System.out.println(".close"); >> >> } >> }; >> return td; >> } >> >> public TermPositions termPositions() throws IOException { >> System.out.println("StringIndexReader.termPositions"); >> return null; >> } >> >> protected void doDelete(int docNum) throws IOException { >> System.out.println("StringIndexReader.doDelete"); >> >> } >> >> protected void doUndeleteAll() throws IOException { >> System.out.println("StringIndexReader.doUndeleteAll"); >> >> } >> >> protected void doCommit() throws IOException { >> System.out.println("StringIndexReader.doCommit"); >> >> } >> >> protected void doClose() throws IOException { >> System.out.println("StringIndexReader.doClose"); >> >> } >> >> public Collection getFieldNames() throws IOException { >> System.out.println("StringIndexReader.getFieldNames"); >> return null; >> } >> >> public Collection getFieldNames(boolean indexed) throws IOException >> { >> System.out.println("StringIndexReader.getFieldNames"); >> return null; >> } >> >> public Collection 
>>     protected void doDelete(int docNum) throws IOException {
>>         System.out.println("StringIndexReader.doDelete");
>>     }
>>
>>     protected void doUndeleteAll() throws IOException {
>>         System.out.println("StringIndexReader.doUndeleteAll");
>>     }
>>
>>     protected void doCommit() throws IOException {
>>         System.out.println("StringIndexReader.doCommit");
>>     }
>>
>>     protected void doClose() throws IOException {
>>         System.out.println("StringIndexReader.doClose");
>>     }
>>
>>     public Collection getFieldNames() throws IOException {
>>         System.out.println("StringIndexReader.getFieldNames");
>>         return null;
>>     }
>>
>>     public Collection getFieldNames(boolean indexed) throws IOException {
>>         System.out.println("StringIndexReader.getFieldNames");
>>         return null;
>>     }
>>
>>     public Collection getIndexedFieldNames(Field.TermVector tvSpec) {
>>         System.out.println("StringIndexReader.getIndexedFieldNames");
>>         return null;
>>     }
>>
>>     public Collection getFieldNames(FieldOption fldOption) {
>>         System.out.println("StringIndexReader.getFieldNames");
>>         return null;
>>     }
>>
>>     public static void main(String[] args) {
>>         IndexReader reader = new StringIndexReader(new String[] {"foo", "bar", "baz"});
>>         IndexSearcher searcher = new IndexSearcher(reader);
>>
>>         Hits hits = null;
>>         try {
>>             hits = searcher.search(new WildcardQuery(new Term("field", "ba*")));
>>         } catch (IOException e) {
>>             e.printStackTrace();
>>         }
>>         System.out.println("found " + hits.length());
>>     }
>> }

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org