From: "PROYECTA.Fernandez Garcia, Ivan"
To: Lucene Developers List
Subject: RE: Queries Lucene 1.3
Date: Wed, 17 Nov 2004 16:32:19 +0100

First of all, Xiaozheng, thanks for your attention.

I have tested it, but we still get no results. Let me explain in detail:

We would like to search for text in a PDF file. I think we must index the content of each page to be able to search its text. Is that correct?
So we must use document.add(Field.Text()), right?

We search for text using the following statements:

Query q = QueryParser.parse(m_texto + "*", CValoresGlobales.M_CONTENIDO_PAGINA, analizador);
q = q.rewrite(indexReader);
hits = searcher.search(q);

Is this OK?

Thanks for your help.

-----Original Message-----
From: Xiaozheng Ma [mailto:Xiaozheng.Ma@redwood.com]
Sent: Wednesday, November 17, 2004 16:21
To: Lucene Developers List
Subject: RE: Queries Lucene 1.3

I used the following to index and it works fine.

document.add(Field.Text("author", ifile.getAuthor()));
document.add(Field.Text("title", ifile.getTitle()));
document.add(Field.Text("extension", ifile.getExtension()));

-----Original Message-----
From: PROYECTA.Fernandez Garcia, Ivan [mailto:proyecta.ifernandez@iberia.es]
Sent: Wednesday, November 17, 2004 10:08 AM
To: Lucene Developers List
Subject: RE: Queries Lucene 1.3

If we don't change the IndexWriter.minMergeDocs attribute, Lucene doesn't find anything (we don't know why).

When we change the value of IndexWriter.minMergeDocs and the file has a lot of pages, an OutOfMemoryError occurs.

-----Original Message-----
From: Xiaozheng Ma [mailto:Xiaozheng.Ma@redwood.com]
Sent: Wednesday, November 17, 2004 15:59
To: Lucene Developers List
Subject: RE: Queries Lucene 1.3

I'm a bit confused about whether the first problem is solved (i.e. the break point at 10 pages).

For the out-of-memory error (OOME), you need to increase the JVM's maximum memory size. If you use Tomcat 5, run tomcat5w.exe to reset this value (or do it by editing the registry, or, if you prefer, change JAVA_OPTS in catalina.bat or catalina.sh).

Hope this works.

Xiaozheng

-----Original Message-----
From: PROYECTA.Fernandez Garcia, Ivan [mailto:proyecta.ifernandez@iberia.es]
Sent: Wednesday, November 17, 2004 9:49 AM
To: lucene-dev@jakarta.apache.org
Subject: Queries Lucene 1.3

Good afternoon everybody,

First of all, thanks for your attention.
We are using the Lucene 1.3 API to index and search text in PDF files. We have two development environments:

- Windows, using Apache Tomcat 5.0
- Sun Solaris, using Oracle Application Server

First we extract the text of each page from the PDF file using the Multivalent API (this step seems to work correctly). Then we search for text in the newly created index.

At the moment we have the following problem:

- If the PDF file has 10 pages, the text is found.
- If the PDF file has more than 10 pages, the text is not found.

We set the IndexWriter.minMergeDocs attribute to two different values: the total number of pages in the document, and 1. In both cases:

- If the document is not big, indexing and text search both seem to work correctly.
- If the document is big (600 pages), indexing fails with an OutOfMemoryError.

We are sending you our source file, which indexes a PDF file and searches for text, in case you can spot an error. We don't know what else to do about this problem. Can you help us, please?

Thank you for your help.

Iván Fernández García
Proyecta Sistemas de Información

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004

----------------------------------------------
You've chosen the best price. You've chosen IBERIA.com
----------------------------------------------
http://www.iberia.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
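Much of this thread turns on IndexWriter.minMergeDocs. In Lucene 1.3 this is a public field on IndexWriter that sets how many added documents are buffered in memory before they are merged into a segment on disk; documents still in the buffer are otherwise written out when the writer is closed. The following is a hypothetical toy model of that buffering idea in plain Java, not Lucene's actual implementation, and the class and method names are invented for illustration. It shows why documents still sitting in the buffer are invisible to a reader until they are flushed, and why a threshold equal to the total page count (e.g. 600) keeps every page in memory at once.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model, NOT Lucene's IndexWriter: documents are buffered in
 * memory and only become visible to a "reader" once the buffer is flushed,
 * either because it reached minMergeDocs entries or because close() was called.
 */
final class BufferedWriterModel {
    private final int minMergeDocs;                                  // flush threshold (models the idea of the Lucene 1.3 field)
    private final List<String> buffer = new ArrayList<String>();     // documents held only in RAM
    private final List<String> committed = new ArrayList<String>();  // documents a reader could see
    private int peakBuffered = 0;                                    // largest buffer size observed

    BufferedWriterModel(int minMergeDocs) {
        if (minMergeDocs < 1) {
            throw new IllegalArgumentException("minMergeDocs must be >= 1");
        }
        this.minMergeDocs = minMergeDocs;
    }

    /** Adds one page-sized document; flushes when the threshold is reached. */
    void addDocument(String pageText) {
        buffer.add(pageText);
        peakBuffered = Math.max(peakBuffered, buffer.size());
        if (buffer.size() >= minMergeDocs) {
            flush();
        }
    }

    /** Flushes whatever is still buffered; without it, the last partial batch is never committed. */
    void close() {
        flush();
    }

    private void flush() {
        committed.addAll(buffer);
        buffer.clear();
    }

    int visibleDocs()  { return committed.size(); }
    int bufferedDocs() { return buffer.size(); }
    int peakBuffered() { return peakBuffered; }

    public static void main(String[] args) {
        BufferedWriterModel w = new BufferedWriterModel(10);
        for (int page = 1; page <= 15; page++) {
            w.addDocument("text of page " + page);
        }
        System.out.println("before close: visible=" + w.visibleDocs() + ", buffered=" + w.bufferedDocs());
        w.close();
        System.out.println("after close:  visible=" + w.visibleDocs() + ", buffered=" + w.bufferedDocs());
    }
}
```

With a threshold of 10 and 15 pages, main reports 10 visible and 5 buffered documents before close(), and 15 visible afterwards. With a threshold of 600 and a 600-page file, peakBuffered() reaches 600, i.e. the whole document is held in memory before the first flush. This is only an illustration of the buffering idea, not a diagnosis of the source file attached to the original message.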
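Xiaozheng's advice for the OutOfMemoryError is to raise the JVM's maximum heap (the -Xmx option), either through tomcat5w.exe or through JAVA_OPTS. A minimal plain-Java check, using only java.lang.Runtime, can report the limit the running JVM actually received. Running the same calls from inside the web application (for example from a servlet, which is an assumed placement) would confirm whether the new setting reached Tomcat. The class and method names are hypothetical.

```java
/**
 * Reports the heap limits of the running JVM via java.lang.Runtime, to check
 * whether a new -Xmx value (set through tomcat5w.exe or JAVA_OPTS) took effect.
 * Class and method names are hypothetical.
 */
final class HeapCheck {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in whole megabytes. */
    static long maxHeapMegabytes() {
        // Runtime.maxMemory() returns Long.MAX_VALUE when there is no limit.
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:   " + maxHeapMegabytes() + " MB");
        System.out.println("total heap: " + rt.totalMemory() / MB + " MB"); // heap currently reserved
        System.out.println("free heap:  " + rt.freeMemory() / MB + " MB");  // unused part of the reserved heap
    }
}
```

For example, java -Xmx512m HeapCheck should report a maximum close to 512 MB; the exact figure varies slightly between JVMs.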