From: "Chris Sibert"
To: "Lucene Users List"
Subject: Creating indexes
Date: Wed, 12 Jun 2002 02:26:58 -0400

I have a big (40 MB or so) file to index. The file contains a whole bunch of documents, each of them pretty small, a few typewritten pages long. Each document has a title, date, and author in addition to its actual text. I'm not quite sure how to index this in Lucene. For each document in the original file, I assume that I create a separate Lucene Document object in the index, with author, date, title, and text fields.
If so, my question is this: when I'm reading the original file for indexing, does Lucene know where each document begins and ends in it? Or do I have to write a parser or filter of some kind for the InputStream that reads the file?

Chris Sibert
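
One way to approach the question above, offered as a minimal sketch rather than a definitive answer: Lucene indexes whatever Document objects it is handed and has no notion of record boundaries inside a raw file, so the splitting has to happen in application code before indexing. The layout assumed here (Title:/Author:/Date: header lines, a blank line, the text, then a ===== separator line) is hypothetical, since the post does not describe the real file. RecordSplitter, SEPARATOR, and split are illustrative names, not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

public class RecordSplitter {
    // ASSUMPTION: records are separated by a line containing only "=====".
    // The real file's layout is not given in the post.
    static final String SEPARATOR = "=====";

    /**
     * Streams the input line by line and hands each record to the sink as a
     * map of field name to value, so the whole file is never held in memory.
     */
    public static void split(BufferedReader in, Consumer<Map<String, String>> sink) {
        Map<String, String> record = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        boolean inHeader = true;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals(SEPARATOR)) {
                    // End of one record: emit it and start a fresh one.
                    emit(record, text, sink);
                    record = new LinkedHashMap<>();
                    text.setLength(0);
                    inHeader = true;
                } else if (inHeader) {
                    if (line.isEmpty()) {
                        // A blank line after at least one header ends the header block.
                        if (!record.isEmpty()) {
                            inHeader = false;
                        }
                    } else {
                        int colon = line.indexOf(':');
                        if (colon > 0) {
                            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                            record.put(name, line.substring(colon + 1).trim());
                        }
                    }
                } else {
                    text.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit the last record if the file does not end with a separator.
        emit(record, text, sink);
    }

    private static void emit(Map<String, String> record, StringBuilder text,
                             Consumer<Map<String, String>> sink) {
        if (record.isEmpty() && text.length() == 0) {
            return; // nothing was collected, e.g. after a trailing separator
        }
        record.put("text", text.toString().trim());
        sink.accept(record);
    }

    public static void main(String[] args) {
        String sample = "Title: First\nAuthor: A. Writer\nDate: 2002-06-12\n\nBody one.\n"
                + "=====\nTitle: Second\nAuthor: B. Writer\nDate: 2002-06-13\n\nBody two.\n";
        // In a real indexer, the sink is where a Lucene Document would be built
        // from the map and passed to IndexWriter.addDocument.
        split(new BufferedReader(new StringReader(sample)),
                r -> System.out.println(r.get("title") + " (" + r.get("text").length() + " chars)"));
    }
}
```

For the actual 40 MB file, the StringReader would be swapped for a reader over the file, and only the header and separator rules would need to change to match its real layout.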