Subject: Re: Indexing pages and chapters of a book
From: Erick Erickson
To: java-user@lucene.apache.org
Date: Fri, 8 Jan 2010 22:18:09 -0500

Sure, you
can add any data you want to any document. In this case it should probably be stored but not indexed. It could even be a serialized Java object, an XML packet, a stringized map, or whatever suits your fancy. A field that is only stored, not indexed, will make your index larger but have a negligible impact on search performance.

The trick is getting the token offsets to put in your metadata. You'll have to get the term positions and store them, but it's doable.

HTH
Erick

On Fri, Jan 8, 2010 at 7:04 PM, LucasMeadows wrote:
>
> I have a large number of text files (books) that I am trying to make
> searchable with Lucene 2.3.2.
>
> I would like search results to display the page and chapter in which a
> match with the search term occurred.
>
> My question is whether it is possible to add structural data (XML perhaps)
> to the files so that they can be indexed in a way that captures the
> relationship of the terms to the pages and chapters that contain them.
>
> Many thanks in advance!
> --
> View this message in context:
> http://old.nabble.com/Indexing-pages-and-chapters-of-a-book-tp27084145p27084145.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
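The idea of keeping page and chapter information as stored metadata can be sketched in plain Java. This is a minimal sketch under several assumptions: the character offset at which each page (or chapter) starts is recorded while the book text is prepared for indexing; those offsets are saved as a comma-separated string in a stored-only field (for example one built with `Field.Store.YES` and `Field.Index.NO`); and the character offset of a hit is available at display time, perhaps from term vectors stored with `Field.TermVector.WITH_POSITIONS_OFFSETS`, depending on your setup. `SectionLocator`, `encode`, `decode`, and `indexFor` are hypothetical names, not Lucene API, and the Lucene indexing and search calls are left out.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Lucene): maps the character offset of a
 * search hit back to the page or chapter that contains it. The section start
 * offsets are assumed to be ascending, to begin at 0, and to have been saved
 * in a stored-only field as a comma-separated string.
 */
final class SectionLocator {

    private final int[] starts; // ascending start offsets; starts[0] == 0

    SectionLocator(int[] starts) {
        this.starts = starts.clone();
    }

    /** Serializes start offsets into a string for a stored (not indexed) field. */
    static String encode(int[] starts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < starts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(starts[i]);
        }
        return sb.toString();
    }

    /** Rebuilds a locator from the string produced by encode(). */
    static SectionLocator decode(String stored) {
        String[] parts = stored.split(",");
        int[] starts = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            starts[i] = Integer.parseInt(parts[i].trim());
        }
        return new SectionLocator(starts);
    }

    /** Returns the 1-based number of the section whose range contains offset. */
    int indexFor(int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0: " + offset);
        }
        int idx = Arrays.binarySearch(starts, offset);
        // On a miss, binarySearch returns -(insertionPoint) - 1, so the
        // containing section is the one that starts just before insertionPoint.
        int section = idx >= 0 ? idx : -idx - 2;
        return section + 1;
    }

    public static void main(String[] args) {
        // Example: pages begin at these character offsets in the book text.
        String stored = encode(new int[] {0, 1843, 3702, 5590});
        SectionLocator pages = decode(stored);
        System.out.println(stored);                // 0,1843,3702,5590
        System.out.println(pages.indexFor(0));     // 1
        System.out.println(pages.indexFor(2000));  // 2
        System.out.println(pages.indexFor(3702));  // 3
        System.out.println(pages.indexFor(9999));  // 4
    }
}
```

The same class can serve chapters: keep a second stored string of chapter start offsets and build a second locator from it. Binary search keeps each lookup logarithmic in the number of pages, so resolving every hit on a results page stays cheap even for long books.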