Date: Tue, 18 Mar 2003 10:32:39 +0000
Subject: Re: Lucene index building
From: Jeremy Quinn
To: cocoon-users@xml.apache.org

On Monday, March 17, 2003, at 11:47 PM, Upayavira wrote:

> I have built a site which I want to index with Lucene.
>
> I am using the create-index.xsp file in the $COCOON-ROOT/search
> directory to build my index.
>
> I have added the following to cocoon.xconf:
>
> .*/search/.*
> cocoon-view=lucene-links
>
> body
> cocoon-view=lucene-content

This all looks fine.

My exclude string looks like this though:

.*\.png$,.*\.js$,.*\.css$,.*\.gif$,.*\.jpg$,.*/search/.*,.*/easy/.*

I believe as soon as you specify an exclude string, the default values
for images etc. are not used.

> I've set up a view lucene-links which works, giving back just links
> from a page.
> I've set up a view lucene-content just giving back the content. The
> content is like:
>
> ....list of links
> ... the body content ...
>
> I have had it partially working (indexing both links and body), but
> now whenever I run create-index, it fails with a Cannot parse!:
> org.xml.sax.SAXParseException: Premature end of file.
>
> Any ideas what I might be doing wrong?

I had problems like this; it turned out to be pages that did not return
valid XML. Look in your logs to see if indexing stops on a particular URL.

I also found that I could overcome the need to provide more memory by
stripping unneeded tags from my 'content' XML being indexed.

My content for indexing looks like this:

    title gets stored, then displayed with hit
    summary gets stored, then displayed with hit
    all of my body content with tags stripped out

Hope this helps

regards
Jeremy
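
If the logs do not make the failing page obvious, both suggestions above can be tried outside Cocoon. What follows is a minimal Python sketch, not part of Cocoon: the function names (parse_error, flatten_text, find_broken_pages) are hypothetical, the URL list is whatever set of pages you expect the crawler to visit, and the only thing taken from this thread is the cocoon-view=lucene-content query. It requests each page through the content view and reports the ones that do not parse as XML; an empty response fails the same way "Premature end of file" does.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def parse_error(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


def flatten_text(xml_text):
    """Drop all markup from a well-formed fragment, keeping only its text."""
    root = ET.fromstring(xml_text)
    # Join text nodes with a space so words in adjacent elements stay apart,
    # then collapse runs of whitespace.
    return " ".join(" ".join(root.itertext()).split())


def find_broken_pages(urls, view="cocoon-view=lucene-content", fetch=None):
    """Fetch each URL with the view query appended; return (url, error) pairs.

    `fetch` is a hypothetical hook so the check can be run without a server;
    by default it does a plain HTTP GET and does not handle HTTP errors.
    """
    if fetch is None:
        fetch = lambda u: urlopen(u).read()
    broken = []
    for url in urls:
        sep = "&" if "?" in url else "?"
        error = parse_error(fetch(url + sep + view))
        if error is not None:
            broken.append((url, error))
    return broken
```

Calling find_broken_pages with your page URLs returns only the pages whose content view is not well-formed. flatten_text shows the kind of tag stripping described above, done here after the fact; in Cocoon itself the stripping would belong in the view's stylesheet.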