From: roy-lucene-user@xemaps.com
To: lucene-user@jakarta.apache.org
Subject: Re: demo HTML parser question
Date: Thu, 23 Sep 2004 11:10:14 -0500
Message-Id: <20040923161014.M5760@giant-pandas.net>
In-Reply-To: <6.0.1.1.2.20040922223354.03fffb18@fast.synernet.com>

Hi Fred,

We originally tried to use the demo HTML parser (Lucene 1.2), but as you know, it's only a demo.

I think it's threaded to save time: the calling thread can grab the title or the top of the message even though the parser has not finished the entire HTML document. That's just a guess; I would love to hear from others about this.

Anyway, since it runs in a separate thread, a token error could kill it and there is no way for the calling thread to know about it.

We had to write our own HTML parser, since we only cared about grabbing the entire text of the HTML document and we also wanted to avoid the extra thread. We also do a lot of "SKIP"ping to keep EOF errors to a minimum (HTML documents in email almost never follow standards).

For your HTML needs, you might want to check out the other JavaCC HTML parsers on the JavaCC web site.

Roy.

On Wed, 22 Sep 2004 22:42:55 -0400, Fred Toth wrote:
> Hi,
>
> I've been working with the HTML parser demo that comes with
> Lucene and I'm trying to understand why it's multi-threaded,
> and, more importantly, how to exit gracefully on errors.
>
> I've discovered if I throw an exception in the front-end static
> code (main(), etc.), the JVM hangs instead of exiting. Presumably
> this is because there are threads hanging around doing something.
> But I'm not sure what!
>
> Any pointers? I just want to exit gracefully on an error such as
> a required meta tag is missing or similar.
>
> Thanks,
>
> Fred
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
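
The two problems discussed in this thread (a parser thread that dies without the calling thread noticing, and a JVM that does not exit while other threads are still around) can be illustrated with a minimal Java sketch. It is not the Lucene demo parser and does not claim to reproduce its internals: the class ParserThreadSketch and the methods parse() and parseInBackground() are hypothetical names made up for this example, and the thread does not confirm the exact cause of the hang in the demo. What the sketch relies on is standard Java behaviour: a JVM keeps running while any non-daemon thread is alive, and an exception thrown in one thread is not seen by another unless it is handed over explicitly.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ParserThreadSketch {

    /** Hypothetical stand-in for a parse step that fails on bad input. */
    public static String parse(String html) {
        int open = html.indexOf("<title>");
        if (open < 0) {
            throw new IllegalStateException("required <title> tag is missing");
        }
        int start = open + "<title>".length();
        int end = html.indexOf("</title>", start);
        if (end < 0) {
            throw new IllegalStateException("unterminated <title> tag");
        }
        return html.substring(start, end);
    }

    /**
     * Runs parse() on a separate thread, records any failure, and
     * rethrows it in the calling thread once the worker has finished.
     */
    public static String parseInBackground(final String html) {
        final AtomicReference<String> result = new AtomicReference<>();
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                result.set(parse(html));
            } catch (Throwable t) {
                // Hand the error to the caller instead of dying silently.
                failure.set(t);
            }
        }, "html-parser");

        // A daemon thread does not keep the JVM alive after main() ends.
        worker.setDaemon(true);
        worker.start();

        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for parser", e);
        }

        if (failure.get() != null) {
            throw new IllegalStateException("parser thread failed", failure.get());
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(parseInBackground("<html><title>Pandas</title></html>"));
        try {
            parseInBackground("<html><body>no title here</body></html>");
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getCause().getMessage());
        }
    }
}
```

Waiting with join() makes this version effectively synchronous, so it gives up the early access to the title that Roy guesses the demo's threading is for. It is only meant to show the hand-over of the error and the daemon flag; a parser that streams results to the caller would need to check the recorded failure whenever it reads from the worker.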