Message-ID: <20020504141540.90758.qmail@web12702.mail.yahoo.com>
Date: Sat, 4 May 2002 07:15:40 -0700 (PDT)
From: Otis Gospodnetic
Subject: Notes about webcrawler-LARM contribution
To: lucene-dev@jakarta.apache.org

Hello,

A few notes about the webcrawler-LARM contribution, which I just imported into the Lucene Sandbox. I will put these notes in the contribution's README.txt later as well.

- This contribution requires:
  a) HTTPClient (not Jakarta's, but this one: http://www.innovation.ch/java/HTTPClient/)
  b) Jakarta ORO package for regular expressions
- The original archive file that I got from Clemens had ORO and HTTPClient in the lib directory. I don't think we should include those there, so I took them out.
- This contribution also uses a 3rd party (X?)HTML parser, which is included. I am not sure whether Clemens modified this parser in any way.
  If not, maybe we don't have to include it and can instead just add it to the list of required packages.
- There is no Ant build file yet, just a build.sh script. A build.xml for this contribution should be really simple to write.
- The key classes are documented fairly well; the less central ones are not, but Clemens told me yesterday that he wants to document them more. I got the feeling that he wants to do it soon/now.
- Clemens would be happy to use the Lucene Sandbox repository for further development. I would like to give him access to this repository. That will eliminate dealing with diffs, patching, conflicts, etc., and one of the reasons for having the sandbox as a separate repository was to allow access to a broader group of developers. I will send a separate email asking for +1s.
- Uh, it just occurred to me that I only looked at about a dozen classes, compiled it, etc., but I have not actually tried running it. Ooops. I do get a feeling, from looking at the code, that it will run as documented.
- This code requires(?) JDK 1.4, as it uses the assert keyword.

That's all I can think of for now. Clemens is subscribed to this list as well, so if you have questions you can post them here.

Otis