lucene-dev mailing list archives

From Otis Gospodnetic <>
Subject Notes about webcrawler-LARM contribution
Date Sat, 04 May 2002 14:15:40 GMT

A few notes about webcrawler-LARM contribution, which I just imported
in Lucene Sandbox.  I will put these notes in the contribution's
README.txt later as well.

- This contribution requires:
  a) HTTPClient (not Jakarta's, but this one:
  b) Jakarta ORO package for regular expressions

- The original archive file that I got from Clemens had ORO and
HTTPClient in lib directory.  I don't think we should include those
there, so I took them out.

- This contribution also uses a 3rd party (X?)HTML parser, which is
  I am not sure if Clemens modified this parser in any way.  If not,
maybe we don't have to include it and can instead just add it to the
list of required packages.

- There is no Ant build file yet, just a script.
  build.xml for this contribution should be really simple to write.

- The key classes are documented fairly well; the less central ones are
not, but Clemens actually told me yesterday that he wants to document
them more.  I got the feeling that he wants to do it soon/now.

- Clemens would be happy to use Lucene Sandbox repository for further
development.  I would like to give him access to this repository.  That
will eliminate dealing with diffs, patching, conflicts, etc., and one
of the reasons for having the sandbox in a separate repository was to
allow access to a broader group of developers.  I will send a separate
email asking for +1s.

- Uh, it just occurred to me that I only looked at about a dozen
classes, compiled it, etc., but I have not actually tried running it.
Ooops.  I do get a feeling, from looking at the code, that it will run
as documented.

- This code requires(?) JDK 1.4, as it uses the assert keyword.
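
For anyone wondering why the assert keyword ties the code to JDK 1.4, here
is a minimal sketch.  The class and method names are made up for
illustration and are not taken from the LARM sources.  Compilers older than
1.4 treat "assert" as an ordinary identifier, so the assert line below
does not compile with them.

```java
public class AssertExample {
    // Hypothetical helper for illustration only; not part of LARM.
    public static int clampDepth(int depth, int maxDepth) {
        // 'assert' is a keyword only from JDK 1.4 on.
        // The check runs only when assertions are enabled at runtime.
        assert maxDepth >= 0 : "maxDepth must not be negative";
        return depth > maxDepth ? maxDepth : depth;
    }

    public static void main(String[] args) {
        System.out.println(clampDepth(7, 5));
    }
}
```

With the 1.4 javac this needs "-source 1.4" to compile, and the check is
only evaluated when the JVM is started with "-ea".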

That's all I can think of for now.
Clemens is subscribed to this list as well, so if you have questions
you can post them here.

