lucene-dev mailing list archives

From cmarsch...@apache.org
Subject cvs commit: jakarta-lucene-sandbox/contributions/webcrawler-LARM TODO.txt
Date Fri, 11 Apr 2003 14:33:31 GMT
cmarschner    2003/04/11 07:33:31

  Modified:    contributions/webcrawler-LARM TODO.txt
  Log:
  fixed build process
  
  Revision  Changes    Path
  1.3       +4 -4      jakarta-lucene-sandbox/contributions/webcrawler-LARM/TODO.txt
  
  Index: TODO.txt
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/webcrawler-LARM/TODO.txt,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- TODO.txt	18 Jun 2002 11:39:51 -0000	1.2
  +++ TODO.txt	11 Apr 2003 14:33:30 -0000	1.3
  @@ -7,11 +7,13 @@
   solved:
   -----------------------------------------------------------------------------------------------
   
  +
   Bugs:
   	- some relative URLs are not appended appropriately, leading to wrong and growing URLs
   	  - 301/302 URLs were not updated: the docs were saved under the old URL, which lead to
   	    wrong relative URLs (cmarschner, 2002-06-17)
  -
  +    - fixed build.xml
  +    
   URLs: 
   	- include a URLNormalizer
   	  * lowercase host names
  @@ -35,8 +37,6 @@
   	  probably this will be solved by changing from HTTPClient.* to Jakarta HTTP client and reuse sockets
   
   
  -* Build
  -	- added build.xml, but build.bat and build.sh are still working without ANT. Change that.
   
   * LuceneStorage
   	- define a configurable interface that saves fetched pages into a Lucene index
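
The two URL items in the TODO above (lowercasing host names in a URLNormalizer, and saving redirected documents under the new URL so relative links resolve correctly) could be sketched roughly as follows. This is only an illustration built on java.net.URI and is not code from the LARM crawler; the class name UrlNormalizerSketch and the methods normalize and resolveLink are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical sketch, not part of LARM: normalizes URLs and resolves
// relative links against the URL a page was finally served from.
class UrlNormalizerSketch {

    /** Lowercases scheme and host and collapses "." and ".." path segments. */
    static URI normalize(URI uri) {
        URI n = uri.normalize(); // removes "." and ".." segments from the path
        String scheme = n.getScheme() == null ? null : n.getScheme().toLowerCase(Locale.ROOT);
        String host = n.getHost() == null ? null : n.getHost().toLowerCase(Locale.ROOT);
        try {
            // Caveat: getPath()/getQuery() decode percent-escapes and this
            // constructor re-encodes them, so an escaped reserved character
            // such as %2F is not preserved. Good enough for a sketch.
            return new URI(scheme, n.getUserInfo(), host, n.getPort(),
                           n.getPath(), n.getQuery(), n.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot normalize " + uri, e);
        }
    }

    /**
     * Resolves a link found in a page. The base must be the URL after any
     * 301/302 redirects, not the URL that was originally requested.
     */
    static URI resolveLink(URI finalPageUrl, String href) {
        return normalize(finalPageUrl.resolve(href));
    }

    public static void main(String[] args) {
        URI requested = URI.create("http://Example.COM/docs");   // answered with a redirect
        URI redirected = URI.create("http://example.com/docs/"); // assumed Location target
        String href = "img/logo.png";

        // Resolving against the old URL drops the "docs" segment.
        System.out.println(resolveLink(requested, href));  // http://example.com/img/logo.png
        // Resolving against the redirect target keeps it.
        System.out.println(resolveLink(redirected, href)); // http://example.com/docs/img/logo.png
    }
}
```

The example URLs are made up. The point is that the same relative href yields two different absolute URLs depending on which base is used, which is the kind of mismatch the bug entry describes.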
  
  
  

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

