lucene-java-user mailing list archives

From "Linto Joseph Mathew" <>
Subject Re: UNIX command-line indexing script?
Date Tue, 16 Mar 2004 11:42:12 GMT

I have written one that indexes PDF, DOC, XLS, XML, HTML, TXT and other
plain-text files. I based it on the demo application and used other open
source components: POI by Apache (for Word and Excel) and PDFBox. I also
modified the client interface, so it now looks like Google. I still have a
couple of things to do:
  1) At present I'm using the UNIX 'file' command to check whether a file
     is plain text. This spawns a process for each file and takes more
     time. The advantage is that it works on UNIX-based machines, where
     the file extension is not important (it uses magic numbers).
  2) Information such as the index location, directory, URL, etc. should
     be kept in an XML file so that it can be dynamic.
  3) Category
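
For item 1, a rough sketch of an in-process check that avoids spawning
'file' for every document. This is not what 'file' does (it consults a
magic-number database); it is a simpler heuristic swapped in for
illustration: read the first few KB and reject anything containing NUL
bytes or too many control characters. The class name, the 4096-byte sample
size and the 5% threshold are all assumptions, and it uses java.nio.file,
so it needs a newer JDK than the one available when this was posted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of Lucene, POI or PDFBox.
final class PlainTextSniffer {
    // Assumed threshold: tolerate at most 5% unexpected control bytes.
    private static final double MAX_SUSPICIOUS_RATIO = 0.05;

    /** Heuristic check on the first {@code length} bytes of a sample. */
    static boolean looksLikeText(byte[] sample, int length) {
        int suspicious = 0;
        for (int i = 0; i < length; i++) {
            int b = sample[i] & 0xFF;
            if (b == 0) {
                return false; // NUL bytes almost never occur in plain text
            }
            boolean whitespace = b == '\t' || b == '\n' || b == '\r' || b == '\f';
            if (b < 0x20 && !whitespace) {
                suspicious++; // other control characters count against the file
            }
        }
        // An empty sample is treated as text.
        return length == 0 || (double) suspicious / length <= MAX_SUSPICIOUS_RATIO;
    }

    /** Reads up to 4096 bytes from the start of the file and applies the heuristic. */
    static boolean looksLikeText(Path file) throws IOException {
        byte[] buffer = new byte[4096];
        try (InputStream in = Files.newInputStream(file)) {
            int n = Math.max(0, in.read(buffer)); // read() returns -1 for an empty file
            return looksLikeText(buffer, n);
        }
    }
}
```

A check like this trades accuracy for speed: it will accept some files that
'file' would identify as another type, so the extension-based handlers for
PDF, DOC and so on would still need to run first.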

Since the Apache guys provided a good framework, everything was made easy. Thanks, guys!
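
For item 2 in the list above, one possible way to keep the settings in an
XML file is the JDK's own java.util.Properties XML format, which avoids
writing a parser. The sketch below is only an illustration: the class name
and the three keys (indexLocation, documentDirectory, baseUrl) are made up
here, not taken from the actual indexer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical settings holder. Expects a file in the java.util.Properties
// XML format, for example:
//
//   <?xml version="1.0" encoding="UTF-8"?>
//   <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
//   <properties>
//     <entry key="indexLocation">/var/index</entry>
//     <entry key="documentDirectory">/home/docs</entry>
//     <entry key="baseUrl">http://localhost/docs</entry>
//   </properties>
final class IndexerConfig {
    final String indexLocation;
    final String documentDirectory;
    final String baseUrl;

    private IndexerConfig(Properties p) {
        indexLocation = require(p, "indexLocation");
        documentDirectory = require(p, "documentDirectory");
        baseUrl = require(p, "baseUrl");
    }

    // Fails fast with a clear message when a setting is missing or blank.
    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing setting: " + key);
        }
        return value.trim();
    }

    /** Loads the settings; Properties.loadFromXML closes the stream when done. */
    static IndexerConfig load(InputStream xml) throws IOException {
        Properties p = new Properties();
        p.loadFromXML(xml);
        return new IndexerConfig(p);
    }
}
```

The cron job could then call IndexerConfig.load on a stream opened from a
path given on the command line, so the locations can change without
recompiling.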


On Sat, 13 Mar 2004 Charlie Smith wrote:
>Anyone written a simple UNIX command-line indexing script which will read a
>bunch of different kinds of docs and index them?  I'd like to make a cron job
>out of this so as to be able to come back and read it later during a search.
>PERL or JAVA script would be fine.
