lucene-java-user mailing list archives

From "Charlie Smith" <>
Subject Re: UNIX command-line indexing script?
Date Fri, 26 Mar 2004 17:40:46 GMT
So, Linto,

Did you write this in Perl or Java? Would you be willing to part with a copy of

>Linto wrote on 3/16/04

>I have written one that will index PDF, DOC, XLS, XML, HTML, TXT and plain/text
>files. I wrote it based on the demo application, using other open source
>components: POI by Apache (for DOC and Excel) and PDFBox. I also modified the
>client interface; it now looks like Google. I still have a couple of things
>to do:
>  1) At present I'm using the UNIX 'file' command to check whether a file is
>     plain text. This spawns a process and takes more time. The advantage is
>     on UNIX-based machines, where the file extension is not important (it
>     uses magic numbers).
>  2) The information such as index location, directory, URL, etc. should be
>     kept in an XML file so that it can be dynamic.
>  3) Category
>Since the Apache guys provided a good framework, everything was made easy. Thanks
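
For readers weighing the trade-off in item 1, here is a minimal sketch of both approaches. It is not Linto's code. The class and method names are hypothetical, it assumes Java 11 or later, and it assumes the installed `file` utility accepts `-b` and `--mime-type` (common implementations do, but POSIX does not require them). The in-process check is a crude heuristic (no NUL bytes, few control characters in the first 4 KB), not a replacement for magic-number detection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; the names are illustrative only.
class PlainTextCheck {

    // Spawns `file -b --mime-type <path>` and accepts any "text/..." MIME type.
    // One process per file, which is the cost described in item 1.
    static boolean isTextViaFileCommand(Path path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("file", "-b", "--mime-type", path.toString())
                .redirectErrorStream(true)
                .start();
        String out;
        try (InputStream in = p.getInputStream()) {
            out = new String(in.readAllBytes(), StandardCharsets.UTF_8).trim();
        }
        if (!p.waitFor(10, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            return false;
        }
        return p.exitValue() == 0 && out.startsWith("text/");
    }

    // In-process heuristic: reject NUL bytes, tolerate at most 5% other control bytes.
    static boolean looksLikeText(byte[] head) {
        if (head.length == 0) {
            return false; // empty input: nothing to index
        }
        int control = 0;
        for (byte b : head) {
            int c = b & 0xFF;
            if (c == 0) {
                return false; // NUL bytes are a strong sign of binary data
            }
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r' && c != '\f') {
                control++;
            }
        }
        return control * 20 <= head.length;
    }

    // Reads at most the first 4096 bytes of the file and applies the heuristic.
    static boolean looksLikeText(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return looksLikeText(in.readNBytes(4096));
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Paths.get(args[0]);
        System.out.println("heuristic: " + looksLikeText(path));
        System.out.println("file(1):   " + isTextViaFileCommand(path));
    }
}
```

One possible design is to run the cheap heuristic first and fall back to `file` only for files it cannot classify, so that most files never cause a process spawn.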


On Sat, 13 Mar 2004 Charlie Smith wrote:
>Has anyone written a simple UNIX command-line indexing script which will read a
>bunch of different kinds of docs and index them?  I'd like to make a cron job
>out of this so as to be able to come back and read it later during a search.
>A Perl or Java script would be fine.
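
As a rough illustration of the kind of command-line tool asked for above, the sketch below walks a directory tree and classifies each file by extension. It is only a skeleton under stated assumptions: the class name, the `DocType` enum, and the extension mapping are hypothetical, and the actual text extraction (for example with PDFBox or POI) and the Lucene indexing calls are deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical skeleton of a cron-friendly crawler; it indexes nothing by itself.
class IndexCrawler {

    // Document kinds mentioned in the thread; the enum itself is an assumption.
    enum DocType { PDF, WORD, EXCEL, XML, HTML, TEXT, UNKNOWN }

    // Maps a file name to a document kind using its extension only.
    static DocType classify(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf":  return DocType.PDF;
            case "doc":  return DocType.WORD;
            case "xls":  return DocType.EXCEL;
            case "xml":  return DocType.XML;
            case "html":
            case "htm":  return DocType.HTML;
            case "txt":  return DocType.TEXT;
            default:     return DocType.UNKNOWN;
        }
    }

    public static void main(String[] args) throws IOException {
        // Root directory comes from the command line; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                DocType type = classify(p.getFileName().toString());
                // A real indexer would extract text for this type and add it to the index here.
                System.out.println(type + "\t" + p);
            });
        }
    }
}
```

Because the program takes its root directory as an argument and exits when the walk is done, it could be scheduled from a crontab entry that runs `java IndexCrawler` against the document directory; the schedule and paths would be site-specific.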
