lucene-java-user mailing list archives

From "Linto Joseph Mathew" <lint...@rediffmail.com>
Subject Re: Re: UNIX command-line indexing script?
Date Tue, 30 Mar 2004 14:09:46 GMT
Charlie,

I wrote this in Java. Of course I am ready to share it, but I have some problems when indexing
large volumes of data, and it is still under testing.

Linto


 


On Fri, 26 Mar 2004 Charlie Smith wrote :
>So, Linto,
>
>  Did you write this in Perl or Java?  Would you be willing to part with a copy of
>the source?
>
>
>
> >Linto wrote on 3/16/04:
> >
> >I have written one that will index PDF, DOC, XLS, XML, HTML, TXT and plain-text
> >files. I based it on the demo application and used other open source
> >components: POI by Apache (for DOC and Excel) and PDFBox. I also modified the
> >client interface, so it now looks like Google. I still have a couple of
> >things to do.
> >  1) At present I'm using the UNIX 'file' command to check whether a file is
> >     plain text. This spawns a process and takes more time. The advantage is
> >     on UNIX-based machines, where the file extension is not important (it
> >     uses magic numbers).
> >  2) Information such as the index location, directory, URL, etc. should
> >     be kept in an XML file so that it can be dynamic.
> >  3) Category
> >
> >
> >Since the Apache guys provided a good framework, everything was made easy. Thanks,
> >guys!
> >
> >Linto
>
>
>
>
>On Sat, 13 Mar 2004 Charlie Smith wrote :
> >Anyone written a simple UNIX command-line indexing script which will read a
> >bunch of different kinds of docs and index them?  I'd like to make a cron job
> >out of this so as to be able to come back and read it later during a search.
> >
> >Perl or Java script would be fine.
> >
> >
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
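Item 1 in Linto's quoted message notes that calling the UNIX 'file' command spawns a process per document. One possible way to avoid that cost is to inspect the first bytes of each file inside the JVM. The following is only a minimal sketch under stated assumptions: the class name PlainTextSniffer, the 4096-byte sample size, and the 5% control-character threshold are all hypothetical, and the heuristic is much cruder than the magic-number database that 'file' uses. It is not the code from Linto's indexer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not taken from the indexer discussed in this thread.
class PlainTextSniffer {
    // Assumed sample size: only the start of the file is inspected.
    static final int SAMPLE_SIZE = 4096;

    // Returns true if the first `length` bytes of `sample` look like text:
    // no NUL bytes, and at most 5% control characters other than whitespace.
    static boolean looksLikeText(byte[] sample, int length) {
        if (length == 0) {
            return true; // assumption: an empty file is treated as text
        }
        int suspicious = 0;
        for (int i = 0; i < length; i++) {
            int b = sample[i] & 0xFF;
            if (b == 0) {
                return false; // NUL bytes almost never occur in plain text
            }
            boolean whitespace = b == '\t' || b == '\n' || b == '\r' || b == '\f';
            if (b < 0x20 && !whitespace) {
                suspicious++;
            }
        }
        return suspicious * 20 <= length;
    }

    // Reads up to SAMPLE_SIZE bytes from the file and applies the heuristic.
    static boolean looksLikeText(Path file) throws IOException {
        byte[] buf = new byte[SAMPLE_SIZE];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) > 0) {
                total += n;
            }
        }
        return looksLikeText(buf, total);
    }
}
```

Because bytes above 0x7F are accepted (they may be UTF-8 or Latin-1 text), some binary formats can slip through. A check like this is best treated as a fallback for files that no format-specific parser has already claimed.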

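Item 2 in the quoted message proposes keeping the index location, directory, URL and similar settings in an XML file. One simple way this could be sketched in Java is with the standard java.util.Properties XML format, read through Properties.loadFromXML. The class name IndexConfig and the keys indexLocation, documentDirectory and baseUrl are assumptions made for illustration; the thread does not say which settings or file layout the indexer actually uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical configuration holder; key names are illustrative only.
//
// Expected file layout (the standard java.util.Properties XML format):
//   <?xml version="1.0" encoding="UTF-8"?>
//   <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
//   <properties>
//     <entry key="indexLocation">/path/to/index</entry>
//     <entry key="documentDirectory">/path/to/docs</entry>
//     <entry key="baseUrl">http://localhost/docs/</entry>
//   </properties>
class IndexConfig {
    final String indexLocation;
    final String documentDirectory;
    final String baseUrl;

    private IndexConfig(String indexLocation, String documentDirectory, String baseUrl) {
        this.indexLocation = indexLocation;
        this.documentDirectory = documentDirectory;
        this.baseUrl = baseUrl;
    }

    // Parses the XML properties document and checks that all keys are present.
    static IndexConfig load(InputStream xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(xml); // closes the stream when it returns
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read index configuration", e);
        }
        return new IndexConfig(
                require(props, "indexLocation"),
                require(props, "documentDirectory"),
                require(props, "baseUrl"));
    }

    // Fails fast with a clear message when an entry is missing or blank.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing configuration entry: " + key);
        }
        return value.trim();
    }
}
```

With settings held in a file like this, a cron job could point the same indexer at a different directory or index without recompiling, which appears to be what "dynamic" means in the quoted message.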