lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pradeep Kumar K <prade...@robosoftin.com>
Subject Re: indexing and searching different file formats
Date Thu, 14 Feb 2002 10:58:35 GMT
Thanks  a lot Andy.
-Pradeep

On Wednesday, February 13, 2002, at 09:50 PM, Andrew Libby wrote:

>
> Pradeep,
>     Currently Lucene does not provide the ability to convert documents
> to text for indexing.  There is talk of adding this kind of thing to the
> goal of the project, along with providing crawlers to traverse web,
> local disk, ftp, and RDBMS sources of data.
>
> The problem with indexining irrespective of file type is that each 
> document
> format contains embedded information that must be stripped out (or 
> ignored)
> and the text needs to be retrieved for indexing.  An extreeme example is
> a PDF which has a considerably complicated document format.
>
> On the contributions page there are some pointers that may provide 
> information
> about processing the types of documents you're interested in.
>
> http://jakarta.apache.org/lucene/docs/contributions.html
>
> If you've not taken the time to do so, look at the FAQs, they are very
> informative:
>
> http://www.lucene.com/cgi-bin/faq/faqmanager.cgi
> http://jakarta.apache.org/lucene/docs/gettingstarted.html
> http://www.jguru.com/faq/Lucene
>
> Good luck!
>
> Andy
>
>
> On Wed, Feb 13, 2002 at 09:24:33PM +0530, Pradeep Kumar K wrote:
>> Hi Lucene friends!
>>
>>    How the files of different format can be indexed and searched? 
>> ( As I
>> know lucene is having HTML indexer and searcher, which comes along with
>> it and also XML indexer, but is there any way to index files
>> irrespective of the file type)
>> Any suggestions will be greatly appreciated..
>>
>> Thanks in advance.
>> Pradeep
>>
>>
>> --------------------------------------------------------------
>> Robosoft Technologies, Mangalore, India
>>
>>
>>
>> --
>> To unsubscribe, e-mail:   <mailto:lucene-user-
>> unsubscribe@jakarta.apache.org>
>> For additional commands, e-mail: <mailto:lucene-user-
>> help@jakarta.apache.org>
>>
>
> --
> --------------------------------------------------
> Andrew Libby
> CommNav, Inc
> alibby@commnav.com
>
>
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-
> unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-
> help@jakarta.apache.org>
>


--------------------------------------------------------------
Robosoft Technologies, Mangalore, India



--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message