lucene-java-user mailing list archives

From Honey George <honey_geo...@yahoo.com>
Subject Re: searchhelp
Date Thu, 19 Aug 2004 10:31:16 GMT
Hi,
  Note that Lucene only provides an API for building a
search engine; you can use it however you want. You
can pass data to the indexer in two forms:
1. java.lang.String
2. java.io.Reader

What Lucene receives is one of these two objects.
So for non-text documents, you need to extract the
text from the documents yourself and either write it
to a text file and open a Reader on it, or create a
String object (for small files).
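A minimal sketch of the two options above, assuming an extraction tool has already written the document's plain text to a file. The class name, method names and the 1 MiB "small file" cut-off are hypothetical choices for illustration; only JDK classes are used, so the Lucene call itself is described after the code rather than written out.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExtractedTextInput {
    // Assumed cut-off for a "small" file; tune it for your own heap and data.
    static final long SMALL_FILE_BYTES = 1L << 20; // 1 MiB

    /** True if the extracted text file is small enough to load as one String. */
    static boolean isSmall(Path textFile) throws IOException {
        return Files.size(textFile) <= SMALL_FILE_BYTES;
    }

    /** Small file: read the whole extracted text into a String. */
    static String readAsString(Path textFile) throws IOException {
        return Files.readString(textFile, StandardCharsets.UTF_8);
    }

    /** Large file: stream it through a Reader instead of holding it all in memory. */
    static Reader openAsReader(Path textFile) throws IOException {
        return Files.newBufferedReader(textFile, StandardCharsets.UTF_8);
    }
}
```

Either value can then be handed to a Lucene field. In recent Lucene releases, for example, `TextField` has one constructor taking a `String` plus a `Field.Store` flag and another taking a `Reader`; the Reader-based field is indexed but not stored.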

For indexing database contents, you need to write your
own code to fetch the data from the database (using
JDBC, EJB, etc.), convert it to String objects, and
pass them to Lucene for indexing.
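One way the JDBC route might look, as a sketch only: it assumes the query's first column is a unique row key and the remaining columns are text, such as a hypothetical `SELECT id, title, body FROM articles`. `DbToText`, `rowToText` and `fetchRowsAsText` are illustrative names, and a JDBC driver for your database is needed to obtain the `Connection`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class DbToText {
    /** Joins the non-blank column values of one row into the text to be indexed. */
    static String rowToText(List<String> columnValues) {
        StringJoiner text = new StringJoiner(" ");
        for (String value : columnValues) {
            if (value != null && !value.isBlank()) {
                text.add(value.trim());
            }
        }
        return text.toString();
    }

    /**
     * Runs a query and maps each row's key (column 1) to its indexable text
     * (columns 2..n joined together). Assumes column 1 is unique.
     */
    static Map<String, String> fetchRowsAsText(Connection conn, String sql) throws SQLException {
        Map<String, String> rows = new LinkedHashMap<>();
        try (PreparedStatement stmt = conn.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            int columnCount = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                List<String> values = new ArrayList<>();
                for (int i = 2; i <= columnCount; i++) {
                    values.add(rs.getString(i));
                }
                rows.put(rs.getString(1), rowToText(values));
            }
        }
        return rows;
    }
}
```

Each map entry would then become one Lucene document. A common choice is to put the key in a stored, untokenized field so that a search hit can be traced back to its database row, and the text in a tokenized field.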

Again, Lucene is not responsible for getting the data
out of your application. It only indexes the data you
give it.

Also, for extracting the contents of PDF and DOC
files (generally known as straining), I know of two
more tools:
wvWare -> for Word documents
pdftotext (xpdf) -> for PDF documents
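Both tools are command-line programs, so one option is to run them as a child process from Java and capture what they print. This sketch covers pdftotext only and assumes it is on the PATH: given `-` as the output file it writes the text to standard output, and `-enc UTF-8` selects the output encoding. `ExternalExtractor` and its methods are hypothetical names; wvWare could be driven the same way, but its options are not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

final class ExternalExtractor {
    /** pdftotext command line; "-" as the output file sends the text to stdout. */
    static List<String> pdfToTextCommand(Path pdf) {
        return List.of("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    /** Runs an external converter and returns what it printed on stdout (decoded as UTF-8). */
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show the tool's errors on our stderr
                .start();
        String text;
        // Drain stdout before waiting, so a large document cannot block on a full pipe.
        try (InputStream out = process.getInputStream()) {
            text = new String(out.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException(command.get(0) + " exited with status " + exit);
        }
        return text;
    }
}
```

For example, `ExternalExtractor.run(ExternalExtractor.pdfToTextCommand(Path.of("report.pdf")))` (a hypothetical file) yields a String ready for indexing. For very large PDFs it may be better to let pdftotext write to a real text file and open a Reader on that instead.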

Google around and you will find lots of links.

Hope this helps.

Thanks,
   George

 --- Santosh <santosh.s@softprosys.com> wrote: 
> I recently joined the list and haven't gone through
> any previous mails. If you have any mails or related
> code, please forward them to me.
> ----- Original Message -----
> From: "Chandan Tamrakar" <chandan@ccnep.com.np>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Thursday, August 19, 2004 3:47 PM
> Subject: Re: searchhelp
> 
> 
> > For PDF you need to extract the text from the PDF
> > files using the PDFBox library, and for Word
> > documents you can use the Apache POI APIs. There
> > are messages posted on the Lucene list related to
> > your queries. About databases, I guess someone
> > must have done it. :)
> >
> > ----- Original Message -----
> > From: "Santosh" <santosh.s@softprosys.com>
> > To: <lucene-user@jakarta.apache.org>
> > Sent: Thursday, August 19, 2004 3:58 PM
> > Subject: searchhelp
> >
> >
> > Hi,
> >
> > I am using the Lucene search engine for my
> > application.
> >
> > I am able to search through text files and HTML
> > files, as specified by Lucene.
> >
> > Can you please clarify my doubts:
> >
> > 1. Can Lucene search through PDFs and Word
> > documents? If yes, then how?
> >
> > 2. Can Lucene search through a database? If yes,
> > then how?
> >
> > Thank you,
> >
> > Santosh
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >