lucene-java-user mailing list archives

From "Prasad KVSH" <Prasad.Kokep...@ness.com>
Subject RE: lucene-3.0.3
Date Wed, 01 Feb 2012 16:41:27 GMT
Hi,

We have added all the files, including PDF/Word/Excel/TXT files, but the search only finds
matches in the plain text files. How do we strip the text from the other document types (PDF/HTML/XML/MS Word/PPT/XLS)?

Thanks,
Prasad K.V.S.H. * Project Manager *
PACIFIC COAST STEEL (Pinnacle) Project
Ness Technologies
Road No 11, Banjara Hills, Hyderabad, India. Tel: +91 40 66041401 | Mobile: +91 9247475840

prasad.kokepudi@ness.com | www.ness.com


________________________________

From: KARTHIK SHIVAKUMAR [mailto:nskarthik.k@gmail.com]
Sent: Wed 2/1/2012 7:04 PM
To: java-user@lucene.apache.org
Subject: Re: lucene-3.0.3



Hi

>>lucene-3.0.3 can be used for searching a text from

Lucene's primary job is text search, whatever the source format may be:
PDF/HTML/XML/MS Word/PPT/XLS.

You need plug-in code that does two things:

1) Strip text from either of the Documents (PDF/HTML/XML/MSword/PPT/XLS)
2) Index this processed text using Lucene

The resulting index can later be used to search through the required
content.
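
The two steps above can be sketched roughly as follows. This is only a minimal illustration in plain Java (8 or later), not Lucene code: `TextExtractor`, `ToyIndex` and `indexFile` are hypothetical names made up for this example, the HTML stripper is deliberately crude, and the in-memory map merely stands in for a Lucene index. For real PDF or Office files you would plug a proper extraction library (Apache Tika or PDFBox are common choices) into step 1 and hand the resulting string to Lucene in step 2.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExtractThenIndex {

    /** Step 1 (hypothetical interface): turn the raw bytes of one format into plain text. */
    interface TextExtractor {
        String extract(byte[] raw);
    }

    /** Plain text needs no stripping; just decode the bytes. */
    static final TextExtractor PLAIN_TEXT =
            raw -> new String(raw, StandardCharsets.UTF_8);

    /** Crude HTML stripper for illustration only: replaces every tag with a space. */
    static final TextExtractor HTML =
            raw -> new String(raw, StandardCharsets.UTF_8).replaceAll("<[^>]*>", " ");

    /** One extractor per file extension; a PDF or Word extractor would be registered here. */
    static final Map<String, TextExtractor> EXTRACTORS = new HashMap<>();
    static {
        EXTRACTORS.put("txt", PLAIN_TEXT);
        EXTRACTORS.put("html", HTML);
    }

    /** Step 2 stand-in: a toy inverted index (term -> file names), NOT Lucene. */
    static class ToyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docName, String text) {
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
                }
            }
        }

        Set<String> search(String term) {
            return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                                         Collections.<String>emptySet());
        }
    }

    /** Extracts text first, then indexes it. Returns false if no extractor is registered. */
    static boolean indexFile(ToyIndex index, String fileName, byte[] raw) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = EXTRACTORS.get(ext);
        if (extractor == null) {
            return false; // skip the file rather than index raw binary bytes as text
        }
        index.add(fileName, extractor.extract(raw));
        return true;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        indexFile(index, "notes.txt", "steel order".getBytes(StandardCharsets.UTF_8));
        indexFile(index, "page.html",
                "<p>steel <b>invoice</b></p>".getBytes(StandardCharsets.UTF_8));
        // No PDF extractor is registered in this sketch, so the file is skipped.
        boolean pdfIndexed = indexFile(index, "report.pdf", new byte[] {0x25, 0x50, 0x44, 0x46});

        System.out.println(index.search("steel"));   // [notes.txt, page.html]
        System.out.println(index.search("invoice")); // [page.html]
        System.out.println(pdfIndexed);              // false
    }
}
```

The point of the split is that the indexing side only ever sees plain text, so supporting another format means registering one more extractor and nothing else changes.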

;)
with regards
karthik


On Wed, Feb 1, 2012 at 6:37 PM, Prasad KVSH <Prasad.Kokepudi@ness.com> wrote:

> Hi,
>
>
>
> lucene-3.0.3 can be used for searching a text from PDF, xlsx, docx, doc,
> xls, msg, TXT files. Do we have any common function to accomplish
> this? Please help me with this.
>
>
>
> Thanks
>
> Prasad
>
>
>
>


--
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*



