lucene-java-user mailing list archives

From Honey George <honey_geo...@yahoo.com>
Subject Re: Search PDF ???
Date Mon, 25 Oct 2004 05:30:07 GMT

 --- Eric Chow <eric138@gmail.com> wrote: 
> Hello,
> 
> 1. Is it possible to use Lucene to search PDF
> contents ?
Yes. You need an external tool to extract the text
from the PDF file, and then you pass that text to
Lucene for indexing. If you search the archives of this
list you will find many messages on the subject.
 
> 2. Can it search Chinese contents PDF files ???
I have used a tool called xpdf (on Linux), and it works
with both Traditional and Simplified Chinese. It
provides language support packages for many
languages. Please take a look at the URL below.
http://www.foolabs.com/xpdf/download.html
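
To make the extraction step concrete, here is a rough sketch of
calling xpdf's pdftotext command from Java. It assumes pdftotext is
installed and on the PATH, and that the Chinese language support
package is set up in your xpdfrc. The class and method names
(PdfTextExtractor, buildCommand, extractText) are made up for this
example, and the Lucene indexing call is left out on purpose.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Lucene or xpdf.
class PdfTextExtractor {

    // Builds the pdftotext command line. "-enc UTF-8" selects the output
    // encoding, and "-" as the output file sends the text to stdout.
    static List<String> buildCommand(Path pdf) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Runs pdftotext (assumed to be on the PATH) and returns the text.
    static String extractText(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show tool errors
                .start();
        byte[] output = process.getInputStream().readAllBytes(); // Java 9+
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String text = extractText(Paths.get(args[0]));
        // The extracted text would now go into a field of a Lucene Document
        // and be added to the index; that part is omitted here.
        System.out.println(text.length() + " characters extracted");
    }
}
```

Run it with the path of a PDF file as the only argument. The string
returned by extractText is what you would hand to Lucene.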

Note that the tool only extracts the text. Whether you
can search Chinese text depends on the analyzer you
use in Lucene. Try CJKAnalyzer for CJK text search.
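
To show why the analyzer matters: Chinese text has no spaces between
words, so CJKAnalyzer indexes overlapping pairs of adjacent
characters (bigrams) instead of whitespace-separated words. The
sketch below only illustrates that idea in plain Java. It is not
Lucene's implementation, the class name CjkBigramSketch is made up,
and it assumes the input is a single run of CJK characters with no
Latin text or punctuation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a simplified version of the bigram idea, not Lucene code.
class CjkBigramSketch {

    // Splits a run of CJK characters into overlapping two-character tokens.
    // A single isolated character is returned as a token on its own.
    static List<String> bigrams(String text) {
        int[] codePoints = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (codePoints.length == 1) {
            tokens.add(new String(codePoints, 0, 1));
        }
        for (int i = 0; i + 1 < codePoints.length; i++) {
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [中文, 文检, 检索]
        System.out.println(bigrams("中文检索"));
    }
}
```

Because the query text is split the same way at search time, a
search for 检索 matches the document above even though no word
boundaries were ever detected.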

Thanks,
  George


	
	
		


