lucene-pylucene-dev mailing list archives

From Thomas Koch <>
Subject Re: Introduction to PyLucene Community and some doubts
Date Tue, 11 Jun 2013 15:11:23 GMT
I suggest you have a look at Apache TIKA:

You can easily run a "java -jar tika.jar" command from Python with tools like os.popen and
convert files in various formats to text.
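
A minimal sketch of that approach, using subprocess rather than os.popen so that a failing Java process raises an error instead of silently returning nothing. The jar name "tika-app.jar", the file "report.pdf", and both helper function names are assumptions for illustration; the --text option is the tika-app command-line switch for plain-text output, and the output is assumed to be decodable as UTF-8 (undecodable bytes are replaced).

```python
import subprocess


def build_tika_command(jar_path, document_path, java="java"):
    """Return the argument list for a tika-app plain-text extraction run."""
    # Hypothetical helper: jar_path must point at a tika-app jar you downloaded.
    return [java, "-jar", str(jar_path), "--text", str(document_path)]


def extract_text(jar_path, document_path, timeout=120):
    """Run Tika on one document and return the extracted text."""
    result = subprocess.run(
        build_tika_command(jar_path, document_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,      # do not hang forever on a broken document
    )
    # Assumption: output is UTF-8 compatible; bad bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Assumed file names; adjust to your own jar and document.
    print(extract_text("tika-app.jar", "report.pdf"))
```

The returned string can then be handed to whatever indexing code you use, for example as the content of a text field in a PyLucene document.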

There's even a python wrapper based on JCC but I'm not sure if that's still maintained:

On 11.06.2013 at 12:05, Vishrut Mehta <> wrote:

> Hello Everyone,
> I am Vishrut Mehta, currently a third-year student at IIIT
> Hyderabad, India. I have been contributing to open source for two years
> and have contributed to organizations like E-cidadania, Sahana
> Software Foundation, Gnome, etc. I am very interested in search engines and
> search-related libraries.
> I need some help from the community. I am currently working
> on a project that deals with the following issue: I need to search for
> text or strings within documents uploaded by the user (such as .pdf, .doc,
> etc.). Can anyone help me with this? It would be a great help!
> Thank you!
> Regards,
> -- 
> *Vishrut Mehta*
> International Institute of Information Technology,
> Gachibowli,Hyderabad-500032
