lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: solr 4 tika config
Date Wed, 17 Oct 2012 13:56:50 GMT
Hi,

Try the new post.jar that ships with version 4.0.0.

It lets you run:
java -Dauto -Drecursive -Dfiletypes=ppt -jar post.jar "d:\myfiles" 

You can inspect your Solr log file to see which ExtractingRequestHandler URLs are actually
called for each file.
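
As a rough guide to what to look for in the log, here is a minimal sketch of how such a per-file extract URL could be assembled. It assumes the default update URL http://localhost:8983/solr/update and assumes that the file path is passed as literal.id and resource.name; check your own log for the exact form. The class and method names are hypothetical and are not part of post.jar.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only; not code from post.jar.
class ExtractUrlSketch {

    // URL-encode a value as UTF-8 so that ':' and '\' in Windows paths are safe in a query string.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Build an /extract URL that uses the file path as the document id.
    // Assumption: the path is sent as both literal.id and resource.name.
    static String buildExtractUrl(String updateUrl, String filePath) {
        String encoded = encode(filePath);
        return updateUrl + "/extract?literal.id=" + encoded + "&resource.name=" + encoded;
    }

    public static void main(String[] args) {
        String updateUrl = "http://localhost:8983/solr/update"; // assumed default
        String[] files = {
            "d:\\myfiles\\0.pdf",
            "d:\\myfiles\\subfolder1\\1.pdf",
            "d:\\myfiles\\subfolder2\\2.pdf"
        };
        for (String file : files) {
            System.out.println(buildExtractUrl(updateUrl, file));
        }
    }
}
```

If the URLs in your log carry the file path as literal.id, the id field will hold the full file name, as asked for in the question. Which field receives the extracted text depends on the /update/extract handler settings in solrconfig.xml.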

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 16 Oct 2012, at 13:14, cmd.ares <cmd.ares@gmail.com> wrote:

> I want to index all PDF files under "d:\myfiles\*.*",
> with the full file name as the field id
> and the file content as the field txt.
> The index should look like this:
> 
> id                               txt
> -------------------------------  -------------------------
> d:\myfiles\0.pdf                 aaaaaaaaaaaaaaaaaaaaaaaaa
> d:\myfiles\subfolder1\1.pdf      bbbbbbbbbbbbbbbbbbbbbbbbb
> d:\myfiles\subfolder2\2.pdf      ccccccccccccccccccccccccc
> 
> How do I configure DIH for this?
> Thanks.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/solr-4-tika-config-tp4013947.html
> Sent from the Solr - User mailing list archive at Nabble.com.

