lucene-dev mailing list archives

From Jan Høydahl (JIRA)
Subject [jira] [Updated] (SOLR-1929) Index encrypted files
Date Tue, 26 Jun 2012 23:31:44 GMT


Jan Høydahl updated SOLR-1929:

    Attachment: SOLR-1929.patch

Updated patch for trunk which utilizes the new Tika feature in TIKA-850. Contains a RegexRulesPasswordProvider
backed by a regex rules file and/or an explicit password.

New Solr Cell request params:
* resource.password - explicit password for this file
* passwordsFile - name of a property file with a list of known passwords based on filename regex.
The file is loaded using ResourceLoader
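
To illustrate how the two params could interact, here is a minimal standalone sketch of a regex-rule password lookup. It is not the code from the patch: the class name PasswordRulesSketch, the "regex = password" line format, the first-match-wins ordering, and the rule that an explicit password overrides the rules file are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a regex-rule password lookup; not the class from the patch. */
class PasswordRulesSketch {
    // Insertion-ordered so that the first matching rule wins (an assumption).
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    /**
     * Parses lines of the assumed form "regex = password".
     * Blank lines, lines starting with '#', and lines without '=' are skipped.
     * The split happens at the first '=', so a regex containing '=' is not supported here.
     */
    static PasswordRulesSketch parse(String text) {
        PasswordRulesSketch sketch = new PasswordRulesSketch();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String regex = trimmed.substring(0, eq).trim();
            String password = trimmed.substring(eq + 1).trim();
            sketch.rules.put(Pattern.compile(regex), password);
        }
        return sketch;
    }

    /**
     * Returns the explicit password if one was given (mirroring resource.password),
     * otherwise the password of the first rule whose regex matches the whole file name,
     * otherwise null.
     */
    String resolve(String fileName, String explicitPassword) {
        if (explicitPassword != null) {
            return explicitPassword;
        }
        if (fileName == null) {
            return null;
        }
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(fileName).matches()) {
                return rule.getValue();
            }
        }
        return null;
    }
}
```

With rules such as ".*\.pdf$ = pdfSecret", calling resolve("report.pdf", null) would return the rule's password, while passing a non-null explicit password returns that value regardless of the file name.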

Note that Tika currently supports passwords for PDF and DOCX files, not legacy DOC files or
any other type. I tried to decrypt the existing test file password-is-solrcell.docx but it
fails due to an unsupported encryption method in Apache POI.

In order to apply this patch and have tests pass, you also need to add two binary files by
unzipping in project root.
> Index encrypted files
> ---------------------
>                 Key: SOLR-1929
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Yiannis Pericleous
>            Assignee: Jan Høydahl
>            Priority: Minor
>             Fix For: 4.0, 5.0
>         Attachments: SOLR-1929.patch, SOLR-1929.patch
> SolrCell should be able to index encrypted files (pdfs, word docs).
