jackrabbit-users mailing list archives

From Nilay Parmar <nil...@cybage.com>
Subject RE: Can Lucene be configured to avoid downloading file contents?
Date Wed, 18 Dec 2013 09:59:16 GMT

In your tika-config file, try mapping the document types whose content you do not
want indexed to EmptyParser.
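
As a rough sketch of that suggestion, a tika-config file that hands PDFs to
EmptyParser might look like the fragment below. It assumes the usual Tika XML
layout (a parsers element containing parser entries with mime children) and
assumes application/pdf is the type to skip; check both against the Tika
version bundled with your Jackrabbit release.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Assumption: PDF is the type whose content should not be extracted. -->
    <!-- EmptyParser produces no text, so nothing from the file body is indexed. -->
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
    </parser>
    <!-- Keep the parser entries for any types you still want full-text indexed, -->
    <!-- and make sure none of them also lists application/pdf. -->
  </parsers>
</properties>
```

If I remember correctly, the SearchIndex element in the workspace configuration
takes a tikaConfigPath parameter that points at this file, but please verify
that against the documentation for your version.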

Thanks and regards,
Nilay Parmar

-----Original Message-----
From: Patrick Welfringer [mailto:patrickwelfringer@gmail.com] 
Sent: Wednesday, December 18, 2013 3:21 PM
To: users@jackrabbit.apache.org
Subject: Can Lucene be configured to avoid downloading file contents?


*Can anyone familiar with Lucene please share their insight?*

The question is this: *is there any way to configure Lucene to index only
certain whitelisted metadata*, or exclude blacklisted metadata?

Indeed, we believe that excluding the “file” metadata could dramatically
reduce the time it takes Lucene to download and process the large number of
PDF files in our particular setup.

We don’t need file contents to be indexed, only other metadata like
“creation date”, “keywords” etc.

The “Luke” tool tells us that none of the file contents are indexed. Yet
during the hour-long indexing run, we see all of the metadata being downloaded
and written to disk, including document contents.

If you can help us find a way to prevent Lucene from indexing the entire
Jackrabbit repository, you’ll cheer up many mailing list subscribers who
have similar issues!


