jackrabbit-users mailing list archives

From Patrick Welfringer <patrickwelfrin...@gmail.com>
Subject Can Lucene be configured to avoid downloading file contents?
Date Wed, 18 Dec 2013 09:51:06 GMT

*Can anyone familiar with Lucene please share their insight?*

The question is this: *is there any way to configure Lucene to index only
certain whitelisted metadata*, or exclude blacklisted metadata?

Indeed, we believe that excluding the “file” metadata could dramatically
reduce the time it takes Lucene to download and process the large number of
PDF files in our particular setup.

We don’t need file contents to be indexed, only other metadata such as
“creation date” and “keywords”.

The “Luke” tool tells us that none of the file contents are indexed. Yet
during the hour-long indexing run, we see all of the metadata being downloaded
and written to disk, including document contents.

If you can help us find a way to prevent Lucene from indexing the entire
Jackrabbit repository, you’ll cheer up many mailing list subscribers who
have similar issues!


