lucene-java-user mailing list archives

From Uwe Schindler <>
Subject Re: Not-indexed, Stored Thumbnails or NoSQL?
Date Sun, 02 Dec 2018 11:37:10 GMT

It's perfectly fine to store binary blobs in Lucene. This does not affect query performance.
The stored data is also compressed using LZ4.

Just one thing: why the hell UUEncode? You can store binary blobs as is. Just pass a byte[]
as the value of a stored field. StoredField has a constructor that takes a byte array. When you
read it back from the IndexReader, it comes back as binary, too. That's the most efficient way to encode it.
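
A minimal sketch of that round trip, in case it helps. It assumes a Lucene version that has ByteBuffersDirectory and still has IndexReader.document(int) (roughly 7.5 through 9.x; newer releases read stored fields through a different accessor). The class name, the field names "hash" and "thumbnail", and the sample bytes are made up for illustration and are not from the original post.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

// Hypothetical demo class: stores a thumbnail as raw bytes and reads it back.
public class ThumbnailRoundTrip {

    static byte[] roundTrip(String docHash, byte[] thumbnail) throws IOException {
        // In-memory directory, only to keep the sketch self-contained.
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                // Indexed but not stored: used only to find the document again.
                doc.add(new StringField("hash", docHash, Field.Store.NO));
                // Stored but not indexed: the raw bytes, with no text encoding.
                doc.add(new StoredField("thumbnail", thumbnail));
                writer.addDocument(doc);
            } // closing the writer commits the document

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(new TermQuery(new Term("hash", docHash)), 1);
                Document found = reader.document(hits.scoreDocs[0].doc);
                // The binary value is a BytesRef, i.e. a slice of a byte[].
                BytesRef ref = found.getBinaryValue("thumbnail");
                return Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + ref.length);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeThumbnail = {(byte) 0x89, 'P', 'N', 'G', 0, 1, 2, (byte) 0xFF};
        byte[] loaded = roundTrip("abc123", fakeThumbnail);
        // Compare what went in with what came out.
        System.out.println(Arrays.equals(fakeThumbnail, loaded));
    }
}
```

Note that the value read back is a BytesRef, so offset and length have to be respected when copying it into a plain byte[].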

No need for a side database.


On December 2, 2018 9:20:13 AM UTC, Joe MA <> wrote:
>I have an index where I import documents such as powerpoint, PDF, and
>so forth.  One nice feature I added is that for each document, I store
>a thumbnail of the first page as an encoded String (uuencode) using a
>stored, not-indexed field.  This thumbnail gets displayed when the user
>finds a document.
>I am wondering how efficient this is as the index grows to perhaps hundreds
>of thousands, if not millions, of documents.  Is
>it a good idea?
>These encoded strings could be several hundred bytes in size, and of
>course are completely unique for each file indexed, and provide no
>'search' value.  On the surface, it seems like there could be a better
>way to do this given the size, as well as the extra retrieval time for
>Lucene to pull these fields for found documents.
>Since I also have a unique hash for each document in the index, it
>would not be too difficult to set up a separate, independent NoSQL
>key/value store with the thumbnail images, such as MongoDB or similar,
>and then retrieve the thumbnails from that store instead of keeping
>them in the Lucene index.  Does this seem like a better approach? Or is
>Lucene stored field retrieval efficient enough that there would be no
>benefit to doing this?  Any other ideas?
>Thanks in advance,

Uwe Schindler
Achterdiek 19, 28357 Bremen