lucene-java-user mailing list archives

From "Bushman, Lamont" <>
Subject Compression and Highlighter
Date Mon, 25 Mar 2013 07:13:48 GMT
    I have a project where I need to index documents using Lucene 4.1.0. One of the fields
of the stored Document is the actual text extracted from the document (.pdf, .docx, etc.).
I want to be able to highlight text from the documents in the search results. I was looking
at some older tutorials that store the field with TermVectors and also store it in the index
with Store.COMPRESS. However, Store.COMPRESS has been removed in Lucene 4.1. Is there still
a way to compress the field?
    I am worried about how much space the index will take up if the "body" Field has to be
stored uncompressed.
    Are there ways around having to store the whole Field in its original form?
    Since I am already going to be storing the actual documents on the server, would it be
feasible (in terms of time) to neither store TermVectors nor store the field at all? Then,
when the user searches, I could re-index the top docs from the original documents in RAM at
runtime and use Highlighter to return fragments.

