lucene-java-user mailing list archives

From "Bushman, Lamont" <bus08...@byui.edu>
Subject RE: Compression and Highlighter
Date Mon, 25 Mar 2013 22:02:22 GMT
Thank you very much for the help, Simon. I am amazed I was able to accomplish what I wanted.
I didn't store the body in the index, and I used the Highlighter to return the best fragments
by parsing my original document.
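The indexing side of that approach (search the body without storing it, and keep only a pointer back to the original file) might look roughly like the sketch below. It assumes the Lucene 4.1.0 jars (lucene-core, lucene-analyzers-common) are on the classpath. The class and method names, the `path` field, and the sample path are all hypothetical, chosen for illustration only.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

final class IndexWithoutStoredBody {

    // "body" is analyzed and searchable but NOT stored in the index.
    // "path" (hypothetical) is stored so the original file on the server
    // can be re-read at search time for highlighting.
    static Document toDocument(String path, String extractedText) {
        Document doc = new Document();
        doc.add(new StringField("path", path, Field.Store.YES));
        doc.add(new TextField("body", extractedText, Field.Store.NO));
        return doc;
    }

    // Builds a one-document in-memory index and returns doc 0's stored fields.
    static Document indexAndReadBack(String path, String extractedText) {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);
        Directory dir = new RAMDirectory();
        try {
            IndexWriter writer =
                    new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_41, analyzer));
            writer.addDocument(toDocument(path, extractedText));
            writer.close();
            DirectoryReader reader = DirectoryReader.open(dir);
            try {
                // Only stored fields are returned here; "body" will be absent.
                return reader.document(0);
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document stored = indexAndReadBack("/docs/report.pdf", "text extracted from the PDF");
        System.out.println(stored.get("path")); // prints /docs/report.pdf
        System.out.println(stored.get("body")); // prints null: indexed, not stored
    }
}
```

At search time, the stored `path` of each top hit is what lets the application load the original text again and hand it to the Highlighter.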
________________________________________
From: Simon Willnauer [simon.willnauer@gmail.com]
Sent: Monday, March 25, 2013 4:07 AM
To: java-user@lucene.apache.org
Subject: Re: Compression and Highlighter

On Mon, Mar 25, 2013 at 8:13 AM, Bushman, Lamont <bus08002@byui.edu> wrote:
>     I have a project where I need to index documents using Lucene 4.1.0. One of the
> fields of the stored Document is the actual text from the document (.pdf, .docx, etc.). I
> want to be able to highlight text from the documents in the search results. I was looking
> at some older tutorials about storing the field with TermVectors and also storing it in the
> index with Store.COMPRESS. However, Lucene 4.1 has done away with Store.COMPRESS.
> Is there still a way to compress the field?

Lucene 4.1 uses a compressed stored fields format under the hood. The
compression is completely transparent and enabled by default. Here is
some background:
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
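To illustrate what "transparent" means here: in Lucene 4.1 you simply store the field, with no compression flag to set. The sketch below stores a body with `Field.Store.YES` and reads it back unchanged. It assumes the Lucene 4.1.0 jars (lucene-core, lucene-analyzers-common) on the classpath; the class and method names are hypothetical, and the codec name it prints depends on the Lucene version actually loaded.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

final class StoredBodyRoundTrip {

    // Stores the full body. There is no Store.COMPRESS in 4.x: any compression
    // is done by the codec's stored fields format, not by the caller.
    static String storeAndReadBack(String body) {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);
        Directory dir = new RAMDirectory();
        try {
            IndexWriter writer =
                    new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_41, analyzer));
            Document doc = new Document();
            doc.add(new TextField("body", body, Field.Store.YES));
            writer.addDocument(doc);
            writer.close();
            DirectoryReader reader = DirectoryReader.open(dir);
            try {
                // The stored value comes back exactly as it went in.
                return reader.document(0).get("body");
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Name of the default codec on the classpath (version-dependent).
        System.out.println(Codec.getDefault().getName());
        String body = "Full text extracted from a .docx file.";
        System.out.println(body.equals(storeAndReadBack(body))); // prints true
    }
}
```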

>     I am worried about the amount of space the index will take if I have
> to store the "body" Field uncompressed.
>     Are there ways around having to store the whole Field in its original form?
>     Since I am already going to be storing the actual documents on the server, would
> it be feasible (time-wise) to not store TermVectors or the field at all, and instead,
> when the user searches for a document, re-index the top docs from the original
> documents in RAM at runtime and use Highlighter to return fragments?

This is essentially what the Highlighter does (it re-analyzes the text
you give it) if you are not using the FastVectorHighlighter. You can
pass in the string value you want to highlight, whether or not you
stored it in Lucene. You just need to check whether the performance is
acceptable for you without storing term vectors (TV).
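A minimal sketch of highlighting a string that was never stored in the index, assuming the Lucene 4.1.0 jars (lucene-core, lucene-analyzers-common, lucene-highlighter and its dependencies) on the classpath. The `body` field name, the sample text, and the class/method names are hypothetical; in a real application the text would be re-read from the original file for each top hit. The analyzer passed to the Highlighter should match the one used at index time.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.util.Version;

final class HighlightFromSource {

    // Returns up to maxFragments highlighted snippets of `text` for `query`.
    // No stored field and no term vectors are needed: the Highlighter
    // tokenizes `text` itself with the supplied analyzer.
    static String[] bestFragments(Query query, String text, int maxFragments) {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);
        Highlighter highlighter = new Highlighter(
                new SimpleHTMLFormatter("<b>", "</b>"),
                new QueryScorer(query, "body"));
        try {
            return highlighter.getBestFragments(analyzer, "body", text, maxFragments);
        } catch (IOException | InvalidTokenOffsetsException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Query query = new TermQuery(new Term("body", "compression"));
        String text = "Lucene 4.1 applies compression to stored fields transparently.";
        for (String fragment : bestFragments(query, text, 3)) {
            System.out.println(fragment);
        }
    }
}
```

The cost of this approach is re-reading and re-analyzing each document's text per query, which is why it is worth measuring against storing term vectors.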

simon
>
> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
