lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@gmail.com>
Subject Re: Compression and Highlighter
Date Tue, 26 Mar 2013 08:53:27 GMT
^5 ;)

On Mon, Mar 25, 2013 at 11:02 PM, Bushman, Lamont <bus08002@byui.edu> wrote:
> Thank you very much for the help, Simon. I am amazed I was able to
> accomplish what I wanted. I didn't store the body in the index, and I
> used Highlighter to return the best fragments by parsing my original
> document.
> ________________________________________
> From: Simon Willnauer [simon.willnauer@gmail.com]
> Sent: Monday, March 25, 2013 4:07 AM
> To: java-user@lucene.apache.org
> Subject: Re: Compression and Highlighter
>
> On Mon, Mar 25, 2013 at 8:13 AM, Bushman, Lamont <bus08002@byui.edu> wrote:
>> I have a project where I need to index documents using Lucene 4.1.0.
>> One of the fields for the stored Document is the actual text from the
>> document (.pdf, .docx, etc.). I want to be able to highlight text from
>> the documents in the search results. I was looking at some older
>> tutorials about storing the field with TermVectors and also storing it
>> in the index with Store.COMPRESS. However, Lucene 4.1 has done away
>> with Store.COMPRESS. Is there still a way to compress the field?
>
> Lucene 4.1 uses a compressed stored fields format under the hood.
> The compression is completely transparent and enabled by
> default. Here is some background:
> http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene
>
>> I am worried about the amount of space the index will take up if I
>> have to have the "body" Field stored and uncompressed.
>> Are there ways around having to store the whole Field in its original
>> form?
>> Since I am already going to be storing the actual documents on the
>> server, would it be feasible (time-wise) to not store TermVectors or
>> store the field at all until the user searches for a document? Then at
>> runtime I can re-index the top docs from the original documents in RAM
>> and use Highlighter to return fragments?
>
> This is what the highlighter does if you are not using the
> FastVectorHighlighter. You can just pass in the string value you want
> to highlight, whether or not you stored it in Lucene. You just need
> to check whether that works for you performance-wise without storing
> term vectors (TV).
>
> simon
>>
>> Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
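
The approach that worked in this thread (index the body without storing
it, then hand the original text to Highlighter at search time) can be
sketched roughly as follows. This is only a minimal sketch, assuming the
Lucene 4.1 API (core, analyzers-common, queryparser and highlighter jars);
later versions change some constructors. The class name
OnTheFlyHighlight, the field names "id" and "body", and the in-memory Map
standing in for the original documents on the server are all hypothetical
choices for illustration, not anything from the original posts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class OnTheFlyHighlight {

    /**
     * Indexes each text under its id WITHOUT storing the body, runs the query,
     * and highlights every hit by re-analyzing the original text.
     * The map is a stand-in (assumption) for the documents kept on the server.
     */
    public static List<String> search(Map<String, String> originals, String queryText)
            throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);
        Directory dir = new RAMDirectory();

        IndexWriter writer =
                new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_41, analyzer));
        for (Map.Entry<String, String> entry : originals.entrySet()) {
            Document doc = new Document();
            // The id is stored so a hit can be mapped back to its original document.
            doc.add(new StringField("id", entry.getKey(), Field.Store.YES));
            // The body is indexed for search but not stored, and has no term vectors.
            doc.add(new TextField("body", entry.getValue(), Field.Store.NO));
            writer.addDocument(doc);
        }
        writer.close();

        Query query = new QueryParser(Version.LUCENE_41, "body", analyzer).parse(queryText);
        Highlighter highlighter =
                new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query, "body"));

        List<String> results = new ArrayList<String>();
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                String id = searcher.doc(hit.doc).get("id");
                // Fetch the raw text from outside the index and let the
                // highlighter analyze it again to find the best fragments.
                String text = originals.get(id);
                String[] fragments = highlighter.getBestFragments(analyzer, "body", text, 3);
                StringBuilder joined = new StringBuilder(id).append(": ");
                for (int i = 0; i < fragments.length; i++) {
                    if (i > 0) {
                        joined.append(" ... ");
                    }
                    joined.append(fragments[i]);
                }
                results.add(joined.toString());
            }
        } finally {
            reader.close();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> originals = new HashMap<String, String>();
        originals.put("doc1", "Lucene 4.1 stores fields in a compressed format by default.");
        originals.put("doc2", "This text does not mention the topic at all.");
        for (String line : search(originals, "compressed")) {
            System.out.println(line);
        }
    }
}
```

Because the text is analyzed again for every hit, the cost grows with the
size of the documents being highlighted, which is the performance
trade-off Simon mentions. In a real setup the Map lookup would be
replaced by reading and parsing the original file.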