lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6764) Payloads should be compressed
Date Fri, 28 Aug 2015 18:16:46 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720355#comment-14720355 ]

Paul Elschot commented on LUCENE-6764:
--------------------------------------

Alternatively, the HTML structure could be indexed; see LUCENE-5627.
One can then query the structure and add the weights to the query.


> Payloads should be compressed
> -----------------------------
>
>                 Key: LUCENE-6764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> I think we should at least try to do something simple, e.g. deduplicate payloads or apply
> simple LZ77 compression. For instance, if you use enclosing HTML tags to give different
> weights to individual terms, there might be lots of repetition, as there are not that many
> unique HTML tags.
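
The two ideas in the issue description can be illustrated with a minimal sketch. This is not Lucene code and not the approach taken in LUCENE-6764; the payload values, the function names dedup_payloads and restore_payloads, and the use of Python's zlib (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in for "simple LZ77 compression" are all assumptions made for illustration.

```python
import zlib


def dedup_payloads(payloads):
    """Replace each payload with an index into a table of unique payloads."""
    table = []      # unique payloads, in first-seen order
    index_of = {}   # payload bytes -> position in table
    refs = []       # one small integer per original payload
    for p in payloads:
        if p not in index_of:
            index_of[p] = len(table)
            table.append(p)
        refs.append(index_of[p])
    return table, refs


def restore_payloads(table, refs):
    """Rebuild the original payload sequence from the table and references."""
    return [table[i] for i in refs]


# Hypothetical payloads: the enclosing HTML tag of each term occurrence.
payloads = [b"h1", b"p", b"p", b"b", b"p", b"h1", b"p", b"b"] * 100

# Option 1: deduplication. Only three distinct payloads need to be stored.
table, refs = dedup_payloads(payloads)

# Option 2: generic compression of the concatenated payload bytes.
# A real encoding would also need to record payload boundaries.
raw = b"".join(payloads)
compressed = zlib.compress(raw)
```

With few distinct tags, the table stays tiny and each occurrence costs only a small integer, while the compressor benefits from the same repetition in the raw byte stream.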



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


