lucene-dev mailing list archives

From "Renaud Delbru (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2886) Adaptive Frame Of Reference
Date Fri, 04 Feb 2011 11:46:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990535#comment-12990535 ]

Renaud Delbru commented on LUCENE-2886:
---------------------------------------

{quote}
In the case of 240 1's, i was surprised to see this selector was used over 2% of the time
for the gov collection's doc file?
{quote}
Our results were obtained on the Wikipedia dataset and the blogs dataset. I don't know what
our selection rate was; I was just referring to the gain in overall compression rate.

{quote}
But still, for the all 1's case I'm not actually thinking about unstructured text so much...
in this case I am thinking about metadata fields and more structured data?
{quote}

Yes, this makes sense. In the context of SIREn (a kind of simple XML-node-based inverted index),
which is meant for indexing semi-structured data, the difference was more noticeable (mainly
on the frequency and position files, as well as on the other structure node files).
This might also be useful on the document id file for very common terms (perhaps for certain
types of facets, where very few values cover a large portion of the document
collection).
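
To make the effect of an "all 1's" selector concrete, here is a rough sketch. It is not the code
from the attached patches. It assumes a Simple-8b style layout (4-bit selector plus 60 data bits
per 64-bit word, with two selectors standing for 240 and 120 consecutive 1's) and only counts how
many words a greedy encoder would emit with and without those two selectors. The class name, the
method names and the selector table are illustrative assumptions.

```java
import java.util.Arrays;

// Sketch only: counts 64-bit words for a Simple-8b style layout (assumed, not taken from the patch).
final class RunOfOnesSketch {
    // Assumed selector table: COUNTS[s] values of BITS[s] bits each fit in the 60 data bits.
    // Selectors 0 and 1 carry no data bits; every value is implicitly 1.
    static final int[] COUNTS = {240, 120, 60, 30, 20, 15, 12, 10, 8, 7, 6, 5, 4, 3, 2, 1};
    static final int[] BITS   = {  0,   0,  1,  2,  3,  4,  5,  6, 7, 8, 10, 12, 15, 20, 30, 60};

    // True if the n values starting at pos can all be stored with the given bit width.
    static boolean fits(int[] values, int pos, int n, int bits) {
        for (int i = pos; i < pos + n; i++) {
            int v = values[i];
            if (bits == 0 ? v != 1 : (v < 0 || v >= (1L << bits))) {
                return false;
            }
        }
        return true;
    }

    // Greedy word count: at each position take the first selector (largest count) that fits.
    // When useRunSelectors is false, the two "all 1's" selectors are skipped.
    static int countWords(int[] values, boolean useRunSelectors) {
        int words = 0;
        int pos = 0;
        while (pos < values.length) {
            boolean packed = false;
            for (int s = useRunSelectors ? 0 : 2; s < COUNTS.length; s++) {
                int n = COUNTS[s];
                if (pos + n > values.length) {
                    continue; // not enough values left for this selector (no padding in this sketch)
                }
                if (fits(values, pos, n, BITS[s])) {
                    pos += n;
                    words++;
                    packed = true;
                    break;
                }
            }
            if (!packed) {
                throw new IllegalArgumentException("negative value at or after index " + pos);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        int[] ones = new int[240];
        Arrays.fill(ones, 1); // e.g. term frequencies of 1 in a metadata field
        System.out.println("with run selectors:    " + countWords(ones, true));
        System.out.println("without run selectors: " + countWords(ones, false));
    }
}
```

Under these assumptions, a block of 240 consecutive 1's takes one word with the run selectors and
four words (60 values of 1 bit each) without them, which is why the gain shows up mostly on files
dominated by 1's.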

> Adaptive Frame Of Reference 
> ----------------------------
>
>                 Key: LUCENE-2886
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2886
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Codecs
>            Reporter: Renaud Delbru
>             Fix For: 4.0
>
>         Attachments: LUCENE-2886_simple64.patch, LUCENE-2886_simple64_varint.patch, lucene-afor.tar.gz
>
>
> We could test the implementation of the Adaptive Frame Of Reference [1] on the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be done, as this
> implementation works against the old lucene-1458 branch.
> I will attach a tarball containing a running version (with tests) of the AFOR implementation,
> as well as the implementations of PFOR and of Simple64 (a Simple-family codec working on 64-bit
> words) that were used in the experiments in [1].
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
