lucene-dev mailing list archives

From "Renaud Delbru (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-2886) Adaptive Frame Of Reference
Date Fri, 04 Feb 2011 12:06:28 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990538#comment-12990538 ]

Renaud Delbru edited comment on LUCENE-2886 at 2/4/11 12:05 PM:
----------------------------------------------------------------

Just an additional comment on semi-structured data indexing. AFOR-2 and AFOR-3 (AFOR-3 is
AFOR-2 with a special code for allOnes frames) were able to beat Rice on two datasets
(Geonames and Sindice), and S-64 on one (but it was very close to Rice on the others):
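To make the frame-of-reference idea concrete, here is a minimal Java sketch of FOR bit-packing for a single frame, with a special code for a frame whose values are all 1, which is the kind of case the allOnes code in AFOR-3 targets. The class name, the one-int header layout and the `ALL_ONES` marker are hypothetical; this is not the AFOR codec from the attached lucene-afor.tar.gz, and it does not attempt the adaptive frame selection that gives AFOR its name.

```java
import java.util.Arrays;

// Sketch of Frame-of-Reference (FOR) bit packing for one frame of non-negative ints.
// Hypothetical layout (not the AFOR / lucene-afor format): a header int holding the bit
// width b, then every value packed into b bits. A header of ALL_ONES marks a frame whose
// values are all 1, so no payload is stored at all.
final class ForFrameSketch {
    static final int ALL_ONES = -1; // hypothetical header marker; valid widths are 0..32

    // Smallest bit width b such that every value fits in b bits (0 if all values are 0).
    static int bitWidth(int[] frame) {
        int or = 0;
        for (int v : frame) or |= v;
        return 32 - Integer.numberOfLeadingZeros(or);
    }

    static int[] encode(int[] frame) {
        if (frame.length > 0 && Arrays.stream(frame).allMatch(v -> v == 1)) {
            return new int[] {ALL_ONES};
        }
        int b = bitWidth(frame);
        int[] out = new int[1 + (frame.length * b + 31) / 32];
        out[0] = b;
        if (b == 0) return out; // all zeros: the header alone is enough
        for (int i = 0; i < frame.length; i++) {
            int pos = i * b, word = 1 + pos / 32, offset = pos % 32;
            out[word] |= frame[i] << offset;
            if (offset + b > 32) { // value straddles two ints: spill the high bits
                out[word + 1] |= frame[i] >>> (32 - offset);
            }
        }
        return out;
    }

    // n is the number of values in the frame (fixed by the caller in this sketch).
    static int[] decode(int[] in, int n) {
        int[] frame = new int[n];
        if (in[0] == ALL_ONES) {
            Arrays.fill(frame, 1);
            return frame;
        }
        int b = in[0];
        if (b == 0) return frame;
        for (int i = 0; i < n; i++) {
            int pos = i * b, word = 1 + pos / 32, offset = pos % 32;
            int v = in[word] >>> offset;
            if (offset + b > 32) v |= in[word + 1] << (32 - offset);
            if (b < 32) v &= (1 << b) - 1; // keep only the low b bits
            frame[i] = v;
        }
        return frame;
    }

    public static void main(String[] args) {
        int[] gaps = {3, 0, 7, 5, 2, 6, 1, 4};
        int[] ones = {1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(encode(gaps).length); // header + one int of 3-bit payload
        System.out.println(encode(ones).length); // header only
        System.out.println(Arrays.equals(decode(encode(gaps), gaps.length), gaps));
    }
}
```

An all-ones frame costs a single header int here instead of a header plus a 1-bit-per-value payload. The effect of AFOR-3's equivalent special case is most visible in the Frq column of the tables that follow (for example 0.023 for AFOR-2 versus 0.006 for AFOR-3 on Geonames).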

DBpedia dataset (a structured version of Wikipedia):

||Method||Ent||Frq||Att||Val||Pos||Total||
|AFOR-1|0.246|0.043|0.141|0.065|0.180|0.816|
|AFOR-2|0.229|0.039|0.132|0.059|0.167|0.758|
|AFOR-3|0.229|0.031|0.131|0.054|0.159|0.736|
|FOR|0.315|0.061|0.170|0.117|0.216|1.049|
|PFOR|0.317|0.044|0.155|0.070|0.205|0.946|
|Rice|0.240|0.029|0.115|0.057|0.152|0.708|
|S-64|0.249|0.041|0.133|0.062|0.171|0.791|
|VByte|0.264|0.162|0.222|0.222|0.245|1.335|

Geonames dataset:

||Method||Ent||Frq||Att||Val||Pos||Total||
|AFOR-1|0.129|0.023|0.058|0.025|0.025|0.318|
|AFOR-2|0.123|0.023|0.057|0.024|0.024|0.307|
|AFOR-3|0.114|0.006|0.056|0.016|0.008|0.256|
|FOR|0.150|0.021|0.065|0.025|0.023|0.349|
|PFOR|0.154|0.019|0.057|0.022|0.023|0.332|
|Rice|0.133|0.019|0.063|0.029|0.021|0.327|
|S-64|0.147|0.021|0.058|0.023|0.023|0.329|
|VByte|0.216|0.142|0.143|0.143|0.143|0.929|

Sindice dataset (a very heterogeneous dataset containing hundreds of thousands of web datasets):

||Method||Ent||Frq||Att||Val||Pos||Total||
|AFOR-1|2.578|0.395|0.942|0.665|1.014|6.537|
|AFOR-2|2.361|0.380|0.908|0.619|0.906|6.082|
|AFOR-3|2.297|0.176|0.876|0.530|0.722|5.475|
|FOR|3.506|0.506|1.121|0.916|1.440|8.611|
|PFOR|3.221|0.374|1.153|0.795|1.227|7.924|
|Rice|2.721|0.314|0.958|0.714|0.941|6.605|
|S-64|2.581|0.370|0.917|0.621|0.908|6.313|
|VByte|3.287|2.106|2.411|2.430|2.488|15.132|

Here, Ent refers to the entity id (similar to a doc id), and Att and Val are structural node ids.
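The comparison with Rice and S-64 can be checked directly against the Total column. This small Java program only recomputes AFOR-3's relative Total versus Rice and S-64 from the three tables; nothing is measured, and the class name is illustrative. A negative percentage means AFOR-3 has the smaller Total.

```java
import java.util.Locale;

// Relative Total of AFOR-3 versus Rice and S-64, using the Total column of the
// DBpedia, Geonames and Sindice tables. Negative means AFOR-3 is smaller.
final class TotalSizeComparison {
    // Percentage change of candidate relative to baseline: (candidate / baseline - 1) * 100.
    static double percentChange(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        String[] datasets = {"DBpedia", "Geonames", "Sindice"};
        double[] afor3 = {0.736, 0.256, 5.475};
        double[] rice = {0.708, 0.327, 6.605};
        double[] s64 = {0.791, 0.329, 6.313};
        for (int i = 0; i < datasets.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "%-8s AFOR-3 vs Rice: %+.1f%%  vs S-64: %+.1f%%",
                    datasets[i],
                    percentChange(afor3[i], rice[i]),
                    percentChange(afor3[i], s64[i])));
        }
    }
}
```

On these numbers AFOR-3's Total is about 4% larger than Rice's on DBpedia, but about 22% smaller on Geonames and about 17% smaller on Sindice.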

> Adaptive Frame Of Reference 
> ----------------------------
>
>                 Key: LUCENE-2886
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2886
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Codecs
>            Reporter: Renaud Delbru
>             Fix For: 4.0
>
>         Attachments: LUCENE-2886_simple64.patch, LUCENE-2886_simple64_varint.patch, lucene-afor.tar.gz
>
>
> We could test the implementation of the Adaptive Frame Of Reference [1] on the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be done, as this implementation works on the old lucene-1458 branch.
> I will attach a tarball containing a running version (with tests) of the AFOR implementation, as well as the implementations of PFOR and of Simple64 (a simple-family codec working on 64-bit words) that were used in the experiments in [1].
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf
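For readers unfamiliar with the "simple family", the sketch below shows the general shape of a 64-bit simple codec: each word carries a 4-bit selector and a 60-bit payload holding a fixed number of equal-width values. The selector table is an assumption borrowed from Simple-8b (without its two run-length selectors) and is not necessarily the table used in LUCENE-2886_simple64.patch; the class name is hypothetical, and the greedy packing shown is only one possible strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simple-8b-style sketch: 4-bit selector in the top bits, 60-bit payload of COUNT[s]
// values at BITS[s] bits each. The table is an illustrative assumption.
final class Simple64Sketch {
    static final int[] COUNT = {60, 30, 20, 15, 12, 10, 8, 7, 6, 5, 4, 3, 2, 1};
    static final int[] BITS = {1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20, 30, 60};

    // True if values[from .. from + n) all fit in `bits` bits.
    private static boolean fits(long[] values, int from, int n, int bits) {
        for (int j = from; j < from + n; j++) {
            if ((values[j] >>> bits) != 0) return false;
        }
        return true;
    }

    // Greedy: for each word, use the first selector (most values per word) whose values fit.
    static long[] encode(long[] values) {
        for (long v : values) {
            if ((v >>> 60) != 0) throw new IllegalArgumentException("value not in [0, 2^60): " + v);
        }
        List<Long> words = new ArrayList<>();
        int pos = 0;
        while (pos < values.length) {
            for (int s = 0; s < COUNT.length; s++) {
                int take = Math.min(COUNT[s], values.length - pos);
                if (!fits(values, pos, take, BITS[s])) continue;
                long word = (long) s << 60;
                for (int j = 0; j < take; j++) {
                    word |= values[pos + j] << (j * BITS[s]);
                }
                words.add(word);
                pos += take;
                break; // the 60-bit selector always fits, so every pass makes progress
            }
        }
        return words.stream().mapToLong(Long::longValue).toArray();
    }

    // The caller supplies n, because the last word may be only partially filled.
    static long[] decode(long[] words, int n) {
        long[] out = new long[n];
        int pos = 0;
        for (long word : words) {
            int s = (int) (word >>> 60);
            long mask = (1L << BITS[s]) - 1;
            int take = Math.min(COUNT[s], n - pos);
            for (int j = 0; j < take; j++) {
                out[pos++] = (word >>> (j * BITS[s])) & mask;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] gaps = new long[100];
        Arrays.fill(gaps, 1L);
        gaps[50] = 1000L; // one large gap forces a wider selector around it
        long[] words = encode(gaps);
        System.out.println(words.length);
        System.out.println(Arrays.equals(decode(words, gaps.length), gaps));
    }
}
```

The word-aligned layout is what distinguishes this family from bit-level codes such as Rice: decoding needs only a selector lookup, shifts and masks per word.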

-- 
This message is automatically generated by JIRA.