lucene-solr-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-475) multi-valued faceting via un-inverted field
Date Sun, 23 Nov 2008 23:03:44 GMT

    [ https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650071#action_12650071
] 

Yonik Seeley commented on SOLR-475:
-----------------------------------

Some further results on a bigger index to show some practical limits.
This table (JIRA markup format) shows the performance and memory characteristics of facet
requests on a 50M document index, for different fields and different numbers of documents
being counted in the base query.

|| ||f10_100_t||f100_10_t||f1000_5_t||f10000_5_t||f100000_5_t||f100000_10_t||
|field inversion time (sec)|17.2|17.9|69.4|87.8|133.6|388.0|
|inverted field size (MB)|68.1|629.6|416.9|479.0|589.9|807.4|
|1,000 docs facet time (ms)|7|20|13|13|16|17|
|10,000 docs facet time (ms)|55|428|22|23|29|28|
|100,000 docs facet time (ms)|54|421|35|36|46|56|
|1,000,000 docs facet time (ms)|55|431|149|155|249|307|
|10,000,000 docs facet time (ms)|54|434|625|625|1183|1620|

The "profile" of the faceted field is encoded in its name.  For example, the field f1000_5_t
has 1000 unique values across the whole index and between 0 and 5 values per document.  It
took 35 ms to facet on this field when the base query matched 100,000 documents.


Test Hardware: Commodity PC
Processor: AMD Athlon 64 X2 5000+ (2.6GHz dual core)
Hard Drive: Western Digital Caviar GP WD5000AACS 500GB 5400 to 7200 RPM SATA 3.0Gb/s
Memory: 8GB DDR2 800 SDRAM (PC2 6400)
Operating System: Linux - Ubuntu 8.04 desktop, 64 bit version (x86_64)
Java VM: Sun Java6 (1.6.0_05) 64 bit hotspot (x86_64)



> multi-valued faceting via un-inverted field
> -------------------------------------------
>
>                 Key: SOLR-475
>                 URL: https://issues.apache.org/jira/browse/SOLR-475
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Attachments: facet_performance.html, UnInvertedField.java, UnInvertedField.java
>
>
> Facet multi-valued fields via a counting method (like the FieldCache method) on an un-inverted
> representation of the field.  For each doc, look at its terms and increment a count for that
> term.
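
The counting method in the description above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the attached UnInvertedField.java: the
class name, the countFacets method, and the plain int[][] layout (one array of term ordinals
per document) are all hypothetical, and the real patch may store the un-inverted data quite
differently.

```java
import java.util.Arrays;

public class UnInvertedCountSketch {

    /**
     * Counts how often each term occurs among the documents of the base query.
     *
     * docTerms[doc] holds the term ordinals of one document (hypothetical layout,
     * chosen for readability). baseDocs lists the documents matched by the base
     * query. numTerms is the number of unique terms in the field.
     */
    public static int[] countFacets(int[][] docTerms, int[] baseDocs, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : baseDocs) {
            // For each doc, look at its terms and increment a count for that term.
            for (int termOrd : docTerms[doc]) {
                counts[termOrd]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docTerms = {
            {0, 2},  // doc 0 has terms 0 and 2
            {},      // doc 1 has no values for the field
            {1, 2},  // doc 2 has terms 1 and 2
            {2},     // doc 3 has term 2
        };
        int[] baseDocs = {0, 2, 3};  // docs matched by the base query
        System.out.println(Arrays.toString(countFacets(docTerms, baseDocs, 3)));
        // prints [1, 1, 3]
    }
}
```

In this sketch the work is proportional to the total number of term references in the
documents of the base query, plus the cost of allocating one counter per unique term.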

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

