lucene-dev mailing list archives

From Jimmy Sélamy <jym...@gmail.com>
Subject Re: Optimize facets when actually single valued?
Date Mon, 12 Nov 2012 14:14:01 GMT
Hi,

The Solr version is 3.6.1.

Here is my query. It is rather large, but I absolutely need all of this
in my response.

====================================

q=*:*

fq=language_code:("fr_CA") AND acl_name:(cch_CP_AP_Archives OR
cch_archive_content OR cch_browse_official_feed_folder OR cch_folder_acl OR
cch_official_feed_content OR cch_official_press_release_acl OR
cch_published_story OR cch_pubpage_folder_acl OR cch_raw_content OR
cch_restricted_rights_content OR cch_sched_acl OR cch_schedule_acl OR
cch_source_acl OR cch_wire_feeds_acl) AND feed_type:("WF" OR "OF" OR
"RW")&fq=((type:("cch_published_story" OR "cch_story") AND
language_code:("fr_CA") AND acl_name:(cch_CP_AP_Archives OR
cch_archive_content OR cch_browse_official_feed_folder OR cch_folder_acl OR
cch_official_feed_content OR cch_official_press_release_acl OR
cch_published_story OR cch_pubpage_folder_acl OR cch_raw_content OR
cch_restricted_rights_content OR cch_sched_acl OR cch_schedule_acl OR
cch_source_acl OR cch_wire_feeds_acl) AND feed_type:("WF" OR "OF" OR "RW"))
OR (type:("cch_photo") AND mfile_url:([* TO *]) AND
acl_name:(cch_CP_AP_Archives OR cch_archive_content OR
cch_browse_official_feed_folder OR cch_folder_acl OR
cch_official_feed_content OR cch_official_press_release_acl OR
cch_published_story OR cch_pubpage_folder_acl OR cch_raw_content OR
cch_restricted_rights_content OR cch_sched_acl OR cch_schedule_acl OR
cch_source_acl OR cch_wire_feeds_acl) AND feed_type:("WF" OR "OF" OR
"RW")))&rows=0&start=0&

facet.sort=count&
facet.field=source_id&
*facet.field=facet_tme_person_name_french&*
*facet.field=facet_tme_geographic_location_french&*
*facet.field=facet_tme_iptc_category&*
*facet.field=facet_tme_organization_name_french&*
facet.field=feed_type&

f.source_id.facet.limit=-1&
f.source_id.facet.mincount=1&
f.facet_tme_person_name_french.facet.limit=25&
f.facet_tme_person_name_french.facet.mincount=1&
f.facet_tme_geographic_location_french.facet.limit=25&
f.facet_tme_geographic_location_french.facet.mincount=1&
f.facet_tme_iptc_category.facet.limit=25&
f.facet_tme_iptc_category.facet.mincount=1&
f.facet_tme_organization_name_french.facet.limit=25&
f.facet_tme_organization_name_french.facet.mincount=1&
f.feed_type.facet.limit=25&
f.feed_type.facet.mincount=1&
facet.range=r_creation_date1&
facet.range=r_creation_date2&
facet.range=r_creation_date3&
facet.range=r_creation_date4&
f.r_creation_date1.facet.range.start=NOW-1HOUR&
f.r_creation_date1.facet.range.end=NOW&
f.r_creation_date1.facet.range.gap=+1HOUR&
f.r_creation_date2.facet.range.start=NOW-24HOUR&
f.r_creation_date2.facet.range.end=NOW&
f.r_creation_date2.facet.range.gap=+24HOUR&
f.r_creation_date3.facet.range.start=NOW-48HOUR&
f.r_creation_date3.facet.range.end=NOW&
f.r_creation_date3.facet.range.gap=+48HOUR&
f.r_creation_date4.facet.range.start=NOW-7DAY&
f.r_creation_date4.facet.range.end=NOW&
f.r_creation_date4.facet.range.gap=+7DAY

facet=true

=====================================

The fields in bold (marked with asterisks) are the ones I'm having
performance issues with.

I've set facet.method=enum. This improved performance, but it is still not
acceptable for my application. Here are the timings I logged with the same
fq, but with each facet field run by itself. Note that only the fields whose
names start with "facet" are multivalued.


o Date range facet (681.25 ms)

o Feed type (586.5 ms)

o Categories (898 ms)

o facet_tme_geographic_location_french (1249 ms)

o facet_tme_person_name_french (1940.75 ms)

o facet_tme_organization_name_french (1240.75 ms)

With all of them combined, I get 6000 ms.
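
The per-facet measurement described above could be reproduced with something like the following minimal sketch. It only builds one request URL per facet field and does not contact a server. The endpoint `http://localhost:8983/solr/select`, the helper name `single_facet_url`, and the shortened `fq` are assumptions made for illustration; the field names come from the query above. Solr reports the elapsed time of each request as `QTime` in the response header.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port, and core to the actual setup.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Facet fields taken from the query in this message.
FACET_FIELDS = [
    "feed_type",
    "facet_tme_iptc_category",
    "facet_tme_geographic_location_french",
    "facet_tme_person_name_french",
    "facet_tme_organization_name_french",
]


def single_facet_url(field, fq, method="enum", limit=25):
    """Build a select URL that facets on exactly one field."""
    params = [
        ("q", "*:*"),
        ("fq", fq),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        # Per-field overrides, so each field can be tuned separately.
        ("f.%s.facet.method" % field, method),
        ("f.%s.facet.limit" % field, limit),
        ("f.%s.facet.mincount" % field, 1),
        ("wt", "json"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    # Shortened filter for illustration; the real fq is the long one above.
    fq = 'language_code:("fr_CA") AND feed_type:("WF" OR "OF" OR "RW")'
    for field in FACET_FIELDS:
        print(single_facet_url(field, fq))
```

Issuing each URL in turn and comparing the `QTime` values gives a per-field breakdown like the list above. Running each request more than once helps separate cold-cache from warm-cache timings.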

As for the other questions you asked, such as "How many unique values are
there in the field?", I don't know how to get this information.
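
One possible way to get that number, sketched here under assumptions: facet on the field with `q=*:*`, `rows=0`, `facet.limit=-1`, `facet.mincount=1` and `wt=json`, then count the returned values. The helper name `count_distinct_values` and the sample response are made up for illustration, and the code assumes Solr's default flat JSON layout for facet counts (`[term1, count1, term2, count2, ...]`). Returning every value of a high-cardinality field can itself be slow.

```python
import json


def count_distinct_values(response_text, field):
    """Count distinct facet values for `field` in a Solr JSON response.

    Assumes the flat layout: [term1, count1, term2, count2, ...].
    """
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Every value occupies two slots: the term and its count.
    return len(flat) // 2


# Made-up response fragment, for illustration only.
sample = json.dumps({
    "facet_counts": {
        "facet_fields": {
            "feed_type": ["WF", 120, "OF", 45, "RW", 7]
        }
    }
})

if __name__ == "__main__":
    print(count_distinct_values(sample, "feed_type"))
```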

*Jimmy M. Sélamy*


2012/11/11 Erick Erickson <erickerickson@gmail.com>

> You have to provide more details. How many unique values are there in the
> field in question? What's the query you're using? Are you sure other parts
> of the query aren't the culprit? What Solr version are you using?
>
> Please review:
> http://wiki.apache.org/solr/UsingMailingLists
>
> Best
> Erick
>
>
> On Sat, Nov 10, 2012 at 9:41 PM, Jimmy Sélamy <jymysy@gmail.com> wrote:
>
>> I'm having performance issues with faceting on multivalued fields, with an
>> index of over 20 million documents.
>>
>> When faceting on a multivalued field, the QTime is unacceptable for my
>> application: it can take up to 6000 ms.
>>
>> I've set facet.method to enum, which improved performance to the time I
>> just mentioned, but that is still not acceptable.
>>
>> Are there any suggestions?
>>
>> Sent with BlackBerry on the Vidéotron mobile network
>> ------------------------------
>> *From: * Robert Muir <rcmuir@gmail.com>
>> *Date: *Sat, 10 Nov 2012 21:33:47 -0500
>> *To: *<dev@lucene.apache.org>
>> *ReplyTo: * dev@lucene.apache.org
>> *Subject: *Optimize facets when actually single valued?
>>
>> I am guessing at times people are lazy about schema definition. But I
>> think with Lucene 4 stats we can detect if a field is actually single
>> valued... Something like terms.size == terms.doccount == terms.sumdocfreq.
>> I have to think about it a bit, maybe it's even simpler than this? Anyway,
>> this could be used instead of the actual schema def to just build a
>> fieldcache instead of an uninverted field, I think... Should be a simple
>> opto but maybe potent...
>>
>
>
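
The check proposed in the original message (terms.size == terms.doccount == terms.sumdocfreq) can be illustrated with a toy model. This is a minimal sketch, not Lucene code: the inverted index is a plain dict mapping each term to the set of document ids containing it, and all function names are hypothetical. In this toy model, comparing only doccount and sumdocfreq is already enough to tell that no document holds more than one distinct term, which may be the "even simpler" variant hinted at.

```python
def field_stats(postings):
    """Return (size, doc_count, sum_doc_freq) for a toy inverted index.

    `postings` maps each term to the set of document ids containing it.
    """
    size = len(postings)                              # unique terms
    doc_count = len(set().union(*postings.values()))  # docs with >= 1 term
    sum_doc_freq = sum(len(docs) for docs in postings.values())
    return size, doc_count, sum_doc_freq


def single_valued_strict(postings):
    """The check as written in the message: all three stats are equal."""
    size, doc_count, sum_doc_freq = field_stats(postings)
    return size == doc_count == sum_doc_freq


def single_valued_relaxed(postings):
    """Assumed simpler variant: each doc has exactly one distinct term."""
    _, doc_count, sum_doc_freq = field_stats(postings)
    return doc_count == sum_doc_freq


if __name__ == "__main__":
    unique_per_doc = {"a": {1}, "b": {2}, "c": {3}}
    shared_term = {"a": {1, 2}, "b": {3}}       # single valued, term reused
    multi_valued = {"a": {1, 2}, "b": {2}}      # doc 2 has two terms
    for index in (unique_per_doc, shared_term, multi_valued):
        print(field_stats(index),
              single_valued_strict(index),
              single_valued_relaxed(index))
```

The strict form also requires every term to occur in exactly one document, so it rejects single-valued fields whose values repeat across documents; the relaxed form accepts them.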
