lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2366) Facet Range Gaps
Date Sat, 02 Apr 2011 23:45:05 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015088#comment-13015088 ]

Hoss Man commented on SOLR-2366:
--------------------------------

In no particular order...

* I like Jan's {{facet.range.spec}} naming suggestion better than my {{facet.range.buckets}}
suggestion ... but I think {{facet.range.series}}, {{facet.range.seq}}, or {{facet.range.sequence}}
might be better still.

* I think Jan's point about {{N}} vs {{+N}} in the sequence list as a way to mix absolute
values vs increments definitely makes sense, and would be consistent with the existing date
math expressions.

* The complexity with supporting *both* absolute values and increments is the question
of what Solr should do with input like {{facet.range.seq=10,20,+50,+100,120,150}}. What
ranges would we return? (10-20, 20-70, 70-???....) Would it be an error? Would we give back
ranges that overlap? What about {{facet.range.seq=10,50,+50,100,150&facet.range.include=all}}
... would that result in one of the ranges being [100 TO 100], or would we throw that one out?
(I think it would be wise to start out implementing only the absolute value approach, since
that seems (to me) the more useful option of the two, and then consider adding the incremental
values as a separate issue later, after hashing out the semantics of these types of situations.)
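
To make the ambiguity concrete, here is a rough Python sketch of one possible reading of the mixed syntax. It assumes that {{+N}} means "previous boundary plus N" and that a bare {{N}} is absolute; the function names are hypothetical and none of this is existing Solr behavior.

```python
def resolve_boundaries(spec):
    """Turn a spec such as "10,20,+50" into absolute boundary values.

    Assumption: "+N" means "previous boundary plus N" and a bare "N" is an
    absolute value. This is only one possible reading of the proposed syntax.
    """
    boundaries = []
    for token in spec.split(","):
        token = token.strip()
        if token.startswith("+"):
            if not boundaries:
                raise ValueError("an increment needs a preceding boundary")
            boundaries.append(boundaries[-1] + int(token[1:]))
        else:
            boundaries.append(int(token))
    return boundaries


def find_problems(boundaries):
    """Return (lower, upper) pairs whose upper bound does not exceed the lower."""
    return [(lo, hi) for lo, hi in zip(boundaries, boundaries[1:]) if hi <= lo]


mixed = resolve_boundaries("10,20,+50,+100,120,150")
print(mixed)                 # [10, 20, 70, 170, 120, 150]
print(find_problems(mixed))  # [(170, 120)]

touching = resolve_boundaries("10,50,+50,100,150")
print(touching)                 # [10, 50, 100, 100, 150]
print(find_problems(touching))  # [(100, 100)]
```

Under this reading the first input produces a boundary that goes backwards (170 followed by 120), and the second produces a zero-width range, which is exactly the kind of case whose semantics would need to be decided.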

* A few of Jan's sample input suggestions used {{*}} at either the start or end of the sequence
to denote "everything before" the second value or "everything after" the second to last value
-- I don't think we need to support this syntax. I think the existing {{facet.range.other}}
would still be the right way to support this with {{facet.range.sequence}}: if you want "everything
before" and/or "everything after", use {{facet.range.other=before}} and/or {{facet.range.other=after}}.
Otherwise it would be confusing to decide what things like {{facet.range.other=before&facet.range.seq=*,10,20}}
and {{facet.range.other=none&facet.range.seq=*,10,20}} mean.

* I *REALLY* don't think we should try to implement something like Jan's {{facet.range.labels}}
suggestion.  I can't imagine any way of supporting it that wouldn't prevent or radically complicate
the "..." type continuation of series I suggested before, and that seems like a much more
powerful feature than labels.  If a user is going to provide a label for every range, then
they must enumerate every range, and they might as well enumerate them (and label them) with
{{facet.query}}, where the label and the query can be side by side.

This...

{code}
facet.query={!label="One or more"}bedrooms:[1 TO *]
facet.query={!label="Two or more"}bedrooms:[2 TO *]
facet.query={!label="Three or more"}bedrooms:[3 TO *]
facet.query={!label="Four or more"}bedrooms:[4 TO *]
{code}

...seems way more readable, and less prone to user error in tweaking, than this...

{code}
f.bedrooms.facet.range.spec=1..*,2..*,3..*,4..*
f.bedrooms.facet.range.labels="One or more","Two or more","Three or more","Four or more"
{code}

* Herman commented...

bq. While using facet.query allows us to construct arbitrary ranges, we must then pick them
out of the results separately. This becomes more difficult if we arbitrarily facet on two
or more fields/expressions. 

I don't see that as being a particularly hard problem that we need to worry about helping users
avoid, especially since users can annotate those queries using localparams and set any arbitrary
key=val pairs on them that they want, to help organize them and identify them later when parsing
the response...

{code}
facet.query={!group=bed label="One or more"}bedrooms:[1 TO *]
facet.query={!group=bed label="Two or more"}bedrooms:[2 TO *]
facet.query={!group=bed label="Three or more"}bedrooms:[3 TO *]
facet.query={!group=bed label="Four or more"}bedrooms:[4 TO *]
facet.query={!group=size label="Small"}sqft:[* TO 1000]
facet.query={!group=size label="Medium"}sqft:[1000 TO 2500]
facet.query={!group=size label="Large"}sqft:[2500 TO *]
{code}
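
As a rough illustration of the client side, the Python sketch below groups such annotated counts after the response has been parsed. It assumes each key under {{facet_queries}} echoes the raw {{facet.query}} string, localparams included; the helper names are hypothetical, the regex parsing is deliberately simplistic (no escaped quotes or braces), and the counts are invented purely for the example.

```python
import re
from collections import defaultdict

# Matches a leading {!...} local-params block followed by the query itself.
LOCAL_PARAMS = re.compile(r'^\{!(?P<params>[^}]*)\}(?P<query>.*)$')
# Matches key=value pairs where the value is double-quoted or bare.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_facet_key(key):
    """Split a facet.query string into (local params dict, query string)."""
    match = LOCAL_PARAMS.match(key)
    if not match:
        return {}, key
    params = {}
    for name, quoted, bare in PAIR.findall(match.group("params")):
        params[name] = quoted if quoted else bare
    return params, match.group("query")


def group_facet_counts(facet_queries):
    """Group (label, count) pairs by the 'group' local param."""
    grouped = defaultdict(list)
    for key, count in facet_queries.items():
        params, query = parse_facet_key(key)
        group = params.get("group", "ungrouped")
        label = params.get("label", query)  # fall back to the raw query
        grouped[group].append((label, count))
    return dict(grouped)


# Invented counts, for illustration only.
sample = {
    '{!group=bed label="One or more"}bedrooms:[1 TO *]': 120,
    '{!group=bed label="Two or more"}bedrooms:[2 TO *]': 85,
    '{!group=size label="Small"}sqft:[* TO 1000]': 40,
}

for group, entries in group_facet_counts(sample).items():
    print(group, entries)
```

The point is only that the {{group}} and {{label}} annotations travel with each query, so picking the counts apart per field needs no extra support from Solr.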




> Facet Range Gaps
> ----------------
>
>                 Key: SOLR-2366
>                 URL: https://issues.apache.org/jira/browse/SOLR-2366
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting needs to be
> evenly spaced.  For instance, if and when SOLR-1581 is completed and one were doing spatial
> distance calculations, one could facet by function into 3 different sized buckets: walking
> distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance.
> We should be able to quantize the results into arbitrarily sized buckets.  I'd propose the
> syntax to be a comma separated list of sizes for each bucket.  If only one value is specified,
> then it behaves as it currently does.  Otherwise, it creates the different size buckets.
> If the number of buckets doesn't evenly divide up the space, then the size of the last bucket
> specified is used to fill out the remaining space (not sure on this)
> For instance,
> facet.range.start=0
> facet.range.end=400
> facet.range.gap=5,25,50,100
> would yield buckets of:
> 0-5,5-30,30-80,80-180,180-280,280-380,380-400
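
The bucket arithmetic in the quoted description can be checked with a short Python sketch. It assumes the tentative rule stated above (the last gap repeats once the list is exhausted) and additionally assumes the final bucket is clipped at {{facet.range.end}}; {{expand_gaps}} is a hypothetical helper, not part of any attached patch.

```python
def expand_gaps(start, end, gaps):
    """Expand a list of gap sizes into (lower, upper) buckets.

    Assumptions: the last gap repeats once the list is exhausted, and the
    final bucket is clipped so it never extends past `end`.
    """
    if not gaps or any(gap <= 0 for gap in gaps):
        raise ValueError("gaps must be a non-empty list of positive sizes")
    buckets = []
    lower = start
    index = 0
    while lower < end:
        gap = gaps[min(index, len(gaps) - 1)]
        upper = min(lower + gap, end)
        buckets.append((lower, upper))
        lower = upper
        index += 1
    return buckets


print(expand_gaps(0, 400, [5, 25, 50, 100]))
# [(0, 5), (5, 30), (30, 80), (80, 180), (180, 280), (280, 380), (380, 400)]
```

With a single gap value the same function yields evenly spaced buckets, matching the "behaves as it currently does" case.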

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

