lucene-solr-user mailing list archives

From britske <gbr...@gmail.com>
Subject Re: modeling prices based on daterange using multipoints
Date Tue, 11 Dec 2012 20:59:32 GMT
Hi David,

Yeah, an interesting (as well as problematic, as far as implementation
goes) use-case indeed :)

1. You mention "there are no special caches / memory requirements inherent
in this.". For a given user query, would this mean that all hotels have to
be searched for the requested point.x each time? What would be a good
plug-in point to build in some custom cached filter code for this (perhaps
using the Solr filter cache)? As I see it, determining all hotels that have
a particular point.x value is probably: A) pretty costly to do on each user
query, and B) static, so it can be cached easily without a lot of memory
(relatively speaking), i.e. 20,000 filters (representing all of the 20,000
different point.x values, that is, <date,duration,nr persons,roomtype>
combos) with a bitset per filter representing the ids of hotels that have
that point.x.
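
To make the idea in point 1 concrete, here is a minimal sketch in plain Java. It uses no Solr or Lucene classes, and the names (PointXFilterCache, add, filter) are made up for illustration. It assumes hotels are identified by dense integer doc ids and that the point.x values are already encoded as ints; it is not how the Solr filter cache is actually implemented.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache: one BitSet of hotel doc ids per encoded point.x value. */
final class PointXFilterCache {
    private final Map<Integer, BitSet> hotelsByPointX = new HashMap<>();
    private final int maxDoc;

    PointXFilterCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Record that the hotel with this doc id has a price for the combo encoded as pointX. */
    void add(int pointX, int docId) {
        hotelsByPointX.computeIfAbsent(pointX, k -> new BitSet(maxDoc)).set(docId);
    }

    /** Hotels that have a price for pointX, intersected with the query's other filters. */
    BitSet filter(int pointX, BitSet otherFilters) {
        BitSet result = new BitSet(maxDoc);
        BitSet cached = hotelsByPointX.get(pointX);
        if (cached != null) {
            result.or(cached);        // copy, so the cached bitset is never modified
            result.and(otherFilters); // quick doc intersection
        }
        return result;
    }
}
```

Each filter costs roughly one bit per hotel, and the per-query work is a single bitset intersection instead of a lookup across all points.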

2. I'm not sure I explained C (sorting) well, since I believe you're
talking about implementing custom code to sort multiple point.y's per
hotel, correct? That's not what I need. Instead, for every user query at
most 1 point ever matches per hotel, i.e. a hotel either has a price for a
particular <date,duration,nrpersons,roomtype>-combo (P.x) or it hasn't.

Say a user queries for the <date,duration,nrpersons,roomtype>-combo <21
Dec 2012, 3 days, 2 persons, double>. This might be encoded into a value,
say 12345.
Now, for the hotels that do match that query (i.e. those hotels that have a
point P for which P.x=12345), I want to sort those hotels on P.y (the price
for the requested P.x).
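
As a rough illustration of the behaviour I'm after, here is a sketch in plain Java (no Solr involved). The Hotel class, the matchAndSort method and the in-memory price map are all hypothetical; prices are assumed to be in dollar cents, and the hotel id is used as a tiebreaker to keep the sort stable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class PriceSortSketch {
    /** Hypothetical hotel: an id plus at most one price (P.y) per encoded P.x. */
    static final class Hotel {
        final String id;
        final Map<Integer, Integer> priceByPointX;

        Hotel(String id, Map<Integer, Integer> priceByPointX) {
            this.id = id;
            this.priceByPointX = priceByPointX;
        }
    }

    /** Keep hotels that have a point with P.x == pointX, then sort them on that point's P.y. */
    static List<String> matchAndSort(List<Hotel> hotels, int pointX, boolean descending) {
        List<Hotel> matches = new ArrayList<>();
        for (Hotel h : hotels) {
            if (h.priceByPointX.containsKey(pointX)) {
                matches.add(h);
            }
        }
        Comparator<Hotel> byPrice = Comparator.comparingInt(h -> h.priceByPointX.get(pointX));
        if (descending) {
            byPrice = byPrice.reversed();
        }
        // The id tiebreaker gives a deterministic order for equal prices.
        matches.sort(byPrice.thenComparing(h -> h.id));

        List<String> ids = new ArrayList<>();
        for (Hotel h : matches) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

The point is that only the single price belonging to the requested P.x takes part in the sort; any other prices a hotel may have are ignored.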

Geert-Jan




2012/12/11 David Smiley (@MITRE.org) [via Lucene] <
ml-node+s472066n4026151h71@n3.nabble.com>

> Hi Britske,
>   This is a very interesting question!
>
> britske wrote
> ...
> I realize the new spatial-stuff in Solr 4 is no magic bullet, but I'm
> wondering if I could model multiple prices per day as multipoints, where:
>
>  - date*duration*nr of persons*roomtype is modeled as point.x (discretized
> into some 20,000 values)
>  - price is modeled as point.y (in dollar cents, normalized as avg price
> per day; range: [0,200000], covering a max price of $2,000/day)
>
> The stuff that needs to be possible:
>  A) 1 required filter on point.x (filtering on 1 particular
> <date*duration*nr of persons*roomtype> combo)
>  B) an optional range query on point.y (min and/or max price filter)
>  C) optional sorting on point.y (sorting on price (normal or reverse))
>
> I'm pretty certain A) and B) won't be a problem as far as functionality is
> concerned, but how about performance? I.e. would some sort of cached Solr
> filter jump in for a given <date*duration*nr of persons*roomtype> combo,
> for quick doc intersection, just as it would with multiple dynamic fields
> in my described as-is case?
>
> A & B are indeed not a problem and there are no special caches / memory
> requirements inherent in this.
>
> britske wrote
> How about C)? Is sorting on point.y possible (potentially in conjunction
> with other sorting fields used as tiebreakers, to give a stable sort)? I
> remember having read that any filter query can be used for sorting combined
> with multipoints (which would make the above work, I guess), but I would
> just like to confirm.
> ...
>
> 'C' (sorting) is the challenge. As it stands, you will have to implement
> a variation of this class:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/spatial/src/java/org/apache/lucene/spatial/util/ShapeFieldCacheDistanceValueSource.java?view=markup
> Unlike this implementation, your implementation should ensure the point is
> indeed in the query shape, and it should be configured to take the smallest
> or largest 'y' as desired. Note that the cache infrastructure that this is
> built on is flaky right now: it is a memory hog in multiple ways. There
> will be a Point implementation in memory for all of your indexed points,
> and an ArrayList per doc. It is also not NRT-search friendly, and it
> doesn't relinquish its resources (i.e. on commit) as quickly as it should.
> I know what its problems are but I have been quite busy.
>
> ~ David
>  Author:
> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>
>



