Mailing-List: contact users-help@jackrabbit.apache.org; run by ezmlm
Reply-To: users@jackrabbit.apache.org
In-Reply-To: <1282561709530-2334944.post@n4.nabble.com>
References: <1282561709530-2334944.post@n4.nabble.com>
Date:
Tue, 24 Aug 2010 15:07:01 +0200
Subject: Re: Faceted Search Implementation
From: Alexander Klimetschek
To: users@jackrabbit.apache.org

On Mon, Aug 23, 2010 at 13:08, Gadbury wrote:
>
> Hi all,
>
> I am trying to work out a good way to implement faceted search for products
> in an ecommerce solution.  Please consider the following diagram which shows
> the structure of my categories and products:
>
> http://jackrabbit.510166.n4.nabble.com/file/n2334944/category-product_structure.png
>
> I have the following custom node types which implement mix:referenceable so
> they each have a unique UUID:
>
> - Category (i.e. hardware)
> - FacetType (i.e. manufacturer, warranty)
> - FacetValue (i.e. amd, intel, samsung, 1 year, 2 years, 3 years)
>
> A product has a number of properties but of importance here are the
> following properties which are weak references (a String representing the
> UUID(s)) to the nodes Category and FacetValue:
>
> - categoryUUIDs
> - facetvalueUUIDs
>
> Currently I am tracking the facet type values the user has selected and
> adding them to a query, which retrieves the relevant products.  This works
> although it may be slow with many products!  Here is an example of the XPath
> query:
>
> //element(*,
> jpg:product)[@jpg:categoryUUIDs='d93681a3-8b4e-4c2a-9dcb-a219848f8f3a' and
> ((@jpg:facetvalueUUIDs='70588aa9-6cb1-4ee1-af95-a21f78968e74') and
> (@jpg:facetvalueUUIDs='bec141e8-f4c5-41c5-9cef-560dab296750'))] order by
> @jpg:cost
>
> Once the query is executed, I am iterating over all products, and getting:
>
> each unique facet type UUID and name
> each unique facet value UUID and name
> a count of each occurrence of a facetValueUUID
>
> This data is presented back to the user to offer them a selection of facets
> to filter by.
> For example:
>
> Manufacturer:
> amd [2]
> intel [3]
> samsung [5]
>
> Warranty
> 1 year [3]
> 3 years [7]
>
> I know this works but I am sure there must be a more efficient / refined way
> to do this... perhaps I am completely misunderstanding Jackrabbit and how to
> get the most out of Lucene.  Is there another way that I should consider
> doing this?  I would really appreciate any suggestions / improvements.
>
> Thanks for reading and kind regards,

I would not use UUIDs, but rather the paths of the facets. See also
David's Model, rule #7 [1]. Paths are already unique if you avoid
same-name siblings (SNS), and they stay stable as long as you don't
expect frequent move or merge operations on the facets (each such
operation means updating all the content nodes that reference the
facet, which might be OK).

Finally, you can leverage the hierarchy of the facets to avoid the
distinction between facet categories and values. For the manufacturer
you'd have these facets:

/facets/manufacturer/amd
/facets/manufacturer/intel
/facets/manufacturer/samsung

And on the content (products), you'd only have a multi-value string
property "facets", containing the paths of all facets that apply to
the product. You can search for facet values directly
(@facets='/facets/manufacturer/intel'), but using jcr:like you can
also search for facet categories:
jcr:like(@facets, '/facets/manufacturer/%').

[1] http://wiki.apache.org/jackrabbit/DavidsModel

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com
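
The path-based scheme described in the reply can be tried out without a repository. The snippet below is only a rough sketch in plain Python, not Jackrabbit API: it builds the two kinds of XPath statements mentioned above as strings and tallies facet counts from in-memory lists of facet paths. The function names, the use of the jpg:product node type (borrowed from the original question), and the assumption that every path has the form /facets/<category>/<value> and contains no single quotes are all illustrative choices, not something the thread specifies.

```python
from collections import Counter, defaultdict

# Assumed root of the facet hierarchy, taken from the example paths above.
FACET_ROOT = "/facets"


def facet_value_query(node_type, facet_paths):
    """Build an XPath statement matching nodes that carry every given facet path.

    Assumes the paths contain no single quotes (no escaping is done here).
    """
    conditions = " and ".join(f"@facets='{path}'" for path in facet_paths)
    return f"//element(*, {node_type})[{conditions}]"


def facet_category_query(node_type, category_path):
    """Build an XPath statement matching nodes with any facet below category_path."""
    return f"//element(*, {node_type})[jcr:like(@facets, '{category_path}/%')]"


def count_facets(products):
    """Tally facet values per category.

    `products` is an iterable of lists, each list standing in for the
    multi-value "facets" property of one product.
    """
    counts = defaultdict(Counter)
    for facets in products:
        for path in facets:
            # '/facets/manufacturer/intel' -> ('manufacturer', 'intel')
            category, value = path[len(FACET_ROOT) + 1:].split("/", 1)
            counts[category][value] += 1
    return counts


if __name__ == "__main__":
    print(facet_value_query("jpg:product", ["/facets/manufacturer/intel"]))
    print(facet_category_query("jpg:product", "/facets/manufacturer"))
    sample = [
        ["/facets/manufacturer/intel", "/facets/warranty/1 year"],
        ["/facets/manufacturer/amd"],
    ]
    for category, values in count_facets(sample).items():
        print(category, dict(values))
```

Because the category is simply the parent segment of the path, the counting step needs no separate FacetType lookup, which is the point of folding categories and values into one hierarchy.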