From: Jeroen Reijn
Organization: Hippo
Date: Tue, 08 Sep 2009 10:44:57 +0200
To: users@cocoon.apache.org
Subject: Re: how-to query an xml repository efficiently

Hi Robby,

in this case I think SOLR would be a great match. You can push XML to
SOLR with an HTTP client and let SOLR index the information. See the
post.jar that comes with the SOLR example: it pushes XML to the SOLR app,
which indexes it based on your configuration.

The great thing is that you can also configure all kinds of facets based
on what is stored in such a product file, so you can create a nice facet
view in your webapp.

A couple of years ago I was looking at some Forrest components [1], which
were made for using SOLR from a Cocoon point of view. They help you to
perform queries against a SOLR instance from your sitemap and get an XML
response back.

Regards,

Jeroen

[1] http://wiki.apache.org/solr/SolrForrest

Robby Pelssers wrote:
> Hi jeroen and others who replied to my mail... Let me further explain
> my usecase and existing infrastructure.
>
> My customer stores their product data in xml-files on file system
>
> E.g.
> ${repofolder}/
>   products/
>     product-1/
>       product-1.xml
>       product-1-image.jpg
>       ...
>     product-2/
>       product-2.xml
>       product-2-image.jpg
>       ...
>
> This is a simplified representation but as you see there is no concept
> of an xml database.
>
> Now let's start with a small fictive example for product-1.xml:
>
> xxxx
> grandma's cookies
> food
> 2.0
>
> From a functional point of view they want to be able to search for
> products based on some criteria. So I'll have to build a small
> search form containing:
> - Dropdown with all possible categories
> - Textbox to search for part of the description
> - Price "between / equal to / greater than / less than" search
>   functionality
>
> So for certain "Filter"-criteria I'll have to get all possible values so
> they can pick one and for others I don't need to know anything about the
> actual data.
>
> The actual product xml-files are +- 500kb on average and I'm talking
> about LOTS of products so I have to consider performance upfront.
>
> SOLR seems good for indexing static html files etc but I don't get the
> impression it can offer the necessary functionality for this use case.
>
> Any comments??
>
> Cheers,
> Robby
>
> -----Original Message-----
> From: Jeroen Reijn [mailto:j.reijn@onehippo.com]
> Sent: Tuesday, September 08, 2009 9:01 AM
> To: users@cocoon.apache.org
> Subject: Re: how-to query an xml repository efficiently
>
> Hi Robby,
>
> do you perhaps have any more specs on what kind of XML database it is?
>
> At our company we have experience with an Apache Slide backed database,
> which we used for storing XML files and let Slide index them with
> Lucene. Then based on DASL queries we could search the repository really
> quickly.
>
> Next to DASL I know there are also XML databases that can use XQuery
> to perform fast searches on their XML database.
>
> Regards,
>
> Jeroen
>
> Robby Pelssers wrote:
>> Hi all,
>>
>> I have following use case. The customer has an xml repository which is
>> nothing more than a directory on filesystem which contains
>> subdirectories containing one or more xml files. They now want to query
>> those xml files on some predefined criteria which might change over
>> time...
>>
>> I'm looking for a solution which results in high performance search and
>> some things that came to my mind were
>>
>> * extracting information and storing it in a database (e.g. HSQLDB)
>>
>> * using lucene
>>
>> Is there somewhere detailed documentation available on using these? And
>> what would you recommend for my use case?
>>
>> I already found some stuff but no real quick-start material.
>>
>> http://cocoon.apache.org/2.1/userdocs/concepts/xmlsearching.html
>> http://cocoon.apache.org/2.2/blocks/hsqldb-client/1.0/
>> http://cocoon.apache.org/2.2/blocks/hsqldb-server/1.0/
>>
>> Thx in advance,
>>
>> Robby Pelssers
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
> For additional commands, e-mail: users-help@cocoon.apache.org
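
To make the SOLR suggestion in this thread a bit more concrete, here is a minimal Python sketch of the two steps discussed: turning one product into a SOLR XML update message, and building a query string for the search form (category dropdown, description text, price range). Everything specific in it is an assumption for illustration only: the field names id, description, category and price (the element names of product-1.xml did not survive in the mail), the localhost URL and update path (these depend on the SOLR version and setup), and the helper names themselves. The category, description and price fields would also have to be declared in the SOLR schema.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a local SOLR instance; the exact paths depend on version and setup.
SOLR_URL = "http://localhost:8983/solr"


def build_add_message(product):
    """Return a SOLR <add> update message (bytes) for one product dict."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in product.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def build_search_params(category=None, text=None, price_min=None, price_max=None):
    """Return a URL-encoded query string for the search form in this thread.

    Escaping of special query characters in `text` is left out of this sketch.
    """
    params = [("q", "description:%s" % text if text else "*:*")]
    if category:
        # Filter query: restrict results to one category.
        params.append(("fq", 'category:"%s"' % category))
    if price_min is not None or price_max is not None:
        # Range filter; "*" leaves that side of the range open.
        low = "*" if price_min is None else price_min
        high = "*" if price_max is None else price_max
        params.append(("fq", "price:[%s TO %s]" % (low, high)))
    # Facet on category so the dropdown can list the values present in the index.
    params += [("facet", "true"), ("facet.field", "category")]
    return urllib.parse.urlencode(params)


def post_to_solr(body, url=SOLR_URL + "/update?commit=true"):
    """POST an update message; only call this with a SOLR server running."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical field names applied to the example values from the thread.
    product = {"id": "xxxx", "description": "grandma's cookies",
               "category": "food", "price": 2.0}
    print(build_add_message(product).decode("utf-8"))
    print(build_search_params(category="food", price_min=1, price_max=3))
```

Since the product files are around 500kb each, it probably makes sense to extract only the handful of searchable fields into the update message rather than pushing the whole file, and to keep the original XML on the file system, looked up by id.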