cocoon-users mailing list archives

From "DAVIGNON Andre - CETE NP/DIODé/PANDOC" <Andre.Davig...@developpement-durable.gouv.fr>
Subject Re: how-to query an xml repository efficiently
Date Tue, 08 Sep 2009 15:09:35 GMT
Robby,

One more thing about this subject.

You can do all of this directly in Cocoon with Lucene, using Java code 
only, but Solr offers rich index configuration through schema.xml, and 
the index can be driven from an HTTP client inside Cocoon through the 
Solr XML / HTTP API, or from Java code with the SolrJ API if you 
prefer.
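
As a rough illustration of the XML / HTTP route, here is a minimal Java sketch. It assumes the field names from Robby's sample product (id, description, category, price) are declared in schema.xml, and the class and method names (SolrSketch, buildAddXml, buildFacetQuery) are made up for this example. It only builds the two strings involved: an XML update message that an HTTP client would POST to Solr's /update handler (followed by a <commit/>), and a query string for the /select handler that asks only for the facet counts of one field. How the HTTP call is wired into a Cocoon sitemap is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrSketch {

    /** Escape the characters that are special in XML text and attribute values. */
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Build a Solr XML update message of the form <add><doc><field .../></doc></add>. */
    public static String buildAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** URL-encode a single query parameter value as UTF-8. */
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Build a query string that matches all documents but returns only facet counts. */
    public static String buildFacetQuery(String facetField) {
        return "q=" + enc("*:*") + "&rows=0&facet=true&facet.field=" + enc(facetField);
    }

    public static void main(String[] args) {
        // Field names are assumptions taken from the sample product in this thread.
        Map<String, String> product = new LinkedHashMap<>();
        product.put("id", "xxxx");
        product.put("description", "grandma's cookies");
        product.put("category", "food");
        product.put("price", "2.0");

        System.out.println(buildAddXml(product));        // body to POST to /update
        System.out.println(buildFacetQuery("category")); // query string for /select
    }
}
```

The facet query is what would feed a category dropdown: with rows=0 Solr returns no documents, only the distinct values of the field and their counts.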

André (not David ;-) )


On 08/09/2009 11:12, Robby Pelssers (via Internet, 
users-return-97980-andre.davignon=developpement-durable.gouv.fr@cocoon.apache.org) 
wrote:
> You all convinced me to investigate the SOLR path further ;-)
>  
> I already installed SOLR yesterday but I probably did not spend enough
> time playing with it.  That's why I ask the experts on this mailing
> list ;-)
> 
> David's answer "The faceted search functionality in Solr can give access
> to all possible values in the index of your data for a given property so
> the user can pick one among them, then find all matching data." was the
> missing piece of the puzzle.
> 
> Thx a lot guys !!
> 
> Robby
> 
> -----Original Message-----
> From: Jeroen Reijn [mailto:j.reijn@onehippo.com] 
> Sent: Tuesday, September 08, 2009 10:45 AM
> To: users@cocoon.apache.org
> Subject: Re: how-to query an xml repository efficiently
> 
> Hi Robby,
> 
> in this case I even think SOLR would be a great match for this use case.
> 
> You can push XML with a http client to SOLR and let SOLR index the 
> information. See the post.jar that comes with the SOLR example. It 
> pushes XML to the solr app and indexes it based on your configuration.
> 
> The great thing is that you can even configure all kinds of facets based
> on what is stored in such a product file, so you can create a nice facet
> view in your webapp.
> 
> A couple of years ago I was looking at some Forrest components [1], which
> were made for using SOLR from a Cocoon point of view. They help you 
> perform queries against a SOLR instance from your sitemap and get an XML 
> response back.
> 
> Regards,
> 
> Jeroen
> 
> [1]http://wiki.apache.org/solr/SolrForrest
> 
> Robby Pelssers wrote:
>> Hi Jeroen and others who replied to my mail...  Let me further explain
>> my use case and existing infrastructure.
>>
>> My customer stores their product data in xml-files on file system 
>>
>> E.g. 
>>   ${repofolder}/
>> 	products/
>> 		product-1/	
>> 			product-1.xml
>> 			product-1-image.jpg
>>   			...
>> 		product-2/	
>> 			product-2.xml
>> 			product-2-image.jpg
>> 		...
>>
>> This is a simplified representation but as you see there is no concept
>> of an xml database.
>>
>> Now let's start with a small fictitious example for product-1.xml: 
>>
>> <product>
>>   <id>xxxx</id>
>>   <description>grandma's cookies</description>
>>   <category>food</category>
>>   <price>2.0</price>
>> </product>
>>
>> From a functional point of view they want to be able to search for
>> products based on some criteria.  So I'll have to build a small
>> searchform containing:
>> 	- Dropdown with all possible categories
>> 	- textbox to search for part of description
>> 	- price "between / equal to / greater than / less than" search
>> functionality
>>
>> So for certain "Filter"-criteria I'll have to get all possible values so
>> they can pick one and for others I don't need to know anything about the
>> actual data.
>>
>> The actual product xml-files are +- 500kb on average and I'm talking
>> about LOTS of products so I have to consider performance upfront.
>>
>> SOLR seems good for indexing static html files etc but I don't get the
>> impression it can offer the necessary functionality for this use case.
>>
>> Any comments??
>>
>> Cheers,
>> Robby
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Jeroen Reijn [mailto:j.reijn@onehippo.com] 
>> Sent: Tuesday, September 08, 2009 9:01 AM
>> To: users@cocoon.apache.org
>> Subject: Re: how-to query an xml repository efficiently
>>
>> Hi Robby,
>>
>> do you perhaps have any more specs on what kind of XML database it is?
>>
>> At our company we have experience with an Apache Slide backed database,
>> which we used for storing XML files and let Slide index them with 
>> Lucene. Then based on DASL queries we could search the repository really
>> quickly.
>>
>> Besides DASL, I know there are also XML databases that can use XQuery
>> to perform fast searches on their content.
>>
>> Regards,
>>
>> Jeroen
>>
>> Robby Pelssers wrote:
>>> Hi all,
>>>
>>>  
>>>
>>> I have the following use case.  The customer has an xml repository which is
>>> nothing more than a directory on the filesystem which contains 
>>> subdirectories containing one or more xml files.  They now want to query
>>> those xml files on some predefined criteria which might change over time...
>>>  
>>>
>>> I'm looking for a solution which results in high performance search and
>>> some things that came to my mind were
>>>
>>> *         extracting information and storing it in a database (e.g. HSQLDB)
>>>
>>> *         using lucene
>>>
>>>  
>>>
>>> Is there detailed documentation available somewhere on using these? And
>>> what would you recommend for my use case?
>>>
>>>  
>>>
>>> I already found some stuff but no real quick-start material.
>>>
>>> http://cocoon.apache.org/2.1/userdocs/concepts/xmlsearching.html
>>>
>>> http://cocoon.apache.org/2.2/blocks/hsqldb-client/1.0/
>>>
>>> http://cocoon.apache.org/2.2/blocks/hsqldb-server/1.0/
>>>
>>>  
>>>
>>> Thx in advance,
>>>
>>> Robby Pelssers
>>>
>>>  
>>>
>>>  
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
>> For additional commands, e-mail: users-help@cocoon.apache.org
