incubator-clerezza-dev mailing list archives

From Reto Bachmann-Gmuer <reto.bachm...@trialox.org>
Subject Lucene (was Re: Clips architecture / lucene / google alerts)
Date Thu, 18 Feb 2010 18:36:19 GMT
Hi Oli

Mostly I don't know what you are referring to. Also, there's no
attachment; if you would like us to have a look at something that can't
easily be expressed in plain text, put it on the web and post a link
here.

I may be able to say something about Lucene: I think TDB integrates
Lucene; at least there's the possibility to use Lucene with ARQ [1]
(our current backend-independent SPARQL provider is based on ARQ). If
you experience poor performance with queries including regex (using
the TDB backend with Clerezza), it would be interesting to compare this
with a direct query to TDB (without using Clerezza). Currently we
haven't implemented a fast lane to the SPARQL service in the TDB
provider, so implementing one might bring big performance gains.

What would the cool API be that you would additionally provide for the
search engine? What would you like to do that you can't do with SPARQL
and its regex?

Cheers,
reto



1. http://www.openjena.org/ARQ/lucene-arq.html
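
To make the regex-versus-index question above concrete, here is a minimal Python sketch. It is not Clerezza, ARQ, TDB or Lucene code: the sample titles, URLs and function names are all hypothetical, and it assumes that a regex filter without a text index has to inspect every candidate literal, whereas an inverted index (the idea behind Lucene) looks a term up directly.

```python
import re
from collections import defaultdict

# Hypothetical sample data: article URL -> title (not taken from the thread).
TITLES = {
    "http://example.org/a1": "Lucene indexing basics",
    "http://example.org/a2": "SPARQL regex filters",
    "http://example.org/a3": "Indexing RDF with Lucene",
}


def regex_search(pattern, titles):
    """Test the pattern against every title: a full scan of the data."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(url for url, text in titles.items() if rx.search(text))


def build_index(titles):
    """Build a term -> set-of-URLs inverted index, once, ahead of queries."""
    index = defaultdict(set)
    for url, text in titles.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(url)
    return index


def index_search(term, index):
    """Look up a single term without touching unrelated titles."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_index(TITLES)
```

Both `regex_search("lucene", TITLES)` and `index_search("lucene", INDEX)` find the same two articles here; the difference is that the first re-reads every title per query, while the second pays the scanning cost once when the index is built. Whole-term lookup is also less expressive than a regex, which is part of the trade-off.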

On Thu, Feb 18, 2010 at 4:56 PM, Oliver Strässer
<oliver.straesser@getunik.com> wrote:
> Hello all
>
>
>
> As Marco told you, we have a new project that we want to develop on top
> of Clerezza. I think Marco told you the big goal ;-)
>
> In this project, we need an efficient search engine.
>
>
>
> The whole architecture (see attachment) includes new connections to new
> services like Google Alerts and the Apache Lucene project.
>
> components:
>
> google alerts manager:
>
> - no creation of feeds - reason: the Google Alerts service has no API
>
> - connect existing Google Alerts feeds with the Clerezza system
>
> - overview of all added feeds
>
> - add additional information
>
> - connected with conceptmanager
>
>
>
>
>
> google alerts provider
>
> - fetching of the google alerts feed
>
> - deliver the feeds to other bundles
>
>
>
> apache lucene provider
>
> - connection to lucene indexes
>
> - later: managing of different indexes
>
> - fast search engine with a cool API
>
>
>
> clips search frontend
>
> - search over the Lucene index
>
> - many different configurations (selection of a feed range, type
> definition, …)
>
> - enrich the search results with user-generated content (comments,
> classification, ...) saved in the clips graph / Lucene
>
>
>
>
>
> clips rss / e-mail manager
>
> - create RSS feeds / a periodic e-mail with the specific results of a
> search query (searched in Lucene)
>
>
>
>
>
> clips middleware
>
> - implementation of the connection between the feeds, graph and lucene
> (Business logic)
>
> - reading the feed and filling lucene (special keywords - edited in the
> conceptmanager)
>
> - the link between the Lucene index and the graph is the URL of the feed
> article entry
>
>
>
>
>
> apache lucene
>
> - internal search engine for the clips search frontend
>
>
>
> apache lucene provider
>
> - managing of different indexes, for different bundles
>
> - the index access / modification is done by the provider
>
> - extension: overview over all indexes (like Luke)
>
>
>
>
>
> pro lucene:
>
> fast search engine
>
> well formed api
>
> we gain the ability to connect Clerezza with Apache Lucene, and to
> connect to any Lucene index
>
>
>
> contra:
>
> it isn't in the graph
>
> perhaps redundant data (Url / Title)
>
>
>
>
>
> What do you think about this architecture?
>
> --getunik ag-------------------------------------------
>   oliver straesser              oliver.straesser@getunik.com
>
>   hardturmstrasse 101    fon: +41 (0)44 388 55 88
>   ch-8005 zuerich              fax: +41 (0)44 388 55 89
>
>    --aktuelles getunik projekt-------------------------
>
>    Act locally! Geo marketing for your e-mail campaign:
> www.geomarketing.com
>
>
>
>  --best of swiss web awards 2009------------------
>
>    Gold & Silver for Connect2Earth / Bronze for WWF UK
>
>
>
> we make the web a better place - www.getunik.com
>
>
>
>
>
>
>
>
>
> *****************************************************************
>
> Please print this e-mail only if necessary. The environment will thank
> you.
>
> *****************************************************************
>
>
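
The "clips middleware" idea in the quoted proposal, where the feed article URL is the key shared by the Lucene index and the graph, can be sketched roughly as follows. This is only an illustration under stated assumptions: `ClipsStore`, its methods, and the plain-string predicates are hypothetical, a dictionary stands in for Lucene, and a set of triples stands in for the Clerezza graph.

```python
import re
from collections import defaultdict


class ClipsStore:
    """Toy stand-in: full text lives in an index, annotations in a graph."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> article URLs (Lucene stand-in)
        self.graph = set()             # (subject, predicate, object) triples

    def add_entry(self, url, title, text):
        """Index the article text and record its title in the graph."""
        for term in set(re.findall(r"\w+", f"{title} {text}".lower())):
            self.index[term].add(url)
        self.graph.add((url, "title", title))

    def annotate(self, url, predicate, value):
        """Attach user-generated content (comment, classification) to a URL."""
        self.graph.add((url, predicate, value))

    def search(self, term):
        """Find URLs via the index, then enrich each hit from the graph."""
        hits = sorted(self.index.get(term.lower(), set()))
        return [
            (url, sorted((p, o) for s, p, o in self.graph if s == url))
            for url in hits
        ]
```

A search first consults the index and then joins each hit against the graph by URL, which also shows where the "redundant data (Url / Title)" concern from the contra list comes from: in this sketch the title ends up in both stores.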
