From: Reto Bachmann-Gmuer
To: clerezza-dev@incubator.apache.org
Date: Thu, 18 Feb 2010 19:36:19 +0100
Subject: Lucene (was Re: Clips architecture / lucene / google alerts)

Hi Oli

Mostly I don't know what you are referring to. Also, there's no attachment; if you would like us to have a look at something that can't easily be expressed in plain text, put it on the web and post a link here.

I may be able to say something about Lucene: I think TDB integrates Lucene; at least there's the possibility to use Lucene with ARQ [1] (our current backend-independent SPARQL provider is based on ARQ). If you experience poor performance with queries that include regex (using the TDB backend with Clerezza), it would be interesting to compare this with a direct query to TDB (without using Clerezza). Currently we haven't implemented a fast lane to the SPARQL service in the TDB provider, so this might bring big performance gains.

What would the cool API be that you would additionally provide for the search engine? What would you like to do that you can't do with SPARQL and its regex?

Cheers,
reto

1. http://www.openjena.org/ARQ/lucene-arq.html

On Thu, Feb 18, 2010 at 4:56 PM, Oliver Strässer wrote:
> Hello all
>
> As Marco told you, we have a new project that we want to develop on top
> of Clerezza. I think Marco told you the big goal ;-)
> In this project, we need an efficient search engine.
>
> The whole architecture (see attachment) includes new connections to new
> services like Google Alerts and the Apache Lucene project.
>
> Components:
>
> google alerts manager
> - no creation of the feed (reason: missing API for the Google Alerts service)
> - connect existing Google Alerts feeds with the Clerezza system
> - overview of all added feeds
> - add additional information
> - connected with the conceptmanager
>
> google alerts provider
> - fetches the Google Alerts feeds
> - delivers the feeds to other bundles
>
> apache lucene provider
> - connection to Lucene indexes
> - later: management of different indexes
> - fast search engine with a cool API
>
> clips search frontend
> - search over the Lucene index
> - many different configurations (selection of a feed range, type definition, …)
> - enrich the search results with user-generated content (comments,
>   classification, ...) saved in the clips graph / Lucene
>
> clips rss / e-mail manager
> - create RSS feeds / a periodic e-mail with the specific result of a
>   search query (searched in Lucene)
>
> clips middleware
> - implementation of the connection between the feeds, the graph and Lucene
>   (business logic)
> - reading the feed and filling Lucene (special keywords, edited in the
>   conceptmanager)
> - the connection between the Lucene index and the graph is the URL of the
>   feed article entry
>
> apache lucene
> - internal search engine for the clips search frontend
>
> apache lucene provider
> - management of different indexes for different bundles
> - index access / modification is done by the provider
> - extension: overview of all indexes (like Luke)
>
> pro lucene:
> - fast search engine
> - well-formed API
> - we get the possibility to connect Clerezza with Apache Lucene, and to
>   connect to all Lucene indexes
>
> contra:
> - it isn't in the graph
> - perhaps redundant data (URL / title)
>
> What do you think about this architecture?
>
> --
> oliver straesser, getunik ag
> oliver.straesser@getunik.com
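
The "clips middleware" item in the quoted proposal says that the feed entry URL is what links the Lucene index to the graph. A minimal sketch of that join, under loud assumptions: `ToyIndex`, `graph` and `enriched_search` are hypothetical names, the in-memory dictionary only stands in for a Lucene index (this is not the Lucene API), and the plain `dict` only stands in for the clips graph.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Hypothetical stand-in for a Lucene index: token -> set of article URLs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, url, text):
        # Index every lower-cased word of the article under its URL.
        for token in re.findall(r"\w+", text.lower()):
            self._postings[token].add(url)

    def search(self, query):
        # Return the URLs of articles that contain every query term.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


# Hypothetical stand-in for the clips graph: URL -> user-generated content.
graph = {
    "http://example.org/a1": {"comment": "relevant", "classification": "press"},
}


def enriched_search(index, graph, query):
    # The URL is the only shared key between the index and the graph.
    return [{"url": url, **graph.get(url, {})} for url in sorted(index.search(query))]
```

Only the URL is stored on both sides here, which is the smallest version of the redundancy listed under "contra"; storing the title in the index as well would duplicate more of the graph.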