Date: Thu, 2 Jun 2011 10:54:15 +0200
Subject: Re: Contenthub structure
From: Rupert Westenthaler
To: stanbol-dev@incubator.apache.org

Hi all,

I will try to sketch a small usage scenario here:

A user posts the query "CMS workshops in France" to the Contenthub. The semantic search component of the Contenthub uses several SearchEngines (similar to the EnhancementEngines in the Enhancer):

1. OntologySearcher: tries to identify concepts mentioned in the query. In the example it will find the concept "Workshop".
2. EntitySearcher: tries to find entities for words used in the query. In the example it will find "France".
3. Faceted search engine: composes a Lucene-style search for documents with
   * a reference to Workshop
   * a reference to France
   * the text "CMS"

If there were another search engine that could understand the internal structure of the query, one could even search for things
   * with the type Workshop
   * located within Paris
   * containing the text "CMS"

and, because workshops are events, one could activate facets for
   * Location
   * Time
   * Participants
   * facets explicitly requested with the query (e.g. Tags, Creator ...)

So the idea is to use
   * Ontologies (CMS-Adapter & Kres)
   * the Entityhub
   * maybe neural networks with learned query patterns??
   * other stuff??
for query preprocessing, and
   * full-text indices over documents
   * full-text indices over facts (like the Workshop)
   * SPARQL endpoints over enhancements
   * other things??
for the execution of the enhanced query.

Joining results from the different sources (documents, facts, enhancements) would be challenging. However, I think this feature would not be necessary for a first version.

I would also like to consider this [Screencast](http://www.srdc.com.tr/iks/2ndyear/DemoVideo.htm) in the context of this usage scenario.

WDYT?

Rupert

On Wed, Jun 1, 2011 at 10:26 AM, Olivier Grisel wrote:
> 2011/6/1 Suat Gonul :
>> Hi everybody,
>>
>> After discussing with Rupert yesterday, we have come up with a basic design
>> for the Contenthub component.
>>
>> It will provide two main RESTful interfaces to:
>>
>> 1) Upload (register) content and metadata (available in the current
>> implementation)
>> 2) Search for registered content
>>
>> There would be Indexing Engines for (1) and Search Engines for (2). The
>> Contenthub implementation would then implement Indexing Engines to store the
>> enhancements in a triple store and Search Engines to search enhancements and
>> content items in the triple store.
>>
>> There is also an already started implementation for the search part in
>> the Google Code base of the IKS project at [1]. It will be integrated into
>> the Contenthub component.
>>
>> What do you think?
>
> I think the default search implementation for content should be based
> on fulltext indexing using the EntityHub's SolrYard extended with
> faceted search.
>
> I find the combination of fulltext search and facet-based structured
> refinements much more intuitive than the traditional multi-field,
> form-based search interface.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel

--
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
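The query-processing chain in the scenario above (engines that each explain part of the query, followed by a faceted search that composes a Lucene-style query and picks facets) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the Contenthub implementation: the `Query` class, the `LookupSearcher` helper standing in for the OntologySearcher and EntitySearcher, the in-memory lookup tables, the Lucene field names `references` and `text`, and the event-facet rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Hypothetical stopword list; a real system would use its analyzer's list.
STOPWORDS = {"in", "at", "on", "of", "the", "a"}


@dataclass
class Query:
    """A search query plus the annotations added by the search engines."""
    text: str
    concepts: List[str] = field(default_factory=list)  # ontology concepts
    entities: List[str] = field(default_factory=list)  # Entityhub-style entities
    terms: List[str] = field(default_factory=list, init=False)  # unexplained words

    def __post_init__(self) -> None:
        # Every non-stopword token starts out as a free-text term.
        self.terms = [t for t in self.text.split() if t.lower() not in STOPWORDS]


class LookupSearcher:
    """Hypothetical engine: moves terms found in a lookup table into one annotation list."""

    def __init__(self, table: Dict[str, str], target: str) -> None:
        self.table = table    # lower-cased word -> concept or entity name
        self.target = target  # Query list to fill: "concepts" or "entities"

    def process(self, query: Query) -> None:
        remaining = []
        for term in query.terms:
            match = self.table.get(term.lower())
            if match is None:
                remaining.append(term)  # not understood, keep as free text
            else:
                getattr(query, self.target).append(match)
        query.terms = remaining


def run_pipeline(text: str, engines: Iterable[LookupSearcher]) -> Query:
    """Runs the engines in order, each refining the same Query object."""
    query = Query(text)
    for engine in engines:
        engine.process(query)
    return query


def to_lucene(query: Query) -> str:
    """Composes a Lucene-style query string; the field names are assumptions."""
    clauses = [f'references:"{name}"' for name in query.concepts + query.entities]
    clauses += [f'text:"{term}"' for term in query.terms]
    return " AND ".join(clauses)


# Hypothetical rule: concepts that denote events enable event facets.
EVENT_CONCEPTS = {"Workshop"}
EVENT_FACETS = ["location", "time", "participants"]


def facets_for(query: Query, requested: Iterable[str] = ()) -> List[str]:
    """Explicitly requested facets first, then facets implied by the concepts."""
    facets = list(requested)
    if any(c in EVENT_CONCEPTS for c in query.concepts):
        facets += [f for f in EVENT_FACETS if f not in facets]
    return facets


ontology_searcher = LookupSearcher({"workshop": "Workshop", "workshops": "Workshop"}, "concepts")
entity_searcher = LookupSearcher({"france": "France"}, "entities")

q = run_pipeline("CMS workshops in France", [ontology_searcher, entity_searcher])
print(to_lucene(q))             # references:"Workshop" AND references:"France" AND text:"CMS"
print(facets_for(q, ["tags"]))  # ['tags', 'location', 'time', 'participants']
```

A real OntologySearcher would consult the ontologies (CMS-Adapter, Kres) and a real EntitySearcher the Entityhub instead of in-memory dictionaries. The point of the sketch is only that each engine consumes the query terms it can explain and leaves the rest as free text for the full-text part of the final query.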