From: "Ramon Prades"
Reply-To: forrest-dev@xml.apache.org
Subject: RE: about lucent and exist
Date: Mon, 15 Sep 2003 11:33:49 +0200
Message-ID: <002f01c37b6c$7b3f0d40$0cd4a8c0@pcramon>
In-Reply-To: <3F633DF8.30708@che-che.com>

Hi Juan Jose

Do you think we should drop Lucene and use Xindice instead?

This is what I think:

- Use Xindice.
- Populate the database using a crawler and Cocoon's xml-views.
- Create a search page with a number of options, such as "search in
  content", "search in title" and so on.

Regards.
Ramón

> -----Original Message-----
> From: Juan Jose Pablos [mailto:cheche@che-che.com]
> Sent: Saturday, 13 September 2003 17:56
> To: forrest-dev@xml.apache.org
> Subject: Re: about lucent and exist
>
>
> Stefano Mazzocchi wrote:
> >
> > Lucene is based on algorithms that don't allow the above.
> >
>
> Thanks for backing this up. That was my initial feeling.
>
> > For that, you need what is called an "xml database", which could be, in
> > the most simple case, a collection of files in a file system and a very
> > slow incremental collector that opens all files, scans them and collects
> > the matching elements and returns the results as a new document. In the
> > best case, it's a semi-structured database with multidimensional
> > indexing features (exist and xindice are much closer to that).
> >
>
> I am happy to look at xindice.
>
> >
> > You are trying to create "virtual documents" out of XML-aware queries
> > over a repository of hierarchical content (not necessarily XML, but
> > XML-viewable).
>
> Are you saying that because we are making the request to the document-v12
> schema? I am not sure about this. I am not thinking about doing the
> request to the document-v12 schema.
>
> In Forrest we are importing from another schema and in that process we
> are losing information ( i.e. becomes ). So I would like
> to run a search on the source and get results that point to where I can
> retrieve that document.
>
> > Eh, if it was that easy. You are implying that:
> >
> > 1) a tag is used to indicate the semantics of the nodes contained
> > therein. Although this is generally the case (and there is the ability
> > to have RDF/XML to perform this way) this is not generalizable.
>
> I would like to see an example of this.
>
> >
> > 2) without namespaces, there is a tremendous semantic collision. With
> > namespaces, you are assuming that the namespace refers to the 'meaning'
> > of the tag, again not generalizable.
> >
>
> ok, I have not mentioned anything about namespaces; the request that I put
> as an example only deals with the faq schema. I had not thought about
> multi-namespace documents or other types of XML input.
>
> > This said, I agree that having the ability to run XQuery queries over a
> > content repository that exposes XML views would be a tremendous help.
> > Just don't call it "semantic searching", because that's not even close
> > (but very few are able to explain the difference and the reason why we
> > need the entire RDF stack in the first place, so don't worry).
> >
> > --
> > Stefano.
>
> ok, I will not use that name, and I will not worry either.
>
> Cheers,
> Cheche
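
For reference, the "most simple case" Stefano describes (a collection of files on a file system plus a slow collector that opens every file, scans it, and returns the matching elements as a new document) could be sketched roughly as below. This is only an illustration using Python's standard library, not anything taken from Forrest, Cocoon, eXist or Xindice. The function name `collect_matching`, the `results`/`hit` wrapper elements, the `source` attribute and the sample `faq`/`question` markup are all assumptions made up for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_matching(root_dir, path_expr):
    """Scan every .xml file under root_dir and gather matching elements.

    path_expr uses ElementTree's limited XPath subset (e.g. ".//question").
    The matches are returned wrapped in a new <results> document, one
    <hit source="..."> per match, so each result points back to its file.
    """
    results = ET.Element("results")  # hypothetical wrapper element
    for xml_file in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(xml_file)
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        for match in tree.getroot().iterfind(path_expr):
            hit = ET.SubElement(results, "hit", source=str(xml_file))
            hit.append(match)
    return results


if __name__ == "__main__":
    # Build a tiny throwaway repository with made-up sample content.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "faq.xml").write_text(
            "<faqs><faq><question>What is Forrest?</question></faq></faqs>",
            encoding="utf-8",
        )
        found = collect_matching(tmp, ".//question")
        print(ET.tostring(found, encoding="unicode"))
```

Every query re-reads and re-parses the whole collection, which is why this approach is slow; an XML database is expected to answer the same kind of query from indexes instead.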