Mailing-List: contact forrest-dev-help@xml.apache.org; run by ezmlm
Precedence: bulk
Reply-To: forrest-dev@xml.apache.org
From: "Ramon Prades" <rprades@porcelanosa.com>
To: <forrest-dev@xml.apache.org>
Subject: RE: about lucent and exist
Date: Mon, 15 Sep 2003 11:33:49 +0200
Message-ID: <002f01c37b6c$7b3f0d40$0cd4a8c0@pcramon>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Importance: Normal
In-Reply-To: <3F633DF8.30708@che-che.com>

Hi Juan Jose

Do you think we should drop Lucene and use Xindice instead?

This is what I think:

- Use Xindice.
- Populate the database using a crawler and Cocoon's XML views.
- Create a search page with a number of options, such as "search in
  content", "search in title", and so on.

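A minimal sketch of the proposed search options, assuming the crawler has already stored the XML view of each page. An in-memory dict from URL to XML text stands in for the Xindice collection, and the `search` function, its `scope` values and the sample element names are hypothetical. A real setup would send the query to Xindice rather than parse every document in Python.

```python
import xml.etree.ElementTree as ET


# Hypothetical stand-in for a Xindice collection: URL -> XML view text
# collected by a crawler. Function and scope names are illustrative only.
def search(docs: dict[str, str], term: str, scope: str) -> list[str]:
    """Return the URLs of documents whose chosen scope contains `term`."""
    term = term.lower()
    hits = []
    for url, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        if scope == "title":
            # "search in title": only look inside <title> elements
            texts = ["".join(t.itertext()) for t in root.iter("title")]
        elif scope == "content":
            # "search in content": look at all text in the document
            texts = ["".join(root.itertext())]
        else:
            raise ValueError(f"unknown scope: {scope!r}")
        if any(term in text.lower() for text in texts):
            hits.append(url)
    return hits
```

Each search-page option maps to a different part of the document; a title search is just a content search restricted to one element.
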
Regards.

Ramón

> -----Original Message-----
> From: Juan Jose Pablos [mailto:cheche@che-che.com]
> Sent: Saturday, 13 September 2003 17:56
> To: forrest-dev@xml.apache.org
> Subject: Re: about lucent and exist
>
>
> Stefano Mazzocchi wrote:
> >
> > Lucene is based on algorithms that don't allow the above.
> >
>=20
> Thanks for backing this up. That was my initial feeling.
>
> > For that, you need what is called an "XML database", which could be,
> > in the simplest case, a collection of files in a file system and a
> > very slow incremental collector that opens all files, scans them,
> > collects the matching elements, and returns the results as a new
> > document. In the best case, it's a semi-structured database with
> > multidimensional indexing features (eXist and Xindice are much closer
> > to that).
> >
>
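The "simplest case" collector Stefano describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the repository is a directory tree of `*.xml` files, a match is any element with a given tag name, and the function name `collect` and the `<results>`/`<hit>` wrapper elements are made up for the example. Every query rereads every file, which is exactly why this approach is so slow compared with an indexed XML database.

```python
import copy
import xml.etree.ElementTree as ET
from pathlib import Path


def collect(root_dir: str, tag: str) -> ET.Element:
    """Naive collector (hypothetical): open every *.xml file under root_dir,
    scan it, and gather each element named `tag` into a new <results> doc."""
    results = ET.Element("results")
    for path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        for elem in tree.iter(tag):
            # Record where the match came from so the original document
            # can be retrieved later.
            hit = ET.SubElement(results, "hit", source=str(path))
            match = copy.deepcopy(elem)
            match.tail = None  # drop text that followed it in the source
            hit.append(match)
    return results
```

The result is itself an XML document, so it could be fed into the same rendering pipeline as any other page.
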
> I am happy to look at Xindice.
>
> >
> > You are trying to create "virtual documents" out of XML-aware queries
> > over a repository of hierarchical content (not necessarily XML, but
> > XML-viewable).
>=20
> Are you saying that because we are making the request to the
> document-v12 schema? I am not sure about this. I am not thinking of
> making the request to the document-v12 schema.
>
> In Forrest we are importing from another schema, and in that process
> we are losing information (e.g. <author/> becomes <p>). So I would
> like to run a search on the source and get results that point to where
> I can retrieve that document.
>
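To make the information loss concrete, here is a toy example. The source vocabulary and the `to_document` transform are hypothetical stand-ins for the real Forrest import step; the only point is that once `<author>` has been turned into `<p>`, a search over the transformed output can no longer find authors, while a search over the source still can.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; element names are illustrative and not
# taken from the real Forrest faq schema.
SOURCE = "<faq><part><author>Jane Doe</author><question>Why?</question></part></faq>"


def to_document(source: str) -> str:
    """Hypothetical lossy transform: every <author> becomes a plain <p>."""
    root = ET.fromstring(source)
    for elem in root.iter():
        if elem.tag == "author":
            elem.tag = "p"  # the "author" meaning is gone after this step
    return ET.tostring(root, encoding="unicode")


def find_authors(xml_text: str) -> list[str]:
    """Return the text of every <author> element in the document."""
    return [e.text for e in ET.fromstring(xml_text).iter("author")]
```

Searching `SOURCE` finds the author; searching `to_document(SOURCE)` finds nothing, even though the same text is still there inside a `<p>`.
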
> > Eh, if only it were that easy. You are implying that:
> >
> >  1) a tag is used to indicate the semantics of the nodes contained
> > therein. Although this is generally the case (and there is the ability
> > to have RDF/XML perform this way), this is not generalizable.
>
> I would like to see an example of this.
>
> >
> >  2) without namespaces, there is a tremendous semantic collision. With
> > namespaces, you are assuming that the namespace refers to the 'meaning'
> > of the tag, again not generalizable.
> >
>
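A small illustration of the collision in point 2, using Python's `xml.etree.ElementTree`. The documents and the `urn:example:...` namespace URIs are made up. Without namespaces, a search for `title` returns both a person's title and an FAQ title; with namespaces, the query can be restricted to one vocabulary, but the namespace only separates vocabularies and says nothing about what `title` actually means.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents that both use an element called <title>.
PERSON_NO_NS = "<person><title>Dr.</title></person>"
FAQ_NO_NS = "<faq><title>How do I install Forrest?</title></faq>"
PERSON_NS = '<person xmlns="urn:example:person"><title>Dr.</title></person>'
FAQ_NS = '<faq xmlns="urn:example:faq"><title>How do I install Forrest?</title></faq>'


def find(docs: list[str], tag: str) -> list[str]:
    """Collect the text of every element whose (qualified) name is `tag`."""
    return [e.text for d in docs for e in ET.fromstring(d).iter(tag)]


# Without namespaces, "title" collides across vocabularies.
print(find([PERSON_NO_NS, FAQ_NO_NS], "title"))
# With namespaces, a qualified name selects only the FAQ vocabulary.
print(find([PERSON_NS, FAQ_NS], "{urn:example:faq}title"))
```

Note that an unqualified `title` matches nothing in the namespaced documents, since ElementTree stores their tags as `{uri}title`.
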
> OK, I have not mentioned anything about namespaces; the request that I
> put as an example only deals with the faq schema. I had not thought
> about multi-namespace documents or other types of XML input.
>
> > This said, I agree that having the ability to run XQuery queries over
> > a content repository that exposes XML views would be a tremendous
> > help. Just don't call it "semantic searching", because that's not even
> > close (but very few are able to explain the difference and the reason
> > why we need the entire RDF stack in the first place, so don't worry).
> >
> > --
> > Stefano.
>=20
> OK, I will not use that name, and I will not worry either.
>=20
> Cheers,
> Cheche