forrest-dev mailing list archives

From Juan Jose Pablos
Subject Re: about lucent and exist
Date Sat, 13 Sep 2003 15:55:36 GMT
Stefano Mazzocchi wrote:
> Lucene is based on algorithms that don't allow the above.

Thanks for backing this up. That was my initial feeling.

> For that, you need what is called an "xml database", which could be, in 
> the most simple case, a collection of files in a file system and a very 
> slow incremental collector that opens all files, scans them and collects 
> the matching elements and returns the results as a new document. In the 
> best case, it's a semi-structured database with multidimensional 
> indexing features (exist and xindice are much closer to that).
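
To make the "most simple case" above concrete, here is a rough sketch
of such a collector in Python. It is only an illustration: the function
name `collect`, the `results`/`hit` wrapper elements and the `source`
attribute are made up for this example and do not come from Forrest,
eXist or Xindice. It opens every file on each query, which is exactly
why this approach is slow.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def collect(root_dir, tag):
    """Open every .xml file under root_dir, scan it, and gather the
    elements named `tag` into one new <results> document.

    Hypothetical helper: the <results>/<hit> layout is an assumption.
    """
    results = ET.Element("results")
    for path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        for element in tree.iter(tag):
            # Record where the match came from so the source document
            # can be retrieved afterwards.
            hit = ET.SubElement(results, "hit", source=str(path))
            hit.append(element)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        Path(repo, "faq.xml").write_text(
            "<faqs><faq><author>Ann</author></faq></faqs>"
        )
        print(ET.tostring(collect(repo, "author"), encoding="unicode"))
```

Each `hit` keeps the path of the file it was found in, so a result can
be traced back to the document it came from.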

I am happy to look at xindice.

> You are trying to create "virtual documents" out of XML-aware queries 
> over a repository of hierarchical content (not necessarily XML, but 
> XML-viewable).

Are you saying that because we would be making the request against the 
document-v12 schema? I am not sure about this; I am not planning to 
query the document-v12 schema.

In Forrest we import from another schema, and in that process we lose 
information (e.g. <author/> becomes <p>). So I would like to run the 
search on the source and get results that tell me where I can retrieve 
that document.

> Eh, if it was that easy. You are implying that:
>  1) a tag is used to indicate the semantics of the nodes contained 
> therein. Although this is generally the case (and there is the ability 
> to have RDF/XML to perform this way) this is not generalizable.

I would like to see an example of this.

>  2) without namespaces, there is a tremendous semantic collision. With 
> namespaces, you are assuming that the namespace refers to the 'meaning' 
> of the tag, again not generalizable.

OK, I have not mentioned anything about namespaces; the request that I 
put as an example only deals with the faq schema. I had not thought 
about multi-namespace documents or other types of XML input.
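
For what it is worth, the collision in point 2 can be shown with a
small Python sketch. The document, the two namespace URIs and the
helper `local_name` are all invented for the illustration. Matching on
the bare tag name mixes up two unrelated `title` elements, while
matching on the namespace-qualified name separates them (which still
says nothing about what the tag means).

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both namespace URIs are made up.
DOC = """<root xmlns:book="http://example.org/book"
      xmlns:person="http://example.org/person">
  <book:title>Dune</book:title>
  <person:title>Dr.</person:title>
</root>"""

root = ET.fromstring(DOC)


def local_name(tag):
    """Strip the '{namespace-uri}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


# Ignoring namespaces: a book title and a personal title collide.
by_local = [e.text for e in root.iter() if local_name(e.tag) == "title"]

# Using the qualified name: only the book title matches.
by_qualified = [e.text for e in root.iter("{http://example.org/book}title")]
```

Here `by_local` holds both values and `by_qualified` holds only the
book title.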

> This said, I agree that having the ability to run XQuery queries over a 
> content repository that exposes XML views would be a tremendous help.
> Just don't call it "semantic searching", because that's not even close 
> (but very few are able to explain the difference and the reason why we 
> need the entire RDF stack in the first place, so don't worry).
> -- 
> Stefano.

OK, I will not use that name, and I will not worry either.

