forrest-dev mailing list archives

From Juan Jose Pablos <che...@che-che.com>
Subject Re: about lucent and exist
Date Wed, 17 Sep 2003 06:02:31 GMT
Ramon,

> 
> Do you think we should drop Lucene and use Xindice instead?
> 

I think we should not drop anything until we have a replacement that
improves the current situation. Lucene works, and there is room for both
Lucene and Xindice.


> - Populate the database using a crawler and Cocoon's XML views.

In Forrest as it stands today, we have these schemas:

document
sdocbook/docbook
howto
faq
changes/todo/contributors??
book/site




> This is what I think:
> 
> - Use Xindice.
> - Create a search page with a number of options as in "search in content",
> "search in title" and so on.
> 
> Regards.
> 
> Ramón
> 
> 
>>-----Original Message-----
>>From: Juan Jose Pablos [mailto:cheche@che-che.com] 
>>Sent: Saturday, 13 September 2003 17:56
>>To: forrest-dev@xml.apache.org
>>Subject: Re: about lucent and exist
>>
>>
>>Stefano Mazzocchi wrote:
>>
>>>Lucene is based on algorithms that don't allow the above.
>>>
>>
>>Thanks for backing this up. That was my initial feeling.
>>
>>
>>>For that, you need what is called an "XML database", which could be, in
>>>the simplest case, a collection of files in a file system and a very
>>>slow incremental collector that opens all files, scans them, collects
>>>the matching elements and returns the results as a new document. In the
>>>best case, it's a semi-structured database with multidimensional
>>>indexing features (eXist and Xindice are much closer to that).
>>>
>>
>>I am happy to look at Xindice.
>>
>>
>>>You are trying to create "virtual documents" out of XML-aware queries
>>>over a repository of hierarchical content (not necessarily XML, but
>>>XML-viewable).
>>
>>Are you saying that because we would be querying the document-v12
>>schema? I am not sure about that; I am not planning to run the query
>>against the document-v12 schema.
>>
>>In Forrest we import content from other schemas, and in that process we
>>lose information (e.g. <author/> becomes <p>). So I would like to run
>>the search against the source documents and get results that point to
>>where I can retrieve each document.
>>
>>
>>>Eh, if only it were that easy. You are implying that:
>>>
>>> 1) a tag is used to indicate the semantics of the nodes contained
>>>therein. Although this is generally the case (and RDF/XML gives you
>>>the ability to work this way), this is not generalizable.
>>
>>I would like to see an example of this.
>>
>>
>>> 2) without namespaces, there is a tremendous semantic collision. With
>>>namespaces, you are assuming that the namespace refers to the 'meaning'
>>>of the tag, again not generalizable.
>>>
>>
>>OK, I have not mentioned anything about namespaces; the request I gave
>>as an example only deals with the faq schema. I had not thought about
>>multi-namespace documents or other types of XML input.
>>
>>
>>>This said, I agree that having the ability to run XQuery queries over a
>>>content repository that exposes XML views would be a tremendous help.
>>>Just don't call it "semantic searching", because that's not even close
>>>(but very few are able to explain the difference and the reason why we
>>>need the entire RDF stack in the first place, so don't worry).
>>>
>>>-- 
>>>Stefano.
>>
>>OK, I will not use that name, and I will not worry either.
>>
>>Cheers,
>>Cheche
>>
>>
> 
> 
> 
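To make the "simplest case" XML database from the quoted discussion concrete (a collection of files in a file system plus a slow collector that opens every file, scans it, and returns the matching elements as a new document), here is a minimal sketch using only the Python standard library. It is not Forrest, Xindice, or eXist code: the function name collect, the <results>/<hit> output format, and the sample file contents are hypothetical, chosen only for illustration. The tag argument loosely plays the role of a "search in title" style option, and needle that of "search in content".

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def collect(repo_dir, tag, needle=None):
    """Naive "XML database" query (hypothetical helper): scan every *.xml file.

    tag    -- element name to match, e.g. "title" for a "search in title" option
    needle -- optional case-insensitive substring the element's text must contain
    Returns a new <results> document; each <hit> records its source file.
    """
    repo = Path(repo_dir)
    results = ET.Element("results")
    for path in sorted(repo.rglob("*.xml")):
        try:
            # Every query opens and re-parses every file: no index is kept.
            tree = ET.parse(path)
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        for elem in tree.iter(tag):
            text = "".join(elem.itertext())
            if needle is None or needle.lower() in text.lower():
                source = path.relative_to(repo).as_posix()
                hit = ET.SubElement(results, "hit", source=source)
                hit.append(elem)  # attach the matching subtree to the result
    return ET.ElementTree(results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        # Made-up sample documents, loosely modeled on a FAQ and a plain page.
        Path(repo, "faq.xml").write_text(
            "<faqs><faq><question>How do I install Forrest?</question>"
            "<answer>Run the build script.</answer></faq></faqs>")
        Path(repo, "index.xml").write_text(
            "<document><header><title>Welcome</title></header>"
            "<body><p>Install notes.</p></body></document>")
        out = collect(repo, "question", "install")
        print(ET.tostring(out.getroot(), encoding="unicode"))
```

Because every query re-reads and re-parses the whole repository, this approach is exactly the "very slow" collector described above; a database such as Xindice or eXist avoids that cost by maintaining indexes.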


