forrest-dev mailing list archives

From Juan Jose Pablos <che...@che-che.com>
Subject Re: about lucent and exist
Date Wed, 17 Sep 2003 06:03:43 GMT
I had not finished my last email. Please ignore it...


Juan Jose Pablos wrote:
> Ramon,
> 
>>
>> Do you think we should drop Lucene and use Xindice instead?
>>
> 
> I think that we should not drop anything until we have a replacement that
> improves the current situation. Lucene works, and there is room for both
> Lucene and Xindice.
> 
> 
> > - Populate the database using a crawler and Cocoon's XML views.
> 
> In Forrest today we have these schemas:
> 
> document
> sdocbook/docbook
> howto
> faq
> changes/todo/contributors??
> book/site
> 
> 
> 
> 
>> This is what I think:
>>
>> - Use Xindice.
>> - Create a search page with a number of options, such as "search in
>>   content", "search in title" and so on.
>>
>> Regards.
>>
>> Ramón
>>
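Ramón's idea of a search page with per-field options ("search in content", "search in title") amounts to a scoped search: each option selects which part of a document is matched. A minimal sketch, assuming Python's standard ElementTree instead of Xindice or Lucene; the sample documents, the `SCOPES` table and `search()` are hypothetical, and the element paths only loosely follow document-v12 (`<document><header><title/></header><body/></document>`).

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory corpus; a real setup would read the site sources.
DOCS = {
    "index.xml": "<document><header><title>Welcome</title></header>"
                 "<body><p>Forrest builds sites with Cocoon.</p></body></document>",
    "faq.xml": "<document><header><title>Cocoon FAQ</title></header>"
               "<body><p>How do I start?</p></body></document>",
}

# Each search option maps to the part of the document it inspects.
SCOPES = {
    "title": "header/title",
    "content": "body",
}

def search(term, scope):
    """Return names of documents whose chosen scope contains term (case-insensitive)."""
    path = SCOPES[scope]
    hits = []
    for name, source in DOCS.items():
        root = ET.fromstring(source)
        for elem in root.findall(path):
            # itertext() joins all nested text, so <p> inside <body> is searched too.
            if term.lower() in "".join(elem.itertext()).lower():
                hits.append(name)
                break
    return hits

print(search("cocoon", "title"))    # ['faq.xml']
print(search("cocoon", "content"))  # ['index.xml']
```

The same term gives different hits depending on the chosen option, which is the point of offering separate "title" and "content" searches.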
>>
>>> -----Original Message-----
>>> From: Juan Jose Pablos [mailto:cheche@che-che.com]
>>> Sent: Saturday, 13 September 2003 17:56
>>> To: forrest-dev@xml.apache.org
>>> Subject: Re: about lucent and exist
>>>
>>>
>>> Stefano Mazzocchi wrote:
>>>
>>>> Lucene is based on algorithms that don't allow the above.
>>>>
>>>
>>> Thanks for backing this up. That was my initial feeling.
>>>
>>>
>>>> For that, you need what is called an "xml database", which could be, in
>>>> the simplest case, a collection of files in a file system and a very
>>>> slow incremental collector that opens all files, scans them and collects
>>>> the matching elements and returns the results as a new document. In the
>>>> best case, it's a semi-structured database with multidimensional
>>>> indexing features (eXist and Xindice are much closer to that).
>>>>
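The "simplest case" of an XML database described above (open every file, collect the matching elements, return them as a new document) can be sketched in a few lines. This is an illustration only, assuming ElementTree path expressions stand in for a real query language such as XQuery; `collect()` and the `<results>`/`<hit>` element names are hypothetical, and the FAQ-like sample markup is assumed rather than taken from Forrest's FAQ DTD.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def collect(directory, path):
    """Naive 'XML database': parse every .xml file under directory, collect the
    elements matching an ElementTree path, and wrap them in a new <results>
    document. Each hit records the file it came from."""
    results = ET.Element("results")
    for dirpath, dirnames, filenames in os.walk(directory):
        dirnames.sort()  # walk subdirectories in a stable order
        for filename in sorted(filenames):
            if not filename.endswith(".xml"):
                continue
            full = os.path.join(dirpath, filename)
            # Every query re-parses every file, which is why this is so slow.
            tree = ET.parse(full)
            for elem in tree.getroot().iterfind(path):
                hit = ET.SubElement(results, "hit",
                                    source=os.path.relpath(full, directory))
                hit.append(elem)
    return results

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "faq.xml"), "w") as f:
        f.write("<faqs><faq><question>What is Forrest?</question></faq>"
                "<faq><question>How do I build?</question></faq></faqs>")
    with open(os.path.join(d, "index.xml"), "w") as f:
        f.write("<document><header><title>Home</title></header></document>")
    # A "virtual document" holding two <hit> elements, both from faq.xml.
    print(ET.tostring(collect(d, ".//question"), encoding="unicode"))
```

Keeping the `source` attribute on each hit is what lets a results page link back to the document the match came from; a real XML database would replace the full scan with indexes.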
>>>
>>> I am happy to look at Xindice.
>>>
>>>
>>>> You are trying to create "virtual documents" out of XML-aware queries
>>>> over a repository of hierarchical content (not necessarily XML, but
>>>> XML-viewable).
>>>
>>>
>>> Are you saying that because we would be querying the document-v12
>>> schema? I am not sure about this. I am not thinking of running the
>>> query against the document-v12 schema.
>>>
>>> In Forrest we import from another schema, and in that process we lose
>>> information (e.g. <author/> becomes <p>). So I would like to search the
>>> source and have the results point to where I can retrieve that document.
>>>
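The information loss described above, and why searching the source helps, can be shown with a toy example. A minimal sketch, assuming a hypothetical `to_document()` that stands in for Forrest's real import stylesheet (it is not that stylesheet), a made-up source document, and a made-up output URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in a richer schema than document-v12.
SOURCE = "<howto><author>Jane Doe</author><step>Run forrest</step></howto>"

def to_document(src):
    """Toy stand-in for the import step: every child becomes a <p>,
    so the fact that one child was an <author> is lost."""
    doc = ET.Element("document")
    body = ET.SubElement(doc, "body")
    for child in ET.fromstring(src):
        ET.SubElement(body, "p").text = child.text
    return doc

def find_authors_in_output(doc):
    """Searching the transformed document: no <author> survives."""
    return [e.text for e in doc.iter("author")]

def find_authors_in_source(src, output_url):
    """Searching the source instead, pointing each hit at the published page."""
    return [(e.text, output_url) for e in ET.fromstring(src).iter("author")]

print(find_authors_in_output(to_document(SOURCE)))           # []
print(find_authors_in_source(SOURCE, "howto-example.html"))  # [('Jane Doe', 'howto-example.html')]
```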
>>>
>>>> Eh, if only it were that easy. You are implying that:
>>>>
>>>> 1) a tag is used to indicate the semantics of the nodes contained
>>>> therein. Although this is generally the case (and RDF/XML can be used
>>>> to work this way), this is not generalizable.
>>>
>>>
>>> I would like to see an example of this.
>>>
>>>
>>>> 2) without namespaces, there is a tremendous semantic collision. With
>>>> namespaces, you are assuming that the namespace refers to the 'meaning'
>>>> of the tag, again not generalizable.
>>>>
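The "semantic collision" in point 2 is easy to reproduce: the same local tag name can carry two unrelated meanings, and only a namespace tells them apart. A minimal sketch, assuming two hypothetical documents and made-up `example.org` namespace URIs; `titles()` is a hypothetical helper built on ElementTree, which stores namespaced tags as `{uri}local`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that both use <title>: once for a page title,
# once for a person's job title.
PAGE = '<doc xmlns:d="http://example.org/doc"><d:title>Install guide</d:title></doc>'
PERSON = '<doc xmlns:p="http://example.org/people"><p:title>Release manager</p:title></doc>'

def titles(xml_text, namespace=None):
    """Return the text of every 'title' element, optionally restricted to one namespace URI."""
    found = []
    for elem in ET.fromstring(xml_text).iter():
        # '{uri}local' splits into ('{uri', '}', 'local'); un-namespaced tags give ('', '', tag).
        uri, _, local = elem.tag.rpartition("}")
        uri = uri.lstrip("{")
        if local == "title" and (namespace is None or uri == namespace):
            found.append(elem.text)
    return found

# Matching on the local name alone mixes both meanings:
print(titles(PAGE) + titles(PERSON))             # ['Install guide', 'Release manager']
# Qualifying by namespace separates them:
print(titles(PERSON, "http://example.org/doc"))  # []
```

Note that the namespace only separates the two tags; nothing in the URI says what "title" means, which is the part Stefano calls not generalizable.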
>>>
>>> OK, I have not mentioned anything about namespaces; the request I gave
>>> as an example only deals with the FAQ schema. I had not thought about
>>> multi-namespace documents or other types of XML input.
>>>
>>>
>>>> This said, I agree that having the ability to run XQuery queries over a
>>>> content repository that exposes XML views would be a tremendous help.
>>>> Just don't call it "semantic searching", because that's not even close
>>>> (but very few are able to explain the difference and the reason why we
>>>> need the entire RDF stack in the first place, so don't worry).
>>>>
>>>> -- 
>>>> Stefano.
>>>
>>>
>>> OK, I will not use that name, and I will not worry either.
>>>
>>> Cheers,
>>> Cheche
>>>
>>>
>>
>>
>>
> 