forrest-dev mailing list archives

From Juan Jose Pablos <che...@che-che.com>
Subject Lucent and Xindice (Re: about lucent and exist)
Date Wed, 17 Sep 2003 06:39:30 GMT
Ramon Prades wrote:
 > Hi Juan Jose
 >
 > Do you think we should drop Lucene and use Xindice instead?

I don't think we should drop anything until we have a replacement that 
improves on the current situation. Lucene works, and there is room for 
both Lucene and Xindice.


 > - Populate the database using a crawler and cocoon's xml-views.

Doing this will allow you to populate your indices from various sources, 
not only files. But this implementation is independent of whether you 
use Xindice or Lucene.
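
To illustrate why the population step does not depend on the backend, here 
is a minimal sketch. All names in it (Index, InMemoryIndex, populate) are 
hypothetical and are not part of Forrest, Cocoon, Lucene or Xindice. The 
crawler is assumed to yield (url, xml_text) pairs, and a real backend 
would sit behind the same add() method.

```python
from typing import Dict, Iterable, Protocol, Tuple


class Index(Protocol):
    """Hypothetical backend interface; a Lucene or Xindice adapter would implement it."""

    def add(self, url: str, xml_text: str) -> None: ...


class InMemoryIndex:
    """Stand-in backend used only for this sketch: keeps documents in a dict."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def add(self, url: str, xml_text: str) -> None:
        self.docs[url] = xml_text


def populate(index: Index, sources: Iterable[Tuple[str, str]]) -> int:
    """Feed every crawled (url, xml_text) pair to the index; return how many were added."""
    count = 0
    for url, xml_text in sources:
        index.add(url, xml_text)
        count += 1
    return count
```

The crawler side only sees add(), so swapping the backend would not change 
how the sources are walked.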


 > - Create a search page with a number of options as in "search in content",
 > "search in title" and so on.

I have been thinking a bit about this. Not about the search page itself, 
but about the power of being able to search any XML format and get a 
link to the HTML/PDF page. That would be a big step.

But in Forrest's current situation we only have a few XML schemas:

document
howto
faq
changes/todo/contributors??
book/site
sdocbook/docbook


For these schemas I have not found many search use cases:

Document-v*
-----------
Search for an author/person
Search for an acronym
Search for a figure.
Search for fixme notes.

Howto
-----------
Search for an author/person
Search for an audience (novice... etc)

FAQ
-----------
Search for an author/person
Search for a question.
Search for an answer.

...
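
As a rough sketch of the FAQ use cases above, the following searches the 
XML source by element and returns a link into the rendered page. The 
sample document, the element names (faqs, faq, question, answer), the id 
attribute and the "faq.html" page name are assumptions made for 
illustration and have not been checked against the Forrest DTDs. It uses 
only Python's standard xml.etree.ElementTree, not Lucene or Xindice.

```python
import xml.etree.ElementTree as ET

# Hypothetical FAQ source; element and attribute names are assumed.
FAQ_XML = """
<faqs title="Sample FAQ">
  <faq id="install">
    <question>How do I install Forrest?</question>
    <answer><p>Unpack the distribution and set FORREST_HOME.</p></answer>
  </faq>
  <faq id="search">
    <question>Can I search the site?</question>
    <answer><p>Lucene indexing is available.</p></answer>
  </faq>
</faqs>
"""


def search_faq(xml_text, field, term, page="faq.html"):
    """Return (link, question) pairs for entries whose `field` element contains `term`."""
    root = ET.fromstring(xml_text)
    hits = []
    for faq in root.iter("faq"):
        target = faq.find(field)
        question = faq.find("question")
        if target is None or question is None:
            continue  # skip entries that lack the searched field or a question
        text = "".join(target.itertext())
        if term.lower() in text.lower():
            # Link back to the rendered page using the entry id as the anchor.
            link = f"{page}#{faq.get('id')}"
            hits.append((link, "".join(question.itertext()).strip()))
    return hits
```

Calling search_faq(FAQ_XML, "question", "install") covers "search for a 
question", and passing "answer" as the field covers "search for an answer".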


So the work actually needed to implement this in our current release is 
not much.

What do you think?

Cheers,
Cheche


> 
> This is what I think:
> 
> - Use Xindice.
> - Populate the database using a crawler and cocoon's xml-views.
> - Create a search page with a number of options as in "search in content",
> "search in title" and so on.
> 
> Regards.
> 
> Ramón
> 
> 
>>-----Original Message-----
>>From: Juan Jose Pablos [mailto:cheche@che-che.com] 
>>Sent: Saturday, 13 September 2003 17:56
>>To: forrest-dev@xml.apache.org
>>Subject: Re: about lucent and exist
>>
>>
>>Stefano Mazzocchi wrote:
>>
>>>Lucene is based on algorithms that don't allow the above.
>>>
>>
>>Thanks for backing this up. That was my initial feeling.
>>
>>
>>>For that, you need what is called an "xml database", which could be, in
>>>the most simple case, a collection of files in a file system and a very
>>>slow incremental collector that opens all files, scans them and collects
>>>the matching elements and returns the results as a new document. In the
>>>best case, it's a semi-structured database with multidimensional
>>>indexing features (exist and xindice are much closer to that).
>>>
>>
>>I am happy to look at xindice.
>>
>>
>>>You are trying to create "virtual documents" out of XML-aware queries
>>>over a repository of hierarchical content (not necessarily XML, but
>>>XML-viewable).
>>
>>Are you saying that because we are making the request against the
>>document-v12 schema? I am not sure about this. I am not thinking about
>>making the request against the document-v12 schema.
>>
>>In Forrest we are importing from another schema and in that process we
>>are losing information (e.g. <author/> becomes <p>). So I would like to
>>search the source and have the results point to where I can retrieve
>>that document.
>>
>>
>>>Eh, if it was that easy. You are implying that:
>>>
>>> 1) a tag is used to indicate the semantics of the nodes contained
>>>therein. Although this is generally the case (and there is the ability
>>>to have RDF/XML to perform this way) this is not generalizable.
>>
>>I would like to see an example of this.
>>
>>
>>> 2) without namespaces, there is a tremendous semantic collision. With
>>>namespaces, you are assuming that the namespace refers to the 'meaning'
>>>of the tag, again not generalizable.
>>>
>>
>>OK, I have not mentioned anything about namespaces; the request that I
>>put as an example only deals with the faq schema. I had not thought
>>about multi-namespace documents or other types of XML input.
>>
>>
>>>This said, I agree that having the ability to run XQuery queries over a
>>>content repository that exposes XML views would be a tremendous help.
>>>Just don't call it "semantic searching", because that's not even close
>>>(but very few are able to explain the difference and the reason why we
>>>need the entire RDF stack in the first place, so don't worry).
>>>
>>>-- 
>>>Stefano.
>>
>>OK, I will not use that name, and I will not worry either.
>>
>>Cheers,
>>Cheche
>>
>>
> 
> 
> 


