cocoon-dev mailing list archives

From Stefano Mazzocchi <>
Subject Re: Infrasearch
Date Thu, 01 Jun 2000 11:50:08 GMT
COFFMAN Steven wrote:
> Hi,
>         I think I found a piece that will help make Cocoon
> semantic-search friendly, as Stefano expounded on the need for.
>         The Gnutella folks have come up with InfraSearch, a way to use
> Gnutella as a search engine aggregator...sorta. Infrasearch passes a query
> to any of the participating clients, whether they are search engines or
> websites, and displays their responses. The mechanism by which all the
> responses are sorted is still pretty beta.

I've recently analyzed the Gnutella protocol and I find it _way_
unoptimized; I don't think it scales well.

TCP/IP took decades to reach the impressive scalability it has today.
Don't make the mistake of taking scalability for granted when going
distributed!

> However, if cocoon had a local search that it could respond to the
> InfraSearch queries with, it would be instantly part of the Stefano
> Knowledge Web. 

Please, don't say that. I've just quoted TBL's ideas on this; let's
concentrate on "how to make it work" rather than "how to specify its
name".

> Then we could leave it to the Infrasearch folks to work out
> how to sort Cocoon site responses.

In my vision, each site collaborates in the search. In fact,
Infrasearch has something that no other search engine has: maximum query
flexibility, since each site interprets the query itself.

I believe there are _many_ ways to improve such a search engine and many
ways to _abuse_ it (pirated mp3 and all that).

Each Cocoon, in the end, should be a cell in this network.... well, I
say let's watch from the window and see what happens. It's way too early
to tell, but I do see problems in this way of handling web resources,
especially in a metadata-less web like today's.

The web won over other information systems because of its scalability.
Yahoo, Altavista and all that are centralized; they are the "old model".

The new model is one of small distributed engines that work on the
content they own and the metadata they understand. The only thing
required is the "semantic contract" between local searchers.

This can be done with RDF Schema and the concept of metadata inheritance
it puts forward.
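To make the "metadata inheritance" idea concrete, here is a toy sketch
(invented vocabulary and data, not real RDF Schema machinery): a query
for a general property also matches resources described with a more
specific sub-property, the way rdfs:subPropertyOf entailment works.

```python
# Toy model of RDF Schema's subPropertyOf: a child property
# inherits from (is subsumed by) its parent property.
# All property names and statements below are invented examples.

SUBPROPERTY_OF = {
    "dc:creator": "dc:contributor",   # every creator is a contributor
    "ex:leadAuthor": "dc:creator",    # every lead author is a creator
}

def generalizations(prop):
    """Yield prop and every ancestor property it inherits from."""
    while prop is not None:
        yield prop
        prop = SUBPROPERTY_OF.get(prop)

def matches(statement, wanted_prop, wanted_value):
    """True if the statement satisfies the query under inheritance."""
    subject, prop, value = statement
    return wanted_prop in generalizations(prop) and value == wanted_value

statements = [
    ("ex:cocoon-paper", "ex:leadAuthor", "Stefano"),
    ("ex:other-doc",    "dc:contributor", "Steven"),
]

# A query for dc:contributor = "Stefano" matches the ex:leadAuthor
# statement, because the sub-property chain leads up to dc:contributor.
hits = [s for (s, p, v) in statements
        if matches((s, p, v), "dc:contributor", "Stefano")]
print(hits)  # ['ex:cocoon-paper']
```

Two local searchers that agree on the parent vocabulary can answer each
other's queries even if their local metadata uses more specific terms;
that is the "semantic contract" in miniature.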

But there is a _long_ way to go:

- designing the protocol for scalability (the current Gnutella protocol
is not scalable)
- testing time-dependent behavior and geographical dependency of the
"meta-net" entry point (you could be too far away from the information
you want for the query time you specified)
- creating the logic language for metadata querying (W3C proposed
Metalog, any alternatives?)
- creating a way to send "compiled" metadata between clusters (to save
network usage and disk space on clusters)
- designing the meta-net to be safe and (possibly) protecting copyrights
with specific metadata.
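The basic shape of such a meta-net can be sketched in a few lines: a
query fans out from node to node with a hop limit (like Gnutella's TTL),
and each node contributes only matches from the content it owns. Node
names and documents below are invented for illustration; real metadata
queries would of course be richer than substring matching.

```python
# Minimal sketch of distributed search: each node answers over its
# own content and forwards the query to its peers, with a TTL so
# queries don't wander the network forever. All data is made up.

class Node:
    def __init__(self, name, documents):
        self.name = name
        self.documents = documents   # the content this site owns
        self.peers = []

    def search(self, term, ttl=3, seen=None):
        seen = seen if seen is not None else set()
        if self.name in seen or ttl == 0:
            return []
        seen.add(self.name)          # avoid answering twice on cycles
        hits = [(self.name, d) for d in self.documents if term in d]
        for peer in self.peers:
            hits.extend(peer.search(term, ttl - 1, seen))
        return hits

a = Node("cocoon.example", ["cocoon sitemap guide", "xslt notes"])
b = Node("xml.example", ["rdf schema primer"])
c = Node("search.example", ["rdf query howto"])
a.peers, b.peers = [b], [c]

print(a.search("rdf"))
# [('xml.example', 'rdf schema primer'), ('search.example', 'rdf query howto')]
```

The TTL and the `seen` set are exactly the kind of knobs the first two
bullets above are about: they trade query reach against network load,
and a bad choice is where the scalability problems start.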

I coined a name for such a project, which should implement Gnutella-like
capabilities as well as understand RDF, handle metadata-driven queries,
and pass them along to the other entities involved in such a network.


Two reasons:

1) assonance with "semantic"
2) Fox Mulder's sister (yes, people, I was watching the X-Files episode
when Mulder finally understands what happened to his sister)

This is one of the two crazy projects I was secretly working on (only at
a theoretical level for now). The other will remain a secret, at least
for now.

But I will leave them on my todo list for now, since we have to finish
Cocoon2 before all that.

> I still don't know where to find the local Cocoon search piece. Maybe DAML
> or IBM.

I don't follow you here.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<>                             Friedrich Nietzsche
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------
