cocoon-dev mailing list archives

From COFFMAN Steven <SCoff...@CBSINC.com>
Subject RE: Infrasearch
Date Thu, 01 Jun 2000 15:14:39 GMT
Samantha is a very cool name, for a very cool project idea.

> - designing such a meta-net to be safe and (possibly) protect copyrights with
> specific metadata.

Safe? I'm not sure what you mean, but I'd be most worried about a hacked
Cocoon node that always responds to queries positively, similar to the web
pages that put all sorts of non-pertinent words in HTML comments to increase
keyword search hits. That or a Cocoon node that is pertinent to the query,
but of low quality. In a distributed search, trust and value of nodes should
be the primary sorting attribute. The only successful scalable distributed
appraisal systems I've seen are similar to the slashdot moderation scheme,
utilizing the masses to evaluate the massive.
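To make "trust first, relevance second" concrete, here is a minimal sketch. It assumes each responding node carries Slashdot-style community votes (-1 or +1). The `Node`/`Response` types, the prior-weighted average and the `min_trust` cutoff are hypothetical choices for illustration. None of them come from InfraSearch or Cocoon.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A responding node plus the moderation votes users gave it (hypothetical model)."""
    name: str
    votes: list[int] = field(default_factory=list)  # each vote is -1 or +1

    def trust(self, prior: float = 0.0, prior_weight: int = 5) -> float:
        # Prior-weighted average: nodes with few votes stay near the prior,
        # so a fresh node cannot jump to the top on a single +1.
        return (prior * prior_weight + sum(self.votes)) / (prior_weight + len(self.votes))


@dataclass
class Response:
    node: Node
    relevance: float  # self-reported by the node; a hacked node can inflate it


def rank(responses, min_trust=-0.2):
    # Drop nodes the community has marked down, then sort by trust and use
    # the node's own relevance claim only as a tie-breaker.
    kept = [r for r in responses if r.node.trust() >= min_trust]
    return sorted(kept, key=lambda r: (r.node.trust(), r.relevance), reverse=True)
```

A node that answers every query with maximum relevance but has been voted down repeatedly is filtered out, whatever it claims about itself.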

My earlier confusing comment about IBM was because they've retired a local
metadata-based semantic search technology. They could come out of the blue
(sorry about the pun) with a less flawed version that could provide local
Cocoon node searching.

However, I disagree about Gnutella. The Gnutella protocol itself is *very*
scalable; the existing network of Gnutella clients is not, because of poor
implementations. The main problem is overaggressive pinging. [Clients should
listen to and cache PONG messages destined for other servers, and realize
that issuing their own PING is then redundant. Further, they should only
PING on startup, and again later only if a user manually initiates a single
PING.] Other things clients should do are prevent people from using
Gnutella to chat (using queries to chat or forwarding chat packets), ignore
pings with payloads (since they're bogus), and limit queries per time frame
(to stop query spamming). I think these issues will be resolved in short
order, since they're known and discussed. It's not in clients' own
self-interest to misbehave. Sorry for the digression.
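The client rules above can be sketched as a small filter. The sketch assumes the Gnutella 0.4 descriptor header layout: a 16-byte ID, a payload type byte, TTL, hops, and a 4-byte little-endian payload length. It also assumes that a Pong payload begins with the responder's port and IP. The `PeerPolicy` class, its parameters and the rate-limit defaults are hypothetical. Chat detection is left out because it needs content heuristics.

```python
import struct
import time
from collections import deque

# Gnutella 0.4 descriptor header (23 bytes): 16-byte descriptor ID,
# payload type, TTL, hops, then a 4-byte little-endian payload length.
HEADER = struct.Struct("<16sBBBI")
PING, PONG, QUERY = 0x00, 0x01, 0x80


class PeerPolicy:
    """Hypothetical client-side filter for the well-behaved-client rules."""

    def __init__(self, max_queries=10, window=60.0, clock=time.monotonic):
        self.pong_cache = {}        # first 6 payload bytes (port + IP) -> pong payload
        self.max_queries = max_queries
        self.window = window        # seconds
        self.clock = clock
        self.query_times = {}       # connection id -> deque of query timestamps
        self.started = False

    def should_send_ping(self, user_requested=False):
        if user_requested:          # a single manual PING is always allowed
            return True
        if not self.started:        # only PING automatically at startup...
            self.started = True
            return not self.pong_cache  # ...and not if cached PONGs already list hosts
        return False

    def accept(self, packet, conn):
        """Return True to process/forward the descriptor, False to drop it."""
        if len(packet) < HEADER.size:
            return False
        _desc_id, kind, _ttl, _hops, length = HEADER.unpack_from(packet)
        payload = packet[HEADER.size:HEADER.size + length]
        if kind == PING:
            return length == 0      # a PING with a payload is bogus
        if kind == PONG:
            if len(payload) >= 6:   # cache PONGs even when routed to other servers
                self.pong_cache[payload[:6]] = payload
            return True
        if kind == QUERY:
            return self._query_allowed(conn)
        return True

    def _query_allowed(self, conn):
        now = self.clock()
        stamps = self.query_times.setdefault(conn, deque())
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()        # forget queries older than the window
        if len(stamps) >= self.max_queries:
            return False            # query spamming: over the per-window budget
        stamps.append(now)
        return True
```

Passing the clock in as a parameter makes the rate limit easy to test without waiting. A real servent would also key the limit on the originating connection, as done here with `conn`.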

-Steve


-----Original Message-----
From: Stefano Mazzocchi [mailto:stefano@apache.org]
Sent: Thursday, June 01, 2000 7:50 AM
To: cocoon-dev@xml.apache.org
Subject: Re: Infrasearch


COFFMAN Steven wrote:
> 
> Hi,
>         I think I found a piece that will help with making Cocoon semantic
> search friendly as Stefano expounded upon the need for.
> 
>         The Gnutella folks have come up with InfraSearch, a way to use
> Gnutella as a search engine aggregator...sorta. Infrasearch passes a query
> to any of the participating clients, whether they are search engines or
> websites, and displays their responses. The mechanism by which all the
> responses are sorted is still pretty beta.

I've recently analyzed the Gnutella protocol and I find it _way_
unoptimized; I don't think it scales very well.

TCP/IP took decades to reach the impressive scalability it has. Don't
make the mistake of taking scalability for granted when going distributed!

> However, if cocoon had a local search that it could respond to the
> InfraSearch queries with, it would be instantly part of the Stefano
> Knowledge Web. 

Please, don't say that. I've just quoted TBL's ideas on this and
concentrated on "how to make it work" rather than "how to specify its
design".

> Then we could leave it to the Infrasearch folks to work out
> how to sort Cocoon site responses.

In my vision, each site collaborates in the search. In fact,
Infrasearch has something that no other search engine has: a maximum query
time.
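A maximum query time amounts to a deadline on a parallel fan-out: every site searches its own content, and only the answers that arrive in time count. Here is a minimal sketch, assuming each site is modelled as a hypothetical Python callable that takes the query string. This is not InfraSearch's actual mechanism.

```python
import concurrent.futures as cf
import time


def search(nodes, query, max_query_time):
    """Send `query` to every node in parallel and keep only the answers that
    arrive within `max_query_time` seconds (nodes are hypothetical callables)."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(node, query): name for name, node in nodes.items()}
    done, _late = cf.wait(futures, timeout=max_query_time)
    # Do not block on slow sites; late answers are simply ignored.
    pool.shutdown(wait=False, cancel_futures=True)
    results = {}
    for fut in done:
        if fut.exception() is None:  # a failing site is treated like a silent one
            results[futures[fut]] = fut.result()
    return results
```

The caller chooses the trade-off: a longer `max_query_time` reaches slower or more distant sites, and a shorter one returns faster with fewer answers. `cancel_futures` requires Python 3.9 or later.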

I believe there are _many_ ways to improve such a search engine and many
ways to _abuse_ it (pirated mp3 and all that).

Each Cocoon, in the end, should be a cell in this network.... well, I
say let's watch from the window and see what happens. It's way too early to
tell, but I do see problems in this way of finding web resources, especially
in a metadata-less web like today's.

The web won over other information systems because of its scalability.
Yahoo, Altavista and all that are centralized, they are the "old model".

A new model is about distributed small engines that work on the content
they own, on the metadata they understand. The only required thing is
the "semantic contract" between local searchers.

This can be done with RDF Schema and the concept of metadata inheritance it
puts forward.
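Metadata inheritance here refers to `rdfs:subClassOf`: a local searcher asked for a class should also return resources typed with any of its subclasses. That shared reading of the schema is what makes the "semantic contract" work. The sketch below uses plain triples instead of an RDF library. The `ex:` vocabulary and the resource paths are made up for illustration.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def ancestors(cls, parents):
    """All classes reachable from `cls` through rdfs:subClassOf (transitive)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:  # `seen` also protects against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def find(triples, wanted_class):
    """Resources whose rdf:type is `wanted_class` or one of its subclasses."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            parents[s].add(o)
    return sorted(
        s for s, p, o in triples
        if p == TYPE and (o == wanted_class or wanted_class in ancestors(o, parents))
    )


# Hypothetical local metadata for one site.
TRIPLES = [
    ("ex:Article", SUBCLASS, "ex:Document"),
    ("ex:Howto", SUBCLASS, "ex:Article"),
    ("/docs/install.xml", TYPE, "ex:Howto"),
    ("/news/1.xml", TYPE, "ex:Article"),
    ("/logo.png", TYPE, "ex:Image"),
]
```

With this data, `find(TRIPLES, "ex:Document")` returns both XML pages even though neither is typed `ex:Document` directly. Each site only needs to agree on the schema, not on the rest of each other's metadata.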

But there is a _long_ way to go:

- designing the protocol for scalability (the Gnutella protocol is not
enough)
- testing time-dependent behavior and geographical dependency of the
"meta-net" entry point (you could be too far away from the information
you want for the query time you specified)
- creating the logic language for metadata querying (W3C proposed
Metalog, any alternatives?)
- creating a way to send "compiled" metadata between clusters (to save
network usage and disk space on clusters)
- designing such a meta-net to be safe and (possibly) protect copyrights with
specific metadata.

I coined a name for such a project, which should implement Gnutella-like
capabilities as well as be able to understand RDF and handle
metadata-driven queries and pass them along to the other entities
involved in such a network.

"Samantha"

Two reasons:

1) assonance with "semantic"
2) Fox Mulder's sister (yes, people, I was watching the X-Files episode
when Mulder finally understands what happened to his sister)

This is one of the two crazy projects I was secretly working on (only at
a theoretical level for now). The other will remain a secret, at least for
now.

But I will leave them in my todo list for now since we have to finish
Cocoon2 before all that.

> I still don't know where to find the local Cocoon search piece. Maybe DAML
> or IBM.

I don't follow you here.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------

