lucy-dev mailing list archives

From David E. Wheeler <da...@kineticode.com>
Subject [lucy-dev] A Schema for PGXN
Date Fri, 18 Mar 2011 20:29:10 GMT
Howdy Lucites,

I'm starting work on an index for PGXN. Not heard of PGXN? Think of it as CPAN for PostgreSQL.

  http://pgxn.org/

Anyway, the things I want to index are:

* Distributions. Includes name, version, tags, abstract, description, user, and some other
stuff.

* Extensions (modules in CPAN-speak). Mainly documentation in HTML.

* Tags. Each contains a list of the distributions associated with that tag.

* Users. Includes name, email, URL, twitter nick, and a list of distributions.

* Documentation. Random docs associated with a distribution but not a specific extension.

By default, a user will be able to search all these things at once. So I was thinking that
I'd have just one schema/index, and use categories to separate the different objects. Given
that, I was thinking of a schema with:

Title:     Name of a distribution/extension/tag/user
Abstract:  For distributions and extensions
Content:   Description and random docs for distributions,
           documentation body for extensions, distribution names
           for users and tags
Tags:      Tags associated with a distribution
Metadata:  Additional metadata: email addresses, URLs, dates,
           and other stuff associated with a distribution.

So for those fields that don't apply to a thing, like "tags" for a tag object, I'd just provide
no value. Otherwise, I'd like to do a full-text search on all these fields.
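
To make that concrete, here's a rough sketch in Python of how each kind of object could map onto the one shared set of fields. This is not Lucy's API: the class, the helper functions, and the category names are all made up for illustration, and a naive substring match stands in for real full-text search.

```python
from dataclasses import dataclass

# Hypothetical category names; the real ones are still undecided.
CATEGORIES = {"distribution", "extension", "tag", "user", "doc"}


@dataclass
class IndexDoc:
    """One entry in the single shared index."""
    category: str        # separates the different kinds of objects
    title: str           # name of the distribution/extension/tag/user
    abstract: str = ""   # distributions and extensions only
    content: str = ""    # description, doc body, or distribution names
    tags: str = ""       # distributions only
    metadata: str = ""   # email addresses, URLs, dates, and so on

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def distribution_doc(name, abstract, description, tags, metadata):
    """Distributions fill in every field."""
    return IndexDoc("distribution", name, abstract, description,
                    " ".join(tags), metadata)


def tag_doc(name, dist_names):
    """Tags get a title and the names of their distributions as content.
    Fields that don't apply (abstract, tags, metadata) stay empty."""
    return IndexDoc("tag", name, content=" ".join(dist_names))


def matches(doc, term, category=None):
    """Naive stand-in for full-text search: case-insensitive substring
    match across all text fields, optionally limited to one category."""
    if category is not None and doc.category != category:
        return False
    haystack = " ".join(
        [doc.title, doc.abstract, doc.content, doc.tags, doc.metadata]
    ).lower()
    return term.lower() in haystack


# Placeholder data, not real PGXN entries.
docs = [
    distribution_doc("example_dist", "An example", "Longer description",
                     ["demo", "sample"], "someone@example.com"),
    tag_doc("demo", ["example_dist"]),
]
```

With this layout, an unrestricted search for a distribution name hits both the distribution itself and any tag that lists it, while passing a category narrows the results to one kind of object.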

So, does this seem like a reasonable search schema? I would appreciate any feedback and suggestions.

Thanks!

David

