From: Abey Mullassery
Date: Thu, 08 Jul 2004 09:09:36 +0530
To: Tag Libraries Developers List
Subject: PROPOSAL New Search Tag library

I) Motivation

With the huge amount of information now available, providing search is the best way to give people quick access to it.
A search facility will soon be a must-have feature of any website of average size, whether the site is database-backed or a set of HTML/PDF/XML documents. Hence there is a growing need for a search tag library.

II) Overview

The exact tag names and design still need to be worked out, but the basic usage scenarios are:

1. Index
   a. plain text (streamed/files)
   b. HTML/XML/PDF
   c. non-text content with metadata (images/Flash)
2. Search
   a. using metadata
   b. ranked results
   c. quick view/result snippets
3. Manage
   a. optimize
   b. crawl
   c. remove
   d. update

III) Requirements

The search could be based on existing libraries such as:

a. Lucene
b. JSearch (possible license issues?)

Lucene would need to be supplemented with parsers (text extractors) for HTML, PDF, MS Word, MS Excel, etc., and with a crawler.

IV) Commitment

I have just started developing a basic version for my own use. But if we find it worthwhile to add it to the taglibs (sandbox), I could "restart" by discussing the usage scenarios to make it generic, and base my development on that feedback. That way I won't have to do it twice.

Let me know your views/comments.

Abey Mullassery
http://www.mullassery.com