Subject: Re: Text classification with Solr
From: Shalin Shekhar Mangar
To: solr-user@lucene.apache.org
Date: Mon, 26 Jan 2009 23:14:48 +0530

On Mon, Jan 26, 2009 at 10:59 PM, Neal Richter wrote:

> Hey all,
>
> I'm in the process of implementing a system to do 'text
> classification' with Solr. The basic idea is to take an
> ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index
> it and then classify documents into the taxonomy by pushing parsed
> documents into the Solr search API. Why? Lucene/Solr's ability to do
> weighted term boosting at both search and index time has lots of
> obvious uses here.
>
> Has anyone worked on this or a similar project yet? I've seen some
> talk on the list about this area but it's pretty thin (e.g. the December
> thread "Taxonomy Support on Solr"). I'm assuming Grant Ingersoll is
> looking at similar things with his 'taming text' project.
>
> I store the 'documents' in another repository and they are far too
> dynamic (write intensive) for direct indexing in Solr... so the
> previously suggested procedure of 1) store document 2) execute
> more-like-this and 3) delete document would be too slow.
>
> If people are interested I could start a JIRA issue on this (I do not
> see anything there at the moment).
>
> Thanks - Neal Richter
> http://aicoder.blogspot.com
>

Grant did some work at https://issues.apache.org/jira/browse/SOLR-769

Take a look and see if that helps.

--
Regards,
Shalin Shekhar Mangar.
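For readers following the thread, here is a minimal sketch of the query side of the approach Neal describes. Each taxonomy node is indexed as a Solr document, and a parsed document is classified by turning its terms into a boosted query against those nodes. It assumes a schema with a stored `label` field and a tokenized `tags` field (mirroring the `{label, tags}` example above), a hypothetical endpoint `http://localhost:8983/solr/select`, and Solr's default OR operator. The stopword list and the use of raw term frequency as the boost are illustrative choices. This is not the SOLR-769 implementation. The sketch only builds the request URL and does not send it.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and path to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Tiny illustrative stopword list; a real setup would mirror the analyzer
# configured for the "tags" field in schema.xml.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def term_weights(text, max_terms=20):
    """Tokenize text and return its most frequent non-stopword terms with counts."""
    # Keeping only [a-z0-9] runs also avoids Lucene query-syntax special characters.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(max_terms)


def build_classification_query(text, field="tags", max_terms=20):
    """Turn a document into a boosted Lucene query against the taxonomy field.

    Each term is boosted by how often it appears in the document, so terms the
    document repeats pull harder toward taxonomy nodes tagged with them.
    Returns an empty string when no usable terms remain.
    """
    weighted = term_weights(text, max_terms)
    clauses = " ".join(f"{term}^{count}" for term, count in weighted)
    return f"{field}:({clauses})" if clauses else ""


def build_select_url(text, rows=5):
    """Build a /select URL asking for the top-scoring taxonomy labels."""
    params = {
        "q": build_classification_query(text),
        "fl": "label,score",  # return the node label and its relevance score
        "rows": rows,         # number of candidate classes to return
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    doc = "Solr search tuning: Solr boosts, search relevance and ranking."
    print(build_classification_query(doc))
    print(build_select_url(doc, rows=3))
```

Fetching that URL with any HTTP client would return taxonomy nodes ranked by Solr's relevance score, and the top `label` values serve as candidate classes. Because the document is only used as a query and never indexed, this fits Neal's constraint that the documents are too write-intensive to store in Solr. Raw frequency is the simplest possible boost. Normalizing it by document length would be a natural refinement.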