From: Chris Spencer
To: java-user@lucene.apache.org
Cc: Otis Gospodnetic
Date: Thu, 7 Apr 2011 08:04:52 -0400
Subject: Re: Indexing Non-Textual Data
In-Reply-To: <962010.45169.qm@web130106.mail.mud.yahoo.com>

My question wasn't just about classification. I'm asking, is there a way
to classify non-textual data with Lucene?

Yes, I know how to Google, and I've read the mailing list logs. All of
those results only concern classifying simple text, not arbitrary numeric
features.

Regards,
Chris

On Thu, Apr 7, 2011 at 1:04 AM, Otis Gospodnetic wrote:

> Hi Chris,
>
> Yes, people have done classification with Lucene before. Have a look at
> http://search-lucene.com/?q=classifier&fc_project=Lucene for some
> discussions and actual code (in old JIRA issues)
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
> > From: Chris Spencer
> > To: java-user@lucene.apache.org
> > Sent: Wed, April 6, 2011 7:46:45 PM
> > Subject: Indexing Non-Textual Data
> >
> > Hi,
> >
> > I'm new to Lucene, so forgive me if this is a newbie question. I have a
> > dataset composed of several thousand lists of 128 integer features, each
> > list associated with a class label. Would it be possible to use Lucene
> > as a classifier, by indexing the label with respect to these integer
> > features, and then classify a new list by finding the most similar
> > labels with Lucene?
> >
> > I'm specifically interested in doing so through the PyLucene API, so
> > I've been going through the PyLucene samples, but they only seem to
> > involve indexing text, not continuous features (understandably). Could
> > anyone point me to an example that indexes non-textual data?
> >
> > I think the project Lire (http://www.semanticmetadata.net/lire/) is
> > using Lucene to do something similar to this, although with an emphasis
> > on image features. I've dug into their code a little, but I'm not a
> > strong Java programmer, so I'm not sure how they're pulling it off, nor
> > how I might translate this into the PyLucene API. In your opinion, is
> > this a practical use of Lucene?
> >
> > Regards,
> > Chris
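
For reference, the approach described in the original question (store
labelled lists of integer features, then label a new list by its most
similar stored lists) amounts to nearest-neighbour classification. Below is
a minimal sketch in plain Python. It does not use Lucene or PyLucene, and
the function names, the toy 4-feature data, the squared Euclidean distance,
and k=3 are all assumptions made for illustration, not anything taken from
the thread.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length integer lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, examples, k=3):
    """Return the majority label among the k examples closest to query.

    examples is a list of (features, label) pairs (hypothetical layout).
    """
    nearest = sorted(examples, key=lambda ex: squared_distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: 4 features per list instead of 128, purely for illustration.
examples = [
    ([0, 1, 2, 3], "a"),
    ([1, 1, 2, 2], "a"),
    ([9, 8, 7, 9], "b"),
    ([8, 9, 9, 7], "b"),
]

print(classify([1, 0, 2, 3], examples))
```

This brute-force version compares the query against every stored list,
which is usually workable for a few thousand lists; whether an index such
as Lucene would help is exactly the open question in the thread.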