From: Ted Dunning
Date: Wed, 1 Jun 2011 16:27:37 -0700
Subject: Re: Exploring the potential of a Mahout classification system
To: user@mahout.apache.org

Probably.

I mean that in two senses. Mahout can probably be used to build an acceptable classifier. And that classifier will give you scored outputs that represent the classifier's best guess at the answer. I would be willing to bet (up to 25 cents) that the correct category will be among the top few. I would guess, but would not bet, that the top answer will be correct more than half the time. With good interface design this should be very usable.

The classifier should easily run fast enough to allow you to do a live categorization of topics as the person types. That might be bad, but it would be pretty trippy to watch.

Chapter 16 of Mahout in Action describes how to build a training/classification/web service pipeline for exactly this sort of thing. The source code is available freely under the Apache license to purchasers of the book.

(conflict of interest here ... I may someday get royalties from that book)

On Wed, Jun 1, 2011 at 3:11 PM, Baker, Tristan wrote:

> Can Mahout's classification system be used to classify a customer-authored
> problem statement in real time? I am imagining a system that would
> periodically train a classifier in an offline fashion and then leverage that
> classification index to provide real-time classifications of customer
> problem statements as they are received. Is this possible?
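
For readers who want to see the shape of the pattern discussed in this thread (train offline on a schedule, then return a ranked list of scored categories per request), here is a minimal sketch in plain Python. It does not use Mahout or any of its APIs. A small multinomial naive Bayes model stands in for whatever classifier is actually chosen, and the function names, category labels, and training sentences are all hypothetical, invented only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would do far more."""
    return text.lower().split()


def train(examples):
    """Offline step: build a naive Bayes model from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}


def classify(model, text, top_k=3):
    """Online step: return up to top_k (label, probability) pairs, best first."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    tokens = [t for t in tokenize(text) if t in model["vocab"]]  # skip unseen words
    log_scores = {}
    for label, n_docs in model["labels"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in tokens:
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        log_scores[label] = score
    # Normalize log scores into probabilities that sum to 1.
    peak = max(log_scores.values())
    exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(exp.values())
    ranked = sorted(
        ((label, value / norm) for label, value in exp.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical training data, invented for illustration only.
TRAINING = [
    ("cannot log in password rejected", "account"),
    ("forgot my password reset link", "account"),
    ("charged twice on my credit card", "billing"),
    ("refund for duplicate invoice charge", "billing"),
    ("app crashes when I open settings", "bug"),
    ("screen freezes and app crashes", "bug"),
]

model = train(TRAINING)  # would run periodically, offline
print(classify(model, "my password is rejected"))  # would run per request
```

Returning the top few scored categories rather than a single label matches the point made in the reply: the interface can show several candidates and let the person pick, which tolerates a classifier whose first guess is sometimes wrong.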