From: Ted Dunning
Date: Wed, 4 May 2011 11:40:17 -0700
Subject: Re: LDA from Lucene Indexes
To: user@mahout.apache.org

Good point.

On Wed, May 4, 2011 at 11:31 AM, Jake Mannix wrote:

> On Wed, May 4, 2011 at 10:46 AM, Ted Dunning wrote:
>
> > Pipelining is good for abstraction and really bad for performance (in the
> > map-reduce world).
> >
> > My thought is that we could have a multipurpose tool. Input would be a
> > lucene index and the program would read term vectors or original text as
> > available. Output would be either sequence file full of text or sequence
> > file full of vectors.
> >
>
> Ok, sure, then this is modifying the lucene.vectors code, not the
> seq2sparse code, right?
>
> -jake