From: Earwin Burrfoot
To: dev@lucene.apache.org
Date: Tue, 12 Apr 2011 15:21:00 +0400
Subject: Re: Numerical ids for terms?
In-Reply-To: <4DA41E2F.1020408@arbylon.net>

On Tue, Apr 12, 2011 at 13:41, Gregor Heinrich wrote:
> Hi -- has there been any effort to create a numerical representation of
> Lucene indices? That is, to use the Lucene Directory backend as a large
> term-document matrix at index level. As this would require a bijective mapping
> between terms (per-field, as customary in Lucene) and a numerical index
> (integer, monotonic from 0 to numTerms()-1), I guess this requires
> some special modifications to the Lucene core.

A Lucene index already provides a term <-> id mapping in some form.

> Another interesting feature would be to use Lucene's Directory backend for
> storage of large dense matrices, for instance for data-mining tasks from
> within Lucene.

Lucene's Directory is a dumb abstraction for random-access named
write-once byte streams.
It doesn't add /any/ value over mmap.

> Any suggestions?

*troll mode on*
Use numpy/scipy?
:)

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: earwin@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785
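
For illustration only, here is a minimal sketch of the kind of term <-> id mapping discussed in this thread, written in plain Python rather than against Lucene. It assumes that the terms of a field are held in sorted order, so that a term's position can serve as its integer id in the range 0 to numTerms()-1. The function names (build_term_index, term_to_id, id_to_term, term_document_matrix) and the sample documents are hypothetical and are not part of any Lucene API.

```python
from bisect import bisect_left


def build_term_index(docs):
    """Return the sorted list of unique terms; a term's position is its id."""
    return sorted({term for doc in docs for term in doc.split()})


def term_to_id(terms, term):
    """Look up a term's id by binary search over the sorted term list."""
    i = bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return i
    raise KeyError(term)


def id_to_term(terms, term_id):
    """Inverse mapping: the term stored at position term_id."""
    if not 0 <= term_id < len(terms):
        raise IndexError(term_id)
    return terms[term_id]


def term_document_matrix(docs, terms):
    """Sparse term-document matrix as {(term_id, doc_id): term count}."""
    matrix = {}
    for doc_id, doc in enumerate(docs):
        for term in doc.split():
            key = (term_to_id(terms, term), doc_id)
            matrix[key] = matrix.get(key, 0) + 1
    return matrix


# Hypothetical sample data, whitespace-tokenized for simplicity.
docs = ["lucene index terms", "terms map to ids", "index ids"]
terms = build_term_index(docs)
matrix = term_document_matrix(docs, terms)
```

The ids here are only stable as long as the term set does not change: adding a term that sorts before existing ones shifts their ids, so a matrix built this way is tied to one snapshot of the term list.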