lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: Numerical ids for terms?
Date Tue, 12 Apr 2011 11:21:00 GMT
On Tue, Apr 12, 2011 at 13:41, Gregor Heinrich <gregor@arbylon.net> wrote:
> Hi -- has there been any effort to create a numerical representation of
> Lucene indices? That is, to use the Lucene Directory backend as a large
> term-document matrix at index level. As this would require a bijective mapping
> between terms (per-field, as customary in Lucene) and a numerical index
> (integer, monotonic from 0 to numTerms()-1), I guess this requires some
> special modifications to the Lucene core.
A Lucene index already provides a term <-> id mapping in some form.
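To make the requested mapping concrete, here is a minimal Python sketch (not Lucene code, and not Lucene's own mechanism). It sorts each field's distinct terms, the way a term dictionary keeps terms in sorted order, and numbers them 0..numTerms-1 by position. The result is a per-field bijection that can index the rows of a sparse term-document matrix. The input format and the helper names `build_term_ids` and `term_doc_matrix` are hypothetical. The sketch also ignores Lucene's per-segment index layout.

```python
from collections import Counter


def build_term_ids(docs):
    """Assign dense ids 0..numTerms-1 to terms, separately for each field.

    docs: list of dicts mapping field name -> list of tokens (hypothetical format).
    Returns (term_to_id, id_to_term), both keyed by field name.
    """
    terms_by_field = {}
    for doc in docs:
        for field, tokens in doc.items():
            terms_by_field.setdefault(field, set()).update(tokens)

    term_to_id, id_to_term = {}, {}
    for field, terms in terms_by_field.items():
        ordered = sorted(terms)  # id = position in sorted order
        id_to_term[field] = ordered  # id -> term
        term_to_id[field] = {t: i for i, t in enumerate(ordered)}  # term -> id
    return term_to_id, id_to_term


def term_doc_matrix(docs, field, term_to_id):
    """Sparse term-document counts for one field: {(term_id, doc_id): count}."""
    ids = term_to_id[field]
    matrix = {}
    for doc_id, doc in enumerate(docs):
        for term, count in Counter(doc.get(field, [])).items():
            matrix[(ids[term], doc_id)] = count
    return matrix


docs = [
    {"body": ["lucene", "index", "term"]},
    {"body": ["term", "matrix", "term"]},
]
term_to_id, id_to_term = build_term_ids(docs)
print(id_to_term["body"])  # ['index', 'lucene', 'matrix', 'term']
print(term_doc_matrix(docs, "body", term_to_id))
```

Sorted order is used because it makes the ids reproducible from the term set alone. The catch is that adding a single new term renumbers every term that sorts after it.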

> Another interesting feature would be to use Lucene's Directory backend for
> storage of large dense matrices, for instance for data-mining tasks from
> within Lucene.
Lucene's Directory is a dumb abstraction for random-access named
write-once byte streams.
It doesn't add /any/ value over mmap.
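The mmap point can be illustrated with a minimal Python stdlib sketch that involves no Lucene at all. A dense float64 matrix is written once, row-major, to a plain file. Any cell is then read by random access through `mmap`. The file name and the helpers `write_matrix` and `read_cell` are hypothetical, chosen only for this illustration.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.calcsize("<d")  # bytes per little-endian float64 cell (8)


def write_matrix(path, rows):
    """Write a dense matrix once, row-major, as little-endian float64 values."""
    n_cols = len(rows[0])
    with open(path, "wb") as f:
        for row in rows:
            if len(row) != n_cols:
                raise ValueError("ragged matrix")
            f.write(struct.pack(f"<{n_cols}d", *row))


def read_cell(mm, n_cols, i, j):
    """Random access to cell (i, j) of a memory-mapped row-major matrix."""
    offset = (i * n_cols + j) * ITEM
    return struct.unpack_from("<d", mm, offset)[0]


path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
write_matrix(path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        print(read_cell(mm, 3, 1, 2))  # 6.0
```

The OS page cache does the buffering here. That is the same service a memory-mapped Lucene Directory would ultimately rely on.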

> Any suggestions?
*troll mode on* Use numpy/scipy? :)

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: earwin@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785


