lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <>
Subject Re: Indexing with foreign key
Date Sun, 31 Oct 2010 20:47:00 GMT
Hmmmm. Are you too memory limited to do a first pass through the file and
the key/download links part in a map, then make another pass through the
indexing the data and grabbing the link from your map? I'm assuming that
there's a lot less than 200M in just the key/link part.

Alternatively (and this would probably be kinda slow, but...) still do a 2
process, but instead of making a map, put the data in a Lucene index on
Then the second pass searches that index for the data to add to the docs
in your "real" index.


On Sun, Oct 31, 2010 at 12:17 PM, Paulo Levi <> wrote:

> I'm stepping tru a rdf file (the project gutenberg catalog) and sending
> data
> to a lucene index to allow searches of titles authors and such. However the
> gutenberg rdf is a little bit "special". It has two sections, one for
> title,
> authors, collaborators and such, and (after all the books) starts the other
> section that has the download links. The connection is a kind of foreign
> key
> that exists on both tags (a unique number id). While i don't need to search
> the download link, i do need to save it.
> I'm memory limited and can't put in memory the 200 mb file that the catalog
> is. I'm wondering if there is some way for me to use the number id to
> connect both kinds of information without having to keep things in memory.
> A
> first search for the things i want, and a second using the number id?
> It seems very clumsy. I'm not actually using a database, and i don't want
> to
> use very large libraries. Compass is 60 mbs (!). I tried lucenesail for a
> while, but it has stopped working and the code is a mess (it is not adapted
> to the filtering of the gutemberg rdf that i'm doing).

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message