lucene-dev mailing list archives

From Otis Gospodnetic <>
Subject Re: Indexing a database
Date Tue, 10 Jun 2003 20:03:54 GMT
This is really a DB question.  I don't see how using a Lucene index
would help you.  Once you get the data out of that index, it will be
back in your RAM.

What you should do is figure out whether you can get the results back
from the DB in batches.  PostgreSQL and MySQL have extended the SQL
syntax to allow that (the LIMIT and OFFSET keywords in the Pg world,
maybe some other ones in MySQL, none that I know of in Oracle land).
You should really ask people who use those DBs.
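
For what it's worth, here is a rough sketch of the batching idea, assuming PostgreSQL-style LIMIT/OFFSET.  BatchedReader, PageFetcher, pageSql and forEachBatch are made-up names for illustration only; they are not part of Lucene or JDBC.  A real PageFetcher would run the SQL through JDBC, and the base query should carry an ORDER BY so that pages do not overlap or skip rows.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: none of these names come from Lucene or JDBC. */
class BatchedReader {

    /** Stand-in for "run this SQL and return the rows" (a real one would use JDBC). */
    interface PageFetcher {
        List<String> fetch(String sql);
    }

    /** Appends PostgreSQL-style paging clauses to a base query. */
    static String pageSql(String baseQuery, int limit, int offset) {
        return baseQuery + " LIMIT " + limit + " OFFSET " + offset;
    }

    /**
     * Fetches one page at a time and hands it to the handler, so at most
     * batchSize rows are held by this loop at once.
     * Returns the total number of rows seen.
     */
    static int forEachBatch(String baseQuery, int batchSize,
                            PageFetcher fetcher, Consumer<List<String>> handler) {
        int offset = 0;
        int total = 0;
        while (true) {
            List<String> rows = fetcher.fetch(pageSql(baseQuery, batchSize, offset));
            if (rows.isEmpty()) {
                break;                      // nothing left to read
            }
            handler.accept(rows);           // e.g. index these rows, then drop them
            total += rows.size();
            if (rows.size() < batchSize) {
                break;                      // short page means this was the last one
            }
            offset += batchSize;
        }
        return total;
    }
}
```

The handler is where each batch would be turned into documents and added to the index, after which the rows can be garbage collected before the next page is fetched.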

On a side note, somebody has already written a tool to build a Lucene
index out of DB tables.  Why reinvent the wheel?


--- Bryan LaPlante <> wrote:
> Hi,
> I need some input about what direction to take. I have written a package
> for indexing a database using a query or list of tables to be indexed. I
> wanted control over how each column in each row of the result gets indexed
> or not.
> The structure first and then the problem where advice is needed.
> Structure:
>     //create an instance of :
>     ds = DataStore(String driver,String uri,String pswd, String user)
>     // pass ds to:
>     dir = DSDirectory(DataStore ds,String query);
> Calling dir.list() now will produce the entire resultset, made up of
> DSFile() objects, each representing a row in the set, with a Hashtable of
> attributes stored internally in the dsfile object representing each column.
> The problem:
>     If you didn't guess before, this is quite memory intensive when you
> are talking about a sizable recordset. I need a way to either hold a
> reference to the records and let the user incrementally request the next n
> number of rows to index or I need to store the records in a temporary
> location (non-memory) where they can be retrieved on request.
> Possible solution:
>     I have one idea: to create a temp Lucene index representing the rows
> and columns, and then when the list method is called I could retrieve the
> data from the index, thereby avoiding the in-memory constraints.
> Thoughts?
> Bryan LaPlante
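
The first option in the quoted message (hold a reference to the records and hand out the next n rows on request) could look roughly like the sketch below.  RowCursor is a hypothetical name, not an existing class in the package described above, and the Iterator is assumed to be an adapter over whatever open cursor or result set supplies the rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical wrapper: hands out the next n rows from an open cursor on request. */
class RowCursor<T> {
    // Assumption: the iterator reads lazily from the underlying source,
    // so rows are only materialized when they are asked for.
    private final Iterator<T> source;

    RowCursor(Iterator<T> source) {
        this.source = source;
    }

    /** Returns up to n further rows; an empty list means the cursor is exhausted. */
    List<T> next(int n) {
        List<T> batch = new ArrayList<>(n);
        while (batch.size() < n && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}
```

The caller keeps calling next(n) and indexing each batch until an empty list comes back, so only n rows need to be held by the wrapper at any time.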
