lucene-solr-user mailing list archives

From "Yonik Seeley" <ysee...@gmail.com>
Subject Re: Index Builder
Date Tue, 07 Mar 2006 16:19:13 GMT
On 3/5/06, Grant Ingersoll <grant_ingersoll@yahoo.com> wrote:
> So, I was thinking I could write a driver program that takes in my files and then calls
> the API directly.  Is this doable?

It's doable...
While it will be more efficient, it's not clear how much you will
gain, especially if you run with multiple CPUs (index writing is highly
synchronized, so extra threads mostly end up waiting on the writer).
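Here is a minimal sketch of why extra CPUs may not buy much. It uses plain Java with no Solr or Lucene classes. `ParallelPrepDemo` and its `DocSink` are hypothetical: `DocSink.add` is synchronized to stand in for a highly synchronized index writer. Worker threads can build documents in parallel, but every write still goes through one lock, so only the preparation step scales with the thread count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelPrepDemo {
    /** Hypothetical stand-in for an index writer: every add takes the same lock. */
    static final class DocSink {
        private final List<String> docs = new ArrayList<>();

        synchronized void add(String doc) {
            docs.add(doc); // serialized: only one thread can be in here at a time
        }

        synchronized int size() {
            return docs.size();
        }
    }

    /** Builds docCount documents on a pool of `threads` workers; all writes share one sink. */
    static int index(int docCount, int threads) {
        DocSink sink = new DocSink();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < docCount; i++) {
            final int id = i;
            pool.execute(() -> {
                // Parallel part: building the document (stands in for parsing / field extraction)
                String doc = "<doc><field name=\"id\">" + id + "</field></doc>";
                // Serialized part: the write itself contends for the sink's lock
                sink.add(doc);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sink.size();
    }
}
```

Adding threads here speeds up only the document-building lambda. The `sink.add` calls still run one at a time.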

Check out the UpdateHandler abstract class:
  public abstract int addDoc(AddUpdateCommand cmd) throws IOException;
  public abstract void delete(DeleteUpdateCommand cmd) throws IOException;
  public abstract void deleteByQuery(DeleteUpdateCommand cmd) throws IOException;
  public abstract void commit(CommitUpdateCommand cmd) throws IOException;
  public abstract void close() throws IOException;
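A driver built on these methods might look like the sketch below. It assumes a hypothetical, simplified `DocUpdater` interface that mirrors `addDoc` and `commit` from the list above. The interface takes a plain field map instead of the real `AddUpdateCommand`/`CommitUpdateCommand` objects, whose fields aren't shown here. The sketch shows the bulk-loading pattern: add every document, then commit once at the end, since each commit is relatively expensive. `RecordingUpdater` is an in-memory fake for trying the loop without a running Solr.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class BulkLoadDriver {
    /**
     * Hypothetical, simplified mirror of UpdateHandler.addDoc/commit.
     * The real methods take AddUpdateCommand / CommitUpdateCommand objects instead.
     */
    interface DocUpdater {
        /** Returns the number of documents added (an assumption made for this sketch). */
        int addDoc(Map<String, String> doc) throws IOException;

        void commit() throws IOException;
    }

    /** In-memory fake so the driver loop can be exercised without a running Solr. */
    static final class RecordingUpdater implements DocUpdater {
        final List<Map<String, String>> added = new ArrayList<>();
        int commits = 0;

        @Override
        public int addDoc(Map<String, String> doc) {
            added.add(doc);
            return 1;
        }

        @Override
        public void commit() {
            commits++;
        }
    }

    /** Adds every document, then commits once at the end instead of after each add. */
    static int load(Iterable<Map<String, String>> docs, DocUpdater updater) throws IOException {
        int count = 0;
        for (Map<String, String> doc : docs) {
            count += updater.addDoc(doc);
        }
        updater.commit();
        // No close(): a shared UpdateHandler belongs to the core, so closing it is left to its owner.
        return count;
    }
}
```

The driver deliberately never calls `close()`. If it borrows the core's current UpdateHandler, shutting that down is the core's job, not the bulk loader's.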

While the implementation of the UpdateHandler is pluggable, there
isn't a place to plug in different client handlers (like there is with
RequestHandler).  You could create another servlet in the same webapp
and get the current UpdateHandler (SolrCore.updateHandler) and use
that to update the index.

Seems like there isn't a getter for SolrCore.updateHandler... feel
free to submit a patch if you want to go this route.

You could even drop down to a lower level and use DocumentBuilder to
create your own Lucene Document instances and write them with an
IndexWriter yourself.

-Yonik


> Do you do it all through HTTP requests or through a driver that calls the API?
> I think I would prefer the API calls for bulk loading.  Where should I look for these?
>
> -Grant
>
> Yonik Seeley <yseeley@gmail.com> wrote: On 3/5/06, Grant Ingersoll  wrote:
> > What/where is the Index Builder that is referred to in http://wiki.apache.org/solr/CollectionBuilding?
>
> It's currently client-supplied (i.e. there isn't one).
>
> Having all Solr users have to write their own builders (code that gets
> data from a source and posts XML documents) certainly isn't optimal.
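A client-side builder of this kind can be quite small. The sketch below, in plain Java, turns each record (a field-name to value map) into Solr's XML `<add><doc><field name=...>` update message, escaping special characters. It then POSTs the message with `HttpURLConnection`. `XmlIndexBuilder` and its methods are hypothetical names. The update URL (for example `http://localhost:8983/solr/update`) is an assumption that depends on how Solr is deployed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

final class XmlIndexBuilder {
    /** Escapes the five XML special characters so field values can't break the message. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds one <add> message holding every record; each map entry becomes a <field>. */
    static String toAddXml(List<Map<String, String>> records) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> rec : records) {
            sb.append("<doc>");
            for (Map.Entry<String, String> f : rec.entrySet()) {
                sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
                  .append(escape(f.getValue())).append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    /** POSTs an XML message to the (assumed) update URL and returns the HTTP status code. */
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

A builder would typically call `post(url, toAddXml(batch))` for each batch of records and send a single `<commit/>` message at the end. Batching keeps the number of HTTP round trips down without building one enormous request.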
>
> It would be nice if we could give Solr a database URL with some SQL,
> and have it automatically slurp and index the records.  It would also
> be nice to be able to grab documents from a CSV or other simple
> structured text file and index them.
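As a rough illustration of the CSV idea, the sketch below reads a header line plus data lines into field-name to value maps. A builder could then serialize those maps into `<add>` messages. `CsvRecords` is a hypothetical helper. It handles commas, double-quoted fields, and `""` as an escaped quote, but not quoted fields that span multiple lines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CsvRecords {
    /** Splits one CSV line; supports "quoted, fields" and "" as an escaped quote (no embedded newlines). */
    static List<String> splitLine(String line) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');
                    i++; // skip the second quote of an escaped pair
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                out.add(cur.toString()); // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());
        return out;
    }

    /** Uses the first line as field names and turns each following line into a name -> value map. */
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> records = new ArrayList<>();
        if (lines.isEmpty()) return records;
        List<String> header = splitLine(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) continue; // skip blank lines
            List<String> values = splitLine(line);
            Map<String, String> rec = new LinkedHashMap<>(); // keeps header column order
            for (int i = 0; i < header.size() && i < values.size(); i++) {
                rec.put(header.get(i), values.get(i));
            }
            records.add(rec);
        }
        return records;
    }
}
```

A database source would follow the same shape: one map per row, with column names as field names. Only the reading step would change.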
>
> These ideas are already on the task list on the (currently down) Wiki.
>
> -Yonik
