lucene-dev mailing list archives

From "Upayavira" ...@odoko.co.uk>
Subject Re: Distributed Indexing
Date Tue, 01 Feb 2011 20:38:56 GMT

On Tue, 01 Feb 2011 19:04 +0000, "Alex Cowell"
<alxcwll@gmail.com> wrote:

  I noticed there is a comment in the
  org.apache.solr.servlet.DirectSolrConnection class which
  reads, "//Find a way to turn List<ContentStream> into
  File/SolrDocument". Did anyone find a way to do this?

  Turns out that comment was left over from some experimenting
  one of our team was doing. But I suppose the question still
  stands.

  Addressing the "retrieve the unique ID from the document"
  issue, does it matter if the unique ID you do the hash on is
  the actual uniqueKey of the document? Surely as long as you
  generate some value unique for each document to index (for
  example, the name of the doc/stream + the current time) it
  would still distribute the documents as we expect?


Well, one requirement I've heard for this is for it to be
deterministic. That is, a document will always go to the same
shard, and you can work out at any point in time where a
particular document is.

Once you've parsed the document to a SolrInputDocument, surely
you can get the ID/uniqueKey out? I'll do some digging tomorrow
AM.
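
To illustrate what I mean by deterministic, here is a minimal sketch. It assumes the uniqueKey is available as a String and that the number of shards is fixed. ShardRouter and shardFor are hypothetical names for illustration, not anything that exists in Solr. Hashing the uniqueKey always gives the same shard for the same document, whereas a key built from the stream name plus the current time would give a different answer on every indexing run.

```java
// Hypothetical helper, not part of Solr: maps a document's uniqueKey to a shard index.
final class ShardRouter {
    private ShardRouter() {}

    // Assumption: numShards is fixed; changing it remaps most documents.
    static int shardFor(String uniqueKey, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // String.hashCode() is specified by the Java language, so the result is
        // stable across JVMs and restarts. floorMod keeps the result in
        // [0, numShards) even when the hash is negative.
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 4;

        // Same uniqueKey, same shard, every time.
        int first = shardFor("doc-42", numShards);
        int second = shardFor("doc-42", numShards);
        System.out.println("deterministic: " + (first == second));

        // A key built from name + current time changes on every run, so the
        // shard cannot be worked out later from the document alone.
        String timeBasedKey = "report.pdf" + System.nanoTime();
        System.out.println("time-based key shard: " + shardFor(timeBasedKey, numShards));
    }
}
```

With this kind of scheme, an update or delete for a given uniqueKey can be sent straight to the shard that holds the document, which is the property a name-plus-timestamp key would lose.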

Upayavira
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source

