lucene-java-user mailing list archives

From Yang <>
Subject use index, big or small?
Date Fri, 04 May 2012 23:47:49 GMT
I have an index containing all students. Now I want to do an index
search inside an Apache Hadoop mapper:

for each (record from mapper input reader) {
    output ="name:"+  + " OR " + " id:" + );

My question is whether I should shard the index (across terms, not
splitting the postings list of any single term) or simply replicate it.
The index for the entire dataset is not too big, so it can fit on
my local disk, and I can copy it to every node in the cluster and let
the copies sit there all the time, so no copy overhead is incurred.
The only argument in favor of sharding is that a smaller index might
be faster. But since an index search takes only O(log n) time, maybe the
time saving is very small.
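
To put a rough number on that, here is a minimal sketch. It assumes a term
lookup behaves like an idealized binary search over n sorted terms, which is
a simplification and not a description of how Lucene actually stores its
term dictionary. The class and method names are made up for illustration.
Because the sharding is across terms, each of k shards holds about n/k terms
and the postings lists keep their length, so only the lookup gets cheaper.

```java
// Hypothetical helper, not part of Lucene or Hadoop.
public class ShardLookupCost {

    // Worst-case comparisons for a binary search over n sorted entries:
    // floor(log2(n)) + 1, which equals the bit length of n for n >= 1.
    static int worstCaseProbes(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        return 64 - Long.numberOfLeadingZeros(n);
    }

    // Comparisons saved per lookup when n terms are split evenly over k shards.
    static int probesSaved(long n, long k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be >= 1");
        }
        long perShard = (n + k - 1) / k; // ceiling of n / k
        return worstCaseProbes(n) - worstCaseProbes(perShard);
    }

    public static void main(String[] args) {
        long terms = 1_000_000L; // assumed dictionary size, for illustration only
        long shards = 16L;       // assumed shard count, for illustration only
        System.out.println("full index probes: " + worstCaseProbes(terms));
        System.out.println("per-shard probes:  "
                + worstCaseProbes((terms + shards - 1) / shards));
        System.out.println("probes saved:      " + probesSaved(terms, shards));
    }
}
```

Under these assumptions, 1,000,000 terms need at most 20 comparisons, and a
16-way shard needs at most 16, so sharding saves about log2(k) comparisons
per lookup no matter how large n is. Real search time also depends on
postings length, caching, and disk access, which this sketch ignores.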

So will sharding be worth the effort?

