lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5944) Support updates of numeric DocValues
Date Fri, 27 May 2016 02:37:13 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303367#comment-15303367
] 

Hoss Man commented on SOLR-5944:
--------------------------------


Ok ... more in depth comments reviewing the latest patch (ignoring some of the general higher
level stuff i've previously commented on).

(So far i've still focused on reviewing the tests, because we should make sure they're rock
solid before any discussion of refacoting/improving/changing the code)

----

* in general, all these tests seem to depend on autoCommit being disabled, and use a config
that is setup that way, but don't actaully assert that it's true in case someone changes the
configs in the future
** TestInPlaceUpdate can get direct access to the SolrCore verify that for certain to
** the distrib tests might be able to use one of hte new cnfig APIs to check this (i don't
know off the top of my head)
*** at a minimum define a String constant for the config file name in TestInPlaceUpdate and
refer to it in the other tests where the same config is expected with a comment explaining
that we're *assuming* it has autoCommit disabled and that TestInPlaceUpdate will fail if it
does not.

* TestInPlaceUpdate
** SuppressCodecs should be removed
** should at least have class level javadocs explaining what's being tested
** testUpdatingDocValues
*** for addAndGetVersion calls where we don't care about the returned version, don't bother
assigning it to a variable (distracting)
*** for addAndGetVersion calls where we do care about the returned version, we need check
it for every update to that doc...
**** currently version1 is compared to newVersion1 to assert that an update incrememnted the
version, but in between those 2 updates are 4 other places where that document was updated
-- we have to assert it has the expected value (either the same as before, or new - and if
new record it) after all of those addAndGetVersion calls, or we can't be sure where/why/how
a bug exists if that existing comparison fails.
**** ideally we should be asserting the version of every doc when we query it right along
side the assertion for it's updated "ratings" value
*** most of the use of "field(ratings)" can probbaly just be replaced with "ratings" now that
DV are returnable -- although it's nice to have both included in the test at least once to
demo that both work, but when doing that there should be a comment making it clear why
** testOnlyPartialUpdatesBetweenCommits
*** ditto comment about checking return value from addAndGetVersion
*** this also seems like a good place to to test if doing a redundent atomic update (either
via set to the same value or via inc=0) returns a new version or not -- should it?
** DocInfo should be a private static class and have some javadocs
** based on how testing has gone so far, and the discover of LUCENE-7301 it seems clear that
adding even single thread, single node, randomized testing of lots of diff types of add/update
calls would be good to add
*** we could refactor/improve the "checkReplay" function I added in the last patch to do more
testing of a randomly generated Iterable of "commands" (commit, doc, doc+atomic mutation,
etc...)
*** and of course: improve checkReplay to verify RTG against hte uncommited model as well
*** testReplayFromFile and getSDdoc should probably go away once we have more structured tests
for doing this
** createMap can be elimianted -- callers can just use SolrTestCaseJ4.map(...)
** In general the tests in this class should include more queries / sorting against the updated
docvalues field after commits to ensure that the updated value is searchable & sortable
** Likewise the test methods in this class should probably have a lot more RTG checks -- with
filter queries that constrain against the updated docvalues field, and checks of the expected
version field -- to ensure that is all working properly.

* InPlaceUpdateDistribTest
** SuppressCodecs should be removed
** should at least have class level javadocs explaining what's being tested
** Once LUCENE-7301 is fixed and we can demonstate that this passes reliably all of the time,
we should ideally refactor this to subclass SolrCloudTestCase
** in general, the "pick a random client" logic should be refactored so that sometimes it
randomly picks a CloudSolrClient
** there should almost certianly be some "delete all docs and optimize" cleanup in between
all of these tests
*** easy to do in an @Before method if we refactor to subclass SolrCloudTestCase
** docValuesUpdateTest
*** should randomize numdocs
*** we need to find away to eliminate the hardcoded "Thread.sleep(500);" calls...
**** if initially no docs have a rating value, then make the (first) test query be for {{rating:\[\*
TO \*\]}} and execute it in a rety loop until the numFound matches numDocs.
**** likewise if we ensure all ratings have a value such that abs(ratings) < X, then the
second update can use an increment such that abs(inc) > X\*3 and we can use {{-ratings:\[-X
TO X\]}} as the query in a retry loop
** ensureRtgWorksWithPartialUpdatesTest
*** even if we're only going to test one doc, we should ensure there are a random num docs
in the index (some before the doc we're editing, and some after)
*** if we're testing RTG, then we should be testing the version returned from every /get call
against the last version returned from every update
** outOfOrderUpdatesIndividualReplicaTest
*** ditto comments about only one doc
*** ditto comments about testing the expected version in RTG requests
*** if we're sending updates direct to replicas to test how they handle out of order updates,
then something better assert exactly where the leader is hosted and ensure we don't send to
it by mistake
*** what's the point of using a threadpool and SendUpdateToReplicaTask here?  why not just
send the updates in a (randdomly assigned) determinisitc order?
**** if we are going to use an ExecutorService, then the result of awaitTermination has to
be checked
**** ... and shutdown & awaitTermination have to be called in that order
*** since this tests puts replicas out of sync, a "delete all docs and optimize" followed
up "wait for recovers" should happen at the end of this test (or just in between every test)
.. especially if we refactor it (or to protect someone in the future who might refactor it)
** delayedReorderingFetchesMissingUpdateFromLeaderTest
*** ditto previous comments about using a threadpool and SendUpdateToReplicaTask
**** even more so considering the "Thread.sleep(100);" ... what's the pont of using a threadpool
if we want the requests to be sequential?
*** Is there no way we can programatically tell if LIR has kicked in? ... pehaps by setting
a ZK watch? ... this "Thread.sleep(500);" is no garuntee and seens arbitrary.
**** at a minimum polling in a loop for the expected results seems better then just a hardcoded
sleep
** SendUpdateToReplicaTask
*** based on how it's used, i'm not really sure i see the point in this class, but assuming
it continues to exist...
*** constructor takes in a Random, but method uses that global {{random()}} anyway.
**** should probably take in a seed, and construct it's own Random
**** {{random()}} ensures that each Thread gets it's own consistent Random instance -- but
in Callables like this each Thread having a consistent seed doesn't help the reproducibility
since there's no garutee which Threed from an ThreadPool (Executor) will invoke call().
*** instead of returning true, this should be a Callable<UpdateResponse> and call()
should return the results of the request so the caller can assert it was successful (via Future.get().getStatus())
** getReplicaValue
*** using SolrTestCaseJ4.params(...) would make this method a lot shorter
*** based on where/how this method is used, i don't understand why it returns String instead
of just Object
** assertReplicaValue
*** should take in some sort of assertion message and/or build/append an assertion message
using the clientId
*** if getReplicaValue returns an Object, this can take an "Object expected" param and eliminate
abonch of toString & string concating throughout the test
** simulatedUpdateRequest
*** if this method is going to assume that the only shard you ever want to similulate an update
to is SHARD1 then the method name should be "simulatedUpdateRequestToShard1Replica"
*** better still - why not ask the DocRouter which shard this doc belongs in, and fetch the
leader URL tha way?
** most usage of "addFields" can just be replaced with a call to "sdoc(...)" to simplify code
** replace createMap usage with SolrTestCaseJ4.map
** why override tearDown if we're just calling super?
** in general, i think this test would be a lot easier to read if there were well named variables
for HttpSolrClient instances pointed at specific replicas (ie HttpSolrClient SHARD1_LEADER
= ...; HttpSolrClient SHARD1_REPLICA1 = ...; etc...) and passed those around to the various
methods instead of magic ints (ie: "1", "2") to refer to which index in the static clients
list should be used for a given update.

* TestStressInPlaceUpdates
** ditto comments from InPlaceUpdateDistribTest about regarding @SupressCodecs, javadocs,
and extending SolrCloudTestCase once LUCENE-7301 is fixed and we're sure this test passes
reliably.
*** also: we should really make this test use multiple shards
** stressTest
*** it would be a lot cleaner/clearer if we refactored these anonymous Thread classes into
private static final (inner) classes and instantiated them like normal objects
**** makes it a lot easier to see what threads access/share what state
**** better still would be implementing these "workers" as Callable instances and using an
ExecutorService
*** "operations" comment is bogus (it's not just for queries)
*** maxConcurrentCommits WTF?
**** has a comment that it should be less the # warming threads, but does that even make sense
i na distrib test?
**** currently set to nWriteThreads -- so what's the point, when/how could we ever possibly
have more concurrent commits then the number of threads? doesn't that just mean that at the
moment every write thread is allowed to commit if it wants to?
**** if there is a reason for it, then replaceing "numCommitting" with a {{new Semaphore(maxConcurrentCommits)}}
would probably make more sense
*** why is the "hardCommit start" logic calling both {{commit();}} and {{clients.get(chosenClientIndex).commit();}}
?
*** I'm not convinced the "{{synchronize \{...\}; commit stuff; syncrhonize \{ ... \};}}"
sequence is actually thread safe...
**** T-W1: commit sync block 1: newCommittedModel = copy(model), version = snapshotCount++;
**** T-W2: updates a doc and adds it to model
**** T-W1: commit
**** T-W1: commit sync block 2: committedModel = newCommittedModel
**** T-R3: read sync block: get info from committedModel
**** T-R3: query for doc
**** ...
*** ... in the above sequence, query results seen by thread T-R3 won't match the model because
the update from T-W2 made it into the index before the commit, but after the model was copied
**** i guess it's not a huge problem because the query thread doesn't bother to assert anything
unless the versions match -- but that seems kind of risky ... we could theoretically never
assert anything
*** having at least one pass over the model checking every doc at the end of the test seems
like a good idea no matter what
*** I'm certain the existing "synchronized (model)" block is not thread safe relative to the
synchronized blocks that copy the model into commitedModel, because the "model.put(...)" calls
can change the iterator and trigger a ConcurrentModificationException
*** there's a bunch of "TODO" blocks realted to deletes that still need implemented
*** the writer threads should construct the SolrInputDocument themselves, and log the whole
document (not just the id) when they log things, so it's easier to tell from the logs what
updates succeed and which were rejected because of version conflicts
*** why is {{//fail("Were were results: "+response);}} commented out?
*** there's a lot of "instanceof ArrayList" checks that make no sense to me since the object
came from getFirstValue
** DocInfo
*** should be a private static class and have some javadocs
*** or sould be a public class in it's own file, w/javadocs, and re-used in the various tests
that want ot reuse it
** verbose
*** why does this method exist? why aren't callers just using log.info(...) directly?
*** or if callers really need to pass big sequences of stuff, they can use {{log.info("\{\}",
Arrays.asList(...))}}
*** or worst case: this method can simplified greatly to do that internally
** addDocAndGetVersion
*** using SolrTestCaseJ4.sdoc and SolrTestCaseJ4.params will make this method a lot sorder
*** why are we synchronizing on cloudClient but updating with leaderClient?
**** if the point is just to ensure all udpates happem synchronously regardless of client,
then we should just define some {{public static final Object UPDATE_SYNC = new Object("sync
lock for updates");}} and use that
** getClientForLeader
*** i know this method is currently just a workaround for SOLR-8733, noting that in the method
javadocs seems important
*** if we refactor this test to use multiple shards before SOLR-8733 gets resolved, this method
can take in a uniqueKey, and consult the DocRouter to pick the correct shard/node.
** replace createMap usage with SolrTestCaseJ4.map
** why override tearDown if we're just calling super?


* SolrDocument
** applyOlderUpdate
*** as mentioned before, i don't think this method is correct when it comes to multivalued
fields, and should have more unit tests of various permutations to be sure
*** this functionality should probably be moved to a private helper method in UpdateLog (it
doesn't do anything that requires it have access to the internals of the SolrDocument)
*** no matter where it lives, it should have some javadocs


> Support updates of numeric DocValues
> ------------------------------------
>
>                 Key: SOLR-5944
>                 URL: https://issues.apache.org/jira/browse/SOLR-5944
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Shalin Shekhar Mangar
>         Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
SOLR-5944.patch, TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt,
TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt,
hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be really nice
to have Solr support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message