From: "McKinley, James T"
To: "java-user@lucene.apache.org"
Subject: RE: ToChildBlockJoinQuery question
Date: Wed, 21 Jan 2015 21:34:45 +0000

Hi Greg,

Thanks for responding to my question.
I added some extra conditions to the IndexRunnable run method. I now require AGTY:np in the source query for the parent docs, and I require that both creatorDocs and workDocs actually contain documents, so addDocuments is never called for an incomplete block:

    public void run() {
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            int count = 0;
            for (String crid : crids) {
                List<Document> docs = new ArrayList<>();

                // Parent: the named person whose ABID is this creator id
                BooleanQuery abidQuery = new BooleanQuery();
                abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
                abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
                abidQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);

                // Children: the named works created by that person
                TermQuery cridQuery = new TermQuery(new Term("CRID", crid));

                TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
                TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);

                // Only index complete blocks: at least one work and a creator
                if ((creatorDocs.scoreDocs.length > 0) && (workDocs.scoreDocs.length > 0)) {
                    for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                        docs.add(reader.document(workDocs.scoreDocs[i].doc));
                    }

                    // The parent must be the last doc in the block
                    docs.add(reader.document(creatorDocs.scoreDocs[0].doc));

                    writer.addDocuments(docs);
                    if (++count % 100 == 0) {
                        System.out.println(id + " = " + count);
                        writer.commit();
                    }
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

I then modified the runToChildBlockJoinQuery method to first perform a search with the parent query and parent filter. Then, using the id of each parent named-person document, I queried for the named works with that creator id (essentially reversing the query that was used to build the block join index), and I do indeed get works back for every named person that passes the parent query and filter. However, I still get the IllegalStateException complaining about a non-FixedBitSet doc id set when running the ToChildBlockJoinQuery.
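For reference, the revised run() relies on the layout that addDocuments produces: each call appends its documents as one contiguous run of doc ids in list order, so the works (children) come first and the creator (parent) takes the last id of the block. The plain-Java sketch below models that layout without any Lucene classes; the class name, method names and string-labelled "documents" are hypothetical stand-ins, and a single flat doc-id space (one segment) is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

// Hypothetical model of block indexing (no Lucene classes): doc ids are
// positions in one flat list, i.e. a single segment, and each block is
// appended contiguously, the way addDocuments() keeps a block together.
class BlockLayoutSketch {
    final List<String> docs = new ArrayList<>();  // doc id -> document label
    final BitSet parentBits = new BitSet();       // doc ids that are parents

    // Mirrors the revised run(): add a block only when there is a creator and
    // at least one work; works (children) first, creator (parent) last.
    boolean addBlock(List<String> works, List<String> creators) {
        if (creators.isEmpty() || works.isEmpty()) {
            return false;  // incomplete block: nothing is appended
        }
        docs.addAll(works);
        docs.add(creators.get(0));
        parentBits.set(docs.size() - 1);  // parent takes the block's last doc id
        return true;
    }

    // A parent's children are the ids after the previous parent, up to (but
    // not including) the parent itself.
    List<String> childrenOf(int parentDoc) {
        if (!parentBits.get(parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not a parent");
        }
        int firstChild = parentBits.previousSetBit(parentDoc - 1) + 1;
        return new ArrayList<>(docs.subList(firstChild, parentDoc));
    }

    public static void main(String[] args) {
        BlockLayoutSketch index = new BlockLayoutSketch();
        index.addBlock(Arrays.asList("work1", "work2"), Arrays.asList("creatorA"));
        index.addBlock(Arrays.asList("work3"), Collections.<String>emptyList()); // skipped
        index.addBlock(Arrays.asList("work4"), Arrays.asList("creatorB"));
        System.out.println(index.docs);           // [work1, work2, creatorA, work4, creatorB]
        System.out.println(index.childrenOf(4));  // [work4]
    }
}
```

The key property is that a block is only recoverable if its parent is the last doc and is marked as a parent; a block appended without a parent would be silently absorbed by the next parent's block.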
Here is the modified runToChildBlockJoinQuery method:

    private void runToChildBlockJoinQuery(String indexPath) throws IOException {
        FSDirectory dir = FSDirectory.open(new File(indexPath));
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

        TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
        TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
        Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));
        TopDocs creatorDocs = searcher.search(parentQuery, parentFilter, Integer.MAX_VALUE);

        // Sanity check: look up the works for every parent that passes the query and filter
        for (ScoreDoc scoreDoc : creatorDocs.scoreDocs) {
            String[] ids = reader.document(scoreDoc.doc).getValues("ABID");
            BooleanQuery cridQuery = new BooleanQuery();
            for (String id : ids) {
                cridQuery.add(new TermQuery(new Term("CRID", id)), Occur.SHOULD);
            }
            TopDocs worksDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
            System.out.println(worksDocs.scoreDocs.length);
        }

        ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);

        TopDocs worksDocs = searcher.search(tcbjq, Integer.MAX_VALUE); // ==> IllegalStateException
    }

So I think all the parent docs have child docs, and they should have been indexed in the same addDocuments call, with the parent being the last doc in the list. Then, on a lark, I made the parentFilterQuery and the parentQuery the same and still got the exception.

Am I understanding how this is supposed to work? What I think I am (and should be) doing is providing a query and a filter that specify the parent docs, and the ToChildBlockJoinQuery should return all the child docs for the resulting parent docs. Is this correct? The reason I think I'm not understanding is that I don't see why I need both a filter and a query to specify the parent docs when a single query or filter should suffice. Am I misunderstanding what parentQuery and parentFilter mean? They both refer to parent docs, right?
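On the parentQuery versus parentFilter question: as I understand the Lucene 4.x block join design, the two play different roles. The parent filter has to mark every parent document in each segment, because the join locates a parent's children by scanning back to the previous parent bit; the parent query only chooses which of those parents to join from. The plain-Java sketch below models that step with java.util.BitSet rather than Lucene's own bitset classes; the class and method names are hypothetical, and rejecting a query hit that is not a marked parent is a design choice of the sketch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of the to-child join step within one segment.
// parentBits plays the role of parentFilter (ALL parents = block boundaries);
// parentHits plays the role of parentQuery (the parents we want children for).
class ToChildJoinSketch {

    static List<Integer> childDocs(BitSet parentBits, BitSet parentHits) {
        List<Integer> children = new ArrayList<>();
        for (int p = parentHits.nextSetBit(0); p >= 0; p = parentHits.nextSetBit(p + 1)) {
            if (!parentBits.get(p)) {
                // A hit that is not marked as a parent has no well-defined block.
                throw new IllegalStateException("parent query hit " + p + " is not in the parent filter");
            }
            // Children sit between the previous parent (exclusive) and this one.
            int firstChild = parentBits.previousSetBit(p - 1) + 1;
            for (int c = firstChild; c < p; c++) {
                children.add(c);
            }
        }
        return children;
    }

    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        // Layout: docs 0-1 -> parent 2, doc 3 -> parent 4, docs 5-6 -> parent 7.
        BitSet allParents = bits(2, 4, 7);
        System.out.println(childDocs(allParents, bits(4, 7)));  // [3, 5, 6]
        // If the filter misses parent 4, parent 7 appears to own docs 3-6.
        System.out.println(childDocs(bits(2, 7), bits(7)));     // [3, 4, 5, 6]
    }
}
```

The second call shows why a single narrower filter cannot stand in for both roles: if the filter only marked the "american" parents, the block boundaries of every other parent would disappear and their children would be attributed to the wrong creator.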
I attempted to attach a small tar.gz file (< 1 MB) to this message containing a 100-parent index (~10,000 docs total) that reproduces the exception with my block join query, but the mailing list rejected my message. If there's a better place to send or upload this index, let me know and I surely will.

Thanks again for any help.

Jim

________________________________________
From: Gregory Dearing [gregdearing@gmail.com]
Sent: Wednesday, January 21, 2015 1:01 PM
To: java-user@lucene.apache.org
Subject: Re: ToChildBlockJoinQuery question

James,

I haven't actually run your example, but I think the source problem is that your source query ("NT:American") is hitting documents that have no children.

The reason the exception is so weird is that one of your index segments contains zero documents that match your filter. Specifically, there's an index segment containing docs matching "NT:american", but with no documents matching "AGTY:np".

This will cause CachingWrapperFilter, which normally returns a FixedBitSet, to instead return a generic "Empty" DocIdSet, which leads to the exception from ToChildBlockJoinQuery.

The summary is: make sure that your source query only hits documents that were actually added using 'addDocuments()'. Since it looks like you're extracting your block relationships from the existing index, that might mean that you'll need to add some extra metadata to the newly created docs instead of just cloning what already exists.

-Greg

On Wed, Jan 21, 2015 at 10:00 AM, McKinley, James T <james.mckinley@cengage.com> wrote:

> Hi,
>
> I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following
> Mike McCandless' blog post:
>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>
> I have a set of child documents which are named works and a set of parent
> documents which are named persons that are the creators of the named
> works. The parent document has a nationality and the child document does not.
> I want to query the children (named works), limiting by the nationality of
> the parent (named person). I've indexed the documents as follows (I'm
> pulling the docs from an existing index):
>
>     private void createNamedWorkIndex(String srcIndexPath, String destIndexPath) throws IOException {
>         FSDirectory srcDir = FSDirectory.open(new File(srcIndexPath));
>         FSDirectory destDir = FSDirectory.open(new File(destIndexPath));
>
>         IndexReader reader = DirectoryReader.open(srcDir);
>
>         Version version = Version.LUCENE_48;
>         IndexWriterConfig conf = new IndexWriterConfig(version, new StandardTextAnalyzer(version));
>
>         Set<String> crids = getCreatorIds(reader);
>
>         String[] crida = crids.toArray(new String[crids.size()]);
>
>         int numThreads = 24;
>         ExecutorService executor = Executors.newFixedThreadPool(numThreads);
>
>         int numCrids = crids.size();
>         int batchSize = numCrids / numThreads;
>         int remainder = numCrids % numThreads;
>
>         System.out.println("Inserting work/creator blocks using " + numThreads + " threads...");
>         try (IndexWriter writer = new IndexWriter(destDir, conf)) {
>             for (int i = 0; i < numThreads; i++) {
>                 // Arrays.copyOfRange's end index is exclusive: batch i covers
>                 // [i*batchSize, (i+1)*batchSize), and the last batch also takes the remainder
>                 String[] cridRange;
>                 if (i == numThreads - 1) {
>                     cridRange = Arrays.copyOfRange(crida, i*batchSize, (i+1)*batchSize + remainder);
>                 } else {
>                     cridRange = Arrays.copyOfRange(crida, i*batchSize, (i+1)*batchSize);
>                 }
>                 String id = "" + ((char)('A' + i));
>                 Runnable indexer = new IndexRunnable(id, reader, writer,
>                         new HashSet<String>(Arrays.asList(cridRange)));
>                 executor.execute(indexer);
>             }
>             executor.shutdown();
>             executor.awaitTermination(2, TimeUnit.HOURS);
>         } catch (Exception e) {
>             executor.shutdownNow();
>             throw new RuntimeException(e);
>         } finally {
>             reader.close();
>             srcDir.close();
>             destDir.close();
>         }
>
>         System.out.println("Done!");
>     }
>
>     public static class IndexRunnable implements Runnable {
>         private String id;
>         private IndexReader reader;
>         private IndexWriter writer;
>         private Set<String> crids;
>
>         public IndexRunnable(String id, IndexReader reader, IndexWriter writer, Set<String> crids) {
>             this.id = id;
>             this.reader = reader;
>             this.writer = writer;
>             this.crids = crids;
>         }
>
>         @Override
>         public void run() {
>             IndexSearcher searcher = new IndexSearcher(reader);
>
>             try {
>                 int count = 0;
>                 for (String crid : crids) {
>                     List<Document> docs = new ArrayList<>();
>
>                     BooleanQuery abidQuery = new BooleanQuery();
>                     abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
>                     abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
>
>                     TermQuery cridQuery = new TermQuery(new Term("CRID", crid));
>
>                     TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
>                     TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
>
>                     for (int i = 0; i < workDocs.scoreDocs.length; i++) {
>                         docs.add(reader.document(workDocs.scoreDocs[i].doc));
>                     }
>
>                     if (creatorDocs.scoreDocs.length > 0) {
>                         docs.add(reader.document(creatorDocs.scoreDocs[0].doc));
>                     }
>
>                     writer.addDocuments(docs);
>                     if (++count % 100 == 0) {
>                         System.out.println(id + " = " + count);
>                         writer.commit();
>                     }
>                 }
>             } catch (IOException e) {
>                 throw new RuntimeException(e);
>             }
>         }
>     }
>
> I then attempt to perform a block join query as follows:
>
>     private void runToChildBlockJoinQuery(String indexPath) throws IOException {
>         FSDirectory dir = FSDirectory.open(new File(indexPath));
>         IndexReader reader = DirectoryReader.open(dir);
>         IndexSearcher searcher = new IndexSearcher(reader);
>
>         TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
>         TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
>         Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));
>
>         ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
>
>         TopDocs worksDocs = searcher.search(tcbjq, 20);
>
>         displayWorks(reader, searcher, worksDocs);
>     }
>
> and I get the following exception:
>
> Exception in thread "main" java.lang.IllegalStateException: parentFilter must return FixedBitSet; got org.apache.lucene.util.WAH8DocIdSet@34e671de
>     at org.apache.lucene.search.join.ToChildBlockJoinQuery$ToChildBlockJoinWeight.scorer(ToChildBlockJoinQuery.java:148)
>     at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
>     at BlockJoinQueryTester.runToChildBlockJoinQuery(BlockJoinQueryTester.java:73)
>     at BlockJoinQueryTester.main(BlockJoinQueryTester.java:40)
>
> I don't understand what I'm doing wrong, what a "FixedBitSet" is, and why
> I don't get one out of my filter. Is FixedBitSet a special kind of
> OpenBitSet, and what does "fixed" mean in this context? Thanks for any help.
>
> Jim
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
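A side note on the thread batching in createNamedWorkIndex: java.util.Arrays.copyOfRange(original, from, to) copies indices from `from` up to but not including `to`, so contiguous batches use (i+1)*batchSize as the end index, with the last batch extended by the remainder. The stdlib-only sketch below (class and method names are hypothetical) performs that split so it can be checked that every creator id lands in exactly one batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: split items into numThreads contiguous batches, the
// last batch also taking the remainder. copyOfRange's end index is exclusive.
class BatchSplitSketch {

    static List<String[]> split(String[] items, int numThreads) {
        int batchSize = items.length / numThreads;
        int remainder = items.length % numThreads;
        List<String[]> batches = new ArrayList<>();
        for (int i = 0; i < numThreads; i++) {
            int from = i * batchSize;
            int to = (i + 1) * batchSize + (i == numThreads - 1 ? remainder : 0);
            batches.add(Arrays.copyOfRange(items, from, to));
        }
        return batches;
    }

    public static void main(String[] args) {
        String[] crids = {"c0", "c1", "c2", "c3", "c4", "c5", "c6"};
        for (String[] batch : split(crids, 3)) {
            System.out.println(Arrays.toString(batch));
        }
        // [c0, c1]
        // [c2, c3]
        // [c4, c5, c6]
    }
}
```

With 7 ids and 3 threads, batchSize is 2 and the remainder is 1, so the final batch runs from index 4 up to (but excluding) 7.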