From: "McKinley, James T" <james.mckinley@cengage.com>
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
Subject: RE: ToChildBlockJoinQuery question
Date: Wed, 21 Jan 2015 21:34:45 +0000

Hi Greg,

Thanks for responding to my question.  I added some extra conditions to the
IndexRunnable run method: I now require AGTY:np in the source query for the
parent docs, and I only call addDocuments when both creatorDocs and workDocs
actually contain documents:

		public void run() {
			IndexSearcher searcher = new IndexSearcher(reader);

			try {
				int count = 0;
				for (String crid : crids) {
					List<Document> docs = new ArrayList<>();

					// Parent (creator) query: now also requires AGTY:np.
					BooleanQuery abidQuery = new BooleanQuery();
					abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
					abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
					abidQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);

					// Child (work) query.
					TermQuery cridQuery = new TermQuery(new Term("CRID", crid));

					TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
					TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);

					// Only write a block when both a parent and at least one child exist.
					if ((creatorDocs.scoreDocs.length > 0) && (workDocs.scoreDocs.length > 0)) {
						for (int i = 0; i < workDocs.scoreDocs.length; i++) {
							docs.add(reader.document(workDocs.scoreDocs[i].doc));
						}

						// The parent goes last in the block.
						docs.add(reader.document(creatorDocs.scoreDocs[0].doc));

						writer.addDocuments(docs);
						if (++count % 100 == 0) {
							System.out.println(id + " = " + count);
							writer.commit();
						}
					}
				}
			} catch (IOException e) {
				throw new RuntimeException(e);
			}
		}

I then modified the runToChildBlockJoinQuery method to first perform a
search with the parent query and parent filter.  Then, using the id of each
parent named person document, I queried for the named works with that
creator id (essentially reversing the query that was done to create the
BlockJoin index), and I do indeed get works back for every named person that
passes the parent query and filter.  However, I still get the
IllegalStateException complaining about a non-FixedBitSet doc id set when
running the ToChildBlockJoinQuery.  Here is that code:

	private void runToChildBlockJoinQuery(String indexPath) throws IOException {
		FSDirectory dir = FSDirectory.open(new File(indexPath));
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);

		TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
		TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
		Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));

		TopDocs creatorDocs = searcher.search(parentQuery, parentFilter, Integer.MAX_VALUE);

		for (ScoreDoc scoreDoc : creatorDocs.scoreDocs) {
			String[] ids = reader.document(scoreDoc.doc).getValues("ABID");
			BooleanQuery cridQuery = new BooleanQuery();
			for (String id : ids) {
				cridQuery.add(new TermQuery(new Term("CRID", id)), Occur.SHOULD);
			}
			TopDocs worksDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
			System.out.println(worksDocs.scoreDocs.length);
		}

		ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);

		TopDocs worksDocs = searcher.search(tcbjq, Integer.MAX_VALUE);  // ==> IllegalStateException
	}

So I think all the parent docs have child docs, and they should have been
indexed in the same addDocuments call with the parent being the last doc in
the list.  Then, on a lark, I made the parentFilterQuery and the parentQuery
the same and still got the exception.
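
The block layout described here (children first, parent last, one block per
addDocuments call) can also be checked directly on doc ids, independently of
the join query.  A minimal sketch, assuming the parent doc ids of one index
segment have already been collected into a java.util.BitSet;
parentsWithoutChildren is a hypothetical helper, not a Lucene API:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class BlockLayoutCheck {
    /**
     * Returns the doc ids of parents that have no child docs in front of them,
     * i.e. parents at doc 0 or directly after another parent.
     * Assumption: a set bit in "parents" marks a parent doc id.
     */
    static List<Integer> parentsWithoutChildren(BitSet parents) {
        List<Integer> childless = new ArrayList<>();
        int previousParent = -1; // no parent seen yet
        for (int p = parents.nextSetBit(0); p >= 0; p = parents.nextSetBit(p + 1)) {
            if (p - previousParent == 1) {
                childless.add(p); // nothing lies between this parent and the previous one
            }
            previousParent = p;
        }
        return childless;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2); // docs 0-1 are its children
        parents.set(5); // docs 3-4 are its children
        parents.set(6); // no children
        System.out.println(parentsWithoutChildren(parents));
    }
}
```

An empty result would support the claim that every parent was indexed
together with at least one child.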

Am I understanding how this is supposed to work?  What I think I am (and
should be) doing is providing a query and filter that specify the parent
docs, and the ToChildBlockJoinQuery should return all the child docs for the
resulting parent docs.  Is this correct?  The reason I think I'm not
understanding is that I don't see why I need both a filter and a query to
specify the parent docs when a single query or filter should suffice.  Am I
misunderstanding what parentQuery and parentFilter mean?  They both refer to
parent docs, right?
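
One way to picture the two arguments, as a minimal sketch rather than
Lucene's actual implementation: assume parentFilter marks every parent doc in
the index (so the block boundaries are known), while parentQuery selects
which of those parents to expand into children.  Here java.util.BitSet
stands in for the filter's bit set, and childrenOf is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class ToChildJoinSketch {
    /**
     * allParents:     set bit for EVERY parent doc id (the parentFilter role, assumed).
     * matchedParents: doc ids of the parents selected by the query (the parentQuery role, assumed).
     * Returns the child doc ids of the matched parents, in the order given.
     */
    static List<Integer> childrenOf(BitSet allParents, int[] matchedParents) {
        List<Integer> children = new ArrayList<>();
        for (int parent : matchedParents) {
            if (!allParents.get(parent)) {
                // This check is part of the sketch, not a statement about Lucene's behavior.
                throw new IllegalStateException("doc " + parent + " is not marked as a parent");
            }
            // The previous parent closes the previous block; -1 means there is none.
            int previousParent = allParents.previousSetBit(parent - 1);
            for (int doc = previousParent + 1; doc < parent; doc++) {
                children.add(doc);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        BitSet allParents = new BitSet();
        allParents.set(2);
        allParents.set(5);
        allParents.set(6);
        System.out.println(childrenOf(allParents, new int[] {5})); // prints [3, 4]
    }
}
```

Under these assumptions the filter is needed even when the query already
matches only parents, because finding where a block starts requires knowing
the position of the previous parent, whether or not that parent matched.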

I attempted to attach a small tar.gz file (< 1MB) to this message containing
a 100-parent index (~10,000 docs total) that gives the exception with my
block join query, but the mailing list rejected my message.  If there's a
better place to send or upload this index, let me know and I will.  Thanks
again for any help.

Jim

________________________________________
From: Gregory Dearing [gregdearing@gmail.com]
Sent: Wednesday, January 21, 2015 1:01 PM
To: java-user@lucene.apache.org
Subject: Re: ToChildBlockJoinQuery question

James,

I haven't actually run your example, but I think the underlying problem is
that your source query ("NT:american") is hitting documents that have no
children.

The reason the exception is so weird is that one of your index segments
contains zero documents that match your filter.  Specifically, there's an
index segment containing docs matching "NT:american", but with no documents
matching "AGTY:np".

This will cause CachingWrapperFilter, which normally returns a FixedBitSet,
to instead return a generic "Empty" DocIdSet, which leads to the exception
from ToChildBlockJoinQuery.
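
As a rough illustration of the kind of check that produces this exception (a
sketch only: DocIds, FixedBits and OtherDocIds are hypothetical stand-ins,
not Lucene classes, and the real check inside ToChildBlockJoinQuery may
differ in detail), a consumer that insists on one concrete bit set type
rejects every other representation, whatever doc ids it holds:

```java
final class ParentBitsCheck {
    /** Hypothetical stand-in for a per-segment doc id set. */
    interface DocIds {}

    /** Hypothetical stand-in for the bit set type the join expects. */
    static final class FixedBits implements DocIds {}

    /** Hypothetical stand-in for any other representation (empty, compressed, ...). */
    static final class OtherDocIds implements DocIds {}

    /** Accepts only the expected concrete type; anything else is an error. */
    static FixedBits requireFixedBits(DocIds ids) {
        if (!(ids instanceof FixedBits)) {
            // Message text modeled on the exception quoted in this thread.
            throw new IllegalStateException("parentFilter must return FixedBitSet; got " + ids);
        }
        return (FixedBits) ids;
    }

    public static void main(String[] args) {
        requireFixedBits(new FixedBits()); // accepted
        try {
            requireFixedBits(new OtherDocIds()); // rejected because of its type alone
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```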

The summary is, make sure that your source query only hits documents that
were actually added using 'addDocuments()'.  Since it looks like you're
extracting your block relationships from the existing index, that might
mean that you'll need to add some extra metadata to the newly created docs
instead of just cloning what already exists.

-Greg


On Wed, Jan 21, 2015 at 10:00 AM, McKinley, James T <
james.mckinley@cengage.com> wrote:

> Hi,
>
> I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following
> Mike McCandless' blog post:
>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>
> I have a set of child documents which are named works and a set of parent
> documents which are named persons that are the creators of the named
> works.  The parent document has a nationality and the child document does
> not.  I want to query the children (named works) limiting by the
> nationality of the parent (named person).  I've indexed the documents as
> follows (I'm pulling the docs from an existing index):
>
>     private void createNamedWorkIndex(String srcIndexPath, String destIndexPath) throws IOException {
>         FSDirectory srcDir = FSDirectory.open(new File(srcIndexPath));
>         FSDirectory destDir = FSDirectory.open(new File(destIndexPath));
>
>         IndexReader reader = DirectoryReader.open(srcDir);
>
>         Version version = Version.LUCENE_48;
>         IndexWriterConfig conf = new IndexWriterConfig(version, new StandardTextAnalyzer(version));
>
>         Set<String> crids = getCreatorIds(reader);
>
>         String[] crida = crids.toArray(new String[crids.size()]);
>
>         int numThreads = 24;
>         ExecutorService executor = Executors.newFixedThreadPool(numThreads);
>
>         int numCrids = crids.size();
>         int batchSize = numCrids / numThreads;
>         int remainder = numCrids % numThreads;
>
>         System.out.println("Inserting work/creator blocks using " + numThreads + " threads...");
>         try (IndexWriter writer = new IndexWriter(destDir, conf)) {
>             for (int i = 0; i < numThreads; i++) {
>                 String[] cridRange;
>                 if (i == numThreads - 1) {
>                     cridRange = Arrays.copyOfRange(crida, i*batchSize, ((i+1)*batchSize - 1) + remainder);
>                 } else {
>                     cridRange = Arrays.copyOfRange(crida, i*batchSize, ((i+1)*batchSize - 1));
>                 }
>                 String id = "" + ((char)('A' + i));
>                 Runnable indexer = new IndexRunnable(id, reader, writer, new HashSet<String>(Arrays.asList(cridRange)));
>                 executor.execute(indexer);
>             }
>             executor.shutdown();
>             executor.awaitTermination(2, TimeUnit.HOURS);
>         } catch (Exception e) {
>             executor.shutdownNow();
>             throw new RuntimeException(e);
>         } finally {
>             reader.close();
>             srcDir.close();
>             destDir.close();
>         }
>
>         System.out.println("Done!");
>     }
>
>     public static class IndexRunnable implements Runnable {
>         private String id;
>         private IndexReader reader;
>         private IndexWriter writer;
>         private Set<String> crids;
>
>         public IndexRunnable(String id, IndexReader reader, IndexWriter writer, Set<String> crids) {
>             this.id = id;
>             this.reader = reader;
>             this.writer = writer;
>             this.crids = crids;
>         }
>
>         @Override
>         public void run() {
>             IndexSearcher searcher = new IndexSearcher(reader);
>
>             try {
>                 int count = 0;
>                 for (String crid : crids) {
>                     List<Document> docs = new ArrayList<>();
>
>                     BooleanQuery abidQuery = new BooleanQuery();
>                     abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
>                     abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
>
>                     TermQuery cridQuery = new TermQuery(new Term("CRID", crid));
>
>                     TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
>                     TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
>
>                     for (int i = 0; i < workDocs.scoreDocs.length; i++) {
>                         docs.add(reader.document(workDocs.scoreDocs[i].doc));
>                     }
>
>                     if (creatorDocs.scoreDocs.length > 0) {
>                         docs.add(reader.document(creatorDocs.scoreDocs[0].doc));
>                     }
>
>                     writer.addDocuments(docs);
>                     if (++count % 100 == 0) {
>                         System.out.println(id + " = " + count);
>                         writer.commit();
>                     }
>                 }
>             } catch (IOException e) {
>                 throw new RuntimeException(e);
>             }
>         }
>     }
>
> I then attempt to perform a block join query as follows:
>
>     private void runToChildBlockJoinQuery(String indexPath) throws IOException {
>         FSDirectory dir = FSDirectory.open(new File(indexPath));
>         IndexReader reader = DirectoryReader.open(dir);
>         IndexSearcher searcher = new IndexSearcher(reader);
>
>         TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
>         TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
>         Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));
>
>         ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
>
>         TopDocs worksDocs = searcher.search(tcbjq, 20);
>
>         displayWorks(reader, searcher, worksDocs);
>     }
>
> and I get the following exception:
>
> Exception in thread "main" java.lang.IllegalStateException: parentFilter must return FixedBitSet; got org.apache.lucene.util.WAH8DocIdSet@34e671de
>         at org.apache.lucene.search.join.ToChildBlockJoinQuery$ToChildBlockJoinWeight.scorer(ToChildBlockJoinQuery.java:148)
>         at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
>         at BlockJoinQueryTester.runToChildBlockJoinQuery(BlockJoinQueryTester.java:73)
>         at BlockJoinQueryTester.main(BlockJoinQueryTester.java:40)
>
> I don't understand what I'm doing wrong and what a "FixedBitSet" is and
> why I don't get one out of my filter.  Is FixedBitSet a special kind of
> OpenBitSet and what does "fixed" mean in this context?  Thanks for any help.
>
> Jim
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
