Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: solr-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAFppj+VNP+4deAQpFhyd+Hu6f+3pzSatt=gQp_O9KmKp8bPQmQ@mail.gmail.com>
References: 
 <CAFppj+VNP+4deAQpFhyd+Hu6f+3pzSatt=gQp_O9KmKp8bPQmQ@mail.gmail.com>
Date: Tue, 1 Mar 2016 08:03:27 -0500
Message-ID: 
 <CAOxAL61V+umH7q_Cibbd6pVTGVR3JMevAxxKhuR3WhGLvf3yzQ@mail.gmail.com>
Subject: Re: Indexing books, chapters and pages
From: Jack Krupansky <jack.krupansky@gmail.com>
To: solr-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=001a11457788264121052cfc6409

--001a11457788264121052cfc6409
Content-Type: text/plain; charset=UTF-8

To start, what is the form of your input data - is it already divided into
chapters and pages? Or... are you starting with raw PDF files?


-- Jack Krupansky

On Tue, Mar 1, 2016 at 6:56 AM, Zaccheo Bagnati <zaccheob@gmail.com> wrote:

> Hi all,
> I'm looking for ideas on how to define the schema and how to perform queries
> for this use case: we have to index books; each book is split into chapters,
> and chapters are split into pages (pages reflect the original page breaks of
> the printed version). We should show results grouped by book, then by chapter
> (within the same book), then by page (within the same chapter). As far as I
> know, we have 2 options:
>
> 1. Index pages as Solr documents. This way we could theoretically
> retrieve chapters (and books?) using grouping, but
>     a. we would miss matches that span two contiguous pages (page breaks are
> only due to typographical needs, so concepts can be split... as in printed
> books)
>     b. I don't know whether Solr can group results on two
> different levels (books and chapters)
>
> 2. Index chapters as Solr documents. In this case we would get the right
> matches, but how do we obtain the matching pages? (We need pages because the
> client can only display pages.)
>
> We have been struggling with this problem for a long time and have not been
> able to find a suitable solution, so I'm asking whether anyone has ideas or
> has already solved a similar issue.
> Thanks
>
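
As a rough illustration of point 1b in the quoted message: if pages are
indexed as individual documents, one possible workaround is to nest the flat
page-level hits by book and chapter on the client side rather than in the
query. The sketch below is only that, under assumptions: the field names
book_id, chapter_id and page_no are hypothetical and not taken from any
existing schema, and the hits are plain dicts rather than a real Solr
response. It does not address point 1a (matches spanning two pages).

```python
from collections import defaultdict


def nest_hits(hits):
    """Group flat page-level hits into {book_id: {chapter_id: [page_no, ...]}}.

    The keys book_id, chapter_id and page_no are hypothetical field names
    chosen for this example only.
    """
    nested = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        nested[hit["book_id"]][hit["chapter_id"]].append(hit["page_no"])
    # Convert to plain dicts and sort the pages within each chapter.
    return {
        book: {chapter: sorted(pages) for chapter, pages in chapters.items()}
        for book, chapters in nested.items()
    }


# Made-up sample hits, standing in for page-level search results.
hits = [
    {"book_id": "b1", "chapter_id": "c2", "page_no": 31},
    {"book_id": "b1", "chapter_id": "c1", "page_no": 4},
    {"book_id": "b2", "chapter_id": "c1", "page_no": 7},
    {"book_id": "b1", "chapter_id": "c2", "page_no": 30},
]

if __name__ == "__main__":
    print(nest_hits(hits))
```

Books and chapters come out in the order in which their first hit appears, so
if the hits arrive sorted by relevance, the best-matching book and chapter
stay on top.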

--001a11457788264121052cfc6409--