lucene-solr-dev mailing list archives

From "Tricia Williams (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-380) There's no way to convert search results into page-level hits of a "structured document".
Date Mon, 19 Jan 2009 19:13:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665195#action_12665195 ]

Tricia Williams commented on SOLR-380:
--------------------------------------

Hi Laurent,

    Thanks for your interest in my Solr PayloadComponent plugin.  I want to address all of
the questions you pose in your comment, but won't have time until early February.  I apologize
for the inconvenience, but my priorities lie elsewhere right now.  Feel free to look at the
code and play with it in the meantime.  The code that's up there is basically a proof of concept.
I've been slowly working on improving the robustness and performance of the code, so hopefully
there will be an improved version before the end of March.

    I'm sure many people would appreciate a Wiki page for this topic.
Why don't you go ahead and set that up?  I'll be happy to add my two cents when I'm available.

All the best,
Tricia

> There's no way to convert search results into page-level hits of a "structured document".
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-380
>                 URL: https://issues.apache.org/jira/browse/SOLR-380
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Tricia Williams
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-380-XmlPayload.patch, SOLR-380-XmlPayload.patch, xmlpayload-example.zip, xmlpayload-src.jar, xmlpayload.jar
>
>
> "Paged-Text" FieldType for Solr
> A chance to dig into the guts of Solr. The problem: If we index a monograph in Solr,
> there's no way to convert search results into page-level hits. The solution: have a "paged-text"
> fieldtype which keeps track of page divisions as it indexes, and reports page-level hits in
> the search results.
> The input would contain page milestones: <page id="234"/>. As Solr processed the
> tokens (using its standard tokenizers and filters), it would concurrently build a structural
> map of the item, indicating which term position marked the beginning of which page:
> <page id="234" firstterm="14324"/>. This map would be stored in an unindexed field in some
> efficient format.
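The indexing-time step can be illustrated with a small standalone sketch. It is not the SOLR-380 patch and does not use Solr's analysis API: a plain whitespace split stands in for Solr's tokenizers and filters, term positions are assumed to start at 0, and the names PageMapBuilder, PageStart, build and countTerms are hypothetical. It scans the text for <page id="..."/> milestones and records, for each page, the position of the first term that follows the milestone, i.e. the firstterm value of the proposed structural map. It needs Java 16+ for the record type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of building the page map at indexing time.
// Assumption: a whitespace split stands in for Solr's tokenizers/filters,
// and term positions are counted from 0.
final class PageMapBuilder {

    // One entry of the structural map, e.g. <page id="234" firstterm="14324"/>.
    record PageStart(String id, int firstTerm) {}

    // Matches a page milestone such as <page id="234"/> and captures the id.
    private static final Pattern MILESTONE = Pattern.compile("<page id=\"([^\"]*)\"/>");

    // Returns one PageStart per milestone, in document order.
    static List<PageStart> build(String text) {
        List<PageStart> map = new ArrayList<>();
        Matcher m = MILESTONE.matcher(text);
        int position = 0; // terms seen so far = position of the next term
        int last = 0;     // end offset of the previous milestone
        while (m.find()) {
            // Count the terms between the previous milestone and this one.
            position += countTerms(text.substring(last, m.start()));
            // The next term emitted will be the first term of this page.
            map.add(new PageStart(m.group(1), position));
            last = m.end();
        }
        return map;
    }

    // Whitespace-tokenizer stand-in: counts non-empty tokens in a segment.
    static int countTerms(String segment) {
        String trimmed = segment.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```

For the input <page id="1"/>alpha beta <page id="2"/>gamma, this yields page 1 starting at position 0 and page 2 starting at position 2. How the resulting (id, firstterm) pairs would be encoded in the unindexed field is left open here.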
> At search time, Solr would retrieve term positions for all hits that are returned in
> the current request, and use the stored map to determine page ids for each term position.
> The results would imitate the results for highlighting, something like:
> <lst name="pages">
>   <lst name="doc1">
>     <int name="pageid">234</int>
>     <int name="pageid">236</int>
>   </lst>
>   <lst name="doc2">
>     <int name="pageid">19</int>
>   </lst>
> </lst>
> <lst name="hitpos">
>   <lst name="doc1">
>     <lst name="234">
>       <int name="pos">14325</int>
>     </lst>
>   </lst>
>   ...
> </lst>
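At search time, the lookup in the stored map amounts to a floor search: a hit at term position p belongs to the last page whose firstterm is at or below p. The sketch assumes the map has already been loaded into a TreeMap from first-term position to page id, and that the hit positions for one document are already available (retrieving them from Lucene is not shown). PageHitMapper and pagesForHits are hypothetical names, not part of Solr. The result groups hit positions by page id, mirroring the proposed "hitpos" list; its key set corresponds to the "pages" list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of turning hit term positions into page-level hits.
final class PageHitMapper {

    // firstTermToPage: first term position of each page -> page id.
    // hitPositions: term positions of the hits in one document.
    // Returns page id -> hit positions on that page, in first-seen order.
    static Map<String, List<Integer>> pagesForHits(TreeMap<Integer, String> firstTermToPage,
                                                   int[] hitPositions) {
        Map<String, List<Integer>> byPage = new LinkedHashMap<>();
        for (int pos : hitPositions) {
            // The page containing pos is the last page starting at or before it.
            Map.Entry<Integer, String> start = firstTermToPage.floorEntry(pos);
            if (start == null) {
                continue; // hit precedes the first page milestone
            }
            byPage.computeIfAbsent(start.getValue(), k -> new ArrayList<>()).add(pos);
        }
        return byPage;
    }
}
```

With pages starting at 14324 (id 234) and 15000 (id 236), a hit at 14325 maps to page 234, matching the doc1 entry in the example response. Because TreeMap.floorEntry is a logarithmic-time lookup, each hit costs O(log n) in the number of pages.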

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

