lucene-solr-dev mailing list archives

From "Ryan McKinley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-477) AnalysisRequestHandler
Date Tue, 12 Feb 2008 05:22:08 GMT

    [ https://issues.apache.org/jira/browse/SOLR-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567960#action_12567960 ]

Ryan McKinley commented on SOLR-477:
------------------------------------

{quote}
I admit I don't fully understand the interplay between the other writers (JSON, etc.) so help
would be appreciated there.
{quote}

Essentially, any type supported by TextResponseWriter is automatically supported by the standard
writers.  See writeVal() (line 109) in: http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/request/TextResponseWriter.java
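
As a rough illustration of that point, here is a minimal sketch of the dispatch pattern. The class and method names below (SketchTextWriter, SketchJsonWriter, writeStr, writeInt, writeArray, writeNull) are hypothetical stand-ins, not the actual Solr signatures; the real type list lives in the file linked above.

```java
import java.util.List;

// Hypothetical stand-in for a shared text writer; NOT the real Solr class.
abstract class SketchTextWriter {
    protected final StringBuilder out = new StringBuilder();

    // Central dispatch: a type handled here is available to every subclass
    // without that subclass knowing about the type directly.
    public void writeVal(String name, Object val) {
        if (val == null) {
            writeNull(name);
        } else if (val instanceof String) {
            writeStr(name, (String) val);
        } else if (val instanceof Integer) {
            writeInt(name, (Integer) val);
        } else if (val instanceof List) {
            writeArray(name, (List<?>) val);
        } else {
            // Assumption for this sketch: unknown types fall back to toString().
            writeStr(name, val.toString());
        }
    }

    // Format-specific primitives that each concrete writer supplies.
    protected abstract void writeNull(String name);
    protected abstract void writeStr(String name, String val);
    protected abstract void writeInt(String name, int val);
    protected abstract void writeArray(String name, List<?> vals);

    public String result() {
        return out.toString();
    }
}

// A JSON-like writer: it only implements the primitives and never
// repeats the type dispatch. Field names are ignored to keep the sketch short.
class SketchJsonWriter extends SketchTextWriter {
    protected void writeNull(String name) {
        out.append("null");
    }

    protected void writeStr(String name, String val) {
        // Escape backslashes and double quotes only; enough for a sketch.
        out.append('"')
           .append(val.replace("\\", "\\\\").replace("\"", "\\\""))
           .append('"');
    }

    protected void writeInt(String name, int val) {
        out.append(val);
    }

    protected void writeArray(String name, List<?> vals) {
        out.append('[');
        for (int i = 0; i < vals.size(); i++) {
            if (i > 0) out.append(',');
            writeVal(null, vals.get(i)); // recurse through the shared dispatch
        }
        out.append(']');
    }
}
```

Under this pattern, adding a new value type would mean one new branch in writeVal() plus one primitive per format, rather than separate dispatch logic in each writer.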


{quote}
As for a SearchComponent piece, I'd like to hear more.  Does the SearchComponent  piece handle
ContentStreams?  That is, could I just send my <add>...</add> to it and it would
spit out the tokens?  On the query side of things, I think it would be useful to see how the
query is analyzed, so that makes sense in a SearchComponent.  Perhaps we can find common code?
{quote}

No ContentStreams in the version I'm working with.  I am analyzing stored fields so the client
can link directly to a valid 'filter'.  To see it in action, check:
http://www.digitalcommonwealth.org/browse/archive:C%2FWMARS+Digital+Treasures+Respository/

Note how the subject line gets split into linkable tokens.  For example, the stored content "Mass."
actually links to "/subject:Massachusetts/"
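
To make the idea concrete, here is a small sketch of turning a stored value into filter links. Everything in it is an assumption for illustration: the StoredFieldLinks class, the " -- " delimiter, and the hard-coded normalization table stand in for whatever the field's real analyzer does, and URL encoding is left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Solr or of the component described here.
final class StoredFieldLinks {
    // Assumed normalization table standing in for the field's real analysis chain.
    private static final Map<String, String> NORMALIZED = new HashMap<>();
    static {
        NORMALIZED.put("Mass.", "Massachusetts");
    }

    private StoredFieldLinks() {}

    // Maps each displayed piece of the stored value to a filter link built
    // from the analyzed term, preserving the original display order.
    static Map<String, String> linksFor(String field, String storedValue) {
        Map<String, String> links = new LinkedHashMap<>();
        // Assumption: pieces of the stored value are separated by " -- ".
        for (String piece : storedValue.split("\\s*--\\s*")) {
            if (piece.isEmpty()) {
                continue;
            }
            String term = NORMALIZED.getOrDefault(piece, piece);
            links.put(piece, "/" + field + ":" + term + "/");
        }
        return links;
    }
}
```

With these assumptions, linksFor("subject", "Mass. -- History") displays "Mass." but links it to "/subject:Massachusetts/", while "History" links to "/subject:History/".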

I've also found this really useful for debugging what tokens exist for given fields -- of
course it only works for stored fields.

After you finish the handler version, I'll see what can be shared.


> AnalysisRequestHandler
> ----------------------
>
>                 Key: SOLR-477
>                 URL: https://issues.apache.org/jira/browse/SOLR-477
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-477.patch
>
>
> Being able to programmatically access tokenization information can be quite useful not
> only in Solr, but in other NLP applications where token vectors are necessary.
> The patch to follow creates an AnalysisRequestHandler which processes a document through
> the analysis process and returns a response filled with tokens, their offsets, position inc.,
> type and value.
> Patch also adds some character array processing to Xml and adds Token handling to XMLWriter.
> I only implemented Xml output, as I don't know JSON or the other types.  If someone else
> is so motivated, they can add those.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

