lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4724) TaxonomyReader drops empty string component from CategoryPath
Date Sun, 27 Jan 2013 13:57:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563811#comment-13563811 ]

Michael McCandless commented on LUCENE-4724:
--------------------------------------------

OK I agree: let's disallow empty string at indexing time.

I think this means the CP ctor that takes String... varargs should throw an exception if any
component is the empty string?
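
A minimal sketch of the proposed check, assuming a simplified stand-in class. StrictCategoryPath, length() and component() are hypothetical names used for illustration, not the actual Lucene CategoryPath code, and rejecting null components is an added assumption beyond what the comment proposes:

```java
import java.util.Arrays;

// Hypothetical stand-in for CategoryPath; NOT the real Lucene class.
final class StrictCategoryPath {
    private final String[] components;

    // Varargs constructor that rejects empty components up front, so an
    // invalid path never reaches the index.
    StrictCategoryPath(String... components) {
        for (int i = 0; i < components.length; i++) {
            String c = components[i];
            // Assumption: null is rejected along with the empty string.
            if (c == null || c.isEmpty()) {
                throw new IllegalArgumentException(
                    "empty or null component at index " + i
                        + ": " + Arrays.toString(components));
            }
        }
        // Defensive copy so later changes to the caller's array are not seen.
        this.components = components.clone();
    }

    int length() {
        return components.length;
    }

    String component(int i) {
        return components[i];
    }
}
```

With this sketch, new StrictCategoryPath("categories", "") fails immediately with an IllegalArgumentException instead of producing a path that is silently shortened later.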

Not sure what (if anything?) to do about indices "out there" that already contain an empty
string component ... I'm not sure these ever cause problems except to PrintTaxonomyStats ...
so I could just add some robustness to that one tool.

However, I don't really like being "tolerant" of a trailing delimiter, multiple delimiters in
a row, etc. (as filesystems are): I would prefer that we be strict and accept only one
form.  That ambiguity can only cause problems/confusion.
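
To illustrate what "strict" could mean when a path arrives as a single delimited string, here is a rough sketch. StrictPathParser and parse are hypothetical names, not part of the Lucene facet API, and treating the empty input string as the root path with zero components is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the Lucene facet module.
final class StrictPathParser {

    // Splits path on delimiter and rejects any form that would yield an
    // empty component: a leading delimiter, a trailing delimiter, or two
    // delimiters in a row.
    static List<String> parse(String path, char delimiter) {
        List<String> components = new ArrayList<>();
        if (path.isEmpty()) {
            // Assumption: the empty string denotes the root path.
            return components;
        }
        int start = 0;
        for (int i = 0; i <= path.length(); i++) {
            boolean atEnd = (i == path.length());
            if (atEnd || path.charAt(i) == delimiter) {
                if (i == start) {
                    throw new IllegalArgumentException(
                        "empty component at offset " + start
                            + " in \"" + path + "\"");
                }
                components.add(path.substring(start, i));
                start = i + 1;
            }
        }
        return components;
    }
}
```

Under this sketch "categories/a" is accepted, while "categories/", "/categories" and "categories//a" are all rejected, so each path has exactly one accepted spelling.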
                
> TaxonomyReader drops empty string component from CategoryPath
> -------------------------------------------------------------
>
>                 Key: LUCENE-4724
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4724
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4724.patch, LUCENE-4724.patch
>
>
> I ran the new PrintTaxonomyStats on a Wikipedia facets index, and it hit an AIOOBE because
> there was a child of the /categories path that had only one component ... this was created
> because I had added new CategoryPath("categories", "") during indexing.
> I think TaxoReader should preserve and return that empty string from .getPath?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

