If you use the Field.Text(String name, Reader value) version of the
Field.Text constructor, the field is tokenized and indexed but *not*
stored. This means you will be able to search and find that document,
but to know the original contents you will have to store a copy of it
elsewhere.
The Field.Text(String name, String value) version does store the
document String itself, so that's probably the origin of the confusion.
> -----Original Message-----
> From: Luke Shannon [mailto:lshannon@hypermedia.com]
> Sent: donderdag 11 november 2004 20:17
> To: Lucene Users List
> Subject: HTMLParser.getReader returning null
>
> Hello;
>
> Things were working fine. I have been re-organizing my code
> to drop into QA when I noticed I was no longer getting search
> results for my HTML files.
> When I checked things out I confirmed I was still creating
> the Documents but realized no content was being indexed.
>
> HTMLParser parser = new HTMLParser(f);
>
> // Add the tag-stripped contents as a Reader-valued Text
> field so it will
> // get tokenized and indexed.
> doc.add(Field.Text("contents", parser.getReader()));
> System.out.println("The content is " + doc.get("contents"));
>
> The SOP line above outputs a null where the contents used to
> be. Any seen this before?
>
> Thanks,
>
> Luke
>
> ----- Original Message -----
> From: "Will Allen" <wallen@Cyveillance.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Thursday, November 11, 2004 1:59 PM
> Subject: RE: Bug in the BooleanQuery optimizer? ..TooManyClauses
>
>
> Any wildcard search will automatically expand your query to
> the number of
> terms it find in the index that suit the wildcard.
>
> For example:
>
> wild*, would become wild OR wilderness OR wildman etc for
> each of the terms
> that exist in your index.
>
> It is because of this, that you quickly reach the 1024 limit
> of clauses. I
> automatically set it to max int with the following line:
>
> BooleanQuery.setMaxClauseCount( Integer.MAX_VALUE );
>
>
> -----Original Message-----
> From: Sanyi [mailto:need4sid@yahoo.com]
> Sent: Thursday, November 11, 2004 6:46 AM
> To: lucene-user@jakarta.apache.org
> Subject: Bug in the BooleanQuery optimizer? ..TooManyClauses
>
>
> Hi!
>
> First of all, I've read about BooleanQuery$TooManyClauses, so
> I know that it
> has a 1024 Clauses
> limit by default which is good enough for me, but I still
> think it works
> strange.
>
> Example:
> I have an index with about 20Million documents.
> Let's say that there is about 3000 variants in the entire
> document set of
> this word mask: cab*
> Let's say that about 500 documents are containing the word: spectrum
> Now, when I search for "cab* AND spectrum", I don't expect it
> to throw an
> exception.
> It should first restrict the search for the 500 documents
> containing the
> word "spectrum", then it
> should collect the variants of "cab*" withing these
> documents, which turns
> out in two or three
> variants of "cab*" (cable, cables, maybe some more) and the
> search should
> return let's say 10
> documents.
>
> Similar example: When I search for "cab* AND nonexistingword" it still
> throws a TooManyClauses
> exception instead of saying "No results", since there is no
> "nonexistingword" in my document set,
> so it doesn't even have to start collecting the variations of "cab*".
>
> Is there any path for this issue?
> Thank you for your time!
>
> Sanyi
> (I'm using: lucene 1.4.2)
>
> p.s.: Sorry for re-sending this message, I was first sending it as an
> accidental reply to a wrong thread..
>
>
>
> __________________________________
> Do you Yahoo!?
> Check out the new Yahoo! Front Page.
> www.yahoo.com
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
|