lucene-java-user mailing list archives

From "Vanlerberghe, Luc" <Luc.Vanlerber...@bvdep.com>
Subject RE: HTMLParser.getReader returning null
Date Fri, 12 Nov 2004 10:41:34 GMT
If you use the Field.Text(String name, Reader value) version of the
Field.Text factory method, the field is tokenized and indexed but *not*
stored.  This means you will be able to search and find that document,
but doc.get("contents") returns null, so to know the original contents
you will have to store a copy of them elsewhere.

The Field.Text(String name, String value) version does store the
document String itself, so that's probably the origin of the confusion.
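
If the original text has to come back from doc.get("contents"), one
option is to drain the Reader into a String first and pass that to the
String version. The following is only a minimal sketch under that
assumption: ReaderContents and readFully are hypothetical names, not
part of Lucene, and the Lucene call is shown as a comment so the snippet
compiles without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene.
final class ReaderContents {
    private ReaderContents() {}

    /** Drains the reader into a String; I/O failures are rethrown unchecked. */
    static String readFully(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for parser.getReader() from the original question.
        String contents = readFully(new StringReader("tag-stripped page text"));
        // With Lucene 1.4 available, the String version would then be used:
        //   doc.add(Field.Text("contents", contents));
        System.out.println("The content is " + contents);
    }
}
```

A Reader can only be consumed once, so it should be read before it is
handed to anything else. Storing the text also makes the index larger,
which may matter for big HTML files.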

> -----Original Message-----
> From: Luke Shannon [mailto:lshannon@hypermedia.com] 
> Sent: Thursday, 11 November 2004 20:17
> To: Lucene Users List
> Subject: HTMLParser.getReader returning null
> 
> Hello;
> 
> Things were working fine. I have been re-organizing my code 
> to drop into QA when I noticed I was no longer getting search 
> results for my HTML files.
> When I checked things out I confirmed I was still creating 
> the Documents but realized no content was being indexed.
> 
>     HTMLParser parser = new HTMLParser(f);
> 
>     // Add the tag-stripped contents as a Reader-valued Text field so it will
>     // get tokenized and indexed.
>     doc.add(Field.Text("contents", parser.getReader()));
>     System.out.println("The content is " + doc.get("contents"));
> 
> The System.out.println line above outputs null where the contents
> used to be. Has anyone seen this before?
> 
> Thanks,
> 
> Luke
> 
> ----- Original Message -----
> From: "Will Allen" <wallen@Cyveillance.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Thursday, November 11, 2004 1:59 PM
> Subject: RE: Bug in the BooleanQuery optimizer? ..TooManyClauses
> 
> 
> Any wildcard search will automatically expand your query to the number
> of terms it finds in the index that match the wildcard.
> 
> For example:
> 
> wild* would become wild OR wilderness OR wildman etc. for each of the
> terms that exist in your index.
> 
> Because of this, you quickly reach the default limit of 1024 clauses.
> I raise it to the maximum int value with the following line:
> 
> BooleanQuery.setMaxClauseCount( Integer.MAX_VALUE );
> 
> 
> -----Original Message-----
> From: Sanyi [mailto:need4sid@yahoo.com]
> Sent: Thursday, November 11, 2004 6:46 AM
> To: lucene-user@jakarta.apache.org
> Subject: Bug in the BooleanQuery optimizer? ..TooManyClauses
> 
> 
> Hi!
> 
> First of all, I've read about BooleanQuery$TooManyClauses, so I know
> that it has a limit of 1024 clauses by default, which is good enough
> for me, but I still think it behaves strangely.
> 
> Example:
> I have an index with about 20 million documents.
> Let's say that there are about 3000 variants in the entire document
> set matching this word mask: cab*
> Let's say that about 500 documents contain the word: spectrum
> Now, when I search for "cab* AND spectrum", I don't expect it to throw
> an exception.
> It should first restrict the search to the 500 documents containing
> the word "spectrum", then it should collect the variants of "cab*"
> within these documents, which turns out to be two or three variants of
> "cab*" (cable, cables, maybe some more), and the search should return,
> let's say, 10 documents.
> 
> Similar example: When I search for "cab* AND nonexistingword" it still
> throws a TooManyClauses
> exception instead of saying "No results", since there is no
> "nonexistingword" in my document set,
> so it doesn't even have to start collecting the variations of "cab*".
> 
> Is there any patch for this issue?
> Thank you for your time!
> 
> Sanyi
> (I'm using: lucene 1.4.2)
> 
> p.s.: Sorry for re-sending this message, I was first sending it as an
> accidental reply to a wrong thread..
> 
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> 

