lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Many PDFs indexed but only one returned in the Solr-UI
Date Tue, 11 Mar 2014 11:46:00 GMT
Hmmm, that looks OK to me. I'd log
the id you assign for each document;
it's _possible_ that somehow you're
getting the same ID for all the files,
although this line should prevent that:
 doc.addField("id", document);

Tail the Solr log while you're doing this and
check the update messages to ensure that there
is more than one. And I'm assuming that
you've got more than one file in your directory.
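
A minimal sketch of that ID check, assuming the IDs are the plain file names from documentDir.list(). The class name IdCheck, the method duplicateIds and the sample file names are hypothetical, and it uses only the JDK so it runs without Solr:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdCheck {
    /** Prints every ID and returns those seen more than once (each reported once). */
    static List<String> duplicateIds(String[] ids) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String id : ids) {
            System.out.println("id=" + id);
            // Set.add returns false when the ID was already present
            if (!seen.add(id) && !duplicates.contains(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical file names standing in for documentDir.list()
        String[] files = {"a.pdf", "b.pdf", "a.pdf"};
        System.out.println("duplicates: " + duplicateIds(files));
    }
}
```

An empty result means every document got a distinct ID, so the cause of the single hit probably lies elsewhere.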


BTW, committing after every doc is
generally poor practice in production. I know
you're just testing now, but thought I'd
mention it. Let autocommit handle most of it
and (perhaps) commit once at the end.
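
The shape of "add everything, commit once" can be sketched as below. To keep it runnable without a Solr instance, the Indexer interface is a hypothetical stand-in for the SolrJ client's add and commit calls, and CountingIndexer only counts what it is asked to do. None of these names come from SolrJ.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitOnce {
    /** Hypothetical stand-in for the client's add and commit calls. */
    interface Indexer {
        void add(String id, String text);
        void commit();
    }

    /** In-memory Indexer that only counts calls, so the sketch runs without Solr. */
    static class CountingIndexer implements Indexer {
        final List<String> added = new ArrayList<>();
        int commits = 0;

        public void add(String id, String text) { added.add(id); }

        public void commit() { commits++; }
    }

    /** Adds every document inside the loop, then commits a single time after it. */
    static void indexAll(Indexer indexer, String[] ids) {
        for (String id : ids) {
            indexer.add(id, "extracted text for " + id);
        }
        indexer.commit();
    }

    public static void main(String[] args) {
        CountingIndexer indexer = new CountingIndexer();
        indexAll(indexer, new String[] {"a.pdf", "b.pdf", "c.pdf"});
        System.out.println(indexer.added.size() + " adds, " + indexer.commits + " commit(s)");
    }
}
```

In the real loop the only change would be moving the commit call out of the per-file try block to after the loop.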

Hmmm, silly question perhaps, but are you
absolutely sure that you're querying the same
core you're indexing to? On the same machine?
Sometimes as a sanity check I'll add, say,
a timestamp to the id field (e.g.
doc.addField("id", filename + timestamp)) just to
have something that changes every run.
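
A small sketch of that sanity check, using only the JDK. The class and method names and the "_" separator are assumptions for illustration. The timestamp is captured once per run so all files from one run share it.

```java
public class RunScopedId {
    /** Appends a per-run timestamp so IDs differ between indexing runs. */
    static String runScopedId(String filename, long runTimestamp) {
        return filename + "_" + runTimestamp;
    }

    public static void main(String[] args) {
        long runTimestamp = System.currentTimeMillis(); // captured once per run
        // Hypothetical file names standing in for documentDir.list()
        for (String filename : new String[] {"a.pdf", "b.pdf"}) {
            System.out.println(runScopedId(filename, runTimestamp));
        }
    }
}
```

Because the IDs change on every run, each run adds new documents instead of overwriting the previous ones, so this is only useful as a temporary check that the queried core is the one being written to.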

Best
Erick

On Tue, Mar 11, 2014 at 6:00 AM, Croci  Francesco Luigi (ID SWS)
<fcroci@id.ethz.ch> wrote:
> I followed the example here (http://searchhub.org/2012/02/14/indexing-with-solrj/) for
> indexing all the pdfs in a directory. The process seems to work well, but at the end, when
> I go in the Solr-UI and click on "Execute query" (with q=*:*), I get only one entry.
>
> Am I missing something in my code?
>
>     ...
>     String[] files = documentDir.list();
>
>     if (files != null)
>     {
>       for (String document : files)
>       {
>         ContentHandler textHandler = new BodyContentHandler();
>         Metadata metadata = new Metadata();
>         ParseContext context = new ParseContext();
>         AutoDetectParser autoDetectParser = new AutoDetectParser();
>
>         InputStream inputStream = null;
>
>         try
>         {
>           inputStream = new FileInputStream(new File(documentDir, document));
>
>           autoDetectParser.parse(inputStream, textHandler, metadata, context);
>
>           SolrInputDocument doc = new SolrInputDocument();
>           doc.addField("id", document);
>
>           String content = textHandler.toString();
>
>           if (content != null)
>           {
>             doc.addField("fullText", content);
>           }
>
>           UpdateResponse resp = server.add(doc, 1);
>
>           server.commit(true, true, true);
>
>           if (resp.getStatus() != 0)
>           {
>             throw new IDSystemException(LOG, "Document could not be indexed. Status returned: " + resp.getStatus());
>           }
>         }
>         catch (FileNotFoundException fnfe)
>         {
>           throw new IDSystemException(LOG, fnfe.getMessage(), fnfe);
>         }
>         catch (IOException ioe)
>         {
>           throw new IDSystemException(LOG, ioe.getMessage(), ioe);
>         }
>         catch (SAXException se)
>         {
>           throw new IDSystemException(LOG, se.getMessage(), se);
>         }
>         catch (TikaException te)
>         {
>           throw new IDSystemException(LOG, te.getMessage(), te);
>         }
>         catch (SolrServerException sse)
>         {
>           throw new IDSystemException(LOG, sse.getMessage(), sse);
>         }
>         finally
>         {
>           if (inputStream != null)
>           {
>             try
>             {
>               inputStream.close();
>             }
>             catch (IOException ioe)
>             {
>               throw new IDSystemException(LOG, ioe.getMessage(), ioe);
>             }
>           }
>         }
>        ...
>
> Thank you for any hint.
>
> Francesco
