From: Ravindra
Date: Thu, 10 Jul 2014 17:09:24 +0530
Subject: Re: Read file name in an annotator
To: user@uima.apache.org
Cc: thomas.ginter@utah.edu

Maybe this helps. In the collection reader, store the location of the source document in the CAS.

Imports used by both snippets:

    import java.io.File;
    import java.net.MalformedURLException;
    import java.net.URISyntaxException;
    import java.net.URL;
    import org.apache.uima.cas.FSIterator;
    import org.apache.uima.examples.SourceDocumentInformation;
    import org.apache.uima.jcas.tcas.Annotation;

In the reader's getNext() (file is the current input File):

    // Also store location of source document in CAS. This information is critical
    // if CAS Consumers will need to know where the original document contents are located.
    // For example, the Semantic Search CAS Indexer writes this information into the
    // search index that it creates, which allows applications that use the search index to
    // locate the documents that satisfy their semantic queries.
    SourceDocumentInformation srcDocInfo = new SourceDocumentInformation(jcas);
    // File.toURI() escapes characters such as spaces; the deprecated File.toURL() does not
    srcDocInfo.setUri(file.getAbsoluteFile().toURI().toURL().toString());
    srcDocInfo.setOffsetInSource(0);
    srcDocInfo.setDocumentSize((int) file.length());
    srcDocInfo.setLastSegment(mCurrentIndex == mFiles.size());
    srcDocInfo.addToIndexes();

followed by, in the annotator or CAS consumer:

    // retrieve the filename of the input file from the CAS
    FSIterator<Annotation> it = jcas.getAnnotationIndex(SourceDocumentInformation.type).iterator();
    File outFile = null;
    if (it.hasNext()) {
      SourceDocumentInformation fileLoc = (SourceDocumentInformation) it.next();
      try {
        // URL -> URI -> File decodes escapes such as %20 back into the real path
        File inFile = new File(new URL(fileLoc.getUri()).toURI());
        String outFileName = inFile.getName();
        if (fileLoc.getOffsetInSource() > 0) {
          outFileName += ("_" + fileLoc.getOffsetInSource());
        }
        outFileName += ".xmi";
        outFile = new File(mOutputDir, outFileName);
        modelFileName = mOutputDir.getAbsolutePath() + "/" + inFile.getName() + ".ecore";
      } catch (MalformedURLException | URISyntaxException | IllegalArgumentException e1) {
        // invalid URL, use default processing below
      }
    }

Look for SourceDocumentInformation in the UIMA examples (e.g. FileSystemCollectionReader).

--
Ravi.
*"We do not inherit the earth from our ancestors, we borrow it from our children." PROTECT IT!*

On Thu, Jul 10, 2014 at 4:49 PM, Debbie Zhang wrote:

> Thanks Thomas. May I ask if there is any sample code for a UIMA reader that
> provides file name information for use when developing annotators? I was
> looking on the internet today, but couldn't find one. Thanks again for your
> help - much appreciated!
>
> Regards,
>
> Debbie Zhang
>
> > -----Original Message-----
> > From: Thomas Ginter [mailto:thomas.ginter@utah.edu]
> > Sent: Thursday, 10 July 2014 5:00 AM
> > To: user@uima.apache.org
> > Subject: Re: Read file name in an annotator
> >
> > Hi Debbie,
> >
> > The file name is not provided by default in UIMA, although I believe the
> > UIMA FileReader does populate a SourceDocumentInformation annotation
> > with this information.
> > Our group has a set of readers that populate
> > our own annotation type to provide location data and other
> > meta-information for each record (CAS) being processed. In short, you will
> > be better off writing your own reader to provide that information for you.
> >
> > Thanks,
> >
> > Thomas Ginter
> > 801-448-7676
> > thomas.ginter@utah.edu
> >
> > On Jul 9, 2014, at 5:41, Debbie Zhang wrote:
> >
> > > Hi,
> > >
> > > Can anyone tell me how to read the file name in an annotator using
> > > the JCas? It seems the DocumentAnnotation doesn't contain the file name.
> > > Thank you!
> > >
> > > Best regards,
> > >
> > > Debbie Zhang
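If you only need the file name itself, the URI-to-name step can be tried out without UIMA. Below is a minimal, stdlib-only sketch. `SourceFileNames` and its methods are hypothetical helpers (not part of UIMA) that mirror the reader/consumer code in this thread, assuming the reader stores the string produced by `File.toURI()` and the annotator receives that same string back from `SourceDocumentInformation.getUri()`.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper, NOT part of UIMA: mirrors the naming logic of the
// consumer snippet so it can be tested without a CAS.
final class SourceFileNames {

    private SourceFileNames() {}

    // Reader side (assumption: this is what gets passed to setUri()).
    // File.toURI() percent-escapes characters such as spaces.
    static String toUriString(File file) {
        return file.getAbsoluteFile().toURI().toString();
    }

    // Annotator side: recover just the file name, without directories.
    // URI.getPath() decodes escapes such as %20; URI.create throws an
    // unchecked IllegalArgumentException on a malformed string.
    static String fileName(String uri) {
        String path = URI.create(uri).getPath();
        return new File(path).getName();
    }

    // Same rule as the example consumer: append "_<offset>" when the CAS holds
    // a segment that does not start at the beginning of the file.
    static String outputFileName(String uri, int offsetInSource) {
        String name = fileName(uri);
        if (offsetInSource > 0) {
            name += "_" + offsetInSource;
        }
        return name + ".xmi";
    }
}
```

In an annotator you would call something like `SourceFileNames.fileName(fileLoc.getUri())` after fetching the SourceDocumentInformation from the JCas. Note that a URI string written with the old, unescaped `File.toURL()` may contain raw spaces, which `URI.create` rejects.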