asterixdb-dev mailing list archives

From Chen Li <che...@gmail.com>
Subject Re: loading CSV records with comma in the value
Date Mon, 27 Jul 2015 05:47:13 GMT
I added the following line

("quote"="\"")

to the load statement, but the problem remains: it mistakenly used the
"," in the "authors" field to break the record.

@Taewoo: can you try the simple AQL example I included in this thread
to see if it can parse the quoted field correctly?
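
For reference, here is a sketch of the load statement from this thread with the "quote" line added. Placing the parameter after the "delimiter" entry is an assumption; the thread shows only the added line, not where it was put.

```
use dataverse pubs;

// Assumption: "quote" is appended as the last parameter in the list.
// The path and the other parameters are copied from the example quoted below.
load dataset Papers
using localfs
(("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
 ("format"="delimited-text"),
 ("delimiter"=","),
 ("quote"="\""));

for $paper in dataset('Papers')
return $paper;
```

Note that the sample record has a space between the comma and the opening quote (14, "John Smith, Mary Reeve"). Whether that space affects quote detection is not established in this thread.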

Chen

On Sun, Jul 26, 2015 at 1:25 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
> We have test cases for this case. They are located in
> asterix-app/src/test/resources/runtimets/queries/load/.  The documentation
> is in /asterix-doc/src/site/markdown/csv.md. The additional syntax for
> CSV is fairly simple: there are just two additional parameters, "quote" and
> "header". Refer to the file for more details.
>
>
>
> Best,
> Taewoo
>
> On Sat, Jul 25, 2015 at 11:30 PM, Chen Li <chenli@gmail.com> wrote:
>
>> @Taewoo: I tried it and it has the same problem.  Do you have a test
>> case for this feature?  Also do we have documentation for this syntax?
>>
>> Chen
>>
>> On Sat, Jul 25, 2015 at 10:52 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>> > The URL is https://asterixdb.ics.uci.edu/documentation/aql/primer.html.
>> >
>> >
>> > It should look like this:
>> >
>> > ////
>> > use dataverse pubs;
>> >
>> > create type PaperType as open {
>> >    id: int32,
>> >    authors: string
>> > }
>> >
>> > create dataset Papers(PaperType) primary key id;
>> >
>> > load dataset Papers
>> >      using localfs
>> > (("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
>> >    ("format"="delimited-text"),
>> >    ("delimiter"=","));
>> >
>> > for $paper in dataset('Papers')
>> > return $paper;
>> >
>> >
>> >
>> > Best,
>> > Taewoo
>> >
>> > On Sat, Jul 25, 2015 at 10:47 PM, Chen Li <chenli@gmail.com> wrote:
>> >
>> >> @Taewoo: can you send me the syntax or the documentation URL to show the
>> >> syntax?
>> >>
>> >> Chen
>> >>
>> >> On Sat, Jul 25, 2015 at 3:27 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>> >> > Can you try to load it into an internal dataset? I think I have
>> >> > implemented the "comma between the comma (delimiter)" handling when
>> >> > modifying the delimited data parser. Chris also modified that part.
>> >> > If it doesn't work, I can look at the issue.
>> >> >
>> >> > Best,
>> >> > Taewoo
>> >> >
>> >> > On Sat, Jul 25, 2015 at 1:51 PM, Chen Li <chenli@gmail.com> wrote:
>> >> >
>> >> >> Not sure if this topic was discussed before.  I was trying to load an
>> >> >> external CSV file using "," as the delimiter.  But the engine failed to
>> >> >> read a file with the following single record:
>> >> >>
>> >> >> 14, "John Smith, Mary Reeve"
>> >> >>
>> >> >>
>> >> >> use dataverse pubs;
>> >> >>
>> >> >>    create type PaperType as open {
>> >> >>       id: int32,
>> >> >>        authors: string
>> >> >>    }
>> >> >>
>> >> >> create external dataset Papers(PaperType)
>> >> >>    using localfs
>> >> >> (("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
>> >> >>    ("format"="delimited-text"),
>> >> >>    ("delimiter"=","));
>> >> >>
>> >> >> for $paper in dataset('Papers')
>> >> >> return $paper;
>> >> >>
>> >> >> The following is the output, which shows that the comma in the authors
>> >> >> field was incorrectly used to break the field.  Any idea about how to
>> >> >> fix it?
>> >> >>
>> >> >> Output
>> >> >> Results:
>> >> >>
>> >> >> { "id": 14, "authors": " \"John Smith" }
>> >> >>
>> >> >> Duration of all jobs: 0.091 sec
>> >> >>
>> >> >> Success: Query Complete
>> >> >>
>> >>
>>
