druid-dev mailing list archives

From Y H <yurim2...@gmail.com>
Subject Re: druid can't parse string
Date Fri, 16 Jul 2021 16:57:05 GMT
Thanks!
But I still have a problem.

I succeeded in storing strings as UTF-8 with inline text ingestion. But when I
try a batch ingestion from a CSV file, the text comes out garbled.

The problem seems to happen when the CSV is read. Should I convert the CSV file
to a text file? And if I ingest batch data from a text file, which parser type
should I choose (still .*csv)?
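
Since the advice quoted below is that text input should be UTF-8, one thing to try before changing the file type is re-encoding the CSV itself. The following is a minimal sketch, not a confirmed fix: the thread does not say what encoding the CSV is actually in, so the default of CP949 (common for Korean spreadsheet exports) is only an assumption, and the function name `reencode_to_utf8` is hypothetical.

```python
def reencode_to_utf8(src, dst, source_encoding="cp949"):
    """Rewrite the text file `src` as UTF-8 at `dst`.

    `source_encoding` is an ASSUMPTION about the input file; replace it
    with the real encoding of your CSV if it is known.
    """
    # newline="" disables newline translation, so line endings (including
    # newlines inside quoted CSV fields) are copied through unchanged.
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
```

A call such as `reencode_to_utf8("input.csv", "input-utf8.csv")` (both file names are placeholders) would raise `UnicodeDecodeError` if the assumed source encoding is wrong, which is itself a useful check. The output is still a CSV, so under this approach the parser choice would not need to change.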



On Sat, Jul 17, 2021 at 1:46 AM, Gian Merlino <gian@apache.org> wrote:

> Including the original poster in case they are not on the dev list
> themselves (hello!).
>
> On Fri, Jul 16, 2021 at 9:44 AM Gian Merlino <gian@apache.org> wrote:
>
>> Druid stores strings as UTF-8 and from a storage and query basis, it
>> should work fine with any language. The
>> "wikiticker-2015-09-12-sampled.json.gz" dataset used for the tutorial has
>> strings in a variety of languages (check the "page" field):
>> https://druid.apache.org/docs/latest/tutorials/index.html
>>
>> So I wonder if there is an encoding problem with reading your input data?
>> If it's in a text format, it should be encoded as UTF-8 for Druid to be
>> able to read it properly.
>>
>> On Fri, Jul 16, 2021 at 7:51 AM Y H <yurim2220@gmail.com> wrote:
>>
>>> Hi, I am using Druid to develop an analytics web app,
>>> and I found that Druid can't parse languages other than English.
>>>
>>> [image: image.png]
>>>
>>> Is there a UTF-8 option, or a way to parse strings correctly?
>>>
>>> I have attached my Druid environment file.
>>> Please let me know how to parse strings in Druid.
>>>
>>> Thanks.
>>>
>>>
>>>
>>> environment
>>> ___________________________________________________
>>> DRUID_XMS=1g
>>> DRUID_MAXNEWSIZE=250m
>>> DRUID_NEWSIZE=250m
>>> DRUID_MAXDIRECTMEMORYSIZE=6172m
>>>
>>> druid_emitter_logging_logLevel=debug
>>>
>>> druid_extensions_loadList=["druid-stats","druid-histogram",
>>> "druid-datasketches", "druid-lookups-cached-global",
>>> "postgresql-metadata-storage", "druid-kafka-indexing-service",
>>> "druid-kafka-extraction-namespace"]
>>>
>>> druid_zk_service_host=zookeeper
>>>
>>> # kafka config
>>> listeners=PLAINTEXT://211.253.8.155:59092
>>>
>>>
>>> # druid_metadata_storage_host=
>>> druid_metadata_storage_type=postgresql
>>>
>>> druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
>>> druid_metadata_storage_connector_user=druid
>>> druid_metadata_storage_connector_password=FoolishPassword
>>>
>>> druid_coordinator_balancer_strategy=cachingCost
>>>
>>> druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g",
>>> "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC",
>>> "-Dfile.encoding=UTF-8",
>>> "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
>>> druid_indexer_fork_property_druid_processing_buffer_sizeBytes=268435456
>>>
>>> druid_storage_type=local
>>> druid_storage_storageDirectory=/opt/data/segments
>>> druid_indexer_logs_type=file
>>> druid_indexer_logs_directory=/opt/data/indexing-logs
>>>
>>> druid_processing_numThreads=2
>>> druid_processing_numMergeBuffers=2
>>>
>>>
>>> DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration
>>> status="WARN"><Appenders><Console name="Console"
>>> target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c -
>>> %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef
>>> ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog"
>>> additivity="false" level="DEBUG"><AppenderRef
>>> ref="Console"/></Logger></Loggers></Configuration>
>>>
>>>
