Unicode Han Database (Unihan)
CEDICT
CEDICT CEDICT Format Hanzi to Pinyin Online Tool
Chinese to Pinyin
Cjklib
Make sure SQLite 3+
and SQLAlchemy 0.5+
are installed.
The following code:
# -*- coding: utf-8 -*-
import cjklib
from cjklib.characterlookup import CharacterLookup
c = u'好'
cjk = CharacterLookup('T')
readings = cjk.getReadingForCharacter(c, 'Pinyin')
for r in readings:
print r
produce:
hāo
hǎo
hào
>>> from cjklib import characterlookup
>>> cjk = characterlookup.CharacterLookup('C')
>>> cjk.getStrokeOrder(u'说')
[u'\u31d4', u'\u31ca', u'\u31d4', u'\u31d2', u'\u31d1', u'\u31d5', u'\u31d0', u'\u31d3', u'\u31df']
The code for Access a dictionary
in pypi page does work.
>>> from cjklib.dictionary import EDICT
>>> d = EDICT()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/cjklib-0.3.2-py2.7.egg/cjklib/dictionary/__init__.py", line 271, in __init__
% self.DICTIONARY_TABLE)
Stackoverfow has a solution
>>> from cjklib.dictionary import CEDICT
>>> from cjklib.dbconnector import getDBConnector
>>> db = getDBConnector({'sqlalchemy.url': 'sqlite://', 'attach': ['cjklib']})
>>> d=CEDICT(dbConnectInst=db)
>>> it=d.getFor(u'朋友')
>>> it
<itertools.imap object at 0x7fa2d30cb050>
>>> for x in it:
... print x
...
EntryTuple(HeadwordTraditional=u'\u670b\u53cb',
HeadwordSimplified=u'\u670b\u53cb', Reading=u'p\xe9ng you',
Translation=u'/friend/CL:\u500b|\u4e2a[ge4],\u4f4d[wei4]/')
Encoding Conversion
JSON uses UTF-8 by default. For Unicode code points in the range of U+0000 to U+FFFF, a single esape is enough. For code points in the range of U+10000 to U+10FFFF, two escapes using UTF-16 are needed.
Face with tears of joy: 😂
Unicode code point: U+1F602
UTF-16:
encoding: D83D DE02
escapes: \uD83D\uDE02
showed in xxd: 3DD8 02DE
showed in xxd with a BOM: FFFE 3DD8 02DE, BOM is FEFF
UTF-8: F09F9882
showed with xxd -u:
0000000: F09F 9882