Запись кириллицы и других не ASCII символов в формате JSON
На лекции был пример записи данных, которые содержат кириллицу в файл формате JSON. По умолчанию, вместо кириллицы, мы получили строку с кодами Unicode.
Подготовка данных
Пример строки в формате JSON:
In [10]: data = '{"login":"natenka","id":15850513,"avatar_url":"https://avatars0.githubus
...: ercontent.com/u/15850513?v=4","gravatar_id":"","url":"https://api.github.com/use
...: rs/natenka","html_url":"https://github.com/natenka","followers_url":"https://api
...: .github.com/users/natenka/followers","following_url":"https://api.github.com/use
...: rs/natenka/following{/other_user}","gists_url":"https://api.github.com/users/nat
...: enka/gists{/gist_id}","starred_url":"https://api.github.com/users/natenka/starre
...: d{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/natenka/subs
...: criptions","organizations_url":"https://api.github.com/users/natenka/orgs","repo
...: s_url":"https://api.github.com/users/natenka/repos","events_url":"https://api.gi
...: thub.com/users/natenka/events{/privacy}","received_events_url":"https://api.gith
...: ub.com/users/natenka/received_events","type":"User","site_admin":false,"name":"Н
...: аташа Самойленко","company":null,"blog":"https://natenka.github.io/","location":
...: null,"email":"natasha.samoylenko@gmail.com","hireable":null,"bio":null,"public_r
...: epos":11,"public_gists":2,"followers":49,"following":27,"created_at":"2015-11-14
...: T20:32:44Z","updated_at":"2017-09-27T17:27:19Z","private_gists":0,"total_private
...: _repos":0,"owned_private_repos":0,"disk_usage":53691,"collaborators":0,"two_fact
...: or_authentication":false,"plan":{"name":"free","space":976562499,"collaborators"
...: :0,"private_repos":0}}'
Вариант для копирования:
data ='{"login":"natenka","id":15850513,"avatar_url":"https://avatars0.githubusercontent.com/u/15850513?v=4","gravatar_id":"","url":"https://api.github.com/users/natenka","html_url":"https://github.com/natenka","followers_url":"https://api.github.com/users/natenka/followers","following_url":"https://api.github.com/users/natenka/following{/other_user}","gists_url":"https://api.github.com/users/natenka/gists{/gist_id}","starred_url":"https://api.github.com/users/natenka/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/natenka/subscriptions","organizations_url":"https://api.github.com/users/natenka/orgs","repos_url":"https://api.github.com/users/natenka/repos","events_url":"https://api.github.com/users/natenka/events{/privacy}","received_events_url":"https://api.github.com/users/natenka/received_events","type":"User","site_admin":false,"name":"Наташа Самойленко","company":null,"blog":"https://natenka.github.io/","location":null,"email":"natasha.samoylenko@gmail.com","hireable":null,"bio":null,"public_repos":11,"public_gists":2,"followers":49,"following":27,"created_at":"2015-11-14T20:32:44Z","updated_at":"2017-09-27T17:27:19Z","private_gists":0,"total_private_repos":0,"owned_private_repos":0,"disk_usage":53691,"collaborators":0,"two_factor_authentication":false,"plan":{"name":"free","space":976562499,"collaborators":0,"private_repos":0}}'
Для начала, получаем словарь Python из строки с помощью метода loads:
In [11]: import json
In [12]: py_data = json.loads(data)
In [13]: py_data
Out[13]:
{'avatar_url': 'https://avatars0.githubusercontent.com/u/15850513?v=4',
'bio': None,
'blog': 'https://natenka.github.io/',
'collaborators': 0,
'company': None,
'created_at': '2015-11-14T20:32:44Z',
'disk_usage': 53691,
'email': 'natasha.samoylenko@gmail.com',
'events_url': 'https://api.github.com/users/natenka/events{/privacy}',
'followers': 49,
'followers_url': 'https://api.github.com/users/natenka/followers',
'following': 27,
'following_url': 'https://api.github.com/users/natenka/following{/other_user}',
'gists_url': 'https://api.github.com/users/natenka/gists{/gist_id}',
'gravatar_id': '',
'hireable': None,
'html_url': 'https://github.com/natenka',
'id': 15850513,
'location': None,
'login': 'natenka',
'name': 'Наташа Самойленко',
'organizations_url': 'https://api.github.com/users/natenka/orgs',
'owned_private_repos': 0,
'plan': {'collaborators': 0,
'name': 'free',
'private_repos': 0,
'space': 976562499},
'private_gists': 0,
'public_gists': 2,
'public_repos': 11,
'received_events_url': 'https://api.github.com/users/natenka/received_events',
'repos_url': 'https://api.github.com/users/natenka/repos',
'site_admin': False,
'starred_url': 'https://api.github.com/users/natenka/starred{/owner}{/repo}',
'subscriptions_url': 'https://api.github.com/users/natenka/subscriptions',
'total_private_repos': 0,
'two_factor_authentication': False,
'type': 'User',
'updated_at': '2017-09-27T17:27:19Z',
'url': 'https://api.github.com/users/natenka'}
По умолчанию non-ASCII символы записываются как последовательность кодов юникод
Запись словаря в файл в формате JSON
In [16]: with open('try_unicode.json', 'w') as f:
...: json.dump(py_data, f, indent=2)
...:
Итоговый результат:
In [17]: cat try_unicode.json
{
"login": "natenka",
"id": 15850513,
"avatar_url": "https://avatars0.githubusercontent.com/u/15850513?v=4",
"gravatar_id": "",
"url": "https://api.github.com/users/natenka",
"html_url": "https://github.com/natenka",
"followers_url": "https://api.github.com/users/natenka/followers",
"following_url": "https://api.github.com/users/natenka/following{/other_user}",
"gists_url": "https://api.github.com/users/natenka/gists{/gist_id}",
"starred_url": "https://api.github.com/users/natenka/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/natenka/subscriptions",
"organizations_url": "https://api.github.com/users/natenka/orgs",
"repos_url": "https://api.github.com/users/natenka/repos",
"events_url": "https://api.github.com/users/natenka/events{/privacy}",
"received_events_url": "https://api.github.com/users/natenka/received_events",
"type": "User",
"site_admin": false,
"name": "\u041d\u0430\u0442\u0430\u0448\u0430 \u0421\u0430\u043c\u043e\u0439\u043b\u0435\u043d\u043a\u043e",
"company": null,
"blog": "https://natenka.github.io/",
"location": null,
"email": "natasha.samoylenko@gmail.com",
"hireable": null,
"bio": null,
"public_repos": 11,
"public_gists": 2,
"followers": 49,
"following": 27,
"created_at": "2015-11-14T20:32:44Z",
"updated_at": "2017-09-27T17:27:19Z",
"private_gists": 0,
"total_private_repos": 0,
"owned_private_repos": 0,
"disk_usage": 53691,
"collaborators": 0,
"two_factor_authentication": false,
"plan": {
"name": "free",
"space": 976562499,
"collaborators": 0,
"private_repos": 0
}
}
Обратите внимание на ключ name:
"name": "\u041d\u0430\u0442\u0430\u0448\u0430 \u0421\u0430\u043c\u043e\u0439\u043b\u0435\u043d\u043a\u043e"
Если этот файл будет использоваться только скриптом, никаких проблем не будет. Мы можем считать его и получить тот же словарь в Python с кириллицей:
In [26]: with open('try_unicode.json') as f:
...: result = json.load(f)
...:
In [27]: result
Out[27]:
{'avatar_url': 'https://avatars0.githubusercontent.com/u/15850513?v=4',
'bio': None,
'blog': 'https://natenka.github.io/',
'collaborators': 0,
'company': None,
'created_at': '2015-11-14T20:32:44Z',
'disk_usage': 53691,
'email': 'natasha.samoylenko@gmail.com',
'events_url': 'https://api.github.com/users/natenka/events{/privacy}',
'followers': 49,
'followers_url': 'https://api.github.com/users/natenka/followers',
'following': 27,
'following_url': 'https://api.github.com/users/natenka/following{/other_user}',
'gists_url': 'https://api.github.com/users/natenka/gists{/gist_id}',
'gravatar_id': '',
'hireable': None,
'html_url': 'https://github.com/natenka',
'id': 15850513,
'location': None,
'login': 'natenka',
'name': 'Наташа Самойленко',
'organizations_url': 'https://api.github.com/users/natenka/orgs',
'owned_private_repos': 0,
'plan': {'collaborators': 0,
'name': 'free',
'private_repos': 0,
'space': 976562499},
'private_gists': 0,
'public_gists': 2,
'public_repos': 11,
'received_events_url': 'https://api.github.com/users/natenka/received_events',
'repos_url': 'https://api.github.com/users/natenka/repos',
'site_admin': False,
'starred_url': 'https://api.github.com/users/natenka/starred{/owner}{/repo}',
'subscriptions_url': 'https://api.github.com/users/natenka/subscriptions',
'total_private_repos': 0,
'two_factor_authentication': False,
'type': 'User',
'updated_at': '2017-09-27T17:27:19Z',
'url': 'https://api.github.com/users/natenka'}
Параметр ensure_ascii
Но, если этот файл нужно будет читать и человеку, то лучше чтобы все non-ASCII символы отображались нормально.
За это отвечает параметр ensure_ascii:
In [28]: with open('try_unicode.json', 'w') as f:
...: json.dump(py_data, f, indent=2, ensure_ascii=False)
...:
Теперь кириллица записана нормально:
In [29]: cat try_unicode.json
{
"login": "natenka",
"id": 15850513,
"avatar_url": "https://avatars0.githubusercontent.com/u/15850513?v=4",
"gravatar_id": "",
"url": "https://api.github.com/users/natenka",
"html_url": "https://github.com/natenka",
"followers_url": "https://api.github.com/users/natenka/followers",
"following_url": "https://api.github.com/users/natenka/following{/other_user}",
"gists_url": "https://api.github.com/users/natenka/gists{/gist_id}",
"starred_url": "https://api.github.com/users/natenka/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/natenka/subscriptions",
"organizations_url": "https://api.github.com/users/natenka/orgs",
"repos_url": "https://api.github.com/users/natenka/repos",
"events_url": "https://api.github.com/users/natenka/events{/privacy}",
"received_events_url": "https://api.github.com/users/natenka/received_events",
"type": "User",
"site_admin": false,
"name": "Наташа Самойленко",
"company": null,
"blog": "https://natenka.github.io/",
"location": null,
"email": "natasha.samoylenko@gmail.com",
"hireable": null,
"bio": null,
"public_repos": 11,
"public_gists": 2,
"followers": 49,
"following": 27,
"created_at": "2015-11-14T20:32:44Z",
"updated_at": "2017-09-27T17:27:19Z",
"private_gists": 0,
"total_private_repos": 0,
"owned_private_repos": 0,
"disk_usage": 53691,
"collaborators": 0,
"two_factor_authentication": false,
"plan": {
"name": "free",
"space": 976562499,
"collaborators": 0,
"private_repos": 0
}
}
Leave a Comment