[R: New Features on pinyin] Convert Chinese Characters into Sijiao and Wubi codes

in #utopian-io, 5 years ago (edited)

Repo

https://github.com/pzhaonet/pinyin

Brief Intro and curriculum

The 'pinyin' package is written in R. It converts Chinese characters into Latin letters, officially called pinyin, i.e. the romanization system for Standard Chinese in mainland China, Malaysia, Singapore, and Taiwan. A brief introduction can be found in the post pinyin: an R package that converts Chinese characters into Latin letters.

New Features

What features did I add?

  • Conversion is now about four times faster.
  • At the beginning of 2018 I received an issue report from psychelzh about a polyphone error. A new pinyin library has now been added, which largely solves the polyphone problem.
  • Convert Chinese characters into Sijiao codes (literally 'four-corner' codes).
  • Convert Chinese characters into Wubi codes (literally 'five-stroke' codes).
  • Some minor bugs were fixed.


Figure 1: Test the new features in RStudio IDE

How did I implement them?

  • Following Qu Cheng's suggestions in personal communications, I changed the pylib() function so that it converts the pinyin library into an environment, which speeds up conversion.
  • A new pinyin library '/inst/lib/zh2.txt' was added, and a parameter dic = c('zh', 'zh2') in the pylib() function lets users choose their preferred library for handling polyphones.
  • The new functions fclib() and four_corner() import a four-corner library and convert Chinese characters into four-corner codes, following Qu Cheng's suggestions.
  • A new function wubi() imports a five-stroke library and converts Chinese characters into five-stroke codes, again following Qu Cheng's suggestions.
  • The downstream functions bookdown2py(), file.rename2py(), and file2py() were updated to support the changes mentioned above.
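
As a rough illustration of why an environment helps, here is a minimal sketch in R of an environment-based lookup. It does not reproduce the package's actual code: build_lib(), convert(), and the two-entry toy library are hypothetical names and data made up for this example, and keying the environment by Unicode code point is a choice made here for portability, not necessarily what pylib() does.

```r
# Toy library: two characters and their pinyin. The entries are illustrative
# only and are not read from the package's library files.
toy_chars  <- c("\u6625", "\u7720")   # the characters for "chun1" and "mian2"
toy_pinyin <- c("chun1", "mian2")

# Build a hashed environment keyed by Unicode code point (as a string).
# Lookups in a hashed environment do not scan the whole library.
build_lib <- function(chars, codes) {
  lib <- new.env(hash = TRUE, parent = emptyenv())
  for (i in seq_along(chars)) {
    assign(as.character(utf8ToInt(chars[i])), codes[i], envir = lib)
  }
  lib
}

# Convert a string one character at a time. Characters that are missing
# from the library are passed through unchanged.
convert <- function(x, lib, sep = "_") {
  out <- vapply(utf8ToInt(x), function(cp) {
    key <- as.character(cp)
    if (exists(key, envir = lib, inherits = FALSE)) {
      get(key, envir = lib, inherits = FALSE)
    } else {
      intToUtf8(cp)
    }
  }, character(1))
  paste(out, collapse = sep)
}

toy_lib <- build_lib(toy_chars, toy_pinyin)
result  <- convert("\u6625\u7720", toy_lib)
print(result)   # "chun1_mian2"
```

A different library (for example one with alternative polyphone readings, or four-corner codes) could be swapped in simply by passing another environment as the lib argument, which is roughly the role the dic parameter plays in the description above.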

Every function is documented. Other files were updated automatically during compilation.

Links to the relevant lines of code on GitHub can be found mainly in my latest commit (click to see the details):

GitHub Account

https://github.com/pzhaonet


Thank you for your contribution. Converting Chinese characters to Five-Stroke codes is very useful, and I suggest adding Five Stroke 86 as well, because many users, like me, use Five Stroke 86 instead of 98. Anyway, it is a nice piece of work!

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

[utopian-moderator]

Thank you for your fast review and kind suggestion!

The new features for converting to Four-corner and Five-stroke codes were requested by users, although I never use them myself. Supporting Five-stroke 86 would surely make the pinyin package more useful. In a future version, pinyin will be more flexible and allow users to customize their own dictionaries. I am afraid the pinyin package will then have to be renamed 'zidian'.

Thank you.

Thank you for your review, @justyy!

So far this week you've reviewed 1 contribution. Keep up the good work!


Long time no see, Teacher Dapeng 😄! Just dropping by to say hello!

Hello, San Ge! I often read your posts, I just haven't left any comments. I've been too lazy lately, sorry :)

Haha! You're too kind, Teacher Dapeng 😄!
