A language character counting tool
According to the latest Utopian rules, any contribution in the 'translation' category must translate at least a minimum number of words. If the translator works in Crowdin, the word count is easy to obtain. However, if the translator works on a GitHub project directly, it is harder to count the characters of a particular language, because translation files usually contain characters from multiple languages. I have implemented a tool to do this job. It is written in Python and has been tested on Ubuntu 16.
Implementation
The basic idea is to analyze the text and check each character against the Unicode ranges for the target language. In principle, the script works with any language: just edit the configuration file. To make it handy for both translators and moderators, the tool supports counting both an individual file and all files contained in a folder.
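The idea above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository: the LOCALE_RANGES table and the count_chars function are hypothetical names, and the real tool reads its ranges from a configuration file whose format is not shown in this post. The zh-CN entry covers only the basic CJK Unified Ideographs block.

```python
# Hypothetical configuration: locale -> list of inclusive Unicode code point ranges.
# The real tool loads its ranges from a configuration file; this table is an assumption.
LOCALE_RANGES = {
    "zh-CN": [(0x4E00, 0x9FFF)],  # CJK Unified Ideographs (basic block only)
    "ru": [(0x0400, 0x04FF)],     # Cyrillic
}


def count_chars(text, locale):
    """Count the characters in text whose code point falls in a range for locale."""
    ranges = LOCALE_RANGES[locale]
    return sum(
        1
        for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in ranges)
    )


sample = "File: 文件\nEdit:编辑\nHelp:帮助"
print(count_chars(sample, "zh-CN"))  # 6
```

Latin letters, colons, spaces, and newlines fall outside the configured range, so only the six Chinese characters are counted.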
Test
I have written a couple of tests to validate that the tool works, and all of them pass.
$ python test.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.002s
OK
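The contents of test.py are not shown in this post. As a rough idea of what two such tests could look like, here is a sketch using the standard unittest module. The count_cjk function is a hypothetical stand-in for the tool's counter, and the test cases are assumptions rather than the ones in the repository.

```python
import unittest


def count_cjk(text):
    """Stand-in counter (an assumption): counts CJK Unified Ideographs only."""
    return sum(1 for ch in text if 0x4E00 <= ord(ch) <= 0x9FFF)


class CounterTest(unittest.TestCase):
    def test_mixed_text(self):
        # Only the two Chinese characters should be counted.
        self.assertEqual(count_cjk("File: 文件"), 2)

    def test_no_target_chars(self):
        # Pure ASCII text contains no characters in the target range.
        self.assertEqual(count_cjk("File: Edit: Help:"), 0)


# Run the two tests explicitly and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CounterTest)
result = unittest.TextTestRunner().run(suite)
```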
How to use
First, clone this repository to your PC.
Then modify the first line of wordcounter.py so that it points to your Python interpreter:
#!/home/yuxi/environments/myenv/bin/python
To count an individual file, run:
/YOUR_FOLDER/wordcounter.py FILENAME locale
To count all files within a folder, run:
/YOUR_FOLDER/wordcounter.py FOLDER locale
For example, if samples/1.yml has the following content:
zh-CN:
File: 文件
Edit:编辑
Help:帮助
Then run the command:
./wordcounter.py samples/1.yml zh-CN
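It returns 6, the number of Chinese characters in the file. The Latin letters, colons, and spaces are not counted.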
To count Chinese characters in all files within a folder, run:
./wordcounter.py samples/ zh-CN
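Handling a file and a folder with the same command can be sketched as follows. This is a minimal illustration assuming Python 3 and UTF-8 encoded files. The names count_cjk and count_path are hypothetical and are not taken from wordcounter.py, and the range is hard-coded to Chinese for brevity.

```python
import os


def count_cjk(text):
    """Count CJK Unified Ideographs in text (hard-coded range, an assumption)."""
    return sum(1 for ch in text if 0x4E00 <= ord(ch) <= 0x9FFF)


def count_path(path):
    """Count characters in one file, or in every file below a folder."""
    if os.path.isfile(path):
        # Undecodable bytes are replaced rather than raising an error.
        with open(path, encoding="utf-8", errors="replace") as fh:
            return count_cjk(fh.read())
    total = 0
    # os.walk visits the folder and all of its subfolders.
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += count_path(os.path.join(root, name))
    return total
```

Calling count_path("samples/1.yml") would count a single file, while count_path("samples/") would add up the counts of every file below that folder.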
The tool is available here: https://github.com/yuxir/wordcounter
To prove that this is my own work, I have edited the README in the GitHub repository to include my Steemit URL.
Posted on Utopian.io - Rewarding Open Source Contributors
Hey @yuxi I am @utopian-io. I have just upvoted you!
Your contribution cannot be approved yet because it is too basic and the source code for this type of contribution is easily available on the internet. See the Utopian Rules. Please edit your contribution to reapply for approval.
You may edit your post here.
You can contact us on Discord.
[utopian-moderator]
The @utopian-io rules don't say what counts as a 'too basic' contribution. Although the basic idea of detecting language characters is not new, I did not find a handy tool that does the same job as my script. Building a new wheel is one kind of contribution; putting wheels and other parts together is also a contribution.
Thank you for the contribution. It has been approved.
Yes, that's true: we do not have a rule that says we can reject a contribution for being 'too basic', but I will make sure one is added to the rules. We do not want a lot of dev contributions that have no real-life scenario. I agree your dev contribution can be of benefit to others, but it is still too basic, as I can find a lot of similar code on the internet.
It would be great to define a more precise rule, so that authors can evaluate their own contributions before posting and moderators can review posts faster. Thanks for your work on Utopian, and have a nice Xmas.
Thanks for sharing.