A language character counting tool

in #utopian-io, 7 years ago (edited)


According to the latest Utopian rules, a contribution in the 'translation' category must translate at least a minimum number of words. Translators working in Crowdin get the word count automatically. Translators working directly on a GitHub project, however, have a harder time counting the characters of one particular language, because translation files usually mix characters from several languages. I have implemented a tool to do this job. It is written in Python and has been tested on Ubuntu 16.

[Image source: pixabay.com]

Implementation

The basic idea is to analyse the text and check each character against the Unicode code point ranges defined for each language. In principle, the script works with any language: just edit the configuration file. To make it handy for both translators and moderators, the tool supports counting both individual files and all files contained in a folder.
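The core check can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repository: the `LANGUAGE_RANGES` table is a hypothetical stand-in for the configuration file (whose real format is not shown in this post), and `count_language_chars` is a hypothetical name. The zh-CN entry uses the CJK Unified Ideographs block (U+4E00 to U+9FFF) and Extension A (U+3400 to U+4DBF).

```python
# Hypothetical stand-in for the tool's configuration file: each locale maps
# to a list of inclusive (start, end) Unicode code point ranges.
LANGUAGE_RANGES = {
    "zh-CN": [
        (0x4E00, 0x9FFF),  # CJK Unified Ideographs
        (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    ],
    "ja": [
        (0x3040, 0x309F),  # Hiragana
        (0x30A0, 0x30FF),  # Katakana
        (0x4E00, 0x9FFF),  # Kanji share the CJK Unified Ideographs block
    ],
}


def count_language_chars(text: str, locale: str) -> int:
    """Count the characters in `text` whose code points fall in the locale's ranges."""
    ranges = LANGUAGE_RANGES[locale]
    return sum(
        1
        for ch in text
        if any(start <= ord(ch) <= end for start, end in ranges)
    )


if __name__ == "__main__":
    sample = "File: 文件, Edit: 编辑"
    print(count_language_chars(sample, "zh-CN"))  # 4
```

Supporting another language then only means adding another entry with its code point ranges, which matches the idea of editing a configuration file rather than the code.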

Test

I have written a couple of tests to check that the tool works, and all of them pass.

$ python test.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK
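The tests themselves are not shown in the post. As an illustration only, a `unittest` file producing output like the above might look like this; `count_chinese_chars` and its ranges are hypothetical and defined inline so the file runs on its own.

```python
import unittest

# Hypothetical function under test; the repository's real module and
# function names are not given in this post.
CJK_RANGES = [(0x4E00, 0x9FFF), (0x3400, 0x4DBF)]


def count_chinese_chars(text):
    """Count characters that fall in the CJK Unified Ideographs blocks."""
    return sum(1 for ch in text if any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES))


class CountTests(unittest.TestCase):
    def test_mixed_yaml_line(self):
        # Only the two Han characters count; the ASCII key and colon are ignored.
        self.assertEqual(count_chinese_chars("  File: 文件"), 2)

    def test_no_target_characters(self):
        # A file with no Chinese characters should count as zero.
        self.assertEqual(count_chinese_chars("zh-CN:\n  File: File"), 0)


if __name__ == "__main__":
    unittest.main()
```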

How to use

First, clone this repository to your PC.

Then edit the first line of wordcounter.py (the shebang line) so that it points to your own Python interpreter:

#!/home/yuxi/environments/myenv/bin/python

To count a single file, run:
/YOUR_FOLDER/wordcounter.py FILENAME locale

To count all files within a folder, run:
/YOUR_FOLDER/wordcounter.py FOLDER locale
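The post does not say how the script distinguishes a file from a folder. One plausible approach, sketched here with the standard library's `os.path.isdir` and `os.walk`, is to count a single file directly and, for a folder, count every file beneath it. The function names and the range table are hypothetical, and files are assumed to be UTF-8.

```python
import os

# Hypothetical range table; the real tool reads its ranges from a config file
# whose format is not described in the post.
LANGUAGE_RANGES = {"zh-CN": [(0x4E00, 0x9FFF), (0x3400, 0x4DBF)]}


def count_text(text, locale):
    """Count characters whose code points fall in the locale's ranges."""
    ranges = LANGUAGE_RANGES[locale]
    return sum(1 for ch in text if any(lo <= ord(ch) <= hi for lo, hi in ranges))


def count_file(path, locale):
    # Assumes UTF-8 files; undecodable bytes become U+FFFD instead of raising.
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_text(f.read(), locale)


def count_path(path, locale):
    """Return {file_path: count} for one file, or for every file under a folder."""
    if not os.path.isdir(path):
        return {path: count_file(path, locale)}
    results = {}
    for root, dirs, files in os.walk(path):
        dirs.sort()  # visit subfolders in a stable order
        for name in sorted(files):
            full_path = os.path.join(root, name)
            results[full_path] = count_file(full_path, locale)
    return results
```

A command-line wrapper could call `count_path(sys.argv[1], sys.argv[2])`, print each per-file count, and print `sum(results.values())` as the folder total. Returning per-file counts rather than only a total lets a moderator see which files contributed to the number.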

For example, if samples/1.yml has the following content:

zh-CN:
  File: 文件
  Edit: 编辑
  Help: 帮助

Then run command:

./wordcounter.py samples/1.yml zh-CN

It returns 6, one for each of the Chinese characters 文, 件, 编, 辑, 帮 and 助.
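To check by hand why the count is 6, one can list each matching character with its code point and Unicode name using the standard `unicodedata` module. This is only a verification aid, not part of the tool, and for simplicity it checks just the main CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
import unicodedata

# Sample content from the samples/1.yml example above.
SAMPLE = """zh-CN:
  File: 文件
  Edit: 编辑
  Help: 帮助
"""

# Keep only characters in the CJK Unified Ideographs block.
matches = [ch for ch in SAMPLE if 0x4E00 <= ord(ch) <= 0x9FFF]

for ch in matches:
    # Prints lines such as "U+6587 CJK UNIFIED IDEOGRAPH-6587 文".
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} {ch}")

print(len(matches))  # 6
```

The locale key `zh-CN` and the English keys contain only ASCII, so they contribute nothing to the count.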

To count the Chinese characters in all files within a folder, run:

./wordcounter.py samples/ zh-CN

It returns:
[Screenshot of the command output]

The tool is available here: https://github.com/yuxir/wordcounter

To show that this is my own work, I have edited the README in the GitHub repository to include my Steemit URL:

[Screenshot of the README containing the Steemit URL]



Posted on Utopian.io - Rewarding Open Source Contributors


Hey @yuxi, I am @utopian-io. I have just upvoted you!

Achievements

  • You have less than 500 followers. Just gave you a gift to help you succeed!
  • Seems like you contribute quite often. AMAZING!

Community-Driven Witness!

I am the first and only Steem Community-Driven Witness. Participate on Discord. Let's GROW TOGETHER!


Up-vote this comment to grow my power and help Open Source contributions like this one. Want to chat? Join me on Discord https://discord.gg/Pc8HG9x

Your contribution cannot be approved yet because it is too basic and the source code for this type of contribution is easily available on the internet. See the Utopian Rules. Please edit your contribution to reapply for approval.


You can contact us on Discord.
[utopian-moderator]

The @utopian-io rules don't define what counts as a 'too basic' contribution. Although the basic idea of detecting language characters is not new, I did not find a handy tool that does the same job as my script. Building a new wheel is one kind of contribution; putting wheels and other parts together is also a contribution.

Thank you for the contribution. It has been approved.

Yes, that's true: we do not have a rule saying that a 'too basic' contribution can be rejected, but I will make sure it is added to the rules. We do not want a lot of dev contributions that have no real-life use case. I agree your dev contribution can benefit others, but it is still too basic, as I can find a lot of similar code on the internet.

You can contact us on Discord.
[utopian-moderator]

It would be great to define a more precise rule; then authors could evaluate their own contributions before posting, and moderators could review posts faster. Thanks for your work on Utopian, and have a nice Xmas.

Thanks for sharing.
