TurkishSuffix 0.3.1 - Major Vowel Exception and Rewrite

in #utopian-io, 5 years ago

Repository

https://github.com/yokunjon/turkishsuffix

New Features

Major Vowel Exception:
Because Turkish has vowel harmony, a suffix has to keep harmony with the word it attaches to: the vowels in the suffix change depending on the last vowel of the word. Handling that is the whole idea of this library. But there is an exception. Many words of Arabic, Persian, or French origin take suffixes as if their last vowel were soft, even when it is hard. For example, "hayal" is of Arabic origin. Its last vowel is the hard "a", yet it takes suffixes with the soft "e" ("hayaller", not "hayallar"). This fools the algorithm, and the major vowel exception solves the problem.

(Mostly) Pythonic Rewrite:
The whole library has been rewritten from scratch. I'm not claiming it is purely Pythonic: the algorithm is complex, and understanding how it works is not easy. But it is better than before. I explain what I changed, and why, in the next section.

Implementation

First things first, I wanted to rewrite the config loader. It was using .ini files for configuration. But as I experimented and learned, I found that a .py file is easier to maintain as a config when there is no chance of user input. So I changed the config accordingly.

This time, instead of a single file, I used a directory and split the config into two files: one for algorithm settings and one for exceptions. I used exec to load the config files. At first I didn't specify any encoding, which caused problems with Turkish letters. I later fixed this by specifying "utf-8" when reading the files passed to exec.
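A loader along those lines might look like the sketch below. It assumes the config files are trusted Python modules; the file names settings.py and exceptions.py, the function load_config, and the sample variables are hypothetical, not the library's actual names. Note that exec itself has no encoding argument, so the encoding has to be given when the file is read.

```python
import pathlib
import tempfile


def load_config(path):
    """Execute a trusted .py config file and return its public top-level names."""
    # Read explicitly as UTF-8 so Turkish letters (ç, ğ, ı, ö, ş, ü) survive.
    source = pathlib.Path(path).read_text(encoding="utf-8")
    namespace = {}
    # exec runs arbitrary code: acceptable only because the config is not user input.
    exec(compile(source, str(path), "exec"), namespace)
    # Drop dunder entries such as __builtins__ that exec adds to the namespace.
    return {name: value for name, value in namespace.items() if not name.startswith("__")}


# Demo: a hypothetical config directory holding the two files.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = pathlib.Path(tmp)
    (config_dir / "settings.py").write_text('HARD_CONSONANTS = "fstkçşhp"\n', encoding="utf-8")
    (config_dir / "exceptions.py").write_text('MAJOR_VOWEL_EXCEPTIONS = {"hayal", "saat"}\n', encoding="utf-8")
    settings = load_config(config_dir / "settings.py")
    exceptions = load_config(config_dir / "exceptions.py")
```

Keeping settings and exceptions in separate files means each one comes back as its own plain dictionary of names.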

Next, I implemented the major vowel exception. This was easy: I only had to include a dictionary of exceptions and check whether a word is in it.
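Here is a minimal sketch of that check, assuming the exception simply makes the hard last vowel behave like its soft counterpart. The word list ("hayal", "saat", "rol"), the helper names, and the use of a frozenset (only membership matters here) are illustrative assumptions, not the library's code.

```python
VOWELS = "aeıioöuü"
FRONT_VOWELS = "eiöü"  # soft vowels
# Soft counterpart of each hard vowel.
FRONT_COUNTERPART = {"a": "e", "ı": "i", "o": "ö", "u": "ü"}
# Four-way harmony: last vowel -> suffix vowel (ı/i/u/ü).
FOUR_WAY = {"a": "ı", "ı": "ı", "o": "u", "u": "u", "e": "i", "i": "i", "ö": "ü", "ü": "ü"}
# Hypothetical sample of loanwords that take soft suffixes despite a hard last vowel.
MAJOR_VOWEL_EXCEPTIONS = frozenset({"hayal", "saat", "rol"})


def last_vowel(word):
    """Return the vowel that drives harmony, honoring the exception list."""
    vowel = next((ch for ch in reversed(word) if ch in VOWELS), None)
    if vowel is None:
        raise ValueError(f"no vowel in {word!r}")
    if word in MAJOR_VOWEL_EXCEPTIONS:
        # Treat the hard last vowel as its soft counterpart.
        return FRONT_COUNTERPART.get(vowel, vowel)
    return vowel


def plural(word):
    return word + ("ler" if last_vowel(word) in FRONT_VOWELS else "lar")


def accusative(word):
    # Consonant-final words only; buffer letters and consonant softening are out of scope.
    if word[-1] in VOWELS:
        raise ValueError("this sketch only handles consonant-final words")
    return word + FOUR_WAY[last_vowel(word)]
```

With this, plural("hayal") gives "hayaller" instead of "hayallar", and accusative("rol") gives "rolü", while ordinary words such as "okul" still get "okullar" and "okulu".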

After that, I started rewriting the main class from scratch. First, I decided to use namedtuples in the configs. Dictionaries were easy to use, but dictionaries nested inside dictionaries weren't, so I changed the config loader accordingly. Then I erased the whole suffix class except for the config imports.
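To illustrate the difference, the sketch below turns a nested dictionary into namedtuples. The suffix names, the template strings, and the field names are hypothetical stand-ins for whatever the real config holds.

```python
from collections import namedtuple

# Hypothetical raw config: dictionaries nested inside a dictionary.
RAW_SUFFIXES = {
    "plural": {"template": "l{}r", "rule_set": ("a", "e")},
    "genitive": {"template": "{}n", "rule_set": ("ı", "i", "u", "ü")},
}

# One record type per suffix, plus a container whose fields are the suffix names.
SuffixRule = namedtuple("SuffixRule", ["template", "rule_set"])
Suffixes = namedtuple("Suffixes", list(RAW_SUFFIXES))

SUFFIXES = Suffixes(**{name: SuffixRule(**fields) for name, fields in RAW_SUFFIXES.items()})

# Attribute access instead of RAW_SUFFIXES["plural"]["rule_set"]:
plural_rules = SUFFIXES.plural.rule_set
```

Besides the shorter access path, namedtuples are immutable, so the algorithm cannot modify the config by accident, and a misspelled name fails loudly with an AttributeError.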

The first thing I reimplemented was, of course, the vowel harmony algorithm. I unpacked the optional part of the rule_set with *. Then, instead of a loop over a list, I used a generator expression to find the last vowel. Finally, I implemented the vowel selection that works with rule_set. In the end, the program returned the correct vowel.
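The post doesn't show the shape of rule_set, so this sketch assumes one: a tuple of (hard unrounded, soft unrounded, *optional rounded) vowels, i.e. ("a", "e") for two-way suffixes and ("ı", "i", "u", "ü") for four-way ones. With that layout, star unpacking leaves the rounded part empty for two-way suffixes. The function name harmonize is hypothetical, and input is assumed to be lowercase.

```python
VOWELS = "aeıioöuü"
FRONT_VOWELS = "eiöü"    # soft vowels
ROUNDED_VOWELS = "oöuü"


def harmonize(word, rule_set):
    """Pick the suffix vowel for a lowercase `word` from a hypothetical rule_set tuple."""
    # Star unpacking: `rounded` is [] for ("a", "e") and ["u", "ü"] for four-way sets.
    back, front, *rounded = rule_set
    # Generator expression: scan from the end and stop at the first vowel.
    last = next((ch for ch in reversed(word) if ch in VOWELS), None)
    if last is None:
        raise ValueError(f"no vowel in {word!r}")
    is_front = last in FRONT_VOWELS
    if rounded and last in ROUNDED_VOWELS:
        back_rounded, front_rounded = rounded
        return front_rounded if is_front else back_rounded
    return front if is_front else back
```

For example, harmonize("okul", ("a", "e")) returns "a" (okullar), harmonize("okul", ("ı", "i", "u", "ü")) returns "u" (okulu), and harmonize("göz", ("ı", "i", "u", "ü")) returns "ü". Because next() stops at the first match, the generator never scans more of the word than it needs to.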

Then I felt the code was a mess, so I modularized it a bit. After that, I added the soft & hard check. I used a guard clause to detect hard and soft changes and returned the changed form only when necessary. Later, I implemented the buffer letter, again with a guard clause. I also had to add a buffer letter exception to two of the suffixes for two Turkish words, "su" and "ne". After that, I simply reimplemented the major vowel exception, which was quite easy.
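A rough sketch of both ideas with guard clauses follows. It assumes the soft & hard check covers consonant hardening such as the locative -de/-da turning into -te/-ta after a hard consonant, and it uses the genitive to show the buffer letter, with "su" and "ne" taking "y" instead of "n". All function and constant names are hypothetical.

```python
VOWELS = "aeıioöuü"
FRONT_VOWELS = "eiöü"
# Hard (voiceless) consonants, remembered in Turkish as "fıstıkçı şahap".
HARD_CONSONANTS = "fstkçşhp"
FOUR_WAY = {"a": "ı", "ı": "ı", "o": "u", "u": "u", "e": "i", "i": "i", "ö": "ü", "ü": "ü"}
# Words taking "y" instead of the usual "n" buffer (the post's "su" and "ne").
BUFFER_Y_EXCEPTIONS = frozenset({"su", "ne"})


def last_vowel(word):
    vowel = next((ch for ch in reversed(word) if ch in VOWELS), None)
    if vowel is None:
        raise ValueError(f"no vowel in {word!r}")
    return vowel


def initial_consonant(word, soft, hard):
    # Guard clause: after a hard consonant the suffix hardens (d -> t).
    if word[-1] in HARD_CONSONANTS:
        return hard
    return soft


def locative(word):
    vowel = "e" if last_vowel(word) in FRONT_VOWELS else "a"
    return word + initial_consonant(word, "d", "t") + vowel


def genitive(word):
    vowel = FOUR_WAY[last_vowel(word)]
    # Guard clause: consonant-final words need no buffer letter.
    if word[-1] not in VOWELS:
        return word + vowel + "n"
    # Exception: "su" -> "suyun", not "sunun".
    if word in BUFFER_Y_EXCEPTIONS:
        return word + "y" + vowel + "n"
    return word + "n" + vowel + "n"
```

So locative("kitap") gives "kitapta" while locative("ev") gives "evde", and genitive("araba") gives "arabanın" while genitive("su") gives "suyun". Each guard clause returns early, so the common path stays at the bottom of the function.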

This is where it got complex, because it was time to add the possessive suffix. The algorithm could handle the rest, but the Turkish possessive is quite a challenge, especially for a computer. I added the possessive person states as a tuple to the settings file. Since some persons change the word differently from the rest, I had to use rule_set to calculate the possibilities. I managed not to repeat myself in the possessive method and used the main suffix method to handle the suffix part; in the previous version, I calculated them separately.
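The sketch below shows why the persons differ: 3sg uses an "s" buffer after a vowel, 3pl uses -ları/-leri, and the rest add a linking vowel only after a consonant. The person labels and function names are hypothetical. Consonant softening (kitap -> kitabım) and the "su" exception (suyu) are deliberately left out, which is exactly where the real implementation gets harder.

```python
VOWELS = "aeıioöuü"
FRONT_VOWELS = "eiöü"
FOUR_WAY = {"a": "ı", "ı": "ı", "o": "u", "u": "u", "e": "i", "i": "i", "ö": "ü", "ü": "ü"}
# Hypothetical person labels, kept as a tuple like the settings file in the post.
POSSESSIVE_PERSONS = ("1sg", "2sg", "3sg", "1pl", "2pl", "3pl")


def last_vowel(word):
    vowel = next((ch for ch in reversed(word) if ch in VOWELS), None)
    if vowel is None:
        raise ValueError(f"no vowel in {word!r}")
    return vowel


def possessive(word, person):
    if person not in POSSESSIVE_PERSONS:
        raise ValueError(f"unknown person {person!r}")
    last = last_vowel(word)
    ends_with_vowel = word[-1] in VOWELS
    # 3pl: two-way harmony plus a fixed ending, -ları / -leri.
    if person == "3pl":
        return word + ("leri" if last in FRONT_VOWELS else "ları")
    vowel = FOUR_WAY[last]
    # 3sg: buffer letter "s" after a vowel (araba -> arabası).
    if person == "3sg":
        return word + ("s" + vowel if ends_with_vowel else vowel)
    # Other persons: linking vowel only after a consonant, then the person marker.
    marker = {"1sg": "m", "2sg": "n", "1pl": "m" + vowel + "z", "2pl": "n" + vowel + "z"}[person]
    return word + ("" if ends_with_vowel else vowel) + marker
```

Iterating over POSSESSIVE_PERSONS for "ev" yields evim, evin, evi, evimiz, eviniz, evleri; for "araba" it yields arabam, araban, arabası, arabamız, arabanız, arabaları.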

I decided to use a namedtuple for the return value instead of lots of getters or properties. That way, users of the library can easily pick the part they want to work with and don't have to touch the main class except for the suffix method. Then I added several errors that are raised to report what went wrong. I also reimplemented the apostrophe, which was also quite easy.
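A small sketch of that return style, using the plural suffix for brevity. The SuffixResult type and its field names are hypothetical, not the library's actual API. It also shows the apostrophe for proper nouns and raising errors on bad input.

```python
from collections import namedtuple

VOWELS = "aeıioöuü"
FRONT_VOWELS = "eiöü"

# Hypothetical return type; the field names are illustrative only.
SuffixResult = namedtuple("SuffixResult", ["full", "stem", "apostrophe", "suffix"])


def plural(word, proper_noun=False):
    if not isinstance(word, str):
        raise TypeError(f"expected str, got {type(word).__name__}")
    # Assumes lowercase vowels (proper nouns are capitalized only on the first letter).
    last = next((ch for ch in reversed(word) if ch in VOWELS), None)
    if last is None:
        raise ValueError(f"no vowel in {word!r}")
    suffix = "ler" if last in FRONT_VOWELS else "lar"
    # Proper nouns take their suffix after an apostrophe: Ahmet'ler.
    apostrophe = "'" if proper_noun else ""
    return SuffixResult(word + apostrophe + suffix, word, apostrophe, suffix)
```

A caller can take only what it needs, for example plural("Ahmet", proper_noun=True).full for "Ahmet'ler" or .suffix for "ler", or unpack all four fields at once.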

Next came testing and bug-fixing. There were lots of bugs, especially with the possessive and the "su" and "ne" exception. I tried many things, added some workarounds, and edited most of the methods, such as softs&hards, to fix them. There was also an issue with the dotted I, which I solved with a turkish_lower function. To make the config loader work regardless of where the module is installed, I replaced the os methods with pkg_resources.
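The dotted I problem comes from Python's default str.lower(), which uses the language-neutral Unicode mapping. A minimal turkish_lower along these lines maps the two capital I's explicitly first; this is a sketch, not necessarily the library's exact function.

```python
def turkish_lower(text):
    """Lowercase `text` using Turkish rules for the dotted and dotless I."""
    # Map the two capital I's explicitly before the generic lower():
    #   "I" (dotless capital) -> "ı", "İ" (dotted capital) -> "i".
    return text.replace("I", "ı").replace("İ", "i").lower()


# The default str.lower() turns "I" into "i", and "İ" into "i" plus
# a combining dot above (U+0307), both wrong for Turkish.
default = "IĞDIR".lower()         # "iğdir"
turkish = turkish_lower("IĞDIR")  # "ığdır"
```

Since the vowel harmony tables only know "ı" and "i", lowercasing this way keeps words typed in capitals from being harmonized with the wrong vowel.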

Lastly, I updated the examples and added a docstring for the suffix method.

Roadmap

I want to make it more developer-friendly. First, I want to make a wrapper class for it, since applying repeated suffixes to the same word is fairly hard at the moment. I also plan to change the algorithm if possible, or to find shortcuts that avoid the issues I had to work around. I may or may not add other suffixes; my main focus for this library was complex suffixes with major vowels, so it depends.

Contribution

If you know Turkish or Python better than I do, I'm open to suggestions. Simply send a pull request with a proper explanation, and I will merge it if it is applicable. I would especially like to hear Pythonic suggestions, ones I probably didn't follow, from you Python gurus.

GitHub Account

https://github.com/yokunjon
