in #steemstem, 6 years ago (edited)

Topic Extraction from Steem Tags

Or: Dimensionality Reduction on the Blockchain
Or: A Not-So-Gentle Introduction to Non-Negative Matrix Factorization

Yesterday, I used covariance to analyze which Steem tags were used together. Today, I'll introduce a different technique and explore the related keywords that it exposes.

NNMF

Non-Negative Matrix Factorization is a dimensionality reduction technique; that is, it compresses data to a smaller number of features. When this process works well, it preserves most of the information in the original data set by explaining each data point as a combination of "hidden" features extracted from the data. All the features and combinations are given non-negative weights, which can make them easier to interpret compared to a technique like Singular Value Decomposition, which produces both positive and negative weights.

In linear algebra terms, NNMF is just a matrix product: V ≈ WH, where W and H contain no negative entries.

V is an m x n matrix that represents the input data (or, after factoring, an approximation of it). Typically the rows are observations or data points and the columns are features that have been measured. In my experiment, m = 200000 discussions and n = 5180 common tags. A '1' entry in the matrix means that a particular discussion had that tag specified; otherwise the entry is '0'. (I had to cut down from my original data set of 1.1 million discussions due to lack of memory on the VM where I performed the analysis.)
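As a rough illustration of how such a 0/1 matrix can be assembled, here is a minimal sketch. The three tag lists are invented, the use of scipy.sparse is my assumption about a reasonable sparse representation, and the filtering down to "common" tags is not modeled.

```python
from scipy.sparse import csr_matrix

# Hypothetical input: one list of tags per discussion.
discussions = [
    ["photography", "nature"],
    ["bitcoin", "crypto"],
    ["photography", "bitcoin", "life"],
]

tag_index = {}          # tag name -> column number
rows, cols = [], []
for row, tag_list in enumerate(discussions):
    for tag in set(tag_list):            # set() so a repeated tag is counted once
        col = tag_index.setdefault(tag, len(tag_index))
        rows.append(row)
        cols.append(col)

# One row per discussion, one column per tag, 1.0 where the tag was used.
V = csr_matrix(([1.0] * len(rows), (rows, cols)),
               shape=(len(discussions), len(tag_index)))
print(V.shape, V.nnz)   # only the nonzero entries are stored
```

The sparse format stores just the positions of the 1 entries, which is what keeps a matrix with a billion cells manageable.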

W is an m x p matrix that expresses each observation as a weighted sum of the p computed hidden features or "topics". In practice, a data scientist has to zero in on a good value for p; I chose p = 100 arbitrarily. So, this matrix maps each Steem discussion to the dimensionally-reduced topic set. We know that users use tags that are correlated, so if NNMF does a good job we should find tags that are semantically similar appearing in the same topic.

H is a p x n matrix that maps each topic back to the original features, the tags. You can think of it as an index saying what each topic "means".
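To make the three shapes concrete, here is a toy sketch with made-up numbers (m = 3 discussions, n = 4 tags, p = 2 topics). The tag names and weights are invented for illustration and do not come from the Steem data.

```python
import numpy as np

tags = ["photo", "nature", "bitcoin", "crypto"]   # hypothetical tags (n = 4)

# W: m x p, one row per discussion, one column per topic (made-up weights)
W = np.array([[1.0, 0.0],     # discussion 0 is purely topic 0
              [0.0, 1.0],     # discussion 1 is purely topic 1
              [0.5, 0.5]])    # discussion 2 mixes both topics

# H: p x n, one row per topic, one column per tag (made-up weights)
H = np.array([[1.0, 1.0, 0.0, 0.0],    # topic 0 "means" photo + nature
              [0.0, 0.0, 1.0, 1.0]])   # topic 1 "means" bitcoin + crypto

V_approx = W @ H     # m x n reconstruction of the discussion/tag matrix
print(V_approx.shape)
print(V_approx)
```

Each row of V_approx is the corresponding row of W used as mixing weights over the rows of H.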

The original matrix has about a billion entries (200000 * 5180 = 1,036,000,000), though my implementation stores them sparsely. The NNMF decomposition tries to store the same information in 200000 * 100 + 100 * 5180 = 20,518,000 entries. If all the original measurements were real numbers, this would be a lot less data! But the original data is actually just binary, and the NNMF matrices are filled with floating-point numbers, so the savings in storage are much smaller than the entry counts suggest.
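The entry counts can be checked with a few lines of arithmetic, using the m, n, and p values from this post:

```python
m, n, p = 200_000, 5_180, 100

entries_V = m * n             # cells in the original discussion/tag matrix
entries_WH = m * p + p * n    # cells in W plus cells in H

print(entries_V)              # 1036000000
print(entries_WH)             # 20518000
print(entries_V / entries_WH) # roughly a 50x reduction in entry count
```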

The scikit-learn NNMF implementation

I used the Python package scikit-learn, which provides an NNMF implementation that supports a couple of different algorithms. It takes the matrix V as input and tries to find the best approximation of V in the decomposed form WH. After the input matrix is set up, it's just a few lines to run the algorithm and extract the desired matrices:

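from sklearn.decomposition import NMF

# Factor V (discussions x tags) into W (discussions x topics) and H (topics x tags)
model = NMF( n_components = 100, init = 'random' )
W = model.fit_transform( V )
H = model.components_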

>>> V.shape
(200000, 5180)
>>> W.shape
(200000, 100)
>>> H.shape
(100, 5180)
>>> model.reconstruction_err_
532.8368170706713
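For intuition about what the fitting step does, here is a minimal NumPy sketch of the classic multiplicative-update rules for NNMF. This is not the code I ran and it is not scikit-learn's default solver; the function name, iteration count, and toy input are all assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(V, p, n_iter=200, seed=0, eps=1e-10):
    """Approximate non-negative V (m x n) as W @ H, with W: m x p and H: p x n."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, p)) + eps      # random non-negative starting point
    H = rng.random((p, n)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio, so W and H
        # never go negative; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy binary input standing in for the discussion/tag matrix.
rng = np.random.default_rng(1)
V = (rng.random((30, 12)) < 0.2).astype(float)

W, H = nmf_multiplicative(V, p=4)
err = np.linalg.norm(V - W @ H)       # Frobenius norm of the residual
print(W.shape, H.shape, err)
```

Because the starting point is random, different seeds can settle on different factorizations with slightly different errors.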

The reconstruction error says how close we got to the original V, measured by the "Frobenius norm" of V - WH: the square root of the sum of the squared entries, which generalizes Euclidean distance from vectors to matrices. We could try different values of p (n_components) to find a sweet spot where the number of topics is still small but the error is improved.
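A small sketch of that error measure, using two made-up 2 x 3 matrices in place of V and WH:

```python
import numpy as np

V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
WH = np.array([[0.9, 0.1, 0.8],      # invented reconstruction of V
               [0.0, 0.7, 0.2]])

diff = V - WH
frobenius = np.sqrt((diff ** 2).sum())       # root of the sum of squared entries
euclidean = np.linalg.norm(diff.ravel())     # Euclidean length of the flattened matrix

print(frobenius, euclidean)                  # the two values agree
```

Treating the matrix as one long vector and taking its ordinary length gives the same number, which is the sense in which the Frobenius norm generalizes Euclidean distance.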

Steem topics from tags

The resulting output is quite noisy since the algorithm tries to optimize by making small changes to the weights; most topics have many tags attached with very small weights. I picked an arbitrary threshold of 0.1 for the data shown below and excluded weights lower than that.
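The thresholding step could look roughly like this. It is a sketch with a hypothetical helper name and a made-up 2 x 4 H matrix, not the script that produced the tables in this post.

```python
import numpy as np

def describe_topics(H, tag_names, threshold=0.1):
    """For each row of H, list (tag, weight) pairs with weight >= threshold, highest first."""
    topics = []
    for row in H:
        order = np.argsort(row)[::-1]            # tag indices by descending weight
        topics.append([(tag_names[j], float(row[j]))
                       for j in order if row[j] >= threshold])
    return topics

tag_names = ["kr", "kr-life", "photo", "nature"]     # hypothetical tags
H = np.array([[7.2, 0.55, 0.02, 0.00],               # made-up topic weights
              [0.01, 0.05, 4.0, 0.30]])

for number, topic in enumerate(describe_topics(H, tag_names)):
    print(number, " ; ".join(f"{tag}, {w:.3f}" for tag, w in topic))
```

The first pair in each list is the highest-weight tag and the rest are the other tags that survive the cutoff.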

Many weights are greater than 1.0, even though the original data has only 0 and 1! But the W matrix can and does use fractional values, not just to mix topics but also to select a single topic at the appropriate weight. The number itself isn't as important as the relative weight within a topic.
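One way to see why only relative weights matter: any per-topic scale factor can be moved from H into W without changing the product. A toy sketch with invented numbers:

```python
import numpy as np

W = np.array([[0.2, 0.0],          # made-up discussion-to-topic weights
              [0.1, 0.3]])
H = np.array([[5.0, 0.5, 0.0],     # made-up topic-to-tag weights, some above 1.0
              [0.0, 2.0, 4.0]])

scale = H.max(axis=1)              # largest weight in each topic
H_rel = H / scale[:, None]         # each topic's top tag now has weight 1.0
W_rescaled = W * scale[None, :]    # push the scale factors into W instead

print(np.allclose(W @ H, W_rescaled @ H_rel))   # same reconstruction either way
```

So a tag weight of 5.0 in a topic simply means the matching discussions carry a weight near 0.2 for that topic.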

Language-based topics

The algorithm did a good job finding tags related to a particular language or nationality and grouping them into topics:

topic number | highest-weight tag | other tags
29 | kr-newbie, 3.017 | kr-writing, 0.344 ; kr-life, 0.334 ; kr, 0.125 ; kr-daily, 0.119
66 | kr, 7.209 | coinkorea, 0.766 ; kr-life, 0.548 ; kr-writing, 0.483 ; kr-daily, 0.340 ; kr-event, 0.301 ; kr-art, 0.240 ; kr-pen, 0.236 ; kr-coin, 0.228 ; kr-diary, 0.157 ; tooza, 0.135 ; muksteem, 0.132 ; kr-news, 0.113 ; kr-travel, 0.113 ; kr-gazua, 0.112
84 | jjangjjangman, 5.574 | muksteem, 0.517 ; kr-life, 0.419 ; kr-event, 0.308 ; kr-gazua, 0.186 ; kr-funfun, 0.184 ; dev, 0.156 ; it-news, 0.154 ; kr-travel, 0.144 ; kr-diary, 0.136 ; sharehows, 0.126 ; kr-overseas, 0.122 ; kr-youth, 0.121 ; kr-series, 0.109 ; kr-art, 0.107 ; kr-food, 0.102 ; kr-easy, 0.101
46 | cn, 4.038 | cn-reader, 0.877 ; lizhi, 0.171 ; teammalaysia, 0.155 ; cn-book, 0.155 ; cn-malaysia, 0.153 ; book, 0.136 ; stats, 0.123
79 | deutsch, 4.702 | steemit-austria, 0.557 ; german, 0.161 ; politik, 0.113 ; austria, 0.100
86 | spanish, 5.171 | venezuela, 0.284 ; steemitvenezuela, 0.120 ; talentclub, 0.114
59 | castellano, 4.063 | venezuela, 1.064 ; literatura, 0.178 ; provenezuela, 0.114 ; proconocimiento, 0.105

Single-Tag topics

Tags such as #blog, #steemit, #photo, #photography, and #life are all common enough that the dimensionality reduction created features just for them, without any other significant tags.

Community interests

Photography appears in a couple of different topics, and NNMF did a reasonable job of associating semantically similar tags:

topic number | highest-weight tag | other tags
1 | photography, 2.670 |
38 | photo, 4.016 |
19 | colourfulphotography, 4.443 | bescouted, 0.745 ; thealliance, 0.550 ; portraitphotography, 0.458 ; yourluckyphotos, 0.451 ; streetphotography, 0.430 ; sevendaybnwchallenge, 0.412 ; venezuela, 0.325 ; photocontests, 0.163 ; photoworld, 0.149 ; ru, 0.146 ; vn, 0.140 ; ua, 0.135
25 | macrophotography, 6.246 | acehmacro, 0.649 ; photocircle, 0.343 ; macro, 0.264 ; yourluckyphotos, 0.198 ; photocontests, 0.144
35 | animals, 7.821 | animalphotography, 1.388 ; cats, 0.852 ; pets, 0.672 ; dogs, 0.486 ; cat, 0.397 ; dog, 0.370 ; dailypetphotography, 0.319 ; homesteading, 0.311 ; cute, 0.288 ; birds, 0.288 ; wildlife, 0.154 ; pet, 0.136 ; caturday, 0.134 ; monomad, 0.102
45 | landscapephotography, 6.021 | photocircle, 0.470 ; landscape, 0.364 ; yourluckyphotos, 0.225 ; photocontest, 0.176 ; newbieresteemday, 0.127 ; naturephotography, 0.126 ; mountains, 0.119 ; photofriend, 0.110 ; photomatic, 0.107 ; sunset, 0.106
73 | animal, 5.792 | animalphotography, 0.563 ; cat, 0.347 ; dog, 0.161 ; acehmacro, 0.111
21 | beauty, 4.672 | girl, 0.441 ; woman, 0.350 ; japanese, 0.258 ; fashion, 0.233 ; fitness, 0.186 ; model, 0.160 ; portraitphotography, 0.115 ; makeup, 0.115 ; bescouted, 0.110 ; photocontest, 0.102
53 | smartphonephotography, 6.191 | yourluckyphotos, 0.130 ; smartphotography, 0.110

Porn terms cluster together strongly (suggesting perhaps a lack of originality in tag selection), with some overlap with #hot:

topic number | highest-weight tag | other tags
98 | nsfw, 2.889 | sexy, 1.681 ; porn, 1.002 ; sex, 0.678 ; nude, 0.497 ; girl, 0.420 ; girls, 0.311 ; hot, 0.276 ; adult, 0.274 ; woman, 0.250 ; dporn, 0.205 ; hentai, 0.202 ; japanese, 0.182 ; hardcore, 0.172 ; model, 0.165 ; anime, 0.157 ; pussy, 0.155 ; boobs, 0.138 ; asmr, 0.137 ; erotica, 0.117 ; manga, 0.106 ; ass, 0.105
75 | hot, 4.229 | trending, 3.951 ; promoted, 2.900 ; girls, 0.181 ; sexy, 0.176 ; asmr, 0.165 ; viral, 0.123

Science-related tags are also grouped together, into a couple of overlapping topics:

topic number | highest-weight tag | other tags
33 | technology, 3.760 | science, 0.351 ; business, 0.155 ; steemstem, 0.120 ; steemdunk, 0.114
78 | science, 3.635 | education, 3.413 ; steemstem, 0.751 ; steemiteducation, 0.622 ; people, 0.607 ; space, 0.224 ; sndbox, 0.159 ; psychology, 0.159 ; physics, 0.137 ; philosophy, 0.119 ; stemng, 0.118

The complete output is included below.

Conclusion

Non-Negative Matrix Factorization is a standard dimensionality-reduction technique which I have demonstrated on Steem tags and discussions. It is frequently used for this sort of semantic clustering; in fact, that's the example scikit-learn provides! The decomposed form of the original data can be used for other purposes as well, such as similarity detection or recommendations; I am thinking of analyzing upvotes according to these topics rather than the raw tags. A more ambitious project would be to apply NNMF to word frequencies or occurrences within the discussions themselves, not just the attached tags.
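As a hint of what similarity detection on the decomposed form might look like, here is a minimal sketch that compares discussions by the cosine similarity of their rows in W. The helper name and the toy W are assumptions; I have not run this on the Steem data.

```python
import numpy as np

def cosine_similarity_to(W, i):
    """Cosine similarity between discussion i and every discussion, in topic space."""
    norms = np.linalg.norm(W, axis=1)
    norms[norms == 0] = 1.0          # avoid dividing by zero for all-zero rows
    unit = W / norms[:, None]        # scale every row to unit length
    return unit @ unit[i]

# Made-up topic weights for three discussions over three topics.
W = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 0.0, 1.0]])

sims = cosine_similarity_to(W, 0)
print(sims)     # discussion 1 scores close to 1, discussion 2 scores 0
```

Two discussions can score as similar here even if they share no literal tags, as long as their tags fall in the same topics.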

Complete topic list

topic number | highest-weight tag | other tags
0 | life, 1.952 |
1 | photography, 2.670 |
2 | artzone, 1.136 | slothicorn, 0.110
3 | untalented, 0.733 | steemitfamilyph, 0.138
4 | whalepower, 1.033 |
5 | ocd-resteem, 2.955 | promo-steem, 0.375 ; slothicorn, 0.283 ; homesteading, 0.215 ; ita, 0.173 ; adsactly, 0.149 ; gardening, 0.102
6 | blockchain, 2.670 |
7 | love, 2.222 |
8 | bitcoin, 3.287 | ethereum, 0.207
9 | introduceyourself, 1.294 | introducemyself, 0.235
10 | beautiful, 2.843 | quotes, 0.735 ; amazing, 0.449 ; words, 0.349 ; minds, 0.334 ; lovefriday, 0.263 ; japanese, 0.249 ; happy, 0.232 ; girl, 0.210 ; woman, 0.155 ; nice, 0.107
11 | art, 3.709 |
12 | food, 3.137 | recipe, 0.308 ; foodphotography, 0.253 ; cooking, 0.174
13 | travel, 2.515 |
14 | nature, 5.634 |
15 | dlive, 1.631 | dlive-video, 1.560 ; learning, 0.317 ; , 0.172
16 | dtube, 3.332 | vlog, 0.277 ; dtubedaily, 0.275 ; onelovedtube, 0.252 ; steempowerment, 0.117
17 | flower, 2.820 |
18 | entertainment, 1.933 | people, 0.364 ; movies, 0.106
19 | colourfulphotography, 4.443 | bescouted, 0.745 ; thealliance, 0.550 ; portraitphotography, 0.458 ; yourluckyphotos, 0.451 ; streetphotography, 0.430 ; sevendaybnwchallenge, 0.412 ; venezuela, 0.325 ; photocontests, 0.163 ; photoworld, 0.149 ; ru, 0.146 ; vn, 0.140 ; ua, 0.135
20 | meme, 2.519 | dmania, 1.330 ; memes, 0.139 ; funny, 0.134 ; memeitlol, 0.115
21 | beauty, 4.672 | girl, 0.441 ; woman, 0.350 ; japanese, 0.258 ; fashion, 0.233 ; fitness, 0.186 ; model, 0.160 ; portraitphotography, 0.115 ; makeup, 0.115 ; bescouted, 0.110 ; photocontest, 0.102
22 | music, 5.211 | comedy, 0.411 ; geme, 0.405 ; openmic, 0.291 ; guitar, 0.159 ; song, 0.157 ; rock, 0.146 ; dance, 0.109 ; hiphop, 0.108 ; musicdiscovery, 0.106
23 | esteemapp, 3.557 | information, 0.517 ; religious, 0.456 ; terp, 0.138 ; kmk, 0.102
24 | crypto, 4.410 | cryptocurrency, 4.403 ; crypto-news, 0.433
25 | macrophotography, 6.246 | acehmacro, 0.649 ; photocircle, 0.343 ; macro, 0.264 ; yourluckyphotos, 0.198 ; photocontests, 0.144
26 | drawing, 3.151 | painting, 0.725 ; sketch, 0.335 ; creativity, 0.270 ; slothicorn, 0.153 ; anime, 0.131 ; illustration, 0.111 ; coloredpencil, 0.107
27 | health, 4.736 | fitness, 0.300 ; tips, 0.103
28 | busy, 6.683 |
29 | kr-newbie, 3.017 | kr-writing, 0.344 ; kr-life, 0.334 ; kr, 0.125 ; kr-daily, 0.119
30 | cervantes, 3.487 |
31 | steem, 7.676 | steempress, 1.107 ; sbd, 0.151
32 | contest, 6.422 | challenge, 0.364 ; photobombchallenge, 0.188 ; openmic, 0.174 ; giveaway, 0.145 ; artstorm, 0.144 ; photobomb, 0.122 ; sports, 0.111 ; newbiegames, 0.109 ; monomad, 0.108 ; design, 0.104 ; thealliance, 0.101
33 | technology, 3.760 | science, 0.351 ; business, 0.155 ; steemstem, 0.120 ; steemdunk, 0.114
34 | inspiration, 2.517 | philosophy, 0.425 ; onequality, 0.360 ; quote, 0.345 ; spirituality, 0.155
35 | animals, 7.821 | animalphotography, 1.388 ; cats, 0.852 ; pets, 0.672 ; dogs, 0.486 ; cat, 0.397 ; dog, 0.370 ; dailypetphotography, 0.319 ; homesteading, 0.311 ; cute, 0.288 ; birds, 0.288 ; wildlife, 0.154 ; pet, 0.136 ; caturday, 0.134 ; monomad, 0.102
36 | games, 3.548 | start, 1.206 ; steemgar, 1.195 ; steemplayroom, 1.194 ; gameplay, 0.441 ; gamer, 0.438 ; letsplay, 0.407
37 | story, 3.813 | fiction, 0.127
38 | photo, 4.016 |
39 | cryptocurrency, 7.017 | ethereum, 0.536 ; altcoin, 0.247 ; trading, 0.239 ; altcoins, 0.166 ; eos, 0.102
40 | lifestyle, 5.566 | fitness, 0.319 ; fashion, 0.253 ; photocontest, 0.200 ; onequality, 0.187 ; steemdunk, 0.131 ; dlive-video, 0.112 ; beach, 0.104
41 | game, 5.256 | games, 0.121
42 | community, 3.597 | promo-steem, 0.647 ; curation, 0.432 ; teammalaysia, 0.146 ; freedom, 0.133 ; challenge30days, 0.132
43 | history, 2.932 | culture, 0.144
44 | money, 5.134 | trading, 0.260 ; finance, 0.258 ; altcoins, 0.224 ; business, 0.183 ; investing, 0.153 ; cryptonews, 0.134 ; stocks, 0.118
45 | landscapephotography, 6.021 | photocircle, 0.470 ; landscape, 0.364 ; yourluckyphotos, 0.225 ; photocontest, 0.176 ; newbieresteemday, 0.127 ; naturephotography, 0.126 ; mountains, 0.119 ; photofriend, 0.110 ; photomatic, 0.107 ; sunset, 0.106
46 | cn, 4.038 | cn-reader, 0.877 ; lizhi, 0.171 ; teammalaysia, 0.155 ; cn-book, 0.155 ; cn-malaysia, 0.153 ; book, 0.136 ; stats, 0.123
47 | news, 4.077 | geme, 0.144 ; comedy, 0.132
48 | blog, 7.167 |
49 | steemit, 7.670 |
50 | dlive-broadcast, 7.562 | dlive, 7.505 ; Dlive, 0.504 ; DLiveStar, 0.502 ; DliveStar, 0.323 ; Dlivestar, 0.267 ; fortnite, 0.259 ; dliver, 0.244 ; live, 0.237 ; stream, 0.230 ; dlivegaming, 0.214 ; Dunite, 0.201 ; learning, 0.200 ; of, 0.196 ; DLive, 0.195 ; , 0.182 ; Gaming, 0.174 ; DUnite, 0.171 ; steemgc, 0.149 ; pubg, 0.149 ; Fortnite, 0.146 ; dlivecommunity, 0.132 ; dliverewards, 0.127 ; Pubg, 0.113 ; csgo, 0.109 ; dunite, 0.103 ; league, 0.103
51 | steepshot, 6.456 |
52 | steemph, 4.632 | philippines, 0.840 ; cebu, 0.312 ; steemph-antipolo, 0.220 ; steemitachievers, 0.214 ; teardrops, 0.201 ; pilipinas, 0.168 ; tilphilippines, 0.165 ; dailyfoodphotography, 0.155 ; steemitdavao, 0.155
53 | smartphonephotography, 6.191 | yourluckyphotos, 0.130 ; smartphotography, 0.110
54 | video, 4.522 | youtube, 0.313 ; recipe, 0.201 ; people, 0.156 ; vlog, 0.155 ; dmania, 0.143
55 | aceh, 7.711 | acehmacro, 0.108
56 | stach, 5.362 | onequality, 0.921 ; promo-steem, 0.347 ; teardrops, 0.163 ; steem9ja, 0.122 ; sndbox, 0.114 ; joinsteemit, 0.100
57 | flowers, 4.427 | petals, 0.623 ; garden, 0.407 ; gardening, 0.239 ; plants, 0.162 ; summer, 0.130 ; spring, 0.124
58 | philippines, 4.744 | steempress, 1.228 ; cebu, 0.289 ; ulog, 0.166 ; redfish, 0.125
59 | castellano, 4.063 | venezuela, 1.064 ; literatura, 0.178 ; provenezuela, 0.114 ; proconocimiento, 0.105
60 | wafrica, 5.362 | reachout, 0.199
61 | ico, 6.998 | ethereum, 2.400 ; bounty, 0.349 ; crowdsale, 0.237 ; tokensale, 0.233 ; eth, 0.217 ; token, 0.153 ; many, 0.132 ; investment, 0.128 ; blokchain, 0.116 ; investing, 0.112 ; review, 0.105
62 | tr, 3.658 | cointurk, 1.568 ; destektr, 0.488 ; trliste, 0.477 ; kusadasi, 0.323 ; gif, 0.258 ; failarmy, 0.241 ; hede-io, 0.223 ; information, 0.175 ; anadolu, 0.108
63 | article, 6.533 | myanmar, 0.150
64 | motivation, 4.328 | onequality, 0.216 ; quote, 0.191 ; success, 0.189 ; quotes, 0.163
65 | poetry, 4.547 | poem, 0.868 ; steemitschoolpoetry, 0.209 ; poetsunited, 0.138 ; post, 0.117
66 | kr, 7.209 | coinkorea, 0.766 ; kr-life, 0.548 ; kr-writing, 0.483 ; kr-daily, 0.340 ; kr-event, 0.301 ; kr-art, 0.240 ; kr-pen, 0.236 ; kr-coin, 0.228 ; kr-diary, 0.157 ; tooza, 0.135 ; muksteem, 0.132 ; kr-news, 0.113 ; kr-travel, 0.113 ; kr-gazua, 0.112
67 | writing, 6.765 | fiction, 0.423 ; freewrite, 0.125 ; myanmar, 0.123
68 | christian-trail, 3.038 | steemchurch, 3.035 ; christianity, 2.615 ; religion, 1.764 ; bible, 0.559 ; spirituality, 0.385 ; communitynews, 0.304 ; faith, 0.255 ; flaminghelpers, 0.226 ; jesus, 0.192 ; gospel, 0.127 ; christian, 0.116 ; god, 0.104 ; ghana, 0.101
69 | esteem, 10.849 | sevendaybnwchallenge, 0.133
70 | world, 5.584 | creativity, 1.384 ; creative, 1.320 ; adventure, 0.738 ; usa, 0.292 ; amazing, 0.242 ; immigration, 0.194 ; of, 0.165 ; future, 0.155 ; archive, 0.142 ; people, 0.137 ; invention, 0.129 ; diy, 0.117 ; film, 0.113 ; cup, 0.101
71 | good-karma, 4.972 | time, 0.751 ; religious, 0.243 ; information, 0.240 ; steemdunk, 0.112 ; mso, 0.101
72 | colorchallenge, 3.949 | thursdaygreen, 0.127 ; wednesdayyellow, 0.102
73 | animal, 5.792 | animalphotography, 0.563 ; cat, 0.347 ; dog, 0.161 ; acehmacro, 0.111
74 | free, 2.315 | minnowsupport, 2.190 ; giveaway, 2.006 ; resteem, 0.145
75 | hot, 4.229 | trending, 3.951 ; promoted, 2.900 ; girls, 0.181 ; sexy, 0.176 ; asmr, 0.165 ; viral, 0.123
76 | family, 4.035 | happy, 0.279 ; muzakirpb, 0.188 ; freedom, 0.147 ; english, 0.135 ; culture, 0.130 ; familyprotection, 0.124 ; kids, 0.121 ; education, 0.120 ; ulog, 0.116 ; children, 0.104
77 | new, 4.442 |
78 | science, 3.635 | education, 3.413 ; steemstem, 0.751 ; steemiteducation, 0.622 ; people, 0.607 ; space, 0.224 ; sndbox, 0.159 ; psychology, 0.159 ; physics, 0.137 ; philosophy, 0.119 ; stemng, 0.118
79 | deutsch, 4.702 | steemit-austria, 0.557 ; german, 0.161 ; politik, 0.113 ; austria, 0.100
80 | movie, 3.429 | film, 2.223 ; review, 1.223 ; streaming, 0.804 ; films, 0.677 ; regarder, 0.661 ; movies, 0.539 ; ita, 0.297 ; comedy, 0.215 ; vn, 0.173 ; download, 0.162 ; full, 0.140 ; trailer, 0.127 ; drama, 0.126 ; avengers, 0.126 ; marvel, 0.110 ; romance, 0.108 ; online, 0.101
81 | natural, 5.424 | photografy, 0.289 ; healthy, 0.131 ; fruit, 0.116
82 | minnowsupport, 3.889 | investment, 3.589 ; bid-bot, 2.870 ; bid-bots, 2.869
83 | gaming, 7.295 | steemgc, 0.375 ; fortnite, 0.299 ; gameplay, 0.289 ; steemcraft, 0.281 ; letsplay, 0.264 ; gamer, 0.189 ; review, 0.142 ; votegame, 0.118 ; gamersunited, 0.113 ; ps4, 0.101
84 | jjangjjangman, 5.574 | muksteem, 0.517 ; kr-life, 0.419 ; kr-event, 0.308 ; kr-gazua, 0.186 ; kr-funfun, 0.184 ; dev, 0.156 ; it-news, 0.154 ; kr-travel, 0.144 ; kr-diary, 0.136 ; sharehows, 0.126 ; kr-overseas, 0.122 ; kr-youth, 0.121 ; kr-series, 0.109 ; kr-art, 0.107 ; kr-food, 0.102 ; kr-easy, 0.101
85 | airdrop, 3.320 | bounty, 1.363 ; token, 0.684 ; free, 0.236 ; airdrops, 0.203 ; upvote, 0.135 ; myanmar, 0.117 ; coin, 0.108
86 | spanish, 5.171 | venezuela, 0.284 ; steemitvenezuela, 0.120 ; talentclub, 0.114
87 | sports, 2.641 | football, 1.473 ; sport, 0.607 ; basketball, 0.487 ; soccer, 0.371 ; daily, 0.327 ; nba, 0.219 ; betting, 0.206 ; steemsports, 0.196
88 | indonesia, 8.426 | ksi, 0.743 ; garudakita, 0.225 ; promo-steem, 0.190 ; bogor, 0.157
89 | fun, 4.964 | picture, 0.131 ; meme, 0.119 ; humor, 0.118
90 | photofeed, 7.138 | photocircle, 1.444 ; photomatic, 0.722 ; portraitphotography, 0.471 ; yourluckyphotos, 0.412 ; goldenhourphotography, 0.297 ; streetphotography, 0.291 ; monomad, 0.261 ; bescouted, 0.259 ; architecturalphotography, 0.232 ; vehiclephotography, 0.184 ; cityscapephotography, 0.168 ; animalphotography, 0.137 ; photocontest, 0.130 ; photocontests, 0.130 ; petals, 0.116 ; bwphotocontest, 0.110
91 | nigeria, 4.588 | onequality, 0.432 ; africa, 0.298 ; ulog, 0.166 ; genesisproject, 0.126
92 | dlivestar, 5.661 | steemgc, 1.786 ; dunite, 1.679 ; dliver, 1.260 ; dlivegaming, 1.103 ; dlive, 0.993 ; dlive-broadcast, 0.942 ; fr, 0.492 ; dlive24hour, 0.327 ; fortnite, 0.263 ; lovedlive, 0.239 ; dliverewards, 0.236 ; dlivestarbooster, 0.211 ; adsactly, 0.165 ; dlivebroadcast, 0.143 ; dota2, 0.142 ; dlivecommunity, 0.125 ; bgame, 0.121 ; game, 0.117 ; dgamer, 0.107 ; dlivestreaming, 0.107 ; livestream, 0.105 ; teamdlive, 0.102 ; upvote, 0.101 ; dlivebooster, 0.100
93 | steemgigs, 3.356 | teardrops, 3.212 ; ulog, 0.945 ; surpassinggoogle, 0.854 ; ulogs, 0.453 ; thai, 0.405 ; steem-untalented, 0.199 ; gratefulvibes, 0.178 ; untalented-steemgigs, 0.119 ; sndbox, 0.111 ; philippines, 0.108 ; thaipowerup, 0.101
94 | funny, 6.766 | comedy, 0.873 ; geme, 0.350 ; humor, 0.331 ; memes, 0.237 ; gif, 0.184 ; joke, 0.179 ; jokes, 0.159 ; punchline, 0.155 ; lol, 0.124
95 | crypto, 7.408 | trading, 0.800 ; altcoin, 0.755 ; ethereum, 0.482 ; btc, 0.425 ; currency, 0.214 ; investing, 0.178 ; eos, 0.144 ; market, 0.120 ; eth, 0.119
96 | india, 3.745 | pictures, 0.162 ; indiaunited, 0.152 ; venezuela, 0.144 ; design, 0.117 ; culture, 0.101
97 | dsound, 3.321 | dsound-original, 1.956 ; ftlob, 0.435 ; musicvoter, 0.416 ; steem-music, 0.314 ; dsound-cover, 0.312 ; dsound-music, 0.306 ; podcast, 0.303 ; music, 0.286 ; dsound-podcast, 0.267 ; dsound-electronic, 0.227 ; dsound-dsound, 0.216 ; dsound-pop, 0.208 ; dsound-instrumental, 0.145 ; dsound-hiphop, 0.138 ; instrumental, 0.105
98 | nsfw, 2.889 | sexy, 1.681 ; porn, 1.002 ; sex, 0.678 ; nude, 0.497 ; girl, 0.420 ; girls, 0.311 ; hot, 0.276 ; adult, 0.274 ; woman, 0.250 ; dporn, 0.205 ; hentai, 0.202 ; japanese, 0.182 ; hardcore, 0.172 ; model, 0.165 ; anime, 0.157 ; pussy, 0.155 ; boobs, 0.138 ; asmr, 0.137 ; erotica, 0.117 ; manga, 0.106 ; ass, 0.105
99 | politics, 3.952 | businessandcommerce, 0.971 ; hollywoodandmovies, 0.971 ; musicandentertainment, 0.971 ; scienceandtechnology, 0.971 ; freedom, 0.442 ; informationwar, 0.427 ; trump, 0.175 ; anarchy, 0.164 ; conspiracy, 0.160 ; liberty, 0.109

Reconstruction error for different values of p (number of hidden features):

50 | 597.7057528779269
100 | 531.8807011557097
200 | 464.55805756242535
300 | 422.28096708552204
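To put those four numbers on a common scale, a short sketch that reports each error relative to the p = 50 run (the values are copied from the list above; the choice of p = 50 as the baseline is arbitrary):

```python
# Reconstruction errors reported above, keyed by p.
errors = {
    50: 597.7057528779269,
    100: 531.8807011557097,
    200: 464.55805756242535,
    300: 422.28096708552204,
}

baseline = errors[50]
for p, err in errors.items():
    reduction = 100 * (baseline - err) / baseline
    print(p, f"{reduction:.1f}% lower than p = 50")
```

The error keeps falling as p grows, but each extra hundred topics buys a smaller improvement than the last.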