Topic Extraction from Steem Tags
Or: Dimensionality Reduction on the Blockchain
Or: A Not-So-Gentle Introduction to Non-Negative Matrix Factorization
Yesterday, I used covariance to analyze which Steem tags were used together. Today, I'll introduce a different technique and explore the related keywords that it exposes.
NNMF
Non-Negative Matrix Factorization (NNMF) is a dimensionality reduction technique; that is, it compresses data into a smaller number of features. When the process works well, it preserves most of the information in the original data set by explaining each data point as a combination of "hidden" features extracted from the data. All the features and combinations are given non-negative weights, which can make them easier to interpret than the output of a technique like Singular Value Decomposition, which produces both positive and negative weights.
In linear algebra terms, NNMF expresses the data as a matrix product: V ≈ WH, where every entry of W and H is non-negative.
V is an m x n matrix that represents the input data (or an approximation of it). Typically the rows are observations or data points and the columns are features that have been measured. In my experiment, m = 200000 discussions and n = 5180 common tags. A '1' entry in the matrix means that a particular discussion had that tag specified; otherwise the entry is '0'. (I had to cut down from my original data set of 1.1 million discussions due to lack of memory on the VM where I performed the analysis.)
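A binary matrix like this is mostly zeros, so it is natural to build it in a sparse format. The following is a minimal sketch of that step, not the code used for the experiment: the `discussions` list and the `tag_index` mapping are hypothetical stand-ins, and the filtering down to "common" tags is not shown.

```python
from scipy.sparse import csr_matrix

# Hypothetical toy input: each discussion is reduced to its list of tags.
discussions = [
    ["photography", "nature", "life"],
    ["kr", "kr-life"],
    ["photography", "travel"],
]

# Assign each distinct tag a column index, in order of first appearance.
tag_index = {}
for tags in discussions:
    for tag in tags:
        tag_index.setdefault(tag, len(tag_index))

# Collect the (row, column) coordinates of the '1' entries.
rows, cols = [], []
for row, tags in enumerate(discussions):
    for tag in set(tags):  # set() guards against a tag repeated in one post
        rows.append(row)
        cols.append(tag_index[tag])

# Only the nonzero entries are stored; everything else is implicitly 0.
V = csr_matrix(
    ([1.0] * len(rows), (rows, cols)),
    shape=(len(discussions), len(tag_index)),
)
print(V.shape, V.nnz)
```

With this layout, memory use grows with the number of tags actually applied rather than with m x n.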
W is an m x p matrix that maps each observation to a weighted sum of the "p" computed hidden features or "topics". In practice, a data scientist has to zero in on a good value for p; I chose p = 100 arbitrarily. So, this matrix maps from Steem discussions to the dimensionally-reduced topic set. We know that users use tags that are correlated, so if NNMF does a good job we should find semantically similar tags appearing in the same topic.
H is a p x n matrix that maps each topic back to the original features, the tags. You can think of it as an index saying what each topic "means".
The original matrix had about a billion entries, though my implementation stores them sparsely: 200000 * 5180 = 1,036,000,000. The NNMF decomposition tries to represent the same data in 200000 * 100 + 100 * 5180 = 20,518,000 entries. If all the original measurements were real numbers, this would be a lot less data! But the original data is actually just binary, and the NNMF matrices are filled with floating-point numbers, so the saving is smaller than the entry counts suggest.
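The entry counts are easy to check. This short sketch only redoes the arithmetic with the m, n and p given above; it counts entries, not bytes.

```python
m, n, p = 200_000, 5_180, 100

dense_entries = m * n             # entries in V if it were stored densely
factored_entries = m * p + p * n  # entries in W plus entries in H

print(dense_entries)     # 1036000000
print(factored_entries)  # 20518000
print(round(dense_entries / factored_entries, 1))  # 50.5
```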
The scikit-learn NNMF implementation
I used the Python package scikit-learn, which provides an NNMF implementation that supports a couple of different algorithms. It takes the matrix V as input and finds the best approximation of V in the decomposed form WH. After the input matrix is set up, it takes just a few lines to run the algorithm and extract the desired matrices:
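from sklearn.decomposition import NMF
model = NMF( n_components = 100, init = 'random' )
W = model.fit_transform( V )  # m x p: discussion-to-topic weights
H = model.components_         # p x n: topic-to-tag weights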
>>> V.shape
(200000, 5180)
>>> W.shape
(200000, 100)
>>> H.shape
(100, 5180)
>>> model.reconstruction_err_
532.8368170706713
The reconstruction error says how close we got to the original V, measured with the Frobenius norm of the difference V - WH, which is a generalization of Euclidean distance to matrices. We could try different values of p (n_components) to find a sweet spot where the number of topics is still small but the error is improved.
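Trying different values of p amounts to refitting the model in a loop and recording the error each time. Here is a rough sketch of that loop on a small random binary matrix, which is only a stand-in for the real tag data; the matrix size, the candidate values of p, `random_state` and `max_iter` are all assumptions chosen so the example runs quickly.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import NMF

# Stand-in data: a 300 x 40 binary matrix with roughly 10% ones.
rng = np.random.default_rng(0)
V_small = csr_matrix((rng.random((300, 40)) < 0.1).astype(float))

errors = {}
for p in (2, 5, 10):
    model = NMF(n_components=p, init="random", random_state=0, max_iter=500)
    model.fit(V_small)
    # Frobenius norm of (V_small - WH) for this number of topics.
    errors[p] = model.reconstruction_err_
    print(p, errors[p])
```

Because the fit starts from a random initialization, repeated runs on the same data can give slightly different errors unless `random_state` is fixed.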
Steem topics from tags
The resulting output is quite noisy since the algorithm tries to optimize by making small changes to the weights; most topics have a lot of tags associated with them with very small values. I picked an arbitrary threshold of 0.1 for the data shown below and excluded weights lower than that.
Many weights are greater than 1.0, even though the original data contains only 0 and 1! That is possible because the W matrix can and does use fractional values, not just to mix topics but also to select a single topic at the appropriate weight. The number itself isn't as important as the relative weight within a topic.
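The thresholding step can be sketched as follows. This is an illustration rather than the script that produced the tables: `describe_topics`, `H_demo` and `tags_demo` are hypothetical names, and the tiny matrix is made up to show the shape of the output.

```python
import numpy as np

def describe_topics(H, tag_names, threshold=0.1):
    """For each topic (row of H), list (tag, weight) pairs with weight >= threshold, heaviest first."""
    topics = []
    for row in np.asarray(H):
        order = np.argsort(row)[::-1]  # column indices by descending weight
        topics.append(
            [(tag_names[j], float(row[j])) for j in order if row[j] >= threshold]
        )
    return topics

# Made-up example with 2 topics and 4 tags.
H_demo = np.array([[3.0, 0.05, 0.34, 0.0],
                   [0.0, 7.2, 0.02, 0.77]])
tags_demo = ["kr-newbie", "kr", "kr-writing", "coinkorea"]

for number, topic in enumerate(describe_topics(H_demo, tags_demo)):
    print(number, topic)
```

The first pair in each list corresponds to the "highest-weight tag" column of the tables, and the rest to "other tags".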
Language-based topics
The algorithm did a good job finding tags related to a particular language or nationality and grouping them into topics:
topic number | highest-weight tag | other tags |
---|---|---|
29 | kr-newbie, 3.017 | kr-writing, 0.344 ; kr-life, 0.334 ; kr, 0.125 ; kr-daily, 0.119 |
66 | kr, 7.209 | coinkorea, 0.766 ; kr-life, 0.548 ; kr-writing, 0.483 ; kr-daily, 0.340 ; kr-event, 0.301 ; kr-art, 0.240 ; kr-pen, 0.236 ; kr-coin, 0.228 ; kr-diary, 0.157 ; tooza, 0.135 ; muksteem, 0.132 ; kr-news, 0.113 ; kr-travel, 0.113 ; kr-gazua, 0.112 |
84 | jjangjjangman, 5.574 | muksteem, 0.517 ; kr-life, 0.419 ; kr-event, 0.308 ; kr-gazua, 0.186 ; kr-funfun, 0.184 ; dev, 0.156 ; it-news, 0.154 ; kr-travel, 0.144 ; kr-diary, 0.136 ; sharehows, 0.126 ; kr-overseas, 0.122 ; kr-youth, 0.121 ; kr-series, 0.109 ; kr-art, 0.107 ; kr-food, 0.102 ; kr-easy, 0.101 |
46 | cn, 4.038 | cn-reader, 0.877 ; lizhi, 0.171 ; teammalaysia, 0.155 ; cn-book, 0.155 ; cn-malaysia, 0.153 ; book, 0.136 ; stats, 0.123 |
79 | deutsch, 4.702 | steemit-austria, 0.557 ; german, 0.161 ; politik, 0.113 ; austria, 0.100 |
86 | spanish, 5.171 | venezuela, 0.284 ; steemitvenezuela, 0.120 ; talentclub, 0.114 |
59 | castellano, 4.063 | venezuela, 1.064 ; literatura, 0.178 ; provenezuela, 0.114 ; proconocimiento, 0.105 |
Single-Tag topics
Tags such as #blog, #steemit, #photo, #photography, and #life are all common enough that the dimensionality reduction created features just for them, without any other significant tags.
Community interests
Photography appears in a couple of different topics, and NNMF did a reasonable job of associating semantically similar tags together:
topic number | highest-weight tag | other tags |
---|---|---|
1 | photography, 2.670 | |
38 | photo, 4.016 | |
19 | colourfulphotography, 4.443 | bescouted, 0.745 ; thealliance, 0.550 ; portraitphotography, 0.458 ; yourluckyphotos, 0.451 ; streetphotography, 0.430 ; sevendaybnwchallenge, 0.412 ; venezuela, 0.325 ; photocontests, 0.163 ; photoworld, 0.149 ; ru, 0.146 ; vn, 0.140 ; ua, 0.135 |
25 | macrophotography, 6.246 | acehmacro, 0.649 ; photocircle, 0.343 ; macro, 0.264 ; yourluckyphotos, 0.198 ; photocontests, 0.144 |
35 | animals, 7.821 | animalphotography, 1.388 ; cats, 0.852 ; pets, 0.672 ; dogs, 0.486 ; cat, 0.397 ; dog, 0.370 ; dailypetphotography, 0.319 ; homesteading, 0.311 ; cute, 0.288 ; birds, 0.288 ; wildlife, 0.154 ; pet, 0.136 ; caturday, 0.134 ; monomad, 0.102 |
45 | landscapephotography, 6.021 | photocircle, 0.470 ; landscape, 0.364 ; yourluckyphotos, 0.225 ; photocontest, 0.176 ; newbieresteemday, 0.127 ; naturephotography, 0.126 ; mountains, 0.119 ; photofriend, 0.110 ; photomatic, 0.107 ; sunset, 0.106 |
73 | animal, 5.792 | animalphotography, 0.563 ; cat, 0.347 ; dog, 0.161 ; acehmacro, 0.111 |
21 | beauty, 4.672 | girl, 0.441 ; woman, 0.350 ; japanese, 0.258 ; fashion, 0.233 ; fitness, 0.186 ; model, 0.160 ; portraitphotography, 0.115 ; makeup, 0.115 ; bescouted, 0.110 ; photocontest, 0.102 |
53 | smartphonephotography, 6.191 | yourluckyphotos, 0.130 ; smartphotography, 0.110 |
Porn terms cluster together strongly (suggesting perhaps a lack of originality in tag selection), with some overlap with #hot:
topic number | highest-weight tag | other tags |
---|---|---|
98 | nsfw, 2.889 | sexy, 1.681 ; porn, 1.002 ; sex, 0.678 ; nude, 0.497 ; girl, 0.420 ; girls, 0.311 ; hot, 0.276 ; adult, 0.274 ; woman, 0.250 ; dporn, 0.205 ; hentai, 0.202 ; japanese, 0.182 ; hardcore, 0.172 ; model, 0.165 ; anime, 0.157 ; pussy, 0.155 ; boobs, 0.138 ; asmr, 0.137 ; erotica, 0.117 ; manga, 0.106 ; ass, 0.105 |
75 | hot, 4.229 | trending, 3.951 ; promoted, 2.900 ; girls, 0.181 ; sexy, 0.176 ; asmr, 0.165 ; viral, 0.123 |
Science-related tags are grouped together too, in a couple of overlapping topics:
topic number | highest-weight tag | other tags |
---|---|---|
33 | technology, 3.760 | science, 0.351 ; business, 0.155 ; steemstem, 0.120 ; steemdunk, 0.114 |
78 | science, 3.635 | education, 3.413 ; steemstem, 0.751 ; steemiteducation, 0.622 ; people, 0.607 ; space, 0.224 ; sndbox, 0.159 ; psychology, 0.159 ; physics, 0.137 ; philosophy, 0.119 ; stemng, 0.118 |
The complete output is included below.
Conclusion
Non-negative Matrix Factorization is a standard dimensionality-reduction technique, which I have demonstrated on Steem tags and discussions. It is frequently used for this sort of semantic clustering; in fact, that's the example scikit-learn provides! The decomposed form of the original data can be used for other purposes as well, such as similarity detection or recommendations; I am thinking of analyzing upvotes according to these topics rather than the raw tags. A more ambitious project would be to apply NNMF to word frequencies or occurrence within the discussions themselves, not just the attached tags.
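As one illustration of the similarity idea, discussions could be compared by the cosine similarity of their rows in W. This is only a sketch of one possible approach, not something from the experiment above; `most_similar` and `W_demo` are hypothetical.

```python
import numpy as np

def most_similar(W, query, top_k=3):
    """Indices of the rows of W closest to row `query` by cosine similarity."""
    W = np.asarray(W, dtype=float)
    norms = np.linalg.norm(W, axis=1)
    norms[norms == 0] = 1.0        # avoid dividing by zero for all-zero rows
    unit = W / norms[:, None]      # each row scaled to unit length
    scores = unit @ unit[query]    # cosine similarity to the query row
    scores[query] = -np.inf        # never return the query itself
    return [int(i) for i in np.argsort(scores)[::-1][:top_k]]

# Made-up topic weights for 4 discussions over 2 topics.
W_demo = np.array([[2.0, 0.0], [1.0, 0.1], [0.0, 3.0], [0.1, 1.0]])
print(most_similar(W_demo, query=0, top_k=2))
```

Cosine similarity ignores the overall magnitude of a row, which fits the earlier observation that relative weights matter more than the numbers themselves.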
Complete topic list
topic number | highest-weight tag | other tags |
---|---|---|
0 | life, 1.952 | |
1 | photography, 2.670 | |
2 | artzone, 1.136 | slothicorn, 0.110 |
3 | untalented, 0.733 | steemitfamilyph, 0.138 |
4 | whalepower, 1.033 | |
5 | ocd-resteem, 2.955 | promo-steem, 0.375 ; slothicorn, 0.283 ; homesteading, 0.215 ; ita, 0.173 ; adsactly, 0.149 ; gardening, 0.102 |
6 | blockchain, 2.670 | |
7 | love, 2.222 | |
8 | bitcoin, 3.287 | ethereum, 0.207 |
9 | introduceyourself, 1.294 | introducemyself, 0.235 |
10 | beautiful, 2.843 | quotes, 0.735 ; amazing, 0.449 ; words, 0.349 ; minds, 0.334 ; lovefriday, 0.263 ; japanese, 0.249 ; happy, 0.232 ; girl, 0.210 ; woman, 0.155 ; nice, 0.107 |
11 | art, 3.709 | |
12 | food, 3.137 | recipe, 0.308 ; foodphotography, 0.253 ; cooking, 0.174 |
13 | travel, 2.515 | |
14 | nature, 5.634 | |
15 | dlive, 1.631 | dlive-video, 1.560 ; learning, 0.317 ; , 0.172 |
16 | dtube, 3.332 | vlog, 0.277 ; dtubedaily, 0.275 ; onelovedtube, 0.252 ; steempowerment, 0.117 |
17 | flower, 2.820 | |
18 | entertainment, 1.933 | people, 0.364 ; movies, 0.106 |
19 | colourfulphotography, 4.443 | bescouted, 0.745 ; thealliance, 0.550 ; portraitphotography, 0.458 ; yourluckyphotos, 0.451 ; streetphotography, 0.430 ; sevendaybnwchallenge, 0.412 ; venezuela, 0.325 ; photocontests, 0.163 ; photoworld, 0.149 ; ru, 0.146 ; vn, 0.140 ; ua, 0.135 |
20 | meme, 2.519 | dmania, 1.330 ; memes, 0.139 ; funny, 0.134 ; memeitlol, 0.115 |
21 | beauty, 4.672 | girl, 0.441 ; woman, 0.350 ; japanese, 0.258 ; fashion, 0.233 ; fitness, 0.186 ; model, 0.160 ; portraitphotography, 0.115 ; makeup, 0.115 ; bescouted, 0.110 ; photocontest, 0.102 |
22 | music, 5.211 | comedy, 0.411 ; geme, 0.405 ; openmic, 0.291 ; guitar, 0.159 ; song, 0.157 ; rock, 0.146 ; dance, 0.109 ; hiphop, 0.108 ; musicdiscovery, 0.106 |
23 | esteemapp, 3.557 | information, 0.517 ; religious, 0.456 ; terp, 0.138 ; kmk, 0.102 |
24 | crypto, 4.410 | cryptocurrency, 4.403 ; crypto-news, 0.433 |
25 | macrophotography, 6.246 | acehmacro, 0.649 ; photocircle, 0.343 ; macro, 0.264 ; yourluckyphotos, 0.198 ; photocontests, 0.144 |
26 | drawing, 3.151 | painting, 0.725 ; sketch, 0.335 ; creativity, 0.270 ; slothicorn, 0.153 ; anime, 0.131 ; illustration, 0.111 ; coloredpencil, 0.107 |
27 | health, 4.736 | fitness, 0.300 ; tips, 0.103 |
28 | busy, 6.683 | |
29 | kr-newbie, 3.017 | kr-writing, 0.344 ; kr-life, 0.334 ; kr, 0.125 ; kr-daily, 0.119 |
30 | cervantes, 3.487 | |
31 | steem, 7.676 | steempress, 1.107 ; sbd, 0.151 |
32 | contest, 6.422 | challenge, 0.364 ; photobombchallenge, 0.188 ; openmic, 0.174 ; giveaway, 0.145 ; artstorm, 0.144 ; photobomb, 0.122 ; sports, 0.111 ; newbiegames, 0.109 ; monomad, 0.108 ; design, 0.104 ; thealliance, 0.101 |
33 | technology, 3.760 | science, 0.351 ; business, 0.155 ; steemstem, 0.120 ; steemdunk, 0.114 |
34 | inspiration, 2.517 | philosophy, 0.425 ; onequality, 0.360 ; quote, 0.345 ; spirituality, 0.155 |
35 | animals, 7.821 | animalphotography, 1.388 ; cats, 0.852 ; pets, 0.672 ; dogs, 0.486 ; cat, 0.397 ; dog, 0.370 ; dailypetphotography, 0.319 ; homesteading, 0.311 ; cute, 0.288 ; birds, 0.288 ; wildlife, 0.154 ; pet, 0.136 ; caturday, 0.134 ; monomad, 0.102 |
36 | games, 3.548 | start, 1.206 ; steemgar, 1.195 ; steemplayroom, 1.194 ; gameplay, 0.441 ; gamer, 0.438 ; letsplay, 0.407 |
37 | story, 3.813 | fiction, 0.127 |
38 | photo, 4.016 | |
39 | cryptocurrency, 7.017 | ethereum, 0.536 ; altcoin, 0.247 ; trading, 0.239 ; altcoins, 0.166 ; eos, 0.102 |
40 | lifestyle, 5.566 | fitness, 0.319 ; fashion, 0.253 ; photocontest, 0.200 ; onequality, 0.187 ; steemdunk, 0.131 ; dlive-video, 0.112 ; beach, 0.104 |
41 | game, 5.256 | games, 0.121 |
42 | community, 3.597 | promo-steem, 0.647 ; curation, 0.432 ; teammalaysia, 0.146 ; freedom, 0.133 ; challenge30days, 0.132 |
43 | history, 2.932 | culture, 0.144 |
44 | money, 5.134 | trading, 0.260 ; finance, 0.258 ; altcoins, 0.224 ; business, 0.183 ; investing, 0.153 ; cryptonews, 0.134 ; stocks, 0.118 |
45 | landscapephotography, 6.021 | photocircle, 0.470 ; landscape, 0.364 ; yourluckyphotos, 0.225 ; photocontest, 0.176 ; newbieresteemday, 0.127 ; naturephotography, 0.126 ; mountains, 0.119 ; photofriend, 0.110 ; photomatic, 0.107 ; sunset, 0.106 |
46 | cn, 4.038 | cn-reader, 0.877 ; lizhi, 0.171 ; teammalaysia, 0.155 ; cn-book, 0.155 ; cn-malaysia, 0.153 ; book, 0.136 ; stats, 0.123 |
47 | news, 4.077 | geme, 0.144 ; comedy, 0.132 |
48 | blog, 7.167 | |
49 | steemit, 7.670 | |
50 | dlive-broadcast, 7.562 | dlive, 7.505 ; Dlive, 0.504 ; DLiveStar, 0.502 ; DliveStar, 0.323 ; Dlivestar, 0.267 ; fortnite, 0.259 ; dliver, 0.244 ; live, 0.237 ; stream, 0.230 ; dlivegaming, 0.214 ; Dunite, 0.201 ; learning, 0.200 ; of, 0.196 ; DLive, 0.195 ; , 0.182 ; Gaming, 0.174 ; DUnite, 0.171 ; steemgc, 0.149 ; pubg, 0.149 ; Fortnite, 0.146 ; dlivecommunity, 0.132 ; dliverewards, 0.127 ; Pubg, 0.113 ; csgo, 0.109 ; dunite, 0.103 ; league, 0.103 |
51 | steepshot, 6.456 | |
52 | steemph, 4.632 | philippines, 0.840 ; cebu, 0.312 ; steemph-antipolo, 0.220 ; steemitachievers, 0.214 ; teardrops, 0.201 ; pilipinas, 0.168 ; tilphilippines, 0.165 ; dailyfoodphotography, 0.155 ; steemitdavao, 0.155 |
53 | smartphonephotography, 6.191 | yourluckyphotos, 0.130 ; smartphotography, 0.110 |
54 | video, 4.522 | youtube, 0.313 ; recipe, 0.201 ; people, 0.156 ; vlog, 0.155 ; dmania, 0.143 |
55 | aceh, 7.711 | acehmacro, 0.108 |
56 | stach, 5.362 | onequality, 0.921 ; promo-steem, 0.347 ; teardrops, 0.163 ; steem9ja, 0.122 ; sndbox, 0.114 ; joinsteemit, 0.100 |
57 | flowers, 4.427 | petals, 0.623 ; garden, 0.407 ; gardening, 0.239 ; plants, 0.162 ; summer, 0.130 ; spring, 0.124 |
58 | philippines, 4.744 | steempress, 1.228 ; cebu, 0.289 ; ulog, 0.166 ; redfish, 0.125 |
59 | castellano, 4.063 | venezuela, 1.064 ; literatura, 0.178 ; provenezuela, 0.114 ; proconocimiento, 0.105 |
60 | wafrica, 5.362 | reachout, 0.199 |
61 | ico, 6.998 | ethereum, 2.400 ; bounty, 0.349 ; crowdsale, 0.237 ; tokensale, 0.233 ; eth, 0.217 ; token, 0.153 ; many, 0.132 ; investment, 0.128 ; blokchain, 0.116 ; investing, 0.112 ; review, 0.105 |
62 | tr, 3.658 | cointurk, 1.568 ; destektr, 0.488 ; trliste, 0.477 ; kusadasi, 0.323 ; gif, 0.258 ; failarmy, 0.241 ; hede-io, 0.223 ; information, 0.175 ; anadolu, 0.108 |
63 | article, 6.533 | myanmar, 0.150 |
64 | motivation, 4.328 | onequality, 0.216 ; quote, 0.191 ; success, 0.189 ; quotes, 0.163 |
65 | poetry, 4.547 | poem, 0.868 ; steemitschoolpoetry, 0.209 ; poetsunited, 0.138 ; post, 0.117 |
66 | kr, 7.209 | coinkorea, 0.766 ; kr-life, 0.548 ; kr-writing, 0.483 ; kr-daily, 0.340 ; kr-event, 0.301 ; kr-art, 0.240 ; kr-pen, 0.236 ; kr-coin, 0.228 ; kr-diary, 0.157 ; tooza, 0.135 ; muksteem, 0.132 ; kr-news, 0.113 ; kr-travel, 0.113 ; kr-gazua, 0.112 |
67 | writing, 6.765 | fiction, 0.423 ; freewrite, 0.125 ; myanmar, 0.123 |
68 | christian-trail, 3.038 | steemchurch, 3.035 ; christianity, 2.615 ; religion, 1.764 ; bible, 0.559 ; spirituality, 0.385 ; communitynews, 0.304 ; faith, 0.255 ; flaminghelpers, 0.226 ; jesus, 0.192 ; gospel, 0.127 ; christian, 0.116 ; god, 0.104 ; ghana, 0.101 |
69 | esteem, 10.849 | sevendaybnwchallenge, 0.133 |
70 | world, 5.584 | creativity, 1.384 ; creative, 1.320 ; adventure, 0.738 ; usa, 0.292 ; amazing, 0.242 ; immigration, 0.194 ; of, 0.165 ; future, 0.155 ; archive, 0.142 ; people, 0.137 ; invention, 0.129 ; diy, 0.117 ; film, 0.113 ; cup, 0.101 |
71 | good-karma, 4.972 | time, 0.751 ; religious, 0.243 ; information, 0.240 ; steemdunk, 0.112 ; mso, 0.101 |
72 | colorchallenge, 3.949 | thursdaygreen, 0.127 ; wednesdayyellow, 0.102 |
73 | animal, 5.792 | animalphotography, 0.563 ; cat, 0.347 ; dog, 0.161 ; acehmacro, 0.111 |
74 | free, 2.315 | minnowsupport, 2.190 ; giveaway, 2.006 ; resteem, 0.145 |
75 | hot, 4.229 | trending, 3.951 ; promoted, 2.900 ; girls, 0.181 ; sexy, 0.176 ; asmr, 0.165 ; viral, 0.123 |
76 | family, 4.035 | happy, 0.279 ; muzakirpb, 0.188 ; freedom, 0.147 ; english, 0.135 ; culture, 0.130 ; familyprotection, 0.124 ; kids, 0.121 ; education, 0.120 ; ulog, 0.116 ; children, 0.104 |
77 | new, 4.442 | |
78 | science, 3.635 | education, 3.413 ; steemstem, 0.751 ; steemiteducation, 0.622 ; people, 0.607 ; space, 0.224 ; sndbox, 0.159 ; psychology, 0.159 ; physics, 0.137 ; philosophy, 0.119 ; stemng, 0.118 |
79 | deutsch, 4.702 | steemit-austria, 0.557 ; german, 0.161 ; politik, 0.113 ; austria, 0.100 |
80 | movie, 3.429 | film, 2.223 ; review, 1.223 ; streaming, 0.804 ; films, 0.677 ; regarder, 0.661 ; movies, 0.539 ; ita, 0.297 ; comedy, 0.215 ; vn, 0.173 ; download, 0.162 ; full, 0.140 ; trailer, 0.127 ; drama, 0.126 ; avengers, 0.126 ; marvel, 0.110 ; romance, 0.108 ; online, 0.101 |
81 | natural, 5.424 | photografy, 0.289 ; healthy, 0.131 ; fruit, 0.116 |
82 | minnowsupport, 3.889 | investment, 3.589 ; bid-bot, 2.870 ; bid-bots, 2.869 |
83 | gaming, 7.295 | steemgc, 0.375 ; fortnite, 0.299 ; gameplay, 0.289 ; steemcraft, 0.281 ; letsplay, 0.264 ; gamer, 0.189 ; review, 0.142 ; votegame, 0.118 ; gamersunited, 0.113 ; ps4, 0.101 |
84 | jjangjjangman, 5.574 | muksteem, 0.517 ; kr-life, 0.419 ; kr-event, 0.308 ; kr-gazua, 0.186 ; kr-funfun, 0.184 ; dev, 0.156 ; it-news, 0.154 ; kr-travel, 0.144 ; kr-diary, 0.136 ; sharehows, 0.126 ; kr-overseas, 0.122 ; kr-youth, 0.121 ; kr-series, 0.109 ; kr-art, 0.107 ; kr-food, 0.102 ; kr-easy, 0.101 |
85 | airdrop, 3.320 | bounty, 1.363 ; token, 0.684 ; free, 0.236 ; airdrops, 0.203 ; upvote, 0.135 ; myanmar, 0.117 ; coin, 0.108 |
86 | spanish, 5.171 | venezuela, 0.284 ; steemitvenezuela, 0.120 ; talentclub, 0.114 |
87 | sports, 2.641 | football, 1.473 ; sport, 0.607 ; basketball, 0.487 ; soccer, 0.371 ; daily, 0.327 ; nba, 0.219 ; betting, 0.206 ; steemsports, 0.196 |
88 | indonesia, 8.426 | ksi, 0.743 ; garudakita, 0.225 ; promo-steem, 0.190 ; bogor, 0.157 |
89 | fun, 4.964 | picture, 0.131 ; meme, 0.119 ; humor, 0.118 |
90 | photofeed, 7.138 | photocircle, 1.444 ; photomatic, 0.722 ; portraitphotography, 0.471 ; yourluckyphotos, 0.412 ; goldenhourphotography, 0.297 ; streetphotography, 0.291 ; monomad, 0.261 ; bescouted, 0.259 ; architecturalphotography, 0.232 ; vehiclephotography, 0.184 ; cityscapephotography, 0.168 ; animalphotography, 0.137 ; photocontest, 0.130 ; photocontests, 0.130 ; petals, 0.116 ; bwphotocontest, 0.110 |
91 | nigeria, 4.588 | onequality, 0.432 ; africa, 0.298 ; ulog, 0.166 ; genesisproject, 0.126 |
92 | dlivestar, 5.661 | steemgc, 1.786 ; dunite, 1.679 ; dliver, 1.260 ; dlivegaming, 1.103 ; dlive, 0.993 ; dlive-broadcast, 0.942 ; fr, 0.492 ; dlive24hour, 0.327 ; fortnite, 0.263 ; lovedlive, 0.239 ; dliverewards, 0.236 ; dlivestarbooster, 0.211 ; adsactly, 0.165 ; dlivebroadcast, 0.143 ; dota2, 0.142 ; dlivecommunity, 0.125 ; bgame, 0.121 ; game, 0.117 ; dgamer, 0.107 ; dlivestreaming, 0.107 ; livestream, 0.105 ; teamdlive, 0.102 ; upvote, 0.101 ; dlivebooster, 0.100 |
93 | steemgigs, 3.356 | teardrops, 3.212 ; ulog, 0.945 ; surpassinggoogle, 0.854 ; ulogs, 0.453 ; thai, 0.405 ; steem-untalented, 0.199 ; gratefulvibes, 0.178 ; untalented-steemgigs, 0.119 ; sndbox, 0.111 ; philippines, 0.108 ; thaipowerup, 0.101 |
94 | funny, 6.766 | comedy, 0.873 ; geme, 0.350 ; humor, 0.331 ; memes, 0.237 ; gif, 0.184 ; joke, 0.179 ; jokes, 0.159 ; punchline, 0.155 ; lol, 0.124 |
95 | crypto, 7.408 | trading, 0.800 ; altcoin, 0.755 ; ethereum, 0.482 ; btc, 0.425 ; currency, 0.214 ; investing, 0.178 ; eos, 0.144 ; market, 0.120 ; eth, 0.119 |
96 | india, 3.745 | pictures, 0.162 ; indiaunited, 0.152 ; venezuela, 0.144 ; design, 0.117 ; culture, 0.101 |
97 | dsound, 3.321 | dsound-original, 1.956 ; ftlob, 0.435 ; musicvoter, 0.416 ; steem-music, 0.314 ; dsound-cover, 0.312 ; dsound-music, 0.306 ; podcast, 0.303 ; music, 0.286 ; dsound-podcast, 0.267 ; dsound-electronic, 0.227 ; dsound-dsound, 0.216 ; dsound-pop, 0.208 ; dsound-instrumental, 0.145 ; dsound-hiphop, 0.138 ; instrumental, 0.105 |
98 | nsfw, 2.889 | sexy, 1.681 ; porn, 1.002 ; sex, 0.678 ; nude, 0.497 ; girl, 0.420 ; girls, 0.311 ; hot, 0.276 ; adult, 0.274 ; woman, 0.250 ; dporn, 0.205 ; hentai, 0.202 ; japanese, 0.182 ; hardcore, 0.172 ; model, 0.165 ; anime, 0.157 ; pussy, 0.155 ; boobs, 0.138 ; asmr, 0.137 ; erotica, 0.117 ; manga, 0.106 ; ass, 0.105 |
99 | politics, 3.952 | businessandcommerce, 0.971 ; hollywoodandmovies, 0.971 ; musicandentertainment, 0.971 ; scienceandtechnology, 0.971 ; freedom, 0.442 ; informationwar, 0.427 ; trump, 0.175 ; anarchy, 0.164 ; conspiracy, 0.160 ; liberty, 0.109 |
Reconstruction error for different values of p (the number of hidden features):
p | reconstruction error |
---|---|
50 | 597.7057528779269 |
100 | 531.8807011557097 |
200 | 464.55805756242535 |
300 | 422.28096708552204 |