Discovering New Authors Faster

in #programming · 8 years ago

In a discussion with @patrice the other day, she asked me whether it's possible to parse the blockchain in a way that finds authors who made their first post in a certain timeframe.

A couple of back-and-forth messages later, I had something to work on; I was excited to experiment with the recently released steemtools Python library by @furion. Many thanks!

The purpose:

  • discover new users faster - from their first blogpost
  • eradicate spam/plagiarism in its infancy - from what I've seen, numerous new authors have been trying to fake it from their first post

This code can be further expanded/optimized for other purposes, so ideas are welcome.

So, here's the rationale behind the algorithm:

  • parse all blocks from the last 24 hours (this could be modified)
  • look for comment operations (blogposts and comments)
  • restrict to blogposts
  • if this is the author's first blogpost, retrieve the link
  • output the link to the screen, and save it to file
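The steps above can be tried out without touching the chain at all. The snippet below is only an illustration on hand-made data: the `find_first_posts` helper, the `post_counts` mapping and the sample operations are hypothetical stand-ins, not part of steemtools. It assumes each operation is a dict with `author` and `permlink` keys, and that replies carry permlinks starting with `re-`.

```python
def find_first_posts(ops, post_counts):
    """Return (author, permlink) pairs for blogposts by authors with exactly one post.

    ops         -- iterable of dicts with 'author' and 'permlink' keys (assumed shape)
    post_counts -- dict mapping author -> number of blogposts so far (hypothetical)
    """
    found = []
    for op in ops:
        # replies are assumed to have permlinks starting with 're-'
        if op['permlink'].startswith('re-'):
            continue
        # keep only authors whose blog holds a single post
        if post_counts.get(op['author'], 0) == 1:
            found.append((op['author'], op['permlink']))
    return found


# made-up sample data, for illustration only
sample_ops = [
    {'author': 'alice', 'permlink': 'hello-world'},
    {'author': 'bob', 'permlink': 're-alice-hello-world'},
    {'author': 'carol', 'permlink': 'my-tenth-post'},
]
sample_counts = {'alice': 1, 'bob': 0, 'carol': 10}

first_posts = find_first_posts(sample_ops, sample_counts)
print(first_posts)
```

With this sample data only alice qualifies: bob's operation is a reply and carol already has ten posts.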

The code is redundant on purpose: if it does not get executed to the end (god of code knows why?!), there's no file to look into for the links, so better be safe and output to the screen as you get the links.

However, the output to the screen will contain duplicates (some links may appear more than once). I could fix this, but the idea was to get this out immediately.

This does not matter too much though, because once the code is completely executed, the duplicates are removed in the text file, so you get a clean final output.
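As a small aside, here is one way the duplicate removal could be written more compactly. This is a sketch, not what the script does: it relies on dict keys being unique and keeping insertion order (Python 3.7+), so the links stay in the order they were found, which a plain `set()` would not guarantee. The `links` list uses made-up placeholder URLs.

```python
# placeholder links, for illustration only
links = [
    'https://example.com/a',
    'https://example.com/b',
    'https://example.com/a',
]

# dict keys are unique and keep insertion order, so this drops
# repeats while preserving the order of first appearance
unique_links = list(dict.fromkeys(links))
print(unique_links)
```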

To run this, you obviously need Python and steemtools with all its dependencies. The guide to steemtools is provided by @furion in the release post.

Here's the code for my algorithm:

from steemtools.blockchain import Blockchain
from steemtools.base import Account, Post

b = Blockchain()

curr_block = b.get_current_block()
st_block = curr_block - 28800

# creating two lists, one will display results with duplicates
# the second list will be the clean one, without the duplicates
# removing duplicates could also be done with the set() function

postlist1 = []
postlist2 = []

# 'replaying' the blockchain for the timeframe established above
# filtering by comments (blogposts and comments)

for event in b.replay(start_block=st_block, end_block=curr_block, filter_by=['comment']):

    post = event['op']['permlink']
    author = event['op']['author']

    # the meat of the code
    # looking for blogposts from users with their first post in the last 24 hours

    # replies have permlinks starting with 're-', so skip those
    if post[:3] == 're-':
        continue

    try:
        # fetch the author's blog once and reuse it below
        blog = Account(author).get_blog()

        if len(blog) == 1 and Post.time_elapsed(blog[0])/3600 < 24:
            url = Post.get_url(blog[0])
            print(url)
            postlist1.append(url)

    except Exception:
        # skip this author if the blog cannot be retrieved
        continue

# removing duplicates

for item in postlist1:
    if item not in postlist2:
        postlist2.append(item)

# writing the final results to file

with open('newauthors.txt', 'wt') as f:
    # the with block closes the file on exit, so no explicit close is needed
    for line in postlist2:
        f.write(line+'\n')

You can get this code from my GitHub as well. Use it at your discretion.
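A note on the 28800 used for the starting block: it matches a 24-hour window if one block is produced every 3 seconds. That interval is an assumption spelled out here, since the script does not state it. The `blocks_in_window` helper below is hypothetical and only shows the arithmetic.

```python
BLOCK_INTERVAL_SECONDS = 3  # assumed block time, not stated in the script


def blocks_in_window(hours, block_interval=BLOCK_INTERVAL_SECONDS):
    """Number of blocks produced in `hours`, given a fixed block interval."""
    return int(hours * 3600 // block_interval)


print(blocks_in_window(24))
print(blocks_in_window(1))
```

To scan a different timeframe, say the last 6 hours, the starting block could be computed as `st_block = curr_block - blocks_in_window(6)`.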

If some of you guys want to get to curating right away, I ran the algorithm a couple of hours ago. Here's the output file:

https://steemit.com/health/@auky/the-basic-concept-of-post-partum-hemorrhage
https://steemit.com/spirituality/@nathanhoskins/gnosis-the-knowledge-of-insight
https://steemit.com/politics/@worldwidewojnar/how-america-has-changed-since-9-11
https://steemit.com/technology/@snails/digital-dementia-virus
https://steemit.com/vaccine/@sal1/vaccine-morass
https://steemit.com/news/@moveup/hello-self-driving-cars-goodbye-4-1-million-jobs
https://steemit.com/entrepreneurship/@tygers/why-delegating-responsibilities-is-crucial-for-your-business-or-tygers-magazine
https://steemit.com/photograpy/@jeffborba/photographing-at-home
https://steemit.com/money/@gravity9/in-the-news-bitcoin-technology-block-chain-energy-and-the-electricity-revolution
https://steemit.com/marcopolo/@marcopolo/creation-of-the-new-marco-polo-coming-soon
https://steemit.com/life/@mktom79/if-you-don-t-know-where-to-start-just-start
https://steemit.com/ethreum/@bruteforce/ethreum-minier
https://steemit.com/dance/@elfede/a-dancer-s-story
https://steemit.com/yocoin/@cryptoking619/pay-your-bills-with-yocoin
https://steemit.com/story/@kovalkovich/my-first-flight
https://steemit.com/education/@saynt01/give-education-not-policy
https://steemit.com/bienvenido/@tenet/comenzamos
https://steemit.com/trip/@fer7rules/my-experience-planning-a-trip-to-copenhagen-denmark
https://steemit.com/introducemyself/@msendyart/hello-steemit-visual-artist-endy-introduction
https://steemit.com/hobo/@hobogus/no-regrets
https://steemit.com/zero/@cmc613/baked-pasta-fake-out-1-0-servings-at-195-calories
https://steemit.com/introduceyourself/@fess/hi-steemit-i-m-fess-life-love-magic-and-traveling-journey-in-a-life-itself-this-is-my-blog-where-i-will-share-stories-from-my
https://steemit.com/writing/@pentingharterman/altitude-fair
https://steemit.com/airpods/@mrtobias/wireless-earbuds-why-do-people-fear-they-ll-fall-out-or-apple-s-airpods-are-on-the-right-track
https://steemit.com/introduceyourself/@barcisz/introduction-mindfunk-included
https://steemit.com/equality/@dragana/in-response-to-kate-upton-s-ig-post-and-those-who-have-an-issue-with-colin-kaepernick-s-protests
https://steemit.com/chile/@chile-dog/fear-distrust-and-anarchy-in-chile
https://steemit.com/voluntaryism/@gnosis474/let-justice-be-done-though-the-heavens-fall
https://steemit.com/spanish/@alfredozofio/lo-que-no-te-han-contado-de-la-edad-media
https://steemit.com/art/@digitalart/sharing-digital-art-on-steemit
https://steemit.com/ilovebuttsex/@copycat114/leafy-wants-to-die
https://steemit.com/cryptocurrency/@vaughnpierre/will-the-july-2016-bitcoin-block-halving-set-off-another-bitcoin-bubble
https://steemit.com/introduceyourself/@andreaalexandria/andrea-alexandria-detoxification-educator-now-on-steemit
https://steemit.com/introduction/@thejdah/intro
https://steemit.com/story/@liamobtv/introduction-and-ama-ask-me-anything-about-living-and-working-on-an-ayahuacsa-retreat-centre-in-the-amazon-jungle
https://steemit.com/rant/@raptorpc/my-first-rant-ever-hope-steemit-can-handle-me
https://steemit.com/cn/@harambeisgod/the-beauty-of-chinese
https://steemit.com/wildwill/@wildwill420/music
https://steemit.com/technology/@jeragon775/some-food-for-thought
https://steemit.com/story/@honeywish/the-rainbow-and-the-witch-part-1
https://steemit.com/bitcoin/@gaby64/end-the-blocksize-debate-with-gigabit-internet-access
https://steemit.com/new/@kbailey01/6xb4as
https://steemit.com/aquarium/@reefzone/hello-world
https://steemit.com/dating/@joeghaleb/a-letter-to-an-ex-girlfriend-that-was-never-shared-not-even-to-her-until-now-but-only-to-all-of-you
https://steemit.com/multifamily/@mrbrandondsmith/commercial-real-estate-market-updates
https://steemit.com/introduceyourself/@seansclevername/hello-creators-of-all-types
https://steemit.com/poetry/@itsallgoneleft/long-day
https://steemit.com/government/@sanfords/a-design-for-a-social-network-to-replace-government
https://steemit.com/funny/@lil.missy/this-hundred-year-old-tortoise-had-so-much-sex-he-actually-saved-his-species-from-extinction
https://steemit.com/photography/@ronvalentino/love
https://steemit.com/poetry/@insession/fruit
https://steemit.com/awesomenyl/@simple-mhe20/introduce-yourself
https://steemit.com/runescape/@dethykins/runescape-how-you-can-make-a-living-playing-a-game
https://steemit.com/entrepreneur/@nickcownie/why-it-absolutely-sucks-to-be-an-entrepreneur

I'm always looking forward to getting my hands dirty with code, so if you have suggestions for other statistics and analytics, please let me know!


To stay in touch with me, follow @cristi

#programming #steemit #curie


Cristi Vlad, Self-Experimenter and Author



Holy shit. That is amazing work man.... well done

thank you!

I love this comes out like this.... best way to address it.... just look at it flaunted!!! That was 24 hours?

yes, the last 24 hours

This is good. This is what is needed to boost new writers.

maybe find good new writers

I think you may have been a little subtle with this line only

eradicate spam/plagiarism in its infancy - from what I've seen, numerous new authors have been trying to fake it from their first post

Showed it to a friend and don't think they connected lol

Really great job with this @cristi, I think this could be very useful going forward.

it could be helpful in the curation process

Completely agree

Indeed great to find new great authors and kill plagiarism and translations from the start (but probably a lot of work for following all of this).

well, if you have the links readily available, it's a lot easier than checking the track record for each user :)
