Code to Analyze the Top 100 Followed Accounts

in #steemit, 8 years ago (edited)

I've been wanting to play with the blockchain data for over a month now, and last night I finally carved out some time to get to know the API a little better. Big thanks to @jesta, who helped me figure out how to chain calls together when accessing plugin APIs like the follow_api.

Here's the basic idea:

curl https://node.steem.ws -d '{"jsonrpc":"2.0","method":"call", "params":[1, "get_api_by_name", ["follow_api"]],"id":0}'
{"id":0,"result":3}
curl https://node.steem.ws -d '{"jsonrpc":"2.0","method":"call", "params":[3, "get_followers", ["lukestokes","",1]],"id":0}'
{"id":0,"result":[{"id":"8.6.3206","follower":"aaronwebb","following":"lukestokes","what":["blog"]}]}

The first call gives you an identifier for the follow_api plugin, which is 3 in this case.

The second call uses that identifier to then make a call to get_followers. The tricky part is, you can only get 100 at a time and the second parameter (the "" string in my example) isn't a number, it's an account name. You have to keep track of the last account you saw when paginating through the data.

I know this code is really rough, as I put it together in just one night, but I wanted to share it with you anyway because I had fun building it out. It's written in PHP and will hopefully give you some ideas as well. I'll be adding it to php-steem-tools on GitHub shortly.

Simple wrapper to curl for making API calls

function call($method, $params) {
    global $debug;
    $request = getRequest($method, $params);
    $response = curl($request);
    if (array_key_exists('error', $response)) {
        var_dump($response['error']);
        die();
    }
    return $response['result'];
}

function getRequest($method, $params) {
    global $debug;
    $request = array(
        "jsonrpc" => "2.0",
        "method" => $method,
        "params" => $params,
        "id" => 0
        );
    $request_json = json_encode($request);

    if ($debug) { print $request_json . "\n"; }

    return $request_json;
}

function curl($data) {
    global $debug;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'https://node.steem.ws');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
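    $result = curl_exec($ch);
    if ($result === false) {
        // Stop on network errors instead of trying to decode a failed response.
        die('curl error: ' . curl_error($ch) . "\n");
    }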

    if ($debug) { print $result . "\n"; }

    $result = json_decode($result, true);

    return $result;
}

Now we can just call call() instead of having to create strings like {"jsonrpc":"2.0","method":"call", "params"...
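
To make the nesting of the parameters concrete, here is a minimal sketch that builds the request body for a plugin call without touching the network. buildPluginRequest is a hypothetical helper name used only for this illustration (it is not part of the script), and the API ID 3 is simply the value from the example response above, so it may differ on another node.

```php
<?php
// Hypothetical helper (not part of the original script): builds the JSON-RPC
// body for a plugin call without sending anything over the network.
function buildPluginRequest($api_id, $api_method, array $args) {
    $request = array(
        "jsonrpc" => "2.0",
        "method"  => "call",
        // Plugin calls nest three things: the API ID, the method name, and its arguments.
        "params"  => array($api_id, $api_method, $args),
        "id"      => 0,
    );
    return json_encode($request);
}

// 3 is the follow_api ID from the example response; look it up rather than hard-coding it.
$json = buildPluginRequest(3, 'get_followers', array('lukestokes', '', 1));
print $json . "\n";
```

With the wrapper from this post, the same request would be sent as call('call', array(3, 'get_followers', array('lukestokes', '', 1))).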

Simple helper method for getting the follow_api identifier

function getFollowAPIID() {
    return getAPIID('follow_api');
}

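function getAPIID($api_name) {
    global $apis;
    // $apis may not have been initialized yet on the first call.
    if (!is_array($apis)) {
        $apis = array();
    }
    if (array_key_exists($api_name, $apis)) {
        return $apis[$api_name];
    }
    // Look up the requested API rather than always asking for follow_api.
    $response = call('call', array(1,'get_api_by_name',array($api_name)));
    $apis[$api_name] = $response;
    return $response;
}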

Yes, yes, I know, I'm using globals and that's BAD. This is just a quickie script to get the data I want. Note the use of caching in memory to avoid unnecessary calls to the API.
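
For anyone who would rather avoid the globals, one possible alternative is a static variable inside the function. This is only a sketch under stated assumptions: getCachedAPIID and the $lookup callable are names invented for this illustration, and the lookup is faked so the example runs without a node.

```php
<?php
// Hypothetical alternative to the global $apis cache: a static array that
// persists between calls to this function.
function getCachedAPIID($api_name, callable $lookup) {
    static $cache = array();
    if (!array_key_exists($api_name, $cache)) {
        // Only reached the first time a given API name is requested.
        $cache[$api_name] = $lookup($api_name);
    }
    return $cache[$api_name];
}

// Fake lookup standing in for the get_api_by_name call; it counts how often it runs.
$lookups = 0;
$fake_lookup = function ($api_name) use (&$lookups) {
    $lookups++;
    return 3; // assumed ID, matching the example response earlier in the post
};

$first  = getCachedAPIID('follow_api', $fake_lookup);
$second = getCachedAPIID('follow_api', $fake_lookup);
print "ID: $first, lookups made: $lookups\n";
```

The second call returns the cached value, so the fake lookup only runs once.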

Get followers and follower count

function getFollowerCount($account) {
    $followers = getFollowers($account);
    return count($followers);
}

function getFollowers($account, $start = '') {
    $limit = 100;
    $followers = array();
    $followers = call('call', array(getFollowAPIID(),'get_followers',array($account,$start,$limit)));
    if (count($followers) == $limit) {
        $last_account = $followers[$limit-1];
        $more_followers = getFollowers($account, $last_account['follower']);
        array_pop($followers);
        $followers = array_merge($followers, $more_followers);
    }
    return $followers;
}

This call requires some recursion because the limit is 100. If we go beyond that limit, we call it again using the last account name we saw as the start. We have to pop it off our list so it doesn't get included twice.
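
The same pagination can also be written as a loop, which avoids deep recursion for accounts with many followers. The sketch below is not the script's code: collectFollowers and $fetch_page are hypothetical names, and the fake data source assumes what the recursive version assumes, namely that results come back sorted by name and that the start account is included in the page.

```php
<?php
// Sketch of the same pagination as a loop. $fetch_page is a hypothetical
// callable taking ($start, $limit) and returning up to $limit follower rows,
// beginning with the account named $start (inclusive). Assumes $limit >= 2.
function collectFollowers(callable $fetch_page, $limit = 100) {
    $followers = array();
    $start = '';
    while (true) {
        $page = $fetch_page($start, $limit);
        if (count($page) < $limit) {
            // Short page: this is the last one, keep everything.
            return array_merge($followers, $page);
        }
        // Full page: the last row becomes the next start, so drop it here
        // to avoid counting it twice.
        $last = array_pop($page);
        $start = $last['follower'];
        $followers = array_merge($followers, $page);
    }
}

// Fake data source: 250 followers named user000 .. user249, sorted by name.
$all = array();
for ($i = 0; $i < 250; $i++) {
    $all[] = array('follower' => sprintf('user%03d', $i));
}
$fake_fetch = function ($start, $limit) use ($all) {
    $rows = array_filter($all, function ($row) use ($start) {
        return strcmp($row['follower'], $start) >= 0;
    });
    return array_slice(array_values($rows), 0, $limit);
};

$followers = collectFollowers($fake_fetch);
print count($followers) . "\n";
```

In the real script, $fetch_page would wrap the get_followers call for one account.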

Get all accounts on Steemit

function getAllAccounts() {
    $all_accounts = @file_get_contents('all_accounts.txt');
    if ($all_accounts) {
        $all_accounts = unserialize($all_accounts);
        print "Found " . count($all_accounts) . " accounts.\n";
        return $all_accounts;
    }
    $all_accounts = call('lookup_accounts', array('*',-1));
    print "Queried for " . count($all_accounts) . " accounts.\n";
    file_put_contents('all_accounts.txt',serialize($all_accounts));
    return $all_accounts;
}

This might actually exploit a bug because the code comments say the limit there should be 1000, but I was able to pass in -1 and get all 54k+ accounts at once. I store this to a file so I won't have to fetch them again while running this script.

Get accounts with at least one post

function getAccountsWithPosts($all_accounts, $start, $batch_size) {
    $some_accounts = array_slice($all_accounts,$start,$batch_size);
    $accounts_with_info = call('get_accounts', array($some_accounts));
    $active_accounts = filterForActiveAccounts($accounts_with_info);
    return $active_accounts;
}

function filterForActiveAccounts($accounts) {
    $filtered_accounts = array();
    foreach($accounts as $account) {
        if ($account['post_count'] > 0) {
            $filtered_accounts[] = $account['name'];
        }
    }
    return $filtered_accounts;
}

When I first started working on this, I realized how many accounts are just miner accounts with no activity at all. Checking their follower count was a huge waste of time so I wrote this to filter them out in batches. I get all the details for a batch of accounts, and then only return the account names of the ones who have posted before.

Save accounts with at least one post to a file

function getAllAccountsWithPosts() {
    $all_accounts = getAllAccounts();
    $total = count($all_accounts);
    // Resume from the saved offset; a missing file reads as false, which casts to 0.
    $start = (int) @file_get_contents('getAllAccountsWithPosts_start.txt');
    $batch_size = 100;
    while ($start < $total) {
        // array_slice() returns fewer than $batch_size items for the final batch.
        $filtered_accounts = getAccountsWithPosts($all_accounts, $start, $batch_size);
        print '.';
        foreach ($filtered_accounts as $filtered_account) {
            file_put_contents('getAllAccountsWithPosts_accounts.txt', $filtered_account . "\n", FILE_APPEND);
        }
        // Record the offset of the next batch so an interrupted run resumes there.
        $start += $batch_size;
        file_put_contents('getAllAccountsWithPosts_start.txt',$start);
    }
}

This is where it gets a little fun. I built this script to dump data to two files: getAllAccountsWithPosts_start.txt, which keeps track of where we are in the process so we can stop it at any time and pick up where we left off, and getAllAccountsWithPosts_accounts.txt, which stores the names of the accounts that have at least one post.
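
The stop-and-resume idea can be tried in isolation with a small sketch like the one below. It is an illustration rather than the script itself: processInBatches, $checkpoint_file and $process_batch are hypothetical names, and the checkpoint is assumed to hold the offset of the next unprocessed batch.

```php
<?php
// Sketch of a resumable batch loop. The checkpoint file holds the offset of
// the next unprocessed batch.
function processInBatches(array $items, $batch_size, $checkpoint_file, callable $process_batch) {
    // A missing or empty checkpoint casts to 0, so a fresh run starts at the beginning.
    $start = (int) @file_get_contents($checkpoint_file);
    $total = count($items);
    while ($start < $total) {
        // array_slice() simply returns a shorter array for the final batch.
        $process_batch(array_slice($items, $start, $batch_size));
        $start += $batch_size;
        // Save progress only after the batch is done, so a crash repeats
        // at most one batch instead of skipping it.
        file_put_contents($checkpoint_file, $start);
    }
}

$checkpoint = tempnam(sys_get_temp_dir(), 'ckpt');
$seen = array();
$collect = function (array $batch) use (&$seen) {
    $seen = array_merge($seen, $batch);
};

$items = range(1, 250);
// Simulate an earlier run that was stopped after the first batch of 100.
file_put_contents($checkpoint, 100);
processInBatches($items, 100, $checkpoint, $collect);
print count($seen) . " items processed after resuming\n";
unlink($checkpoint);
```

Because the checkpoint is written after each batch, an interrupted run may repeat one batch, so the output file can contain a few duplicate names.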

Save follower counts to a file

function saveFollowerCounts() {
    $min_threshold = 0;
    $follower_counts = array();
    // $all_accounts = getAllAccounts();
    $all_accounts = @file('getAllAccountsWithPosts_accounts.txt');
    if (!$all_accounts) {
        getAllAccountsWithPosts();
        $all_accounts = file('getAllAccountsWithPosts_accounts.txt');
    }
    $start = @file_get_contents('saveFollowerCounts_start.txt');
    if (!$start) {
        $start = 0;
        file_put_contents('saveFollowerCounts_start.txt',$start);
    }
    print "Starting at $start\n";
    for ($i = $start; $i<count($all_accounts); $i++) {
        $account = trim($all_accounts[$i]);
        if ($i % 100 == 0) {
            print $i;
        }
        print '.';
        $follower_count = getFollowerCount($account);
        if ($follower_count > $min_threshold) {
            file_put_contents('saveFollowerCounts_counts.txt', $account . ',' . $follower_count . "\n", FILE_APPEND);
        }
        // Save the index of the next account so a resumed run does not count this one twice.
        file_put_contents('saveFollowerCounts_start.txt',$i + 1);
    }
}

This function follows the same pattern as getAllAccountsWithPosts. It can be stopped and restarted, and it will continue where it left off.

Finally, we print out the results

function printTopFollowed() {
    $number_of_top_accounts_to_show = 100;
    $accounts = array();
    $file = fopen("saveFollowerCounts_counts.txt","r");
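    while (($line = fgetcsv($file)) !== false) {
        // Skip blank or malformed lines (fgetcsv returns array(null) for a blank line).
        if (count($line) < 2) {
            continue;
        }
        $accounts[trim($line[0])] = (int) trim($line[1]);
    }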
    fclose($file);

    arsort($accounts);

    $header = "|    |           Account|    Number of Followers   | \n";
    $header .= "|:--:|:----------------:|:------------------------:|\n";

    print "\n## <center>TOP $number_of_top_accounts_to_show USERS BY FOLLOWER COUNT </center>\n\n";
    print $header;
    $count = 0;
    foreach ($accounts as $account => $follower_count) {
        $count++;
        if ($count > $number_of_top_accounts_to_show) {
            break;
        }
        print '|  ' . $count . '  |' . sprintf('%15s','@'.$account) . ': |   ' . $follower_count . "   |\n";
    }
}
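
The sort-and-truncate step in printTopFollowed() can be tried on its own with a sketch like this one. topFollowed is a hypothetical helper name, the sample data is made up, and the lines are assumed to be in the same account,count format as saveFollowerCounts_counts.txt.

```php
<?php
// Sketch: pick the top $n accounts from "account,count" lines.
function topFollowed(array $lines, $n) {
    $accounts = array();
    foreach ($lines as $line) {
        $fields = explode(',', trim($line), 2);
        if (count($fields) < 2) {
            continue; // skip blank or malformed lines
        }
        // Cast to int so the sort compares numbers, not strings.
        $accounts[$fields[0]] = (int) $fields[1];
    }
    arsort($accounts); // sort by value, highest first, keeping account names as keys
    return array_slice($accounts, 0, $n, true);
}

// Made-up sample data, including a blank line that should be ignored.
$sample = array("alice,12\n", "bob,140\n", "\n", "carol,7\n", "dave,95\n");
foreach (topFollowed($sample, 3) as $account => $count) {
    print "@$account: $count\n";
}
```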

Pretty cool, right? I had so much fun putting this together.

None of this would be possible without the excellent work of @xeroc and @jesta to put together https://steem.ws/ which you can read more about here.

If you enjoy this stuff, follow my blog for more.

The Top 100 Followed Accounts


Thanks for sharing this, I upvoted it :-). Is there any wrapper in Java or JavaScript that does the same job? I'm not very good at PHP :-(

You can just use SteemWhales: https://steemwhales.com/?p=1&s=followers

When non-dev people see this, it's like staring at the Mona Lisa with a completely blank mind. Hahahahahaa

Heheh. We all have our talents and skills. I'm just glad mine are appreciated in this ecosystem.

Nice. This kind of API exploration is going to bring out a bunch of interesting results in the next few weeks, I think.

It's so much fun to analyze the data. I keep thinking of new ways to look at it.

Do you have an extensive background in coding, or how did you learn API programming?

I'm curious to learn more about how building apps on blockchain would work, though really don't know much about coding. Wondering the quickest, easiest way to get up to speed...

Seeing this, I'm curious about the extent of your coding experience...

I built my first website in 1996, and I majored in computer science at UPENN, so yeah... I have quite a bit of programming experience. I'm also the primary developer for my company, FoxyCart.

That said, I think everyone should learn to code. It's like learning how to talk in the digital world. There are so many resources online that have gamified the process of learning. Much of what I learned, I taught myself, but it takes time, lots of patience, and a highly determined attitude. It can be really frustrating when things aren't working like you expect them to. It can also be really rewarding when it all comes together like it did here.

Very creative and thorough. Thank you for the insight and hard work putting this together.

You're welcome!

Is there a way to see who follows you, either in the UI or via a simple script? Haven't been able to find that

Not yet, but you can use http://steem-o-graph.com/

sweet thanks!

Cool! Thanks for posting this. Now I can play around with blockchains. Hoping to create an app or something.

Excellent! Glad you found it useful.

I have bookmarked this page because I want to apply what you have put in here. But, pardon my ignorance, what language is this? It looks like C but C does not have $ signs in front of identifiers. Nor does any function in C, or C++, allow you to 'print "string"' like this. Even python 3 doesn't have this print statement format anymore. Thanks for clarifying, I just didn't see what language this was, and I think that's an important piece of information, even if it is salient.

It's PHP. There are plenty of posts on Steemit about using the API with Python if that's more familiar to you.

Awesome post @lukestokes

Quick question if I may. If I want to run this script, do I have to run it from my own hosting account? Oh before I forget, upvoted!!!

Thanks for the share :)

Wow, I haven't thought about this code in a long time. :)

You can run this locally without a problem. You may just want to use http://steemwhales.com/ though.

Lol I guess I just took you down memory lane :) Thank you for getting back to me @lukestokes and for the link. I think I will use the link you sent me. I LOVE STEEMIT ;)

Very nice. It's written pretty clearly and is easy to follow. Do you have any more resources on how to chain calls together when accessing plugin APIs like the follow_api, or do you know of any others?
