[BIUtopian Dev] Analyzing STEEM Languages

in #utopian-io6 years ago (edited)

image.png

Image Source credits to @plushzilla

This is another tool I use to extract data from the STEEM blockchain. Steps to install and use the tool is in the repository's README.

This will serve as a walkthrough of the program steem_languages.js. It extracts the number of posts in different languages by date; this is from the STEEM blockchain as a whole not just Utopian.

Outline

  1. Clone, Install, and Run the Program
  2. Program Walkthrough
  3. Sample CSV Output

1. Clone, Install, and Run the Program

$ git clone [email protected]:eastmaels/biutopian.git
$ cd biutopian
$ npm install
$ node steem_languages.js

This will generate a CSV in the ./analysis folder with filename formatted to languages_yyyymmdd_HHMMss.csv. You can create your reports from this file.

2. Program Walkthrough

2.1. Declare the Dependencies

These are the codes that declare the dependencies.

const sql = require('mssql');
const fs = require('fs');
const dateFormat = require('dateformat');

The tool uses the following npm modules:

  • mssql - for connecting to SteemSQL and run the SQL to get the record set
  • fs - filesystem module to write results to CSV
  • dateformat - for extracting the UTC time and formatting it to a date that is used for the CSV filename

2.2. Database Configuration

This is the configuration needed to connect to SteemSQL.

const config = {
    server:'sql.steemsql.com',
    database:'DBSteem',
    user:'steemit',
    password:'steemit',
    requestTimeout : 1000000,
    stream : true
};

2.3. Get Current Timestamp to use in CSV's filename

Get the timestamp to log to console and to be used later for CSV's filename. This timestamp important for because I use it to mention what time the data was extracted.

const now = new Date();
console.log(now);

2.4. Declare the SQL

const sqlStr = `
select sub.created, sub.lang, count(*) as lang_count from 
(
    select 
    ID,
    convert(date, created) as created,
    json_value(body_language, '$[0].language') as lang
     from Comments (NOLOCK)
        where 
        depth = 0
        and ISJSON(body_language) > 0
        and json_value(body_language, '$[0].isReliable') = 'true'
    union
    select 
    ID,
    convert(date, created) as created,
    json_value(body_language, '$[1].language') as lang
     from Comments (NOLOCK)
        where 
        depth = 0
        and ISJSON(body_language) > 0
        and json_value(body_language, '$[1].isReliable') = 'true'
    union
    select 
    ID,
    convert(date, created) as created,
    json_value(body_language, '$[2].language') as lang
     from Comments (NOLOCK)
        where 
        depth = 0
        and ISJSON(body_language) > 0
        and json_value(body_language, '$[2].isReliable') = 'true'
) sub
group by 
    sub.created, sub.lang
order by 
  sub.created, sub.lang desc

The body_language is not stored on the blockchain. It's injected into SteemSQL. I queried this field and the maximum array I was able to was three, as such there were only three unions used.

2.5. Connect to SteemSQL

// connect to your database
sql.connect(config, function (err) {
...
}

2.6. Check for Errors, Output Number of Records to Console

For debugging purposes, output if any error occurred when connecting to the database. Errors such as no internet connection.

And then for reference later if the number of rows in the CSV is equal to the records extracted.

    if (err) {
      console.log(err)
      return;
    }

    console.log("post count: " + result.recordset.length);

2.7. Declare the Output File and Write the CSV Header

    const filetstamp = dateFormat(now, "UTC:yyyymmdd_HHMMss");
    const outputCsv = `analysis/languages_${filetstamp}.csv`;
    fs.writeFileSync(outputCsv, 'Date,Lang Code,Count\r\n');

2.8. Execute the SQL and Iterate Through the Results

    result.recordset.forEach(function(item) {
      ...
    });

2.8.1. Get the Creation Date and Write the Result to CSV

      const created = dateFormat(item.created, "UTC:yyyy-mm-dd");
      var dataToWrite = `${created},${item.lang},${item.lang_count}\r\n`
      fs.appendFileSync(outputCsv, dataToWrite);

3. Sample CSV Output

Date,Lang Code,Count
1970-01-01,zu,1
1970-01-01,zh-Hant,35
1970-01-01,zh,147
1970-01-01,xx-Qaai,1

4. Github Repository

Codes are available in Github:
https://github.com/eastmaels/biutopian



Posted on Utopian.io - Rewarding Open Source Contributors

Sort:  

Thank you for the contribution. It has been approved.

You can contact us on Discord.
[utopian-moderator]

Hey @eastmael I am @utopian-io. I have just upvoted you!

Achievements

  • Seems like you contribute quite often. AMAZING!

Community-Driven Witness!

I am the first and only Steem Community-Driven Witness. Participate on Discord. Lets GROW TOGETHER!

mooncryption-utopian-witness-gif

Up-vote this comment to grow my power and help Open Source contributions like this one. Want to chat? Join me on Discord https://discord.gg/Pc8HG9x

Looks awesome! What was the conclusion of the analysis?

Hi @maxg! Thank you. The conclusion was that we have a diverse number of languages used to compose posts - 268 languages when the data was acquired. The result of the analysis can be found in here.

Anyway, the char that summarizes it is this:

Hope you like it as much as I enjoyed creating it.

Nice! Very cool. Will this be a recurring analysis?

Thank you for asking. Yes, will do it on a monthly basis, with the possibility of increasing its frequency should the demand for it be high. (^ ^) Thank you for pointing it out.

Oops. Forgot to logout from that account. XD

eastmael!! Thank you, your Post.

You're welcome. (^ ^)

Coin Marketplace

STEEM 0.34
TRX 0.11
JST 0.034
BTC 66344.62
ETH 3214.81
USDT 1.00
SBD 4.37