[BIUtopian Dev] Analyzing STEEM Languages
Image Source credits to @plushzilla
This is another tool I use to extract data from the STEEM blockchain. Steps to install and use the tool is in the repository's README.
This will serve as a walkthrough of the program steem_languages.js
. It extracts the number of posts in different languages by date; this is from the STEEM blockchain as a whole not just Utopian.
Outline
- Clone, Install, and Run the Program
- Program Walkthrough
- Sample CSV Output
1. Clone, Install, and Run the Program
$ git clone [email protected]:eastmaels/biutopian.git
$ cd biutopian
$ npm install
$ node steem_languages.js
This will generate a CSV in the ./analysis
folder with filename formatted to languages_yyyymmdd_HHMMss.csv
. You can create your reports from this file.
2. Program Walkthrough
2.1. Declare the Dependencies
These are the codes that declare the dependencies.
const sql = require('mssql');
const fs = require('fs');
const dateFormat = require('dateformat');
The tool uses the following npm modules:
mssql
- for connecting to SteemSQL and run the SQL to get the record setfs
- filesystem module to write results to CSVdateformat
- for extracting the UTC time and formatting it to a date that is used for the CSV filename
2.2. Database Configuration
This is the configuration needed to connect to SteemSQL.
const config = {
server:'sql.steemsql.com',
database:'DBSteem',
user:'steemit',
password:'steemit',
requestTimeout : 1000000,
stream : true
};
2.3. Get Current Timestamp to use in CSV's filename
Get the timestamp to log to console and to be used later for CSV's filename. This timestamp important for because I use it to mention what time the data was extracted.
const now = new Date();
console.log(now);
2.4. Declare the SQL
const sqlStr = `
select sub.created, sub.lang, count(*) as lang_count from
(
select
ID,
convert(date, created) as created,
json_value(body_language, '$[0].language') as lang
from Comments (NOLOCK)
where
depth = 0
and ISJSON(body_language) > 0
and json_value(body_language, '$[0].isReliable') = 'true'
union
select
ID,
convert(date, created) as created,
json_value(body_language, '$[1].language') as lang
from Comments (NOLOCK)
where
depth = 0
and ISJSON(body_language) > 0
and json_value(body_language, '$[1].isReliable') = 'true'
union
select
ID,
convert(date, created) as created,
json_value(body_language, '$[2].language') as lang
from Comments (NOLOCK)
where
depth = 0
and ISJSON(body_language) > 0
and json_value(body_language, '$[2].isReliable') = 'true'
) sub
group by
sub.created, sub.lang
order by
sub.created, sub.lang desc
The body_language
is not stored on the blockchain. It's injected into SteemSQL. I queried this field and the maximum array I was able to was three, as such there were only three unions used.
2.5. Connect to SteemSQL
// connect to your database
sql.connect(config, function (err) {
...
}
2.6. Check for Errors, Output Number of Records to Console
For debugging purposes, output if any error occurred when connecting to the database. Errors such as no internet connection.
And then for reference later if the number of rows in the CSV is equal to the records extracted.
if (err) {
console.log(err)
return;
}
console.log("post count: " + result.recordset.length);
2.7. Declare the Output File and Write the CSV Header
const filetstamp = dateFormat(now, "UTC:yyyymmdd_HHMMss");
const outputCsv = `analysis/languages_${filetstamp}.csv`;
fs.writeFileSync(outputCsv, 'Date,Lang Code,Count\r\n');
2.8. Execute the SQL and Iterate Through the Results
result.recordset.forEach(function(item) {
...
});
2.8.1. Get the Creation Date and Write the Result to CSV
const created = dateFormat(item.created, "UTC:yyyy-mm-dd");
var dataToWrite = `${created},${item.lang},${item.lang_count}\r\n`
fs.appendFileSync(outputCsv, dataToWrite);
3. Sample CSV Output
Date,Lang Code,Count
1970-01-01,zu,1
1970-01-01,zh-Hant,35
1970-01-01,zh,147
1970-01-01,xx-Qaai,1
4. Github Repository
Codes are available in Github:
https://github.com/eastmaels/biutopian
Posted on Utopian.io - Rewarding Open Source Contributors
Thank you for the contribution. It has been approved.
You can contact us on Discord.
[utopian-moderator]
Thank you.
Hey @eastmael I am @utopian-io. I have just upvoted you!
Achievements
Community-Driven Witness!
I am the first and only Steem Community-Driven Witness. Participate on Discord. Lets GROW TOGETHER!
Up-vote this comment to grow my power and help Open Source contributions like this one. Want to chat? Join me on Discord https://discord.gg/Pc8HG9x
Thank you.
Looks awesome! What was the conclusion of the analysis?
Hi @maxg! Thank you. The conclusion was that we have a diverse number of languages used to compose posts - 268 languages when the data was acquired. The result of the analysis can be found in here.
Anyway, the char that summarizes it is this:
Hope you like it as much as I enjoyed creating it.
Nice! Very cool. Will this be a recurring analysis?
Thank you for asking. Yes, will do it on a monthly basis, with the possibility of increasing its frequency should the demand for it be high. (^ ^) Thank you for pointing it out.
Oops. Forgot to logout from that account. XD
eastmael!! Thank you, your Post.
You're welcome. (^ ^)