Plotting distrowatch statistics data

in #utopian-io

In my previous post I published a script that gathers data about all Linux distributions currently listed on Distrowatch.

So what's next? We should do something with this data. We're not going to build a clone of the website; instead, we'll semi-automatically generate some fancy gnuplot charts.

So this post is a 2-in-1: a small gnuplot tutorial with samples, and an analytical post with recent results. Let's begin!

Start with the repo

  1. Clone it (you'll need git installed): git clone https://github.com/sxiii/distrowatch-scraper
  2. Change into the cloned folder: cd distrowatch-scraper
  3. Gather the data by running the ./parse.sh script.
  4. Reformat the gathered data with ./results-parse.sh
  5. Run ./plotall.sh
  6. That's it! You should now have 4 fresh PNG images in the folder.
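If you'd like to run steps 3 to 5 in one go, they can be chained in a small shell function. This is only a sketch: run_pipeline and count_pngs are hypothetical helper names, not scripts from the repo, and the sketch assumes the three scripts are executable and exit with a non-zero status when they fail.

```sh
#!/bin/sh
# Sketch only: run_pipeline and count_pngs are hypothetical helpers, not part of the repo.

# Runs the three repo scripts in order inside the given folder and stops at the
# first one that fails (assumes each script exits non-zero on failure).
run_pipeline() {
    (
        cd "$1" || exit 1
        ./parse.sh &&
            ./results-parse.sh &&
            ./plotall.sh
    )
}

# Prints how many PNG files are in the given folder.
count_pngs() {
    n=0
    for f in "$1"/*.png; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

After cloning, run_pipeline distrowatch-scraper && count_pngs distrowatch-scraper should, going by step 6, print 4. Running the steps in a subshell keeps the cd from changing your current directory.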

Requirements

  • gnuplot installed (sudo apt install gnuplot on Ubuntu)
  • a text editor (to edit the scripts)
  • an image viewer (to see the results)
  • ImageMagick installed (its convert command is used to rotate one of the images)
  • Ubuntu, Manjaro or any other recent distro
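Before running anything, you can check that the tools from this list are on your PATH. A minimal sketch, assuming the usual command names (git, gnuplot and ImageMagick's convert); check_deps is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: reports which of the given commands are missing from PATH.
# Returns 0 if all of them are present, 1 otherwise.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_deps git gnuplot convert || echo "install the missing packages first". On Ubuntu the packages are git, gnuplot and imagemagick.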

Why you might need this

  • You'd like to see current data on Linux distros
  • You like to survey or look up information about distributions
  • You're writing a diploma thesis or other analytical work
  • You want to learn gnuplot
  • You're curious about statistics
  • You're studying how to write scripts and/or crawlers/scrapers
  • Your own reason?

You can also draw a heatmap of countries manually with openheatmap.com. This wasn't implemented in gnuplot, to keep the project lightweight.

If something goes wrong

Please check the scripts' source code and your installed dependencies. If nothing helps, please file an issue on GitHub. Thanks.

Analysis results

After following the small tutorial, here's what we should get.

[Figure: Linux architectures]

[Figure: Distro families that Linux distributions are most often based on]

[Figure: Statuses of Linux distros on Distrowatch.com]

[Figure: Countries where distros are most often produced]

[Figure: Widely used Linux desktop environments]

Sample results (file arch.list)

This is what the .list or .csv files should look like for the plotting scripts to work.

arc, 5
arm, 53
arm64, 8
armhf, 28
i386, 407
i486, 119
i586, 53
i686, 95
powerpc, 4
x86_64, 127
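Since the plot scripts expect exactly one label and one count per line, a quick check before plotting can save some head-scratching. A sketch with awk; validate_list is a hypothetical helper, and the rule it enforces (a label, a comma, then a whole number) is inferred from the sample above:

```sh
#!/bin/sh
# Hypothetical helper: checks that every non-blank line has the "label, count"
# shape seen in arch.list; prints offending lines and returns 1 if any exist.
# Reads the named file, or standard input when no file is given.
validate_list() {
    awk -F',' '
        NF == 0 { next }                          # ignore blank lines
        NF != 2 || $2 !~ /^ *[0-9]+ *$/ {         # expect label, then a whole number
            print "line " FNR ": " $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "${1:--}"
}
```

status.list starts with a Status,Number header line, which this check would flag, so skip it there: tail -n +2 status.list | validate_list.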

Sample plot file that takes the data

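set title 'Architectures of Linux'
set terminal png enhanced font "Ubuntu,16"
set datafile separator ','                # lines look like "arc, 5"

set grid
set tics out nomirror
set border 3 front linetype black linewidth 1.0 dashtype solid   # draw only the left and bottom axes

set xrange [-1:10]                        # 10 bars sit at x = 0..9
set xtics 1 rotate by 90 offset 0,-4      # vertical architecture names
set bmargin 6                             # room for the rotated labels

set yrange [0:500]                        # just above the largest count (407)

set style line 1 linecolor rgb '#0060ad' linetype 1 linewidth 2
set style histogram clustered gap 1 title offset character 0, 0, 0
set style data histograms

set boxwidth 1.0 absolute
set style fill solid 1.0 border -1        # opaque bars with a black outline

set output 'arch.png'

# column 2 gives the bar height, column 1 the tick label
plot 'arch.list' using 2:xtic(1) notitle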
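The xrange and yrange in this script are hard-coded: the 10 bars sit at x = 0..9, so [-1:10] leaves one slot of padding on each side, and 500 is just above the largest count (407). The same rule gives [-1:18] and [0:500] for the 18 entries of basedon.list. If the data changes, the ranges could be derived from the .list file instead. Here is a sketch; ranges_for is a hypothetical helper, and "round the maximum up to the next 100" is an assumed rule that happens to reproduce both hard-coded values:

```sh
#!/bin/sh
# Hypothetical helper: prints gnuplot "set xrange"/"set yrange" lines for a
# "label, count" .list file. Bars are drawn at x = 0..N-1, so [-1:N] pads both
# sides; the y limit is the largest count rounded up to the next multiple of 100.
ranges_for() {
    awk -F',' '
        BEGIN { n = 0; max = 0 }
        NF == 2 {
            n++                                   # one bar per data line
            if ($2 + 0 > max) max = $2 + 0        # track the largest count
        }
        END {
            printf "set xrange [-1:%d]\n", n
            printf "set yrange [0:%d]\n", int((max + 99) / 100) * 100
        }
    ' "$1"
}
```

One way to use it: ranges_for arch.list > ranges.gp, then replace the two set ...range lines in the plot script with load 'ranges.gp'.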

Sample results (file basedon.list)

Arch, 31
CentOS, 17
Debian, 447
Fedora, 92
FreeBSD, 19
Gentoo, 29
Independent, 124
KNOPPIX, 48
LFS, 10
Mandriva, 22
OpenBSD, 6
openSUSE, 10
PCLinuxOS, 9
Puppy, 9
RedHat, 46
Slackware, 65
Solaris, 9
Ubuntu, 161

Sample plot file that takes the data

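set title 'Which distros are most often used to make your own'
set terminal png enhanced font "Ubuntu,14"
set datafile separator ','                # lines look like "Debian, 447"

set grid
set tics out nomirror
set border 3 front linetype black linewidth 1.0 dashtype solid   # draw only the left and bottom axes

set xrange [-1:18]                        # 18 bars sit at x = 0..17
set xtics 1 rotate by 90 offset 0,-5      # vertical family names
set bmargin 7                             # room for the rotated labels

set yrange [0:500]                        # just above the largest count (447)

set style histogram clustered gap 1 title offset character 0, 0, 0
set style data histograms

set boxwidth 1.0 absolute
set style fill solid 1.0 border -1        # opaque bars with a black outline

set output 'basedon.png'

# column 2 gives the bar height, column 1 the tick label
plot 'basedon.list' using 2:xtic(1) notitle lc rgb "green"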

Linux distribution statuses sample data (file status.list)

Status,Number
Active,306
Dormant,51
Discontinued,511

Linux statuses plot script

#!/usr/bin/gnuplot -persist
reset

dataname = 'status.list'
set datafile separator ','

# get STATS_sum (sum of column 2) and STATS_records 
stats dataname u 2 noout    

# define angles and percentages
ang(x)=x*360.0/STATS_sum        # slice angle (degrees)
perc(x)=x*100.0/STATS_sum       # slice percentage

set title 'Linux Distribution Statuses'
set terminal png enhanced font "Ubuntu" 16

set output 'status.png'

set label 1 "Pie Chart" at graph 0.1,0.95 left

set xrange [-1.5:2.5]     # length (2.5+1.5) = 4
set yrange [-2:2]         # length (2+2) = 4
set style fill solid 1

# unset border            # remove axis
unset tics                # remove tics on axis
unset colorbox            # remove palette colorbox 
unset key                 # remove titles

# some parameters 
Ai = 15.0;                # init angle
mid = 0.0;                # mid angle

# this defines the colors yellow~FFC90E, and blue~1729A8
# set palette defined (1 '#FFC90E', 2 '#1729A8')      # format '#RRGGBB'
set palette defined (1 1 0.788 0.055, 2 0.090 0.161 0.659) # format R G B (scaled to [0,1])


plot for [i=1:STATS_records] dataname u (0):(0):(1):(Ai):(Ai=Ai+ang($2)):(i) every ::i::i with circle linecolor palette,\
     dataname u (mid=(Ai+ang($2)), Ai=2*mid-Ai, mid=mid*pi/360.0, -0.5*cos(mid)):(-0.5*sin(mid)):(sprintf('%d (%.1f%%)', $2, perc($2))) every ::1 w labels center font ',10',\
     for [i=1:STATS_records] dataname u (1.45):(i*0.25):1 every ::i::i with labels left,\
     for [i=1:STATS_records] '+' u (1.3):(i*0.25):(i) pt 5 ps 4 lc palette

unset output
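To double-check the slices, the ang() and perc() arithmetic from this script can be reproduced outside gnuplot. A sketch in awk; pie_table is a hypothetical helper, and it assumes the file has the Status,Number header shown above followed by one label,count pair per line:

```sh
#!/bin/sh
# Hypothetical helper: for each status prints its share of the total
# (perc = x*100/sum) and its slice angle in degrees (ang = x*360/sum).
pie_table() {
    awk -F',' '
        NR == 1 { next }                          # skip the Status,Number header
        NF == 2 { n++; label[n] = $1; value[n] = $2; sum += $2 }
        END {
            if (sum == 0) exit 1                  # nothing to divide by
            for (i = 1; i <= n; i++)
                printf "%s %.2f%% %.2f deg\n", label[i], value[i] * 100 / sum, value[i] * 360 / sum
        }
    ' "$1"
}
```

For the sample status.list (868 distros in total) this should print Active 35.25% 126.91 deg, Dormant 5.88% 21.15 deg and Discontinued 58.87% 211.94 deg; the three angles add up to 360.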

Interested in having additional information on Linux distros?

Please write your own data-gathering script and gnuplot .plot script, then contribute them to this project!
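If you write your own data-gathering script, the last step is usually counting how often each value occurs and writing the result in the label, count format shown earlier. A sketch, assuming your scraper can emit one value per line (for example one architecture per distro); to_list is a hypothetical name, and this is not necessarily how results-parse.sh does it:

```sh
#!/bin/sh
# Hypothetical helper: reads one value per line on standard input and prints
# "label, count" lines, sorted by label, ready for a .plot script.
to_list() {
    LC_ALL=C sort | uniq -c | awk '{
        count = $1                                # uniq -c puts the count first
        $1 = ""; sub(/^ /, "")                    # drop it, keeping multi-word labels such as Red Hat
        print $0 ", " count
    }'
}
```

For example, your-scraper | to_list > mydata.list, where your-scraper stands for whatever command prints the raw values.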

Further plans

  • Automate the script even more
  • Automatically generate up-to-date data on the Linux world
  • Publish the plots live on a dedicated website
  • Your idea? Please open an issue on GitHub!


Yours, independent Steemit and Golos author,

Den Ivanov aka @sxiii from Rostov-on-Don



Posted on Utopian.io - Rewarding Open Source Contributors


Your contribution cannot be approved because it is not as informative as other contributions. See the Utopian Rules. Contributions need to be informative and descriptive in order to help readers and developers understand them.

Hi @sxiii, great work on the scraper script! However, showing the current state of distrowatch is not sufficient as an analysis contribution. On one hand, the data is about distrowatch and not about your scraper, and I couldn't find distrowatch as an open-source project on GitHub. On the other hand, your contribution shows the plots as they pop out of your tool, as a snapshot of the current state without relation to prior or expected future states. So the analysis aspect is a bit short.

You can contact us on Discord.
[utopian-moderator]

Hi @crokkon! Thanks for specifying the reasons. I will think about making data slices for different periods and then making a larger contribution with analysis. Maybe a few months later :)

Thanks for your understanding! Better make sure distrowatch moves to GitHub by then :) It's a bit of a corner case for an analysis of your tool, since the data is actually about distrowatch's database and not directly related to your tool.
