Plotting distrowatch statistics data
In my previous post I published a script that gathers data about all current Distrowatch Linux distributions.
But what's next? We should do something with this data. We're not going to clone the website; instead we'll semi-automatically generate some fancy gnuplot charts.
So this post is 2-in-1: a small gnuplot tutorial with samples, and an analytical post with recent results. Let's begin!
Start with the repo
- Clone it: git clone https://github.com/sxiii/distrowatch-scraper (hope you have git).
- Change directory into the folder: cd distrowa*
- Gather data with the ./parse.sh script as stated - just run it.
- Reformat the gathered data with ./results-parse.sh
- Run ./plotall.sh
- That's it! You should have 4 fresh PNG graphics lying in the folder.
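A quick way to confirm the last step worked is to count the PNG files in the folder. This is only a minimal sketch: the function name count_pngs is made up for illustration, and it assumes the plots are written next to the scripts, as the steps suggest.

```shell
#!/bin/sh
# Hypothetical helper: count PNG files directly inside a directory
# (defaults to the current one). Not part of the repository.
count_pngs() {
    dir=${1:-.}
    # find prints one path per line, wc -l counts them, tr strips padding
    find "$dir" -maxdepth 1 -type f -name '*.png' | wc -l | tr -d ' '
}

n=$(count_pngs .)
if [ "$n" -ge 4 ]; then
    echo "OK: found $n PNG files"
else
    echo "Only $n PNG files found, expected 4"
fi
```

Run it from inside the cloned folder after ./plotall.sh has finished.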
Requirements
- gnuplot installed (sudo apt install gnuplot for ubuntu)
- text editor (to edit the scripts)
- graphics viewer (to see the result)
- imagemagick installed (convert command to rotate one of the images)
- ubuntu, manjaro or any other recent distro
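Before running anything, it may help to check that the command-line tools from the list above are actually on the PATH. The sketch below is an assumption about how one might do that; missing_tools is a hypothetical helper, not something the repository ships.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# gnuplot draws the charts, convert (ImageMagick) rotates one image,
# git fetches the repository.
missing=$(missing_tools gnuplot convert git)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```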
Why you might need this
- You would like to see current, real data on Linux distros
- You like to survey or look up information about distributions
- You're writing a diploma or analytical work
- You want to learn gnuplot
- You're curious about statistics
- You're studying how to write scripts and/or crawlers/scrapers
- Your own reason?
You can also draw a heatmap of countries manually with openheatmap.com. This wasn't implemented in gnuplot because it would have made the project too heavy.
If something goes wrong
Please check the scripts' source code and the installed dependencies. If nothing helps, please file an issue here on GitHub. Thanks.
Analysis results
After following the small tutorial, here is what we should get.
Linux Architectures
Distro families that Linux distributions are often based on
Statuses of Linux distros from Distrowatch.com
Countries where distros are more often produced
Widely used Linux desktop environments
Sample results (file arch.list)
This is what .list or .csv files should look like for the plotting scripts to work.
arc, 5
arm, 53
arm64, 8
armhf, 28
i386, 407
i486, 119
i586, 53
i686, 95
powerpc, 4
x86_64, 127
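If a plot comes out empty, the data file is the first thing to check. Here is a minimal sketch that flags lines not shaped like "label, number", as in the sample above. The name check_list is hypothetical, and note that a file with a header row (like the status data further down) would have that header flagged.

```shell
#!/bin/sh
# Hypothetical helper: report lines that are not "label, non-negative integer".
# Returns 0 when every line matches, 1 otherwise.
check_list() {
    awk -F',' '
        # the count may be padded with spaces, so strip them before testing
        { value = $2; gsub(/^[ \t]+|[ \t]+$/, "", value) }
        NF != 2 || $1 == "" || value !~ /^[0-9]+$/ {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Usage: check_list arch.list && echo "arch.list looks fine"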
Sample plot file that takes the data
set title 'Architectures of Linux'
set terminal png enhanced font "Ubuntu" 16
set grid
set tics out nomirror
set border 3 front linetype black linewidth 1.0 dashtype solid
set xrange [-1:10]
set xtics 1 rotate by 90 offset 0,-4
set bmargin 6
set yrange [0:500]
set style line 1 linecolor rgb '#0060ad' linetype 1 linewidth 2
set style histogram clustered gap 1 title offset character 0, 0, 0
set style data histograms
set boxwidth 1.0 absolute
set style fill solid 5.0 border -1
set output 'arch.png'
plot 'arch.list' using 2:xtic(1) notitle
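The plot file above hard-codes xrange [-1:10] and yrange [0:500], which fit this particular data set (10 rows, largest value 407). When the data changes, those numbers need updating. A small sketch for reading them off the file, where data_extent is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "<rows> <max>" for a "label, count" file.
# rows is a candidate upper bound for xrange,
# max is the value the top of yrange has to exceed.
data_extent() {
    awk -F',' '
        { v = $2 + 0; if (NR == 1 || v > max) max = v }
        END { print NR, max + 0 }
    ' "$1"
}
```

Usage: data_extent arch.list prints the row count and the largest count, which you can then copy into the set xrange and set yrange lines.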
Based on sample data
Arch, 31
CentOS, 17
Debian, 447
Fedora, 92
FreeBSD, 19
Gentoo, 29
Independent, 124
KNOPPIX, 48
LFS, 10
Mandriva, 22
OpenBSD, 6
openSUSE, 10
PCLinuxOS, 9
Puppy, 9
RedHat, 46
Slackware, 65
Solaris, 9
Ubuntu, 161
Sample plot file that takes the data
set title 'Which distros are mostly used to make your own'
set terminal png enhanced font "Ubuntu" 14
set grid
set tics out nomirror
set border 3 front linetype black linewidth 1.0 dashtype solid
set xrange [-1:18]
set xtics 1 rotate by 90 offset 0,-5
set bmargin 7
set yrange [0:500]
set style histogram clustered gap 1 title offset character 0, 0, 0
set style data histograms
set boxwidth 1.0 absolute
set style fill solid 5.0 border -1
set output 'basedon.png'
plot 'basedon.list' using 2:xtic(1) notitle lc rgb "green"
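The two histogram plot files differ only in title, ranges, margins and file names, so they could be generated from one template. The sketch below is one possible way to do that and is not how the repository does it: make_histogram_plot and its four arguments are assumptions, the settings follow the plot file above (with the fill density set to 1.0), and the title must not contain a single quote.

```shell
#!/bin/sh
# Hypothetical generator: print a gnuplot histogram script
# that reads NAME.list and writes NAME.png.
# Usage: make_histogram_plot NAME TITLE ROWS YMAX
make_histogram_plot() {
    name=$1
    title=$2
    rows=$3
    ymax=$4
    cat <<EOF
set title '$title'
set terminal png enhanced font "Ubuntu" 14
set grid
set tics out nomirror
set border 3 front linetype black linewidth 1.0 dashtype solid
set xrange [-1:$rows]
set xtics 1 rotate by 90 offset 0,-5
set bmargin 7
set yrange [0:$ymax]
set style histogram clustered gap 1 title offset character 0, 0, 0
set style data histograms
set boxwidth 1.0 absolute
set style fill solid 1.0 border -1
set output '$name.png'
plot '$name.list' using 2:xtic(1) notitle
EOF
}
```

Usage: make_histogram_plot basedon 'Which distros are mostly used to make your own' 18 500 | gnuplot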
Linux distribution statuses sample data
Status,Number
Active,306
Dormant,51
Discontinued,511
Linux statuses plot script
#!/usr/bin/gnuplot -persist
reset
dataname = 'status.list'
set datafile separator ','
# get STATS_sum (sum of column 2) and STATS_records
stats dataname u 2 noout
# define angles and percentages
ang(x)=x*360.0/STATS_sum # get angle (degrees)
perc(x)=x*100.0/STATS_sum # get percentage
set title 'Linux Distribution Statuses'
set terminal png enhanced font "Ubuntu" 16
set output 'status.png'
set label 1 "Pie Chart" at graph 00.1,0.95 left
set xrange [-1.5:2.5] # length (2.5+1.5) = 4
set yrange [-2:2] # length (2+2) = 4
set style fill solid 1
# unset border # remove axis
unset tics # remove tics on axis
unset colorbox # remove palette colorbox
unset key # remove titles
# some parameters
Ai = 15.0; # init angle
mid = 0.0; # mid angle
# this defines the colors yellow~FFC90E, and blue~1729A8
# set palette defined (1 '#FFC90E', 2 '#1729A8') # format '#RRGGBB'
set palette defined (1 1 0.788 0.055, 2 0.090 0.161 0.659) # format R G B (scaled to [0,1])
plot for [i=1:STATS_records] dataname u (0):(0):(1):(Ai):(Ai=Ai+ang($2)):(i) every ::i::i with circle linecolor palette,\
dataname u (mid=(Ai+ang($2)), Ai=2*mid-Ai, mid=mid*pi/360.0, -0.5*cos(mid)):(-0.5*sin(mid)):(sprintf('%02d', $2, perc($2))) every ::1 w labels center font ',10',\
for [i=1:STATS_records] dataname u (1.45):(i*0.25):1 every ::i::i with labels left,\
for [i=1:STATS_records] '+' u (1.3):(i*0.25):(i) pt 5 ps 4 lc palette
unset output
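The ang() and perc() functions in the script above divide each count by the column sum and scale it to 360 degrees or 100 percent. To sanity-check the slices outside gnuplot, the same arithmetic can be sketched in awk. The helper name pie_slices is hypothetical; it skips the header by keeping only rows whose second field is a plain number.

```shell
#!/bin/sh
# Hypothetical helper: for each "Status,Number" row print
# the name, its percentage of the total, and its slice angle in degrees.
pie_slices() {
    awk -F',' '
        # keep only rows whose second field is a plain number (skips the header)
        $2 ~ /^[ \t]*[0-9]+[ \t]*$/ { name[++n] = $1; count[n] = $2 + 0; sum += $2 }
        END {
            if (sum == 0) exit 1
            for (i = 1; i <= n; i++)
                printf "%s %.1f %.1f\n", name[i], count[i] * 100.0 / sum, count[i] * 360.0 / sum
        }
    ' "$1"
}
```

Usage: pie_slices status.list. For the sample data above the slices come out at about 35.3%, 5.9% and 58.9%.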
Interested in having additional information on linux distros?
Please make your own data-gathering script and .plot script (gnuplot); then commit them to this project!
Further plans
- Automate the script even more
- Automatically generate up-to-date data on the Linux world
- Publish the plots live on a dedicated website
- Your idea? Please open an issue on GitHub!
Useful links on the topic
- https://distrowatch.com - the data source;
- http://gnuplot.info/demos - gnuplot demos;
- https://sourceforge.net/projects/gnuplot - gnuplot sources;
- https://distrowatch.com/weekly.php?issue=current - this week's statistics;
- https://github.com/FabioLolix/LinuxTimeline - linux timeline;
- https://en.wikipedia.org/wiki/Linux_distribution - general info on linux distros.
Yours, independent steemit and golos author,
Den Ivanov aka @sxiii from Rostov-on-Don
Posted on Utopian.io - Rewarding Open Source Contributors





Your contribution cannot be approved because it is not as informative as other contributions. See the Utopian Rules. Contributions need to be informative and descriptive in order to help readers and developers understand them.
Hi @sxiii, great work on the scraper script! However, showing the current state of distrowatch is not sufficient as an analysis contribution. On one hand, the data is about distrowatch and not about your scraper. I couldn't find distrowatch as an open source project on github. On the other hand, your contribution shows the plots as they pop out of your tool as a snapshot of the current state, without a relation to prior or expected future states. So the analysis aspect is a bit short.
You can contact us on Discord.
[utopian-moderator]
Hi @crokkon! Thanks for specifying the reasons. I will think about making data slices for different periods and then making a larger contribution with analysis. Maybe a few months later :)
Thanks for your understanding! Better make sure distrowatch moves to github until then :) It's a bit of a corner case for an analysis of your tool, since the data is actually about distrowatch's database and not directly related to your tool.