Dimensionality Reduction, the subtle art of Principal Component Analysis (PCA)


We are used to living in a 2D or 3D world, and we feel comfortable with that.

It's not difficult, even for a child, to read a 2D or 3D plot without any special training.



But how can we represent 5 or 10 dimensions?


Let's take an example, a flower called the iris:


https://www.americanmeadows.com/about-irises

We could measure multiple parameters: the height, width and color of each individual segment... The same goes for the leaves and for the whole plant. So it's easy to end up with a dataset of 100 samples (plants) and 10 dimensions (parameters for each).

How can we imagine a 10-dimensional plot?


Coordinates, X-Y-Z, those are 3...
We could have different colors for the 4th dimension.
A different temperature for the 5th.
Or a different texture (smooth vs rough) for the 6th dimension...

You name the other 4 dimensions (smell, moisture, solidity...).

But yes, I agree that it's impossible to do anything useful with such a representation.

How can we reduce the dimensionality?


Imagine a simple XY scatter plot, like this one:

Each dot can be represented with two coordinates, X and Y.

But...

We could rotate the axes so that the new X-axis "fits" the values, running along the direction in which the dots are spread out the most.

In that case, we will have high variability along that new X-axis and basically no variability along the new Y-axis.

In other words, we can drop the new Y-axis, and the dimensionality is reduced from 2 dimensions to 1.
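Here is a minimal sketch of that rotation in Python with NumPy. The data are synthetic (dots scattered around the line y = 0.5x, which is my assumption for illustration), and the new axes are found as eigenvectors of the covariance matrix, which is one common way to compute PCA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D cloud (an assumption for illustration): dots scattered
# tightly around the line y = 0.5 * x.
x = rng.normal(0.0, 3.0, size=200)
y = 0.5 * x + rng.normal(0.0, 0.3, size=200)
points = np.column_stack([x, y])

# Center the data, then find the new axes as eigenvectors of the covariance matrix.
centered = points - points.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order

# Sort so that the first new axis carries the largest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Rotate: coordinates of every dot along the new axes.
rotated = centered @ eigenvectors

# Share of the total variance that falls on each new axis.
explained = eigenvalues / eigenvalues.sum()
print("share of variance per new axis:", explained.round(3))

# Keep only the first new coordinate: 2 dimensions reduced to 1.
reduced = rotated[:, 0]
```

For a cloud this elongated, nearly all of the variance should land on the first new axis, so very little is lost by keeping `reduced` alone.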

Why do we need this?


Let's look at an example from my old paper.

We wanted to see how the elements are distributed in plants.

So we did XRF spectroscopy with imaging and got the data for various elements.

But how are those elements connected?
Do some elements appear together?

We can see that K and Cl appear together, as do P and Ca, and also Mn, Cu and Zn, while Fe is the outsider.

And we can see all of this in a 2D plot, although initially we had an 8-dimensional dataset.
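To show how such a picture can come out of PCA, here is a sketch on synthetic data. These are not the measurements from the paper: the eight columns only borrow the element names, and the grouping is built in by construction (two hidden factors, each group following its own direction, Fe as pure noise). The loadings of the first two components are then computed with an SVD of the standardized data.

```python
import numpy as np

rng = np.random.default_rng(1)
elements = ["K", "Cl", "P", "Ca", "Mn", "Cu", "Zn", "Fe"]
n_pixels = 500  # hypothetical number of measured points

# Two hidden factors; each group of elements follows its own direction
# in that plane (angles in degrees are an assumption for illustration).
hidden = rng.normal(size=(n_pixels, 2))
angles_deg = {"K": 0, "Cl": 0, "P": 120, "Ca": 120, "Mn": 240, "Cu": 240, "Zn": 240}

columns = []
for name in elements:
    if name in angles_deg:
        a = np.deg2rad(angles_deg[name])
        signal = hidden @ np.array([np.cos(a), np.sin(a)])
        columns.append(signal + rng.normal(0.0, 0.3, size=n_pixels))
    else:
        # Fe: pure noise, unrelated to the other elements.
        columns.append(rng.normal(size=n_pixels))
data = np.column_stack(columns)  # shape (500, 8): an 8-dimensional dataset

# Standardize each column, then run PCA through the SVD.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
_, singular_values, vt = np.linalg.svd(z, full_matrices=False)

# Loadings: how strongly each original variable contributes to PC1 and PC2.
loadings = vt[:2].T  # shape (8, 2)
explained = singular_values**2 / np.sum(singular_values**2)

for name, (pc1, pc2) in zip(elements, loadings):
    print(f"{name:>2}  PC1 = {pc1:+.2f}  PC2 = {pc2:+.2f}")
print("variance kept by 2 components:", explained[:2].sum().round(2))
```

Plotting the two loading columns against each other gives a 2D picture in which variables that vary together sit close to each other, while an unrelated variable ends up near the origin.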

Similar analyses?


  • ICA
  • MCR-ALS
  • PARAFAC
  • ICALab

References


Dučić, Tanja, et al. "Enhancement in statistical and image analysis for in situ µSXRF studies of elemental distribution and co-localization, using Dioscorea balcanica." Journal of Synchrotron Radiation 20.2 (2013): 339-346.

Kaiser, Henry F. "The varimax criterion for analytic rotation in factor analysis." Psychometrika 23.3 (1958): 187-200.
