WHR (2017): Principal Component Analysis

Introduction

The first World Happiness Report was published in April, 2012, in support of the UN High Level Meeting on happiness and well-being. Since then the world has come a long way. Increasingly, happiness is considered to be the proper measure of social progress and the goal of public policy. In June 2016 the OECD committed itself “to redefine the growth narrative to put people’s well-being at the center of governments’ efforts”. In February 2017, the United Arab Emirates held a full-day World Happiness meeting, as part of the World Government Summit. Now on World Happiness Day, March 20th, we launch the World Happiness Report 2017, once again back at the United Nations, again published by the Sustainable Development Solutions Network, and now supported by a generous three-year grant from the Ernesto Illy Foundation.

Report link: http://worldhappiness.report/

Objectif

Find a relationship between the variable “country” and other indicators to understand the factors of the general trend of happiness.

Import necessary packages

library(FactoMineR)
library(factoextra)

## Loading required package: ggplot2

## Welcome! Related Books: `Practical Guide To Cluster Analysis in R` at https://goo.gl/13EFCZ

library(ggplot2)
library(summarytools)

Import Data

Happy = read.csv("data/principal_component_analysis/2017.csv")
head(Happy)

##       Country Happiness.Rank Happiness.Score Whisker.high Whisker.low
## 1      Norway              1           7.537     7.594445    7.479556
## 2     Denmark              2           7.522     7.581728    7.462272
## 3     Iceland              3           7.504     7.622030    7.385970
## 4 Switzerland              4           7.494     7.561772    7.426227
## 5     Finland              5           7.469     7.527542    7.410458
## 6 Netherlands              6           7.377     7.427426    7.326574
##   Economy..GDP.per.Capita.   Family Health..Life.Expectancy.   Freedom
## 1                 1.616463 1.533524                0.7966665 0.6354226
## 2                 1.482383 1.551122                0.7925655 0.6260067
## 3                 1.480633 1.610574                0.8335521 0.6271626
## 4                 1.564980 1.516912                0.8581313 0.6200706
## 5                 1.443572 1.540247                0.8091577 0.6179509
## 6                 1.503945 1.428939                0.8106961 0.5853845
##   Generosity Trust..Government.Corruption. Dystopia.Residual
## 1  0.3620122                     0.3159638          2.277027
## 2  0.3552805                     0.4007701          2.313707
## 3  0.4755402                     0.1535266          2.322715
## 4  0.2905493                     0.3670073          2.276716
## 5  0.2454828                     0.3826115          2.430182
## 6  0.4704898                     0.2826618          2.294804

Add Region Column (Python Code)

import pandas as pd 
from hdx.location.country import Country
df = pd.read_csv("2017.csv")
regions = []
for i in range(len(df)):
    country = df.Country[i]
    code = Country.get_iso3_country_code_fuzzy(country)[0]
    info = Country.get_country_info_from_iso3(code)
    region = info['Sub-region Name']
    regions.append(region)
df['Region'] = regions

Happy = read.csv("data/principal_component_analysis/data_2017.csv")
rownames(Happy) = Happy$Country
Happy = Happy[c("Region","Happiness.Score","Whisker.high","Whisker.low","Economy..GDP.per.Capita.","Family","Health..Life.Expectancy.","Freedom","Generosity","Trust..Government.Corruption.","Dystopia.Residual")]
attach(Happy)
head(Happy)

##                      Region Happiness.Score Whisker.high Whisker.low
## Norway      Northern Europe           7.537     7.594445    7.479556
## Denmark     Northern Europe           7.522     7.581728    7.462272
## Iceland     Northern Europe           7.504     7.622030    7.385970
## Switzerland  Western Europe           7.494     7.561772    7.426227
## Finland     Northern Europe           7.469     7.527542    7.410458
## Netherlands  Western Europe           7.377     7.427426    7.326574
##             Economy..GDP.per.Capita.   Family Health..Life.Expectancy.
## Norway                      1.616463 1.533524                0.7966665
## Denmark                     1.482383 1.551122                0.7925655
## Iceland                     1.480633 1.610574                0.8335521
## Switzerland                 1.564980 1.516912                0.8581313
## Finland                     1.443572 1.540247                0.8091577
## Netherlands                 1.503945 1.428939                0.8106961
##               Freedom Generosity Trust..Government.Corruption.
## Norway      0.6354226  0.3620122                     0.3159638
## Denmark     0.6260067  0.3552805                     0.4007701
## Iceland     0.6271626  0.4755402                     0.1535266
## Switzerland 0.6200706  0.2905493                     0.3670073
## Finland     0.6179509  0.2454828                     0.3826115
## Netherlands 0.5853845  0.4704898                     0.2826618
##             Dystopia.Residual
## Norway               2.277027
## Denmark              2.313707
## Iceland              2.322715
## Switzerland          2.276716
## Finland              2.430182
## Netherlands          2.294804

Descriptive statistics

dfSummary(Happy)

## Data Frame Summary   
## Happy     
## N: 155   
## --------------------------------------------------------------------------------------------------------------------------------------------------
## No   Variable                         Stats / Values                   Freqs (% of Valid)    Text Graph                         Valid    Missing  
## ---- -------------------------------- -------------------------------- --------------------- ---------------------------------- -------- ---------
## 1    Region                           1. Australia and New Zealand      2 ( 1.3%)                                               155      0        
##      [factor]                         2. Central Asia                   5 ( 3.2%)            I                                  (100%)   (0%)     
##                                       3. Eastern Asia                   6 ( 3.9%)            I                                                    
##                                       4. Eastern Europe                10 ( 6.5%)            II                                                   
##                                       5. Latin America and the Car     22 (14.2%)            IIII                                                 
##                                       6. Northern Africa                6 ( 3.9%)            I                                                    
##                                       7. Northern America               2 ( 1.3%)                                                                 
##                                       8. Northern Europe               10 ( 6.5%)            II                                                   
##                                       9. South-eastern Asia             8 ( 5.2%)            I                                                    
##                                       10. Southern Asia                 8 ( 5.2%)            I                                                    
##                                       [ 4 others ]                     76 (49.0%)            IIIIIIIIIIIIIIII                                     
## 
## 2    Happiness.Score                  mean (sd) : 5.35 (1.13)          151 distinct values             : .                      155      0        
##      [numeric]                        min < med < max :                                            .   : :                      (100%)   (0%)     
##                                       2.69 < 5.28 < 7.54                                         . : : : : : .                                    
##                                       IQR (CV) : 1.6 (0.21)                                    : : : : : : : : :                                  
##                                                                                              . : : : : : : : : :                                  
## 
## 3    Whisker.high                     mean (sd) : 5.45 (1.12)          155 distinct values             : .                      155      0        
##      [numeric]                        min < med < max :                                                : :                      (100%)   (0%)     
##                                       2.86 < 5.37 < 7.62                                         . : : : : . .                                    
##                                       IQR (CV) : 1.59 (0.21)                                   : : : : : : : : :                                  
##                                                                                              . : : : : : : : : :                                  
## 
## 4    Whisker.low                      mean (sd) : 5.26 (1.15)          155 distinct values             : .                      155      0        
##      [numeric]                        min < med < max :                                            .   : :                      (100%)   (0%)     
##                                       2.52 < 5.19 < 7.48                                         . : : : : . .                                    
##                                       IQR (CV) : 1.63 (0.22)                                   . : : : : : : : :                                  
##                                                                                              . : : : : : : : : :                                  
## 
## 5    Economy..GDP.per.Capita.         mean (sd) : 0.98 (0.42)          155 distinct values             .   :                    155      0        
##      [numeric]                        min < med < max :                                                : : :                    (100%)   (0%)     
##                                       0 < 1.06 < 1.87                                              . : : : :                                      
##                                       IQR (CV) : 0.65 (0.43)                                   : : : : : : : .                                    
##                                                                                              : : : : : : : : : .                                  
## 
## 6    Family                           mean (sd) : 1.19 (0.29)          155 distinct values               : .                    155      0        
##      [numeric]                        min < med < max :                                                  : :                    (100%)   (0%)     
##                                       0 < 1.25 < 1.61                                                  : : :                                      
##                                       IQR (CV) : 0.37 (0.24)                                         : : : :                                      
##                                                                                                  . : : : : :                                      
## 
## 7    Health..Life.Expectancy.         mean (sd) : 0.55 (0.24)          155 distinct values               :                      155      0        
##      [numeric]                        min < med < max :                                                  :   :                  (100%)   (0%)     
##                                       0 < 0.61 < 0.95                                                  . : : :                                    
##                                       IQR (CV) : 0.35 (0.43)                                     . : . : : : :                                    
##                                                                                              : : : : : : : : : :                                  
## 
## 8    Freedom                          mean (sd) : 0.41 (0.15)          155 distinct values           :                          155      0        
##      [numeric]                        min < med < max :                                              : .                        (100%)   (0%)     
##                                       0 < 0.44 < 0.66                                            . . : :                                          
##                                       IQR (CV) : 0.21 (0.37)                                     : : : : .                                        
##                                                                                              . : : : : : :                                        
## 
## 9    Generosity                       mean (sd) : 0.25 (0.13)          155 distinct values       :                              155      0        
##      [numeric]                        min < med < max :                                        . :                              (100%)   (0%)     
##                                       0 < 0.23 < 0.84                                          : : .                                              
##                                       IQR (CV) : 0.17 (0.55)                                 : : : : .                                            
##                                                                                              : : : : : .                                          
## 
## 10   Trust..Government.Corruption.    mean (sd) : 0.12 (0.1)           155 distinct values     :                                155      0        
##      [numeric]                        min < med < max :                                        :                                (100%)   (0%)     
##                                       0 < 0.09 < 0.46                                        : : :                                                
##                                       IQR (CV) : 0.1 (0.83)                                  : : : .                                              
##                                                                                              : : : : . : :   . .                                  
## 
## 11   Dystopia.Residual                mean (sd) : 1.85 (0.5)           155 distinct values         :                            155      0        
##      [numeric]                        min < med < max :                                            :                            (100%)   (0%)     
##                                       0.38 < 1.83 < 3.12                                           : :                                            
##                                       IQR (CV) : 0.55 (0.27)                                     . : :                                            
##                                                                                                . : : : :                                          
## ---------------------------------------------------------------------------------------------------------------------------------

Boxplot

ggplot(Happy, aes(Region,Happiness.Score))+
  geom_boxplot(width=0.4) +
  ylab('Happiness Score') + 
  xlab('Regions') 

Boxplot

Scree Plot

The elbow effect appears from the 1st component. This indicates that the projection of our dataset on the first factorial plan (1,2) captures most of the information and allows us to better describe our data (We added the 2nd component to better visualize the analysis on a plan).
```
pca=PCA( X = Happy, scale.unit = T, 
       ncp = 3, quanti.sup = 2,quali.sup = 1,graph=F)
fviz_screeplot(pca, ncp = 10)
```

Individuals cloud

We notice the 1st axis captures 52.9% of the information.

cos2 = rowSums(pca$ind$cos2[ ,1:2] )
fviz_pca_ind(pca, geom = "text" , col.ind = "cos2" ) + scale_color_gradient2( low = "grey" ,mid = "blue" , high = "red" , midpoint = median( cos2) , space = "Lab" )

Individuals cloud

We notice that the countries of Central-Europe and Eastern-Europe and the countries of Latin-America are very close to the center. So they have an indicator of happiness close to the average.

fviz_pca_ind(pca, geom = "text" , habillage = Region, addEllipses = TRUE,ellipse.level =0.95)

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

Individuals pca

Contribution of the variables in the construction of the first axis.

“Whisker.high”, “Whisker.low”, “Economy..GDP.per.Capita.”, “Health..Life.Expectancy.” and “Family” contribute the most in the construction of the first axis.

round(sort(pca$var$contrib[,1]),3)

##                    Generosity             Dystopia.Residual 
##                         0.725                         1.695 
## Trust..Government.Corruption.                       Freedom 
##                         5.492                         8.534 
##                        Family      Health..Life.Expectancy. 
##                        13.283                        14.573 
##      Economy..GDP.per.Capita.                  Whisker.high 
##                        15.699                        19.952 
##                   Whisker.low 
##                        20.046

fviz_pca_contrib(pca, choice = "var", axes = 1)

## Warning in fviz_pca_contrib(pca, choice = "var", axes = 1): The function
## fviz_pca_contrib() is deprecated. Please use the function fviz_contrib()
## which can handle outputs of PCA, CA and MCA functions.

Contribution first axis

Contribution of the variables in the construction of the second axis.

“Generosity”,“Trust..Government.Corruption.” ,“Dystopia.Residual” and “Freedom” contribute the most in the construction of the second axis.

round(sort(pca$var$contrib[,2]),3)

##                        Family                   Whisker.low 
##                         0.908                         1.123 
##                  Whisker.high      Health..Life.Expectancy. 
##                         1.176                         1.249 
##      Economy..GDP.per.Capita.                       Freedom 
##                         1.473                        14.220 
##             Dystopia.Residual Trust..Government.Corruption. 
##                        17.092                        22.417 
##                    Generosity 
##                        40.343

fviz_pca_contrib(pca, choice = "var", axes = 2)

## Warning in fviz_pca_contrib(pca, choice = "var", axes = 2): The function
## fviz_pca_contrib() is deprecated. Please use the function fviz_contrib()
## which can handle outputs of PCA, CA and MCA functions.

Contribution second axis

Correlation circle for variables

From the circle of correlation, we notice that all the active variables are well represented.
- The first main component is predominant, it summarizes 52.9% of the total inertia. “Whisker.high”, “Whisker.low”, “Economy..GDP.per.Capita.”, “Health..Life.Expectancy.” and “Family” are strongly correlated with the first axis. They are also positively correlated with “Happiness.Score”. This allows us to conclude that these variables mainly affect the indicator of the happiness of the countries.
- The second component is relatively important because it summarizes 15.7% of the total inertia.“Generosity”,“Trust..Government.Corruption.” ,“Dystopia.Residual” and “Freedom” are positively correlated with the second axis while the variable “Dystopia.Residual” is negatively correlated with the second axis.

fviz_pca_var(pca, geom = c( "text" , "arrow" ) , col.var = "cos2" )

Correlation circle for variables

Biplot

This graph shows that:
- The countries of ‘Latin America and Caribbean’ show more generosity and less corruption. According to the variables cloud according to the regions these countries are close to the average happiness score.
- The countries of ‘Western Europe’ and ‘North America’ have the highest scores of happiness thanks to the social, economic and health indicators.

fviz_pca_biplot(pca, label = "var" , habillage = Region, addEllipses = TRUE,ellipse.level = 0.95)

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

Biplot

Conclusion

Indicators that largely affect a country’s ranking are :
- ‘technical’ based primarily on the economy, health and politics.
- ‘moral’ based essentially on ethics and human values.