WHR (2017): Principal Component Analysis


Introduction

The first World Happiness Report was published in April, 2012, in support of the UN High Level Meeting on happiness and well-being. Since then the world has come a long way. Increasingly, happiness is considered to be the proper measure of social progress and the goal of public policy. In June 2016 the OECD committed itself “to redefine the growth narrative to put people’s well-being at the center of governments’ efforts”. In February 2017, the United Arab Emirates held a full-day World Happiness meeting, as part of the World Government Summit. Now on World Happiness Day, March 20th, we launch the World Happiness Report 2017, once again back at the United Nations, again published by the Sustainable Development Solutions Network, and now supported by a generous three-year grant from the Ernesto Illy Foundation.

Objective

Relate each country, and the region it belongs to, to the happiness indicators in order to understand which factors drive the overall pattern of happiness scores.

Load the necessary packages

library(FactoMineR)
library(factoextra)
## Loading required package: ggplot2
library(ggplot2)
library(summarytools)

Import Data

Happy = read.csv("data/principal_component_analysis/2017.csv")
head(Happy)
##       Country Happiness.Rank Happiness.Score Whisker.high Whisker.low
## 1      Norway              1           7.537     7.594445    7.479556
## 2     Denmark              2           7.522     7.581728    7.462272
## 3     Iceland              3           7.504     7.622030    7.385970
## 4 Switzerland              4           7.494     7.561772    7.426227
## 5     Finland              5           7.469     7.527542    7.410458
## 6 Netherlands              6           7.377     7.427426    7.326574
##   Economy..GDP.per.Capita.   Family Health..Life.Expectancy.   Freedom
## 1                 1.616463 1.533524                0.7966665 0.6354226
## 2                 1.482383 1.551122                0.7925655 0.6260067
## 3                 1.480633 1.610574                0.8335521 0.6271626
## 4                 1.564980 1.516912                0.8581313 0.6200706
## 5                 1.443572 1.540247                0.8091577 0.6179509
## 6                 1.503945 1.428939                0.8106961 0.5853845
##   Generosity Trust..Government.Corruption. Dystopia.Residual
## 1  0.3620122                     0.3159638          2.277027
## 2  0.3552805                     0.4007701          2.313707
## 3  0.4755402                     0.1535266          2.322715
## 4  0.2905493                     0.3670073          2.276716
## 5  0.2454828                     0.3826115          2.430182
## 6  0.4704898                     0.2826618          2.294804

Add Region Column (Python Code)

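import pandas as pd
from hdx.location.country import Country

# Read the raw WHR 2017 table
df = pd.read_csv("2017.csv")

regions = []
for country in df["Country"]:
    # Fuzzy-match the country name to an ISO3 code; returns (code, is_exact_match)
    code, _ = Country.get_iso3_country_code_fuzzy(country)
    if code is None:
        raise ValueError(f"No ISO3 code found for {country!r}")
    info = Country.get_country_info_from_iso3(code)
    # UN M49 sub-region, e.g. "Northern Europe" (this key depends on the
    # hdx-python-country version; newer releases may use HXL-style keys)
    regions.append(info["Sub-region Name"])

df["Region"] = regions
# Save the augmented table; the R code reads it back as data_2017.csv
df.to_csv("data_2017.csv", index=False)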
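Back in R, load the augmented table, use the country names as row names, and keep the region together with the ten numeric indicators:

# stringsAsFactors = TRUE keeps Region a factor on R >= 4.0, as in the summary below
Happy = read.csv("data/principal_component_analysis/data_2017.csv", stringsAsFactors = TRUE)
rownames(Happy) = Happy$Country
Happy = Happy[c("Region", "Happiness.Score", "Whisker.high", "Whisker.low",
                "Economy..GDP.per.Capita.", "Family", "Health..Life.Expectancy.",
                "Freedom", "Generosity", "Trust..Government.Corruption.",
                "Dystopia.Residual")]
# attach() lets later calls refer to columns such as Region directly
attach(Happy)
head(Happy)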
##                      Region Happiness.Score Whisker.high Whisker.low
## Norway      Northern Europe           7.537     7.594445    7.479556
## Denmark     Northern Europe           7.522     7.581728    7.462272
## Iceland     Northern Europe           7.504     7.622030    7.385970
## Switzerland  Western Europe           7.494     7.561772    7.426227
## Finland     Northern Europe           7.469     7.527542    7.410458
## Netherlands  Western Europe           7.377     7.427426    7.326574
##             Economy..GDP.per.Capita.   Family Health..Life.Expectancy.
## Norway                      1.616463 1.533524                0.7966665
## Denmark                     1.482383 1.551122                0.7925655
## Iceland                     1.480633 1.610574                0.8335521
## Switzerland                 1.564980 1.516912                0.8581313
## Finland                     1.443572 1.540247                0.8091577
## Netherlands                 1.503945 1.428939                0.8106961
##               Freedom Generosity Trust..Government.Corruption.
## Norway      0.6354226  0.3620122                     0.3159638
## Denmark     0.6260067  0.3552805                     0.4007701
## Iceland     0.6271626  0.4755402                     0.1535266
## Switzerland 0.6200706  0.2905493                     0.3670073
## Finland     0.6179509  0.2454828                     0.3826115
## Netherlands 0.5853845  0.4704898                     0.2826618
##             Dystopia.Residual
## Norway               2.277027
## Denmark              2.313707
## Iceland              2.322715
## Switzerland          2.276716
## Finland              2.430182
## Netherlands          2.294804

Descriptive statistics

dfSummary(Happy)
## Data Frame Summary   
## Happy     
## N: 155   
## --------------------------------------------------------------------------------------------------------------------------------------------------
## No   Variable                         Stats / Values                   Freqs (% of Valid)    Text Graph                         Valid    Missing  
## ---- -------------------------------- -------------------------------- --------------------- ---------------------------------- -------- ---------
## 1    Region                           1. Australia and New Zealand      2 ( 1.3%)                                               155      0        
##      [factor]                         2. Central Asia                   5 ( 3.2%)            I                                  (100%)   (0%)     
##                                       3. Eastern Asia                   6 ( 3.9%)            I                                                    
##                                       4. Eastern Europe                10 ( 6.5%)            II                                                   
##                                       5. Latin America and the Car     22 (14.2%)            IIII                                                 
##                                       6. Northern Africa                6 ( 3.9%)            I                                                    
##                                       7. Northern America               2 ( 1.3%)                                                                 
##                                       8. Northern Europe               10 ( 6.5%)            II                                                   
##                                       9. South-eastern Asia             8 ( 5.2%)            I                                                    
##                                       10. Southern Asia                 8 ( 5.2%)            I                                                    
##                                       [ 4 others ]                     76 (49.0%)            IIIIIIIIIIIIIIII                                     
## 
## 2    Happiness.Score                  mean (sd) : 5.35 (1.13)          151 distinct values             : .                      155      0        
##      [numeric]                        min < med < max :                                            .   : :                      (100%)   (0%)     
##                                       2.69 < 5.28 < 7.54                                         . : : : : : .                                    
##                                       IQR (CV) : 1.6 (0.21)                                    : : : : : : : : :                                  
##                                                                                              . : : : : : : : : :                                  
## 
## 3    Whisker.high                     mean (sd) : 5.45 (1.12)          155 distinct values             : .                      155      0        
##      [numeric]                        min < med < max :                                                : :                      (100%)   (0%)     
##                                       2.86 < 5.37 < 7.62                                         . : : : : . .                                    
##                                       IQR (CV) : 1.59 (0.21)                                   : : : : : : : : :                                  
##                                                                                              . : : : : : : : : :                                  
## 
## 4    Whisker.low                      mean (sd) : 5.26 (1.15)          155 distinct values             : .                      155      0        
##      [numeric]                        min < med < max :                                            .   : :                      (100%)   (0%)     
##                                       2.52 < 5.19 < 7.48                                         . : : : : . .                                    
##                                       IQR (CV) : 1.63 (0.22)                                   . : : : : : : : :                                  
##                                                                                              . : : : : : : : : :                                  
## 
## 5    Economy..GDP.per.Capita.         mean (sd) : 0.98 (0.42)          155 distinct values             .   :                    155      0        
##      [numeric]                        min < med < max :                                                : : :                    (100%)   (0%)     
##                                       0 < 1.06 < 1.87                                              . : : : :                                      
##                                       IQR (CV) : 0.65 (0.43)                                   : : : : : : : .                                    
##                                                                                              : : : : : : : : : .                                  
## 
## 6    Family                           mean (sd) : 1.19 (0.29)          155 distinct values               : .                    155      0        
##      [numeric]                        min < med < max :                                                  : :                    (100%)   (0%)     
##                                       0 < 1.25 < 1.61                                                  : : :                                      
##                                       IQR (CV) : 0.37 (0.24)                                         : : : :                                      
##                                                                                                  . : : : : :                                      
## 
## 7    Health..Life.Expectancy.         mean (sd) : 0.55 (0.24)          155 distinct values               :                      155      0        
##      [numeric]                        min < med < max :                                                  :   :                  (100%)   (0%)     
##                                       0 < 0.61 < 0.95                                                  . : : :                                    
##                                       IQR (CV) : 0.35 (0.43)                                     . : . : : : :                                    
##                                                                                              : : : : : : : : : :                                  
## 
## 8    Freedom                          mean (sd) : 0.41 (0.15)          155 distinct values           :                          155      0        
##      [numeric]                        min < med < max :                                              : .                        (100%)   (0%)     
##                                       0 < 0.44 < 0.66                                            . . : :                                          
##                                       IQR (CV) : 0.21 (0.37)                                     : : : : .                                        
##                                                                                              . : : : : : :                                        
## 
## 9    Generosity                       mean (sd) : 0.25 (0.13)          155 distinct values       :                              155      0        
##      [numeric]                        min < med < max :                                        . :                              (100%)   (0%)     
##                                       0 < 0.23 < 0.84                                          : : .                                              
##                                       IQR (CV) : 0.17 (0.55)                                 : : : : .                                            
##                                                                                              : : : : : .                                          
## 
## 10   Trust..Government.Corruption.    mean (sd) : 0.12 (0.1)           155 distinct values     :                                155      0        
##      [numeric]                        min < med < max :                                        :                                (100%)   (0%)     
##                                       0 < 0.09 < 0.46                                        : : :                                                
##                                       IQR (CV) : 0.1 (0.83)                                  : : : .                                              
##                                                                                              : : : : . : :   . .                                  
## 
## 11   Dystopia.Residual                mean (sd) : 1.85 (0.5)           155 distinct values         :                            155      0        
##      [numeric]                        min < med < max :                                            :                            (100%)   (0%)     
##                                       0.38 < 1.83 < 3.12                                           : :                                            
##                                       IQR (CV) : 0.55 (0.27)                                     . : :                                            
##                                                                                                . : : : :                                          
## ---------------------------------------------------------------------------------------------------------------------------------

Boxplot

ggplot(Happy, aes(Region,Happiness.Score))+
  geom_boxplot(width=0.4) +
  ylab('Happiness Score') + 
  xlab('Regions') 

[Figure: Boxplot of happiness score by region]

Scree Plot
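The code that builds the `pca` object used in the following sections does not appear in this notebook, and neither does the code for the scree plot. The sketch below is a minimal reconstruction under stated assumptions. It uses FactoMineR's `PCA()` with Region (column 1) as a supplementary qualitative variable and Happiness.Score (column 2) as a supplementary quantitative variable, which leaves the nine remaining indicators active. This matches the contribution tables further down, which list exactly nine variables and no Happiness.Score, but the author's actual arguments may differ.

```r
library(FactoMineR)
library(factoextra)

# ASSUMED settings, not taken from the original notebook:
#   quali.sup = 1  -> Region is projected but does not build the axes
#   quanti.sup = 2 -> Happiness.Score is projected but does not build the axes
#   scale.unit = TRUE -> active indicators are standardised (FactoMineR's default)
pca = PCA(Happy, quali.sup = 1, quanti.sup = 2, scale.unit = TRUE, graph = FALSE)

# Eigenvalue, % of variance and cumulative % of variance per component
round(pca$eig, 2)

# Scree plot: % of variance explained by each component
fviz_eig(pca, addlabels = TRUE)
```

Because the nine active indicators are standardised, the eigenvalues sum to 9. Kaiser's rule therefore keeps the components whose eigenvalue exceeds 1, that is, components summarising more variance than a single standardised indicator.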

Individuals cloud

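# Quality of representation of each country on the first plane (axes 1 and 2)
cos2 = rowSums(pca$ind$cos2[, 1:2])
# Colour countries by cos2: grey = poorly represented, red = well represented
fviz_pca_ind(pca, geom = "text", col.ind = "cos2") +
  scale_color_gradient2(low = "grey", mid = "blue", high = "red",
                        midpoint = median(cos2), space = "Lab")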

[Figure: Individuals cloud coloured by cos2]
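The `cos2` of a country on an axis is the share of its squared distance to the centre of the cloud that the axis captures. Summing it over axes 1 and 2, as `rowSums(pca$ind$cos2[, 1:2])` does, gives the quality of representation on the first plane. Summing it over all axes gives 1. The sketch below checks these facts with base R's `prcomp()` on the built-in `USArrests` data, used as a stand-in for the happiness table so the block runs on its own. The variable names are illustrative.

```r
# Standardised PCA on a built-in dataset (stand-in for the happiness table)
res <- prcomp(USArrests, center = TRUE, scale. = TRUE)
coords <- res$x                     # individual coordinates on every component

# Squared distance of each individual to the centre of the cloud
d2 <- rowSums(coords^2)

# cos2 = squared coordinate / squared distance, per individual and component
cos2_all <- coords^2 / d2

# Quality of representation on the first plane (components 1 and 2)
quality_plane <- rowSums(cos2_all[, 1:2])

head(round(sort(quality_plane), 3))
```

Countries with a low value on the plane sit close to the origin of the map only in projection. Their position should not be over-interpreted.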

# Colour countries by region and draw a 95% ellipse around each region
fviz_pca_ind(pca, geom = "text", habillage = Region,
             addEllipses = TRUE, ellipse.level = 0.95)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

[Figure: Individuals coloured by region, with 95% ellipses]
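Each "Too few points to calculate an ellipse" message comes from a region with too few countries to estimate an ellipse. In the summary above, Australia and New Zealand and Northern America have only two countries each, which is consistent with the two messages. A small helper can list such groups before plotting. The function name `small_groups()` is hypothetical. The default cut-off of four points is an assumption, based on ggplot2's `stat_ellipse()`, which the default ellipse type appears to rely on.

```r
# Hypothetical helper: names of factor levels with fewer than `min_n` observations
# (min_n = 4 is an ASSUMED threshold, not a documented factoextra setting)
small_groups <- function(groups, min_n = 4) {
  counts <- table(groups)
  names(counts)[counts < min_n]
}

# Illustration on a toy grouping vector
toy <- factor(c("A", "A", "B", "B", "B", "B", "C"))
small_groups(toy)
```

On this notebook's data, `small_groups(Happy$Region)` would be expected to include the two regions named above. Those regions could then be merged with a neighbouring region, or simply left without an ellipse.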

Contribution (%) of each variable to the construction of the first axis.

round(sort(pca$var$contrib[,1]),3)
##                    Generosity             Dystopia.Residual 
##                         0.725                         1.695 
## Trust..Government.Corruption.                       Freedom 
##                         5.492                         8.534 
##                        Family      Health..Life.Expectancy. 
##                        13.283                        14.573 
##      Economy..GDP.per.Capita.                  Whisker.high 
##                        15.699                        19.952 
##                   Whisker.low 
##                        20.046
# fviz_contrib() replaces the deprecated fviz_pca_contrib()
fviz_contrib(pca, choice = "var", axes = 1)

[Figure: Contributions of the variables to the first axis]
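In a standardised PCA, a variable's coordinate on axis k is its correlation with the component. Its contribution to that axis is the squared coordinate divided by the eigenvalue λ_k, times 100, so the contributions on each axis sum to 100. With nine active variables, a uniform contribution would be 100/9 ≈ 11.1%. This is the reference line `fviz_contrib()` draws. On the first axis, Whisker.low, Whisker.high, Economy..GDP.per.Capita., Health..Life.Expectancy. and Family lie above it. The sketch below recomputes contributions from base R's `prcomp()` on `USArrests`, a stand-in dataset rather than the happiness data.

```r
res <- prcomp(USArrests, center = TRUE, scale. = TRUE)
eig <- res$sdev^2                              # eigenvalues lambda_k

# Variable coordinates = correlations between variables and components
var_coord <- sweep(res$rotation, 2, res$sdev, `*`)

# Contribution (%) of each variable to each axis: 100 * coord^2 / lambda_k
contrib <- sweep(var_coord^2, 2, eig, `/`) * 100

round(sort(contrib[, 1]), 3)
colSums(contrib)          # each axis sums to 100
100 / ncol(USArrests)     # uniform-contribution reference (25% here)
```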

Contribution (%) of each variable to the construction of the second axis.

round(sort(pca$var$contrib[,2]),3)
##                        Family                   Whisker.low 
##                         0.908                         1.123 
##                  Whisker.high      Health..Life.Expectancy. 
##                         1.176                         1.249 
##      Economy..GDP.per.Capita.                       Freedom 
##                         1.473                        14.220 
##             Dystopia.Residual Trust..Government.Corruption. 
##                        17.092                        22.417 
##                    Generosity 
##                        40.343
# fviz_contrib() replaces the deprecated fviz_pca_contrib()
fviz_contrib(pca, choice = "var", axes = 2)

[Figure: Contributions of the variables to the second axis]
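Contributions are never negative, so they say which variables build the second axis but not on which side of it they pull. Here, Generosity, Trust..Government.Corruption., Dystopia.Residual and Freedom together account for about 94% of the axis. The signed coordinates answer the direction question. FactoMineR's `dimdesc()` lists the variables significantly correlated with an axis, sorted by correlation. The sketch below runs on FactoMineR's bundled `decathlon` data so it is self-contained.

```r
library(FactoMineR)

# FactoMineR's bundled decathlon data, used here as a stand-in dataset
data(decathlon)

# Standardised PCA on the ten event columns
res <- PCA(decathlon[, 1:10], scale.unit = TRUE, graph = FALSE)

# Contributions are non-negative: how much each variable builds axis 2...
round(res$var$contrib[, 2], 2)

# ...while the signed coordinates (correlations) show the direction
round(res$var$coord[, 2], 2)

# Variables significantly correlated with axis 2, sorted by correlation
dd <- dimdesc(res, axes = 2)
dd$Dim.2$quanti
```

On this notebook's data, the corresponding calls would be `pca$var$coord[, 2]` and `dimdesc(pca, axes = 2)`.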

Correlation circle for variables

# Variables coloured by their quality of representation (cos2) on the plane
fviz_pca_var(pca, geom = c("text", "arrow"), col.var = "cos2")

[Figure: Correlation circle for variables]
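In a standardised PCA, each variable has unit variance, and its coordinates are its correlations with the components. Its squared coordinates (`cos2`) therefore sum to 1 across all axes. This is why every arrow stays inside the unit circle, and why an arrow reaching close to the circle is well represented on the plane. The sketch below checks this with base R's `prcomp()` on `USArrests`, a stand-in dataset rather than the happiness data.

```r
res <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Correlation of each variable with each component (= coordinates in the circle)
var_coord <- cor(USArrests, res$x)

# cos2 of a variable on an axis is its squared correlation with that axis
var_cos2 <- var_coord^2

# Arrow length on the (PC1, PC2) plane: sqrt of cos2 summed over those two axes
arrow_len <- sqrt(rowSums(var_cos2[, 1:2]))
round(arrow_len, 3)
```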

Biplot

# Countries coloured by region (with 95% ellipses); only variables are labelled
fviz_pca_biplot(pca, label = "var", habillage = Region,
                addEllipses = TRUE, ellipse.level = 0.95)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

[Figure: Biplot, countries coloured by region, variables labelled]

Conclusion