The Convergence of Sovereign Environmental, Social and Governance Ratings

This paper studies sovereign environmental, social, and governance (ESG) ratings from qualitative and quantitative angles. First, it introduces the landscape for sovereign ESG ratings. Second, it provides a comparison with the history of credit ratings, factoring in that ESG ratings are at an early stage of development. Third, the paper reviews the different actors and key issues, including taxonomy, models and data from different providers. The paper provides a qualitative assessment of the convergence of ratings among providers by introducing a factor attribution method, which maps all providers' ratings into a common taxonomy defined by the United Nations-supported Principles for Responsible Investment (UNPRI). Then, a quantitative analysis of the convergence is performed by regressing the scores on variables from the World Bank sovereign ESG database. A notable contribution to the literature is the high explanatory power of these variables across all rating methodologies, with an $R^2$ ranging between 0.78 and 0.98. An analysis of the importance of variables using a lasso regression shows the preponderance of the governance factor and the limited role of demographic shifts for all providers.


Introduction
Environmental, social and governance (ESG) indicators are becoming more integrated into assessments of the risk exposure for sovereign and corporate entities, as a complement to standard financial indicators. The number of data providers has increased over the last decade, and each of them has been developing a proprietary methodology to provide ESG ratings to investors.
The standardization of data and methodologies has been an ongoing discussion among practitioners, either as a source of diversity or confusion (Berg, Kölbel and Rigobon, 2019).
Financial markets and investors have been accustomed to the use of credit ratings that conveniently summarize the impact of all financial information (and governance to some extent) in one score reflecting the probability of default. However, ESG factors have an impact at both macroeconomic (Nordhaus, 1977) and microeconomic levels, especially during times of crisis (Lins and Servaes, 2017; Dietz, Gollier and Kessler, 2018). In response to the growing appetite of the financial sector for sovereign ESG ratings, data sources, taxonomies and models have been evolving, as demonstrated by the recent changes that leading ESG data providers have made to their sovereign models (e.g. MSCI from 2016 to 2019, and Sustainalytics in 2019).
Whether these changes have led to a larger consensus among ratings is still an open question.
The correlation between ESG scores for corporates has been shown to be low, probably between 0.4 and 0.6 (Bender et al., 2018; Berg, Kölbel and Rigobon, 2019). Should we expect a higher correlation for sovereign ESG scores, given the closer methodologies among providers? Moreover, given the specific nature of governments, it is reasonable to assess the link between credit ratings (probability of default) and ESG ratings.
The development of credit rating agencies and scores during the last century provides an interesting comparison. The convergence of credit ratings has been supported by regulation, industry investments in applied research and the development of standardized data. The recent acquisition of specialized ESG firms by major rating agencies has sent a strong signal of the financial industry's willingness to invest resources and capital, e.g. Beyond Ratings by London Stock Exchange Group (LSEG) in 2019, and Vigeo Eiris by Moody's in 2019. One can legitimately ask whether the ESG rating industry will follow a path similar to that of the credit rating industry, and whether ESG rating scores will become mainstream. Regulation may play a role similar to the one it played for credit, noting that initiatives from supervising entities have started but without legal enforcement at this stage (NGFS, 2019; BIS, 2020).
The next section provides a comparison with the history of credit ratings, factoring in that ESG ratings are in an early development stage. The third section reviews sovereign ESG rating characteristics from different providers (MSCI, Sustainalytics, Beyond Ratings, RepRisk), and identifies the possible sources of discrepancies between ratings. Specifically, scores could diverge because of differences in scope, taxonomy, peer groups, or indicators. Then, we study the quantitative convergence using the World Bank sovereign ESG database. The understanding of discrepancies and bias is key for investors to ensure that the chosen ratings are aligned with their 'ethical' utility function. The fourth section concludes.

Credit versus ESG rating: An historical perspective
Credit rating agencies play an important economic role: they support the debt market by providing reliable information about the risk of default of an issuer or a type of debt. The perceived risk determines the yield of the financial instrument, and the ratings participate in this pricing mechanism. Do ESG ratings play a similar role? Calling them 'ratings' may imply that ESG scores could be complementary in assessing both financial and non-financial risks.
Interestingly, ESG ratings may be combined with credit ratings to measure sovereign risk exposure and viability of the debt.
In this context, a review of the characteristics of credit rating activity could be useful to define a path for responsible investing scoring. The history of credit agencies started in the early 1900s and went through many steps before becoming mainstream and central for financial markets.
As of today, the global credit rating industry is highly concentrated, with three agencies: S&P, Moody's and Fitch. The beginning of the century has been characterized by increased regulatory pressure (e.g. EU directives such as the Capital Requirements Directive of 2006, and the Credit Rating Agency Reform Act of 2006) and by the 2008 global financial crisis. However, even before the regulatory changes that have affected credit rating agencies (CRAs), their business practices and their disclosure requirements, credit ratings were already widely used by financial markets. In a similar way, ESG rating agencies could take a central role in financial markets even before significant regulatory changes.
Four dimensions can be used to compare credit and ESG indicators: (i) the level of discrepancy between scores or ratings, (ii) the methodology and associated bias, (iii) the ease of replicating the indicators and their sensitivity to business cycles, and (iv) the existence of an asset pricing factor, e.g. credit or ESG factors. The literature on these topics has been extensive for credit ratings (Powell, 2013).
Firstly, one of the most common criticisms of ESG ratings is their lack of convergence. However, credit ratings also exhibited differences in the earlier stages of CRA development. For example, while the correlation between sovereign ratings is quite high, the top three credit agencies were in disagreement about 50% of the time in a 10-year historical analysis (Powell and Martinez, 2008), even if the difference is usually limited to one notch. Moreover, convergence has been evolving through time. We analyzed cross-sectional and time-series correlations between the top three CRAs using annual data from 1949 to 2019 for 163 countries (Table 1). The average cross-sectional correlations are higher than the average time-series correlations over the full period.
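To make the distinction between the two averages concrete, the following minimal sketch computes both on a toy panel. It is an illustration under stated assumptions, not the procedure used for Table 1: the function names, the numeric rating scale and all data values are hypothetical, and country-years are simply dropped when a correlation is undefined (fewer than three common observations, or no variation in one series).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; None when undefined (under 3 points or no variation)."""
    n = len(xs)
    if n < 3:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def mean_defined(values):
    """Average of the entries that are not None (None if there are none)."""
    vals = [v for v in values if v is not None]
    return sum(vals) / len(vals) if vals else None

def average_correlations(a, b):
    """a, b: {country: {year: numeric rating}} for two agencies.

    Returns (average cross-sectional correlation, average time-series correlation).
    """
    pairs = {(c, y): (a[c][y], b[c][y])
             for c in a if c in b for y in a[c] if y in b[c]}
    countries = sorted({c for c, _ in pairs})
    years = sorted({y for _, y in pairs})

    def corr(keys):
        obs = [pairs[k] for k in keys if k in pairs]
        return pearson([p[0] for p in obs], [p[1] for p in obs])

    # One correlation per year (across countries), then averaged over years.
    cross = [corr([(c, y) for c in countries]) for y in years]
    # One correlation per country (across years), then averaged over countries.
    series = [corr([(c, y) for y in years]) for c in countries]
    return mean_defined(cross), mean_defined(series)

# Hypothetical ratings already mapped to a numeric scale (illustrative values only).
agency_a = {"X": {2000: 10, 2001: 11, 2002: 12},
            "Y": {2000: 15, 2001: 15, 2002: 14},
            "Z": {2000: 5, 2001: 6, 2002: 8}}
agency_b = {"X": {2000: 10, 2001: 10, 2002: 12},
            "Y": {2000: 16, 2001: 15, 2002: 15},
            "Z": {2000: 6, 2001: 6, 2002: 7}}

cross_avg, series_avg = average_correlations(agency_a, agency_b)
print(cross_avg, series_avg)
```

In this toy panel the agencies agree closely on the ranking of countries in any given year but less on the year-to-year changes of each country, which is the pattern the two averages are designed to separate.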
The average cross-country correlations of Fitch with Moody's and with S&P are significantly higher than the one between Moody's and S&P. Given that Fitch data are only available since 1994, this tends to suggest that the correlation was lower at the beginning of the sample and higher after 1994 (Figure 1). A closer look at the correlation between Moody's and S&P before 1994 reveals a structural break in 1983, which could reflect both a significant discrepancy and the impact of having a small number of countries rated (Figure 2). Moreover, the time-series correlation over the full period is high (about 80%) but does not demonstrate full convergence. We obtain similar results if we look at the ratings of the first six countries rated by two agencies (Appendix A.2).

Electronic copy available at: https://ssrn.com/abstract=3807618

Discrepancies and convergence of ESG data
The number of ESG ratings providers has been increasing. Some actors seem to have taken the lead, newcomers are gaining popularity, and some acquisitions have also affected the trend. We observe three groups:
• The big players: all major rating agencies have either acquired or launched an ESG data activity, and many major financial players have positioned themselves on ESG, e.g. S&P, Moody's, Bloomberg, FTSE Russell, MSCI, Thomson Reuters, Morningstar.
• The ESG specialists with focus: data providers that focus on one or more aspects of ESG, but not all three, e.g. Carbon Disclosure Project (CDP) for climate change and water, Trucost for environmental risks, ISS for governance.
All providers acquire a large quantity of data to cover the several dimensions of an ESG rating. The core data for sovereigns are usually sourced from both public and private databases, including international institutions, e.g. the World Bank, the International Monetary Fund, the World Health Organization and the Food and Agriculture Organization. Sovereign data generally come from more robust and standardized sources than corporate data. Most providers have made use of big data and machine learning tools.

Data set and taxonomy
Our qualitative analysis includes MSCI, Sustainalytics, FTSE/Beyond Ratings, Vigeo-Eiris and RepRisk. Using Google Trends on a set of five ESG rating providers, we could observe changes in popularity (Figure 3). The characteristics of our dataset are summarized in Table 2. For the qualitative assessment of the five ESG rating providers, we used both public and private data, and contacted the providers to confirm our understanding using a questionnaire listing six characteristics identified during the process of comparison: the source of data (questionnaire or public data), the type of analysis (quantitative or qualitative), the objective of the rating (risk exposure or impact), the aggregation methodology (equally weighted, materiality based, e.g. SASB, or model based), the inclusion of controversies, and the type of output (absolute or relative); see Appendix A.4. The results are summarized in Table 3.
The intersection of the four providers used for the quantitative analysis contains 139 countries for 2017. Figure 4 represents the set of rated countries or local authorities for each provider.

The cross-sectional correlations have been estimated between the four selected providers (Table 4). The correlations have been computed using normalized ('z-scored') data. The first key difference with corporate ratings is the high level of correlation, ranging from 0.72 between RepRisk and MSCI to 0.95 between Beyond Ratings and Sustainalytics. RepRisk has the lowest correlation with other providers, probably due to its specific approach. A first conclusion is that sovereign ESG ratings are significantly more highly correlated than corporate ESG ratings.

Table 5. All the coefficients are positive, with their sum close or equal to one.
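The cross-provider correlations on z-scored data can be sketched as follows. This is a minimal illustration under stated assumptions: the provider names, country codes and score values are hypothetical, and the population standard deviation is assumed for the z-score so that the correlation is simply the mean product of two z-scored series.

```python
from math import sqrt

def zscore(values):
    """Standardize to mean 0 and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def correlation_matrix(scores):
    """scores: {provider: {country: score}}.

    Restricts every provider to the countries rated by all of them, z-scores
    each provider's scores, and returns (common countries, {(p, q): correlation}).
    """
    providers = sorted(scores)
    common = sorted(set.intersection(*(set(scores[p]) for p in providers)))
    z = {p: zscore([scores[p][c] for c in common]) for p in providers}
    n = len(common)
    corr = {(p, q): sum(u * v for u, v in zip(z[p], z[q])) / n
            for p in providers for q in providers}
    return common, corr

# Hypothetical scores on different scales; country "E" is not rated by provider_b.
scores = {
    "provider_a": {"A": 80, "B": 65, "C": 40, "D": 55, "E": 70},
    "provider_b": {"A": 7.5, "B": 7.0, "C": 3.5, "D": 5.0},
    "provider_c": {"A": 60, "B": 72, "C": 45, "D": 50, "E": 66},
}

common, corr = correlation_matrix(scores)
print(common)
print(corr[("provider_a", "provider_b")])
```

Because each series is standardized first, providers reporting on different scales (0 to 10 versus 0 to 100, say) can be compared directly, and only countries in the common sample enter the calculation.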
The stepwise regression is performed using OLS, with AIC as the criterion for variable selection.
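For reference, the selection criterion can be written out under the usual assumption of Gaussian errors. The expression below is the textbook form for an OLS model with $k$ estimated coefficients fitted on $N$ observations; the exact variant used in the stepwise procedure is not stated here, so this should be read as an assumption rather than as the paper's specification. The symbols $k$, $\mathrm{RSS}_k$ and $C$ are introduced only for this illustration.

```latex
\mathrm{AIC}_k = N \ln\!\left(\frac{\mathrm{RSS}_k}{N}\right) + 2k + C,
\qquad
\mathrm{RSS}_k = \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```

Here $C$ is a constant that does not depend on the set of selected variables, so it does not affect the comparison between candidate models. At each step, the variable whose addition or removal lowers the AIC the most is chosen, and the search stops when no move lowers it further.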
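The ridge regression penalizes the residual sum of squares as follows:

$$\hat{\beta}^{\text{ridge}} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{K} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{K} \beta_j^2 \right\}$$

with $y_i$ the ESG rating for country $i$, $x_{ij}$ the $j$-th explanatory variable for country $i$, $\beta_0$ the constant, $\beta_j$ the loading for the $j$-th explanatory variable, $N$ the number of observations, $K$ the number of explanatory variables and $\lambda$ the parameter that controls the degree of shrinkage.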
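The shrinkage effect of the ridge penalty can be illustrated with a small closed-form sketch. It assumes the standard ridge solution with an unpenalized constant, obtained by centering the data and solving $(X'X + \lambda I)\beta = X'y$; the helper names and the toy data are hypothetical and are not taken from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge(xs, ys, lam):
    """Ridge fit with an unpenalized constant; returns (beta0, [beta_1..beta_K])."""
    n, k = len(xs), len(xs[0])
    x_mean = [sum(row[j] for row in xs) / n for j in range(k)]
    y_mean = sum(ys) / n
    xc = [[row[j] - x_mean[j] for j in range(k)] for row in xs]
    yc = [y - y_mean for y in ys]
    # Normal equations on centered data: (X'X + lam * I) beta = X'y
    gram = [[sum(r[p] * r[q] for r in xc) + (lam if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y for r, y in zip(xc, yc)) for p in range(k)]
    beta = solve(gram, xty)
    beta0 = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return beta0, beta

# Toy data generated from y = 1 + 2*x1 - 1*x2 (illustrative values only).
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
ys = [1 + 2 * r[0] - 1 * r[1] for r in xs]

for lam in (0.0, 1.0, 10.0):
    beta0, beta = ridge(xs, ys, lam)
    print(lam, beta0, beta)
```

With $\lambda = 0$ the fit coincides with OLS, and the squared norm of the loadings falls as $\lambda$ increases, which is the shrinkage the penalty is meant to produce.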
Alternatively, the lasso parameters are obtained through a different penalization method ($L_1$ for lasso versus $L_2$ for ridge):

$$\hat{\beta}^{\text{lasso}} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{K} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{K} |\beta_j| \right\}$$

Finally, another penalty function called elastic-net offers a mix between the ridge and lasso methods:

$$\hat{\beta}^{\text{enet}} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{K} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{K} \Big( \alpha |\beta_j| + (1-\alpha) \beta_j^2 \Big) \right\}$$

with $\alpha$ controlling the weight between the $L_1$ and $L_2$ penalty functions. The principal component regression (pcr) consists in regressing the $y_i$ on the orthogonal factors obtained through principal component analysis.

Table 7 reports the $R^2$ values on the test data for the five models: stepwise, principal components, ridge, lasso and elastic-net regressions. The $R^2$ values are strikingly high, ranging between 0.78 and 0.98. The explanatory power is the highest for the score from Beyond Ratings, followed by Sustainalytics and MSCI. RepRisk exhibits the lowest explanatory power.

Table 8 details the estimated hyperparameters for the four models: principal components, ridge, lasso and elastic-net regressions. The parameter $\lambda$ is the regularization penalty and was selected using 10-fold cross-validation. Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. In k-fold cross-validation, the original sample is randomly partitioned into k subsamples of equal size. A single subsample is retained as the validation data for testing the model, while the k − 1 remaining subsamples constitute the training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. A value of $\lambda$ close to 0 indicates that the model is very close to OLS, while a large value corresponds to a higher penalization of the coefficients' size. It is interesting to note that the $\lambda$ values of the lasso regression are close to 0, which confirms a proximity to the linear regression, in particular for Sustainalytics and Beyond Ratings.

Importance of a pillar.
Since all variables have been normalized, we can interpret the coefficients of the lasso regression as the elasticity of the rating with respect to each explanatory variable. A negative coefficient suggests that as the independent variable increases, the dependent variable tends to decrease. The coefficient value indicates how much the mean of the dependent variable changes given a one-unit shift in the independent variable while holding the other variables in the model constant. These elasticities can be positive or negative; to assess their weight we use their absolute value, which we call the importance of the variable.
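How a lasso fit with a cross-validated penalty leads to variable importances can be sketched as follows. This is a rough illustration under stated assumptions, not the estimation code behind the reported results: the lasso is solved by cyclic coordinate descent for the objective "residual sum of squares plus $\lambda$ times the sum of absolute loadings", the folds, penalty grid, function names and simulated data are all hypothetical, and normalizing the importances so that they sum to one is a choice made here for readability.

```python
import random

def soft_threshold(rho, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_fit(xs, ys, lam, n_iter=100):
    """Minimize RSS + lam * sum(|beta_j|) by cyclic coordinate descent."""
    n, k = len(xs), len(xs[0])
    x_mean = [sum(r[j] for r in xs) / n for j in range(k)]
    y_mean = sum(ys) / n
    xc = [[r[j] - x_mean[j] for j in range(k)] for r in xs]
    resid = [y - y_mean for y in ys]  # residuals when all loadings are zero
    z = [sum(r[j] ** 2 for r in xc) for j in range(k)]
    beta = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            if z[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (excluding j).
            rho = sum(r[j] * e for r, e in zip(xc, resid)) + z[j] * beta[j]
            new = soft_threshold(rho, lam / 2.0) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [e - r[j] * delta for r, e in zip(xc, resid)]
                beta[j] = new
    beta0 = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return beta0, beta

def cv_select_lambda(xs, ys, lambdas, k_folds=10, seed=0):
    """Pick the penalty with the lowest total squared error over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k_folds] for f in range(k_folds)]
    best_lam, best_sse = None, float("inf")
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b0, b = lasso_fit([xs[i] for i in train], [ys[i] for i in train], lam)
            sse += sum((ys[i] - b0 - sum(bj * v for bj, v in zip(b, xs[i]))) ** 2
                       for i in fold)
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam

# Simulated data (illustrative): the third variable plays no role in y.
rng = random.Random(1)
xs = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
ys = [2.0 * r[0] - 1.0 * r[1] + rng.gauss(0, 0.1) for r in xs]

lam = cv_select_lambda(xs, ys, [0.01, 0.1, 1.0, 10.0])
_, beta = lasso_fit(xs, ys, lam)
total = sum(abs(b) for b in beta)
importance = [abs(b) / total for b in beta]
print(lam, beta, importance)
```

The sign of each loading is discarded in the last step, so a variable with a large negative loading counts as much as one with an equally large positive loading.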
The importance of a pillar $P$ is defined as

$$I_P = \frac{\sum_{i=n_P}^{m_P} |\beta_i|}{\sum_{i=1}^{32} |\beta_i|}$$

where $i$ is the index of the coefficients of the regression and $n_P$ and $m_P$ are the ranks of the first and last coefficients related to pillar $P$ (which can take the values Environment, Social or Governance). The analysis of this importance (Table 10) shows the preponderance of the governance factor for all providers. Its share ranges from 49% for Sustainalytics to 79% for MSCI. The environment is the least represented pillar: from 7% for MSCI to 22% for Sustainalytics.

Table 10: Importance of each pillar in the methodology of four providers. The importance of a pillar $P$ is defined as $I_P = \sum_{i=n_P}^{m_P} |\beta_i| / \sum_{i=1}^{32} |\beta_i|$, where $i$ is the index of the coefficients of the regression and $n_P$ and $m_P$ are the ranks of the first and last coefficients related to pillar $P$. The numbers are reported for the lasso regression.

The specific case of RepRisk. The sign (positive or negative) of the coefficients converges among all providers except RepRisk. RepRisk has a negative coefficient for indicators like terrestrial and marine protected area or food production index, and even mortality rate or the number of seats held by women in the parliament. This highlights the specificity of the RepRisk score. As explained on their website, "the purpose of RepRisk's dataset is not to provide ESG ratings, but to systematically identify and assess material ESG risks." It is a tool focusing on controversies, which gives RepRisk a unique position in this environment.

The non-selected explanatory variables. Some of the variables are either not used or barely significant in the results. This may be the result of collinearity with other factors, or may indicate that they are not included at this stage in the ESG ratings: Fertility rate, total (births per woman); Net migration; Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions (% of total). Demographic shifts may be underestimated in the existing models, unless captured by other data.
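The pillar importance defined above can be computed in a few lines. The sketch below is illustrative only: the variable names, pillar assignments and coefficient values are hypothetical, and a variable-to-pillar mapping is used in place of the contiguous index ranges $n_P$ to $m_P$, which gives the same sums.

```python
def pillar_importance(coefs, pillar_of):
    """Share of the total absolute loading attributable to each pillar.

    coefs: {variable: regression coefficient}
    pillar_of: {variable: "Environment" | "Social" | "Governance"}
    """
    total = sum(abs(b) for b in coefs.values())
    shares = {}
    for var, b in coefs.items():
        pillar = pillar_of[var]
        shares[pillar] = shares.get(pillar, 0.0) + abs(b) / total
    return shares

# Hypothetical lasso loadings on normalized variables (not the paper's estimates).
coefs = {"env_1": 0.1, "env_2": -0.1, "soc_1": 0.3, "gov_1": -0.5}
pillar_of = {"env_1": "Environment", "env_2": "Environment",
             "soc_1": "Social", "gov_1": "Governance"}

shares = pillar_importance(coefs, pillar_of)
print(shares)
```

Because absolute values are used, the two environmental loadings of opposite sign add up rather than cancel, and the shares across pillars sum to one.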

Conclusion
The rapid growth of ESG integration requires an understanding of the taxonomy and scores provided by the ESG data industry for sovereigns. The contribution of the paper to the literature is twofold.

available for 68 countries, but falls to 64 once matched with the ratings. Figure 5 clearly shows a negative relationship: the higher the ESG score, the lower the spread. A spline interpolation may suggest a non-linear relationship for at least two providers.

A.2 ESG rating and Google Trends
This section reports the popularity of 'ESG' searches on Google since 2010 (Figure 6). A rapid increase is observed after 2017, which is consistent with the perceived interest in ESG integration from both risk and opportunity perspectives. Google Trends information was also extracted at a more granular level, looking for 'ESG data', 'ESG scoring', 'ESG integration' and 'ESG rating' searches. The latter seems to be the most popular, together with 'ESG data' (Figure 8).

A.5 UNPRI Taxonomy for Sovereign Debt
Environment
Natural resources: the availability and quality of biodiversity, water, air and soil; and land use (urban, agricultural and forests).
Physical risks: the physical effects of climate change (such as weather volatility, sea-level rise) and natural disaster risks (volcanic eruptions and earthquakes).
Energy transition risk: regulatory factors and technological developments associated with the global energy transition to a less carbon-intensive global economy.
Energy security: the availability and management of (non)-renewable energy resources; and resource depletion.

Social
Demographic change: population trends; age distribution; and rates of immigration.
Education and human capital: availability of and access to education; quality of educational attainment; and employment rights.
Living standards and income inequality: respect for human rights (including the right to life, the right to freedom of association and the right to health); measures of poverty and income inequality; gender inequality; unemployment rates; public sector wages; availability of and access to healthcare, personal safety and housing; food security and obesity.
Social cohesion: political freedom and representation; levels of trust in institutions and politicians; social inclusion and mobility; prevalence of civic organisations; degree of social order; and capacity of political institutions to respond to societal priorities.

Governance
Institutional strength: strength of institutional and regulatory frameworks; independence of institutions; quality and availability of public data; prevalence of corruption; rule of law; ease of doing business; and business climate.
Political stability: political rights and civil liberties; political upheaval and violence in society; freedom of expression; press freedom; and freedom of information and speech.
Government effectiveness: quality of bureaucracy and administration; policy planning and implementation capabilities; and independence of the civil service from political interference.
Regulatory effectiveness: efficiency of regulatory systems and policy implementation; predictability of policy making; ease of doing business; and business climate.

  - Access to Services
    * Access to clean fuels and technologies for cooking (% of population)
    * Access to electricity (% of population)
    * People using safely managed drinking water services (% of population)
    * People using safely managed sanitation services (% of population)
• Governance Pillar
  - Human Rights
    * Strength of legal rights index (0=weak to 12=strong)
    * Voice and Accountability: Estimate