JOBS GUIDE Issue 6

Global Jobs Indicators Database (JOIN) Manual: Methodology and Quality Checks [1]

Michael Weber and Jörg Langbein

April 22, 2021

Abstract

The Global Jobs Indicators Database (JOIN) provides information on labor market outcomes from countries across all income groups with a focus on low- and middle-income countries. The sources are in most cases Labor Force Surveys (LFSs), but other types of household surveys that include labor market information are also added. The information on the different labor market outcomes is disaggregated by gender, urban or rural area, age group of worker, and education level. All indicators are derived from a World Bank repository of harmonized household surveys. The indicators can be subsumed into four topics: sociodemographics, labor force and employment status, employment by sector and occupation, and labor market outcomes, including earnings. To ensure data reliability, a series of quality checks is conducted on both the indicators and the micro-data, at the cross-sectional survey level and at the survey time-series level. Among other checks, results are corroborated against statistics provided by the International Labour Organization (ILO) or the World Bank’s World Development Indicators (WDI), and examined through outlier detection and consistency checks. As a result, JOIN provides indicators only for surveys that passed a quality check threshold. JOIN contains about 1,430 household surveys conducted in 160 countries. A JOIN benchmarking tool enables further customization options including interactions of key indicators for more granular analysis. This tool allows users to compare the labor market in their country of interest to a series of up to 10 other countries.
Together, JOIN, the JOIN benchmarking tool, and the entire Jobs Diagnostic toolkit enable a thorough analysis of the labor market supply side at both a nationally aggregated level and the micro level.

[1] Authors: Jörg Langbein and Michael Weber. For further information, please email jobsccsa@worldbank.org. We would like to thank Elena Casanovas for the development of the quality checks that are included in JOIN. We are also grateful for helpful suggestions and improvements on the manuscript from Mario Gronert, Hild Rygnestad, and Sidiki Soubeiga.

Contents

1. Background
2. Constructing the Global Jobs Indicator Database
3. Checking Survey Data Quality
4. Filtering Survey Data
5. The JOIN Benchmarking Tool
6. Using JOIN as a Stepping-stone to Generate Labor Market Insights
Annex A
A. General Definition of Terms
B. Variable Definitions
C. JOIN Data Quality Checks
D. Summary of Surveys and Countries Included in JOIN

1. Background

The Global Jobs Indicators Database (JOIN) provides easily accessible and standardized labor market indicators. Better labor policy requires an evidence-based approach involving high-quality data. JOIN’s indicators can be easily accessed through a central data repository and are commonly used in Jobs Diagnostics, which allows better insights into the labor market across many countries. Indicators of labor force participation, sectoral patterns of employment, occupations, and wages are derived from a harmonized set of household surveys and labor force surveys covered in the International Income Distribution Database (I2D2) or the Global Labor Database (GLD).

JOIN covers more than 160 countries, particularly in low- and middle-income groups. The current version of JOIN is built using 2,150 surveys totaling about 228 million observations. These surveys come from 164 countries, of which 74 percent are low- and middle-income countries. [2] Yet, 45 percent of the total surveys are from high-income countries, given that these countries conduct the surveys in shorter intervals. All household surveys included have a labor module, with roughly one in four being a labor force survey (LFS) (27.3 percent). Additional types of surveys included are, for example, the Household Income and Expenditure Surveys (10.2 percent) and the Living Standard and Measurement Surveys.

More than 100 indicators are constructed for each survey and further disaggregated by subgroups such as gender, area, education, or age. The jobs indicators can be classified into six broad categories: information on the data set, sociodemographics, labor force and employment, sectors and occupations, working hours and wages, and information on education. All indicators are weighted to be nationally representative.
Besides reporting for the overall population, the indicators are also reported for the following subgroups: young (15–24 years) and old (25–64 years) workers, urban population and rural population, females and males, as well as lower educated (primary education and below) and higher educated (above primary education). [3] This distinguishes the database explicitly from other databases where aggregated labor market indicators are only available at a national level.

[2] See Annex A, part D, for a list of all included surveys and countries.
[3] See Annex A, part A, for the definition of the indicators.

2. Constructing the Global Jobs Indicator Database

First, a repository of harmonized household surveys is collected for the construction of the database. In a collaborative effort, the I2D2 team in the World Bank harmonized a set of household surveys and stored them in the datalibweb database. Although the surveys cover different household survey types, all contain a range of variables that can be harmonized for each individual. Examples are labor force status, age, and employment status.

In a second step, the indicators are constructed for each survey. There are 105 indicators constructed within the following sections:
• Data description, for example, country, survey type, and year of survey
• Sociodemographic, for example, share of urban population or workers in population
• Labor force and employment, for example, labor force participation rate, share of wage workers, or share of informal workers
• Sectors and occupations, for example, share of workers in agricultural sector, share of craft workers
• Wages and working hours, for example, average working hours, share of underemployed workers, and median earnings
• Education, for example, share of employees with primary education.

The construction of the indicators follows the definitions that are described in Annex A.
Additional steps that had to be taken for some indicators are described in the following:
• For the wage and earnings information, hourly as well as monthly median wages for wage workers are reported. First, hourly wages are calculated using the number of hours worked in the last week as reported by the workers in the survey. The calculation of the monthly wage may vary across surveys, but it is typically reported for the last month. Second, minimum and maximum thresholds for wages are applied by winsorization: wages above the 99th percentile are set to the value of the 99th percentile, and wages below the 1st percentile are set to the value of the 1st percentile. This limits the influence of outliers. Third, the wages are deflated to the 2011 national currency level. This is done using the consumer price index as reported in the World Development Indicators (WDI) database and allows for a within-country comparison over time. Fourth, the wages are adjusted for purchasing power parity (PPP) using the PPP conversion factor for private consumption as reported in the WDI.
• For informality, the following definition is applied: an informally employed worker can be an unpaid worker, an employer without social security, a wage worker without social security, or a self-employed worker without social security. Note that in some cases, information on social security is missing. This is explicitly reported as a category in the indicators themselves and feeds prominently into the quality checks. In addition to the informality variable itself, the share of workers with social security, the share of workers with health insurance, and the share of workers with a contract are reported as additional proxies for informality.
• The dependency values (such as youth and old-age dependency) and female labor force participation are only reported for the whole sample and are not disaggregated by the subgroups.

3. Checking Survey Data Quality

Data quality checks are conducted on each survey separately and on groups of surveys. Different tests are performed on each survey to assess differences with other data sources, the internal coherence of the indicators, whether the values of the data are realistic, and missing values in the micro-data. Across the years of a survey, different types of outliers are inspected. To raise a flag, the result of each test is compared with the distribution of results in the database, and the results are grouped into five categories, depending on the gravity of the quality issue. In total, 115 quality checks are conducted, each flagging data quality from no or few problems to potentially significant issues in the data.

The checks cover two distinct topics of data quality: internal and external validity. The quality checks examine three questions: [4]

(a) How does JOIN compare with external data sources? For a subset of 78 JOIN indicators, a direct comparison with the International Labour Organization (ILO) and WDI is feasible. This external check helps identify indicators that depart substantially from the external data source. In this context, it is important to highlight that JOIN only provides data for years in which surveys were conducted. The ILO, in contrast, models labor market indicators for years without surveys. Comparing wage employment from JOIN and the ILO shows that the majority of values are similar: a plot of indicator values from the two databases puts most along the diagonal, that is, they are similar if not the same (Figure 1, left side above). In a next step, a score is assigned for each of the JOIN surveys. The score depends on the distance from the diagonal, weighted by the standard deviation. The greater the distance of a survey from the diagonal, the higher is the score, ranging from 0 to 4.
A value of 0 indicates no or hardly any difference between JOIN and ILO estimates, while 4 indicates a significant difference (Figure 1, right side above).

[4] See Annex C for further information.

Figure 1: Example for the ILO and JOIN comparisons for wage and agriculture employees, including penalization value. Source: JOIN and ILO.

(b) Are the indicators internally consistent? Internal consistency is checked in three different ways:
• Some indicators, such as the unemployment rate, can be defined as a combination of other indicators. For example, for female labor force participation, it would be the share of female employed workers plus the share of female unemployed workers. For constructing the flag, the pointwise distance between the original indicator and the recalculated indicator is measured, before standardizing the distance and flagging the corresponding survey. The flags can be from 0 to 4 and depend on the standard deviation of the distance of the indicator to the recalculated indicator. Figure 2 is an example of such a check in the case of the employment rate.
• In addition, it is checked whether all indicators in shares sum up to 1. For example, the shares of unemployed, employed, and inactive workers should add up to 1.
• Conversions of incomes into US dollars with PPP are redone using the nominal, national income provided in JOIN. Flags are assigned depending on the difference against the values calculated in JOIN.

Figure 2: Consistency checks. Source: JOIN.

(c) Are the indicators consistent over time? This check considers each series of data in JOIN and flags observations (surveys) that depart substantially from the trend. A series is an indicator available over a number of years, for a particular country. To detect outliers, the series is regressed linearly on the time variable, then the studentized (jackknifed) residuals are calculated. Depending on the value of the studentized residual, a flag from 0 to 4 is assigned.
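As a rough illustration of check (c), the sketch below fits a linear trend to a single country-indicator series, computes externally studentized (leave-one-out) residuals, and converts their absolute size into a flag from 0 to 4. The function names and the cutoffs of 1, 2, 3, and 4 are assumptions made for this example only; the text above does not state the cutoffs used in JOIN.

```python
import numpy as np


def studentized_residuals(years, values):
    """Externally studentized (jackknifed) residuals of a linear time trend."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    n = x.size
    if n < 4:
        # With intercept and slope, n - p - 1 must be at least 1.
        raise ValueError("need at least 4 observations in the series")

    # Design matrix: intercept plus centered time variable.
    X = np.column_stack([np.ones(n), x - x.mean()])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)

    # A (numerically) perfect fit has no outliers to report.
    if sse <= 1e-12 * float(y @ y):
        return np.zeros(n)

    # Leverage values: diagonal of the hat matrix X (X'X)^-1 X'.
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    # Residual variance re-estimated with observation i left out.
    p = X.shape[1]
    s2_loo = np.maximum((sse - resid**2 / (1.0 - h)) / (n - p - 1), 0.0)
    with np.errstate(divide="ignore"):
        return resid / np.sqrt(s2_loo * (1.0 - h))


def flag_series(years, values, cutoffs=(1.0, 2.0, 3.0, 4.0)):
    """Map |studentized residual| to a flag 0..4 (cutoffs are hypothetical)."""
    t = np.abs(studentized_residuals(years, values))
    return np.searchsorted(np.asarray(cutoffs), t, side="right")


if __name__ == "__main__":
    # Made-up series: a smooth trend with one survey far off the trend.
    yrs = list(range(2000, 2010))
    vals = [0.30 + 0.01 * i + 0.005 * (-1) ** i for i in range(10)]
    vals[5] += 0.2
    print(flag_series(yrs, vals))
```

Leave-one-out residuals are used here because a single extreme survey would otherwise inflate the residual variance and partly mask itself.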
Finally, it is important to note that before publication of the surveys and the constructed indicators, the results of the above data quality checks are used for a final filtering process, as described below.

4. Filtering Survey Data

The results of the data quality checks are used for filtering the surveys before their publication. Overall, 2,185 surveys are included in JOIN. Most of the surveys are from Latin America (Figure 3, a), and in most cases, there is more than one survey per country (Figure 3, b). In 76 country-year combinations there is more than one survey, in most cases several. To provide a coherent data series, only one survey per year is included for each country. For this, each survey receives indicator quality scores based on the number of flags. [5] In years with several surveys per country, [6] the survey with the better quality is chosen.

[5] A data quality score is expressed as a number of flags. On the process of how the flags are calculated as well as technical details on all filtering steps, see Annex C.
[6] This is also referred to as a ‘duplicate’ survey, although it is not the same survey but several surveys in the same year and country.

Figure 3: Coverage of surveys (a, left) and average number of surveys in JOIN (b, right). Source: JOIN.

In the first step, 345 of the 421 duplicate surveys are removed because they were conducted in the same year and country as another, better-scored survey, that is, a survey with fewer flags (Figure 4). Simply using the number of flags as a quality score thus takes care of the largest share of duplicates. In the second step, the remaining 76 duplicate surveys are removed. These surveys received the same quality score as another survey from the same country and year. One explanation is that those surveys are identical duplicates.
This could be, for example, due to the integration of I2D2 surveys into the global monitoring database (GMD). Sometimes, both surveys are stored in datalibweb.

In the third step, 306 surveys that do not pass a quality threshold are removed from the database for further inspection. In this step, the flags of each survey are added up. The quality threshold is set at the 90th percentile of the observed distribution of flags, both for the core indicator flags and for the overall indicator flags. In other words, a survey is dropped if it lies above the 90th percentile for either the core indicator flags or the overall flags.

In the fourth step, an outlier detection approach removes another 28 surveys and leaves 1,430 surveys in the database. For each country, the overall mean for each indicator is calculated and then set in relation to the indicator in each survey. Surveys with outliers (3 or more standard deviations) in core variables, that is, population size, labor force size, sector composition, share of urbanization, share of working age, employment rate, wage employment, and working hours, are excluded. [7]

[7] See Annex D for an overview of all countries and the respective survey start and end years.

Figure 4: Data quality filtering process for the surveys in JOIN

As an example, the cases of Ethiopia and Malawi are presented to show how the four steps in the algorithm ensure better and more reliable data quality. Before the data quality filtering process is applied, there are 30 data sets spanning the years 1995–2016 for Ethiopia. In most cases, there is more than one survey per year included, and the surveys are of different types. They include an LFS, a welfare monitoring survey, a household income survey, a consumption and expenditure survey, and regular urban employment and unemployment surveys.
Most of the surveys do not meet all the requirements for inclusion in JOIN. For example, the urban employment and unemployment surveys are not nationally representative. The algorithm picks up on those, and eventually the data set for Ethiopia is automatically reduced to the filtered surveys only. The result is a smooth development of sectoral shares (Figure 5). A different example is the one for Malawi, where one outlier can clearly be detected visually. After filtering, the outlier is dropped and the trend becomes comparable (Figure 6).

Figure 5: Example of Ethiopia. Source: JOIN.

Figure 6: Example of Malawi. Source: JOIN.

Surveys that are filtered out because of quality checks are further scrutinized. Surveys that do not pass a threshold enter a separate inspection. It is often unclear from the results where the issues lie; hence, a close inspection is warranted for each respective survey. In this inspection, the harmonization procedure for the surveys is scrutinized from the text in the questionnaire to the harmonization coding. Depending on the outcome of the inspection, the survey may be reintroduced into JOIN in a next update.

5. The JOIN Benchmarking Tool

An Excel-based JOIN benchmarking tool gives users the option to customize their own analysis while respecting data-sharing agreements. The tool is based on disaggregated data from JOIN. Users can choose the country they want to inspect, the timeline, and the survey types. Several predefined pivot tables are used to nest the data by several subgroups including gender, urban/rural, age groups, employment status, employment types, sectors, occupations, and education level. To enable further analyses, users can customize these pivot tables, for example, by changing the subgroups of the data set they want to look at from urban versus rural to education level by gender. [8] Although the level of analysis is more granular than in JOIN, the sampling strategy in the original survey may limit the representativeness of the results. Therefore, users need to be careful with the interpretation, as many surveys are only representative at a highly aggregated level of analysis. It should be noted that there is no violation of the data-sharing agreements, and privacy is kept at a high standard.

The JOIN benchmarking tool follows the guided enquiry while allowing comparisons with other countries. The guided enquiry was developed by the jobs group and gives an overview of five steps to follow when conducting a Jobs Diagnostic. JOIN provides information on all five steps. [9] With the JOIN benchmarking tool, users can further customize the standard outputs from the guided enquiry while comparing their chosen country against up to 10 other countries.

6. Using JOIN as a Stepping-stone to Generate Labor Market Insights

While JOIN provides first insights into labor markets, other Jobs Diagnostic tools complement the results to gain deeper insights. JOIN allows a first snapshot of labor markets that can be used for proposals or concept notes or simply for gaining an understanding of jobs-related challenges in a country. It can be complemented with the JOIN benchmarking tool that allows for comparisons between indicators and preselected countries. Users who want to dig deeper could make use of the Jobs Diagnostic labor market supply-side tool. [10] It comes as a Stata package and yields a predefined selection of indicators, over-time comparisons, and regression results based on the original surveys. Overall, the global jobs indicators database (JOIN) together with the JOIN benchmarking and labor market supply-side tools provides comprehensive information on a wide range of analytical questions to better understand labor markets using household survey data.

[8] An instruction video can be found here: https://datatopics.worldbank.org/jobsdiagnostics/jobs-tools.html.
[9] The Jobs Diagnostic step-by-step guide (guided enquiry) can be found here: http://documents1.worldbank.org/curated/en/411811583268288485/pdf/Jobs-Diagnostics-A-Step-by-Step-Guide.pdf.
[10] This tool can be found at the Jobs Group Diagnostic Tool homepage: https://datatopics.worldbank.org/JobsDiagnostics/jobs-tools.html.

Annex A

A. General Definition of Terms

Age dependency: Following World Development Indicators (WDI), the age dependency ratio is the ratio of dependents, that is, people younger than 15 or older than 64, to the working-age population ages 15–64. Note that dependency ratios capture variations in the proportions of children, elderly people, and working-age people in the population that imply the dependency burden that the working-age population bears in relation to children and the elderly. However, dependency ratios show only the age composition of a population, not economic dependency. Some children and elderly people are part of the labor force, and many working-age people are not.

Area: The area of the household is either urban or rural. It is important to clarify that this is not based on a unified definition but rather on each country’s understanding of the term. That is, a person classified as an urban dweller in country A may be a rural dweller in country B.

Labor force: All persons are considered active in the labor force if they presently have a job (formal or informal, that is, are employed) or do not have a job but are actively seeking work (that is, unemployed). See below for a definition of employment and unemployment.

Unemployment: A person is defined as unemployed if he or she is presently not working but was available for a job in the previous week and is seeking a job. The formal ILO definition of unemployment includes, in addition to being available for and seeking a job, being able to accept a job. This question was asked in a minority of surveys and is, thus, not incorporated in the present definition.
In line with the ILO, a person presently not working but waiting to start a new job is considered to be unemployed.

Wage employment: Following the International Classification of Status in Employment (ICSE-93), a paid employee includes anyone whose basic remuneration is not directly dependent on the revenue of the unit he or she works for, typically remunerated by wages and salaries but possibly paid for piece work or in kind. Contrary to ICSE-93, continuous employment is not used as an additional criterion because data are often absent and because of country specificity.

Unpaid: Following ICSE-93, unpaid workers include family workers who hold self-employment jobs in a market-oriented establishment operated by a person living in the same household. The unpaid worker cannot be regarded as a partner at a level comparable to that of the head of the establishment because of his or her degree of commitment to the operations of the establishment in terms of working time or other factors.

Employer: Following ICSE-93, an employer is a business owner (whether alone or in partnership) with employees on a continuous basis. If the only people working in the business are the owner and contributing family workers, the person is not considered an employer (as he/she has no employees) and is instead classified as self-employed.

Self-employment: Following ICSE-93, own account or self-employment includes jobs where remuneration is directly dependent on the goods and services produced (where home consumption is considered to be part of the profits) and where no permanent employees are engaged to work for the job holder on a continuous basis during the reference period. Contrary to ICSE-93, members of producers’ cooperatives are not a category of their own but are regarded as self-employed.

Informal employment: Informal employment is defined as wage employment without social security or a contract.
Formal employment, in contrast, is defined as wage employment with either social security or a contract.

Sectors [reduced sectors]: The codes for the main job are given here based on the United Nations (UN) International Standard Industrial Classification (ISIC) (revision 3.1, ISIC-3.1). In the case of different classifications (former Soviet Union republics, for example), recoding has been done to best match the ISIC-3.1 codes. Values in square brackets indicate the composed categories Agriculture, Industry, and Services. The main categories subsume the following codes:
• Agriculture, Hunting, Fishing (ISIC 01-05) [Agriculture]
• Mining (ISIC 10-14) [Industry]
• Manufacturing (ISIC 15-37) [Industry]
• Electricity and Utilities (ISIC 40-41) [Industry]
• Construction (ISIC 45) [Industry]
• Commerce (ISIC 50-55) [Services]
• Transportation, Storage and Communication (ISIC 60-64) [Services]
• Financial, Insurance and Real Estate (ISIC 65-74) [Services]
• Services: Public Administration (ISIC 75) [Services]
• Other Services (ISIC 80-99) and unspecified categories or items [Services].

Occupation: Classifies the main job of any individual and is missing otherwise. As most surveys collected detailed information and then coded it, and the original data are not in the databases, no attempt has been made to correct or check the original coding. The classification is based on the International Standard Classification of Occupations (ISCO) 88. In the case of different classifications, recoding has been done to best match ISCO-88.

Underemployment: Underemployment is defined as a situation in which a person works less than 35 hours per week and these hours of work are insufficient in relation to an alternative employment situation in which the person is willing and available to engage. If, due to data restrictions, it is not clear whether the person wants to engage in additional work, working less than 35 hours per week alone is taken as the criterion.
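The sector list under "Sectors [reduced sectors]" above can be read as a lookup from a two-digit ISIC-3.1 division code to a detailed sector and a reduced sector. The sketch below encodes exactly the ranges given in that list; the names `ISIC31_RANGES` and `classify_isic31` are hypothetical, and the handling of missing codes (returned as `None`) and of codes outside the listed ranges (treated as unspecified and assigned to Other Services) is an assumption based on the wording of the last list item, not a description of the actual harmonization code.

```python
# (lowest division, highest division, detailed sector, reduced sector),
# copied from the list of ISIC-3.1 ranges in this annex.
ISIC31_RANGES = [
    (1, 5, "Agriculture, Hunting, Fishing", "Agriculture"),
    (10, 14, "Mining", "Industry"),
    (15, 37, "Manufacturing", "Industry"),
    (40, 41, "Electricity and Utilities", "Industry"),
    (45, 45, "Construction", "Industry"),
    (50, 55, "Commerce", "Services"),
    (60, 64, "Transportation, Storage and Communication", "Services"),
    (65, 74, "Financial, Insurance and Real Estate", "Services"),
    (75, 75, "Services: Public Administration", "Services"),
    (80, 99, "Other Services", "Services"),
]


def classify_isic31(division):
    """Return (detailed sector, reduced sector) for an ISIC-3.1 division.

    Assumption: a missing code (None) stays missing, and a code outside
    the listed ranges counts as "unspecified" and goes to Other Services.
    """
    if division is None:
        return None
    for low, high, detailed, reduced in ISIC31_RANGES:
        if low <= division <= high:
            return detailed, reduced
    return "Other Services", "Services"


if __name__ == "__main__":
    for code in (1, 17, 45, 52, 75, 93):
        print(code, classify_isic31(code))
```

Keeping the ranges in a single table makes it easy to compare the mapping line by line with the list in the text.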
Excessive working hours: This follows ILO definitions for excessive working hours. Most countries have statutory limits of weekly working hours of 48 hours or less, and the hours actually worked per week in most countries are less than the 48-hour standard established in ILO conventions. These limits serve to promote higher productivity while safeguarding workers’ physical and mental health. Wages/earnings: Earnings are reported for wage workers only. The standard output reports median earnings, although all figures give additional mean earnings and earnings for all workers. Earnings are all reported in nominal values and deflated using the consumer price index and PPP adjusted using the PPP conversion factor as reported in the WDI. Additionally, earnings are disaggregated by industry sector. To address issues with outliers, reported earnings are also winsorized (at the 0 to 1 and 100 to 99 percentiles). 14 Female to male gender wage gaps: The female to male gender wage gap provides the ratio of female median wages to male median wages. It is calculated for the group of wage workers only. Public to private wage gap: The public to private wage gap provides the ratio of public median wages to private median wages. It is calculated for the group of wage workers only. Education: The variable is country specific as not all countries require the same number of school years to complete a given level. Primary completed implies that one completed the stipulated primary education by undertaking an exam or test, where this exists. Otherwise, education refers to having completed the highest grade in this level of education. Post-secondary complete refers to teachers’ colleges and one- or two-year programs of technical nature and include university education level. University education level refers to any higher education after successfully completing secondary level of education regardless of whether this was completed. This includes university and graduate studies. 15 B. 
Variable Definitions
Variable name | Short definition
Countryname | Name of the respective country.
Country Code | ISO-3 country code.
Region | World Bank region.
Sub-sample | Determines the sub-group used for the analysis. The following sub-groups exist: urban shows the results for the urban population only, rural for the rural population, youth for the young population aged 15 to 24 years, old for adults aged 25 to 64 years, male for the male population, and female for the female population. Low educated shows the results for those with primary education or less, and high educated for everyone who obtained a higher level of education.
Income Level Name | Puts the country in one of the four World Bank income level classifications: high income, upper middle income, lower middle income, and low income.
Income Level Code | Abbreviations of the income level classification.
Year of Survey | Start year of survey.
Survey Data Source | Identifies the sample from which the results are drawn.
Survey Type | Identifies the survey type, for example a Labor Force Survey (LFS). This can give some indication of the quality of the survey.
Total flags in survey | Score generated by the aggregate number of flags the survey has raised across all checks. A flag in a single check has a value of 1 to 4.
Percentile (100=max), total flags | Percentile determined by the “Total flags in survey” of the survey.
Share of flags | The “Total flags in survey” divided by four times the number of checks performed on that survey, that is, total flags obtained over the maximum number of flags.
Percentile (100=max), share of flags | Percentile determined by the “Flags over max number of flags” of the survey.
Total Population | Total number of inhabitants in the country.
Children, aged 0-14 | Share of children within the total population that are aged between 0 and 14 years. The shares of children, youth, adult and elderly must add up to 100 percent.
Youth, aged 15-24 | Share of youth within the total population that are aged between 15 and 24 years. The shares of children, youth, adult and elderly must add up to 100 percent.
Adult, aged 25-64 | Share of adults within the total population that are aged between 25 and 64 years. The shares of children, youth, adult and elderly must add up to 100 percent.
Elderly, aged 65+ | Share of elderly within the total population that are aged 65 years or older. The shares of children, youth, adult and elderly must add up to 100 percent.
Urban Population | Share of individuals within the total population that are living in urban areas.
Working age population, aged 15-64 | Share of individuals within the population that are of working age, defined as aged between 15 and 64 years.
Dependency rate, all compared to 15-64 | Share of individuals younger than 15 years or older than 64 years compared to individuals of working age (15-64 years), calculated only for the total sample and not for sub-groups.
Youth Dependency Rate, younger than 15 compared to 15-64 | Share of individuals younger than 15 years compared to individuals of working age (15-64 years), calculated only for the total sample and not for sub-groups.
Old Age Dependency Rate, older than 64 compared to 15-64 | Share of individuals older than 64 years compared to individuals of working age (15-64 years), calculated only for the total sample and not for sub-groups.
Labor Force, aged 15-64 | Number of individuals in the labor force and of working age (15-64 years).
Labor Force Participation Rate, aged 15-64 | Share of individuals within all individuals of working age participating in the labor force.
Female Labor Force Participation Rate, aged 15-64 | Share of female individuals within all female individuals of working age participating in the labor force.
Not in labor force or education rate among youth, aged 15-24 | Share of young individuals aged 15 to 24 that are neither participating in the labor force nor in education.
Employment to Population Ratio, aged 15-64 | Share of employed individuals within the working-age population (15-64).
Share of workers (aged 15-64) with more than one jobs in last week | Share of workers in employment of working age that had more than one job in the last week.
Employment rate, aged 15-64 | Share of employed individuals among those participating in the active labor force of working age (15-64). Must add to 100 percent with the share of unemployed individuals participating in the active labor force of working age (15-64).
Unemployment rate, aged 15-64 | Share of unemployed individuals among those participating in the active labor force of working age (15-64). Must add to 100 percent with the share of employed individuals participating in the active labor force of working age (15-64).
Youth employment rate, aged 15-24 | Share of employed young individuals among those participating in the active labor force aged 15-24. Must add to 100 percent with the share of unemployed young individuals participating in the active labor force aged 15-24.
Youth unemployment rate, aged 15-24 | Share of unemployed young individuals among those participating in the active labor force aged 15-24. Must add to 100 percent with the share of employed young individuals participating in the active labor force aged 15-24.
Wage employees, aged 15-64 | Share of wage employed within employed individuals of working age (15-64). Must add to 100 percent with self-employed, unpaid and employers.
Self-employed, aged 15-64 | Share of self-employed within employed individuals of working age (15-64). Must add to 100 percent with wage employees, unpaid and employers.
Unpaid, aged 15-64 | Share of unpaid individuals within employed individuals of working age (15-64). Must add to 100 percent with wage employees, self-employed and employers.
Employers, aged 15-64 | Share of employers within employed individuals of working age (15-64). Must add to 100 percent with wage employees, self-employed and unpaid.
Unpaid or self-employed, age 15-64 | Share of unpaid or self-employed individuals within employed individuals of working age (15-64).
Informal jobs, aged 15-64 | Share of employed individuals of working age working in an informal job.
Share of work contract, aged 15-64 | Share of employed individuals of working age with a work contract.
Share of health insurance, aged 15-64 | Share of employed individuals of working age with health insurance.
Share of social security, aged 15-64 | Share of employed individuals of working age with social security.
Public sector employment, aged 15-64 | Share of employed individuals of working age that work in the public sector.
Agriculture, aged 15-64 | Share of employed individuals aged 15-64 working in agriculture.
Industry, aged 15-64 | Share of employed individuals aged 15-64 working in the industry sector.
Services, aged 15-64 | Share of employed individuals aged 15-64 working in the service sector.
Female in non-agricultural employment, aged 15-64 | Within the group of employed women of working age (15-64), the share working in non-agricultural employment.
Youth in non-agricultural employment, aged 15-64 | Within the group of employed youth aged 15-24, the share working in non-agricultural employment.
Mining, aged 15-64 | Share of employed individuals working in the mining sector, aged 15-64.
Manufacturing, aged 15-64 | Share of employed individuals working in the manufacturing sector, aged 15-64.
Public utilities, aged 15-64 | Share of employed individuals working in the public utilities sector, aged 15-64.
Construction, aged 15-64 | Share of employed individuals working in the construction sector, aged 15-64.
Commerce, aged 15-64 | Share of employed individuals working in the commerce sector, aged 15-64.
Transport & Communication, aged 15-64 | Share of employed individuals working in the Transport & Communication sector, aged 15-64.
Financial and Business Services, aged 15-64 | Share of employed individuals working in the Financial and Business Services sector, aged 15-64.
Public Administration, aged 15-64 | Share of employed individuals working in the Public Administration sector, aged 15-64.
Other services, aged 15-64 | Share of employed individuals working in the Other services sector, aged 15-64.
Senior Officials, aged 15-64 | Share of employed individuals working in the Senior Officials occupation group, aged 15-64.
Professionals, aged 15-64 | Share of employed individuals working in the Professionals occupation group, aged 15-64.
Technicians, aged 15-64 | Share of employed individuals working in the Technicians occupation group, aged 15-64.
Clerks, aged 15-64 | Share of employed individuals working in the Clerks occupation group, aged 15-64.
Service and Market Sales, aged 15-64 | Share of employed individuals working in the Service and Market Sales occupation group, aged 15-64.
Skilled Agriculture, aged 15-64 | Share of employed individuals working in the Skilled Agriculture occupation group, aged 15-64.
Craft workers, aged 15-64 | Share of employed individuals working in the Craft workers occupation group, aged 15-64.
Machine Operators, aged 15-64 | Share of employed individuals working in the Machine Operators occupation group, aged 15-64.
Elementary occupations, aged 15-64 | Share of employed individuals working in the Elementary Occupations occupation group, aged 15-64.
Armed Forces, aged 15-64 | Share of employed individuals working in the Armed Forces occupation group, aged 15-64.
Average weekly working hours | Mean of working hours for employed individuals aged 15-64.
Underemployment, <35 hours per week | Share of employed individuals aged 15-64 working fewer than 35 hours per week.
Excessive working hours, >48 hours per week | Share of employed individuals aged 15-64 working more than 48 hours per week.
Median earnings for wage workers per hour, local currency | Median earnings per hour for wage workers aged 15-64, reported in local currency values. Currency values are typically reported for the year when the survey was conducted.
Median earnings for wage workers per month, local currency | Median earnings per month for wage workers aged 15-64, reported in local currency values. Currency values are typically reported for the year when the survey was conducted.
Median earnings for wage workers per month in agriculture, local currency | Median earnings per month in the agricultural sector for wage workers aged 15-64, reported in local currency values. Currency values are typically reported for the year when the survey was conducted.
Median earnings for wage workers per month in industry, local currency | Median earnings per month in the industry sector for wage workers aged 15-64, reported in local currency values. Currency values are typically reported for the year when the survey was conducted.
Median earnings for wage workers per month in service, local currency | Median earnings per month in the service sector for wage workers aged 15-64, reported in local currency values. Currency values are typically reported for the year when the survey was conducted.
Median earnings for wage workers per hour, deflated to 2011 | Median earnings per hour for wage workers aged 15-64, inflation corrected to 2011 values.
Median earnings for wage workers per hour, deflated to 2011 and PPP adjusted | Median earnings per hour for wage workers aged 15-64, inflation corrected to 2011 values and adjusted for purchasing power parity using WDI-provided exchange values.
Median earnings for wage workers per month, deflated to 2011 | Median earnings per month for wage workers aged 15-64, inflation corrected to 2011 values.
Median earnings for wage workers per month, deflated to 2011 and PPP adjusted | Median earnings per month for wage workers aged 15-64, inflation corrected to 2011 values and adjusted for purchasing power parity using WDI-provided exchange values.
Female to Male gender wage gap | Female to male gender wage gap for wage workers aged 15-64. Reports the ratio of female wage workers’ earnings to male wage workers’ earnings.
Public to Private wage gap | Public to private wage gap for wage workers aged 15-64. Reports the ratio of public wage workers’ earnings to private wage workers’ earnings.
No education | Share of individuals that have no education.
Primary education | Share of individuals that have passed primary education levels but no higher education levels.
Secondary education | Share of individuals that have passed secondary education levels but no higher education levels.
Post-secondary education | Share of individuals that have passed post-secondary education levels but no higher education levels.
Non-Agricultural Informality, aged 15-64 | Share of informal workers in non-agricultural sectors among all workers in non-agricultural sectors.
Non-Agricultural wage employment, aged 15-64 | Share of wage workers in non-agricultural sector employment within all workers in employment.
Non-Agricultural unpaid employment, aged 15-64 | Share of unpaid workers in non-agricultural sector employment within all workers in employment.
Non-Agricultural employer, aged 15-64 | Share of employers in non-agricultural sector employment within all workers in employment.
Non-Agricultural self-employed, aged 15-64 | Share of self-employed workers in non-agricultural sector employment within all workers in employment.
Youth Non-Agricultural wage employment, aged 15-24 | Share of young wage workers, aged 15-24, in non-agricultural sector employment within all young workers, aged 15-24, in employment.
Wage employment in agriculture, aged 15-64 | Share of wage workers in agriculture, aged 15-64, within all workers that work in the agricultural sector.
Wage employment in industry, aged 15-64 | Share of wage workers in industry, aged 15-64, within all workers that work in the industry sector. For the definition of the industry sector see the general definitions of terms.
Wage employment in services, aged 15-64 | Share of wage workers in services, aged 15-64, within all workers that work in the service sector. For the definition of the service sector see the general definitions of the term sector [services].
Mean number of years of education completed, aged 17 and older | Average number of completed years in formal education for all individuals that are aged 17 or older.
Enrollment rate for individuals aged 6-16 | Share of young individuals, aged 6-16, currently enrolled in school.
Mean age of worker aged 15-64 | Average age of all workers between 15 and 64.
Mean age of worker in agriculture, aged 15-64 | Average age of all workers in agriculture between 15 and 64.
Mean age of worker in industry, aged 15-64 | Average age of all workers, aged 15-64, in the industry sector.
Mean age of worker in services, aged 15-64 | Average age of all workers, aged 15-64, in the services sector.
Mean age of wage worker, aged 15-64 | Average age of wage workers, aged 15-64.
Mean age of self-employed or unpaid, aged 15-64 | Average age of self-employed or unpaid, aged 15-64.
Mean age of employer, aged 15-64 | Average age of employers, aged 15-64.

C. JOIN Data Quality Checks
To ensure the reliability of the data, a series of automated quality checks is applied to both the indicators and the micro-data, at the cross-sectional survey level and at the survey time-series level. Results are, for example, corroborated using statistics provided by the ILO or the World Bank’s WDI as well as newly developed techniques.
In total, 115 quality checks are conducted, each of which may result in a flag indicating a potential issue in the data. The checks are applied both to individual surveys and to series of surveys. For individual surveys, the checks test differences with other data sources, the internal coherence of the indicators, whether the values are realistic, and missing values in the micro-data. At the survey time-series level, different types of outlier checks are conducted. A quality check raises a flag after comparing the result of the test with the distribution of values in the database. The flag can take a value on a scale from 0 to 4. The available checks can be divided into the following categories:

(a) Comparison to external data sources
The first check compares JOIN data to data from other sources, such as WDI or ILO. As reported in the comparison module analytics file (details below), there are 97 indicators in JOIN, for 78 of which a match with variables from external sources was attempted. JOIN is compared to other sources in a three-step process:
(i) Generate an overview comparing the two sets of variables.
(ii) Produce a graphical representation of the bivariate relationship of the two variables.
(iii) Conduct the check itself, flagging observations that depart substantially from the external data. As part of the check, histograms show the distribution of the residuals on which the check is based.

(b) Derivations
The next check deals with indicators that can be defined as a combination of other indicators included in JOIN. The goal is to flag the surveys where an indicator differs from its definition in terms of other indicators included in JOIN, that is, to detect inconsistencies between JOIN indicators and their definitions.
Three outputs come out of this check:
• Scatterplots presenting the bivariate relationship between indicators and their definition in terms of other JOIN indicators
• Histograms of the standardized absolute value of the distribution of distances between indicators and their definition
• The check itself, that is, a data file with flags on a scale from 0 to 4 indicating whether there is a potential issue with the indicator.
The methodology followed to produce this check is the following:
• Measure the pointwise distance between the original indicator and the recreated indicator and take the absolute value of the resulting vector. Note that some indicators are systematically under- or overestimated; for these, taking the absolute value does not modify the vector of distances.
• Standardize the distances by dividing them by their standard deviation.
• Finally, for each tested indicator, flag the surveys that present a large discrepancy between the indicator and its definition. To do this, follow the same rule as in the comparison checks (previous section):
o Surveys with a distance equal to or greater than 4 standard deviations are marked with a 4.
o Surveys with a distance between 3 and 4 (including 3) are marked with a 3.
o Surveys with a distance between 2 and 3 (including 2) are marked with a 2.
o Surveys with a distance between 1 and 2 (including 1) are marked with a 1.
o Surveys with a distance smaller than 1 standard deviation (in the comparison checks, between JOIN and the external source variable) are not considered outliers and are thus marked with a 0.
o Additionally, surveys with an absolute (not standardized) distance below 0.02 are unflagged, overriding the previous rules. This avoids flagging surveys that belong to a distribution with very small errors and thus a negligible standard deviation (for instance, Working age and Employment Rate).
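The derivation flagging rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build JOIN: the function name `derivation_flags` and its parameters are hypothetical, and the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def derivation_flags(original, recreated, min_abs_distance=0.02):
    """Return one flag (0 to 4) per survey for a single tested indicator.

    original  -- indicator values as stored (hypothetical input)
    recreated -- the same indicator rebuilt from other indicators
    """
    # Pointwise absolute distance between indicator and its definition.
    distances = [abs(o - r) for o, r in zip(original, recreated)]
    # Assumption: sample standard deviation of the distances.
    sd = statistics.stdev(distances)
    flags = []
    for d in distances:
        if d < min_abs_distance or sd == 0:
            # Absolute distance below 0.02 overrides every other rule.
            flags.append(0)
        else:
            # Distance in standard deviations: [1,2) -> 1, ..., >= 4 -> 4.
            flags.append(min(4, math.floor(d / sd)))
    return flags


# Five surveys; only the last one departs from its definition.
stored = [0.50, 0.60, 0.70, 0.80, 0.90]
rebuilt = [0.50, 0.60, 0.70, 0.80, 0.40]
print(derivation_flags(stored, rebuilt))
```

Because the distances are divided by their standard deviation without subtracting the mean, a single large discrepancy in a small series inflates the standard deviation and can receive a moderate rather than a maximal flag.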
(c) Price Conversions
The purpose of this check is to double-check the price conversions and detect potential issues in the translation from one currency and time unit to another. To test the conversions, the indicator is manually calculated and compared to the JOIN indicator. The JOIN indicator is flagged when the comparison returns a significant difference. The difference between the manually computed indicators (y) and the JOIN indicators (x) is calculated in the following way:
• Calculate the relative difference between both indicators [(x−y)/y].
• Cap the distance at 1, that is, values greater than 1 are set equal to 1. This preserves symmetry between over- and underestimations and avoids standard deviations that are uninformatively large.
• Standardize the distribution of differences.
• Flag large distances with values from 1 to 4, following the same rule as in previous checks:
o Surveys with a score equal to or greater than 4 standard deviations are flagged with a 4.
o Surveys with a score between 3 and 4 (including 3) are flagged with a 3.
o Surveys with a score between 2 and 3 (including 2) are flagged with a 2.
o Surveys with a score between 1 and 2 (including 1) are flagged with a 1.
o Surveys with a score below 1 are not flagged.
o If the relative (unstandardized) distance between both indicators is below 0.1, the survey is unflagged, regardless of how many standard deviations that distance corresponds to.

(d) Outliers
This check considers each series of data in JOIN and flags observations (surveys) that depart substantially from the trend. A series is one indicator over a collection of years for a particular country, for instance the population in Burkina Faso for all years available in JOIN. To detect outliers, each series is regressed linearly on the time variable, and the studentized (jackknifed) residuals are calculated with the Stata command predict.
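As an illustration of the outlier step, the sketch below computes externally studentized residuals for one series regressed on time and maps them to flags. It is a stand-in written in plain Python, not the Stata code behind JOIN: the function names are hypothetical, the leave-one-out formula is assumed to correspond to the "studentized (jackknifed)" residuals named in the text, and the thresholds are taken from the flagging rule stated in this section.

```python
import math


def studentized_residuals(years, values):
    """Externally studentized residuals of values regressed on years."""
    n = len(years)
    if n < 4:
        raise ValueError("need at least 4 surveys in the series")
    t_mean = sum(years) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("all surveys share the same year")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    resid = [y - (intercept + slope * t) for t, y in zip(years, values)]
    sse = sum(e * e for e in resid)
    out = []
    for t, e in zip(years, resid):
        h = 1 / n + (t - t_mean) ** 2 / sxx  # leverage of this survey
        if 1 - h <= 1e-12:
            out.append(0.0)  # the fit passes through this point
            continue
        # Residual variance with this survey left out (2 coefficients).
        s2_loo = (sse - e * e / (1 - h)) / (n - 3)
        if s2_loo <= 0.0:
            out.append(0.0 if e == 0 else math.copysign(math.inf, e))
            continue
        out.append(e / math.sqrt(s2_loo * (1 - h)))
    return out


def outlier_flag(residual):
    """Map a studentized residual to a flag using the section's thresholds."""
    a = abs(residual)
    if a > 15:
        return 4
    if a >= 8:
        return 3
    if a >= 5:
        return 2
    if a >= 3:
        return 1
    return 0


# Hypothetical series: one indicator for one country, 2000 to 2009.
years = list(range(2000, 2010))
values = [10.0, 10.3, 10.1, 10.4, 10.2, 25.0, 10.7, 10.5, 10.8, 10.6]
flags = [outlier_flag(r) for r in studentized_residuals(years, values)]
print(flags)
```

Leaving each survey out when estimating the residual variance is what keeps a single extreme value from masking itself: the 2005 observation inflates the full-sample variance, but not the variance against which it is judged.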
This measure proved the most useful for these purposes after polynomial regressions and other metrics for calculating the prediction errors were tested. Given the residuals, the following rule is applied to flag each survey and indicator on a scale from 1 to 4:
• If the absolute value of the studentized residual is greater than 15, flag it with a 4.
• If the absolute value of the studentized residual is between 8 and 15 (including 8), flag it with a 3.
• If the absolute value of the studentized residual is between 5 and 8 (including 5), flag it with a 2.
• If the absolute value of the studentized residual is between 3 and 5 (including 3), flag it with a 1.
• If the absolute value of the studentized residual is less than 3, the survey is not flagged.

(e) Shares
This check tests whether groups of indicators calculated as shares of a total sum up to 1.

D. Summary of Surveys and Countries Included in JOIN
Country name | Region | Income-level name | First survey year | Last survey year
Afghanistan | SA | Low income | 2003 | 2013
Albania | ECA | Upper middle income | 2002 | 2012
Angola | SSA | Lower middle income | 2000 | 2008
Argentina | LAC | High income | 1980 | 2014
Armenia | ECA | Upper middle income | 2007 | 2018
Australia | EAP | High income | 2001 | 2015
Austria | ECA | High income | 1995 | 2016
Azerbaijan | ECA | Upper middle income | 2011 | 2015
Bahamas, The | LAC | High income | 2001 | 2001
Bangladesh | SA | Lower middle income | 1999 | 2013
Barbados | LAC | High income | 1996 | 1996
Belarus | ECA | Upper middle income | 1999 | 2016
Belgium | ECA | High income | 1992 | 2016
Belize | LAC | Upper middle income | 1993 | 1999
Benin | SSA | Low income | 2007 | 2015
Bhutan | SA | Lower middle income | 2003 | 2017
Bolivia | LAC | Lower middle income | 1992 | 2015
Bosnia and Herzegovina | ECA | Upper middle income | 2001 | 2007
Botswana | SSA | Upper middle income | 2002 | 2015
Brazil | LAC | Upper middle income | 1981 | 2015
Bulgaria | ECA | Upper middle income | 2001 | 2016
Burkina Faso | SSA | Low income | 1994 | 2009
Burundi | SSA | Low income | 1998 | 2013
Cabo Verde | SSA | Lower middle income | 2000 | 2007
Cambodia | EAP | Lower middle income | 1997 | 2012
Cameroon | SSA | Lower middle income | 2001 | 2014
Canada | NA | High income | 1981 | 2001
Central African Republic | SSA | Low income | 2008 | 2008
Chad | SSA | Low income | 2003 | 2011
Chile | LAC | High income | 1987 | 2017
China | EAP | Upper middle income | 2002 | 2013
Colombia | LAC | Upper middle income | 1996 | 2017
Comoros | SSA | Low income | 2004 | 2013
Congo, Dem. Rep. | SSA | Low income | 2004 | 2012
Congo, Rep. | SSA | Lower middle income | 2005 | 2011
Costa Rica | LAC | Upper middle income | 1989 | 2015
Côte d'Ivoire | SSA | Lower middle income | 2002 | 2002
Croatia | ECA | High income | 2011 | 2016
Cyprus | ECA | High income | 2000 | 2016
Czech Republic | ECA | High income | 1997 | 2016
Denmark | ECA | High income | 1992 | 2016
Djibouti | MNA | Lower middle income | 1996 | 2012
Dominican Republic | LAC | Upper middle income | 1996 | 2015
Ecuador | LAC | Upper middle income | 1994 | 2015
Egypt, Arab Rep. | MNA | Lower middle income | 1988 | 2005
El Salvador | LAC | Lower middle income | 1991 | 2014
Estonia | ECA | High income | 1998 | 2016
Eswatini | SSA | Lower middle income | 2000 | 2016
Ethiopia | SSA | Low income | 1995 | 2015
Fiji | EAP | Upper middle income | 2008 | 2008
Finland | ECA | High income | 1995 | 2016
France | ECA | High income | 1992 | 2016
Gabon | SSA | Upper middle income | 2005 | 2017
Gambia, The | SSA | Low income | 1998 | 2015
Georgia | ECA | Lower middle income | 1998 | 2013
Germany | ECA | High income | 2002 | 2011
Ghana | SSA | Lower middle income | 2005 | 2012
Greece | ECA | High income | 1992 | 2016
Guatemala | LAC | Upper middle income | 2000 | 2011
Guinea | SSA | Low income | 2002 | 2012
Guinea-Bissau | SSA | Low income | 1993 | 2010
Guyana | LAC | Upper middle income | 1992 | 1992
Haiti | LAC | Low income | 2001 | 2012
Honduras | LAC | Lower middle income | 1991 | 2016
Hungary | ECA | High income | 1996 | 2016
Iceland | ECA | High income | 2004 | 2015
India | SA | Lower middle income | 1983 | 2011
Indonesia | EAP | Lower middle income | 1990 | 2014
Iran, Islamic Rep. | MNA | Upper middle income | 2016 | 2016
Iraq | MNA | Upper middle income | 2006 | 2006
Ireland | ECA | High income | 1992 | 2016
Italy | ECA | High income | 1992 | 2016
Jamaica | LAC | Upper middle income | 1990 | 1999
Jordan | MNA | Upper middle income | 2002 | 2016
Kazakhstan | ECA | Upper middle income | 2013 | 2017
Kenya | SSA | Lower middle income | 1997 | 2005
Kiribati | EAP | Lower middle income | 2006 | 2006
Korea, Rep. | EAP | High income | 2001 | 2017
Kosovo | ECA | Lower middle income | 2003 | 2014
Kyrgyz Republic | ECA | Lower middle income | 2002 | 2017
Lao PDR | EAP | Lower middle income | 2002 | 2007
Latvia | ECA | High income | 1998 | 2016
Lesotho | SSA | Lower middle income | 2018 | 2018
Liberia | SSA | Low income | 2007 | 2016
Lithuania | ECA | High income | 1998 | 2016
Luxembourg | ECA | High income | 1992 | 2016
Madagascar | SSA | Low income | 1993 | 2012
Malawi | SSA | Low income | 1997 | 2016
Malaysia | EAP | Upper middle income | 2007 | 2009
Maldives | SA | Upper middle income | 1998 | 2016
Mali | SSA | Low income | 1994 | 2010
Malta | MNA | High income | 2009 | 2016
Marshall Islands | EAP | Upper middle income | 1999 | 1999
Mauritania | SSA | Lower middle income | 2000 | 2014
Mauritius | SSA | Upper middle income | 2006 | 2012
Mexico | LAC | Upper middle income | 1992 | 2012
Micronesia, Fed. Sts. | EAP | Lower middle income | 2000 | 2013
Moldova | ECA | Lower middle income | 2000 | 2017
Mongolia | EAP | Lower middle income | 2002 | 2014
Montenegro | ECA | Upper middle income | 2002 | 2010
Morocco | MNA | Lower middle income | 1991 | 2013
Mozambique | SSA | Low income | 1996 | 2014
Myanmar | EAP | Lower middle income | 2005 | 2010
Namibia | SSA | Upper middle income | 2009 | 2015
Nepal | SA | Low income | 1995 | 2014
Netherlands | ECA | High income | 1992 | 2016
Nicaragua | LAC | Lower middle income | 1993 | 2014
Niger | SSA | Low income | 1995 | 2014
Nigeria | SSA | Lower middle income | 2003 | 2012
North Macedonia | ECA | Upper middle income | 1999 | 2017
Norway | ECA | High income | 2004 | 2016
Pakistan | SA | Lower middle income | 1992 | 2014
Palau | EAP | High income | 2000 | 2000
Panama | LAC | High income | 1989 | 2015
Papua New Guinea | EAP | Lower middle income | 1996 | 2009
Paraguay | LAC | Upper middle income | 1997 | 2017
Peru | LAC | Upper middle income | 1997 | 2015
Philippines | EAP | Lower middle income | 1997 | 2014
Poland | ECA | High income | 1998 | 2016
Portugal | ECA | High income | 1992 | 2016
Puerto Rico | LAC | High income | 1970 | 2005
Romania | ECA | Upper middle income | 1999 | 2013
Russian Federation | ECA | Upper middle income | 2005 | 2016
Rwanda | SSA | Low income | 2000 | 2016
São Tomé and Príncipe | SSA | Lower middle income | 2000 | 2017
Senegal | SSA | Low income | 2005 | 2011
Serbia | ECA | Upper middle income | 2004 | 2016
Seychelles | SSA | High income | 2006 | 2013
Sierra Leone | SSA | Low income | 2003 | 2014
Slovak Republic | ECA | High income | 1998 | 2016
Slovenia | ECA | High income | 1997 | 2016
Solomon Islands | EAP | Lower middle income | 2005 | 2013
Somalia | SSA | Low income | 2013 | 2013
South Africa | SSA | Upper middle income | 1995 | 2017
South Sudan | SSA | Low income | 2009 | 2009
Spain | ECA | High income | 1992 | 2016
Sri Lanka | SA | Lower middle income | 1992 | 2016
Sudan | SSA | Lower middle income | 2009 | 2014
Sweden | ECA | High income | 2004 | 2016
Switzerland | ECA | High income | 2012 | 2015
Tajikistan | ECA | Low income | 2013 | 2013
Tanzania | SSA | Low income | 2000 | 2014
Thailand | EAP | Upper middle income | 1977 | 2011
Timor-Leste | EAP | Lower middle income | 2001 | 2010
Togo | SSA | Low income | 2006 | 2011
Tonga | EAP | Upper middle income | 1996 | 2009
Trinidad and Tobago | LAC | High income | 1980 | 2000
Tunisia | MNA | Lower middle income | 1997 | 2015
Turkey | ECA | Upper middle income | 2000 | 2014
Tuvalu | EAP | Upper middle income | 2010 | 2010
Uganda | SSA | Low income | 1992 | 2016
Ukraine | ECA | Lower middle income | 2004 | 2013
United Kingdom | ECA | High income | 1992 | 2016
United States | NA | High income | 1970 | 2018
Uruguay | LAC | High income | 1989 | 2015
Uzbekistan | ECA | Lower middle income | 2000 | 2003
Vanuatu | EAP | Lower middle income | 2010 | 2010
Venezuela, RB | LAC | Upper middle income | 1989 | 2006
Vietnam | EAP | Lower middle income | 1992 | 2016
Yemen, Rep. | MNA | Low income | 1998 | 2005
Zambia | SSA | Lower middle income | 1998 | 2014
Zimbabwe | SSA | Low income | 2001 | 2017
Note: EAP abbreviates East Asia and Pacific, ECA stands for Europe and Central Asia, LAC covers Latin America and Caribbean, MNA is Middle East and North Africa, NA is North America, SA is South Asia, and SSA represents Sub-Saharan Africa. A list of the countries included in the regions can be found here.

Address: 1776 G St, NW, Washington, DC 20006
Website: http://www.worldbank.org/en/topic/jobsanddevelopment
Twitter: @WBG_Jobs
Blog: https://blogs.worldbank.org/jobs/