Policy Research Working Paper 9956

Do Judges Favor Their Own Ethnicity and Gender? Evidence from Kenya

Daniel Chen, Jimmy Graham, Manuel Ramos-Maqueda, Shashank Singh

Development Economics
Development Impact Evaluation Group
March 2022

Abstract

Evidence from high-income countries suggests that judges often exhibit in-group bias, favoring litigants that share an identity with the judge. However, there is little evidence on this phenomenon from the Global South. Collecting the available universe of High Court decisions in Kenya, this paper leverages the random assignment of cases to judges to evaluate the existence of in-group bias along gender and ethnic lines. It finds that, relative to a baseline win rate of 43 percent, defendants are 4 percentage points more likely to win if they share the judge's gender and 5 percentage points more likely to win if they share the judge's ethnicity. The paper finds that the written judgements are on average shorter and less likely to be cited when defendants who are of the same gender or ethnicity as the judge win their case. This is consistent with in-group biased decisions being of lower quality. In addition, the findings show that female defendants are less likely to win the case if the judge exhibits stereotypical or negative attitudes towards women in their writings.

This paper is a product of the Development Impact Evaluation Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at mramosmaqueda@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues.
An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Do Judges Favor Their Own Ethnicity and Gender? Evidence from Kenya∗

Daniel Chen, Jimmy Graham, Manuel Ramos-Maqueda, and Shashank Singh†

∗ The authors are thankful to participants of the World Bank lightning seminar, Arianna Legovini, an anonymous reviewer, and DIME Analytics for their invaluable feedback. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
† Authors listed in alphabetical order.

1 Introduction

Judges often exhibit bias in decision-making. One particular form of judicial bias documented in recent years is in-group bias, wherein judges are more likely to rule in favor of plaintiffs or defendants that share a certain identity with the judge (Shayo and Zussman 2011; Gazal-Ayal and Sulitzeanu-Kenan 2010; Knepper 2018; Sloan 2020). However, there are still many unknowns regarding the scope and determinants of judicial in-group bias. Indeed, the phenomenon has been studied in relatively few contexts and rarely in the Global South.

Judicial bias in general, and in-group bias in particular, have far-reaching negative consequences. Decisions may be biased against groups that are marginalized, which can exacerbate existing inequalities.
This is especially true for in-group bias, since privileged groups may be more likely to represent a higher proportion of judges. Moreover, bias could undermine the effectiveness, legitimacy, and inclusivity of courts, which are widely recognized as a key component of a well-functioning economy (Rodrik 2000; Visaria 2009; Ponticelli and Alencar 2016; World Bank 2017). In civil courts, which are the focus of this paper, in-group bias may negatively affect marginalized groups' rights over land and property, business contracts, and family disputes, which have important effects on their economic rights and well-being.1

This paper aims to determine the extent and predictors of bias along gender and ethnic lines in judicial decisions in the higher courts in Kenya. The focus is on in-group bias, but we also extend the analysis to gender bias more broadly.

Kenya provides an ideal setting for studying judicial bias for several reasons. First, political groups in Kenya are sharply divided along ethnic lines (Asingo et al. 2018), which may increase ethnic bias in society. According to the most recent census, Kenya has over 40 ethnic groups, and the largest group (the Kikuyu) accounts for only about 17 percent of the population (KNBS 2019). Economic inequalities and political allegiance are distributed across regions and ethnicities, with political parties and coalitions created along clear ethnic lines (Asingo et al. 2018; Friedrich-Ebert-Stiftung 2012). In this context of diffuse ethnic groups and ethnic-based politics, ethnicity is a highly salient topic. Second, certain ethnic groups are underrepresented in the judiciary (see below), and there is a high degree of gender inequality across a number of dimensions, including representation in the judiciary and a variety of socioeconomic outcomes (IDLO 2020; UNDP 2020).2 If in-group bias is widespread, it may disproportionately harm these underrepresented groups.
Third, there is an ongoing debate regarding the extent to which co-ethnic bias affects decision-making in the context of Africa generally and Kenya specifically (Berge et al. 2015).

We employ several data sources to examine the extent and determinants of judicial bias in Kenya. Our main data source is the Kenyan Judiciary's publicly available database for Superior Courts cases over the period 1976-2020.3 Our analysis focuses on cases in which there is both a human defendant and a human plaintiff (i.e., cases for which neither litigant is an organization or representative of the state). As such, since the state is always the prosecutor in criminal cases, criminal cases are not included in our dataset. Most of the cases we examine are civil, environment and land, and succession cases. By scraping the metadata associated with each case, we determined key variables, such as case type and names of judges and litigants. We also used additional data sources to determine the gender and ethnicity of participants. Furthermore, we used machine learning techniques to extract other key variables, such as the outcome of the case and the degree to which judges, by associating women with either negative or stereotypical traits, exhibit gender slant against women in their writing, which serves as a textual proxy for implicit bias.

To determine the causal effect of an in-group relationship between judges and litigants, we rely on the quasi-random assignment of cases to Kenyan judges. In Kenya, cases filed to a court are assigned to individual judges based on their existing caseload and the date of filing, which is orthogonal to any other characteristics of the case. Random assignment assures us that any relationship between in-group status and case outcomes is driven by bias rather than other factors, such as self-selection of judges to certain cases.
To confirm this, we test for random assignment across gender and ethnic lines, and we show that male and female defendants and plaintiffs are equally likely to be assigned to a male or female judge. We also show that, overall, defendants and plaintiffs across all ethnicities are not more likely to be assigned judges from their ethnic group.

1 Throughout this paper, we use civil courts to refer to non-criminal courts. In our dataset, this includes civil cases, environment and land cases, succession cases, labor cases, and miscellaneous cases.
2 According to our data, over the past few decades, female judges have been in the majority for only about 37 percent of cases.
3 See http://kenyalaw.org/caselaw/.

Our main finding is that judges in Kenya display both gender and ethnic in-group bias towards defendants. Our results suggest that defendants are about 4 percentage points more likely to win if they share the judge's gender and about 5 percentage points more likely to win if they share the judge's ethnicity, relative to a baseline win rate of 43 percent.

We also investigate whether and how attitudes toward gender affect judges' decision-making. In particular, we examine whether judges who exhibit stereotypical or negative gender attitudes in their written judgements are more likely to display gender bias in the direction of their decisions. Previous research has shown that attitudes toward social groups are highly predictive of judgments and choices (Bertrand, Chugh, and Mullainathan 2005). Some judicial decisions have raised questions when, for example, a judge closed a sex trafficking case of a female minor because "drinking beer with the clients is not tiring work," or when magistrates in a sexual assault case ruled out the attack on grounds that the victim's use of a "daring garment" contradicted her "shy" temperament.
A large literature documents that ideological and biographical characteristics of judges matter and suggests that their preferences directly affect their rulings (Boyd and Spriggs 2009; Glynn and Sen 2015; Kastellec 2013; Sunstein et al. 2007). In addition, judicial decisions are often highly predictable, in the sense that some judges are systematically more or less lenient, which hints at predetermined judgments (Dunn et al. 2017), and/or at the use of heuristics or stereotypes where salient attributes determine decisions (Bordalo et al. 2016; Gainer 2016).

In order to construct our measure of attitudes toward gender, we rely on word embeddings, a language modeling technique from natural language processing that leverages word co-occurrence to create vector representations that preserve semantic meaning (Mikolov et al. 2013; Pennington, Socher, and Manning 2014; Kozlowski, Taddy, and Evans 2019). Specifically, we use word embeddings to construct a lexical slant measure that captures the strength of the association between gendered language and either negative language or language that embraces gender stereotypes. We measure lexical gender slant by judge, with the majority opinions authored by each judge used as a separate corpus. We find that slant against women in written judgements is associated with lower win rates for female defendants. We estimate that a one standard deviation change in the measure of gender slant is associated with about a 2 percentage point decrease in win probability for female defendants.

We also find that when judges are matched with defendants from the in-group, they issue judgements that are shorter and less likely to be cited. The fact that they are cited less often may suggest that they are of lower quality (Landes, Lessig, and Solimine 1998). The effect is stronger when the defendant wins the case, which is consistent with lower-quality decisions being associated with in-group bias.
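The geometry behind this kind of slant measure can be sketched in a few lines. The sketch below is a simplified illustration, not the Ash, Chen, and Ornaghi (2021) pipeline we follow: it assumes word vectors have already been trained on a single judge's opinions, and the three-dimensional vectors and word lists are invented for the example.

```python
import numpy as np

def centroid(vectors, words):
    """Average the embedding vectors of a word list."""
    return np.mean([vectors[w] for w in words], axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_slant(vectors, female, male, negative, positive):
    """Cosine between the female-minus-male direction and the
    negative-minus-positive direction; positive values indicate that
    female terms sit closer to negative terms in the embedding space."""
    gender_axis = centroid(vectors, female) - centroid(vectors, male)
    valence_axis = centroid(vectors, negative) - centroid(vectors, positive)
    return cosine(gender_axis, valence_axis)

# Hypothetical 3-d embeddings standing in for vectors trained per judge.
toy = {
    "she": np.array([0.9, 0.1, 0.2]), "woman": np.array([0.8, 0.2, 0.1]),
    "he": np.array([0.1, 0.9, 0.2]),  "man": np.array([0.2, 0.8, 0.1]),
    "bad": np.array([0.7, 0.3, 0.4]), "dishonest": np.array([0.6, 0.2, 0.5]),
    "good": np.array([0.2, 0.7, 0.4]), "honest": np.array([0.3, 0.6, 0.5]),
}
slant = gender_slant(toy, ["she", "woman"], ["he", "man"],
                     ["bad", "dishonest"], ["good", "honest"])
print(round(slant, 2))  # positive: female terms align with negative terms
```

In the actual measure, the vectors would come from embeddings trained on each judge's corpus of opinions, and the word lists would be curated gender and attribute lexicons rather than the toy lists above.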
These findings have important implications for the Kenyan context. As mentioned, women and certain ethnic groups are underrepresented in the judiciary. As such, they are more likely to be negatively affected by in-group bias. In practical terms, keeping in mind the main case types in the dataset, this could imply a range of negative consequences. For civil cases, which often involve disputes over money (among other topics), in-group bias would imply a financial disadvantage for women and underrepresented ethnic groups. For environment and land cases, bias may make these groups more likely to lose disputes over land ownership. Similarly, for succession cases, bias could lead to women being unfairly cut out of family inheritance or property.

This paper contributes to four main literatures. First, it builds on the scant literature related to judicial bias in developing countries and is, to our knowledge, the first paper to focus on civil (non-criminal) cases in the Global South. Judicial bias has been well studied in the United States (Knepper 2018; Depew, Eren, and Mocan 2017) and Israel (Shayo and Zussman 2011; Gazal-Ayal and Sulitzeanu-Kenan 2010), but in few other countries. Even in high-income countries, the literature has rarely evaluated non-criminal cases: to our knowledge, only Shayo and Zussman (2011) and Knepper (2018) examine in-group bias in non-criminal cases, and they focus on a specific case type (small claims courts and sexual discrimination cases, respectively). In this paper, we study in-group bias in the Kenyan context, which builds much-needed evidence on judicial bias outside the United States. As such, this paper helps expand our understanding of the scope of judicial bias in the Global South, where these data are relatively scarce. Prior research in developing countries has exclusively focused on criminal cases (Ash, Asher, et al. 2021; Choi, Andy Harris, and Shen-Bayh 2021).
Instead, we evaluate the entire corpus of non-criminal cases, including contract enforcement, land and property rights, succession cases, and family disputes. We are unaware of any other study in the Global South of cases that ordinarily play a role in economic growth, including those related to contract enforcement, property disputes, and the environment. Relative to the rest of the literature, our findings show moderate in-group bias. Whereas we find that in-group status improves win probability by 4 to 5 percentage points, a recent study from India found null effects (Ash, Asher, et al. 2021), and a study in Israel found that ethnic in-group status has a 17 to 20 percentage-point impact on win probability (Shayo and Zussman 2011).

Second, to our knowledge, our paper is the first to evaluate how attitudes toward gender in judicial writings and the quality of decisions are associated with the presence of in-group bias in judicial decisions. Ash, Chen, and Ornaghi (2021) show that judges displaying gender bias in their writings vote more conservatively in gender-related cases and that their attitudes affect their interactions with their colleagues. We expand this result and find that stereotypical or negative gender attitudes in written judgments are also correlated with bias against female defendants in judicial cases. We complement this finding by showing that low-quality decisions (measured through proxies in the judgment text) are correlated with the presence of in-group bias.

Third, ours is one of the first papers to examine judicial bias towards both defendants and plaintiffs. This is potentially due to the fact that most previous studies on the topic have focused on criminal cases, for which the plaintiff is typically the state.4 To our knowledge, only Knepper (2018) evaluates biases towards both plaintiffs and defendants, yet focuses on the earlier stages of sexual discrimination cases.
Instead, we focus on final decisions in civil cases where both parties are natural persons (i.e., human beings rather than corporations or organizations). We find in-group bias towards defendants, but not towards plaintiffs. A potential explanation for this finding is loss aversion on behalf of the judge's in-group, which is consistent with social identity theory. The theory predicts that when an individual perceives a "threat" to their in-group (and, by extension, their self-identity), they may be more likely to exhibit bias in favor of their group as a means to defend the group (and their identity) than when no such threat exists (Voci 2010).5 Nonetheless, we are not able to test the mechanisms that explain this difference in in-group bias across plaintiffs and defendants. Thus, our finding would benefit from further research to better understand the extent to which it holds in other contexts, as well as to investigate the mechanisms that might explain it.

Fourth, our findings contribute to the broader literature on ethnic and gender bias. There is a large literature studying the extent to which ethnicity affects decision-making and preferences in sub-Saharan Africa.6 Considering the high level of ethnic fractionalization on the continent (and in Kenya), ethnic bias has major implications for a range of outcomes, including public service provision (Barkan and Chege 1989), community mobilization (Miguel and Gugerty 2005), and firm productivity (Andrew Harris 2014). Our paper adds to this literature by documenting a high-stakes form of ethnic bias (judicial bias) in Kenya. Relatedly, our study builds on research demonstrating the importance of female representation in public positions for both reducing bias (Beaman et al. 2009) and directly improving outcomes for women (Hessami and Fonseca 2020). It builds evidence for one specific channel (outcomes in court cases) through which female representation in the public sector directly affects women.
The rest of the paper is organized as follows. Section 2 presents background information on the judiciary, gender, and ethnicity in Kenya. Section 3 presents the data used in the analysis. Section 4 outlines the empirical strategy. Section 5 presents the results. Finally, Section 6 concludes.

2 Background: The Kenyan Judiciary

The Kenyan judiciary is divided into two main court types: Superior and Subordinate Courts. The vast majority of our data covers the Superior Courts, which include the High Courts, which hear both criminal and civil cases and appeals from Subordinate Courts; the Environment and Land Courts; the Employment and Labour Relations Courts; the Court of Appeal, which hears appeals from the High Courts, Environment and Land Courts, and Employment and Labour Relations Courts; and the Supreme Court, which hears appeals from the Court of Appeal and other high-level cases (Kenyan Judiciary 2021).7

4 For example, see Ash, Asher, et al. (2021), Gazal-Ayal and Sulitzeanu-Kenan (2010), Abrams, Bertrand, and Mullainathan (2012), Depew, Eren, and Mocan (2017), Lim, Silveira, and Snyder (2016), Yang (2015), Schanzenbach (2014), Mustard (2001), Arnold, Dobbie, and Yang (2018), Grossman et al. (2016), Choi, Andy Harris, and Shen-Bayh (2021), and Alesina and La Ferrara (2014). To our knowledge, only Shayo and Zussman (2011) and Knepper (2018) examine in-group bias in non-criminal cases. However, Shayo and Zussman (2011) focus on small claims courts and only include cases where the defendant and plaintiff are of different religions, so they are not able to examine differences in bias towards each litigant. Knepper (2018) examines workplace sexual discrimination cases, which only have one human litigant, as the defendant is an organization.
5 As evidence, Dietz-Uhler and Murrell (1998) found that when individuals read negative reviews about their group (i.e., when they were exposed to a threat), they were more likely to make positive affirmations about their own group. For additional evidence, see Wann and Grieve (2005) and Voci (2010).
6 For an overview, see Berge et al. (2015).

The Kenyan judiciary does not employ a jury system. This means that judges alone decide the outcomes of cases, which implies that bias among judges can have especially serious consequences. For most cases in most courts, there is only one judge. An exception is the Court of Appeal, where the majority of cases are heard by multi-judge panels.

In August 2010, the judicial system was overhauled by the implementation of a new constitution. The Constitution led to a wide range of reforms, including in the judiciary (Akech 2010). The judicial reforms were designed to reduce executive branch control over judicial outcomes, eliminate the system of bribing judges, increase transparency in judge selection, reduce the large backlog of cases, and increase female participation (Akech 2011; Gainer 2016). The reforms included the appointment of an ombudsperson to address corruption complaints; the creation of a meritocratic judge appointment process, separate from the oversight of the executive; the design of a standardized case management system; a doubling of the judicial budget; and the creation of the requirement (applied across elective and appointive bodies throughout the government) that no more than two-thirds of Kenyan judges be of the same gender. Although not all of these reforms have been fully enacted, progress has been made on many dimensions (Gainer 2015; Mutunga 2011; IDLO 2020).

These reforms could have important implications for in-group bias and its effects. For one, some of the reforms, such as the reduction of corruption and the increase in meritocratic assignment, could potentially reduce overt bias. Moreover, with more women in the judiciary, the aggregate effects of in-group bias on women would be less severe.
As we show in the summary statistics section below, there has been substantial progress towards gender parity in the judiciary, though there is still a long way to go to achieve equality. Inequalities in the judiciary are reflective of broader gender inequalities in Kenyan society. According to the 2020 United Nations Development Programme's Gender Inequality Index, which scores countries based on gender gaps related to representation in government, educational attainment, and labor force participation, Kenya ranks 126th out of 189 countries. Notably, women hold only 23 percent of seats in parliament (UNDP 2020).

Inequalities are also widespread along ethnic lines, a fact which provided an impetus for the adoption of the new constitution and its measures to devolve power (Akech 2010). As we show below, some groups are underrepresented in the judiciary relative to their share of the population, whereas others are overrepresented. This finding showcases relative differences in access to justice and judicial representation across ethnicities.

3 Data

3.1 Variable Construction

The main data source used in our analysis is the Kenyan Judiciary's publicly available database for court cases.8 The database includes 159,645 cases, almost exclusively from the Superior Courts, over the period 1976 to 2020. Kenya Law, an organization within the Kenyan Judiciary, began uploading case information in 2006. They upload all cases that are sent to them from the individual courts. Judicial officers in Superior Courts have a mandate to send cases to Kenya Law. For cases prior to 2006, Kenya Law has made (and continues to make) efforts to gather and upload case information.

In order to build our dataset for analysis from this database, we scraped the metadata and full text decision associated with each case.
In doing so, we were able to directly extract the following for most cases: the names of plaintiffs, defendants, and judges; the type of case; the court in which the case was heard; whether the case was appealed; and the year the judgement was delivered. We also used the history associated with each case to determine whether it was an appeal.

To determine gender and ethnicity and to remove non-human cases (i.e., cases with companies, organizations, or the state as litigants), we used the name information scraped from the database. Cases without gender or ethnicity information for judges and either plaintiffs or defendants were dropped.9 Once gender and ethnicity were assigned to each individual, we could determine the majority genders and plurality ethnicities of the judges, defendants, and plaintiffs for each case.10

7 According to our data, the Court of Appeal almost exclusively hears civil cases; the Environment and Land Courts are largely split between civil cases and environment and land cases; the Employment and Labour Relations Courts are largely split between labor cases and civil cases; and the High Courts frequently hear a wide range of cases, including civil cases, land and environment cases, labor cases, criminal cases, and others. We have little data on Supreme Court cases, but the Supreme Court appears to hear mostly civil cases. Despite these general trends, the data appear to show that the courts are generally not restricted in the cases they hear, as they all tend to hear a wide range of case types.
8 See http://kenyalaw.org/caselaw/.

We used machine learning to extract several other variables. To determine the winner of each case, we first scraped the case outcome information from the metadata. However, for 58,622 cases, the outcome was not stated. For these cases, we used a binary classification machine learning model (described in Appendix A) to analyze the text decisions of each case and determine the outcome.
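As a rough illustration of this kind of outcome classifier (the actual model is described in Appendix A and may differ), one could fit a bag-of-words model on the decisions whose outcomes appear in the metadata and then predict the remainder. The decision snippets and labels below are invented:

```python
# A minimal sketch of a text-based outcome classifier; illustrative only,
# not the paper's exact specification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented snippets standing in for decision texts with known outcomes.
texts = [
    "the appeal is allowed and judgment entered for the defendant",
    "the suit is dismissed with costs to the defendant",
    "the plaintiff's claim succeeds and damages are awarded",
    "judgment is entered for the plaintiff as prayed",
] * 5  # repeated so the tiny example has enough samples to fit
labels = [1, 1, 0, 0] * 5  # 1 = defendant wins, 0 = plaintiff wins

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Predict the outcome of a decision whose metadata lacks an outcome.
pred = model.predict(["the suit is dismissed and costs awarded to the defendant"])
print(pred[0])
```

In practice, the labeled set would be the cases with scraped outcomes, and accuracy would be evaluated on a held-out test set, as the paper does.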
In the test set, the model was about 93 percent accurate.

To measure gender bias in judges' writing, we used a word embedding approach that captures the textual relationship between gendered language and either positive/negative language or career-oriented/family-oriented language, following Ash, Chen, and Ornaghi (2021). This approach allowed us to measure the extent to which judges disproportionately associate women with either negative or stereotypical qualities (i.e., a focus on family rather than career). The two variables resulting from this process are Median slant, career vs family and Median slant, good vs bad. For both measures, positive values indicate greater slant against women.

We also created variables measuring aspects of each written judgement that could signal its quality, including the number of cases cited in the text, the number of laws and acts cited in the text, the length of the text (measured as the number of words), and the number of times the judgement has been cited by other cases in our dataset. Appendix A describes data cleaning and variable construction in greater detail.

We make available the metadata of all 159,645 cases, but focus the analysis on the 29,363 cases with litigants who are individuals and have gender or ethnicity data.11 These data cover 95 courts and 392 judges. The sample covers the years 1976 to 2020, as Figure 1 shows. Summary statistics of variables in the dataset are presented in Appendix B. They show that the main case types in our dataset are civil cases (46 percent), environment and land cases (32 percent), succession (10 percent), miscellaneous (8 percent), and labor relations (2 percent). All other case types comprise less than 1 percent of the total. Criminal cases are not included because they always have at least one non-human litigant (the state). Table C1 in Appendix C shows the court types that are included in the dataset.
It indicates that over 99 percent of cases are from Superior Courts.12

9 The processes for removing non-humans and determining gender and ethnicity (as well as the reasons for missing information) are discussed in Appendix A.
10 By majority, we mean an absolute majority, where one gender comprises more than 50 percent of the total. By plurality, we mean a simple plurality, where there are more members of one ethnic group than of any other. If no majority could be determined for gender, the majority gender was coded as missing. If no plurality could be determined for ethnicity, the plurality ethnicity was coded as "no plurality." This difference in coding was necessary because the main specification for the gender in-group analysis requires binary outcomes, while the main specification for the ethnicity in-group analysis does not (see specifications in Section 4, below).
11 Of the initial 159,645 cases, 33,876 had exclusively human litigants for civil cases. An additional 4,513 cases were dropped because we were unable to determine a majority gender or ethnicity for the litigants in the case.
12 In the "other" category, there are a small number of Subordinate Court cases. Most cases are from the High Courts, followed by the Environment and Land Courts, the Court of Appeal, and the Employment and Labour Relations Courts. Very few Supreme Court cases are included.

Figure 1: Frequency of cases in the dataset over time

3.2 Summary Statistics

Figure C1 in Appendix C shows that men comprise the majority of plaintiffs, the majority of defendants, and the majority of judges for most cases. The gender gap is especially large for plaintiffs and defendants. Men comprise the majority of plaintiffs and the majority of defendants in about three times as many cases as women. In contrast, female judges comprise the majority in over half as many cases as male judges. Figure C2 in Appendix C shows how the gender gap has evolved over time.
Since 1980, female representation has increased for all three roles (i.e., judge, plaintiff, and defendant). The increase has been most dramatic for judges, with sharp increases beginning around 2000. The increases continued after 2010, the year during which the new constitution was adopted. The gender gap is currently more prominent in women's access to courts as plaintiffs than in their representation in court as judges.

Figure C3 in Appendix C illustrates the gender gaps by case type and role. It shows that women are especially underrepresented as plaintiffs and defendants in civil, labor relations, and environment and land cases. In contrast, women are closer to parity as defendants and plaintiffs in family and succession cases.

Figure C4 in Appendix C depicts the proportion of cases represented by the different ethnic pluralities. It also depicts each ethnic group's proportion of the total Kenyan population, as a benchmark for equal representation. The specific ethnicities are intentionally masked. Clearly, some ethnicities are overrepresented as judges, plaintiffs, and defendants, while others are underrepresented.

4 Empirical Strategy

The main goal of our analysis is to determine the extent to which judges exhibit bias towards plaintiffs and defendants of the same gender and ethnicity and to determine the predictors of this bias. To isolate the causal effect of in-group identity, we rely on the as-good-as-random assignment of judges to cases, described in the following subsection and further justified by balance tests. Random assignment is key to our empirical strategy because it assuages the concern that judge ethnicity or gender is correlated with case characteristics that affect outcomes.
For example, if judges of a certain ethnic group preferred to rule on cases for which their ethnic group was less likely to be guilty, then we would expect to see indications of in-group bias, but the effect would be driven by selection bias rather than in-group bias. In addition, judges of a certain ethnicity may be more likely to rule on cases in areas of the country where crime is more or less severe. If these distributions of crime severity are correlated with the ethnic distributions of defendants and plaintiffs, then we may again falsely perceive in-group bias.

The rest of this section proceeds as follows. Subsection 4.1 presents the qualitative justification that judge assignment is as good as random (at least in the most recent years). Subsections 4.2 and 4.3 present quantitative evidence, through balance tests, that the random assignment assumption does in fact hold across the full time period for gender and ethnicity, respectively. The econometric approaches for examining gender and ethnic in-group bias are described in detail in Subsections 4.4 and 4.5, respectively. We also aim to examine whether judge gender slant in written opinions is correlated with outcomes for female defendants and plaintiffs. The approaches to these analyses are described in Subsection 4.6. Finally, Subsection 4.7 presents the specifications for examining the relationship between in-group bias and textual variables other than slant, for both gender and ethnicity.

4.1 Random Assignment of Cases to Judges

To evaluate the existence of in-group bias, we exploit the quasi-random assignment of cases to judges. In Kenyan High Courts, cases filed in a court are categorized by court type and sent to the deputy registrar of the relevant court division (family, commercial and admiralty, labour and employment, constitutional, land and environment, or criminal).
The deputy registrar then assigns the case to a judge based on the judge's caseload and calendar, without considering case characteristics. This exogenous assignment is orthogonal to case characteristics such as the gender or ethnicity of the parties. Thus, this system produces as-good-as-random assignment of plaintiffs and defendants to judges, conditional on court division. This is consistent with the World Bank's Doing Business index, which asserts that cases are in fact randomly assigned to judges in Kenya (World Bank 2021). Nonetheless, it must be noted that this randomized procedure may be a recent phenomenon. Indeed, introducing randomization was allegedly one of the goals of the reform team following the implementation of the 2010 Kenyan Constitution (Gainer 2015). Therefore, it is necessary to provide further evidence that case selection by judges has not been a common feature across our sample. To do so, we present balance tests in the following two subsections. Considering the possibility that assignment has become de jure random after 2010, in addition to conducting the tests for the full sample, we also split the balance tests for de facto randomization before 2011 and after 2010. As a robustness check, we also conduct the main analysis for only after 2010.

4.2 Gender Balance Tests

To confirm that judge assignment to cases is random in terms of gender majority, we use the following balance test for the analysis sample, for case i filed in court c at time t:

judge_maj_female_{i,c,t} = \beta_1 def_maj_female_{i,c,t} + \beta_2 pla_maj_female_{i,c,t} + \gamma_{c,t} + X_{i,c,t} + \epsilon_{i,c,t}    (1)

where \gamma_{c,t} is a court-year fixed effect and X_{i,c,t} is a vector of additional control variables, which may include: binary variables for judge, defendant, and plaintiff plurality ethnicity; variables for the numbers of judges, plaintiffs, and defendants; a binary variable indicating whether the case is an appeal; and binary variables indicating the case type.
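In estimation terms, equation (1) is an OLS regression of judge gender on party gender with court-year fixed effects. The following is a minimal sketch using statsmodels on synthetic data; all variable names, the data, and the clustering column are illustrative stand-ins, not the paper's actual dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the case-level analysis sample (illustrative only).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "def_maj_female": rng.integers(0, 2, n),   # defendant group majority female
    "pla_maj_female": rng.integers(0, 2, n),   # plaintiff group majority female
    "court_year": rng.choice([f"c{c}_{y}" for c in range(5) for y in (2012, 2013)], n),
    "judge_id": rng.integers(0, 60, n),
})
# Under as-good-as-random assignment, judge gender is independent of party gender.
df["judge_maj_female"] = rng.integers(0, 2, n)

# Equation (1): regress judge majority gender on party majority gender with
# court-year fixed effects, clustering standard errors at the judge level.
res = smf.ols(
    "judge_maj_female ~ def_maj_female + pla_maj_female + C(court_year)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["judge_id"]})
print(res.params[["def_maj_female", "pla_maj_female"]])
```

Balance holds if the coefficients on the party-gender variables are statistically indistinguishable from zero.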
Court-year fixed effects are used to ensure that we are comparing defendants and plaintiffs that are in the same court at the same time. Court-year periods with insufficient variation for regression analysis are dropped from the regressions. For this and all other models, we cluster standard errors at the judge level. As a robustness check for the main analyses, we use HC1 robust standard errors. The results of the balance test are shown in Table D1 in Appendix D. Column (1) does not include additional controls. Column (2) includes controls for ethnicity, and Column (3) adds additional controls (as listed in the table notes). The results indicate that male- and female-majority defendant groups are equally likely to be assigned male- and female-majority judge panels (including single-judge panels). Likewise, male- and female-majority plaintiff groups are equally likely to be assigned male- and female-majority judge panels. In light of the concern that cases have only become randomized after the creation of the new constitution and the accompanying judicial reforms, Tables D2 and D3 present balance tests for pre-2011 and since 2011, respectively. The results are consistent with Table D1.

4.3 Ethnicity Balance Tests

To confirm that judge assignment to cases is random in terms of ethnic majority, we use variations of the following balance test:

judge_plur_kikuyu_{i,c,t} = \beta_1 def_plur_kikuyu_{i,c,t} + \beta_2 pla_plur_kikuyu_{i,c,t} + \gamma_{c,t} + X_{i,c,t} + \epsilon_{i,c,t}    (2)

where \gamma_{c,t} and X_{i,c,t} represent the same fixed effects and controls as before, judge_plur_kikuyu_{i,c,t} is a binary variable indicating whether the judge plurality is the Kikuyu ethnic group, def_plur_kikuyu_{i,c,t} is a binary variable indicating whether the defendant plurality is the Kikuyu ethnic group, and pla_plur_kikuyu_{i,c,t} is a binary variable indicating whether the plaintiff plurality is the Kikuyu ethnic group. We run a series of 12 tests, with each test using binary variables for a different ethnicity.
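The battery of ethnicity balance tests amounts to one regression per ethnic group, substituting each group into equation (2) in turn. A hedged sketch on synthetic data follows; the group labels, column names, and data are illustrative placeholders (the paper runs 12 such tests on real groups).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative synthetic data; "eth_a" etc. stand in for actual ethnic groups.
rng = np.random.default_rng(1)
n = 2000
groups = ["eth_a", "eth_b", "eth_c"]
df = pd.DataFrame({
    "court_year": rng.choice([f"c{i}" for i in range(8)], n),
    "judge_id": rng.integers(0, 60, n),
})
for g in groups:
    # Binary plurality indicators for judge panel, defendants, and plaintiffs.
    df[f"judge_plur_{g}"] = rng.integers(0, 2, n)
    df[f"def_plur_{g}"] = rng.integers(0, 2, n)
    df[f"pla_plur_{g}"] = rng.integers(0, 2, n)

# Equation (2), run once per group, with judge-level clustered standard errors.
coefs = {}
for g in groups:
    res = smf.ols(
        f"judge_plur_{g} ~ def_plur_{g} + pla_plur_{g} + C(court_year)", data=df
    ).fit(cov_type="cluster", cov_kwds={"groups": df["judge_id"]})
    coefs[g] = res.params[f"def_plur_{g}"]
print(coefs)
```

Each stored coefficient is the group-specific analogue of \beta_1; balance requires all of them to be statistically indistinguishable from zero.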
Tables D4 through D7 in Appendix D report the results of the tests. They show that defendants and plaintiffs across all ethnicities are not more likely to be assigned judges from their ethnic group. One exception is Luhya defendants, as Table D5 shows. Balance tests for both pre-2011 and since 2011 are also presented in Appendix D (see Tables D8 through D15). They show that there are significant coefficients for Luhya defendants in the 2011-2020 period and for Kamba in the 1976-2010 period. We conduct a robustness check of the main analysis that drops all Luhya and Kamba individuals. Appendix E presents these results. A comparison between these results and the main results below shows that the in-group bias we observe is not driven by any possible bias in Luhya or Kamba case assignment.13

4.4 Main Gender Specifications

To estimate judicial gender bias, we model outcome Y_{i,c,t} (where Y=1 corresponds to the defendant winning the case) for case i filed in court c at time t as:

Y_{i,c,t} = \alpha + \beta_1 judge_maj_female_{i,c,t} + \beta_2 def_maj_female_{i,c,t} + \beta_3 judge_maj_female_{i,c,t} \times def_maj_female_{i,c,t} + \gamma_{c,t} + X_{i,c,t} + \epsilon_{i,c,t}    (3)

where judge_maj_female and def_maj_female are binary variables indicating whether judge panels and defendant groups, respectively, are majority female. The main outcome of interest is the coefficient on the interaction term, which indicates in-group bias. The specification used to test for in-group bias towards plaintiffs is identical to (3), except a binary variable for plaintiff majority gender, pla_maj_female, substitutes for def_maj_female.
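Equation (3) can be sketched as an OLS regression of the win indicator on judge gender, defendant gender, and their interaction. The example below uses synthetic data in which a roughly 4-percentage-point in-group advantage is simulated, in line with the magnitude reported later in the paper; all names and numbers are illustrative, not the paper's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 3000
df = pd.DataFrame({
    "judge_maj_female": rng.integers(0, 2, n),
    "def_maj_female": rng.integers(0, 2, n),
    "court_year": rng.choice([f"c{i}" for i in range(10)], n),
    "judge_id": rng.integers(0, 80, n),
})
# Simulate a small win-rate advantage when judge and defendant genders match
# (illustration only; baseline win rate set near the paper's 43 percent).
p_win = 0.43 + 0.04 * (df["judge_maj_female"] == df["def_maj_female"])
df["def_win"] = rng.binomial(1, p_win)

# Equation (3): '*' expands to both main effects plus the interaction, whose
# coefficient is the in-group bias estimate.
res = smf.ols(
    "def_win ~ judge_maj_female * def_maj_female + C(court_year)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["judge_id"]})
print(res.params["judge_maj_female:def_maj_female"])
```

Note that with a symmetric "match" effect of 0.04, the interaction coefficient recovers roughly twice that (about 0.08), since the main effects absorb the mismatch penalty on each side.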
An alternate specification includes both variables:

Y_{i,c,t} = \alpha + \beta_1 judge_maj_female_{i,c,t} + \beta_2 def_maj_female_{i,c,t} + \beta_3 pla_maj_female_{i,c,t} + \beta_4 judge_maj_female_{i,c,t} \times def_maj_female_{i,c,t} + \beta_5 judge_maj_female_{i,c,t} \times pla_maj_female_{i,c,t} + \gamma_{c,t} + X_{i,c,t} + \epsilon_{i,c,t}    (4)

4.5 Main Ethnicity Specifications

For the ethnicity in-group bias analysis, we use a slightly different econometric specification in order to account for the fact that there are many more categories of ethnicity. To estimate judicial ethnic bias, we model outcome Y_{i,c,t} (where Y=1 corresponds to the defendant winning the case) for case i filed in court c at time t as:

Y_{i,c,t} = \alpha + \beta_1 judge_pla_same_{i,c,t} + \beta_2 judge_def_same_{i,c,t} + \gamma_{c,t} + X_{i,c,t} + \epsilon_{i,c,t}    (5)

where judge_pla_same_{i,c,t} is a binary variable indicating whether the judge ethnic plurality is the same as the plaintiff ethnic plurality, and judge_def_same_{i,c,t} is a binary variable indicating whether the judge ethnic plurality is the same as the defendant ethnic plurality.

13 Tables D4 through D15 include a full set of controls. To save space, we have not included the results without controls. However, they are qualitatively similar, with no additional significant coefficients for the variables of interest.

4.6 Gender Slant Analysis

To examine the conditions under which gender bias can be expected, we examine whether judges' slant against women in opinions predicts bias against female defendants and plaintiffs. For this analysis, we use a specification that examines bias against women in general, rather than in-group bias. To do so, we build on equation (3) by adding one of the two measures of slant, described above, and an interaction between the slant measure and def_maj_female and/or pla_maj_female. The main outcomes of interest are the interactions with slant, which indicate whether female defendants and plaintiffs are less likely to win the case if the judge exhibits slant in her/his writing.
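The slant specification described above amounts to equation (3) augmented with a slant term and its interaction with defendant gender. A minimal sketch on synthetic data follows; the `slant` column is a placeholder for the text-derived measures described earlier, and all names and data are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 3000
df = pd.DataFrame({
    "judge_maj_female": rng.integers(0, 2, n),
    "def_maj_female": rng.integers(0, 2, n),
    "slant": rng.normal(0, 0.1, n),  # placeholder for a judge-level slant measure
    "court_year": rng.choice([f"c{i}" for i in range(10)], n),
    "judge_id": rng.integers(0, 80, n),
})
df["def_win"] = rng.integers(0, 2, n)

# Equation (3) plus the slant measure and its interaction with defendant
# gender; the coefficient on def_maj_female:slant asks whether female
# defendants fare worse before judges whose writing is more slanted.
res = smf.ols(
    "def_win ~ judge_maj_female * def_maj_female + slant + def_maj_female:slant"
    " + C(court_year)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["judge_id"]})
print(res.params["def_maj_female:slant"])
```

A negative interaction coefficient would correspond to the pattern the paper reports: female defendants are less likely to win when the judge's writing is more slanted against women.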
4.7 Judgement Text Specifications

To study the relationship between in-group bias and characteristics of the judgement text for a given case, we start with variations on the following specification:

Y_{i,c,t} = \alpha + \beta_1 judge_maj_female_{i,c,t} + \beta_2 def_maj_female_{i,c,t} + \beta_3 judge_maj_female_{i,c,t} \times def_maj_female_{i,c,t} + \beta_4 judge_def_same_{i,c,t} + \gamma_{c,t} + X_{i,c,t} + \epsilon_{i,c,t}    (6)

where Y_{i,c,t} represents one of the four judgement text variables (number of cases cited, number of laws/acts cited, number of times the case has been cited, or length) for case i filed in court c at time t; judge_def_same_{i,c,t} is a binary variable indicating whether the judge ethnic plurality is the same as the defendant ethnic plurality; \gamma_{c,t} is a court-year fixed effect; and X_{i,c,t} is a vector of additional control variables. In addition to running this specification on the full sample, we split the sample into two groups, cases where the defendant won and cases where the defendant lost, and run an additional series of regressions for each. If in-group bias yields positive outcomes and is associated with different characteristics for judgement texts, then we might see significant coefficients on judge_def_same_{i,c,t} and/or judge_maj_female_{i,c,t} \times def_maj_female_{i,c,t} for the defendant-win sample but not the defendant-lose sample, and the coefficients in the defendant-win sample should be larger than in the full sample. For example, if ethnically biased judgements are associated with the case being cited fewer times, then we should expect to see a negative coefficient on judge_def_same_{i,c,t} in the defendant-win sample, a (potentially null) negative coefficient of lesser magnitude in the full sample, and a null coefficient in the defendant-lose sample.

5 Results

5.1 Main Gender Results

The main gender regression results are presented in Table 1.
The significantly positive coefficients on the interaction between judge and defendant majority gender provide evidence that there is in-group gender bias from judges towards defendants. This finding is robust to various specifications. The significant results suggest that, all else equal, defendants are between 3.6 and 4.0 percentage points more likely to win if they have the same majority gender as the judges. As robustness checks, in Appendix E, we show that the results hold when only cases after 2010 are included, when robust standard errors are used (rather than standard errors clustered at the judge level), and when a probit model is used rather than OLS. The results do not provide evidence of in-group bias towards plaintiffs. However, the coefficients for plaintiff and defendant in-group bias are not significantly different (p=0.216 for column 5), so we cannot claim that bias is stronger towards defendants. Descriptively, we also see that female judges are in general more likely to rule in favor of defendants, and male plaintiffs are in general more likely to lose. Figures 2 and 3 visualize these results. Figure 2 displays defendant win proportions by judge and defendant majority gender. With win proportions higher for female-majority defendants among female judge panels and higher for male-majority defendants among male judge panels, the figure is suggestive of in-group bias and qualitatively consistent with the findings from Table 1, even though it does not control for the court-year level of randomization. Figure 3 displays defendant win proportions by judge and plaintiff majority gender. It is also consistent with Table 1, as it is not suggestive of in-group bias. Figure 4 visualizes the in-group bias trend for defendants. Based on a series of regressions, one for each individual judge, it plots the distribution of individual judge bias coefficients.
Although there are some more extreme judges, in the direction of both in- and out-group bias, the results seem to be largely driven by mildly in-group biased judges.14

14 Figure F1 in Appendix F visualizes the results in another way. It helps highlight the fact that some judges exhibit more extreme gender in-group bias. Table G1 in Appendix G explores the effects of putting biased judges on panels. It analyzes in-group bias among the 14 judges with significant coefficients for gender in-group bias towards defendants in individual judge regressions. It shows that when these judges make decisions individually, the defendant is 40 percentage points more likely to win if they share the judge's gender. In contrast, when these judges rule on panels with other judges, the estimated advantage is much smaller. It is important to note that these results are not causal. Nonetheless, they suggest one potential means for policymakers to reduce bias: put biased judges on panels. There were insufficient observations to conduct this analysis for ethnicity; few of the ethnicity in-group biased judges had ruled on panels.

Table 1: Gender main results

                                    (1)         (2)         (3)         (4)         (5)
                                 Def. win    Def. win    Def. win    Def. win    Def. win
Judge maj. female               -0.0380***  -0.0397**   -0.0468***  -0.0410***  -0.0411***
                                (0.0122)    (0.0197)    (0.0133)    (0.0132)    (0.0133)
Pla. maj. female                            -0.0296**   -0.0528***  -0.0426***  -0.0433***
                                            (0.0128)    (0.0109)    (0.0108)    (0.0108)
Def. maj. female                -0.00495                 0.00144     0.00706     0.00667
                                (0.0106)                (0.0108)    (0.0107)    (0.0107)
Judge maj. fem. X pla. maj. fem.             0.0173      0.00690     0.00655     0.00726
                                            (0.0208)    (0.0171)    (0.0169)    (0.0168)
Judge maj. fem. X def. maj. fem. 0.0359**                0.0400**    0.0379**    0.0380**
                                (0.0167)                (0.0168)    (0.0168)    (0.0167)
DV mean                          0.452       0.429       0.454       0.454       0.454
Court-year FE                    Yes         Yes         Yes         Yes         Yes
Ethnicity dummies                No          No          No          No          Yes
Other controls                   No          No          No          Yes         Yes
Observations                     22801       25618       20394       20394       20394

The regressions test whether defendants (plaintiffs) are more likely to win (lose) if they have the same (a different) majority gender as judges. The coefficients of interest are on the interaction terms. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equations 3 and 4. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = plaintiff, def. = defendant, maj. = majority.

Figure 2: Defendant win proportion by judge and defendant majority gender. (def. = defendant, maj. = majority.)

Figure 3: Defendant win proportion by judge and plaintiff majority gender. (pla. = plaintiff, maj. = majority.)

Figure 4: Distribution of coefficients estimating individual judges' in-group gender bias towards defendants. Coefficients are based on a regression with court-year fixed effects. Judges without sufficient variation in outcomes were dropped. In total, 187 judges are included.

5.2 Gender Slant Analysis Results

To investigate whether proxies for judges' implicit attitudes or stereotypes captured in their textual slant drive the bias in decision-making we see, we present the results of the slant analysis in Tables 2 and 3. Using the career vs.
family measure of slant, Table 2 provides evidence of a correlation between biased writing and negative outcomes for women. It suggests that, for a 0.1 increase in slant against women (equivalent to about one standard deviation of the career vs. family measure), female defendants are about 1.5 percentage points less likely to win. The results hold across various specifications. The results with the good vs. bad measure in Table 3 are similar, though larger in magnitude. They show that, for a 0.05 increase in slant against women (equivalent to about one standard deviation of the good vs. bad measure), female defendants are about 1.6 to 1.8 percentage points less likely to win. Figure 5 presents the predicted win proportions for male and female defendants at various levels of judge slant, for the career vs. family measure. These predictions are based on Table 2, column (3). The figure shows that male defendants are more likely to win, and female defendants are less likely to win, if judges are more slanted against women in their writing. Figure 6 presents the predicted win proportions for male and female defendants at various levels of judge slant, for the good vs. bad measure. These predictions are based on Table 3, column (3). In this case, the figure shows that male defendants are essentially unaffected by a judge's slant. However, female defendants are still less likely to win if judges are more slanted against women in their writing.15 One other noteworthy finding related to slant is that, for the good vs. bad measure, female judges are significantly more slanted on average (p<0.01), with an average slant of 0.069 compared to 0.056 for men. For the family vs. career measure, male judges are significantly more slanted on average (p<0.01), with an average slant of -0.025 compared to -0.036 for women.

15 In Appendix H, we present several other results related to textual gender slant.
Tables H1 and H2 analyze the relationship between judge slant and appeals, and Figures H1 and H2 visualize the relationship. For the family vs. career measure of slant, the relationship is null or even negative for some specifications. For the good vs. bad measure, the relationship is positive but weak and not significant in most specifications. These mostly null results are contrasted with the findings for the relationship between slant and reversals, presented in Tables H3 and H4 and Figures H3 and H4. Although the results for the family vs. career measure of slant are again mixed, the results for the good vs. bad measure are more consistently positive. These findings suggest that 1) judge slant (according to the good vs. bad measure) may be associated with lower quality judgements prone to reversals and 2) since the appeals results are null, litigants and attorneys may not be able to recognize gender bias in decisions and/or are not aware that they are more likely to have decisions reversed if they appeal.

Table 2: Gender results with text slant, career vs family measure

                                      (1)         (2)         (3)         (4)
                                   Def. win    Def. win    Def. win    Def. win
Judge maj. female                 -0.0481***  -0.0479***  -0.0477***  -0.0435***
                                  (0.0138)    (0.0138)    (0.0138)    (0.0138)
Pla. maj. female                  -0.0505***  -0.0486***  -0.0500***  -0.0372***
                                  (0.0117)    (0.0118)    (0.0117)    (0.0116)
Def. maj. female                  -0.00350    -0.00635    -0.00619    -0.00228
                                  (0.0117)    (0.0117)    (0.0117)    (0.0116)
Judge maj. fem. X pla. maj. fem.   0.00171     0.00211     0.00163     0.000857
                                  (0.0180)    (0.0182)    (0.0181)    (0.0177)
Judge maj. fem. X def. maj. fem.   0.0460***   0.0440***   0.0440**    0.0430**
                                  (0.0174)    (0.0169)    (0.0169)    (0.0168)
Slant against women,               0.0110      0.0316      0.0457     -0.0358
  career vs family                (0.0823)    (0.0815)    (0.0828)    (0.0840)
Pla. maj. fem. X Slant                        -0.0645     -0.0537     -0.0160
  against women                               (0.0852)    (0.0866)    (0.0884)
Def. maj. fem. X Slant                        -0.156*     -0.152*     -0.146*
  against women                               (0.0856)    (0.0861)    (0.0828)
DV mean
Court-year FE                      Yes         Yes         Yes         Yes
Ethnicity dummies                  No          No          No          Yes
Other controls                     No          No          No          Yes
Observations                       18205       18205       18205       18205

The regressions test whether defendants/plaintiffs are more likely to lose if they are female and the judge is slanted against females in their writing. The coefficients of interest are on the interaction terms in the last two rows. The measure of slant against women is based on the judges' stereotypical association of women with family-based qualities rather than career-based qualities. All columns are based on a linear regression model. For specification details, see equation 3. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = plaintiff, def. = defendant, maj. = majority.

Table 3: Gender results with text slant, good vs bad measure

                                      (1)         (2)         (3)         (4)
                                   Def. win    Def. win    Def. win    Def. win
Judge maj. female                 -0.0434***  -0.0436***  -0.0445***  -0.0392**
                                  (0.0156)    (0.0154)    (0.0154)    (0.0158)
Pla. maj. female                  -0.0322*    -0.0493***  -0.0330*    -0.0223
                                  (0.0167)    (0.0128)    (0.0168)    (0.0157)
Def. maj. female                  -0.00536     0.0168      0.0160      0.0162
                                  (0.0117)    (0.0146)    (0.0146)    (0.0142)
Judge maj. fem. X pla. maj. fem.   0.00500     0.00159     0.00479     0.00289
                                  (0.0203)    (0.0206)    (0.0204)    (0.0201)
Judge maj. fem. X def. maj. fem.   0.0464**    0.0505***   0.0504***   0.0502***
                                  (0.0185)    (0.0182)    (0.0182)    (0.0181)
Slant against women,              -0.0882     -0.0765     -0.00212    -0.0307
  good vs bad                     (0.151)     (0.146)     (0.152)     (0.155)
Pla. maj. fem. X Slant                        -0.298*     -0.277      -0.244
  against women                               (0.179)     (0.181)     (0.176)
Def. maj. fem. X Slant                        -0.375**    -0.358**    -0.321*
  against women                               (0.173)     (0.175)     (0.172)
DV mean
Court-year FE                      Yes         Yes         Yes         Yes
Ethnicity dummies                  No          No          No          Yes
Other controls                     No          No          No          Yes
Observations                       15175       15175       15175       15175

The regressions test whether defendants/plaintiffs are more likely to lose if they are female and the judge is slanted against females in their writing. The coefficients of interest are on the interaction terms in the last two rows. The measure of slant against women is based on the judges' association of women with negative qualities. All columns are based on a linear regression model. For specification details, see equation 3. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = plaintiff, def. = defendant, maj. = majority.

Figure 5: Predicted defendant win proportions at various levels of judge slant (career vs family measure), by defendant gender. Based on Table 2, column (3).

Figure 6: Predicted defendant win proportions at various levels of judge slant (good vs bad measure), by defendant gender. Based on Table 3, column (3).

5.3 Main Ethnicity Results

The main ethnicity regression results are presented in Table 4. They show that defendants are between 4.4 and 5.8 percentage points more likely to win if they share an ethnicity with the judge. This is evidence of in-group bias among judges towards defendants. The finding is robust to all of the specifications presented. As with gender, in-group bias is not observed for plaintiffs; the coefficients are both positive and negative across specifications and are not significant.
Unlike with gender, the coefficients for plaintiff and defendant in-group bias are significantly different (p=0.022 for column 5). As robustness checks, in Appendix E, we show that the results hold when only cases after 2010 are included, when robust standard errors are used rather than standard errors clustered at the judge level, and when a probit model is used rather than OLS. Figure 7 displays defendant win proportions across various in-group categories relating judges, defendants, and plaintiffs. Figure 7a displays outcomes when the judge and plaintiff have the same ethnicity. Figure 7b displays outcomes when the judge and plaintiff have different ethnicities. Consistent with the regression, they both show that defendants are qualitatively more likely to win when judges and defendants are the same ethnicity, even without the inclusion of court-year fixed effects. Figure 8 visualizes the in-group bias trend for defendants. Based on a series of regressions, one for each individual judge, it plots the distribution of individual judge bias coefficients. Similar to the findings for gender, the results seem to be largely driven by a preponderance of mildly in-group biased judges, despite the presence of some more extreme judges, in the direction of both in- and out-group bias.16

Table 4: Ethnicity results

                      (1)         (2)         (3)         (4)         (5)
                   Def. win    Def. win    Def. win    Def. win    Def. win
Judge-pla. same     0.00681                -0.0138     -0.00549    -0.00578
                   (0.0127)                (0.0139)    (0.0153)    (0.0154)
Judge-def. same                 0.0439***   0.0573***   0.0532***   0.0551***
                               (0.0122)    (0.0143)    (0.0159)    (0.0158)
DV mean             0.450       0.453       0.453       0.453       0.453
Court-year FE       Yes         Yes         Yes         Yes         Yes
Ethnicity dummies   No          No          No          Yes         Yes
Other controls      No          No          No          No          Yes
Observations        21842       20971       18952       18952       18952

The regressions test whether defendants (plaintiffs) are more likely to win (lose) if they have the same (a different) plurality ethnicity as judges. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 5. Judge-pla. same and Judge-def. same refer to similarity in plurality ethnicity. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for both defendants and plaintiffs. Other controls include case type dummies; a dummy for an appeal case; variables for the numbers of defendants, plaintiffs, and judges; and dummies for defendant, plaintiff, and judge majority gender. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = plaintiff, def. = defendant.

16 Figure F2 in Appendix F visualizes the results in another way. It helps highlight the fact that some judges exhibit more extreme in-group bias. Appendix I presents the results from a regression with interactions between gender and ethnic in-group status. The results are null.

Figure 7: Defendant win proportion by similarities/differences in plurality ethnicity across judges, plaintiffs, and defendants. (Panels (a) and (b); def. = defendant, pla. = plaintiff.)

Figure 8: Distribution of coefficients estimating individual judges' in-group ethnic bias towards defendants. Coefficients are based on a regression with court-year fixed effects. Judges without sufficient variation in outcomes were dropped. In total, 185 judges are included.

5.4 Judgement Text Results

To investigate the mechanism through which bias may be rendered, Tables 5-11 present the results from equation 6. Tables 5-7 present the results for gender in-group bias. They show that there is a significant negative correlation between the number of words in a judgement and in-group gender status for judges and defendants, but only when the defendant wins.
This suggests that, when there is potential for gender in-group bias (i.e., when the judge and defendant are the same gender and the defendant wins), the judge tends to writer shorter judgements. It is possible that, when judges make biased judgements, they are less able to justify their decision based on solid legal grounds, and therefore write shorter judgements. The magnitude of the effect is relatively small. Column (4) of table 6 suggests that judges write about 141 fewer words when the defendant wins and they are the same gender as the judge. But as table B1 in appendix B shows, the mean and standard deviation for number of words in a judgement are 1451 and 1337, respectively. Tables 8 - 10 present the results for ethnicity. They also provide evidence for biased decisions being associated with shorter written judgements. In addition, the tables show that, when the judge and defendant are the same ethnicity, the judgement is likely to be cited fewer times. Consistent with an in-group bias interpretation, the effect is strongest in the sample where the defendant wins, significant but weaker in the full sample, and null in the sample where the defendant loses. Though we cannot be certain what is driving this relationship, it may indicate that judges are less likely to cite cases with biased decisions. The effects on citations are substantial. The mean number of times cited is about 0.23, and column (2) of Table 9 suggests that judgements are cited about 0.14 fewer times when the defendant wins and they are the same ethnicity as the judge. Table 11 shows that most of these findings (with the exception of the relationship between in-group ethnic bias and the number of words) are robust to the inclusion of additional controls. 20 Table 5: Judgement text regressions, gender, full sample (1) (2) (3) (4) Num. citations Times cited Num laws cited Words in judg. Judge maj. female 0.402⇤⇤ -0.0948⇤⇤ 0.293 82.45 (0.195) (0.0413) (0.183) (85.79) Def. maj. 
female -0.108⇤ 0.0389 -0.130 -28.85 (0.0650) (0.0613) (0.0844) (22.16) Judge maj. fem. X def. maj. fem. -0.0790 -0.0437 0.228 -25.34 (0.123) (0.0674) (0.151) (46.18) DV mean 2.020 0.250 2.200 1484.0 Court-year FE Yes Yes Yes Yes Observations 22801 22801 22801 22801 The regressions test whether in-group bias is associated with significantly different aspects of judges’ written judgements. If in-group bias is associated with different characteristics for judgement texts, then we should see significant coefficients for the defendant-win sample but not the defendant-lose sample, the coefficients in the defendant-win sample should be larger than in the full sample. This table presents the full sample results. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 6. Num. citations refers to the number of citations in the judgement. Times cited refers to the number of times the case has been cited. Num. laws cited refers to the number of laws and and acts cited in the judgement. Words in judg. refers to the number of words in the written judgement. Def. = defendant, maj. = majority, fem. = female. Table 6: Judgement text regressions, gender, defendant win (1) (2) (3) (4) Num. citations Times cited Num laws cited Words in judg. Judge maj. female 0.326 -0.119⇤⇤⇤ 0.126 85.48 (0.205) (0.0459) (0.223) (86.29) Def. maj. female -0.0684 -0.0653 -0.218⇤ 56.57 (0.0855) (0.0843) (0.129) (35.86) Judge maj. fem. X def. maj. fem. -0.176 0.0805 0.186 -140.1⇤⇤ (0.175) (0.0975) (0.203) (67.75) DV mean 2.076 0.257 2.303 1517.9 Court-year FE Yes Yes Yes Yes Observations 10235 10235 10235 10235 The regressions test whether in-group bias is associated with significantly different aspects of judges’ written judgements. 
If in-group bias is associated with different characteristics for judgement texts, then we should see significant coefficients for the defendant-win sample but not the defendant-lose sample, the coefficients in the defendant-win sample should be larger than in the full sample. This table presents the defendant-win sample results. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 6. Num. citations refers to the number of citations in the judgement. Times cited refers to the number of times the case has been cited. Num. laws cited refers to the number of laws and and acts cited in the judgement. Words in judg. refers to the number of words in the written judgement. Def. = defendant, maj. = majority, fem. = female. 21 Table 7: Judgement text regressions, gender, defendant lose (1) (2) (3) (4) Num. citations Times cited Num laws cited Words in judg. Judge maj. female 0.462⇤⇤ -0.0939 0.415⇤⇤ 70.47 (0.216) (0.0630) (0.196) (96.78) Def. maj. female -0.140 0.0993 -0.0529 -106.1⇤⇤⇤ (0.103) (0.0768) (0.0981) (35.96) Judge maj. fem. X def. maj. fem. -0.0416 -0.110 0.221 73.42 (0.160) (0.0818) (0.183) (58.10) DV mean 1.963 0.239 2.100 1449.4 Court-year FE Yes Yes Yes Yes Observations 12422 12422 12422 12422 The regressions test whether in-group bias is associated with significantly different aspects of judges’ written judgements. If in-group bias is associated with different characteristics for judgement texts, then we should see significant coefficients for the defendant-win sample but not the defendant-lose sample, the coefficients in the defendant-win sample should be larger than in the full sample. This table presents the defendant-lost sample results. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 6. Num. 
citations refers to the number of citations in the judgement. Times cited refers to the number of times the case has been cited. Num. laws cited refers to the number of laws and acts cited in the judgement. Words in judg. refers to the number of words in the written judgement. Def. = defendant, maj. = majority, fem. = female.

Table 8: Judgement text regressions, ethnicity, full sample

                                  (1)             (2)          (3)              (4)
                                  Num. citations  Times cited  Num. laws cited  Words in judg.
Judge-defendant same ethnicity=1  -0.134          -0.0828***   -0.146           -67.04
                                  (0.110)         (0.0310)     (0.144)          (53.06)
DV mean                           2.012           0.217        2.199            1480.2
Court-year FE                     Yes             Yes          Yes              Yes
Observations                      20971           20971        20971            20971

The regressions test whether in-group bias is associated with significantly different aspects of judges’ written judgements. If in-group bias is associated with different characteristics of judgement texts, then we should see significant coefficients for the defendant-win sample but not the defendant-lose sample, and the coefficients in the defendant-win sample should be larger than in the full sample. This table presents the full sample results. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 6. Num. citations refers to the number of citations in the judgement. Times cited refers to the number of times the case has been cited. Num. laws cited refers to the number of laws and acts cited in the judgement. Words in judg. refers to the number of words in the written judgement.

Table 9: Judgement text regressions, ethnicity, defendant win

                                  (1)             (2)          (3)              (4)
                                  Num. citations  Times cited  Num. laws cited  Words in judg.
Judge-defendant same ethnicity=1  -0.147          -0.137**     -0.229           -97.65*
                                  (0.137)         (0.0583)     (0.177)          (57.53)
DV mean                           2.054           0.220        2.253            1505.6
Court-year FE                     Yes             Yes          Yes              Yes
Observations                      9433            9433         9433             9433

The regressions test whether in-group bias is associated with significantly different aspects of judges’ written judgements. If in-group bias is associated with different characteristics of judgement texts, then we should see significant coefficients for the defendant-win sample but not the defendant-lose sample, and the coefficients in the defendant-win sample should be larger than in the full sample. This table presents the defendant-win sample results. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 6. Num. citations refers to the number of citations in the judgement. Times cited refers to the number of times the case has been cited. Num. laws cited refers to the number of laws and acts cited in the judgement. Words in judg. refers to the number of words in the written judgement.

Table 10: Judgement text regressions, ethnicity, defendant lose

                                  (1)             (2)          (3)              (4)
                                  Num. citations  Times cited  Num. laws cited  Words in judg.
Judge-defendant same ethnicity=1  -0.0923         -0.0321      -0.130           -46.70
                                  (0.147)         (0.0361)     (0.181)          (61.11)
DV mean                           1.979           0.211        2.157            1458.6
Court-year FE                     Yes             Yes          Yes              Yes
Observations                      11395           11395        11395            11395

The regressions test whether in-group bias is associated with significantly different aspects of judges’ written judgements. If in-group bias is associated with different characteristics of judgement texts, then we should see significant coefficients for the defendant-win sample but not the defendant-lose sample, and the coefficients in the defendant-win sample should be larger than in the full sample. This table presents the defendant-lose sample results. Standard errors, in parentheses, are clustered at the judge level.
All columns are based on a linear regression model. For specification details, see equation 6. Num. citations refers to the number of citations in the judgement. Times cited refers to the number of times the case has been cited. Num. laws cited refers to the number of laws and acts cited in the judgement. Words in judg. refers to the number of words in the written judgement.

Table 11: Judgement text regressions, additional controls, defendant win

                                  (1)          (2)          (3)             (4)
                                  Times cited  Times cited  Words in judg.  Words in judg.
Judge maj. female                 -0.0996*     -0.0748      89.96           100.3
                                  (0.0539)     (0.0524)     (93.08)         (91.29)
Def. maj. female                  -0.0209      0.0236       58.03           87.37*
                                  (0.0849)     (0.0960)     (46.72)         (49.78)
Judge maj. fem. X def. maj. fem.  0.0407       0.0250       -139.0*         -152.7**
                                  (0.0998)     (0.105)      (73.73)         (73.13)
Judge-defendant same ethnicity=1  -0.137**     -0.127*      -97.20          -70.46
                                  (0.0612)     (0.0687)     (59.67)         (63.01)
DV mean                           0.220        0.220        1505.1          1505.1
Court-year FE                     Yes          Yes          Yes             Yes
Other controls                    No           Yes          No              Yes
Observations                      7963         7963         7963            7963

The regressions test whether in-group bias is associated with significantly different aspects of judges’ written judgements. Standard errors, in parentheses, are clustered at the judge level. Num. citations refers to the number of citations in the judgement. Times cited refers to the number of times the case has been cited. Num. laws cited refers to the number of laws and acts cited in the judgement. Words in judg. refers to the number of words in the written judgement. Other controls include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges; case type dummies; a dummy for an appeal case; and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Def. = defendant, maj. = majority, fem. = female.
6 Conclusion

In this paper, we examine the extent and determinants of judicial bias in Kenya, with a focus on gender and ethnic in-group bias. Our data cover Kenyan Superior Court cases spanning 1976-2020 (with a focus on civil, land and environment, and succession cases), and our identification strategy relies on the random assignment of judges to cases. Our analysis also looks at the relationship between bias in judges' decisions and measures of slant against women in judges' written decisions, which we derive through machine learning. Our main finding is that judges in Kenya display both gender and ethnic in-group bias towards defendants. Our results suggest that defendants are about 4 percentage points more likely to win if they share the judge's gender and about 5 percentage points more likely to win if they share the judge's ethnicity. We also find evidence that slant against women in written judgements is associated with lower win rates for female defendants. The results show that a one standard deviation change in the measure of gender slant is associated with about a 2 percentage point decrease in win probability for female defendants. Finally, we show that potentially biased judgements are associated with shorter written judgements (for gender and ethnic bias) that are less likely to be cited (for ethnic bias), which suggests that biased decisions are linked to poorer-quality written judgements.

These findings have important implications for the Kenyan context. Women and certain ethnic groups are underrepresented in the judiciary. As such, they are more likely to be negatively affected by in-group bias. In concrete terms, since the main cases in the dataset are civil cases, environment and land cases, and succession cases, in-group bias might imply a financial disadvantage, a greater likelihood of losing disputes over land ownership, or being cut out of family inheritance or property. Several approaches could be taken to reduce bias.
First, greater efforts could be made to achieve equal representation of female judges and representation of ethnic groups relative to their proportion of the total population. Second, implicit bias trainings, which have proven effective in some settings (Jackson, Hillard, and Schneider 2014), could be implemented for judges. Third, judges could simply be provided with data on the extent of their bias in decision-making. Some research has shown that the provision of information on biases can lead to more action in favor of out-groups (Hillard, Ryan, and Gervais 2013). Importantly, the application of these approaches to the Kenyan context should be rigorously tested.

The findings also make important contributions to the literature on judicial bias. They expand the study of in-group judicial bias beyond the most heavily studied contexts and provide further evidence that such bias may be prevalent across many settings. The results are also the first to show that judges may exhibit greater bias towards defendants than towards plaintiffs, a phenomenon consistent with social identity theory. Furthermore, they contribute to the literature on ethnic bias in Kenya and sub-Saharan Africa more broadly, showing that ethnic preferences influence decision-making in courts. They also build on the literatures on gender discrimination and the importance of female representation in the public sector. Finally, the paper presents a novel application of machine learning techniques to help understand the determinants of bias. Future research should focus on further unveiling the determinants and scope of bias in the judiciary, as well as on how to reduce its presence.

References

Abrams, David, Marianne Bertrand, and Sendhil Mullainathan (2012). “Do Judges Vary in Their Treatment of Race?” In: The Journal of Legal Studies 41.2, pp. 347–383.
Akech, Migai (2010). Institutional Reform in the New Constitution of Kenya.
International Center for Transitional Justice.
— (2011). “Abuse of Power and Corruption in Kenya: Will the New Constitution Enhance Government Accountability?” In: Ind. J. of Global Legal Studies 341, pp. 377–378.
Alesina, Alberto and Eliana La Ferrara (2014). “A Test of Racial Bias in Capital Sentencing”. In: American Economic Review 104.11, pp. 3397–3433.
Antoniak, Maria and David Mimno (2018). “Evaluating the stability of embedding-based word similarities”. In: Transactions of the Association for Computational Linguistics 6, pp. 107–119.
Arnold, David, Will Dobbie, and Crystal Yang (2018). “Racial Bias in Bail Decisions”. In: The Quarterly Journal of Economics 133.4, pp. 1885–1932.
Ash, Elliott, Sam Asher, et al. (2021). “Measuring Gender and Religious Bias in the Indian Judiciary”. In: Center for Law and Economics Working Paper Series 3.
Ash, Elliott, Daniel Chen, and Arianna Ornaghi (2021). “Gender Attitudes in the Judiciary: Evidence from U.S. Circuit Courts”. In: Working Paper.
Asingo, Patrick et al. (2018). Ethnicity and Politicization in Kenya. Kenya Human Rights Commission.
Barkan, Joel and Michael Chege (1989). “Decentralising the state: district focus and the politics of reallocation in Kenya”. In: The Journal of Modern African Studies 27.3, pp. 431–453.
Beaman, Lori et al. (2009). “Powerful Women: Does Exposure Reduce Bias?” In: The Quarterly Journal of Economics 124.4, pp. 1497–1540.
Berge, Lars et al. (2015). “How strong are ethnic preferences?” In: Working paper.
Bertrand, Marianne, Dolly Chugh, and Sendhil Mullainathan (2005). “Implicit discrimination.” In: American Economic Review 95.2, pp. 94–98.
Bordalo, Pedro et al. (2016). “Stereotypes”. In: The Quarterly Journal of Economics 131.4, pp. 1753–1794.
Boyd, Christina and James Spriggs (2009). “An examination of strategic anticipation of appellate court preferences by federal district court judges”. In: Wash. U. J. L. and Pol’y 37.
Choi, Danny, Andy Harris, and Fiona Shen-Bayh (2021).
“Ethnic Bias in Criminal Sentencing: Evidence from Kenya”. In: Forthcoming at the American Political Science Review.
Depew, Briggs, Ozkan Eren, and Naci Mocan (2017). “Judges, juveniles, and in-group bias”. In: The Journal of Law and Economics 60.2, pp. 209–239.
Dietz-Uhler, Beth and Audrey Murrell (1998). “Effects of social identity and threat on self-esteem and group attributions”. In: Group Dynamics: Theory, Research, and Practice 2.1.
Dunn, Matt et al. (2017). “Early predictability of asylum court decisions”. In: Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law, pp. 233–236.
Friedrich-Ebert-Stiftung (2012). Regional Disparities and Marginalisation in Kenya.
Gainer, Maya (2015). “Transforming the Courts: Judicial Sector Reforms in Kenya”. In: Princeton University Innovations for Successful Societies 1.7.
— (2016). “How Kenya Cleaned Up Its Courts”. In: Foreign Policy.
Gazal-Ayal, Oren and Raanan Sulitzeanu-Kenan (2010). “Let My People Go: Ethnic In-Group Bias in Judicial Decisions–Evidence from a Randomized Natural Experiment”. In: Journal of Empirical Legal Studies 7.3, pp. 403–428.
Glynn, Adam and Maya Sen (2015). “Identifying judicial empathy: does having daughters cause judges to rule for women’s issues?” In: American Journal of Political Science 59.1, pp. 37–54.
Grossman, Guy et al. (2016). “Descriptive Representation and Judicial Outcomes in Multiethnic Societies”. In: American Journal of Political Science 60.1, pp. 44–69.
Harris, Andrew (2014). Replication data for: What’s in a name? A Method for Extracting Information about Ethnicity from Names. url: https://doi.org/10.7910/DVN/27691.
Hessami, Zohal and Mariana Lopes da Fonseca (2020). “Female political representation and substantive effects on policies: A literature review”. In: European Journal of Political Economy 63.
Hillard, Amy, Carey Ryan, and Sarah J. Gervais (2013).
“Reactions to the implicit association test as an educational tool: A mixed methods study”. In: Social Psychology of Education 16, pp. 495–516.
Hochreiter, Sepp and Jurgen Schmidhuber (1997). “Long short-term memory”. In: Neural computation 9.8, pp. 1735–1780.
IDLO (2020). Women’s Professional Participation in Kenya’s Justice Sector: Barriers and Pathways. International Development Law Organization.
Jackson, Sarah, Amy Hillard, and Tamera Schneider (2014). “Reactions to the implicit association test as an educational tool: A mixed methods study”. In: Social Psychology of Education 17, pp. 419–438.
Kastellec, Jonathan (2013). “Racial diversity and judicial influence on appellate courts”. In: American Journal of Political Science 56.1, pp. 167–183.
Kenyan Judiciary (2021). Courts: Overview. url: https://www.judiciary.go.ke/courts/.
KNBS (2019). 2019 Kenya Population and Housing Census Volume IV: Distribution of Population by Socio-Economic Characteristics.
Knepper, Matthew (2018). “When the shadow is the substance: Judge gender and the outcomes of workplace sex discrimination cases”. In: Journal of Labor Economics 36.3, pp. 623–664.
Kozlowski, Austin, Matt Taddy, and James Evans (2019). “The geometry of culture: Analyzing the meanings of class through word embeddings”. In: American Sociological Review 84.5, pp. 905–949.
Landes, William, Lawrence Lessig, and Michael Solimine (1998). “Judicial influence: A citation analysis of federal courts of appeals judges”. In: The Journal of Legal Studies 27.2, pp. 271–332.
Lim, Claire, Bernardo Silveira, and James Snyder (2016). “Do judges’ characteristics matter? Ethnicity, gender, and partisanship in Texas state trial courts”. In: American Law and Economics Review 18.2, pp. 302–357.
Miguel, Edward and Mary Kay Gugerty (2005). “Ethnic diversity, social sanctions, and public goods in Kenya”. In: Journal of Public Economics 89.11-12, pp. 2325–2368.
Mikolov, Tomas et al. (2013).
“Distributed Representations of Words and Phrases and their Compositionality”. In: Advances in neural information processing systems 26.
Mustard, David (2001). “Racial, Ethnic, and Gender Disparities in Sentencing: Evidence from the U.S. Federal Court”. In: The Journal of Law and Economics 44.1, pp. 285–314.
Mutunga, Willy (2011). Progress Report On The Transformation Of The Judiciary. url: http://kenyalaw.org/kenyalawblog/progress-report-on-the-transformation-of-the-judiciary/.
Pennington, Jeffrey, Richard Socher, and Christopher Manning (2014). “Glove: Global vectors for word representation”. In: Proceedings of the 2014 conference on empirical methods in natural language processing.
Ponticelli, Jacopo and Leonardo Alencar (2016). “Court enforcement, bank loans, and firm investment: Evidence from a bankruptcy reform in Brazil”. In: The Quarterly Journal of Economics 131.3, pp. 1365–1413.
Rodrik, Dani (2000). “Institutions for high-quality growth: what they are and how to acquire them”. In: Studies in comparative international development 35.3, pp. 3–31.
Schanzenbach, Max (2014). “Racial and Gender Disparities in Prison Sentences: The Effect of District-Level Judicial Demographics”. In: American Law and Economics Association Annual Meetings.
Shayo, Moses and Asaf Zussman (2011). “Judicial ingroup bias in the shadow of terrorism”. In: The Quarterly Journal of Economics 126.3, pp. 1447–1484.
Sloan, CarlyWill (2020). “Racial bias by prosecutors: Evidence from random assignment”. In: Working paper.
Spirling, Arthur and Pedro Rodriguez (2019). “Word embeddings: What works, what doesn’t, and how to tell the difference for applied research”. In: Journal of Politics.
Sunstein, Cass et al. (2007). Are judges political? An empirical analysis of the federal judiciary. Brookings Institution Press.
UNDP (2020). Gender Inequality Index. url: http://hdr.undp.org/en/content/gender-inequality-index-gii.
Visaria, Sujata (2009).
“Legal reform and loan repayment: The microeconomic impact of debt recovery tribunals in India”. In: American Economic Journal: Applied Economics 1.3, pp. 59–81.
Voci, Alberto (2010). “The link between identification and in-group favouritism: Effects of threat to social identity and trust-related emotions”. In: British Journal of Social Psychology 45.2, pp. 265–284.
Wann, Daniel and Frederick Grieve (2005). “Biased Evaluations of In-Group and Out-Group Spectator Behavior at Sporting Events: The Importance of Team Identification and Threats to Social Identity”. In: The journal of social psychology 145.5, pp. 531–546.
World Bank (2017). World Development Report: Governance and the Law. The World Bank Group.
World Bank (2021). Ease of Doing Business in Kenya. url: https://www.doingbusiness.org/en/data/exploreeconomies/kenya#DB_ec.
Yang, Crystal (2015). “Free at Last? Judicial Discretion and Racial Disparities in Federal Sentencing”. In: The Journal of Legal Studies 44.1, pp. 75–111.

Appendix

Appendix A: Variable Construction

Constructing variables with judge, defendant, and plaintiff information

The names of judges, defendants, and plaintiffs were used to remove non-humans and to extract additional information for each case, including gender, ethnicity, and the number of judges and litigants. Cases were identified as non-human and removed if either the plaintiff or defendant name included any of a long list of key words, such as “republic,” “company,” or “medical.” A full list of the keywords can be found in the cleaning scripts in the replication materials posted online. Afterwards, we could determine the gender of each individual using their first name and the ethnicity of each individual using their last name.
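As a rough illustration, the keyword screen for non-human litigants described above can be sketched as follows. The keyword set contains only the three example keywords given in the text (the full list is in the replication scripts), and the function names are our own, not those of the replication code.

```python
# Sketch of the non-human litigant filter described above. The keyword set
# here holds only the three example keywords from the text; the actual list
# used in the paper is much longer.
NON_HUMAN_KEYWORDS = {"republic", "company", "medical"}

def is_non_human(party_name: str) -> bool:
    """Flag a party as non-human if any keyword appears in its name."""
    tokens = party_name.lower().split()
    return any(tok in NON_HUMAN_KEYWORDS for tok in tokens)

def drop_non_human_cases(cases):
    """Keep only cases where both plaintiff and defendant appear human."""
    return [c for c in cases
            if not is_non_human(c["plaintiff"])
            and not is_non_human(c["defendant"])]
```

A case is dropped as soon as either party name matches, mirroring the "either the plaintiff or defendant" rule above.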
To assign gender based on first names, we used the genderize.io API and Gender API, both of which use global databases of names and genders to probabilistically assign gender to names.17 One exception was the judges, for whom gender was assigned manually. To assign ethnicity based on last names, we used data available on Harvard Dataverse that links names to ethnicities (Harris 2014). These data could be used to identify 12 ethnic groups (Meru, Kisii, Kalenjin, Kamba, Luo, Turkana, Mijikenda, Luhya, Kikuyu, Somali, Masai, and Pokot). This includes one ethnic sub-group, the Pokot, which is a sub-group of the Kalenjin. Throughout our analysis, Kalenjin refers to non-Pokot Kalenjin. Together, these groups account for about 91 percent of the population of Kenya. Of the other 29 major ethnic groups (i.e., non-subgroups) identified in the 2019 census, the largest accounts for only about 0.9 percent of the population. Gender and ethnicity could not be determined for all individuals in all cases. Gender could not be determined if the first name was abbreviated (i.e., if only initials were given), did not clearly match a single gender, or was not included in the API datasets. Ethnicity could not be determined if the last name was not included in the ethnicity dataset. For some of the cases included in the analysis, information could be extracted for plaintiffs but not defendants (and vice versa), and for gender but not ethnicity (and vice versa). It is important to note that there is the possibility of a small amount of error resulting from the automated process of removing non-humans and determining gender and ethnicity. For example, although the list of key words for non-humans is long and we have manually scanned the data for non-humans, it is still possible that some non-humans remain. It is also possible that gender and/or ethnicity has been assigned to non-humans whose organization names contain certain key words.
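The probabilistic gender assignment can be sketched as follows. The lookup table is a hypothetical stand-in for responses from genderize.io or Gender API (which return a gender and a confidence score for a first name); the confidence threshold is ours for illustration, not a value stated in the paper.

```python
# Hypothetical stand-in for a genderize.io / Gender API style lookup:
# each first name maps to a (gender, probability) pair.
NAME_GENDERS = {
    "john": ("male", 0.99),
    "mary": ("female", 0.99),
    "j.": (None, 0.0),  # abbreviated first names cannot be classified
}

def assign_gender(first_name: str, min_prob: float = 0.8):
    """Return 'male'/'female' when confidently matched, else None (unknown)."""
    gender, prob = NAME_GENDERS.get(first_name.lower(), (None, 0.0))
    return gender if gender is not None and prob >= min_prob else None
```

Returning `None` mirrors the cases described above where gender could not be determined: abbreviated names, ambiguous names, and names absent from the databases.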
Similarly, if names were separated in an unusual way, it is possible that the number of defendants or plaintiffs was incorrectly counted, possibly resulting in an incorrect assignment of majority/plurality gender/ethnicity. However, having thoroughly scanned the data, we are confident that the number of such errors is negligible.

Using the Binary Classification Machine Learning Model to construct the defendant_win outcome variable

To determine the winner of each case, we created a binary classification machine learning model using the Global Vectors for Word Representation (GloVe) algorithm (Pennington, Socher, and Manning 2014). The objective function of GloVe can be written as follows:

J = \sum_{i,j} f(X_{ij}) \left( w_i^{\top} w_j - \log X_{ij} \right)^2   (7)

where X_{ij} denotes the co-occurrence count between words i and j, and f(·) is a weighting function that serves to down-weight particularly frequent words. The objective function J trains the word vectors to minimize the squared difference between the dot product of the vectors representing two words and their empirical co-occurrence in the corpus. The algorithm requires two hyperparameters: the dimensionality of the vectors and the window size for computing co-occurrence statistics. Prior research has found 300 to be the optimal size in many cases and that increasing dimensionality beyond 300 yields negligible improvements for downstream tasks (Pennington, Socher, and Manning 2014; Spirling and Rodriguez 2019).

17 See the following websites: https://genderize.io/; https://gender-api.com/.

Table A1: Model outcomes

Training set accuracy                          92.44%
Validation set accuracy                        91.92%
Test set accuracy (on previously unseen data)  92.83%
Accuracy                                       0.928388
Precision                                      0.896705
Recall                                         0.959647
F1 score                                       0.927109

Following that literature, we train 300-dimensional vectors.
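To make equation (7) concrete, the weighting function and the per-pair loss can be sketched as below. The `x_max = 100` and `alpha = 0.75` defaults are the values used in the original GloVe paper, not values stated in this appendix.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """f(X_ij): down-weight very frequent co-occurrences, capped at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_pair_loss(w_i, w_j, x_ij):
    """f(X_ij) * (w_i . w_j - log X_ij)^2 for a single word pair."""
    return glove_weight(x_ij) * (np.dot(w_i, w_j) - np.log(x_ij)) ** 2
```

Summing `glove_pair_loss` over all co-occurring word pairs gives the objective J in equation (7); the loss for a pair is zero exactly when the dot product of its vectors equals the log co-occurrence count.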
We used a standard 10-word window size, in between a shorter window size (which tends to capture syntactic/functional relations between words) and a longer window size (which tends to capture topical relations between words). To improve accuracy, the classification model also included a Long Short-Term Memory (LSTM) layer in addition to the fully connected neural network layers and the initial embedding layer (Hochreiter and Schmidhuber 1997). Applying this model to our data, we used the bottom 500 words of the case judgements, since the outcomes were found to be present towards the bottom of the judgements. As a training dataset, we applied the model to cases for which we could determine the outcome (in favor of or against the defendant) directly from the case outcome variable of the metadata. There were 49,706, 6,214, and 6,213 cases in the training, testing, and validation sets, respectively. The results of the model are presented in table A1.

Using word embeddings to determine textual slant

To determine each judge’s textual gender slant (i.e., the degree to which each judge exhibits gender bias in their written judgements), we make use of word embeddings, which model the text present in the judgements as vectors in a low-dimensional Euclidean space (Pennington, Socher, and Manning 2014). Word embeddings can accommodate large vocabularies and corpora without increasing dimensionality, and the resulting representation captures relations between the words. To capture semantic similarity amongst words, positions are assigned to word vectors in the Euclidean space such that words that appear frequently in the same context have representations close to each other, while words that rarely appear together have representations that are far apart. To train our word embeddings, we used the GloVe algorithm, described above.
The embeddings we trained were then used to identify cultural dimensions in language (Kozlowski, Taddy, and Evans 2019). That is, we identified a gender dimension by taking the difference between the average normalized vector across a set of male words and the average normalized vector across a set of female words, as such:

\vec{male} - \vec{female} = \sum_{n} \vec{maleword}_n / |N_{male}| - \sum_{n} \vec{femaleword}_n / |N_{female}|

where N_{male} is the number of words used to identify the male dimension (and likewise for N_{female}). To determine similarity with these dimensions, we used cosine similarity as a measure, defined as follows:

sim(\vec{x}, \vec{y}) = \cos(\theta) = \frac{\vec{x} \cdot \vec{y}}{\lVert \vec{x} \rVert \, \lVert \vec{y} \rVert}

where \vec{x} and \vec{y} are non-zero vectors, \theta is the associated angle, and \lVert \cdot \rVert is the 2-norm. Therefore, words with male (female) connotations will be positively (negatively) correlated with the gender dimension defined by \vec{male} - \vec{female}.

These dimensions were then used to construct the gender slant measures. For the first, we aimed to capture the strength of the association between gender and stereotypical attitudes, which identify men more closely with careers and women with family. Specifically, we used the cosine similarity between the vector representing the gender dimension, defined by \vec{male} - \vec{female}, and the vector representing the career-family dimension, defined by \vec{career} - \vec{family}. For our second measure, we aimed to capture stereotypical attitudes that associate men with “good” and women with “bad” words. For this measure, instead of \vec{career} - \vec{family}, we used \vec{good} - \vec{bad}.

For the \vec{male} - \vec{female} dimension, we used the gender-specific words found to be the five most frequently occurring in our corpus. Words for \vec{career} - \vec{family} and \vec{good} - \vec{bad} were chosen in a similar fashion.
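A minimal numpy sketch of the dimension construction and cosine similarity defined above; the toy two-dimensional vectors used in the test are illustrative only, standing in for trained 300-dimensional GloVe vectors.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def dimension(pos_vecs, neg_vecs):
    """Average of normalized vectors for one word set minus the other,
    e.g. male-word vectors minus female-word vectors."""
    pos = np.mean([normalize(v) for v in pos_vecs], axis=0)
    neg = np.mean([normalize(v) for v in neg_vecs], axis=0)
    return pos - neg

def cos_sim(x, y):
    """Cosine similarity: (x . y) / (||x|| ||y||)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```

A slant measure is then `cos_sim(gender_dim, career_family_dim)`, with each dimension built from the relevant word vectors.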
Only five words were chosen for each because, given the relatively small size of the corpus, the inclusion of too many words could result in invalid measures of slant. The words used are displayed in table A2.

Table A2: Words used for each vector dimension

MaleNames:    john, joseph, peter, james, david
FemaleNames:  faith, mary, rose, jane, margaret
Male:         his, he, him, mr, himself
Female:       her, she, ms, mrs, herself
Good:         competent, strong, power, serious, professional
Bad:          frivolous, vain, incompetent, unreasonable, incapable
Career:       company, service, pay, business, work
Family:       family, wife, mother, father, brother

Each dimension includes the five most common relevant words in the corpus.

To apply this process to the data, we first preprocessed the entire Kenya Law corpus of judgements by removing punctuation (but retaining hyphenated words). To avoid case sensitivity, we transformed all words to lower case. We then retained only the 50,000 most common words across all judicial opinions. To obtain judge-specific gender slant measures, we took the set of majority opinions authored by each judge as a separate corpus and trained separate GloVe embeddings on each judge’s corpus. To ensure convergence, we trained vectors for 20 iterations with a learning rate of 0.05. Since each judge might not have a sufficiently large number of tokens, we follow the approach suggested by Antoniak and Mimno (2018) and train embedding models on 25 bootstrap samples of each judge corpus. Specifically, we consider each sentence written by a judge as a document and then create a corpus by sampling with replacement from all sentences. The number of sentences contained in the bootstrapped sample is the same as the total number of sentences in the original judge corpus.
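The sentence-level bootstrap can be sketched as follows: sentences are sampled with replacement, and each bootstrap corpus contains as many sentences as the original judge corpus. Function names and the seed are illustrative, not taken from the replication code.

```python
import random

def bootstrap_corpus(sentences, rng):
    """One bootstrap sample: draw len(sentences) sentences with replacement."""
    return [rng.choice(sentences) for _ in sentences]

def bootstrap_samples(sentences, n_samples=25, seed=0):
    """The 25 bootstrap corpora trained per judge (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [bootstrap_corpus(sentences, rng) for _ in range(n_samples)]
```

A separate embedding model is then trained on each of the 25 corpora, and the slant measure is computed on each.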
We then calculate our slant measure for all bootstrap samples and assign to each judge the median value of the measure across the samples. Given that embeddings trained on small corpora tend to be sensitive to the inclusion of specific documents, the bootstrap procedure produces more stable results. In addition, bootstrapping ensures stability with respect to the initialization of the word vectors, a potential concern given that GloVe has a non-convex objective function (Spirling and Rodriguez 2019). The two variables resulting from this process are Median slant, career vs. family and Median slant, good vs. bad. For both measures, positive values indicate greater slant against women.

To validate that the embeddings capture meaningful information about gender, after following the bootstrapping procedure, we compute the cosine similarity between the gender dimension and each of the vectors representing the five most common male and female names for each judge and bootstrap sample. We then regress a dummy for whether the name is male on the median cosine similarity between the vector representing the name and the gender dimension across bootstrap samples, separately for each judge. Figure A1 shows the cumulative distribution of the t-statistics resulting from these regressions for sets of judges with different numbers of tokens. It shows that most t-statistics are significant (and they are never lower than zero). This shows that the gender dimension identified in the embeddings does indeed contain meaningful gender information.

Constructing other textual variables

To create the measure of the number of cases cited in the text, we extracted a window of 10 words (5 on each side) around the words v, vs, and ndashvs (included because HTML elements from the website sometimes appear in the text), which were found to be a common way of citing other judgements. This window of 10 words was then cleaned to produce the final cited judgements.
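The window extraction can be sketched as follows. The token list (v, vs, ndashvs) comes from the text above; the subsequent cleaning step described in the paper is omitted, and the function name is illustrative.

```python
import re

# Match "v", "vs", or "ndashvs" as a standalone token (case-insensitive).
CITE_TOKEN = re.compile(r"(?:vs|v|ndashvs)", flags=re.IGNORECASE)

def citation_windows(text, half_window=5):
    """Return a window of up to 5 words on each side of every cite token."""
    words = text.split()
    windows = []
    for i, w in enumerate(words):
        if CITE_TOKEN.fullmatch(w.strip(".,")):
            lo = max(0, i - half_window)
            windows.append(" ".join(words[lo:i + half_window + 1]))
    return windows
```

Counting the windows per judgement gives the number of cases cited; matching the extracted windows across judgements gives the number of times each case is cited.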
A similar process was used to count the number of laws and acts cited in a case. Once we had the information on citations in each case, we were also able to determine the number of times each case in the dataset was cited.

Figure A1: Cumulative distribution of t-statistics from regressions testing the validity of the word embeddings
The vertical line indicates t-stat = 1.96, for significance at p < 0.05. T-statistics are from regressions of a dummy for whether the name is male on the median cosine similarity between the vector representing the name and the gender dimension across bootstrap samples, separately for each judge.

Appendix B: Variable Summaries

Table B1: Summary of main variables

Variable                               count    mean       sd         min        max
Def. win                               29373    .4290675   .4949514   0          1
Judge maj. female                      28627    .3650051   .48144     0          1
Pla. maj. female                       26418    .2499432   .4329881   0          1
Def. maj. female                       23555    .2379962   .4258658   0          1
Judge-plaintiff same ethnicity         21958    .1321159   .3386244   0          1
Judge-defendant same ethnicity         21094    .1257704   .3315982   0          1
This is an appeal case                 29373    .2739591   .4459958   0          1
This case is appealed                  29373    .0176352   .1316238   0          1
Decision is reversed in the appeal     518      .507722    .5004236   0          1
Number of defendants                   29373    1.584619   1.440699   1          68
Number of plaintiffs                   29373    1.315528   1.134249   1          65
Number of judges                       29373    1.108944   .4580071   1          9
Median slant, career v family          26113    -.0283425  .0988901   -.2812362  .30826
Median slant, good v bad               22023    .0615131   .0548388   -.0875989  .2815675
Case type: civil                       28512    .4645062   .4987473   0          1
Case type: tax                         28512    .0023148   .0480576   0          1
Case type: human rights                28512    .0012626   .0355116   0          1
Case type: judicial review             28512    .000947    .0307588   0          1
Case type: divorce                     28512    .0019992   .044668    0          1
Case type: election                    28512    .0018589   .0430752   0          1
Case type: labor relations             28512    .0166947   .1281272   0          1
Case type: environment and land        28512    .3231622   .4676923   0          1
Case type: family                      28512    .0068042   .0822076   0          1
Case type: industrial                  28512    .0033319   .0576276   0          1
Case type: miscellaneous               28512
.0818603 .2741565 0 1 Case type: succession 28512 .0952581 .2935763 0 1 Number of cases cited in judgement 29373 1.926089 3.534747 0 87 Times judgement cited 29373 .23365 1.938323 0 109 Laws cited in judgement 29373 2.203793 4.106973 0 146 Words in judgement 29373 1451.541 1337.457 0 42980 33 Table B2: Summary of main variables, count only count Court ID 29373 Year of delivery 29373 Court-year FE 29373 Plurality ethnicity of plaintiffs 24356 Plurality ethnicity of defendants 23415 Plurality ethnicity of judges 26425 34 Appendix C: Additional Descriptive Statistics Table C1: Frequency of court types in the dataset Frequency Court of appeal 1660 Employment and labor relations 1081 Environment and land court 8619 High court 17865 Other 124 Supreme court 24 Total 29373 Other includes Election Petition in Magistrate Courts, the Judges and Magistrates Vetting Board, Kadhis Courts, and the National Environment Tribunal Figure C1: Total number of cases, by majority gender and role in the case 35 Figure C2: Proportion of cases over time with majority female judges, defendants, and plaintiffs Figure C3: Proportion of female majorities, by case type and role in the case See appendix B for list of case types included in “other.” 36 Figure C4: Ethnicities as a proportion of total cases (by role in the case) and the total population in Kenya Proportions of the total population are derived from the 2019 census. Specific ethnicities are intentionally masked. 37 Appendix D: Balance Tests, Before and After 2011 Table D1: Gender randomization checks (1) (2) (3) Judge maj. female Judge maj. female Judge maj. female Pla. maj. female 0.0114 0.0116 0.00739 (0.00863) (0.00860) (0.00679) Def. maj. 
female 0.00351 0.00335 -0.000446 (0.00764) (0.00755) (0.00651) DV mean 0.363 0.363 0.363 Court-year FE Yes Yes Yes Ethnicity dummies No Yes Yes Other controls No No Yes Observations 20394 20394 20394 The regressions test whether female plaintiffs/defendants are more likely to be matched with female judges than male judges. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 1. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = plaintiffs, def. = defendants, maj. = majority. 38 Table D2: Gender randomization checks, before 2011 (1) (2) (3) Judge maj. female Judge maj. female Judge maj. female Pla. maj. female 0.0222 0.0210 0.0115 (0.0190) (0.0189) (0.0154) Def. maj. female -0.00690 -0.00613 -0.0115 (0.0137) (0.0133) (0.0111) DV mean 0.298 0.298 0.298 Court-year FE Yes Yes Yes Ethnicity dummies No Yes Yes Other controls No No Yes Observations 4717 4717 4717 The regressions test whether female plaintiffs/defendants are more likely to be matched with female judges than male judges. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 1. Sample is restricted to the years 1976-2010. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges. 
Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = Plaintiffs, Def. = defendants, maj. = majority. Table D3: Gender randomization checks, 2011 and after (1) (2) (3) Judge maj. female Judge maj. female Judge maj. female Pla. maj. female 0.00827 0.00850 0.00514 (0.00968) (0.00951) (0.00735) Def. maj. female 0.00674 0.00617 0.00325 (0.00845) (0.00832) (0.00719) DV mean 0.382 0.382 0.382 Court-year FE Yes Yes Yes Ethnicity dummies No Yes Yes Other controls No No Yes Observations 15677 15677 15677 The regressions test whether female plaintiffs/defendants are more likely to be matched with female judges than male judges. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 1. Sample is restricted to the years 2011-2020. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = Plaintiffs, Def. = defendants, maj. = majority. 39 Table D4: Ethnicity randomization checks 1 (1) (2) (3) Judge plur. Kalenjin Judge plur. Kamba Judge plur. Kikuyu Pla. plur. Kalenjin 0.00699 (0.00894) Def. plur. Kalenjin -0.00933 (0.0103) Pla. plur. Kamba -0.00785 (0.00651) Def. plur. Kamba 0.00337 (0.00771) Pla. plur. Kikuyu 0.00640 (0.00760) Def. plur. 
Kikuyu 0.00167 (0.00759) DV mean 0.0895 0.125 0.275 Court-year FE Yes Yes Yes Ethnicity dummies Other controls Yes Yes Yes Observations 14594 14594 14944 The regressions test whether female plaintiffs/defendants are more likely to be matched with judges of their own ethnicity than judges of other ethnicities. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 2. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = plaintiffs, def. = defendants, plur. = plurality, maj. = majority. 40 Table D5: Ethnicity randomization checks 2 (1) (2) (3) Judge plur. Kisii Judge plur. Luhya Judge plur. Luo Pla. plur. Kisii -0.0111 (0.00753) Def. plur. Kisii 0.00201 (0.00795) Pla. plur. Luhya 0.00489 (0.00674) Def. plur. Luhya 0.0112⇤ (0.00589) Pla. plur. Luo 0.000715 (0.00907) Def. plur. Luo 0.00195 (0.0104) DV mean 0.0765 0.174 0.173 Court-year FE Yes Yes Yes Ethnicity dummies Other controls Yes Yes Yes Observations 14944 14944 14944 The regressions test whether female plaintiffs/defendants are more likely to be matched with judges of their own ethnicity than judges of other ethnicities. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 2. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes if data is missing/unknown. Pla. = plaintiffs, def. = defendants, plur. = plurality, maj. = majority. 
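The balance tests above amount to regressing a judge-identity dummy on party-identity dummies within court-year cells; under random assignment the party coefficients should be indistinguishable from zero. A minimal sketch on synthetic data (variable names are illustrative; the paper's version adds controls and clusters standard errors at the judge level):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_cells = 5000, 25
cell = rng.integers(0, n_cells, n)             # court-year cell of each case
pla_fem = rng.integers(0, 2, n).astype(float)  # plaintiff majority female
def_fem = rng.integers(0, 2, n).astype(float)  # defendant majority female
# under random assignment, judge gender is independent of the parties
judge_fem = rng.integers(0, 2, n).astype(float)

# design: party-gender dummies plus court-year fixed effects (equation 1 style)
fe = (cell[:, None] == np.arange(n_cells)).astype(float)
X = np.column_stack([pla_fem, def_fem, fe])
beta, *_ = np.linalg.lstsq(X, judge_fem, rcond=None)
print(beta[:2])  # both party coefficients should be close to zero
```

The ethnicity checks follow the same template, with a judge-plurality-ethnicity dummy as the outcome and party-plurality dummies as regressors.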
Table D6: Ethnicity randomization checks 3

                            (1) Judge plur. Masai   (2) Judge plur. Meru   (3) Judge plur. Mijikenda
Pla. plur. same ethnicity   -0.000333 (0.00218)     0.00203 (0.00313)      -0.000673 (0.00299)
Def. plur. same ethnicity   0.000336 (0.000782)     -0.000889 (0.00268)    0.00369 (0.00408)
DV mean                     0.00589                 0.0260                 0.0110
Court-year FE               Yes                     Yes                    Yes
Other controls              Yes                     Yes                    Yes
Observations                14944                   14944                  14944

The regressions test whether plaintiffs/defendants are more likely to be matched with judges of their own ethnicity than with judges of other ethnicities. Each column regresses a dummy for whether the judge plurality is the ethnicity shown on dummies for whether the plaintiff (Pla.) and defendant (Def.) pluralities are that same ethnicity. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 2. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes whether data is missing/unknown. Plur. = plurality.

Table D7: Ethnicity randomization checks 4

                            (1) Judge plur. Pokot   (2) Judge plur. Somali   (3) Judge plur. Turkana
Pla. plur. same ethnicity   -0.00444 (0.00458)      0.00474 (0.00516)        0.0000974 (0.000118)
Def. plur. same ethnicity   0.00327 (0.00832)       -0.00715 (0.00518)       -0.000940 (0.00102)
DV mean                     0.00482                 0.00964                  0.000468
Court-year FE               Yes                     Yes                      Yes
Other controls              Yes                     Yes                      Yes
Observations                14944                   14944                    14944

See notes to table D6.

Table D8: Ethnicity randomization checks, before 2011, 1

                            (1) Judge plur. Kalenjin   (2) Judge plur. Kamba   (3) Judge plur. Kikuyu
Pla. plur. same ethnicity   0.000868 (0.00136)         -0.0483*** (0.0181)     0.00681 (0.0109)
Def. plur. same ethnicity   -0.00107 (0.00132)         0.0687*** (0.0233)      -0.00961 (0.0101)
DV mean                     0.0438                     0.128                   0.169
Court-year FE               Yes                        Yes                     Yes
Other controls              Yes                        Yes                     Yes
Observations                3106                       3106                    3106

See notes to table D6. Sample is restricted to the years 1976-2010.

Table D9: Ethnicity randomization checks, before 2011, 2

                            (1) Judge plur. Kisii   (2) Judge plur. Luhya   (3) Judge plur. Luo
Pla. plur. same ethnicity   -0.0345** (0.0164)      0.0141 (0.0285)         0.0197 (0.0268)
Def. plur. same ethnicity   -0.0140 (0.0159)        -0.00399 (0.0188)       -0.00110 (0.0267)
DV mean                     0.0802                  0.216                   0.221
Court-year FE               Yes                     Yes                     Yes
Other controls              Yes                     Yes                     Yes
Observations                3106                    3106                    3106

See notes to table D6. Sample is restricted to the years 1976-2010.

Table D10: Ethnicity randomization checks, before 2011, 3

                            (1) Judge plur. Masai   (2) Judge plur. Meru   (3) Judge plur. Mijikenda
Pla. plur. same ethnicity   0.000167 (0.000473)     0.00370 (0.00939)      -0.0129 (0.0129)
Def. plur. same ethnicity   0.000227 (0.000414)     -0.0123 (0.00906)      0.000787 (0.00325)
DV mean                     0.000966                0.0599                 0.0167
Court-year FE               Yes                     Yes                    Yes
Other controls              Yes                     Yes                    Yes
Observations                3106                    3106                   3106

See notes to table D6. Sample is restricted to the years 1976-2010.

Table D11: Ethnicity randomization checks, before 2011, 4

                            (1) Judge plur. Pokot   (2) Judge plur. Somali   (3) Judge plur. Turkana
Pla. plur. same ethnicity   -0.0102 (0.00724)       0.0142 (0.0211)          0 (.)
Def. plur. same ethnicity   0.0236 (0.0701)         -0.0308 (0.0226)         0 (.)
DV mean                     0.0109                  0.0174                   0
Court-year FE               Yes                     Yes                      Yes
Other controls              Yes                     Yes                      Yes
Observations                3106                    3106                     3106

See notes to table D6. Sample is restricted to the years 1976-2010. The Turkana coefficients cannot be estimated: the DV mean of 0 indicates that no case in this subsample has a Turkana judge plurality.

Table D12: Ethnicity randomization checks, 2011 and after, 1

                            (1) Judge plur. Kalenjin   (2) Judge plur. Kamba   (3) Judge plur. Kikuyu
Pla. plur. same ethnicity   0.00864 (0.0108)           -0.000521 (0.00806)     0.00681 (0.00987)
Def. plur. same ethnicity   -0.0113 (0.0123)           -0.00935 (0.00840)      0.00469 (0.00970)
DV mean                     0.100                      0.125                   0.303
Court-year FE               Yes                        Yes                     Yes
Other controls              Yes                        Yes                     Yes
Observations                11838                      11838                   11838

See notes to table D6. Sample is restricted to the years 2011-2020.

Table D13: Ethnicity randomization checks, 2011 and after, 2

                            (1) Judge plur. Kisii   (2) Judge plur. Luhya   (3) Judge plur. Luo
Pla. plur. same ethnicity   -0.00434 (0.00743)      0.00416 (0.00644)       -0.00583 (0.00963)
Def. plur. same ethnicity   0.00863 (0.00983)       0.0143** (0.00607)      0.00269 (0.0106)
DV mean                     0.0755                  0.163                   0.160
Court-year FE               Yes                     Yes                     Yes
Other controls              Yes                     Yes                     Yes
Observations                11838                   11838                   11838

See notes to table D6. Sample is restricted to the years 2011-2020.

Table D14: Ethnicity randomization checks, 2011 and after, 3

                            (1) Judge plur. Masai   (2) Judge plur. Meru   (3) Judge plur. Mijikenda
Pla. plur. same ethnicity   -0.000695 (0.00261)     0.00183 (0.00265)      0.00181 (0.00255)
Def. plur. same ethnicity   0.000584 (0.000898)     0.00195 (0.00287)      0.00498 (0.00480)
DV mean                     0.00718                 0.0171                 0.00946
Court-year FE               Yes                     Yes                    Yes
Other controls              Yes                     Yes                    Yes
Observations                11838                   11838                  11838

See notes to table D6. Sample is restricted to the years 2011-2020.

Table D15: Ethnicity randomization checks, 2011 and after, 4

                            (1) Judge plur. Pokot   (2) Judge plur. Somali   (3) Judge plur. Turkana
Pla. plur. same ethnicity   -0.00227 (0.00281)      0.00220 (0.00364)        0.000157 (0.000191)
Def. plur. same ethnicity   -0.000560 (0.00171)     -0.00161 (0.00244)       -0.00111 (0.00120)
DV mean                     0.00321                 0.00760                  0.000591
Court-year FE               Yes                     Yes                      Yes
Other controls              Yes                     Yes                      Yes
Observations                11838                   11838                    11838

See notes to table D6. Sample is restricted to the years 2011-2020.

Appendix E: Robustness Checks

Table E1: Gender results, 2010 and after

Dependent variable: Def. win
                                   (1)         (2)         (3)         (4)         (5)
Judge maj. female                  -0.0380***  -0.0397**   -0.0468***  -0.0410***  -0.0411***
                                   (0.0122)    (0.0197)    (0.0133)    (0.0132)    (0.0133)
Pla. maj. female                               -0.0296**   -0.0528***  -0.0426***  -0.0433***
                                               (0.0128)    (0.0109)    (0.0108)    (0.0108)
Def. maj. female                               -0.00495    0.00144     0.00706     0.00667
                                               (0.0106)    (0.0108)    (0.0107)    (0.0107)
Judge maj. fem. X pla. maj. fem.               0.0173      0.00690     0.00655     0.00726
                                               (0.0208)    (0.0171)    (0.0169)    (0.0168)
Judge maj. fem. X def. maj. fem.               0.0359**    0.0400**    0.0379**    0.0380**
                                               (0.0167)    (0.0168)    (0.0168)    (0.0167)
DV mean                            0.452       0.429       0.454       0.454       0.454
Court-year FE                      Yes         Yes         Yes         Yes         Yes
Ethnicity dummies                  No          No          No          No          Yes
Other controls                     No          No          No          Yes         Yes
Observations                       22801       25618       20394       20394       20394

The regressions test whether defendants (plaintiffs) are more likely to win (lose) if they have the same (a different) majority gender as judges. The coefficients of interest are on the interaction terms. Years before 2010 are dropped. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equations 3 and 4. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes whether data is missing/unknown. Pla. = plaintiff, def. = defendant, maj. = majority.
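The in-group specification behind these tables interacts the judge's identity with the defendant's, and the interaction coefficient is the in-group effect. A self-contained sketch on simulated data (the simulated effect is set to 4 percentage points on a 43 percent baseline, mirroring the headline gender estimate; variable names are ours, and the paper's version adds fixed effects, controls, and clustered standard errors):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
judge_fem = rng.integers(0, 2, n).astype(float)
def_fem = rng.integers(0, 2, n).astype(float)
same_gender = judge_fem * def_fem
# simulate a 4 pp in-group boost on a 43 percent baseline win rate
win = (rng.random(n) < 0.43 + 0.04 * same_gender).astype(float)

# linear probability model: win ~ judge_fem + def_fem + judge_fem x def_fem
X = np.column_stack([np.ones(n), judge_fem, def_fem, same_gender])
beta, *_ = np.linalg.lstsq(X, win, rcond=None)
print(beta)  # the last entry recovers the in-group effect, up to sampling noise
```

The ethnicity specification (equation 5) replaces the gender dummies and their interaction with same-plurality-ethnicity indicators, but the logic is identical.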
Table E2: Gender results, robust standard errors

Dependent variable: Def. win
                                   (1)         (2)         (3)         (4)         (5)
Judge maj. female                  -0.0380***  -0.0397***  -0.0468***  -0.0410***  -0.0411***
                                   (0.0106)    (0.00956)   (0.0122)    (0.0122)    (0.0122)
Pla. maj. female                               -0.0296***  -0.0528***  -0.0426***  -0.0433***
                                               (0.00918)   (0.0102)    (0.0102)    (0.0102)
Def. maj. female                               -0.00495    0.00144     0.00706     0.00667
                                               (0.00986)   (0.0105)    (0.0105)    (0.0105)
Judge maj. fem. X pla. maj. fem.               0.0173      0.00690     0.00655     0.00726
                                               (0.0148)    (0.0164)    (0.0164)    (0.0164)
Judge maj. fem. X def. maj. fem.               0.0359**    0.0400**    0.0379**    0.0380**
                                               (0.0163)    (0.0172)    (0.0171)    (0.0171)
DV mean                            0.452       0.429       0.454       0.454       0.454
Court-year FE                      Yes         Yes         Yes         Yes         Yes
Ethnicity dummies                  No          No          No          No          Yes
Other controls                     No          No          No          Yes         Yes
Observations                       22801       25618       20394       20394       20394

The regressions test whether defendants (plaintiffs) are more likely to win (lose) if they have the same (a different) majority gender as judges. The coefficients of interest are on the interaction terms. Robust standard errors are reported in parentheses. All columns are based on a linear regression model. For specification details, see equations 3 and 4. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for defendants, plaintiffs, and judges. Other controls include case type dummies, a dummy for an appeal case, and variables for the numbers of defendants, plaintiffs, and judges. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes whether data is missing/unknown. Pla. = plaintiff, def. = defendant, maj. = majority.

Table E3: Gender results, probit

Dependent variable: Def. win
                                   (1)         (2)         (3)         (4)         (5)
Judge maj. female                  -0.0990***  -0.105*     -0.123***   -0.108***   -0.108***
                                   (0.0315)    (0.0539)    (0.0344)    (0.0344)    (0.0346)
Pla. maj. female                               -0.0779**   -0.139***   -0.113***   -0.115***
                                               (0.0344)    (0.0289)    (0.0287)    (0.0287)
Def. maj. female                               -0.0135     0.00311     0.0187      0.0177
                                               (0.0275)    (0.0282)    (0.0282)    (0.0282)
Judge maj. fem. X pla. maj. fem.               0.0469      0.0186      0.0170      0.0193
                                               (0.0562)    (0.0450)    (0.0446)    (0.0445)
Judge maj. fem. X def. maj. fem.               0.0944**    0.106**     0.100**     0.100**
                                               (0.0431)    (0.0436)    (0.0438)    (0.0437)
DV mean                            0.452       0.429       0.454       0.454       0.454
Court-year FE                      Yes         Yes         Yes         Yes         Yes
Ethnicity dummies                  No          No          No          No          Yes
Other controls                     No          No          No          Yes         Yes
Observations                       22595       25397       20162       20162       20162

See notes to table E2, except that all columns are based on a probit regression model and standard errors, in parentheses, are clustered at the judge level.

Table E4: Ethnicity results, no Kamba or Luhya

Dependent variable: Def. win
                     (1)         (2)         (3)         (4)         (5)
Judge-pla. same      0.0103                  -0.00517    0.00163     0.00224
                     (0.0153)                (0.0170)    (0.0206)    (0.0207)
Judge-def. same                  0.0446***   0.0530***   0.0481**    0.0502**
                                 (0.0145)    (0.0179)    (0.0235)    (0.0235)
DV mean              0.450       0.451       0.452       0.452       0.452
Court-year FE        Yes         Yes         Yes         Yes         Yes
Ethnicity dummies    No          No          No          Yes         Yes
Other controls       No          No          No          No          Yes
Observations         11368       10987       9741        9741        9741

The regressions test whether defendants (plaintiffs) are more likely to win (lose) if they have the same (a different) plurality ethnicity as judges. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 5. Judge-pla. same and Judge-def. same refer to similarity in plurality ethnicity. Ethnicity dummies include binary variables indicating whether a given ethnicity is the plurality, one for each ethnicity, for both defendants and plaintiffs. Other controls include case type dummies; a dummy for an appeal case; variables for the numbers of defendants, plaintiffs, and judges; and dummies for defendant, plaintiff, and judge majority gender. To prevent a loss of observations, all categorical controls (such as case type) include a dummy that denotes whether data is missing/unknown. Pla. = plaintiff, def. = defendant.

Table E5: Ethnicity results, 2010 and after

Dependent variable: Def. win
                     (1)         (2)         (3)         (4)         (5)
Judge-pla. same      0.00582                 -0.0143     -0.00466    -0.00424
                     (0.0142)                (0.0153)    (0.0169)    (0.0170)
Judge-def. same                  0.0456***   0.0589***   0.0533***   0.0561***
                                 (0.0137)    (0.0157)    (0.0178)    (0.0176)
DV mean              0.447       0.452       0.451       0.451       0.451
Court-year FE        Yes         Yes         Yes         Yes         Yes
Ethnicity dummies    No          No          No          Yes         Yes
Other controls       No          No          No          No          Yes
Observations         17878       17194       15555       15555       15555

See notes to table E4; unlike that table, all ethnicities are included. Years before 2010 are dropped.

Table E6: Ethnicity results, robust standard errors

Dependent variable: Def. win
                     (1)         (2)         (3)         (4)         (5)
Judge-pla. same      0.00681                 -0.0138     -0.00549    -0.00578
                     (0.0115)                (0.0142)    (0.0151)    (0.0151)
Judge-def. same                  0.0439***   0.0573***   0.0532***   0.0551***
                                 (0.0120)    (0.0145)    (0.0155)    (0.0155)
DV mean              0.450       0.453       0.453       0.453       0.453
Court-year FE        Yes         Yes         Yes         Yes         Yes
Ethnicity dummies    No          No          No          Yes         Yes
Other controls       No          No          No          No          Yes
Observations         21842       20971       18952       18952       18952

See notes to table E4; unlike that table, all ethnicities are included, and robust standard errors replace clustering.

Table E7: Ethnicity results, probit

Dependent variable: Def. win
                     (1)         (2)         (3)         (4)         (5)
Judge-pla. same      0.0166                  -0.0391     -0.0173     -0.0183
                     (0.0330)                (0.0366)    (0.0401)    (0.0406)
Judge-def. same                  0.115***    0.153***    0.142***    0.148***
                                 (0.0314)    (0.0376)    (0.0415)    (0.0415)
DV mean              0.451       0.455       0.455       0.455       0.455
Court-year FE        Yes         Yes         Yes         Yes         Yes
Ethnicity dummies    No          No          No          Yes         Yes
Other controls       No          No          No          No          Yes
Observations         21617       20759       18726       18726       18726

See notes to table E4; unlike that table, all ethnicities are included, and all columns are based on a probit regression model with standard errors clustered at the judge level.

Appendix F: Visualization of Gender and Ethnic In-group Bias Among Individual Judges

Figures F1 and F2 visualize the in-group bias trend for defendants. Based on a series of regressions, one for each individual judge, they plot the predicted win proportion when defendants have the same majority gender (ethnicity) as each judge against the predicted win proportion when defendants have a different gender (ethnicity) than the judge. Each bubble in the graph represents a specific judge. Bubbles above the 45-degree line indicate that the judge exhibits in-group bias. The darker the bubble, the more significant the relationship; the larger the bubble, the more observations it represents. Finally, the plus sign represents the predicted win proportions from a regression that includes all of the judges depicted in the graph.
Since it is above the line, it shows that there is, on average, in-group bias towards defendants among the judges. As depicted by the plus sign, the predicted win proportion when judges have the same gender as defen- dants is 0.454. When they have a different gender, it is 0.430, 0.024 less. These results are similar to the results from table 1. The predicted win proportion when judges have the same ethnicity as defendants is 0.486. When they have a different ethnicity, it is 0.442, 0.044 less. These results are similar to the results from table 4. Figure F1: Predicted defendant win proportion, by judge and by defendant similarity with judge gender def. = defendant, prop. = proportion. Each bubble indicates a specific judge. Only single judges are included, not judge panels. Judges without sufficient variation in outcomes were dropped. In total, 187 judges are included. The aggregate regression includes all single-judge panel observations, a total of 21,359. The outcome is significant at p < 0.01. Predictions are based on a regression with court-year fixed effects. 57 Figure F2: Predicted defendant win proportion, by judge and by defendant similarity with judge ethnicity def. = defendant, prop. = proportion. Each bubble indicates a specific judge. Only single judges are included, not judge panels. Judges without sufficient variation in outcomes were dropped. In total, 185 judges are included. The aggregate regression includes all single-judge panel observations, a total of 18,101. The outcome is significant at p < 0.01. Predictions are based on a regression with court-year fixed effects. 58 Appendix G: Effect of Putting Biased Judges on Panels Table G1: Results for significantly in-group gender biased judges, off and on panels (1) (2) Def. win Def. win Judge maj. female -0.189⇤⇤⇤ 0.130 (0.0447) (0.161) Def. maj. female -0.223⇤⇤⇤ -0.0501 (0.0523) (0.0804) Judge maj. fem. X def. maj. fem. 
                                      0.396***     0.0797
                                      (0.0605)     (0.167)
DV mean                               0.430        0.665
Court-year FE                         Yes          Yes
Individual decisions                  Yes          No
Panel decisions                       No           Yes
Observations                          1787         200

The sample is restricted to the 14 judges with significant gender in-group bias coefficients for defendants in individual regressions. Column 1 includes only cases where the judges ruled individually. Column 2 includes only cases where they ruled on panels. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equations 3 and 4. Pla. = plaintiff, def. = defendant, maj. = majority.

Appendix H: Relationship Between Slant and Appeals, and Slant and Reversals

Table H1: Appeals and slant, family vs career

                                        (1)        (2)        (3)        (4)        (5)
                                        appealed   appealed   appealed   appealed   appealed
Slant against women, career vs family   -0.00499   -0.0211    0.0190     0.00413    -0.0665**
                                        (0.0155)   (0.0182)   (0.0250)   (0.0160)   (0.0268)
Def. maj. female                                   0.00186    0.000931
                                                   (0.00278)  (0.00431)
Pla. maj. female                                                          0.00311    0.00338
                                                                          (0.00279)  (0.00448)
Def. maj. fem. X Slant against women               0.0300     -0.00766
                                                   (0.0246)   (0.0387)
Pla. maj. fem. X Slant against women                                      -0.00585   -0.0113
                                                                          (0.0221)   (0.0408)
DV mean                                 0.0184     0.0191     0.0187      0.0183     0.0197
Court-year FE                           Yes        Yes        Yes         Yes        Yes
Restricted sample                       No         No         Yes         No         Yes
Observations                            26019      20751      11530       23480      9734

The regressions test whether slanted judges are more likely to have case decisions appealed. The coefficients of interest are 'Slant against women, career vs family' and the interactions. Column 3 (5) restricts the sample to cases where the defendant (plaintiff) loses, and the interaction tests whether appeals are more likely if the judge is more slanted and the defendant (plaintiff) is female. These cases have the most potential for gender bias. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model.
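The appeal regressions in Tables H1 and H2 follow a common pattern: a linear probability model with court-year fixed effects, an interaction between judge slant and litigant gender, and standard errors clustered at the judge level. The following is a minimal sketch of that structure using statsmodels on simulated data; every column name is a placeholder, not a variable from the paper's actual dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data; all names here are illustrative assumptions.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "appealed": rng.binomial(1, 0.02, n),        # 1 if the decision was appealed
    "slant": rng.normal(0.0, 0.1, n),            # judge's slant-against-women score
    "def_maj_female": rng.binomial(1, 0.4, n),   # defendant side is majority female
    "judge_id": rng.integers(0, 50, n),          # used to cluster standard errors
    "court_year": rng.integers(0, 12, n),        # court-year fixed effect groups
})

# Linear probability model: appeal outcome on slant, defendant gender, and
# their interaction, with court-year fixed effects; standard errors are
# clustered at the judge level, as in the notes to Tables H1-H4.
model = smf.ols(
    "appealed ~ slant * def_maj_female + C(court_year)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["judge_id"]})

print(model.params[["slant", "slant:def_maj_female"]])
```

The interaction coefficient is the quantity of interest in the restricted-sample columns: it asks whether more slanted judges see disproportionately more appeals when the losing party is female.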
Figure H1: Relationship between judge slant against women (career vs family measure) and appeals

Data points are binned and account for court-year fixed effects.

Table H2: Appeals and slant, good vs bad

                                        (1)        (2)        (3)        (4)        (5)
                                        appealed   appealed   appealed   appealed   appealed
Slant against women, good vs bad        0.0421*    0.0394     0.0199     0.0474     0.0267
                                        (0.0253)   (0.0349)   (0.0420)   (0.0333)   (0.0594)
Def. maj. female                                   0.00503    0.00472
                                                   (0.00431)  (0.00715)
Pla. maj. female                                                          0.00448    0.00645
                                                                          (0.00537)  (0.00840)
Def. maj. fem. X Slant against women               -0.0376    -0.0341
                                                   (0.0515)   (0.0710)
Pla. maj. fem. X Slant against women                                      -0.0175    -0.0457
                                                                          (0.0629)   (0.0902)
DV mean                                 0.0199     0.0206     0.0203      0.0198     0.0217
Court-year FE                           Yes        Yes        Yes         Yes        Yes
Restricted sample                       No         No         Yes         No         Yes
Observations                            21951      17316      9664        19785      8098

The regressions test whether slanted judges are more likely to have case decisions appealed. The coefficients of interest are 'Slant against women, good vs bad' and the interactions. Column 3 (5) restricts the sample to cases where the defendant (plaintiff) loses, and the interaction tests whether appeals are more likely if the judge is more slanted and the defendant (plaintiff) is female. These cases have the most potential for gender bias. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model.

Figure H2: Relationship between judge slant against women (good vs bad measure) and appeals

Data points are binned and account for court-year fixed effects.

Table H3: Reversals and slant, family vs career

                                        (1)        (2)        (3)        (4)        (5)
                                        reversed   reversed   reversed   reversed   reversed
Slant against women, career vs family   0.0103     0.00565    0.0369**   0.0146     -0.0380**
                                        (0.00966)  (0.0120)   (0.0167)   (0.0109)   (0.0170)
Def. maj. female                                   0.000515   0.000841
                                                   (0.00212)  (0.00342)
Pla. maj. female                                                          0.000890   0.00246
                                                                          (0.00185)  (0.00253)
Def. maj. fem. X Slant against women               0.0213     0.00771
                                                   (0.0199)   (0.0315)
Pla. maj. fem.
X Slant against women                                                     -0.00453   0.00322
                                                                          (0.0143)   (0.0214)
DV mean                                 0.00949    0.00983    0.0116      0.00920    0.00760
Court-year FE                           Yes        Yes        Yes         Yes        Yes
Restricted sample                       No         No         Yes         No         Yes
Observations                            26019      20751      11530       23480      9734

The regressions test whether slanted judges are more likely to have case decisions reversed. The coefficients of interest are 'Slant against women, career vs family' and the interactions. Column 3 (5) restricts the sample to cases where the defendant (plaintiff) loses, and the interaction tests whether reversals are more likely if the judge is more slanted and the defendant (plaintiff) is female. These cases have the most potential for gender bias. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. Pla. = plaintiffs, def. = defendants, plur. = plurality, maj. = majority.

Figure H3: Relationship between judge slant against women (career vs family measure) and reversals

Data points are binned and account for court-year fixed effects.

Table H4: Reversals and slant, good vs bad

                                        (1)        (2)        (3)        (4)        (5)
                                        reversed   reversed   reversed   reversed   reversed
Slant against women, good vs bad        0.0568***  0.0505**   0.0461     0.0644***  0.0420
                                        (0.0169)   (0.0241)   (0.0355)   (0.0212)   (0.0264)
Def. maj. female                                   0.00127    0.00144
                                                   (0.00240)  (0.00427)
Pla. maj. female                                                          0.00312    0.00488
                                                                          (0.00366)  (0.00556)
Def. maj. fem. X Slant against women               -0.00465   -0.00295
                                                   (0.0305)   (0.0446)
Pla. maj. fem. X Slant against women                                      -0.0317    -0.0395
                                                                          (0.0418)   (0.0623)
DV mean                                 0.0102     0.0104     0.0125      0.0100     0.00852
Court-year FE                           Yes        Yes        Yes         Yes        Yes
Restricted sample                       No         No         Yes         No         Yes
Observations                            21951      17316      9664        19785      8098

The regressions test whether slanted judges are more likely to have case decisions reversed. The coefficients of interest are 'Slant against women, good vs bad' and the interactions.
Column 3 (5) restricts the sample to cases where the defendant (plaintiff) loses, and the interaction tests whether reversals are more likely if the judge is more slanted and the defendant (plaintiff) is female. These cases have the most potential for gender bias. Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. Pla. = plaintiffs, def. = defendants, plur. = plurality, maj. = majority.

Figure H4: Relationship between judge slant against women (good vs bad measure) and reversals

Data points are binned and account for court-year fixed effects.

Appendix I: Interaction Between Gender and Ethnicity In-groups

Table I1: Ethnicity and gender interaction results

                                                      (1)         (2)         (3)
                                                      Def. win    Def. win    Def. win
Judge-def. same gender                                0.0183*                 0.0160
                                                      (0.00996)               (0.0109)
Judge-def. same ethnicity                             0.0440**                0.0622***
                                                      (0.0176)                (0.0189)
Judge-def. same gender X Judge-def. same ethnicity    -0.0131                 -0.0224
                                                      (0.0225)                (0.0237)
Judge-pla. same gender                                            0.0194**    0.0143
                                                                  (0.00912)   (0.0119)
Judge-pla. same ethnicity                                         0.0268      0.00874
                                                                  (0.0174)    (0.0176)
Judge-pla. same gender X Judge-pla. same ethnicity                -0.0260     -0.0261
                                                                  (0.0240)    (0.0288)
DV mean                                               0.455       0.453       0.458
Court-year FE                                         Yes         Yes         Yes
Observations                                          17675       19172       14580

Standard errors, in parentheses, are clustered at the judge level. All columns are based on a linear regression model. For specification details, see equation 5. Pla. = plaintiff, def. = defendant, maj. = majority.
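The interaction specification behind Table I1 (equation 5 in the main text) regresses the defendant win indicator on the gender in-group indicator, the ethnicity in-group indicator, and their product. A minimal sketch of that structure on simulated data follows; the column names are placeholders, not the paper's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder data standing in for the case-level sample; none of these
# columns come from the paper's dataset.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "def_win": rng.binomial(1, 0.45, n),        # defendant wins the case
    "same_gender": rng.binomial(1, 0.5, n),     # judge and defendant share majority gender
    "same_ethnicity": rng.binomial(1, 0.3, n),  # judge and defendant share plurality ethnicity
    "judge_id": rng.integers(0, 50, n),
    "court_year": rng.integers(0, 12, n),
})

# Defendant win regressed on the two in-group indicators and their
# interaction, with court-year fixed effects and judge-clustered standard
# errors, mirroring the structure of Table I1.
model = smf.ols(
    "def_win ~ same_gender * same_ethnicity + C(court_year)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["judge_id"]})

# The interaction term asks whether sharing both identities adds more
# (or less) than the sum of the two separate in-group effects.
print(model.params["same_gender:same_ethnicity"])
```

A negative interaction coefficient, as in Table I1, would suggest the two in-group effects are less than additive when both identities are shared.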