Policy Research Working Paper 10597

Using Large Language Models for Qualitative Analysis can Introduce Serious Bias∗

Julian Ashwin (Maastricht University), Aditya Chhabra (World Bank), Vijayendra Rao† (World Bank)

Development Economics, Development Research Group
November 2023

Abstract

Large Language Models (LLMs) are quickly becoming ubiquitous, but the implications for social science research are not yet well understood. This paper asks whether LLMs can help us analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with displaced Rohingya people in Cox's Bazaar, Bangladesh. The analysis finds that a great deal of caution is needed in using LLMs to annotate text, as there is a risk of introducing biases that can lead to misleading inferences. Here this refers to bias in the technical sense, that the errors that LLMs make in annotating interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human annotations with flexible coding leads to less measurement error and bias than LLM annotations. Therefore, given that some high quality annotations are necessary in order to assess whether an LLM introduces bias, this paper argues that it is probably preferable to train a bespoke model on these annotations than it is to use an LLM for annotation.

This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at vrao@worldbank.org. The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Keywords: Large Language Models, Qualitative Analysis, ChatGPT, Llama 2, Text as Data, Aspirations, Rohingya, Bangladesh
JEL Codes: C63, C89, O1

∗ The authors are grateful to the World Bank's Knowledge for Change Program, and the World Bank-UNHCR Joint Data Center on Forced Displacement for financial support. Sudarshan Aittreya provided valuable research assistance for the project.
† Corresponding author: vrao@worldbank.org

1 Introduction

Large Language Models (LLMs) are increasingly being used in social science research to, among other things, analyze and annotate text data (Gilardi et al., 2023). As LLMs become more accessible and popular, we can expect that there will be a temptation to use them to analyze open-ended interview data such as those used by qualitative researchers (Small and Calarco, 2022) who follow an interpretative analytical approach. This relies on careful, nuanced coding conducted by trained social scientists (Detering and Waters, 2018).
Qualitative analysis of this kind lies at the core of fields like anthropology and sociology, and there is now a rapidly expanding literature on the use of Natural Language Processing (NLP) methods to analyze qualitative data in sociology (Bonikowski and Nelson, 2022). Qualitative analysis and NLP are also now being increasingly employed in more quantitative fields such as economics (Rao, 2023). Data generated from open-ended, in-depth interviews are potentially very different from the benchmark datasets often used in the NLP literature to validate modelling approaches, such as English-language tweets, news, or product reviews. This is because qualitative research is often conducted in a manner in which the specific context matters for interpreting the data, and analyzed with "flexibly" developed codes that can be quite nuanced and complex. This is a particular problem in non-Western societies, because LLMs have been shown to most resemble people from Western, Educated, Industrialized, Rich and Democratic (WEIRD) societies (Atari et al., 2023), and our example application is set in just such a non-Western context. We have interviews on a very specific topic (children's aspirations) with a very specific population (Rohingya refugees and their hosts in Bangladesh) who are not well represented in the data that LLMs are trained on (or in the data used in the NLP literature more broadly). We find that in such a context, using LLMs to annotate text is potentially dangerous. We test three different LLMs (ChatGPT and two versions of Meta's Llama 2) and find that the prediction errors they make in annotation are not random with respect to the characteristics of the interview subject. This can lead to misleading conclusions in later analysis, as we show in Figure 5. Statistical analysis based on LLM annotations can lead to estimated effects that are very different from those based on human expert annotations. It is therefore crucial to have some high quality expert annotations, even if it is just to assess whether the LLM is introducing bias or not. Given that some high quality annotations are needed to assess whether the LLM introduces bias, we argue that it is preferable to train a bespoke model on these annotations than it is to use an LLM. We show that iQual, a method we developed with others (Ashwin et al., 2022) to analyze large-N qualitative data by training supervised models on small human-annotated samples, not only performs better than LLMs in terms of out-of-sample prediction accuracy but also introduces much less bias. LLMs can possibly assist this process by generating larger training sets (i.e. data augmentation, as proposed by Dai et al. (2023)), but we only find evidence of marginal benefits in a few cases. This suggests a potential way to reconcile the nuance and "reflexive" qualities of interpretative qualitative analysis with large representative samples. Crucially, we see LLMs and other NLP methods as assisting and extending traditional qualitative analysis, not replacing it.
In order to create a coding tree that captures important and interesting variation across documents in a nuanced and context-aware manner, there is no substitute for a careful reading of at least a subset of those documents. Our application is based on open-ended interviews with displaced Rohingya people and their Bangladeshi hosts in Cox's Bazaar, Bangladesh. These interviews focused on subjects' aspirations and ambitions for their children (Callard, 2018) as well as their capacity to achieve those goals, i.e. their navigational capacity (Appadurai, 2004). They are analysed in detail in Ashwin et al. (2022), so we will not discuss the details of data collection or the related social science literature here. The substance of these interviews is not critical to the methodological contribution of this paper, but it is important to note that while "ambition" can be captured well by structured questions that yield quantitative data, aspirations and navigational capacity are subtle and complex concepts that are not easily defined or captured in structured surveys. It is precisely when dealing with these sorts of concepts that open-ended interviews and interpretative qualitative analysis are valuable. The complexity and nuance of the concepts may play a role in explaining the poor performance of LLMs in annotating interviews compared with other studies where the annotation tasks were substantially more straightforward, e.g. Mellon et al. (2022). Previous work has suggested that LLMs might outperform crowd-sourced human annotations (Gilardi et al., 2023), or even that a substantial proportion of workers on crowd-sourcing platforms may be using LLMs to complete tasks (Veselovsky et al., 2023). Our results do not contradict these findings, as for many annotation tasks LLMs may indeed perform very well and save researchers the expense and complication of crowd-sourcing. However, our results do suggest that researchers ought to be aware of the possibility of biases introduced by LLM annotation, particularly on data where a nuanced, contextual understanding of the documents is needed; LLMs, like other types of machine learning models, reflect the data they are trained on (Kearns and Roth, 2020), and many of the contexts in which qualitative analysis adds value require an understanding of communities and concepts that may not be adequately represented in this training data.

The paper is structured as follows. The remainder of this Section discusses this paper's contribution in the context of related literature. Section 2 then very briefly introduces our dataset of annotated interview transcripts. Section 3 describes our approach to using LLMs for annotation (3.1) and the supervised NLP method introduced by Ashwin et al. (2022), which we refer to as iQual going forward (3.2). Section 4 then describes LLM out-of-sample performance in comparison to iQual (4.1) and then shows that LLMs introduce more bias and illustrates how this could cause researchers to draw incorrect conclusions (4.2). Section 5 concludes.

2 Data and Qualitative Analysis

The interview transcripts, data collection and the qualitative coding process are explained in detail in Ashwin et al. (2022), so we restrict ourselves to a very brief description here. The population we sample are displaced Rohingyas based in the Cox's Bazaar camp and local Bangladeshi residents.
Along with a standard household survey including questions on demographics and economic conditions, the data include transcripts of 2,407 open-ended interviews with subjects on their aspirations for their eldest child. The interviews were conducted either in Bengali or in Rohingya (which was then transcribed into Bengali), but we work with machine translations into English. The interviews take the form of an unstructured to-and-fro of question and answer (QA) pairs between the interviewer and the subject. The interviews are on average 12.6 QA pairs long, with the average answer in each QA pair being 13.7 words long.

Based on a close reading of a subset of transcripts, and following a "flexible coding" process (Detering and Waters, 2018), a coding tree was developed including 25 potentially overlapping categories, 19 of which we focus on in this paper. A full description of each code along with examples is shown in Appendix A. Following Callard (2018), the distinction between aspiration and ambition was adapted to the context and nature of the "dreams" parents expressed for their children. For example, concrete and measurable dreams for a child (e.g., wishing a child would become a doctor, teacher or entrepreneur, or specific educational goals) were used as the definition of ambition, while intangible, value-oriented goals (e.g., wishing the child to live with dignity or be a good human being) were classified as aspiration. Aspirations were divided into "Religious" and "Secular". Ambition was divided into seven major categories – Education (further sub-coded into High, Low, Neutral and Religious), Job Secular, Marriage, Entrepreneurship, Migration, Vocational Training, and No Ambition. While ambition and aspiration could come up at any point in an interview, "capacity to aspire" or navigational capacity was restricted to discussions of what parents were planning or able to do to fulfill dreams for their children. Navigational Capacity was coded into seven sub-codes – Low and High "Ability", Low and High "Budget", Low and High "Information Awareness", and Reliance on God. Of our sample of 2,407 interview transcripts, 789 are manually annotated by trained sociologists (co-authors on the Ashwin et al. (2022) paper) according to this coding structure. The annotations are defined at the level of QA pairs, allowing us to represent each annotation as a binary classification problem at the QA level.

3 Methods

In this Section we first explain how we use LLMs to annotate our interview transcripts. We then briefly describe the iQual method, which trains supervised models on our expert human annotations, as well as how we use LLMs for data augmentation in combination with iQual. We test three different LLMs: the closed-source ChatGPT (gpt-3.5-turbo) by OpenAI, as well as two open-source LLMs by Meta, Llama-2 (13B) and its fine-tuned "chat" variant (Touvron et al., 2023). The base Llama-2 is pretrained on publicly available online data sources. The chat variant is then fine-tuned on publicly available instruction datasets and over 1 million human annotations. This fine-tuning is designed to align the model with human preferences using techniques such as Reinforcement Learning from Human Feedback (RLHF).
ChatGPT is also fine-tuned using RLHF. For all three LLMs, our approach to prompting remains consistent.

Figure 1: LLM instructions example

Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations concerning their children's future education and careers solely in the context of religion, without implying any monetary incentives. Specifically, your role is to discern whether the parent explicitly expresses a desire for their child to pursue a religious path. Assign the "Religious Aspirations" label only if the parent explicitly articulates aspirations for their child that include becoming a religious scholar, Hafiz, attending a madrassa, learning Arabic, Quran reading, Islamic covering, regular prayer, working in Islamic banks, or being an Islamic scholar. It is crucial not to assign this label if the child is already engaged in any of the mentioned activities, as current religious activities are not considered future aspirations. If there is no such direct reference to religious aspirations in the parent's statement, assign the "Not Applicable" label to indicate that the topic of religious aspirations was not addressed. Here are some examples:

### Input: Interviewer: What are your aspirations for your son's future? Respondent: I hope that he will become a great Islamic scholar. I want him to learn Arabic and read the Quran regularly.
### Output: Interpretation: The parent explicitly expresses a desire for their child to pursue a path in religious education, specifically mentioning becoming an Islamic scholar, learning Arabic, and reading the Quran regularly. Therefore, this conversation aligns with the 'Religious Aspirations' label. Label: Religious Aspirations

### Input: Interviewer: Do you have any specific dreams for your daughter? Respondent: I want her to excel in her studies and become a successful engineer.
### Output: Interpretation: While the parent does express aspirations for their child's future, these aspirations are not religious in nature. The parent hopes for their child to become an engineer, which doesn't involve any religious activities as per the instruction. Therefore, this conversation should be labeled as 'Not Applicable'. Label: Not Applicable

### Input: Interviewer: What do you envision for your child's future? Respondent: I would like my son to become a Hafiz. It's a great honor in our religion.
### Output: Interpretation: The parent explicitly expresses a desire for their child to become a Hafiz, which is a religious aspiration as per the instruction. Therefore, this conversation aligns with the 'Religious Aspiration' label. Label: Religious Aspirations

### Input: Interviewer: What are your hopes for your children's future careers? Respondent: I want them to have successful careers. I hope my daughter becomes a doctor and my son becomes a lawyer.
### Output: Interpretation: While the parent does express aspirations for their children's future, these aspirations do not involve any religious activities or pursuits as per the instruction. Therefore, this conversation should be labeled as 'Not Applicable'. Label: Not Applicable

### Input: Interviewer: What is your dream about your son who is studying in class one? Respondent: Still in Noorani. After reading Noorani, he will read Hafezi. By the grace of Allah, he will teach in a madrasa when he becomes a hafez. That is my dream.
### Output: (ChatGPT) Interpretation: The respondent expresses a clear religious aspiration for their child, which is to become a Hafiz and teach in a madrasa. There is no mention of any monetary incentives, so this response aligns with the criteria for "Religious Aspiration". Label: Religious Aspiration

### Output: (Llama-2-13b) Interpretation: The parent explicitly expresses a desire for their child to pursue a path in religious education, specifically mentioning becoming a Hafiz and teaching in a madrassa. Therefore, this conversation aligns with the "Religious Aspiration" label. Label: Religious Aspiration

### Output: (Llama-2-13b-chat) Interpretation: The parent explicitly expresses a desire for their child to pursue religious studies and become a Hafiz (one who has memorized the Quran). They also mention teaching in a madrasa, which is an Islamic school. Therefore, this conversation aligns with the "Religious Aspiration" label. Label: Religious Aspiration

Note: This Figure shows example instructions given to the LLMs for the "Aspirations: Religious" code. These instructions are given to the LLM as a single prompt, but for clarity we have grouped them into three parts here, as indicated by the colors. The first section (in green) describes the task and gives the definition of the code. The labelled examples (in blue) help the LLM understand the task at hand, often referred to as "few-shot learning". For each example, we also provide a reasoning for the annotation (in red), so that the LLM is also asked to explain why it applies a certain label, which is known as "chain of thought" prompting. The second box shows an example of a QA pair to be annotated, and the subsequent boxes show the responses of the three LLMs to this prompt.

3.1 Annotation with LLMs

We follow several well-established practices to improve the effectiveness of LLMs in annotating our interview transcripts. We provide a prompt that includes precise directives for the LLM, and employ "few-shot learning" (Brown et al., 2020) as well as "chain of thought" prompting (Wei et al., 2022), as explained below. For each code, we created detailed textual instructions, similar to those one would give to human annotators. These instructions include enough detail to ensure that, in principle, the model is fully aware of the specific standards and definitions required for coding transcripts. Each code and each question-answer pair is annotated by the LLM independently. By incorporating both few-shot learning and chain of thought prompting, we are in line with best practices and give the LLMs a good chance at annotating accurately. Previous work has shown that using these techniques can help LLMs outperform crowd workers in text annotation tasks (Gilardi et al., 2023).

The choice of prompt given to an LLM for a certain task can make a substantial difference to its performance. We give the models a thorough briefing on what each code represents and how to identify its presence in a conversation. This includes the context, certain specific terms or activities that could be indicators, and the need to distinguish between current circumstances and future aspirations, as shown in the green text of Figure 1. These instructions provide a benchmark for the model to understand the coding system and thereby infer the respective codes from the interview transcripts. Instructions for each of the codes are shown in Appendix A.
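To make this concrete, the following is a minimal sketch of how such a prompt can be assembled and sent to gpt-3.5-turbo for a single QA pair, assuming the legacy (pre-1.0) openai Python client. The helper names (INSTRUCTIONS, FEW_SHOT_EXAMPLES, annotate_qa_pair) are illustrative rather than part of our pipeline; the prompt text itself follows Figure 1.

import openai

# Task description and code definition (the green text in Figure 1)
INSTRUCTIONS = "Your task involves interpreting and analyzing transcripts ..."
# Labelled examples with their reasoning (the blue and red text in Figure 1)
FEW_SHOT_EXAMPLES = "### Input: ...\n### Output: Interpretation: ... Label: ..."

def annotate_qa_pair(question, answer, code="Religious Aspirations"):
    """Ask the LLM to label one question-answer pair for one code."""
    prompt = (
        f"{INSTRUCTIONS}\n\nHere are some examples:\n{FEW_SHOT_EXAMPLES}\n\n"
        f"### Input:\nInterviewer: {question}\nRespondent: {answer}\n### Output:"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # favour stable labels when annotating
    )
    text = response["choices"][0]["message"]["content"]
    # The chain-of-thought interpretation comes first; the final "Label:" line carries the code.
    label_lines = [line for line in text.splitlines() if line.strip().startswith("Label:")]
    return int(bool(label_lines) and code.lower() in label_lines[-1].lower())

The same structure is reused for every code, with only the instructions and few-shot examples swapped out, since each code and each QA pair is annotated independently.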
Few-shot learning and chain of thought prompting (CoTP) are two powerful techniques that can be combined to improve the performance and interpretability of LLMs. Few-shot learning provides examples of a task to the model, which helps guide its behavior and understanding of the task at hand. We provide the LLM with four examples that follow the detailed instructions, as shown by the blue text in Figure 1, to demonstrate correct behavior to the model. These examples are chosen to be instructive of how the code should be applied and are similar to the examples one would use to explain a code in traditional qualitative analysis. We also apply chain of thought prompting in these examples to nudge the model to generate an interpretation of the transcript and articulate its line of reasoning before assigning the final code. This is beneficial in complex tasks where reasoning and interpretation play crucial roles, such as our coding task. The underlying idea is that by having the model outline its thinking process, we can encourage it to reason more deeply and accurately, while also producing outputs that are more interpretable and trustworthy. For our task, we have used both few-shot learning and CoTP by asking the model not only to provide a label for each transcript, but also to give an interpretation explaining why it chose that label. An example of a full prompt for the 'Religious Aspirations' code is shown in Figure 1, with the instructions and few-shot examples for all codes shown in Appendix A.

3.2 Training supervised models on interpretative annotations (iQual)

An alternative to using LLMs to annotate large corpora of text documents is to create high quality annotations on a smaller sub-sample and then train supervised models to predict these annotations on the remainder of the documents. We thus train a separate classifier for each code on a numerical representation of the text at the QA level. As discussed in Appendix B, there are many options both for the classifier we can use here (e.g. random forest, logistic regression, neural networks, SVM) and for how to represent the text numerically (e.g. tf-idf ngram vectors, sentence embeddings, translations or transliterations). Using k-fold cross-validation we select the model, the text representation and a variety of hyperparameters, so that the approach which performs best in out-of-sample prediction is selected. In each case, we hold out a test set of 200 interviews in order to assess out-of-sample performance. Details about this methodology are provided in Ashwin et al. (2022), and it is implemented in an open source Python package (https://github.com/worldbank/iQual). The crucial intuition, though, is simply that we use a subset of high quality expert annotations to train a small bespoke model for each code. These models rely only on the annotated training data, unlike the pre-trained LLMs, which are trained on huge quantities of text from a huge range of contexts.
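As a rough illustration of this idea (not the iQual implementation itself, which is available in the package linked above), the sketch below trains one cross-validated tf-idf classifier per code with scikit-learn. The column names, classifier choice and hyperparameter grid are illustrative only; in practice the model class and text representation are themselves selected by cross-validation.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

def fit_code_classifier(train: pd.DataFrame, code: str):
    """Train one binary classifier for one code on human-annotated QA pairs."""
    text = train["question"] + " " + train["answer"]
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])
    # k-fold cross-validation over hyperparameters, selecting on F1
    search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="f1")
    search.fit(text, train[code])
    return search.best_estimator_

# Usage sketch: hold out a test set of interviews, then evaluate out of sample.
# model = fit_code_classifier(train_df, "Aspiration: Religious")
# preds = model.predict(test_df["question"] + " " + test_df["answer"])
# print(f1_score(test_df["Aspiration: Religious"], preds))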
Rather than asking LLMs to annotate text directly, another potential use for them is data augmentation in combination with a supervised model, such as those described above. Data augmentation is a common technique in machine learning to generate more variation in a training set while preserving the important signals. For example, when training a model on a labelled dataset of images of animals, one might generate extra variation in the training data by rotating the images by 90 degrees or transforming them into a mirror image of themselves. The idea is to generate more training observations where the noise in the data is different but the signals are the same. A good example of this from the natural language processing literature is back-translation, where text is translated into a different language and then back into the original, so that the exact phrasing and style of the text is different but the meaning is the same (Edunov et al., 2018). Using LLMs for data augmentation has been found to increase prediction performance in some contexts, so we follow the approach set out in Dai et al. (2023) as an additional experiment here. The example prompts and further details on the augmentation are shown in Appendix A.1. We thus test two different versions of iQual: first, training supervised models on the human annotations without the use of LLMs, and second, training the model on data augmented by the LLMs to generate more variation in the text while preserving the meaning.

4 Results

We assess the performance of LLMs in our annotation tasks along two dimensions. Firstly, we assess how accurate the LLMs' predictions are relative to our expert human annotations, finding that performance is poor relative to our simpler supervised models. Secondly, and more importantly, we investigate whether the annotations provided by LLMs or iQual introduce bias. We here mean bias in the technical sense that the prediction errors which the models make are not random.

Figure 2: Out-of-sample prediction performance of different methods

[Figure: out-of-sample F1 scores for each of the 19 codes (vertical axis), with one point per model (ChatGPT, Llama-2 13B chat, Llama-2 13B, iQual + ChatGPT aug, iQual) and a black triangle marking the random-guessing benchmark.]

Note: This Figure compares the out-of-sample prediction performance of LLM and supervised approaches, compared to the expert human annotations. Each code is shown along the vertical axis, and the test set F1 scores are shown on the horizontal axis. The F1 score that would be achieved by random guessing is shown as a black triangle, and all models perform better than this. The performance of each model for each code is shown as a separate point, with the color and shape of the point denoting the model. Averaging the F1 scores across all codes, iQual performs best with 0.542, followed by iQual + ChatGPT aug (0.541), ChatGPT (0.414), Llama-2 13B (0.290) and finally Llama-2 13B chat (0.274).

4.1 Out-of-sample Performance

Given that the interview transcripts are annotated with a series of binary variables at the QA level, we can assess LLM prediction accuracy with the out-of-sample F1 score for each code. The F1 score is the harmonic mean of precision and recall, where precision is the number of true positives divided by the sum of true positives and false positives, and recall is the number of true positives divided by the sum of true positives and false negatives; it thus symmetrically represents both type 1 and type 2 errors. We compare the performance of each LLM to the supervised models trained on annotated data, with and without augmentation. Figure 2 shows the comparative performance of the different annotation approaches, as measured by the test set F1 score. Given that many of the codes are very sparse, a useful comparison is the F1 score that random guessing would achieve, which is shown as black triangles.
All models across all codes do better than random, i.e. they achieve a higher F1 score than random guessing would. In all but one case (Capacity: Awareness Information High), ChatGPT is the best performing LLM. However, in all but one case (Capacity: Budget Low), all LLMs perform worse than iQual in terms of these F1 scores. When used for augmentation, ChatGPT does improve performance slightly in some cases, but it slightly worsens performance just as often. If we measure performance in terms of accuracy (i.e. the proportion of observations that are correctly classified) rather than F1 score, we get the same results: iQual achieves accuracy of 0.969, while ChatGPT only achieves 0.909, Llama-2 13B 0.854 and Llama-2 13B chat 0.851. These results are of course specific to our context, and a different annotation structure on a different set of text data may lead to different results. However, in our case it is clear that LLMs generate less accurate annotations than training much smaller models on a subset of human annotations does.

4.2 Bias

If the annotations generated by LLMs are inaccurate, this is not necessarily a hugely consequential problem. If the mistakes they make are random, with a large enough sample we should still be able to come to correct conclusions. However, if the mistakes are not random, then using LLM annotations can lead to completely incorrect conclusions. In other words, if the LLMs' errors are biased, then relying on these annotations could lead researchers to identify relationships in the data that are purely a result of these algorithmic biases rather than reality. We look at two ways in which the predicted annotations could be biased. Firstly, and most straightforwardly, we show that LLMs over-predict annotations that are very sparse (i.e. there are many more false positives than false negatives). Secondly, we show that in many cases LLM prediction errors are systematically associated with characteristics of the interview subject (e.g. refugee status, gender, education).

Figure 3 shows the degree of over-prediction across different annotations. Each model is shown as a separate panel and the bars show the degree of over-prediction as a percentage of all answers. All three LLMs we tested systematically over-predict most of the annotations. This is a problem in itself, as we might be interested in the prevalence of a particular concept, but it is especially problematic if we want to compare the prevalence of different annotations. For example, if we wished to compare the prevalence of secular and religious aspirations in our sample, using the annotations provided by ChatGPT would lead us to very misleading conclusions. While ChatGPT over-predicts both the "Aspirations: Secular" and "Aspirations: Religious" codes, as can be seen from the uppermost two rows of Figure 3, "Aspirations: Secular" is over-predicted much more frequently than "Aspirations: Religious": in the expert human annotations "Aspirations: Secular" appears around 1.2 times more frequently than "Aspirations: Religious", while in the ChatGPT annotations it appears around 3 times more frequently.
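For clarity, the net over-prediction measure plotted in Figure 3 can be sketched as follows, assuming aligned 0/1 vectors of human and model labels for one code; the function name is illustrative.

import numpy as np

def net_over_prediction(human: np.ndarray, model: np.ndarray) -> float:
    """False positives minus false negatives, as a percentage of all answers."""
    false_pos = np.sum((model == 1) & (human == 0))
    false_neg = np.sum((model == 0) & (human == 1))
    return 100.0 * (false_pos - false_neg) / len(human)

# A positive value means the model assigns the code too often relative to the
# expert annotations; a negative value means it assigns the code too rarely.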
Figure 3: LLMs systematically over-predict annotations

[Figure: net over-prediction (percentage of answers) for each of the 19 codes, with one panel per model (ChatGPT, Llama-2 13B chat, Llama-2 13B, iQual + ChatGPT aug, iQual).]

Note: This Figure shows the average percentage of answers in which each model over- or under-predicts each annotation. Each model is shown as a separate panel, with each code shown along the vertical axis and the percentage of answers in which there is a net over-prediction shown on the horizontal axis. A score of 50% thus means that half of all observations are a false positive. If the value is positive, then the model assigns the annotation too frequently, while if it is negative then the model does not assign the annotation frequently enough. The LLM models systematically over-predict most of the annotations.

Of perhaps even greater concern than over-prediction, we find that the LLMs' predictions are systematically biased with respect to the interview subjects' characteristics (e.g. refugee status, demographics, education and income). To test whether prediction errors are systematically related to subject characteristics, we regress prediction errors for each model on a range of subject characteristics. We then calculate the F statistic of this regression, which tells us whether there is some statistically significant relationship between the prediction errors and subject characteristics (e.g. a model might over-predict a certain code for men but under-predict it for women).

Figure 4: LLM models fail the bias test much more regularly than iQual

[Figure: log F statistic for each of the 19 codes, with one panel per model and point colors indicating whether the p-value of the F-test is below 1%, below 5%, or above 5%.]

Note: This Figure shows the result of an F-test for a statistical association between the prediction errors of each model and the characteristics of the interview subject. Each model is shown as a separate panel, with each code shown along the vertical axis. The log F statistic of this test is shown along the horizontal axis, with the color of the points indicating the statistical significance of the test statistic. The subject characteristics include refugee status; age and sex of the eldest child; age, education and sex of the interview subject; total number of children; household assets and income; and history of trauma experience. The LLM models display a bias much more frequently than the supervised models. The full results for each regression are shown in Appendix C.
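A minimal sketch of this bias test, assuming statsmodels and illustrative column names (the actual specification and the full set of characteristics are those listed in the note to Figure 4 and reported in Appendix C):

import numpy as np
import pandas as pd
import statsmodels.api as sm

def bias_f_test(errors: pd.Series, characteristics: pd.DataFrame):
    """errors = model label minus human label; characteristics = subject covariates."""
    X = sm.add_constant(characteristics)  # e.g. refugee, age, sex, education, assets, ...
    res = sm.OLS(errors.astype(float), X).fit()
    # The F statistic tests whether the characteristics jointly explain the errors;
    # a small p-value is evidence that the errors are not random, i.e. biased.
    return np.log(res.fvalue), res.f_pvalue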
Figure 4 shows these F statistics that test whether the prediction errors of each annotation approach are systematically related to the interview subjects' characteristics, with the full regression in each case reported in Appendix C. The higher the (log) F statistic, the stronger the evidence of bias. The color of the points indicates the level of statistical significance, and each model is shown as a separate panel. We see that for iQual, in the left-most panel, there is evidence of bias in only one of the 19 codes. While we should be cautious in interpreting results for this code, there is not much cause for concern. However, for the LLMs we find strong evidence of bias in many of the codes. This tells us that the prediction errors the LLMs make are not random, and conducting analysis on the basis of their predictions is likely to result in misleading interpretations.

The F tests shown in Figure 4 tell us that there is some statistical association between prediction errors and subject characteristics. We can see concrete examples of how this can lead to misleading conclusions in Figure 5. This Figure shows estimated coefficients for regressions of the prevalence of an annotation in an interview on dummy variables for the subjects' refugee status and the gender of their eldest child. So if the coefficient on refugee status is positive, then that code appears more in interviews with refugees than in interviews with hosts. Six of the codes are shown as separate panels, and the results based on the annotations of the five different models are shown for each coefficient, following the same color scheme as Figures 2 and 3, but with the coefficient based on only the expert annotations shown in black. The differences in the estimates across annotation methods arise because the errors they make are not random with respect to refugee status and the gender of the eldest child.

Figure 5: Examples of misleading conclusions when using LLMs

[Figure: estimated coefficients on refugee status and on having a male eldest child for six codes (Aspiration: Secular; Capacity: Awareness Information High; Capacity: Awareness Information Low; Ambition: Education High; Ambition: Education Neutral; Ambition: Education Low), comparing expert annotations, iQual, iQual + ChatGPT aug, ChatGPT, Llama-2 13B and Llama-2 13B chat.]

Note: This Figure shows the estimated coefficients for regressions of the prevalence of a code in an interview on dummy variables indicating the subjects' refugee status and the gender of their eldest child. Codes are shown as separate panels, with error bars representing 95% confidence intervals and color indicating which approach was used to generate the annotations. The coefficient for a regression estimated on just the expert annotations is shown in black. Coefficients for all codes are shown in Figure 7.
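The regressions behind Figure 5 can be sketched as follows, under the assumption of an interview-level data frame holding each code's prevalence under each annotation source; all column names are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

def prevalence_coefficients(df: pd.DataFrame, outcome: str) -> pd.Series:
    """OLS of a code's prevalence on refugee status and eldest-child gender."""
    res = smf.ols(f"{outcome} ~ refugee + male_eldest", data=df).fit()
    return res.params  # compare these across annotation sources

# e.g. prevalence_coefficients(interviews, "prev_expert") versus
#      prevalence_coefficients(interviews, "prev_chatgpt") can yield coefficients
#      of opposite sign, as in the "Ambition: Education Low" panel of Figure 5.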
We see in Figure 5 that in many cases the coefficient based on the ChatGPT-predicted annotations (in pink) is very different from that based on the true human annotations (in black). For example, for "Ambition: Education Low" in the bottom right panel, the coefficient on refugee status is positive and significant using the expert annotations but negative and insignificant using the ChatGPT annotations. The coefficient on whether the eldest child is male, meanwhile, is negative and insignificant on the human annotations but positive and significant using the ChatGPT annotations. This is just one example, but we can see here how relying on the LLM annotations can lead to potentially dangerous misunderstandings. For example, based on the ChatGPT annotations we might conclude that subjects are more likely to have low educational ambitions for their male children, while in the expert annotations there is no evidence for that. In fact, once we increase the sample size using iQual, we find a marginally significant effect of the opposite sign. We can also note here that the coefficients using iQual are not different from those using just the expert annotations, but have much smaller standard errors. Using supervised models to scale up expert human annotations thus increases precision while not introducing bias, as argued in Ashwin et al. (2022). Using ChatGPT to extend the sample size through data augmentation does not appear to introduce additional bias, although it does not have a substantial benefit either. Given that some expert annotations will be necessary in order to identify whether LLM (or crowd-sourced) annotations are biased, this suggests that training smaller bespoke models on these annotations may be more reliable than relying on LLMs to annotate large samples.

5 Discussion

LLMs are trained on a wide range of text and consequently may not be suited for nuanced and context-specific tasks. First, they may introduce systematic biases when used to annotate text. In our example, we see that the errors that LLMs make in annotations (compared to expert human annotations) are not random. Second, LLMs over-predict many of our codes. We can think of this as the LLM bringing the "pre-conceptions" it has learned from its training data to the annotation task. Consequently, LLMs are probably not suited for coding most qualitative data that requires nuanced and contextual analysis. This sort of analysis has traditionally been the province of anthropology and sociology but is increasingly being used by more quantitative fields such as economics and political science. To analyze large-N qualitative data, such as those obtained from open-ended in-depth interviews, we show that a coding scheme based on a close reading of a sub-sample of the full corpus of transcripts by experts in qualitative analysis is vital for interpretation and analysis. Firstly, high quality annotations are necessary in order to assess whether the LLM is introducing bias in its annotations. Secondly, these high quality annotations can then be used as a training set for smaller bespoke models. These bespoke models may be able to leverage LLMs through data augmentation, but importantly they are trained on context-specific data, so researchers have better control, and an overview, of the information that is used. We suspect that these limitations will continue even as LLMs improve, and we encourage researchers using LLMs for annotation tasks to be aware of and check for bias.

References

Appadurai, A. (2004), 'The capacity to aspire: Culture and the terms of recognition', in Culture and Public Action, ed. Vijayendra Rao and Michael Walton, Stanford, California: Stanford University Press, pp. 59–84.

Ashwin, J., Rao, V., Biradavolu, M., Chhabra, A., Haque, A., Krishnan, N. and Khan, A. (2022), 'A method to scale-up interpretative qualitative analysis, with an application to aspirations in Cox's Bazaar, Bangladesh'.
Atari, M., Xue, M. J., Park, P. S., Blasi, D. and Henrich, J. (2023), 'Which humans?', https://doi.org/10.31234/osf.io/5b26t.

Bonikowski, B. and Nelson, L. K. (2022), 'From ends to means: The promise of computational text analysis for theoretically driven sociological research', Sociological Methods & Research 51(4), 1469–1483.

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. et al. (2020), 'Language models are few-shot learners', Advances in Neural Information Processing Systems 33, 1877–1901.

Callard, A. (2018), Aspiration: The Agency of Becoming, Oxford University Press.

Dai, H., Liu, Z., Liao, W., Huang, X., Wu, Z., Zhao, L., Liu, W., Liu, N., Li, S., Zhu, D. et al. (2023), 'ChatAug: Leveraging ChatGPT for text data augmentation', arXiv preprint arXiv:2302.13007.

Detering, N. M. and Waters, M. (2018), 'Flexible coding of in-depth interviews: A twenty-first century approach', Sociological Methods and Research 50(2), 708–738.

Edunov, S., Ott, M., Auli, M. and Grangier, D. (2018), 'Understanding back-translation at scale', arXiv preprint arXiv:1808.09381.

Gilardi, F., Alizadeh, M. and Kubli, M. (2023), 'ChatGPT outperforms crowd-workers for text-annotation tasks', Proceedings of the National Academy of Sciences 120(30), e2305016120.

Kearns, M. and Roth, A. (2020), The Ethical Algorithm, Oxford University Press.

Mellon, J., Bailey, J., Scott, R., Breckwoldt, J. and Miori, M. (2022), 'Does GPT-3 know what the most important issue is? Using large language models to code open-text social survey responses at scale' (December 22, 2022).

Rao, V. (2023), 'Can economics become more reflexive? Exploring the potential of mixed-methods', in Handbook on the Economics of Discrimination and Affirmative Action, ed. A. Deshpande, Springer.

Small, M. L. and Calarco, J. M. (2022), Qualitative Literacy: A Guide to Evaluating Ethnographic and Interview Research, University of California Press.

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E. and Lample, G. (2023), 'Llama: Open and efficient foundation language models'.

Veselovsky, V., Ribeiro, M. H. and West, R. (2023), 'Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks', arXiv preprint arXiv:2306.07899.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D. et al. (2022), 'Chain-of-thought prompting elicits reasoning in large language models', Advances in Neural Information Processing Systems 35, 24824–24837.

A Prompts for Classification

High Ability

System prompt

Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their demonstrated capacity to support their children's education and future aspirations. Specifically, your role is to discern whether the parent explicitly expresses any steps they have taken that go above and beyond their capacity to ensure a better future for their child. Assign the "High Ability" label only if the parent explicitly articulates measures that they've taken that could be considered extraordinary efforts to secure their child's future. This can include, but is not limited to, investments in education, personal sacrifices, or unique strategies.
Keep in mind that financial capacity is not the sole determinant of “High Ability”; parents can still be considered high ability even if they have a limited budget. If there is no such direct reference to high ability in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of high ability was not addressed. Few-shot examples Input: Input: Interviewer: How are you supporting your child’s Interviewer: What are you doing to ensure your ambition to become a scientist? daughter’s future? Respondent: I’ve been saving up to send him to a Respondent: I have been working two jobs to save up science summer camp every year since he was 10. I for her college education. I also spend my free time also spend time with him doing experiments from his helping her with her homework and encouraging her science kit every weekend. to participate in extracurricular activities that will enhance her college application. Output: Interpretation: The parent is making significant Output: efforts to support their child’s ambition, including Interpretation: The parent is making personal saving money for summer camps and dedicating sacrifices, such as working multiple jobs and personal time for science experiments. This dedicating personal time to support their child’s demonstrates a high ability to support their child’s education. This shows a high ability to secure their education and future aspirations. child’s future. Label: High Ability Label: High Ability Input: Input: Interviewer: How do you plan to help your son Interviewer: What are you doing to help your achieve his dreams? daughter become a doctor? Respondent: I want him to be happy and successful. Respondent: I am doing my best to support her. I I will support him in whatever he chooses to do. always tell her to study hard. Output: Output: Interpretation: While the parent expresses a desire to Interpretation: Although the parent expresses a desire support their child’s dreams, they do not specify any to support their child’s ambition, they do not specify extraordinary measures they are taking to ensure a any extraordinary measures they are taking to ensure better future for their child. Therefore, this dialogue a better future for their child. Therefore, this dialogue does not demonstrate high ability. does not demonstrate high ability. Label: Not Applicable Label: Not Applicable 14 Low Ability System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their expressed capacity to support their children’s education and future aspirations. Specifically, your role is to discern whether the parent explicitly indicates any limitations or constraints in their means to aid their child’s future. Assign the “Low Ability” label only if the parent explicitly states that they have limited or no means (e.g., food, income, resources) to support their child’s education and future aspirations. If there is no such direct reference to low ability in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of low ability was not addressed. Few-shot examples Input: Input: Interviewer: What are your aspirations for your Interviewer: Do you think your children will be able child’s education? to pursue their dreams? Respondent: I wish I could give them the best Respondent: They are very ambitious and I’m trying education possible, but I barely make enough to feed my best to support them. them. 
Output: Output: Interpretation: Although the parent mentions their Interpretation: The parent expresses a direct effort to support their child’s dreams, there is no limitation in their ability to support their child’s explicit reference to the limitations or constraints in education due to lack of sufficient income. This aligns their means to do so. Hence, the “Low Ability” label with the criteria for the “Low Ability” label. is not applicable. Label: Low Ability Label: Not Applicable Input: Input: Interviewer: How are you planning to support your Interviewer: Are you able to provide the resources child’s dream of becoming a doctor? your child needs for his studies? Respondent: I would love to, but we are struggling Respondent: Yes, we have managed to provide them to make ends meet. I can’t afford to pay for their with the necessary books and materials. higher education. Output: Output: Interpretation: The parent confirms their ability to Interpretation: The parent directly states their provide resources for their child’s education. There inability to support their child’s future aspirations due is no explicit reference to limitations or constraints in to financial constraints. This meets the criteria for the their means to support their child’s future. Therefore, “Low Ability” label. the “Low Ability” label is not applicable. Label: Low Ability Label: Not Applicable 15 High Awareness System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their awareness and understanding of the means to achieve their aspirations for their children’s futures. Specifically, your role is to discern whether the parent demonstrates a comprehensive and logical understanding or has the capacity to describe the means of reaching their dreams for their child. Assign the “High Awareness” label only if the parent demonstrates in their statements a clear understanding and awareness of the processes, resources, and strategies necessary to achieve their aspirations for their child’s future. The assignment of this label is inferential, based on your analysis of their comprehension, rather than explicit statements. If there is no evidence of high awareness in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of high awareness was not addressed. Few-shot examples Input: Interviewer: What do you envision for your son’s future? Respondent: I want my son to become a doctor. I’ve Input: already started saving for his medical school fees Interviewer: How are you preparing your child for and we spend extra time every day working on his their future career? science and math homework. I also plan to get him Respondent: I believe in my son’s capabilities. He’s into extracurricular activities that will enhance his smart and he’ll figure it out. application for medical school. Output: Output: Interpretation: The respondent does not provide any Interpretation: The respondent demonstrates a clear specific strategies, resources, or processes to prepare understanding of the steps necessary to achieve their their child for their future career. They express faith aspiration for their child’s future, which involves in their child’s abilities but do not demonstrate a high him becoming a doctor. They have financial plans level of awareness about the steps necessary to achieve in place, focus on relevant academic subjects, and their career goals. have considered the importance of extracurricular Label: Not Applicable activities. 
This indicates a high level of awareness about the process. Label: High Awareness Input: Interviewer: What are your dreams for your daughter’s future? Input: Respondent: My dream is for her to become a Interviewer: What are your thoughts about your successful engineer. We are ensuring she excels in daughter’s future? her math and science classes, and we’ve enrolled Respondent: I want her to be independent and happy. her in a robotics club to give her early exposure to engineering concepts. We’re also exploring Output: scholarship opportunities for engineering programs. Interpretation: While the respondent has a general aspiration for their child’s future, they do not provide Output: any specific strategies, resources, or processes to Interpretation: The respondent demonstrates a achieve this aspiration. Therefore, it does not clear understanding and awareness of the processes, demonstrate a high level of awareness about the steps resources, and strategies necessary to achieve their necessary to achieve this goal. aspiration for their child’s future. They have identified Label: Not Applicable relevant academic subjects, extracurricular activities, and potential financial resources, indicating a high level of awareness about the process. Label: High Awareness 16 Low Awareness System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their awareness and understanding of the means to achieve their aspirations for their children’s futures. Specifically, your role is to discern whether the parent demonstrates a lack of knowledge, uncertainty, or confusion regarding how to help their child achieve their aspirations. Assign the “Low Awareness” label only if the parent demonstrates in their statements a lack of understanding, cluelessness, or uncertainty about the processes, resources, and strategies necessary to achieve their aspirations for their child’s future. The assignment of this label is inferential, based on your analysis of their comprehension, rather than explicit statements. If there is no evidence of low awareness in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of low awareness was not addressed. Few-shot examples Input: Input: Interviewer: What are your plans to ensure your Interviewer: What are your dreams for your children’s daughter becomes a doctor, which you mentioned is future? her dream? Respondent: I just want them to be happy and Respondent: I don’t really know. I suppose she will successful. I’m not sure how to make that happen, have to study a lot. though. Output: Output: Interpretation: The respondent shows uncertainty Interpretation: Although the respondent expresses about the specific steps and resources required to aspirations for their children’s future, they help their daughter achieve her dream of becoming acknowledge their uncertainty about how to a doctor. This indicates a lack of awareness about the help them achieve success. This suggests a lack processes involved, hence the “Low Awareness” label of awareness about the necessary strategies and is assigned. resources, warranting the “Low Awareness” label. Label: Low Awareness Label: Low Awareness Input: Interviewer: How do you plan to help your children Input: achieve their dreams? Interviewer: How are you supporting your son’s Respondent: I believe education is the key. I’m doing ambition to be an engineer? my best to save money for their schooling and to Respondent: I’m not sure how to help him. 
I didn’t provide them with a conducive learning environment receive much education myself. at home. Output: Output: Interpretation: The respondent admits to not Interpretation: The respondent demonstrates an knowing how to assist their son in achieving his understanding of the importance of education in ambition of becoming an engineer. This demonstrates achieving their children’s dreams and has a plan to a lack of understanding about the necessary steps and save money for their schooling. This indicates an resources, thus the “Low Awareness” label is assigned. awareness of the necessary resources and strategies, Label: Low Awareness so the “Low Awareness” label is not applicable. Label: Not Applicable 17 High Budget System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their financial capacity to support their children’s education and future aspirations. Specifically, your role is to discern whether the parent explicitly demonstrates having substantial financial resources, either through an ability to save or spend on significant items (e.g., setting up a shop) or services like private tuition or coaching centers. Assign the “High Budget” label only if the parent explicitly articulates possessing or having access to substantial financial resources that they are able or willing to utilize for their child’s education and future aspirations. If there is no such direct reference to high budget in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of high budget was not addressed. It is important to note that the presence of a parent-child relationship in the conversation, or the discussion of the child’s future unrelated to high budget, does not automatically warrant the “High Budget” label. Only discussions that directly and explicitly relate to the parent’s financial capacity in supporting their child’s future should receive this label. Few-shot examples Input: Input: Interviewer: What are your plans for your daughter’s Interviewer: How are you helping your children with education? their studies? Respondent: I am planning to send her to a private Respondent: We are managing somehow. I try to school. I have also arranged for a private tutor to teach them myself when I get time after work. help her with her studies at home. Output: Output: Interpretation: The respondent doesn’t mention Interpretation: The respondent explicitly mentions any significant financial investment towards their that they have the financial capacity to send their children’s education, indicating that the topic of high child to a private school and afford a private tutor, budget is not addressed. indicating a high budget. Label: Not Applicable Label: High Budget Input: Input: Interviewer: How do you support your son’s dream Interviewer: What are your future plans for your to become a scientist? child’s education? Respondent: I have enrolled him in a science coaching Respondent: I want to send him to a good college, center and I am also planning to buy him a personal but I’m not sure if I can afford it. computer for his research work. Output: Output: Interpretation: The respondent mentions a desire Interpretation: The respondent mentions that they to support their child’s education but also expresses are able to afford enrollment in a coaching center and uncertainty about their financial capacity to do so, a personal computer for their child, indicating a high indicating that the topic of high budget is not budget. 
addressed. Label: High Budget Label: Not Applicable 18 Low Budget System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their financial capacity to support their children’s education and future aspirations. Specifically, your role is to discern whether the parent explicitly indicates a limited financial capacity, directly related to a lack of funds, regardless of their willingness to allocate a budget for their child’s education. Assign the “Low Budget” label only if the parent explicitly articulates a scarcity or lack of financial resources that they have available for their child’s education and future aspirations. If there is no such direct reference to low budget in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of low budget was not addressed. Few-shot examples Input: Input: Interviewer: What are your dreams for your children’s Interviewer: How do you plan to support your child’s future? education? Respondent: I want them to get a good education Respondent: We’re doing our best to save money, and have a successful career, but I’m not sure how but it’s tough. We might have to consider taking out we’ll afford it. We’re barely making ends meet as it is. a loan. Output: Output: Interpretation: The respondent explicitly expresses a Interpretation: The respondent mentions the concern about their financial capacity to support their difficulty in saving money and the consideration child’s education. The respondent is uncertain about of a loan, which suggests a struggle with financial how to afford the costs of their children’s education, constraints. This indicates a lack of financial which indicates a lack of financial resources. resources to support their child’s education. Label: Low Budget Label: Low Budget Input: Input: Interviewer: What are your plans for your child’s Interviewer: How do you see your child’s future? education? Respondent: I see my child becoming a doctor. We’re Respondent: We’ve been exploring different schools doing all we can to support this dream. and educational programs. We want the best for our child. Output: Interpretation: The respondent expresses a hopeful Output: vision for their child’s future and their commitment Interpretation: The respondent talks about exploring to support it. However, there is no explicit mention different education options for their child, indicating a of financial constraints or limitations in their ability proactive approach. There is no explicit mention of a to support their child’s education. lack of financial resources or constraints in supporting Label: Not Applicable their child’s education. Label: Not Applicable 19 High Education System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their educational aspirations for their children’s futures. Specifically, your role is to discern whether the parent explicitly expresses an intention or desire for their child to achieve a high level of education, either secular or religious, regardless of whether the level is explicitly specified. Assign the “High Education” label only if the parent explicitly articulates a desire or intention for their child to pursue a high level of education in the future. If there is no such direct reference to high education in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of high education was not addressed. 
Few-shot examples Input: Input: Interviewer: What are your aspirations for your Interviewer: Do you have any specific goals for your children’s future? children’s education? Respondent: I have always dreamt of them becoming Respondent: Yes, I want them to study hard and successful in their respective fields. I want my eldest excel in their academics. I hope they can attend son to pursue medicine and my daughter to become university and get a good degree. an engineer. Output: Output: Interpretation: The respondent explicitly states Interpretation: The respondent clearly articulates a a desire for their children to attend university desire for their children to pursue careers in fields and achieve a degree, which is considered high that require high education, such as medicine and education. Therefore, the dialogue aligns with the engineering. Therefore, the dialogue aligns with the “High Education” label. “High Education” label. Label: High Education Label: High Education Input: Input: Interviewer: What are your plans for your children’s Interviewer: What are your dreams for your children’s education? future? Respondent: I just want them to be happy and Respondent: I want them to be good people, to be healthy. Whatever they choose to do in life, I will respectful and kind. I hope they find happiness in support them. whatever they choose to do. Output: Output: Interpretation: Although the respondent expresses Interpretation: While the respondent expresses a supportive attitude towards their children’s future aspirations for their children’s character and plans, there is no explicit mention or intention of high happiness, there is no explicit mention of a desire for education. Therefore, the dialogue aligns with the high education. Therefore, the dialogue aligns with “Not Applicable” label. the “Not Applicable” label. Label: Not Applicable Label: Not Applicable 20 Low Education System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their educational aspirations for their children’s futures. Specifically, your role is to discern whether the parent explicitly expresses an intention or desire for their child to achieve a low level of education, either secular or religious, equivalent to or lower than higher secondary education. Do not assign this label if the child is currently studying below lower or secondary education, or if the scale of education is unspecified. Assign the “Low Education” label only if the parent explicitly articulates a desire or intention for their child to pursue a level of education that is equivalent to or lower than higher secondary education in the future. If there is no such direct reference to low education in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of low education was not addressed. Few-shot examples Input: Input: Interviewer: What are your aspirations for your son’s Interviewer: Do you have any educational goals for education? your child? Respondent: I want him to finish his schooling up to Respondent: Yes, I want my child to pursue a degree the 10th grade and then start working. in medicine. Output: Output: Interpretation: The respondent explicitly states that Interpretation: The respondent expresses a desire for they want their child to complete education up to the their child to pursue a degree in medicine, which 10th grade, which is equivalent to or lower than higher is a higher level of education than higher secondary secondary education. 
Therefore, the dialogue should education. Therefore, the dialogue should be assigned be assigned the “Low Education” label. the “Not Applicable” label. Label: Low Education Label: Not Applicable Input: Input: Interviewer: What do you wish for your daughter’s Interviewer: What are your hopes for your son’s future? future? Respondent: I hope she completes her education till Respondent: I hope he becomes a successful grade 8 and then gets married. businessman. Output: Output: Interpretation: The respondent expresses a desire for Interpretation: The respondent expresses a hope for their daughter to finish education up to the 8th grade, their son’s future but does not specify a level of which is lower than higher secondary education. This education. Therefore, the dialogue should be assigned aligns with the “Low Education” label. the “Not Applicable” label. Label: Low Education Label: Not Applicable 21 Neutral Education System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their educational aspirations for their children’s futures. Specifically, your role is to discern whether the parent expresses a vague or general desire for their child’s education, accompanied by a future job or religious education ambition, without implying a high or low level of education. Assign the “Neutral Education” label only if the parent articulates a general or ambiguous desire for their child to study in the future, accompanied by the mention of a future job or religious education ambition, without specifically indicating a high or low level of education. If there is no such direct reference to neutral education in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of neutral education was not addressed. Few-shot examples Input: Input: Interviewer: What are your aspirations for your Interviewer: What are your expectations for your daughter’s future? child’s education? Respondent: My wish is for her to receive a good Respondent: I want my child to be an engineer. education and perhaps find a good job in the future. Output: Output: Interpretation: The respondent expresses a specific Interpretation: The respondent expresses a general career ambition for their child, which implies a high desire for their child to receive education and hints level of education. This does not align with the at a future job ambition. However, there is no specific “Neutral Education” label. mention of a high or low level of education. Label: Not Applicable Label: Neutral Education Input: Input: Interviewer: How do you envision your child’s future? Interviewer: What are your dreams for your son’s Respondent: I just want them to be happy and future? healthy. If they choose to pursue education or a Respondent: I want him to be successful in whatever career, that’s up to them. he chooses to do, be it in his education or career. Output: Output: Interpretation: The respondent expresses a general Interpretation: The respondent expresses a general wish for their child’s happiness and health, but doesn’t desire for their child’s success in education and career specify any educational or career ambitions. This does but doesn’t specify a level of education. not align with the “Neutral Education” label. 
Label: Neutral Education Label: Not Applicable 22 Religious Education System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations for their children’s future education with a particular focus on religious education. Specifically, your role is to discern whether the parent explicitly expresses an intention or desire for their child to pursue religious education or become a religious figure in the future. Do not assign this label if the child is already engaged in any of the mentioned activities, as current religious education is not considered a future ambition. Assign the “Religious Education” label only if the parent explicitly articulates a desire or intention for their child to engage in religious education or assume a religious role in the future. Few-shot examples Input: Input: Interviewer: What are your aspirations for your Interviewer: How do you envision your son’s future? youngest son? Respondent: I want him to be a scholar of our Respondent: I want him to become a doctor, but I religion. He should learn and teach our holy also want him to get a religious education. I have scriptures. plans to enroll him in a Madrasa when he’s a bit older. Output: Output: Interpretation: The parent expresses a clear intention Interpretation: The parent explicitly expresses a for their child to engage in religious education and desire for their child to pursue religious education in even become a religious figure in the future. This the future, alongside becoming a doctor. This aligns aligns with the “Religious Education” label. with the “Religious Education” label. Label: Religious Education Label: Religious Education Input: Input: Interviewer: What are your dreams for your Interviewer: What are your plans for your children’s daughter’s future? education? Respondent: I hope she becomes a successful Respondent: I want them to study hard and go to engineer. I want her to have a good education and a university. A good education is the key to a better life. stable career. Output: Output: Interpretation: The parent talks about their child’s Interpretation: While the parent has clear aspirations education, but there is no explicit reference to for their child’s future, there is no explicit mention religious education or the child assuming a religious of religious education or the desire for the child to role in the future. Therefore, the “Religious take on a religious role. Therefore, the “Religious Education” label is not applicable. Education” label is not applicable. Label: Not Applicable Label: Not Applicable 23 Entrepreneur System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations concerning their children’s futures. Specifically, your role is to discern whether the parent explicitly expresses any intention or wishes related to their child’s potential future entrepreneurship. Assign the “Entrepreneur” label only if the parent explicitly articulates a desire or intention for their child to pursue a non-wage enterprise job or start and run their own business, regardless of whether the business type is specified. If there is no such direct reference to entrepreneurship in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of entrepreneurship was not addressed. Few-shot examples Input: Input: Interviewer: What are your dreams for your children’s Interviewer: What are your aspirations for your future? 
daughter’s future? Respondent: I want them to be happy and healthy. Respondent: I want her to be independent and I hope they get a good education and find jobs that successful. I am thinking of setting up a small they love. business for her, maybe a bakery or a boutique. Output: Output: Interpretation: While the respondent talks about Interpretation: The respondent explicitly expresses a their children’s future, there is no direct reference to desire for their child to run a business in the future. entrepreneurship or starting a business. Therefore, This aligns with the definition of the “Entrepreneur” this does not meet the criteria for the “Entrepreneur” label. label. Label: Entrepreneur Label: Not Applicable Input: Input: Interviewer: What do you envision for your child’s Interviewer: What do you hope your son will do in future? the future? Respondent: I want my child to be a doctor. I believe Respondent: I want him to be a successful person. I that is a noble profession and I want him to serve the am encouraging him to start his own tech firm. community. Output: Output: Interpretation: The respondent has directly expressed Interpretation: The respondent has expressed a their desire for their child to start and run their own specific desire for their child to become a doctor, which business in the future, which fits the criteria for the is distinct from entrepreneurship. This does not meet “Entrepreneur” label. the criteria for the “Entrepreneur” label. Label: Entrepreneur Label: Not Applicable 24 Secular Job System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations concerning their children’s future careers in secular jobs. Specifically, your role is to discern whether the parent explicitly expresses a desire or intention related to their child pursuing a specific secular job, occupation, or work type, such as a doctor, government job, NGO job, or teacher in a non-religious school. Assign the “Secular Job” label only if the parent explicitly articulates a desire or intention for their child to pursue a particular secular career in the future. If there is no such direct reference to a secular job in the parent’s statement, or if there is a reference to a religious job instead, assign the “Not Applicable” label to indicate that the topic of secular jobs was not addressed. Few-shot examples Input: Input: Interviewer: What are your aspirations for your son’s Interviewer: What are your hopes for your daughter’s future? future? Respondent: I just want him to be happy and Respondent: I hope she becomes a successful lawyer. healthy. Whatever career he chooses, I will support She has always been good at debating and I believe him. she would excel in that field. Output: Output: Interpretation: While the parent expresses a general Interpretation: The parent explicitly expresses a desire for their child’s well-being and happiness, there desire for their child to pursue a career as a lawyer, is no explicit mention of a specific secular job or which is a secular job. Therefore, the “Secular Job” career. Therefore, the “Not Applicable” label should label should be assigned. be assigned. Label: Secular Job Label: Not Applicable Input: Input: Interviewer: Do you have any specific career in mind Interviewer: What are your dreams for your child’s for your daughter? future? Respondent: Yes, I would like her to pursue a career Respondent: My dream is for my child to be a good in engineering. 
She has always been fascinated by person and contribute positively to society. how things work. Output: Output: Interpretation: While the parent expresses a general Interpretation: The parent explicitly expresses a desire for their child’s future, there is no explicit desire for their child to pursue a career in engineering, mention of a specific secular job or career. Therefore, which is a secular job. Therefore, the “Secular Job” the “Not Applicable” label should be assigned. label should be assigned. Label: Not Applicable Label: Secular Job 25 Marriage System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations concerning their children’s futures. Specifically, your role is to discern whether the parent explicitly expresses any intention or wishes related to their child’s potential future marriage. Assign the “Marriage” label only if the parent articulates a desire or intention for their child to get married in the future. If there is no such reference to marriage in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of marriage was not addressed. Few-shot examples Input: Input: Interviewer: What are your aspirations for your son, Interviewer: What do you hope for your eldest son, Ahmed? Omar? Respondent: My son is very smart, I hope he will get Respondent: Omar is a good boy. I hope he will find a good job in the future. And when he is settled, I a good wife and start a family of his own. would like him to marry a good woman. Output: Output: Interpretation: The respondent explicitly mentions Interpretation: The respondent explicitly mentions their hope for their son to find a good wife and start their desire for their son to get married in the future, a family, which directly relates to the child’s potential after getting a good job. This directly relates to future marriage. Therefore, the label “Marriage” the child’s potential future marriage, so the label should be assigned. “Marriage” should be assigned. Label: Marriage Label: Marriage Input: Input: Interviewer: Can you tell me about your dreams for Interviewer: What are your dreams for your children? your daughter’s future? Respondent: I want them to be happy and successful Respondent: I want my daughter to become a in their respective fields. I have always encouraged successful lawyer. She’s always had a passion for them to pursue their passions. justice. Output: Output: Interpretation: The respondent talks about their Interpretation: The respondent mentions their aspirations for their children’s happiness and success, aspirations for their daughter’s future, but it is related but there is no explicit mention of marriage. to her career, not her marital status. Since there Therefore, the label “Not Applicable” should be is no explicit mention of marriage, the label “Not assigned. Applicable” should be assigned. Label: Not Applicable Label: Not Applicable 26 Migration System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations concerning their children’s futures. Specifically, your role is to discern whether the parent explicitly expresses any intention or wishes related to their child’s potential future migration. Assign the “Migration” label only if the parent explicitly articulates a desire or intention for their child to migrate in the future for reasons such as work, studying, or resettling. 
If there is no such direct reference to migration in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of migration was not addressed. Few-shot examples Input: Input: Interviewer: What are your thoughts on your Interviewer: Have you considered options for your daughter’s future after she completes her studies? daughter’s higher education? Respondent: I am hoping to send her to a foreign Respondent: Yes, I am considering sending her university for her post-graduation studies. I believe abroad for her Master’s degree. I think she will it will provide her with better opportunities. benefit from the exposure and opportunities there. Output: Output: Interpretation: The parent is explicitly expressing a Interpretation: The parent is explicitly expressing a desire for their child to migrate for further studies. desire for their child to migrate for higher education. This aligns with the “Migration” label as the parent This aligns with the “Migration” label as the parent is directly referring to the child’s potential future is directly referring to the child’s potential future migration. migration. Label: Migration Label: Migration Input: Input: Interviewer: What are your aspirations for your son’s Interviewer: What are your plans for your son after future? he completes his high school? Respondent: I want him to have a good education Respondent: I want him to enrol in a good university and a successful career. I hope he becomes a here and study engineering. We have excellent respected doctor. universities in our country. Output: Output: Interpretation: The parent is discussing their child’s Interpretation: The parent is discussing their child’s future but there is no explicit mention of migration. future but there is no explicit mention of migration. The parent is expressing aspirations related to the The parent is expressing aspirations related to the child’s career but not necessarily in the context of child’s education but it is specifically mentioned to moving abroad. Hence, this does not align with the be within their own country. Hence, this does not “Migration” label. align with the “Migration” label. Label: Not Applicable Label: Not Applicable 27 No Ambition System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations for their children’s futures. Specifically, your role is to discern whether the parent expresses helplessness, unwillingness, or a lack of dreams or plans for their child’s future. Assign the “No Ambition” label only if the parent explicitly conveys a lack of ambitions or plans for their child’s future or if they demonstrate a sense of helplessness or unwillingness concerning their child’s future. If there is no such direct expression of a lack of ambition in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of ambition was not addressed. Few-shot examples Input: Input: Interviewer: What are your hopes for your daughter’s Interviewer: What are your plans for your child’s future? education? Respondent: What future? We are just surviving. Respondent: I can’t afford to send him to school. She will just have to get married and that’s it. He’ll have to work like the rest of us. 
Output: Output: Interpretation: The respondent expresses a sense of Interpretation: The respondent expresses a sense helplessness regarding their daughter’s future and of helplessness and lack of plans for their child’s does not express any dreams or plans for her future education, indicating a lack of ambition for the child’s beyond getting married. This aligns with the “No future. This aligns with the “No Ambition” label. Ambition” label. Label: No Ambition Label: No Ambition Input: Input: Interviewer: What are your dreams for your son’s Interviewer: How do you see your child’s future? future? Respondent: I hope he will be able to get a good Respondent: I want him to become a doctor and education and have a successful career. serve the community. Output: Output: Interpretation: The respondent expresses hopes Interpretation: The respondent clearly expresses a for their child’s future, indicating that they have dream for their child’s future, indicating that they ambitions for the child. This does not align with the have ambitions for the child. This does not align with “No Ambition” label. the “No Ambition” label. Label: Not Applicable Label: Not Applicable 28 Reliance on God System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their reliance on divine intervention or faith in God to support their children’s education and future aspirations. Specifically, your role is to discern whether the parent explicitly expresses full or partial reliance on God to fulfill their dreams for their children or if they currently rely on God’s will in their lives. Assign the “Reliance on God” label only if the parent explicitly articulates a full or partial dependence on God to accomplish their aspirations for their children’s future or if they demonstrate a current reliance on divine will. If there is no such direct reference to reliance on God in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of reliance on divine intervention was not addressed. Few-shot examples Input: Input: Interviewer: What are your hopes for your children’s Interviewer: Do you have any dreams for your future? daughter’s future? Respondent: I hope they will do well in their studies Respondent: Yes, I dream that she will become a and become successful. But ultimately, it’s all up to doctor. But we leave everything in God’s hands. If God’s will. it’s His will, it will happen. Output: Output: Interpretation: The respondent clearly states their Interpretation: The respondent expresses a dream for hopes for their children’s future, but also expresses a their daughter’s future but also mentions that the reliance on God’s will to determine the outcome. This fulfillment of this dream is dependent on God’s will. demonstrates a partial reliance on divine intervention This shows a reliance on divine intervention for their for their children’s future. child’s future. Label: Reliance on God Label: Reliance on God Input: Input: Interviewer: How do you plan to support your Interviewer: What are your plans for your son’s children’s education? education? Respondent: I work hard every day to provide for Respondent: I just want him to do his best. The rest their needs. I believe in the power of hard work and is up to him. dedication. 
Output: Output: Interpretation: The respondent expresses a desire for Interpretation: The respondent talks about their son to do well in his education, but does not supporting their children’s education through their mention any reliance on God or divine intervention. own efforts and does not mention any reliance on Therefore, the “Reliance on God” label does not apply God or divine intervention. Therefore, the “Reliance in this case. on God” label does not apply in this case. Label: Not Applicable Label: Not Applicable 29 Religious Aspirations System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations concerning their children’s future education and careers solely in the context of religion, without implying any monetary incentives. Specifically, your role is to discern whether the parent explicitly expresses a desire for their child to pursue a religious path. Assign the “Religious Aspiration” label only if the parent explicitly articulates aspirations for their child that include becoming a religious scholar, Hafiz, attending a madrassa, learning Arabic, Quran reading, Islamic covering, regular prayer, working in Islamic banks, or being a great Islamic scholar. It is crucial not to assign this label if the child is already engaged in any of the mentioned activities, as current religious activities are not considered future aspirations. If there is no such direct reference to religious aspirations in the parent’s statement, or if there is a reference to secular aspirations, assign the “Not Applicable” label to indicate that the topic of religious aspirations was not addressed. Few-shot examples Input: Interviewer: What are your aspirations for your son’s Input: future? Interviewer: What do you envision for your child’s Respondent: I hope that he will become a great future? Islamic scholar. I want him to learn Arabic and read Respondent: I would like my son to become a Hafiz. the Quran regularly. It’s a great honor in our religion. Output: Output: Interpretation: The parent explicitly expresses a Interpretation: The parent explicitly expresses a desire for their child to pursue a path in religious desire for their child to become a Hafiz, which education, specifically mentioning becoming an is a religious aspiration as per the instruction. Islamic scholar, learning Arabic, and reading the Therefore, this conversation aligns with the “Religious Quran regularly. Therefore, this conversation aligns Aspiration” label. with the “Religious Aspiration” label. Label: Religious Aspiration Label: Religious Aspiration Input: Input: Interviewer: Do you have any specific dreams for Interviewer: What are your hopes for your children’s your daughter? future careers? Respondent: I want her to excel in her studies and Respondent: I want them to have successful careers. become a successful engineer. I hope my daughter becomes a doctor and my son becomes a lawyer. Output: Interpretation: While the parent does express Output: aspirations for their child’s future, these aspirations Interpretation: While the parent does express are not religious in nature. The parent hopes for aspirations for their children’s future, these their child to become an engineer, which doesn’t aspirations do not involve any religious activities involve any religious activities as per the instruction. or pursuits as per the instruction. 
Therefore, this Therefore, this conversation should be labeled as “Not conversation should be labeled as “Not Applicable”. Applicable”. Label: Not Applicable Label: Not Applicable 30 Secular Aspirations System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations concerning their children’s future education, career, and personal development without religious context. Specifically, your role is to discern whether the parent explicitly expresses a desire for their child to develop positive character traits, moralistic values, or general well-being. Assign the “Secular Aspiration” label only if the parent explicitly articulates aspirations for their child that include becoming a good person, being responsible, having good manners, living a healthy life, achieving professional success, or having a good family. If there is no such direct reference to secular aspirations in the parent’s statement, or if there is a reference to religious aspirations, assign the “Not Applicable” label to indicate that the topic of secular aspirations was not addressed. Few-shot examples Input: Input: Interviewer: Can you share your aspirations for your Interviewer: What are your dreams for your children? children? Respondent: My dream for my children is that Respondent: I want them to be kind, respectful and they grow up to be responsible, hardworking, and educated individuals. I hope they lead a healthy life successful in their careers. I also hope they have a and contribute positively to society. happy family life. Output: Output: Interpretation: The respondent explicitly expresses Interpretation: The respondent clearly articulates a desire for their children to develop positive traits secular aspirations for their children, which include such as kindness and respectfulness. They also becoming responsible and hardworking individuals, mention the importance of education and leading a achieving professional success, and having a good healthy life, which are secular aspirations for their family life. Therefore, the dialogue should be assigned children. Therefore, the dialogue should be assigned the “Secular Aspiration” label. the “Secular Aspiration” label. Label: Secular Aspiration Label: Secular Aspiration Input: Input: Interviewer: What kind of person would you like your Interviewer: What kind of future do you envision for child to become and how do you want them to live your children? their life? Respondent: I just hope they’re happy, whatever Respondent: I hope they will follow our religious they choose to do. teachings, pray regularly, and become a great Islamic scholar. Output: Interpretation: While the respondent expresses a Output: desire for their children’s happiness, they do not Interpretation: The respondent discusses faith-based explicitly state any secular aspirations such as aspirations for their child, such as following religious developing positive character traits or achieving teachings, praying regularly, and becoming a great professional success. Therefore, the dialogue should Islamic scholar. Therefore, the dialogue should be be assigned the “Not Applicable” label. assigned the “Not Applicable” label. Label: Not Applicable Label: Not Applicable 31 Vocational Training System prompt Your task involves interpreting and analyzing transcripts of interviews conducted with parents, focusing on their aspirations concerning their children’s future vocational training. 
Specifically, your role is to discern whether the parent explicitly expresses a desire or intention related to their child pursuing vocational training or learning a specific skill in the context of ambition, such as tailoring, handicrafts, or sewing. The label should only be assigned if training, teaching, or learning is mentioned in the context of ambition. Assign the “Vocational Training” label only if the parent explicitly articulates a desire or intention for their child to undergo vocational training or learn a particular skill in the future as part of their ambition. If there is no such direct reference to vocational training in the parent’s statement, assign the “Not Applicable” label to indicate that the topic of vocational training was not addressed. Few-shot examples Input: Input: Interviewer: What are your aspirations for your Interviewer: How do you envision your child’s future? youngest daughter? Respondent: Well, I believe in letting him choose his Respondent: I want her to learn sewing. It’s a skill own path. But I do encourage him to learn carpentry. that can help her in the future. It’s a useful skill to have. Output: Output: Interpretation: The parent explicitly mentions a Interpretation: The parent expresses a desire for their desire for their daughter to learn sewing, a vocational child to learn carpentry, a vocational skill. Therefore, skill. Therefore, the dialogue should be labeled as the dialogue should be labeled as ’Vocational ’Vocational Training’. Training’. Label: Vocational Training Label: Vocational Training Input: Input: Interviewer: Do you have any specific plans for your Interviewer: What are your dreams for your children? son’s future? Respondent: I just want them to be happy and Respondent: I hope he continues his studies and healthy. That’s all that matters. becomes a doctor one day. Output: Output: Interpretation: The parent expresses a general wish Interpretation: Although the parent expresses a for their children’s well-being, but there is no explicit desire for their child’s future, there is no mention mention of vocational training or learning a specific of vocational training or learning a specific skill. skill. Therefore, the dialogue should be labeled as Therefore, the dialogue should be labeled as ’Not ’Not Applicable’. Applicable’. Label: Not Applicable Label: Not Applicable 32 A.1 LLM prompts for Data Augmentation We directed the LLM with the following “system prompt”: Revise a given dialogue between an interviewer and respondent while maintaining the core meaning. Additionally, you may alter names, places, and other identifying information while preserving the core meaning of the dialogue. Create a single revision of the dialogue in the following format: Interviewer: (question/statement by the interviewer) Respondent: (answer by the respondent) B iQual Appendix This Appendix summarises the key features of the iQual approach to supervised NLP for qualitative analysis. The method is set out in more detail in Ashwin et al. (2022) and available through a Python package at https://github.com/worldbank/iQual. B.1 Summary of Approach The overall goal of our iQual is to use our subset of annotated interviews to provide reliable annotations for the remainder of the sample. Broadly, we do this by training a series of classifier models on our annotated set and then using this model to predict annotations for the unannotated set. 
We train and predict for each annotation separately, so the model for Religious Aspiration is trained and makes its predictions separately from the model for Secular Aspiration. As the qualitative annotations are defined at the level of question-answer pairs (QA), we can represent each annotation as a binary classification problem at the QA level. Figure 6, also shown in Ashwin et al. (2022), illustrates our overall methodology for a single annotation. On the left hand side we see a "human" sample of size $N_h$, in which interviews include both text $w$ and annotations $y$, and a "machine" sample in which interviews include only the text. As annotations are defined at the QA (question-answer pair) level, we represent $w^h_{i,s}$ as the $s$th QA in interview $i$ in the human sample, with $y^h_{i,s}$ being the binary annotation on that QA. In other words, for the annotation Religious Aspiration, $y^h_{i,s}$ will be equal to one if that QA has been annotated as displaying religious aspirations, and will be zero otherwise.

Figure 6: Methodology. [Diagram summarising the approach: annotated QA pairs from the human sample ($N_h$ interviews) are used to train a classifier; in-sample predictions are aggregated to the interview level and compared with the observed annotations to assess measurement errors, bias, efficiency and interpretability; the classifier then predicts annotations for the machine sample ($N_m$ interviews), which are aggregated and combined with the human annotations into an enhanced sample used for substantive analysis.]

We then train some classifier $f(\cdot)$, parameterised by $\theta$, to predict $y^h_{i,s}$ based on the QA text $w^h_{i,s}$. As we will discuss below, there are many options for both the classifier we can use here and how to represent the text numerically. A key point is that the text representation must be fully unsupervised, i.e. we do not use any information about $y$ or any further information about the interview subject when creating a numerical representation of the text. The text representation, classifier and a variety of hyperparameters are chosen using k-fold cross-validation. Given this trained classifier we can then predict annotations at the QA level for our unannotated "machine" sample. This gives us the predicted annotations $\hat{y}^m_{i,s}$. We then aggregate the QA-level annotations to the interview level using an aggregation function $g(\cdot)$. The choice of this aggregation function is at least in part a substantive question that depends on the research question. For example, if we take the mean value of $y$ across QA pairs for each interview, this gives us a measure of the intensity with which this concept comes up. On the other hand, if we take the maximum value across the interview, this gives us a measure of interviews in which this concept comes up at least once. We perform this aggregation for the observed human annotations $Y^h$, the "in-sample" predicted human annotations $\hat{Y}^h$, and the "out-of-sample" predicted machine annotations $\hat{Y}^m$. The predicted annotations for the human sample can then be used to assess the measurement errors introduced by the model. For the quantification of measurement errors, we also make use of bootstrapping. The observed human annotations and machine annotations are then combined to give an enhanced sample $\tilde{Y}$. Once we have verified that the enhancement does indeed add value, we proceed with substantive analysis.
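The sketch below illustrates this workflow for a single annotation using scikit-learn. It is not the iQual package itself (available at https://github.com/worldbank/iQual), and the data frame column names (interview_id, text, religious_aspiration), the small hyperparameter grid, and the choice of TF-IDF plus logistic regression are illustrative assumptions rather than the settings used in the paper.

# Illustrative sketch (not the iQual package): train a per-annotation classifier
# on human-annotated QA pairs, select the text representation and model by
# k-fold cross-validation, predict on the unannotated "machine" sample, and
# aggregate predictions to the interview level by taking the mean.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

def train_annotation_model(human_qa: pd.DataFrame, label: str) -> GridSearchCV:
    """Fit one binary classifier for a single annotation (e.g. 'religious_aspiration').

    `human_qa` is assumed to have one row per QA pair, with a 'text' column
    (question plus answer) and a 0/1 column named by `label`.
    """
    pipe = Pipeline([
        ("vec", TfidfVectorizer()),            # unsupervised text representation
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Cross-validate over the text representation and classifier settings
    # (a much smaller grid than the full set of options listed in Tables 1-6).
    grid = {
        "vec": [TfidfVectorizer(), CountVectorizer()],
        "vec__ngram_range": [(1, 1), (1, 2)],
        "vec__max_features": [1000, 10000],
        "clf__C": [0.01, 1.0, 100.0],
    }
    search = GridSearchCV(pipe, grid, cv=5, scoring="f1")
    search.fit(human_qa["text"], human_qa[label])
    return search

def aggregate_to_interviews(qa: pd.DataFrame, col: str) -> pd.Series:
    """Aggregation function g(): mean of the QA-level annotation within each interview."""
    return qa.groupby("interview_id")[col].mean()

# Hypothetical usage, assuming data frames `human_qa` and `machine_qa`:
# model = train_annotation_model(human_qa, "religious_aspiration")
# human_qa["pred"] = model.predict(human_qa["text"])     # in-sample, to assess errors
# machine_qa["pred"] = model.predict(machine_qa["text"]) # out-of-sample predictions
# Y_h = aggregate_to_interviews(human_qa, "religious_aspiration")  # observed
# Y_m_hat = aggregate_to_interviews(machine_qa, "pred")            # predicted

Replacing the mean in aggregate_to_interviews with a maximum would instead give the "comes up at least once" measure discussed above.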
B.2 Choices

There are many possible options for the numerical representation of the text $w$, the classifier $f(\cdot)$ and the aggregation function $g(\cdot)$. We fix the aggregation function as taking the mean across QA pairs, and we use cross-validation to select the text representation and the classifier. As we train the classifier for each annotation independently, this allows for the fact that a different classification model or text representation may be optimal for different annotations. The following gives an exhaustive list of the text representations, models and hyperparameters that are selected over during cross-validation. In addition to these options, we also cross-validate whether the question is included along with the answer in the text representation.

Table 1: Statistical methods for text vectorization

TfidfVectorizer
Description: TfidfVectorizer is a method for converting text into numerical representations, specifically term frequency-inverse document frequency (TF-IDF) vectors. It counts the frequency of words in a document and down-weights the importance of commonly used words. This can be useful for text classification tasks, as it allows the classifier to focus on the words that are most distinctive to a particular document.
Hyperparameters (Used):
• max_features: The maximum number of words to keep in the vocabulary, based on word frequency. [1000, 10000]
• ngram_range: The lower and upper boundary of the range of n-values for different word n-grams to be extracted. { (1,1), (1,2), (1,3) }
Hyperparameters (Options):
• ngram_range: The range of n-grams to consider when creating the vocabulary.
• min_df: The minimum number of documents a word must be in to be included in the vocabulary.
• max_df: The maximum number of documents a word can be in to be included in the vocabulary.
• max_features: The maximum number of words to keep in the vocabulary, based on word frequency.
• use_idf: A boolean flag indicating whether to use the inverse-document-frequency weighting.
• norm: The type of normalization to apply to the vectors.
• smooth_idf: A boolean flag indicating whether to smooth the idf values.
• sublinear_tf: A boolean flag indicating whether to apply sublinear scaling to the term frequency.

CountVectorizer
Description: CountVectorizer is a method for converting text into numerical representations, specifically a sparse matrix of word counts. It counts the frequency of words in a document and does not down-weight the importance of commonly used words. This can be useful for text classification tasks, as it allows the classifier to consider all words equally, rather than down-weighting the importance of commonly used words.
Hyperparameters (Used):
• max_features: The maximum number of words to keep in the vocabulary, based on word frequency. [1000, 10000]
• ngram_range: The lower and upper boundary of the range of n-values for different word n-grams to be extracted. { (1,1), (1,2), (1,3) }
• binary: Whether to use binary or frequency counts. {True, False}
Hyperparameters (Options):
• ngram_range: The range of n-grams to consider when creating the vocabulary.
• min_df: The minimum number of documents a word must be in to be included in the vocabulary.
• max_df: The maximum number of documents a word can be in to be included in the vocabulary.
• max_features: The maximum number of words to keep in the vocabulary, based on word frequency.
• binary: A boolean flag indicating whether to create binary vectors, with 0/1 values indicating the presence/absence of a word in a document. 36 Table 2: Pre-trained embeddings for text vectorization Model Name Dimensions Description all-mpnet-base-v2 768 This a pre-trained language understanding model that combines the advantages of masked language modeling (MLM) and permuted language modeling (PLM) to address the limitations of both methods. It leverages the dependency among predicted tokens through PLM and takes auxiliary position information as input to make the model see a full sentence, reducing the position discrepancy between pre-training and fine-tuning. This model was pre-trained on a large-scale dataset and generates a vector of 768 dimensions. all-roberta-large-v1 1024 This is a pre-trained language understanding model with a vector representation of 1024 dimensions. It was developed as an improvement upon the BERT model and was trained using the masked language modeling (MLM) objective. It has achieved strong performance on natural language processing tasks and can be fine- tuned on labeled datasets for specific tasks such as classification or language translation. average_word_embeddings 300 This is a method for converting text into numerical representations, _glove.6B.300d specifically word embeddings. It uses a pre-trained GloVe model to generate 300-dimensional vector representations for each word in a document, and then averages these vectors to create a single representation for the entire document. This can be useful for text classification tasks, as it allows the classifier to consider the semantic relationships between words, rather than just their frequencies. distiluse-base-multilingual- 512 This is a pre-trained language understanding model that maps text cased-v2 into a 512-dimensional vector representation. It is a smaller and faster version of the popular transformer model, BERT, and has been trained on a large multilingual dataset, allowing it to process text in multiple languages. It has also been cased, meaning it can distinguish between upper and lower case letters. This model is useful for natural language processing tasks such as language translation and text classification, and can be fine-tuned on labeled datasets for specific tasks. 37 Table 3: Classifier Options I Method Description Hyperparameters (Options) Hyperparameters (Used) LogisticRegression This is a linear classifier that uses a logistic • C: The inverse of the • penalty: The type of function to predict regularization strength, with regularization to use: L1 or the probability of a higher values indicating less L2. sample belonging to regularization. • C: Inverse of regularization a particular class. It • penalty: The type of strength. [0.00002, 10000] is commonly used for regularization to use, either binary classification L1 or L2. tasks, but can also • fit_intercept: A boolean flag be used for multi- indicating whether to fit an class classification by intercept term. implementing a one- • tol: The tolerance for stopping versus-rest approach. criteria. • intercept_scaling: The scaling of the intercept term, if it is being fitted. • class_weight: The class weights to use for unbalanced classes. • max_iter: The maximum number of iterations for the optimization algorithm. SGDClassifier This is a linear classifier that uses stochastic • loss: The loss function to use, • loss: The loss function to use. 
gradient descent to with options such as "hinge", ("modified_huber") learn the parameters of "log", "modified_huber", • penalty: The type of the model. The modified "squared_hinge", and regularization to use: L1 or huber loss function is a "perceptron". L2. smooth approximation • penalty: The type of • learning_rate: The learning rate of the hinge loss, which regularization to use, with schedule to use. ("optimal") is commonly used for options such as L1, L2, • alpha: The constant that linear classification "elasticnet", and "none". multiplies the regularization tasks. • alpha: The regularization term. [0.00002, 1000] strength, with higher values indicating stronger regularization. • l1_ratio: The proportion of L1 regularization to use in the elasticnet penalty. • tol: The tolerance for the stopping criteria. • learning_rate: The learning rate for the optimization algorithm, with options such as "constant", "optimal", and "invscaling". • eta0: The initial learning rate for the "constant" and "invscaling" learning rate schedules. • power_t: The exponent for the "invscaling" learning rate schedule. 38 Table 4: Classifier Options II Method Description Hyperparameters (Options) Hyperparameters (Used) RandomForestClassifier This is an ensemble classifier that uses • n_estimators: The number of • n_estimators: The number of multiple decision trees decision trees in the forest. trees in the forest. [100, 1000] to make predictions. • criterion: The function to • max_depth: The maximum It randomly selects a measure the quality of a split, depth of the tree. [10, 100] subset of features to with options such as "gini" and consider at each split "entropy". in the tree, which helps • max_depth: The maximum to reduce overfitting depth of the decision tree. and improve the • min_samples_split: The generalization of the minimum number of samples model. required to split an internal node. • min_samples_leaf: The minimum number of samples required to be at a leaf node. • min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights required to be at a leaf node. • max_features: The number of features to consider when looking for the best split. • max_leaf_nodes: The maximum number of leaf nodes in the tree. • min_impurity_decrease: The minimum decrease in impurity required to split the node. • bootstrap: A boolean flag indicating whether to use bootstrap samples when building the trees. • oob_score: A boolean flag indicating whether to use out- of-bag samples to estimate the generalization error. DecisionTreeClassifier This is a classifier that uses a tree structure to • criterion: The function to • max_depth: The maximum make decisions based on measure the quality of a split, depth of the tree. [5, 100] the features of a sample. with options such as "gini" and • min_impurity_decrease: A node At each node in the tree, "entropy". will be split if this split induces a the classifier considers a • splitter: The strategy to use decrease of the impurity greater single feature and splits when searching for a split, with than or equal to this value. the data based on the options such as "best" and [0.00002,10000] value of that feature. "random". The final decision is • max_depth: The maximum made based on the path depth of the tree. taken through the tree. • min_samples_split: The minimum number of samples required to split an internal node. • min_samples_leaf: The minimum number of samples required to be at a leaf node. 
• min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights required to be at a leaf node. • max_features: The number of features to consider when looking for the best split. • max_leaf_nodes: The maximum number of leaf nodes in the tree. • min_impurity_decrease: The minimum decrease in impurity required to split the node. 39 Table 5: Classifier Options III Method Description Hyperparameters (Options) Hyperparameters (Used) MLPClassifier This is a classifier that uses a neural network • hidden_layer_sizes: The • hidden_layer_sizes: The ith with multiple layers number of neurons in each element represents the number of to make predictions. hidden layer. neurons in the ith hidden layer. It is commonly used • activation: The activation [(100,), (100, 100), (100, 100, for classification function to use, with options 100)] tasks and can handle such as "identity", "logistic", • activation: Activation function both continuous and "tanh", and "relu". for the hidden layer. ("tanh", categorical data. The • solver: The algorithm to use for "relu") number of layers and the optimization, with options such • alpha: L2 penalty (regularization number of units in each as "lbfgs", "sgd", and "adam". term) parameter. [0.01, 1] layer can be adjusted to • alpha: The regularization fit the complexity of the strength, with higher task. values indicating stronger regularization. • batch_size: The number of samples to use in each iteration of the optimization algorithm. • learning_rate: The learning rate for the optimization algorithm, with options such as "constant", "invscaling", and "adaptive". • learning_rate_init: The initial learning rate for the "constant" and "invscaling" learning rate schedules. • power_t: The exponent for the "invscaling" learning rate schedule. • max_iter: The maximum number of iterations to run the optimization algorithm. • shuffle: A boolean flag indicating whether to shuffle the training data before each epoch. • tol: The tolerance for the stopping criteria. • warm_start: A boolean flag indicating whether to reuse the solution of the previous call to fit. • momentum: The momentum for the optimization algorithm. • nesterovs_momentum: A boolean flag indicating whether to use Nesterov’s momentum. • early_stopping: A boolean flag indicating whether to use early stopping to terminate the optimization early. • validation_fraction: The fraction of the training data to use as validation data for early stopping. • beta_1: The beta 1 parameter for the Adam optimization algorithm. 40 Table 6: Classifier Options III Method Description Hyperparameters (Options) Hyperparameters (Used) KNeighborsClassifier This is a non-parametric classifier that uses the • n_neighbors: The number of • n_neighbors: Number of K nearest neighbors neighbors to use when making a neighbors to use by default for of a sample to make prediction. kneighbors queries. [10,10000] a prediction. It is • weights: The weight function to • weights: weight function used commonly used for use when making a prediction, in prediction. ("uniform", classification tasks with options such as "uniform" "distance") and can handle and "distance". both continuous and • algorithm: The algorithm to use categorical data. The for finding the nearest neighbors, number of neighbors with options such as "brute" and to consider (K) is a "kd_tree". hyperparameter that • leaf_size: The number of points can be adjusted to fit at which to switch to a brute the complexity of the force search for the nearest task. neighbors. 
• p: The power parameter for the Minkowski distance metric. • metric: The distance metric to use, with options such as "euclidean", "manhattan", and "minkowski". • metric_params: Additional parameters for the distance metric. SVC This is a classifier that uses a support vector • C: The regularization strength, • C: Penalty parameter C of the machine (SVM) to find with higher values indicating error term. [0.00001, -00] the optimal hyperplane stronger regularization. to separate the different • kernel: The kernel to use for the classes. It is commonly decision function, with options used for classification such as "linear", "poly", "rbf", tasks and can handle "sigmoid", and "precomputed". both continuous and • degree: The degree of the categorical data. The polynomial kernel. kernel function used to • gamma: The kernel coefficient project the data into for the rbf, poly, and sigmoid a higher dimensional kernels. space can be adjusted to • coef0: The independent term fit the complexity of the in the polynomial and sigmoid task. kernels. • shrinking: A boolean flag indicating whether to use the shrinking heuristic. • probability: A boolean flag indicating whether to enable probability estimates. • tol: The tolerance for the stopping criteria. • class_weight: The class weights to use for unbalanced classes. • verbose: The level of verbosity in the output. • decision_function_shape: The shape of the decision function, with options such as "ovo" and "ovr". 41 C Bias tests Table 7: Bias tests for Aspiration: Religious Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.0347∗∗∗ -0.0716∗∗∗ -0.0139 -0.0032 −1.55 × 10−5 (0.0132) (0.0185) (0.0126) (0.0104) (0.0107) data_roundR3 0.0059 -0.0206 -0.0233∗ 0.0116 0.0248∗∗ (0.0127) (0.0178) (0.0121) (0.0100) (0.0103) refugee -0.0101 -0.0167 -0.0005 0.0039 -0.0044 (0.0083) (0.0116) (0.0079) (0.0065) (0.0067) num_child 0.0007 -0.0008 0.0008 -0.0003 -0.0018 (0.0021) (0.0029) (0.0020) (0.0017) (0.0017) hh_head_sex -0.0076 0.0115 0.0003 -0.0091 -0.0009 (0.0075) (0.0105) (0.0072) (0.0059) (0.0061) hh_head_age 0.0004 0.0006 -0.0004 0.0004 0.0004 (0.0003) (0.0005) (0.0003) (0.0003) (0.0003) parent_eduyears 1.25 × 10−5 -0.0020∗ -0.0004 -0.0004 1.67 × 10−5 (0.0008) (0.0012) (0.0008) (0.0007) (0.0007) parent_reledu -0.0088 -0.0231 -0.0039 0.0073 0.0063 (0.0142) (0.0199) (0.0136) (0.0112) (0.0115) eld_sex -0.0052 -0.0113 -0.0012 -0.0074 -0.0087∗ (0.0058) (0.0081) (0.0055) (0.0046) (0.0047) eld_age −9.15 × 10−5 0.0003 0.0007 -0.0003 -0.0006∗ (0.0005) (0.0007) (0.0004) (0.0004) (0.0004) hh_asset_index -0.0022 -0.0041 0.0002 -0.0016 -0.0015 (0.0024) (0.0033) (0.0023) (0.0019) (0.0019) hh_income 0.0008 0.0027 0.0010 -0.0009 -0.0006 (0.0017) (0.0024) (0.0016) (0.0014) (0.0014) int_trauma_exp 0.0019 0.0010 0.0023∗ -0.0004 0.0012 (0.0012) (0.0017) (0.0012) (0.0010) (0.0010) Fit statistics Observations 667 667 667 667 667 R2 0.01430 0.02935 0.01555 0.02448 0.02312 Adjusted R2 -0.00378 0.01154 -0.00252 0.00659 0.00520 F-test 0.79090 1.6477 0.86071 1.3679 1.2898 F-test, p-value 0.66022 0.07445 0.58739 0.17634 0.21956 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 42 Table 8: Bias tests for Aspiration: Secular Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.1790∗∗∗ -0.0602∗∗ -0.0812∗∗∗ -0.0247 -0.0113 (0.0301) (0.0243) (0.0242) (0.0152) (0.0137) data_roundR3 0.0877∗∗∗ 0.0308 -0.0205 0.0064 0.0075 (0.0289) (0.0233) (0.0233) (0.0146) (0.0132) refugee 0.0243 0.0054 0.0166 0.0137 0.0052 (0.0188) (0.0152) (0.0152) (0.0095) (0.0086) num_child 0.0052 0.0054 0.0013 0.0039 0.0012 (0.0048) (0.0039) (0.0038) (0.0024) (0.0022) hh_head_sex 0.0130 0.0018 -0.0039 -0.0026 -0.0008 (0.0171) (0.0138) (0.0138) (0.0086) (0.0078) hh_head_age -0.0006 8.01 × 10−5 3.67 × 10−5 0.0002 −4.79 × 10−6 (0.0007) (0.0006) (0.0006) (0.0004) (0.0003) parent_eduyears -0.0039∗∗ -0.0030∗ -0.0012 -0.0018∗ -0.0004 (0.0019) (0.0015) (0.0015) (0.0010) (0.0009) parent_reledu 0.0294 0.0085 -0.0036 -0.0015 0.0127 (0.0323) (0.0261) (0.0261) (0.0163) (0.0147) eld_sex 0.0043 -0.0086 0.0100 -0.0008 -0.0012 (0.0132) (0.0106) (0.0106) (0.0067) (0.0060) eld_age −9.1 × 10−5 -0.0006 0.0004 −6.99 × 10−5 -0.0001 (0.0011) (0.0009) (0.0009) (0.0005) (0.0005) hh_asset_index -0.0026 -0.0024 -0.0008 0.0046∗ 0.0028 (0.0054) (0.0044) (0.0044) (0.0027) (0.0025) hh_income -0.0022 0.0012 0.0036 -0.0008 -0.0008 (0.0039) (0.0032) (0.0032) (0.0020) (0.0018) int_trauma_exp -0.0021 -0.0010 -0.0012 0.0003 −7.47 × 10−5 (0.0028) (0.0023) (0.0023) (0.0014) (0.0013) Fit statistics Observations 667 667 667 667 667 R2 0.09448 0.02157 0.01093 0.01966 0.00522 Adjusted R2 0.07786 0.00362 -0.00721 0.00167 -0.01303 F-test 5.6861 1.2017 0.60246 1.0930 0.28618 F-test, p-value 2.3 × 10−9 0.27754 0.84103 0.36292 0.99143 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 43 Table 9: Bias tests for Ambition: No Ambition Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.1563∗∗∗ -0.4332∗∗∗ -0.5035∗∗∗ -0.0085 -0.0087 (0.0219) (0.0336) (0.0352) (0.0075) (0.0081) data_roundR3 0.0406∗ 0.0864∗∗∗ -0.0430 -0.0044 -0.0061 (0.0210) (0.0323) (0.0338) (0.0072) (0.0078) refugee 0.0385∗∗∗ 0.0284 0.0613∗∗∗ 0.0136∗∗∗ 0.0150∗∗∗ (0.0137) (0.0210) (0.0221) (0.0047) (0.0051) num_child -0.0032 0.0014 -0.0087 -0.0002 -0.0008 (0.0035) (0.0053) (0.0056) (0.0012) (0.0013) hh_head_sex -0.0436∗∗∗ -0.0442∗∗ -0.0607∗∗∗ -0.0054 -0.0107∗∗ (0.0124) (0.0191) (0.0200) (0.0043) (0.0046) hh_head_age 3.79 × 10−5 -0.0012 -0.0011 0.0002 0.0002 (0.0005) (0.0008) (0.0009) (0.0002) (0.0002) parent_eduyears 0.0061∗∗∗ 0.0052∗∗ 0.0085∗∗∗ -0.0001 -0.0004 (0.0014) (0.0021) (0.0022) (0.0005) (0.0005) parent_reledu 0.0138 0.0115 0.0124 -0.0104 -0.0161∗ (0.0235) (0.0362) (0.0379) (0.0081) (0.0087) eld_sex -0.0030 0.0005 -0.0112 -0.0021 9.85 × 10−5 (0.0096) (0.0147) (0.0154) (0.0033) (0.0036) eld_age -0.0004 -0.0011 −3.39 × 10−6 7.4 × 10−5 7.2 × 10−5 (0.0008) (0.0012) (0.0012) (0.0003) (0.0003) hh_asset_index 0.0042 0.0111∗ 0.0148∗∗ -0.0010 -0.0012 (0.0039) (0.0060) (0.0063) (0.0014) (0.0015) hh_income 0.0011 0.0025 -0.0009 0.0004 0.0006 (0.0028) (0.0044) (0.0046) (0.0010) (0.0011) int_trauma_exp 0.0004 -0.0002 -0.0028 -0.0002 -0.0009 (0.0021) (0.0032) (0.0033) (0.0007) (0.0008) Fit statistics Observations 667 667 667 667 667 R2 0.08578 0.06986 0.08313 0.04273 0.05054 Adjusted R2 0.06900 0.05280 0.06630 0.02516 0.03312 F-test 5.1135 4.0935 4.9410 2.4325 2.9013 F-test, p-value 3.3 × 10−8 3.49 × 10−6 7.31 × 10−8 0.00426 0.00063 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 44 Table 10: Bias tests for Ambition: Job Secular Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.0379∗∗ -0.1023∗∗∗ -0.1063∗∗∗ -0.0030 -0.0041 (0.0178) (0.0229) (0.0254) (0.0165) (0.0159) data_roundR3 0.0177 0.0197 0.0031 -0.0024 -0.0190 (0.0171) (0.0220) (0.0244) (0.0159) (0.0152) refugee 0.0260∗∗ 0.0205 0.0135 0.0078 0.0071 (0.0112) (0.0143) (0.0159) (0.0103) (0.0099) num_child 0.0061∗∗ 0.0056 0.0053 0.0036 0.0031 (0.0028) (0.0036) (0.0040) (0.0026) (0.0025) hh_head_sex 0.0002 0.0121 0.0085 -0.0055 -0.0021 (0.0101) (0.0130) (0.0144) (0.0094) (0.0090) hh_head_age 0.0008∗ 0.0009 0.0003 −5.25 × 10−5 -0.0003 (0.0004) (0.0006) (0.0006) (0.0004) (0.0004) parent_eduyears -0.0010 -0.0009 -0.0007 −5.75 × 10−5 0.0001 (0.0011) (0.0014) (0.0016) (0.0010) (0.0010) parent_reledu -0.0202 -0.0249 -0.0732∗∗∗ -0.0137 -0.0120 (0.0192) (0.0246) (0.0273) (0.0178) (0.0171) eld_sex 0.0022 -0.0004 -0.0010 -0.0095 -0.0106 (0.0078) (0.0100) (0.0111) (0.0072) (0.0070) eld_age -0.0009 -0.0004 -0.0002 −3.47 × 10−5 0.0006 (0.0006) (0.0008) (0.0009) (0.0006) (0.0006) hh_asset_index -0.0010 0.0002 0.0012 0.0009 0.0003 (0.0032) (0.0041) (0.0046) (0.0030) (0.0029) hh_income 0.0004 0.0034 0.0029 -0.0011 -0.0018 (0.0023) (0.0030) (0.0033) (0.0022) (0.0021) int_trauma_exp -0.0024 -0.0008 -0.0050∗∗ -0.0005 -0.0007 (0.0017) (0.0022) (0.0024) (0.0016) (0.0015) Fit statistics Observations 667 667 667 667 667 R2 0.03887 0.01983 0.02149 0.01108 0.01631 Adjusted R2 0.02123 0.00185 0.00353 -0.00706 -0.00174 F-test 2.2039 1.1029 1.1967 0.61066 0.90381 F-test, p-value 0.01031 0.35453 0.28112 0.83416 0.54279 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 45 Table 11: Bias tests for Ambition: Vocational Training Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) 0.0016 -0.1262∗∗∗ -0.0516∗∗ 0.0120∗∗ 0.0104∗ (0.0051) (0.0233) (0.0207) (0.0058) (0.0055) data_roundR3 -0.0040 -0.0118 -0.0328∗ -0.0085 -0.0063 (0.0049) (0.0223) (0.0199) (0.0056) (0.0053) refugee -0.0021 -0.0014 0.0028 0.0012 -0.0019 (0.0032) (0.0146) (0.0130) (0.0036) (0.0035) num_child -0.0010 -0.0023 -0.0073∗∗ -0.0001 0.0004 (0.0008) (0.0037) (0.0033) (0.0009) (0.0009) hh_head_sex -0.0028 0.0245∗ 0.0033 -0.0014 0.0015 (0.0029) (0.0132) (0.0118) (0.0033) (0.0032) hh_head_age 9.34 × 10−5 6.68 × 10−5 -0.0004 -0.0002 -0.0001 (0.0001) (0.0006) (0.0005) (0.0001) (0.0001) parent_eduyears -0.0004 -0.0038∗∗∗ -0.0016 -0.0007∗ -0.0007∗∗ (0.0003) (0.0015) (0.0013) (0.0004) (0.0003) parent_reledu 0.0026 -0.0235 -0.0186 0.0071 0.0043 (0.0054) (0.0250) (0.0223) (0.0062) (0.0060) eld_sex 0.0026 0.0113 0.0038 0.0034 0.0009 (0.0022) (0.0102) (0.0091) (0.0025) (0.0024) eld_age −1.58 × 10−5 0.0009 0.0008 0.0001 7.07 × 10−5 (0.0002) (0.0008) (0.0007) (0.0002) (0.0002) hh_asset_index 3.77 × 10−5 -0.0010 -0.0036 0.0009 0.0002 (0.0009) (0.0042) (0.0037) (0.0010) (0.0010) hh_income 0.0002 0.0016 0.0016 0.0005 0.0005 (0.0007) (0.0030) (0.0027) (0.0008) (0.0007) int_trauma_exp -0.0001 -0.0010 -0.0007 -0.0006 -0.0002 (0.0005) (0.0022) (0.0019) (0.0005) (0.0005) Fit statistics Observations 667 667 667 667 667 R2 0.01689 0.02743 0.01987 0.02429 0.01802 Adjusted R2 -0.00115 0.00958 0.00189 0.00638 6.82 × 10−6 F-test 0.93641 1.5368 1.1049 1.3565 1.0004 F-test, p-value 0.50962 0.10618 0.35279 0.18220 0.44678 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 46 Table 12: Bias tests for Ambition: Entrepreneur Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.0273∗∗ 0.0001 -0.0034 -0.0136∗ -0.0092 (0.0119) (0.0112) (0.0096) (0.0070) (0.0073) data_roundR3 -0.0035 -0.0025 0.0032 0.0014 0.0058 (0.0114) (0.0108) (0.0092) (0.0068) (0.0070) refugee -0.0112 -0.0105 -0.0016 -0.0057 -0.0015 (0.0075) (0.0070) (0.0060) (0.0044) (0.0046) num_child 0.0002 0.0017 0.0008 0.0024∗∗ 0.0019∗ (0.0019) (0.0018) (0.0015) (0.0011) (0.0012) hh_head_sex -0.0186∗∗∗ 0.0074 0.0138∗∗ 0.0004 0.0036 (0.0068) (0.0064) (0.0055) (0.0040) (0.0041) hh_head_age -0.0003 0.0005∗ 0.0004 0.0001 4.05 × 10−5 (0.0003) (0.0003) (0.0002) (0.0002) (0.0002) parent_eduyears 0.0007 -0.0006 -0.0003 0.0007 0.0003 (0.0007) (0.0007) (0.0006) (0.0004) (0.0005) parent_reledu 0.0047 -0.0092 -0.0183∗ -0.0005 0.0003 (0.0128) (0.0121) (0.0104) (0.0076) (0.0078) eld_sex 0.0111∗∗ -0.0086∗ -0.0111∗∗∗ 0.0007 -0.0018 (0.0052) (0.0049) (0.0042) (0.0031) (0.0032) eld_age 0.0003 -0.0003 -0.0003 −2.76 × 10−5 -0.0001 (0.0004) (0.0004) (0.0003) (0.0002) (0.0003) hh_asset_index -0.0025 0.0004 0.0016 -0.0024∗ -0.0015 (0.0021) (0.0020) (0.0017) (0.0013) (0.0013) hh_income 0.0003 -0.0007 −6.87 × 10−5 0.0006 0.0013 (0.0016) (0.0015) (0.0013) (0.0009) (0.0009) int_trauma_exp 0.0024∗∗ 0.0001 -0.0004 6.71 × 10−5 -0.0004 (0.0011) (0.0011) (0.0009) (0.0007) (0.0007) Fit statistics Observations 667 667 667 667 667 R2 0.02655 0.02647 0.02927 0.01755 0.01095 Adjusted R2 0.00869 0.00861 0.01146 -0.00047 -0.00719 F-test 1.4864 1.4819 1.6436 0.97374 0.60359 F-test, p-value 0.12408 0.12581 0.07548 0.47254 0.84009 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 47 Table 13: Bias tests for Ambition: Education High Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.1557∗∗∗ -0.2259∗∗∗ -0.3257∗∗∗ 0.0112 0.0268 (0.0233) (0.0294) (0.0298) (0.0172) (0.0171) data_roundR3 0.0246 0.0196 -0.0146 -0.0020 -0.0192 (0.0224) (0.0283) (0.0286) (0.0165) (0.0164) refugee 0.0162 -0.0075 0.0063 0.0095 0.0119 (0.0146) (0.0184) (0.0186) (0.0108) (0.0107) num_child 0.0056 -0.0043 0.0096∗∗ 0.0009 -0.0005 (0.0037) (0.0047) (0.0047) (0.0027) (0.0027) hh_head_sex 0.0154 0.0404∗∗ 0.0077 0.0128 0.0211∗∗ (0.0133) (0.0167) (0.0169) (0.0098) (0.0097) hh_head_age 0.0007 0.0005 0.0010 -0.0004 -0.0007∗ (0.0006) (0.0007) (0.0007) (0.0004) (0.0004) parent_eduyears -0.0036∗∗ -0.0044∗∗ -0.0007 -0.0008 -0.0010 (0.0015) (0.0018) (0.0019) (0.0011) (0.0011) parent_reledu -0.0241 0.0226 0.0018 -0.0001 0.0163 (0.0251) (0.0317) (0.0320) (0.0185) (0.0184) eld_sex 0.0074 0.0143 0.0337∗∗∗ 0.0023 0.0025 (0.0102) (0.0129) (0.0131) (0.0075) (0.0075) eld_age −5.05 × 10−5 0.0002 0.0007 -0.0003 0.0003 (0.0008) (0.0010) (0.0011) (0.0006) (0.0006) hh_asset_index 0.0027 -0.0039 -0.0009 0.0067∗∗ 0.0054∗ (0.0042) (0.0053) (0.0054) (0.0031) (0.0031) hh_income 0.0017 0.0065∗ 0.0061 0.0006 0.0006 (0.0030) (0.0038) (0.0039) (0.0022) (0.0022) int_trauma_exp 0.0013 0.0007 0.0022 0.0009 -0.0020 (0.0022) (0.0028) (0.0028) (0.0016) (0.0016) Fit statistics Observations 667 667 667 667 667 R2 0.03744 0.03659 0.03663 0.01590 0.02704 Adjusted R2 0.01978 0.01891 0.01896 -0.00215 0.00919 F-test 2.1198 2.0700 2.0724 0.88080 1.5148 F-test, p-value 0.01415 0.01702 0.01687 0.56652 0.11370 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 48 Table 14: Bias tests for Ambition: Education Neutral Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.3200∗∗∗ -0.3254∗∗∗ -0.1237∗∗∗ -0.0377∗ -0.0144 (0.0331) (0.0350) (0.0354) (0.0223) (0.0220) data_roundR3 0.0043 0.0721∗∗ -0.0076 0.0498∗∗ 0.0312 (0.0318) (0.0336) (0.0340) (0.0214) (0.0211) refugee 0.0227 0.0118 0.0608∗∗∗ 0.0268∗ 0.0181 (0.0207) (0.0219) (0.0222) (0.0140) (0.0138) num_child 0.0039 0.0021 -0.0010 -0.0070∗∗ -0.0060∗ (0.0053) (0.0056) (0.0056) (0.0035) (0.0035) hh_head_sex 0.0383∗∗ 0.0027 0.0084 0.0155 0.0166 (0.0188) (0.0199) (0.0201) (0.0127) (0.0125) hh_head_age 0.0007 0.0017∗∗ -0.0004 −7.1 × 10−6 −8.23 × 10−5 (0.0008) (0.0009) (0.0009) (0.0005) (0.0005) parent_eduyears -0.0020 -0.0077∗∗∗ -0.0022 -0.0014 -0.0012 (0.0021) (0.0022) (0.0022) (0.0014) (0.0014) parent_reledu 0.0487 -0.0836∗∗ 0.0042 0.0187 0.0229 (0.0356) (0.0377) (0.0381) (0.0240) (0.0236) eld_sex 0.0318∗∗ -0.0033 -0.0124 0.0022 -0.0044 (0.0145) (0.0154) (0.0155) (0.0098) (0.0096) eld_age 0.0005 -0.0023∗ -0.0015 -0.0006 -0.0003 (0.0012) (0.0012) (0.0013) (0.0008) (0.0008) hh_asset_index -0.0079 -0.0116∗ 0.0076 0.0074∗ 0.0048 (0.0060) (0.0063) (0.0064) (0.0040) (0.0040) hh_income 0.0031 0.0029 0.0036 -0.0012 0.0014 (0.0043) (0.0046) (0.0046) (0.0029) (0.0029) int_trauma_exp -0.0048 -0.0039 -0.0062∗ 0.0020 0.0011 (0.0031) (0.0033) (0.0033) (0.0021) (0.0021) Fit statistics Observations 667 667 667 667 667 R2 0.04520 0.05661 0.03922 0.04675 0.02593 Adjusted R2 0.02768 0.03930 0.02159 0.02926 0.00806 F-test 2.5800 3.2705 2.2248 2.6727 1.4507 F-test, p-value 0.00236 0.00013 0.00952 0.00162 0.13825 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 49 Table 15: Bias tests for Ambition: Education Low Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.0171 -0.1612∗∗∗ -0.0657∗∗∗ 0.0112 0.0150∗ (0.0118) (0.0224) (0.0188) (0.0089) (0.0090) data_roundR3 -0.0123 0.0068 0.0050 -0.0032 -0.0062 (0.0113) (0.0215) (0.0181) (0.0086) (0.0086) refugee 0.0157∗∗ 0.0230 0.0022 -0.0042 -0.0117∗∗ (0.0074) (0.0140) (0.0118) (0.0056) (0.0056) num_child -0.0016 -0.0014 0.0036 0.0010 0.0004 (0.0019) (0.0035) (0.0030) (0.0014) (0.0014) hh_head_sex -0.0075 -0.0048 -0.0066 -0.0038 -0.0077 (0.0067) (0.0127) (0.0107) (0.0051) (0.0051) hh_head_age -0.0003 3.83 × 10−6 -0.0002 -0.0002 -0.0002 (0.0003) (0.0005) (0.0005) (0.0002) (0.0002) parent_eduyears -0.0002 0.0004 -0.0014 0.0003 -0.0007 (0.0007) (0.0014) (0.0012) (0.0006) (0.0006) parent_reledu -0.0051 -0.0289 -0.0347∗ -0.0024 -0.0063 (0.0126) (0.0240) (0.0203) (0.0096) (0.0097) eld_sex 0.0173∗∗∗ 0.0011 0.0048 0.0048 0.0032 (0.0052) (0.0098) (0.0083) (0.0039) (0.0039) eld_age 0.0008∗ 0.0003 -0.0007 −9.99 × 10−5 2.42 × 10−5 (0.0004) (0.0008) (0.0007) (0.0003) (0.0003) hh_asset_index 0.0006 -0.0021 -0.0009 -0.0011 -0.0014 (0.0021) (0.0040) (0.0034) (0.0016) (0.0016) hh_income 0.0002 -0.0001 0.0010 0.0002 -0.0003 (0.0015) (0.0029) (0.0025) (0.0012) (0.0012) int_trauma_exp 0.0003 −8.98 × 10−5 0.0006 0.0008 0.0014 (0.0011) (0.0021) (0.0018) (0.0008) (0.0008) Fit statistics Observations 667 667 667 667 667 R2 0.03226 0.01698 0.01891 0.01307 0.01801 Adjusted R2 0.01450 -0.00106 0.00091 -0.00504 −3.96 × 10−6 F-test 1.8168 0.94150 1.0507 0.72175 0.99978 F-test, p-value 0.04214 0.50451 0.40003 0.73085 0.44735 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 50 Table 16: Bias tests for Ambition: Education Religious Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.0076 -0.0266∗ -0.0251∗ 0.0147 0.0256∗∗ (0.0135) (0.0146) (0.0141) (0.0107) (0.0109) data_roundR3 -0.0209 -0.0424∗∗∗ -0.0260∗ -0.0037 -0.0028 (0.0129) (0.0140) (0.0136) (0.0102) (0.0105) refugee 0.0096 0.0154∗ 0.0031 0.0080 0.0075 (0.0084) (0.0091) (0.0089) (0.0067) (0.0069) num_child -0.0017 -0.0007 -0.0021 -0.0012 -0.0030∗ (0.0021) (0.0023) (0.0022) (0.0017) (0.0017) hh_head_sex -0.0055 0.0094 -0.0105 -0.0032 -0.0045 (0.0077) (0.0083) (0.0080) (0.0061) (0.0062) hh_head_age 1.68 × 10−6 0.0001 0.0003 3.15 × 10−6 −3.36 × 10−5 (0.0003) (0.0004) (0.0003) (0.0003) (0.0003) parent_eduyears -0.0012 -0.0005 -0.0006 -0.0005 -0.0012∗ (0.0008) (0.0009) (0.0009) (0.0007) (0.0007) parent_reledu 0.0048 -0.0313∗∗ -0.0153 0.0127 0.0157 (0.0145) (0.0157) (0.0152) (0.0115) (0.0118) eld_sex 0.0067 0.0032 0.0053 −4.65 × 10−5 -0.0019 (0.0059) (0.0064) (0.0062) (0.0047) (0.0048) eld_age 0.0002 0.0007 0.0004 -0.0002 -0.0003 (0.0005) (0.0005) (0.0005) (0.0004) (0.0004) hh_asset_index 0.0039 0.0070∗∗∗ 0.0015 0.0025 0.0037∗ (0.0024) (0.0026) (0.0025) (0.0019) (0.0020) hh_income 0.0002 -0.0010 -0.0003 -0.0015 -0.0010 (0.0018) (0.0019) (0.0018) (0.0014) (0.0014) int_trauma_exp 0.0014 −9.22 × 10−5 0.0022∗ 9.23 × 10−5 0.0009 (0.0013) (0.0014) (0.0013) (0.0010) (0.0010) Fit statistics Observations 667 667 667 667 667 R2 0.02348 0.05358 0.02626 0.01853 0.03037 Adjusted R2 0.00556 0.03621 0.00839 0.00052 0.01258 F-test 1.3103 3.0853 1.4698 1.0289 1.7073 F-test, p-value 0.20751 0.00029 0.13050 0.41997 0.06115 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 51 Table 17: Bias tests for Ambition: Marriage Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) 0.0031 0.0204 0.0462∗∗∗ 0.0029 -0.0052 (0.0091) (0.0126) (0.0153) (0.0082) (0.0078) data_roundR3 0.0036 -0.0231∗ -0.0479∗∗∗ 0.0106 0.0136∗ (0.0087) (0.0121) (0.0147) (0.0078) (0.0075) refugee -0.0091 -0.0068 -0.0032 -0.0008 -0.0002 (0.0057) (0.0079) (0.0096) (0.0051) (0.0049) num_child -0.0002 0.0005 0.0036 -0.0029∗∗ -0.0028∗∗ (0.0014) (0.0020) (0.0024) (0.0013) (0.0012) hh_head_sex 0.0003 -0.0018 0.0013 0.0006 −4.3 × 10−5 (0.0052) (0.0072) (0.0087) (0.0046) (0.0044) hh_head_age 4.94 × 10−5 −2.9 × 10−5 -0.0004 0.0003 0.0004∗ (0.0002) (0.0003) (0.0004) (0.0002) (0.0002) parent_eduyears -0.0007 -0.0025∗∗∗ -0.0006 -0.0004 -0.0003 (0.0006) (0.0008) (0.0010) (0.0005) (0.0005) parent_reledu -0.0162∗ -0.0172 -0.0178 -0.0103 -0.0093 (0.0098) (0.0136) (0.0164) (0.0088) (0.0084) eld_sex -0.0010 0.0077 0.0256∗∗∗ -0.0072∗∗ -0.0079∗∗ (0.0040) (0.0055) (0.0067) (0.0036) (0.0034) eld_age -0.0005∗ 0.0001 0.0005 -0.0004 -0.0004 (0.0003) (0.0004) (0.0005) (0.0003) (0.0003) hh_asset_index -0.0008 -0.0034 -0.0061∗∗ -0.0014 -0.0023∗ (0.0016) (0.0023) (0.0028) (0.0015) (0.0014) hh_income 0.0016 0.0026 0.0025 0.0010 0.0009 (0.0012) (0.0016) (0.0020) (0.0011) (0.0010) int_trauma_exp 0.0003 -0.0015 -0.0007 −9.1 × 10−5 0.0002 (0.0009) (0.0012) (0.0014) (0.0008) (0.0007) Fit statistics Observations 667 667 667 667 667 R2 0.02664 0.05043 0.08818 0.02193 0.03117 Adjusted R2 0.00878 0.03301 0.07144 0.00398 0.01339 F-test 1.4918 2.8945 5.2702 1.2217 1.7535 F-test, p-value 0.12203 0.00065 1.6 × 10−8 0.26352 0.05235 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 52 Table 18: Bias tests for Ambition: Migration Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) 0.0020 -0.0100 0.0057 0.0024 0.0082 (0.0081) (0.0091) (0.0080) (0.0057) (0.0067) data_roundR3 0.0044 -0.0111 -0.0099 -0.0051 -0.0076 (0.0078) (0.0088) (0.0077) (0.0054) (0.0064) refugee -0.0099∗ 0.0009 0.0031 -0.0001 -0.0009 (0.0051) (0.0057) (0.0050) (0.0036) (0.0042) num_child 0.0009 -0.0004 0.0018 4.24 × 10−6 0.0006 (0.0013) (0.0014) (0.0013) (0.0009) (0.0011) hh_head_sex -0.0080∗ 0.0050 8.42 × 10−5 -0.0011 -0.0021 (0.0046) (0.0052) (0.0045) (0.0032) (0.0038) hh_head_age -0.0004∗∗ -0.0003 −1.95 × 10−5 3.56 × 10−6 -0.0001 (0.0002) (0.0002) (0.0002) (0.0001) (0.0002) parent_eduyears −8.93 × 10−5 0.0003 0.0003 -0.0002 -0.0007∗ (0.0005) (0.0006) (0.0005) (0.0004) (0.0004) parent_reledu 0.0062 0.0088 0.0081 0.0054 0.0071 (0.0088) (0.0098) (0.0086) (0.0061) (0.0072) eld_sex 0.0067∗ 0.0014 -0.0043 0.0003 0.0007 (0.0036) (0.0040) (0.0035) (0.0025) (0.0029) eld_age 5.58 × 10−5 0.0004 7.59 × 10−5 6.47 × 10−5 0.0001 (0.0003) (0.0003) (0.0003) (0.0002) (0.0002) hh_asset_index -0.0014 -0.0001 -0.0004 1.38 × 10−5 0.0009 (0.0015) (0.0016) (0.0014) (0.0010) (0.0012) hh_income 0.0003 -0.0002 -0.0002 −2.67 × 10−5 -0.0008 (0.0011) (0.0012) (0.0010) (0.0007) (0.0009) int_trauma_exp -0.0009 -0.0004 −9.94 × 10−5 0.0005 0.0004 (0.0008) (0.0009) (0.0008) (0.0005) (0.0006) Fit statistics Observations 667 667 667 667 667 R2 0.03138 0.00724 0.02165 0.00823 0.01693 Adjusted R2 0.01361 -0.01098 0.00370 -0.00997 -0.00111 F-test 1.7658 0.39750 1.2059 0.45233 0.93836 F-test, p-value 0.05019 0.96466 0.27459 0.94137 0.50766 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 53 Table 19: Bias tests for Capacity: Reliance On God Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.0735∗∗∗ -0.1134∗∗∗ -0.0993∗∗∗ 0.0008 0.0066 (0.0158) (0.0217) (0.0198) (0.0120) (0.0117) data_roundR3 -0.0198 -0.0501∗∗ -0.0186 -0.0073 -0.0080 (0.0151) (0.0208) (0.0190) (0.0115) (0.0112) refugee 0.0215∗∗ 0.0242∗ 0.0325∗∗∗ 0.0103 0.0070 (0.0099) (0.0136) (0.0124) (0.0075) (0.0073) num_child 0.0004 0.0014 0.0004 0.0012 -0.0006 (0.0025) (0.0034) (0.0031) (0.0019) (0.0019) hh_head_sex -0.0002 -0.0085 -0.0230∗∗ 0.0097 0.0062 (0.0090) (0.0123) (0.0112) (0.0068) (0.0067) hh_head_age −6.25 × 10−5 -0.0003 -0.0003 -0.0005∗ -0.0004 (0.0004) (0.0005) (0.0005) (0.0003) (0.0003) parent_eduyears -0.0010 -0.0015 -0.0008 -0.0005 -0.0009 (0.0010) (0.0014) (0.0012) (0.0007) (0.0007) parent_reledu -0.0275 -0.0125 -0.0085 -0.0181 -0.0071 (0.0170) (0.0233) (0.0213) (0.0129) (0.0126) eld_sex 0.0074 0.0095 0.0077 -0.0012 -0.0020 (0.0069) (0.0095) (0.0087) (0.0052) (0.0051) eld_age 0.0007 0.0018∗∗ 0.0007 0.0002 0.0003 (0.0006) (0.0008) (0.0007) (0.0004) (0.0004) hh_asset_index 0.0066∗∗ 0.0082∗∗ 0.0075∗∗ 0.0042∗∗ 0.0037∗ (0.0028) (0.0039) (0.0036) (0.0022) (0.0021) hh_income 0.0006 -0.0018 0.0002 0.0014 0.0018 (0.0021) (0.0028) (0.0026) (0.0016) (0.0015) int_trauma_exp 0.0013 -0.0007 0.0007 -0.0002 0.0004 (0.0015) (0.0020) (0.0019) (0.0011) (0.0011) Fit statistics Observations 667 667 667 667 667 R2 0.02232 0.02407 0.02126 0.01891 0.01410 Adjusted R2 0.00438 0.00617 0.00331 0.00091 -0.00398 F-test 1.2443 1.3443 1.1841 1.0507 0.77974 F-test, p-value 0.24830 0.18863 0.29035 0.40002 0.67179 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 54 Table 20: Bias tests for Capacity: Ability High Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.1597∗∗∗ -0.1068∗∗∗ -0.0103 -0.0190 -0.0156 (0.0253) (0.0215) (0.0167) (0.0135) (0.0138) data_roundR3 0.0807∗∗∗ 0.0153 -0.0145 0.0050 0.0120 (0.0243) (0.0206) (0.0160) (0.0129) (0.0132) refugee -0.0042 -0.0002 0.0038 -0.0008 0.0021 (0.0159) (0.0134) (0.0105) (0.0085) (0.0086) num_child 0.0071∗ 0.0048 0.0002 0.0009 0.0014 (0.0040) (0.0034) (0.0027) (0.0021) (0.0022) hh_head_sex 0.0045 0.0061 -0.0218∗∗ 0.0001 -0.0043 (0.0144) (0.0122) (0.0095) (0.0077) (0.0078) hh_head_age −3.34 × 10−5 0.0003 -0.0006 0.0001 -0.0002 (0.0006) (0.0005) (0.0004) (0.0003) (0.0003) parent_eduyears -0.0024 -0.0010 0.0017 0.0005 -0.0002 (0.0016) (0.0013) (0.0010) (0.0008) (0.0009) parent_reledu -0.0200 -0.0648∗∗∗ 0.0138 0.0102 0.0130 (0.0272) (0.0231) (0.0180) (0.0145) (0.0148) eld_sex 0.0175 -0.0014 0.0117 0.0046 0.0061 (0.0111) (0.0094) (0.0073) (0.0059) (0.0060) eld_age -0.0003 -0.0002 0.0005 5.98 × 10−5 0.0001 (0.0009) (0.0008) (0.0006) (0.0005) (0.0005) hh_asset_index 0.0013 -0.0035 -0.0008 0.0015 0.0038 (0.0046) (0.0039) (0.0030) (0.0024) (0.0025) hh_income 0.0026 0.0016 0.0018 0.0021 2.23 × 10−5 (0.0033) (0.0028) (0.0022) (0.0018) (0.0018) int_trauma_exp 0.0017 0.0016 0.0011 0.0018 0.0014 (0.0024) (0.0020) (0.0016) (0.0013) (0.0013) Fit statistics Observations 667 667 667 667 667 R2 0.08009 0.02463 0.01936 0.01452 0.01975 Adjusted R2 0.06321 0.00673 0.00137 -0.00356 0.00177 F-test 4.7448 1.3763 1.0762 0.80300 1.0982 F-test, p-value 1.8 × 10−7 0.17214 0.37742 0.64763 0.35848 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 55 Table 21: Bias tests for Capacity: Ability Low Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.1072∗∗∗ -0.1025∗∗∗ -0.0482∗∗∗ -0.0090 -0.0170 (0.0214) (0.0221) (0.0180) (0.0140) (0.0138) data_roundR3 0.0694∗∗∗ 0.0658∗∗∗ 0.0352∗∗ 0.0001 0.0058 (0.0205) (0.0212) (0.0173) (0.0134) (0.0133) refugee 0.0103 -0.0070 -0.0106 0.0018 0.0014 (0.0134) (0.0138) (0.0113) (0.0088) (0.0087) num_child -0.0024 0.0012 0.0025 0.0043∗ 0.0036 (0.0034) (0.0035) (0.0029) (0.0022) (0.0022) hh_head_sex -0.0369∗∗∗ -0.0229∗ -0.0154 -0.0049 -0.0024 (0.0122) (0.0126) (0.0102) (0.0080) (0.0079) hh_head_age 6.31 × 10−5 0.0002 0.0003 −3.59 × 10−5 0.0002 (0.0005) (0.0005) (0.0004) (0.0003) (0.0003) parent_eduyears 0.0027∗∗ 0.0025∗ 0.0015 -0.0009 -0.0004 (0.0013) (0.0014) (0.0011) (0.0009) (0.0009) parent_reledu -0.0160 -0.0109 -0.0292 -0.0159 -0.0373∗∗ (0.0230) (0.0238) (0.0194) (0.0151) (0.0149) eld_sex 0.0096 0.0097 -0.0016 0.0029 -0.0011 (0.0094) (0.0097) (0.0079) (0.0061) (0.0061) eld_age 7.46 × 10−5 -0.0005 -0.0004 9.65 × 10−5 −8.11 × 10−5 (0.0008) (0.0008) (0.0006) (0.0005) (0.0005) hh_asset_index -0.0027 -0.0075∗ -0.0016 -0.0017 -0.0005 (0.0039) (0.0040) (0.0032) (0.0025) (0.0025) hh_income 0.0037 0.0016 0.0005 0.0007 0.0002 (0.0028) (0.0029) (0.0023) (0.0018) (0.0018) int_trauma_exp -0.0004 -0.0033 -0.0009 -0.0019 -0.0012 (0.0020) (0.0021) (0.0017) (0.0013) (0.0013) Fit statistics Observations 667 667 667 667 667 R2 0.11546 0.07145 0.03663 0.01581 0.01706 Adjusted R2 0.09923 0.05441 0.01895 -0.00225 -0.00097 F-test 7.1142 4.1934 2.0723 0.87541 0.94597 F-test, p-value 2.85 × 10−12 2.22 × 10−6 0.01688 0.57210 0.50002 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 56 Table 22: Bias tests for Capacity: Budget High Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.0092 -0.0277 -0.0472∗∗∗ 0.0070 -0.0051 (0.0109) (0.0169) (0.0164) (0.0105) (0.0109) data_roundR3 0.0140 -0.0153 0.0346∗∗ 0.0046 0.0027 (0.0105) (0.0162) (0.0158) (0.0101) (0.0105) refugee -0.0067 0.0035 -0.0076 0.0021 -0.0036 (0.0069) (0.0106) (0.0103) (0.0066) (0.0068) num_child 0.0008 -0.0013 -0.0043∗ -0.0003 0.0003 (0.0017) (0.0027) (0.0026) (0.0017) (0.0017) hh_head_sex -0.0030 -0.0003 -0.0166∗ 0.0110∗ -0.0026 (0.0062) (0.0096) (0.0093) (0.0060) (0.0062) hh_head_age 0.0001 -0.0001 0.0005 1.9 × 10−5 −6.18 × 10−5 (0.0003) (0.0004) (0.0004) (0.0003) (0.0003) parent_eduyears 0.0013∗ 0.0005 0.0007 0.0002 0.0006 (0.0007) (0.0011) (0.0010) (0.0007) (0.0007) parent_reledu 0.0018 -0.0113 -0.0253 0.0002 0.0042 (0.0118) (0.0182) (0.0177) (0.0113) (0.0117) eld_sex 0.0029 0.0068 -0.0158∗∗ -0.0107∗∗ -0.0018 (0.0048) (0.0074) (0.0072) (0.0046) (0.0048) eld_age -0.0007∗ −1.18 × 10−5 -0.0007 -0.0004 −5.27 × 10−5 (0.0004) (0.0006) (0.0006) (0.0004) (0.0004) hh_asset_index -0.0021 -0.0007 0.0004 0.0033∗ 0.0018 (0.0020) (0.0030) (0.0030) (0.0019) (0.0020) hh_income 0.0036∗∗ 0.0062∗∗∗ 0.0011 0.0031∗∗ 0.0011 (0.0014) (0.0022) (0.0021) (0.0014) (0.0014) int_trauma_exp 0.0018∗ 0.0014 -0.0004 0.0004 0.0011 (0.0010) (0.0016) (0.0015) (0.0010) (0.0010) Fit statistics Observations 667 667 667 667 667 R2 0.03223 0.02420 0.04615 0.03678 0.01432 Adjusted R2 0.01447 0.00629 0.02865 0.01910 -0.00377 F-test 1.8149 1.3515 2.6371 2.0807 0.79180 F-test, p-value 0.04242 0.18482 0.00188 0.01636 0.65928 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 57 Table 23: Bias tests for Capacity: Budget Low Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) 0.0064 -0.1465∗∗∗ -0.0080 0.0140 0.0073 (0.0145) (0.0227) (0.0208) (0.0177) (0.0170) data_roundR3 -0.0225 0.0435∗∗ -0.0352∗ -0.0153 -0.0080 (0.0139) (0.0218) (0.0200) (0.0170) (0.0163) refugee -0.0102 -0.0141 -0.0314∗∗ -0.0223∗∗ -0.0230∗∗ (0.0091) (0.0142) (0.0130) (0.0111) (0.0106) num_child 0.0025 0.0023 -0.0014 0.0050∗ 0.0059∗∗ (0.0023) (0.0036) (0.0033) (0.0028) (0.0027) hh_head_sex -0.0144∗ -0.0233∗ -0.0098 -0.0055 -0.0008 (0.0083) (0.0129) (0.0118) (0.0100) (0.0096) hh_head_age −4.17 × 10−5 0.0004 -0.0009∗ -0.0005 -0.0003 (0.0004) (0.0006) (0.0005) (0.0004) (0.0004) parent_eduyears -0.0009 0.0010 0.0007 -0.0011 -0.0006 (0.0009) (0.0014) (0.0013) (0.0011) (0.0011) parent_reledu 6.56 × 10−5 -0.0070 -0.0189 -0.0134 -0.0064 (0.0156) (0.0244) (0.0224) (0.0190) (0.0183) eld_sex 0.0079 0.0261∗∗∗ 0.0017 0.0120 0.0083 (0.0064) (0.0099) (0.0091) (0.0078) (0.0074) eld_age 0.0003 -0.0011 0.0001 0.0002 −3.12 × 10−5 (0.0005) (0.0008) (0.0007) (0.0006) (0.0006) hh_asset_index 0.0006 -0.0012 -0.0017 -0.0016 -0.0035 (0.0026) (0.0041) (0.0037) (0.0032) (0.0031) hh_income 0.0006 -0.0003 -0.0002 0.0008 5.47 × 10−5 (0.0019) (0.0029) (0.0027) (0.0023) (0.0022) int_trauma_exp 0.0015 0.0028 0.0016 0.0015 0.0007 (0.0014) (0.0021) (0.0020) (0.0017) (0.0016) Fit statistics Observations 667 667 667 667 667 R2 0.02488 0.03241 0.04124 0.02117 0.01903 Adjusted R2 0.00699 0.01465 0.02365 0.00321 0.00103 F-test 1.3905 1.8253 2.3443 1.1790 1.0575 F-test, p-value 0.16518 0.04092 0.00601 0.29415 0.39394 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1 58 Table 24: Bias tests for Capacity: Awareness Information High Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.2264∗∗∗ -0.0253 -0.2251∗∗∗ -0.0274 -0.0159 (0.0324) (0.0232) (0.0318) (0.0185) (0.0182) data_roundR3 0.0049 -0.0259 0.0386 0.0093 -0.0104 (0.0311) (0.0223) (0.0305) (0.0178) (0.0175) refugee -0.0197 0.0072 -0.0064 0.0175 0.0133 (0.0203) (0.0146) (0.0199) (0.0116) (0.0114) num_child 0.0017 -0.0035 0.0004 -0.0011 -0.0010 (0.0051) (0.0037) (0.0050) (0.0029) (0.0029) hh_head_sex 0.0345∗ 0.0060 0.0372∗∗ -0.0074 -0.0044 (0.0184) (0.0132) (0.0181) (0.0105) (0.0103) hh_head_age 0.0006 0.0004 0.0011 −1.72 × 10−5 -0.0002 (0.0008) (0.0006) (0.0008) (0.0005) (0.0004) parent_eduyears -0.0036∗ -0.0004 -0.0032 0.0022∗ 0.0029∗∗ (0.0020) (0.0015) (0.0020) (0.0012) (0.0011) parent_reledu -0.0068 -0.0314 0.0089 0.0089 -0.0032 (0.0349) (0.0250) (0.0342) (0.0199) (0.0196) eld_sex 0.0319∗∗ 0.0013 0.0006 0.0128 0.0147∗ (0.0142) (0.0102) (0.0139) (0.0081) (0.0080) eld_age 0.0001 -0.0005 -0.0004 -0.0004 3.4 × 10−6 (0.0011) (0.0008) (0.0011) (0.0007) (0.0006) hh_asset_index -0.0014 0.0010 0.0018 0.0048 0.0041 (0.0058) (0.0042) (0.0057) (0.0033) (0.0033) hh_income 0.0019 0.0011 0.0037 -0.0003 0.0004 (0.0042) (0.0030) (0.0041) (0.0024) (0.0024) int_trauma_exp 0.0047 0.0021 0.0032 0.0004 0.0011 (0.0031) (0.0022) (0.0030) (0.0017) (0.0017) Fit statistics Observations 667 667 667 667 667 R2 0.02910 0.02823 0.02511 0.01993 0.02743 Adjusted R2 0.01128 0.01040 0.00723 0.00194 0.00959 F-test 1.6334 1.5831 1.4040 1.1081 1.5372 F-test, p-value 0.07802 0.09173 0.15880 0.35014 0.10605 IID standard-errors in parentheses Signif. Codes: ***: 0.01, **: 0.05, *: 0.1 59 Table 25: Bias tests for Capacity: Awareness Information Low Dependent Variables: ChatGPT Llama213Bchat Llama213B iQualChatGPTaug iQual Model: (1) (2) (3) (4) (5) Variables (Intercept) -0.3240∗∗∗ -0.6286∗∗∗ -0.4136∗∗∗ -0.0033 -0.0087 (0.0300) (0.0318) (0.0340) (0.0076) (0.0088) data_roundR3 0.0630∗∗ 0.1408∗∗∗ 0.0198 -0.0005 0.0025 (0.0287) (0.0305) (0.0326) (0.0073) (0.0085) refugee 0.0177 0.0436∗∗ 0.0255 0.0101∗∗ 0.0071 (0.0188) (0.0199) (0.0213) (0.0048) (0.0055) num_child 0.0028 0.0036 -0.0016 0.0003 -0.0009 (0.0048) (0.0050) (0.0054) (0.0012) (0.0014) hh_head_sex -0.0284∗ -0.0319∗ -0.0647∗∗∗ -0.0033 -0.0071 (0.0170) (0.0181) (0.0193) (0.0043) (0.0050) hh_head_age -0.0005 -0.0004 -0.0007 -0.0001 0.0001 (0.0007) (0.0008) (0.0008) (0.0002) (0.0002) parent_eduyears 0.0052∗∗∗ 0.0063∗∗∗ 0.0038∗ -0.0007 -0.0013∗∗ (0.0019) (0.0020) (0.0021) (0.0005) (0.0006) parent_reledu 0.0506 0.0068 -0.0011 -0.0082 -0.0116 (0.0322) (0.0342) (0.0365) (0.0082) (0.0095) eld_sex -0.0089 0.0080 -0.0116 0.0012 0.0006 (0.0131) (0.0139) (0.0149) (0.0033) (0.0039) eld_age -0.0002 -0.0004 -0.0004 0.0001 −3.79 × 10−6 (0.0011) (0.0011) (0.0012) (0.0003) (0.0003) hh_asset_index 0.0053 0.0078 0.0094 0.0013 0.0005 (0.0054) (0.0057) (0.0061) (0.0014) (0.0016) hh_income 0.0009 0.0024 0.0025 0.0009 0.0012 (0.0039) (0.0041) (0.0044) (0.0010) (0.0011) int_trauma_exp -0.0004 -0.0005 −5.23 × 10−5 0.0004 -0.0002 (0.0028) (0.0030) (0.0032) (0.0007) (0.0008) Fit statistics Observations 667 667 667 667 667 R2 0.05864 0.15437 0.04177 0.01835 0.01859 Adjusted R2 0.04137 0.13885 0.02419 0.00034 0.00058 F-test 3.3950 9.9489 2.3757 1.0187 1.0323 F-test, p-value 7.7 × 10−5 4.8 × 10−18 0.00532 0.42946 0.41682 IID standard-errors in parentheses Signif. 
Codes: ***: 0.01, **: 0.05, *: 0.1

Figure 7: Refugee and Male eldest child coefficients for all annotations

[Figure 7 is a coefficient plot with one panel per annotation code (the same codes as Tables 7-25, from Aspiration: Religious through Capacity: Awareness Information Low). Each panel shows the estimated coefficients on Refugee status and on Male eldest child for the expert annotations and for the iQual, iQual + ChatGPT aug, ChatGPT, Llama-2 13B, and Llama-2 13B chat models; the horizontal axis is the coefficient value.]
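The bias tests in Tables 7 through 25 can be read as follows: for each annotation code, a measure of how a given model's annotations deviate from the expert annotations is regressed on respondent characteristics, and jointly significant coefficients (the F-test at the bottom of each column) indicate that the model's errors are correlated with who is being interviewed. The sketch below is a minimal, illustrative version of one such regression. It assumes the dependent variable is the prediction error for one code (model annotation minus expert annotation), which is one natural reading of the tables; the file name and the 'expert' and 'chatgpt' column names are hypothetical, while the covariate names are those reported in the tables.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per interview, with the expert annotation and a
# model's annotation for a single code (e.g. "Ambition: No Ambition") plus the
# household covariates reported in Tables 7-25. The file name and the 'expert'
# and 'chatgpt' column names are placeholders.
df = pd.read_csv("annotations_with_covariates.csv")

# Prediction error for one model; positive values indicate over-annotation
# relative to the expert coding (an assumed construction of the outcome).
df["error_chatgpt"] = df["chatgpt"] - df["expert"]

covariates = (
    "data_round + refugee + num_child + hh_head_sex + hh_head_age + "
    "parent_eduyears + parent_reledu + eld_sex + eld_age + "
    "hh_asset_index + hh_income + int_trauma_exp"
)

# OLS of the error on respondent characteristics with IID standard errors.
# If the errors were random with respect to these characteristics, the
# coefficients should be jointly insignificant (the F-test reported at the
# bottom of each table column).
fit = smf.ols(f"error_chatgpt ~ {covariates}", data=df).fit()
print(fit.summary())
print("Joint F-test p-value:", fit.f_pvalue)
```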
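For reference, the search spaces reported in the "Hyperparameters (Used)" columns of Tables 5 and 6 above map onto standard scikit-learn estimators. The sketch below is illustrative only: the feature matrix X and label vector y are placeholders, the discrete n_neighbors values are example points drawn from the reported [10, 10000] range, and the cross-validation and scoring choices are assumptions rather than the paper's exact protocol.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

# Candidate estimators and grids mirroring the "Hyperparameters (Used)"
# columns of Tables 5 and 6. The n_neighbors values are example points from
# the reported [10, 10000] range; cv and scoring are illustrative choices.
candidates = [
    (MLPClassifier(max_iter=1000), {
        "hidden_layer_sizes": [(100,), (100, 100), (100, 100, 100)],
        "activation": ["tanh", "relu"],
        "alpha": [0.01, 1],
    }),
    (KNeighborsClassifier(), {
        "n_neighbors": [10, 100, 1000],
        "weights": ["uniform", "distance"],
    }),
]

def tune_classifiers(X, y):
    """Run a cross-validated grid search for each candidate classifier."""
    results = []
    for estimator, grid in candidates:
        search = GridSearchCV(estimator, grid, cv=5, scoring="f1")
        search.fit(X, y)
        results.append((search.best_estimator_, search.best_score_))
    return results
```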