VOLUME 3: VIETNAM STUDY
Vietnam Reading and Mathematics Assessment Study

Preface

In 2000, the Vietnam Ministry of Education and Training (MoET) launched a large-scale monitoring study of primary education. The study tested pupils and their teachers in the last grade of primary education (Grade 5), using a cross-sectional sample survey in two key subject areas: reading comprehension in Vietnamese and mathematics.

This volume is the third of three. Each volume covers a different aspect of the study carried out in Vietnam in 2001. Volume 1 presents the study, summarizes the main issues and proposes policy recommendations. Volume 2 presents the detailed findings of the study, structured around ten chapters. Volume 3 provides full technical details of the design and conduct of the study:

Chapter 1: Research Design and Conduct of the Vietnam Grade 5 Study
Chapter 2: Pupil Test Development and Calibration
Chapter 3: The Development of the Teacher Tests, Equating Pupil and Teacher Results, and International Benchmarking
Chapter 4: Sample Design Procedures: the Vietnam Grade 5 Survey

The study was conducted by the MoET in Vietnam. Many people were involved, and we would like to express our sincere thanks to the following: the late Vice-Minister Le Vu Hung, who provided the oversight and guidance for the study; Vice-Minister Dang Huynh Mai, responsible for primary education; the senior directors and heads of departments of MoET and the national education institutes, which provided the specialists and researchers who conducted this study; the provincial and district education offices, whose staff collected the data from over 3600 schools throughout Vietnam in an exemplary fashion; Dr. Nguyen Quoc Chi, National Manager of the Primary Education Project, who was instrumental in conducting the study and who provided special insights into the problems of primary education; Professor Dang Ba Lam, Director of the National Institute of Educational Development (NIED), who provided a member of the data team; Professor Tran Kieu, Director of the National Institute of Educational Science (NIES), who furnished researchers for the questionnaire committee; and also Professor Do Dinh Hoan and Dr. Do Cong Vinh, who worked on the study. Professor Hoan and his team were responsible for test development. Dr. Vinh organized the data editing, entry, cleaning and analysis.

Members of the international support team included Dr. Kenneth Ross, Dr. Mioko Saito and Stephanie Dolata from IIEP/UNESCO; Professor Patrick Griffin of the Assessment Research Centre of Melbourne University; Miyako Ikeda, who guided the data entry, cleaning and merging of files for nearly 75,000 records and who conducted many of the subsequent statistical analyses; and finally the principal architect of the study, who guided and helped the international advisory team and the national researchers, Emeritus Professor Neville Postlethwaite of Hamburg University.

Finally, we would like to thank the British Department for International Development and the Canadian Agency for International Development for their strong and generous support for the undertaking of this study.
Christopher Shaw and Mai Thi Thanh
World Bank in Vietnam
August 2004

Chapter 1
RESEARCH DESIGN AND CONDUCT OF THE VIETNAM GRADE 5 STUDY [1]

[1] This chapter was written by Do Cong Vinh of the National Institute for Educational Sciences.

Towards the end of 1999, the Ministry of Education and Training (MOET) decided to undertake a survey of educational achievement in primary schools in the whole of Vietnam. There was a great deal of anecdotal evidence about the system, but there was no systematically collected and analysed body of data. To this end the MOET decided to launch a sample survey. The aim of this chapter is to describe the decisions made by the MOET about the study and the steps undertaken to implement the study from beginning to end.

Initial decisions

Which grade level?

There had been a small study of pupil learning performance undertaken in five provinces (Griffin, 1998). One of the more remarkable findings from that study was that the Grade 3 pupils in Hanoi had higher achievement levels than the Grade 5 pupils in the other four provinces. At first it was thought that perhaps both grade levels should be assessed in this new study. However, the small study had covered only five provinces, with a sample of only 14 schools per province. In the new study it was hoped to draw a sample of schools in each province large enough to yield stable provincial estimates of means and percentages. A quick calculation (illustrated in the sketch below) showed that 60 schools per province would have to be tested to achieve a 7.5 percent sampling error for a percentage. Given that there were 61 provinces, this meant a sample of 3660 schools. The magnitude of the operation was huge, and it was decided that only Grade 5 pupils would be assessed.

The target population for the study became 'All pupils enrolled, on the date of testing, in Grade 5 in all schools having a Grade 5 in Vietnam.' The details of how the sample was drawn, the response rates, the subsequent post-stratification and finally the weighting procedures used have been presented in detail in Chapter 4 of this publication.

Which subject matters?

The key subjects in primary schools are reading and mathematics, but the most important is reading: if a child cannot read, then he or she cannot even read a mathematics textbook. It was thus decided to restrict the testing in this first large assessment study to reading comprehension in the Vietnamese language and mathematics. Depending on how this sample survey went, a further study could always be undertaken at a later date (and perhaps on a smaller sample) on writing, listening comprehension, and speaking Vietnamese.

A preliminary decision was taken to have tests with a mean of 500 and a standard deviation of 100. At the same time, the items were to be broken down into levels of reading (or mathematical) skills so that the curriculum developers could see exactly which kinds of skills were well mastered and which were poorly mastered.

There were some stringent constraints on the tests. It was assumed that a test item would take, on average, about one minute to answer. Since there were to be tests in both Vietnamese and mathematics, as well as a questionnaire for pupils to answer, it was deemed that each test should not take more than about one hour to complete.
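As a rough illustration of the '60 schools per province' calculation mentioned above: with about 20 pupils tested per school, the sampling error of a percentage depends on the effective sample size after allowing for the design effect of the cluster sample. The confidence level, intraclass correlation and cluster size used in the sketch below are illustrative assumptions, not the study's actual planning values.

```python
import math

# Illustrative assumptions (not the study's exact planning figures):
z = 1.96            # 95 percent confidence
half_width = 0.075  # +/- 7.5 percentage points, expressed as a proportion
p = 0.5             # worst-case proportion (maximum variance)
b = 20              # pupils tested per sampled school (cluster size)
rho = 0.3           # assumed intraclass correlation for pupil achievement

# Sample size needed under simple random sampling for this precision.
n_srs = (z ** 2) * p * (1 - p) / half_width ** 2

# Design effect of the cluster sample, and the pupils and schools it implies.
deff = 1 + (b - 1) * rho
n_pupils = n_srs * deff
n_schools = math.ceil(n_pupils / b)

print(f"effective n under SRS: {n_srs:.0f}")    # about 171 pupils
print(f"design effect:         {deff:.1f}")     # about 6.7
print(f"pupils per province:   {n_pupils:.0f}") # about 1,140
print(f"schools per province:  {n_schools}")    # about 58, i.e. close to 60
```

Small changes in the assumed intraclass correlation move the answer a few schools either way, but the order of magnitude matches the 60 schools per province quoted above.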
It was also hoped that certain sub-scores could be produced (narrative prose, expository prose, and document reading in reading; and number, measurement, and space in mathematics), but with a total of 60 items per test there are limits on what a test can be used for. The details of the test development and the subsequent scoring and scaling have been presented in Chapter 2 of this volume.

And which teachers?

In previous cross-sectional sample survey studies, such as the several IEA studies, it has been hard to establish strong relationships between teacher characteristics or behaviours and pupil achievement. This is because the teacher whom a pupil has in Grade 5 is only a proxy for the teachers that pupil had in Grades 4, 3, 2, and 1. In other words, it is the staff as a whole that matters more than the individual teacher that the pupil happens to have this year. Nevertheless, there is always the hope that a study will yield some important teacher-pupil relationships, and since this was the first study of its kind in Vietnam, a compromise position was taken.

Since only twenty pupils per school were to be sampled, and since some schools had up to ten classes in Grade 5, it would have been a large undertaking to test all teachers whose pupils were drawn, and even then the estimates of achievement for a class might have been unstable. It was decided to draw two teachers per school at random. It was recognised that this would mean that many pupils would be without a teacher in any analyses involving teachers. All teachers in Vietnam are said to be general teachers; that is, they are expected to teach Vietnamese and mathematics to the children in their classes. As it turned out, 50.1 percent of pupils were associated with a teacher in analyses involving teachers. This was considered to be a large enough 'sample' for teacher-pupil analyses for the whole country.

There was some worry that the teacher half sample was not the same as the non-teacher half sample; in other words, there was a fear that some bias might have occurred in the selection of teachers. It can be seen from Table 1.1 that the two half samples were very similar, and it is therefore reasonable to assume that the half sample analysed to examine the teacher-pupil link is a reasonable approximation to all Grade 5 pupils and their schools. Whereas the pupils in the total sample had a mean score of 500 for mathematics and reading, the pupils with teachers in the merged file had a mean of 498.8 and the pupils without teachers had a mean of 501.1. The standard deviations were also similar. The reading scores were similar, as were the values for sex and age. The pupils with teachers had slightly fewer possessions, parents with slightly fewer years of education, slightly more grade repeating and a slightly less advantaged home background, and they were in schools with slightly fewer resources. But, in general, the two groups were very similar.
Table 1.1 Comparison of the two half samples: pupils with teachers and pupils without teachers

                               Pupil with teacher    Pupil without teacher    Total
                               Mean      SE          Mean      SE             Mean      SE
Reading score                  497.6     1.51        502.1     1.52           500.0     1.30
Math score                     498.3     1.45        501.5     1.63           500.0     1.34
Sex (% girls)                  48        0.3         48        0.4            48        0.3
Age in months                  135       0.10        134       0.10           134       0.08
Total possessions at home      9.5       0.03        9.9       0.04           9.7       0.03
Years of parents' education    14.9      0.07        15.5      0.07           15.2      0.05
Times grade repeated           0.2       0.004       0.2       0.004          0.2       0.003
Home background                -0.08     0.02        0.09      0.01           0.01      0.01
School resources               10.4      0.01        11.5      0.11           11.0      0.09

It must be recalled, however, that analyses dealing with the linkages between pupils and teachers are based on the half sample described above, whereas the general descriptive statistics for all teacher variables have been based on the two Grade 5 teachers sampled per school.

Establishing research questions and developing dummy tables and questionnaires

Establishing the research questions

With the initial decisions having been taken, it was time to begin to prepare the study in detail. The most important issue was what the MOET top echelons wanted out of the study. A 'committee' of 36 senior persons from the MOET spent half a day discussing, in small group sessions and in plenary session, in order to produce a final set of research questions for the study. Many research questions were listed, and a selection of these has been presented in detail in Appendix 1.1. The advantage of this approach was that it answered the questions of immediate interest to the officials of the Ministry of Education and Training; the disadvantage was that it may have inadvertently omitted questions of more theoretical interest.

A summary of the major questions has been listed below.

1. Policy questions related to educational inputs
a) What were the characteristics of Grade 5 pupils?
b) What were the characteristics of Grade 5 teachers?
c) What were the teaching conditions in Grade 5 classrooms and in primary schools?
d) What aspects of the teaching function designed to improve the quality of education were in place?
e) What was the general condition of school buildings?
f) What level of access did pupils have to textbooks and library books?

2. Specific questions relating to a comparison of reality in the schools and the benchmarks set by the MOET
Were the following benchmarks met? (total school enrolment, class size, classroom space, staffing ratio, sitting places, writing places, chalkboard, classroom furniture, classroom supplies, academic qualification of school heads, professional qualification of school heads, etc.)

3. Have the educational inputs to schools been allocated in an equitable fashion?
a) What was the equity of material resource inputs among provinces and among schools within provinces?
b) What was the equity of human resource inputs among provinces and among schools within provinces?

4. What was the level of achievement of Grade 5 pupils overall and in the various domains of reading and mathematics? What was the level of Grade 5 teachers in reading and mathematics?
a) What percentages of pupils reached the different levels of skills in reading and mathematics?
b) What percentages of pupils reached the benchmarks of having sufficient reading and mathematics so as to be able to: (i) cope in Vietnamese society and (ii) study independently in Grade 6?
c) What were the differences in the different skill levels in both reading and mathematics between (i) boys and girls; (ii) pupils in low, average and high socio-economic groups; and (iii) pupils in urban, rural, and remote areas?
d) Were the pupil 'elite' (upper 5 percent) performances similar in different regions and economically advantaged areas?
e) To what extent did the performance 'tails' (lowest 5 percent) differ across regions and economically advantaged strata?

5. What were the relative influences of variables on achievement?
a) What were the major variables accounting for variance in reading and mathematics achievement?
b) What were the major variables that differentiated between the most and least effective schools?

Developing the dummy tables

For each research question, one or more dummy tables were developed. For example, the first set of research questions has been reproduced below. There is the general policy question in bold, followed by a set of smaller questions which together answer the larger research question. The question in bold is in two parts; only the first part has been used for this example.

Example of policy question

What are the characteristics, including home background, of the Grade 5 pupils? Do these characteristics and the home background have an influence on achievement?

What is the age distribution of Grade 5 pupils? Are there distribution patterns requiring corrective action and/or having an influence on teaching methods and/or the curriculum?
What is the sex distribution of Grade 5 pupils? Are there imbalances in the enrolment of male and female pupils requiring corrective action?
How many books are there in pupils' homes?
What is the distribution of possessions at home?
How regularly do Grade 5 pupils eat meals?
What is the level of the parents' education of Grade 5 pupils?
(Add) What is the ethnic group of the children?

These questions were transformed into a dummy table, as given below. From the dummy table it can be seen that a breakdown of results was required by region and, within region, by school location (isolated, rural and urban). This implied having an identification code for regions and a variable (probably from the school questionnaire) about the location of the school. The dummy table also implied a question about the sex of the pupil, to be reported in percentage terms; a question trying to elicit accurate information on the number of books in the home, to be reported as an actual number; and a question acting as a proxy for the wealth of the home, based on the number of possessions in the home from a list known to differentiate among homes, to be reported as the mean number of items per home. Next there was to be a question on the number of meals a pupil ate per day, a question to elicit the number of years of education of both parents added together, and finally the ethnic affiliation of pupils, to be reported as the percentage of pupils from a non-Kinh family.

In all there were about 100 dummy tables. From the dummy tables it was possible to see the information required and the form of analysis of the data to be used in order to complete the tables.

Example of dummy table

Region            School      Sex          Books at home   Possessions at   Meals      Parent             Ethnic group
                  location    (% Female)   (Number)        home (Index)     per day    education (Yrs.)   affiliation (% non-Kinh)
                              %     SE     M     SE        M     SE         M     SE   M     SE           %     SE
Red River Delta   Isolated
                  Rural
                  Urban
Region 2          Isolated
                  Rural
                  Urban
Etc.
Vietnam           Isolated
                  Rural
                  Urban
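A dummy table of this kind maps directly onto an analysis: a weighted percentage or mean, with its standard error, for every region-by-school-location cell. A minimal sketch of such a breakdown is given below; the data frame and its column names (sampling weight, region, location, books, girl) are hypothetical placeholders rather than the study's actual variable names.

```python
import pandas as pd

# Hypothetical pupil-level file: one row per pupil with a sampling weight
# and the classification and outcome variables implied by the dummy table.
pupils = pd.DataFrame({
    "region":   ["Red River Delta", "Red River Delta", "Region 2", "Region 2"],
    "location": ["Rural", "Urban", "Isolated", "Rural"],
    "weight":   [1.2, 0.8, 1.5, 1.1],   # illustrative sampling weights
    "books":    [12, 40, 3, 8],         # number of books in the home
    "girl":     [1, 0, 1, 1],           # 1 = girl, 0 = boy
})

# Weight each outcome, then sum within region-by-location cells.
pupils["w_girl"] = pupils["girl"] * pupils["weight"]
pupils["w_books"] = pupils["books"] * pupils["weight"]
cells = pupils.groupby(["region", "location"])[["w_girl", "w_books", "weight"]].sum()

table = pd.DataFrame({
    "Sex (% Female)":    100 * cells["w_girl"] / cells["weight"],
    "Books at home (M)": cells["w_books"] / cells["weight"],
})
print(table)
# The standard errors shown in the real tables additionally have to reflect
# the two-stage cluster design (schools, then pupils within schools).
```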
It was possible to list all of the variables required and then to decide whether the information needed to construct a variable required one or more questions in a questionnaire. It was then a matter of deciding the best source of information, in the sense of obtaining valid information. A questionnaire committee was formed, consisting of 14 persons; their names and affiliations have been given in Appendix 1.2. The meetings to produce the first version of the questionnaires lasted several days and demanded a great deal of discussion. Three questionnaires were produced: a pupil questionnaire, a teacher questionnaire, and a school questionnaire. The questions were reviewed by peers in the National Institute of Educational Sciences and, after several amendments, they were trialled in the province of Thanh Hoa (see below). Following the trialling, several questions were further amended. The final questionnaires used in the study have been presented in Appendix 1.3, where it will be seen that the variable names have also been added to the questions.

It was unfortunate that a check on the translation of the questionnaires from English into Vietnamese was not made before the final data collection. One result of this is that the data for Question 26.08 on the teacher questionnaire and all of Question 46 on the school questionnaire should not be used.

The pilot study, or trialling the instruments

Thanh Hoa was the province selected for the trialling of all procedures and instruments. Thanh Hoa was seen as an average province that represented most conditions found in Vietnam. Some expressed doubt about this and wished to see the evidence, but the trialling took place in Thanh Hoa without a discussion of evidence. In the end, after the data had been collected in the main study, it was seen that Thanh Hoa was indeed more or less average: it had an average intraclass correlation for a province, and the values for many variables, including scores, were average.

Project team members conducted a data collection training workshop, and all trial data were collected on a single day. Volunteer teachers took both tests, and separate samples of pupils undertook one of the tests of mathematics or reading and completed the pupil questionnaire. The teacher mathematics and reading tests were trialled as a single trial form. All was said to have gone well. The analyses of the items and of the questionnaire questions were undertaken and the final instruments prepared.
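The item analyses actually used for the tests are documented in Chapter 2. Purely as a generic illustration, the kind of classical item statistics commonly inspected after a trial, item facility and corrected item-total discrimination, can be computed along the following lines; the scored item matrix and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical scored trial responses: one row per pupil, one column per item,
# 1 = correct and 0 = incorrect.
scores = pd.DataFrame({
    "item01": [1, 1, 0, 1, 0, 1, 1, 0],
    "item02": [1, 0, 0, 1, 0, 1, 0, 0],
    "item03": [1, 1, 1, 1, 0, 1, 1, 1],
})

total = scores.sum(axis=1)  # total score per pupil

item_stats = pd.DataFrame({
    # Facility: proportion of pupils answering the item correctly.
    "facility": scores.mean(),
    # Discrimination: correlation of the item with the total score computed
    # from the remaining items (corrected item-total correlation).
    "discrimination": pd.Series(
        {item: scores[item].corr(total - scores[item]) for item in scores.columns}
    ),
})
print(item_stats.round(2))
# Items with very low facility or near-zero (or negative) discrimination would
# normally be revised or dropped before the main study.
```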
Preparing for, and conduct of, the main study

The pilot study had taken place in October 2000, and the instruments were finalised immediately following it. The preparation for the main study began in December 2000. The main steps taken have been presented below, with the timing of each major step. The data collection and data entry were a very large task, accomplished in a relatively short space of time.

1. December 2000 - January 2001. Changing the school ID. In the pilot study it had been seen that the school ID that had been used was difficult for the data collectors to follow. Furthermore, it was recognised that if the Grade 5 study data were to be linked to the population census data at the commune level, then a new school ID would be required. The school ID used for the Thanh Hoa pilot study was therefore changed to a new school ID that was compatible with the population census. There were two reasons for using a new ID: first, the experience in Thanh Hoa had shown that a school ID which did not convey information about commune, district, province, and region was difficult to use; second, an ID which could be linked to other studies was necessary. The new school ID can be linked to the population census at the commune level. This transformation of the school ID was done by Mr. Vinh's team under the instruction of Dr. K. Ross. The ID system used has been described in Appendix 1.4.

2. February 2001, Week 1. The questionnaires and tests were compiled into booklets: one for the school heads, one for the teachers and one for the pupils. After checking, they were sent to the publishing company. As the proofs of the booklets became ready, the research team spent a lot of time proofreading and correcting errors.

3. February 2001, Week 2. The data collection manuals for coordinators and for data collectors that had been used in the pilot study were revised, where necessary, and sent for duplication.

4. February 12, 2001. It was on this day that the official letter from the Ministry of Education and Training (MoET) was sent to the provincial directors of education. This letter explained why the data collection was taking place, how it would be conducted, in which schools in each province, and when.

5. March 2001, Weeks 1-3. Lists of schools (IDs and names) in the sample were prepared. School ID stickers were prepared for the packages of booklets to go to each school. A package consisted of one school head booklet, two teacher booklets, and 21 pupil booklets (one pupil booklet was a spare). The stickers were given to the publisher. A small national team led by Mr. Vinh supervised the publisher's preparation of the packages for each province, and sixty packages of booklets were sent to each provincial educational office. The number of schools tested was 3639: of the 3660 sampled clusters of 20 pupils, some very large schools contained two clusters, and hence the number of distinct schools was reduced to 3639. Each school was visited for two days.

6. March 2001, Week 2. It was in this week that the lists of provincial coordinators and data collectors were assembled. In each province, three people were selected as coordinators: one was a vice-director of the provincial educational office, and the other two were people from the primary school section of the provincial educational office. Within each province, data collectors were selected from officers in the district educational offices. The number of data collectors selected from a district office depended on the number of schools selected in the sample in that district; for example, if three schools in district A had been selected in the sample, then three data collectors were selected. There was one data collector per school. This procedure was a change from what had been done in the pilot study, where the data collectors had been selected from among vice school heads of secondary schools. After the pilot study, it was agreed that district educational officers would be more appropriate as data collectors.

7. March 27 - April 4, 2001. Two central training sessions for coordinators were held.
One session was held in Hanoi from March 27 to 29, 2001, for the 31 provinces in the northern part of the country. The other session was held in Ho Chi Minh City from April 2 to 4, 2001, for the 30 provinces in the southern part of the country. There were approximately 100 coordinators in each session. The national team led by Mr. Vinh prepared the materials for these training sessions. The training materials covered: (a) how to fill in the ID numbers in the boxes on the cover sheets of the booklets; (b) how to select 2 teachers and 20 pupils in each school (random sampling); and (c) how to conduct the practice sessions in which pupils learned how to answer the questionnaires and tests. Special practice questions and test items had been prepared so that pupils had practice in answering the different item types and question formats. The manual for coordinators was prepared; it has been presented in Appendix 1.5 so that, when the survey is repeated, there is a record of how the survey was conducted.

8. April 9-10, 2001. Training sessions were conducted for the data collectors in each province. It was the provincial coordinators who ran these training courses. Members of the national team supervised the training sessions in Nam Dinh, Ninh Binh, Ha Nam, Phu Tho, Bac Giang, and Bac Ninh. It was felt that the training sessions had gone well.

9. April 12-13, 2001. Full data collection dates. In all, 4,405 persons were involved in the data collection. Each sample school needed one data collector, selected from the local district educational officers, and each district having a school in the sample selected a coordinator. Each province formed a small committee of three persons to organize the data collection within the province. On the first day, the data collector went to the school, prepared the room allocated for conducting the data collection for pupils and teachers, and informed the pupils and teachers who had been selected in the sample about the study (and, where necessary, instructed them on how to take multiple-choice tests). On the morning of the second day, data were collected from the pupils and the school head; after lunch, data were collected from the teachers. The national team supervised the data collection in some schools in Hai Phong, Quang Ninh, Hai Duong, and Hung Yen. At the same time, there were some people at a central office in Hanoi (in the MOET Primary Education Project Office at 118 Ba Trieu, Hanoi) ready to answer questions from schools by phone. (These were Mrs. Tam, Mr. Hoan, and Mrs. Hanh for questions related to the tests, and Mr. Chi and Mr. Trung for questions related to the questionnaires.) No serious problem occurred. Some of the problems reported to Mr. Chi's office involved such matters as: one data collector could not find a school ID because he or she had not received the school list, but was able to find it on the ID sticker on the package; in one package of booklets several pupil booklets were missing, but the data collector was able to photocopy the required number of pupil booklets. The data collectors then parcelled up all the instruments and returned them to either the district education office or the provincial education office. Further checks were undertaken and, where data were missing on any school head questionnaire, the school head was approached to supply the missing information.

10. By the 4th week of April, 2001.
It was by this date that the booklets had to be returned to Hanoi. The packages of booklets were sent back to Hanoi either directly from the district educational offices or from the provincial educational offices (some district officers sent their packages via the provincial office). The received packages of booklets were kept in Mr. Chi's office. Mr. Dien categorized the packages of booklets by province and checked the number of schools within each province.

Data editing, entry, cleaning and merging

Immediately afterwards, in May 2001, a small national team, together with World Bank staff, took some data from a few schools in six provinces, along with the actual booklets, to the IIEP in Paris where, under supervision, they prepared the structure file for WinDem and then entered the data. They also had training in data cleaning, especially from Dr. Mioko Saito. Upon return to Vietnam they entered the data from one province, Phu Tho. This province was selected because it was considered to be average in terms of the ease or difficulty of reaching schools. The data file was then sent to the IIEP for checking. At the same time it was agreed that a distance-learning training course, using video-conferencing by satellite, should be organised. In the period July 9-12 a 'Distance Learning Intensive Training Course on Computer-based Data Cleaning and Data Validation for the Vietnam Survey of Schools in 61 Provinces' was held. The staff were Dr. K. N. Ross, Dr. M. Saito and Mlle. Stephanie Leite from the IIEP, who were at the World Bank Office communications centre in Paris. At the Hanoi end there were 15 persons from the National Institute of Educational Sciences (NIES), the National Institute for Educational Development (NIED) and the Hanoi National University, together with the education team from the World Bank Hanoi Office.

There were five strands to the workshop: ID checks using WinDem; merge/link checks (in WinDem, Excel, and SPSS); range validation checks (in WinDem); within-file consistency checks (in SPSS); and across-file consistency checks (in SPSS). (A minimal sketch of such checks is given below.)

Following the successful training, three teams were set up, together with their tasks. The teams were as follows:

Team                      Tasks                                                                   Members
Hand editing team         Edit the data in the instruments based on the rules; keep a log;        Huong, Trung, Oanh, Tung, Lien
                          instruct the data entry.
Entry team (supervisors)  Organize the data entry team; double entry for checking entry           Vinh, Cuong, Trung
                          errors (2 schools per province).
Entry team (members)      Enter the data with WinDem, following the rules and the hand            36 students (18 in the morning
                          editing; follow and consult the instructions of the data supervisor.    and 18 in the afternoon)
Cleaning team             Clean the data using WinDem and SPSS based on the rules (ID check,      Phan, Lang, Thanh, Ha, Miyako
                          merge/link check, validation check, within-file consistency check,
                          and across-file consistency check); keep a log.
Librarians                Carry the instruments from room to room.                                Two people

The Teacher Training College in Hanoi was used for the data editing, entry and cleaning. The warehouse where the packages were stored was in the basement of the college, and three rooms on the second floor were allocated for the editing, entry and cleaning. Two librarians carried the packages up the three flights of stairs. A package was first brought to the editing room, where it was logged in and, at the same time, the booklets in it were examined manually. The 'Flow of Tasks' chart has been presented in Appendix 1.6.
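The five cleaning strands listed above can be illustrated in a few lines. The sketch below uses pandas rather than WinDem or SPSS, and all file layouts, variable names and valid ranges are hypothetical; it only shows the general shape of an ID check, a merge/link check, a range validation and a simple consistency check.

```python
import pandas as pd

# Hypothetical files: a school file and a pupil file keyed on a school ID.
schools = pd.DataFrame({"schoolid": [101, 102, 103], "enrolment": [220, 310, 180]})
pupils = pd.DataFrame({
    "schoolid": [101, 101, 102, 104],    # 104 does not exist in the school file
    "pupilid":  [1, 1, 2, 3],            # duplicated pupil ID within school 101
    "age_months": [134, 135, 290, 138],  # 290 is outside the plausible range
    "birth_year": [1989, 1989, 1990, 1989],
})

# 1. ID check: duplicated pupil IDs within a school.
dup_ids = pupils[pupils.duplicated(["schoolid", "pupilid"], keep=False)]

# 2. Merge/link check: pupil records whose school ID has no match in the school file.
unlinked = pupils[~pupils["schoolid"].isin(schools["schoolid"])]

# 3. Range validation: values outside the allowed range for a variable.
bad_age = pupils[(pupils["age_months"] < 100) | (pupils["age_months"] > 220)]

# 4. Within-file consistency: age in months should roughly agree with birth year
#    (testing took place in April 2001).
implied_age = (2001 - pupils["birth_year"]) * 12
inconsistent = pupils[(pupils["age_months"] - implied_age).abs() > 24]

for name, frame in [("duplicate IDs", dup_ids), ("unlinked records", unlinked),
                    ("out-of-range ages", bad_age), ("age/birth-year conflicts", inconsistent)]:
    print(name)
    print(frame, end="\n\n")
# Across-file checks follow the same pattern, comparing, for example, the number
# of pupil records per school against the enrolment reported by the school head.
```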
As problems were discovered they were noted, together with a suggested action to be taken. The problems mainly concerned ID numbers, missing data and hard-to-read answers. Where necessary, the senior members of the team would phone the province to ask them to visit the school again for clarification. When the editing had been completed for a school and all was in order, the package was passed to the next room for data entry. There were two teams of data enterers, each starting with 18 students from the university. After the data had been entered, the package was transferred to the data cleaning room, where the cleaning team cleaned the data and could refer to the booklets in each package.

Establishing the rules for cleaning, and deciding which kinds of changes could be made to the data file, took quite some time. The work was conducted in three phases. First there was the cleaning of the data for the first province that had been entered, Phu Tho. The second phase consisted of entering the data for the second to the fifth provinces and cleaning them; the rules that had been established in Phase 1 were revised where necessary and finalised. These rules have been presented in Appendix 1.7. Phase 3 consisted of applying the cleaning rules to the information entered for the remaining provinces, and checking again the cleaning that had been done for the first set of provinces. The team leader conducted a separate verification data entry, and there was a difference of only 0.02 to 0.05 percent of keystrokes (illustrated in the sketch below). The data entry and cleaning lasted from 26 July to 18 October 2001. In mid-August, however, the data for the first eight provinces were sent to the IIEP for checking. Another distance-learning training course was held by satellite during 22-24 August, in order for the Hanoi team to learn more about data cleaning consistency checks and the creation of derived variables.

On 19 October, the completed data file contained all the data, as well as the school and pupil forms for each school, master copies of the booklets, the enrolment of Grade 5 pupils, a list of schools where it was suspected that cheating had taken place (see Appendix 1.8) and a list of ambiguous questions in the booklets (see Appendix 1.9). The rules for 'missing data' were also sent (see Appendix 1.10).

In mid-November, Mioko Saito sent comments on the data files to Hanoi. These comments concerned school IDs not in the sampling list, missing school IDs, classes taught by two different teachers, classes without pupils, and consistency checks within and between files. Between the third week in November and the second week in December, the Hanoi team checked the records and, where necessary, applied the cleaning rules if these had been missed; if problems still remained, the members of the team located the original booklets and, when required, changed the data to the correct values given in the booklets. The results were written in SPSS syntax and sent to the IIEP. A report of this exercise has been presented in Appendix 1.11.
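The verification data entry mentioned above amounts to re-keying a subset of booklets and counting the proportion of entries that disagree with the first keying. A minimal sketch of that comparison is given below; the two files and their variable names are hypothetical.

```python
import pandas as pd

# Hypothetical first keying and independent verification keying of the same
# booklets, aligned on school and pupil IDs.
first = pd.DataFrame({
    "schoolid": [101, 101, 102], "pupilid": [1, 2, 1],
    "q1": [3, 2, 4], "q2": [1, 1, 2], "q3": [5, 3, 3],
})
second = pd.DataFrame({
    "schoolid": [101, 101, 102], "pupilid": [1, 2, 1],
    "q1": [3, 2, 4], "q2": [1, 4, 2], "q3": [5, 3, 3],   # one disagreement in q2
})

keys = ["schoolid", "pupilid"]
a = first.set_index(keys).sort_index()
b = second.set_index(keys).sort_index()

# Proportion of keyed values that differ between the two entries.
disagreements = (a != b).to_numpy().sum()
total_values = a.size
rate = 100 * disagreements / total_values
print(f"{disagreements} disagreements out of {total_values} values ({rate:.2f} percent)")
# A rate of a few hundredths of one percent, as reported for the study,
# would indicate very accurate keying.
```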
In the first week of December 2001, an inspection was made of 76 suspicious schools. They were suspicious in the sense that they had very high school mean scores and very small school standard deviations. The provincial mean scores were also examined, as well as the school locations. At the same time, the Vietnam team contacted the district offices in which the schools were located and asked whether the schools were deemed to be very good, good, average, or poor. Finally, a list of 35 schools whose test parts were suspiciously similar was created. In the end it was decided not to exclude any of these schools from the data analysis. All 76 suspicious schools have been listed in Appendix 1.8, and the 35 schools identified as cheating after a careful examination have been highlighted in yellow.

During the same week, it was discovered that there had been a column shift in some of the test item data. In some cases the data were re-entered, and in other cases the pupils' test scores or teachers' test scores were dropped. On the file there are the following variables. 'drop1' indicates pupils with column shifts in the pupil tests: if 'drop1' is 1, the pupil mathematics score was dropped because the column shift was in the pupil mathematics test; if 'drop1' is 2, the pupil reading score was dropped. 'drop2' = 1 means that both the reading and the mathematics tests should be dropped for the pupils in the 35 schools whose test answers were suspiciously similar, as described above. Three teachers had been provided with a teacher booklet in which most of the test pages were missing; the teacher mathematics scores of pupils with 'drop3' = 1 were therefore dropped, and the teacher mathematics scores of pupils with 'drop3' = 2 were also dropped. The variables 'drop1' and 'drop3' have been applied to the file; the variable 'drop2' only flags the cases and has not been executed (see the sketch below).

In the third week of December, all of the changes were sent in syntax form to Paris. A member of the Vietnam team visited the IIEP in Paris in the third week of January, where all changes were confirmed as being in order, and she returned to Hanoi with the 'final' file. The numbers of pupils, teachers and schools in the data file were very large and the response rates were excellent: 3635 schools (a 99.89 percent response rate), 72,660 pupils (99.26 percent), 7,178 teachers (98.63 percent) and 3,631 school heads (99.78 percent) responded to the survey instruments.

Data analyses

A data analysis team of six persons was formed, and the Primary Education Project ensured that the team was provided with good desktop computers. The Vice-Minister for Primary Education had already expressed the wish to have breakdowns for all tables by region and school location (isolated, rural, urban). All of the dummy tables that had been prepared for the different chapters were then run. After some of the tables had been run, it was seen that the results for Ho Chi Minh City were erratic. It was then discovered that the population figures used for Ho Chi Minh City were wrong. The new figures were supplied, Dr. Ross of the IIEP in Paris recalculated all of the weights, and a new CD containing the data and weights was burned and sent to Hanoi. The analyses that had already been run were re-run, followed by the analyses for the remaining dummy tables. The team decided when to use tables and when to use graphs, and made comments on their interpretation of some of the results. The results were then written up.
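The 'drop1', 'drop2' and 'drop3' flags described above are simple conditional recodes. A minimal sketch of how such flags might be applied to a pupil-level file is given below; the data frame and score column names are hypothetical, and 'drop2' is only reported rather than executed, mirroring the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical pupil-level records carrying the drop flags described above.
data = pd.DataFrame({
    "pupil_math":   [512.0, 487.0, 530.0, 499.0],
    "pupil_read":   [505.0, 478.0, 541.0, 502.0],
    "teacher_math": [610.0, 598.0, 655.0, 620.0],
    "drop1": [1, 2, 0, 0],   # 1 = column shift in the maths test, 2 = in the reading test
    "drop2": [0, 0, 1, 0],   # 1 = pupil in one of the 35 suspect schools (flag only)
    "drop3": [0, 0, 0, 1],   # 1 or 2 = teacher booklet with missing test pages
})

# drop1: remove the affected pupil score.
data.loc[data["drop1"] == 1, "pupil_math"] = np.nan
data.loc[data["drop1"] == 2, "pupil_read"] = np.nan

# drop3: remove the teacher mathematics score.
data.loc[data["drop3"].isin([1, 2]), "teacher_math"] = np.nan

# drop2: flagged but not executed; just report how many pupils are affected.
print("pupils flagged by drop2:", int((data["drop2"] == 1).sum()))
print(data)
```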
If the Ministry decides to have a repeat survey, it would be worth considering whether it would be better to have all graphs in the main body of the chapters and the equivalent tables as appendices, because it was discovered that the readers in the Ministry found it easier to read graphs than tables.

It should be mentioned that the tables for Chapter 2 (the achievement results) were calculated by Professor Patrick Griffin of the Assessment Research Centre of the University of Melbourne. The effective-schools analysis reported in Chapter 8 was calculated by Miyako Ikeda, working in the Vietnam team. One set of HLM analyses reported in Chapter 9 was calculated and interpreted by Njora Hungi of the Flinders University in South Australia.

The written products

Three volumes were produced. The first was Volume 2 of this series, the book reporting all of the results. The second was this volume, known as Volume 3. The third was Volume 1, which states the recommendations in great detail, with pointers as to who should do what and when.

A Data Archive

Mr. Heiko Jungclaus of the IEA Data Processing Centre in Hamburg was asked to prepare the data archive. This was accomplished in the period August to October 2002. The aim is to make the data easily accessible to all bona fide researchers wishing to use the data. Those wishing to use the data archive should apply to:

Vice Minister for Schooling (Primary Education)
Ministry of Education
49 Dai Co Viet Street
Hanoi
Vietnam

References

Griffin, P. (1998). Vietnamese National Study of Pupil Achievement in Mathematics and Vietnamese. Hanoi: National Institute for Education and Science.
Appendix 1.1: Example of the tabulation of the policy questions, the questionnaire implications of the policy questions, and the table and questionnaire references

(This appendix tabulates the research questions for monitoring Grade 5 pupil achievement in Vietnam. For each numbered policy question it records the dummy table reference and the questionnaire reference, where P = Pupil questionnaire, TQ = Teacher questionnaire, S = School Head questionnaire, D = derived variable and R = Rasch score. In some cases the recoding needed to compute a derived variable is felt to be self-evident; in other cases the derived variables have been spelled out. The questions are grouped as follows: the characteristics and home background of Grade 5 pupils (questions 1.01-1.15); the characteristics, working conditions, training, activities and views of Grade 5 teachers (2.1-2.18 and 02.1-02.10); school heads, school conditions, enrolment, buildings, inspections and school-community relations (3.01-3.23 and 03.1-03.12); comparisons of teacher competencies, school heads and classroom conditions with the Ministry's benchmarks (4.01-4.09 and 5.01-5.04); the equity of material and human resource allocation among provinces and among schools within provinces (6.01-6.02); the levels of, and differences in, reading and mathematics achievement of pupils and teachers, including the 'elite' and the 'tails' (7.01-7.06); and the variables accounting for variance in achievement and differentiating the most and least effective schools (08-09).)
Appendix 1.2: Names and affiliations of members of the Questionnaire committee

Dr. Nguyen Quoc Chi    Manager of Primary Education Project
Mr. Nguyen Huu Chau    Deputy Director, National Institute of Educational Sciences
Mr. Vu Van Du          Ex-director of Teachers Department
Mr. Bui Gia Thinh      National Institute of Educational Sciences
Mr. Bui Duc Thiep      National Institute of Educational Sciences
Mr. Phan Chi Thanh     National Institute of Educational Sciences
Mr. Nguyen Sy Tuyen    Deputy Director of Teachers Department
Mr. Le Huu Hanh        Ex-Director of Planning & Finance Department
Mr. Tran Manh Cuong    Senior Expert, Teachers Department

Appendix 1.3: Pupil, Teacher and School questionnaires with variable names

Part I: QUESTIONNAIRE FOR PUPILS

Read all the questions and answer them carefully. For most of the questions, you should put a cross X in the lozenge next to the answer you choose.

SOME QUESTIONS ABOUT YOURSELF

Your address is:
Commune/Ward ___________________________
Precinct/District ___________________________

1. Which class in Grade 5 do you attend this term?
(Pclass) (Put X in one lozenge only)
5A (01)   5B (02)   5C (03)   5D (04)   5Đ (05)   5E (06)   5G (07)   5H (08)   5I (09)   5K (10)   5L (11)   5M (12)

2. When were you born? (Pbday) (Pbmonth) (Pbyear)
(Write the numbers in the lozenges below)
Day ____   Month ____   Year 1 9 __ __

3. Are you a boy or a girl? (Psex) (Put X in one lozenge only)
Boy (1)
Girl (2)

4. Do you speak Vietnamese at home? (Pviet) (Put X in one lozenge only)
No (1)
Sometimes (2)
Always (3)

5. Which ethnic group do you belong to? (Pethnic) (Put X in one lozenge only)
Kinh (1)
Other (2)

NB: For questions 6 to 9 below, describe your home.

6. How many books are there in your home? (Pbookshm) (Put X in one lozenge only. DO NOT count newspapers, magazines, textbooks)
None (1)
1-10 books (2)
11-50 books (3)
51-100 books (4)
101-200 books (5)
201 books (6)

7. What appliances are there in your home? (Put X in the lozenge appropriate for each appliance. If any of them is currently out of order but can be fixed, put an X for it)
In my home there are the following items:
7.01 Bicycle (Pposbic) (2)
7.02 Motorcycle (Pposmot) (2)
7.03 Clock (Pposclck) (2)
7.04 Study desk (Pposdesk) (2)
7.05 Study chair (Pposchar) (2)
7.06 Study lamp (Pposlamp) (2)
7.07 Newspaper/magazine (Pposnews) (2)
7.08 Book (Pposbook) (2)
7.09 Bookcase (Pposbkcs) (2)
7.10 Radio (Pposrad) (2)
7.11 TV (Ppostv) (2)
7.12 Wardrobe (Pposwdrb) (2)
7.13 Electric fan (Pposfan) (2)
7.14 Electric or gas cooker (Pposcook) (2)
7.15 Washing machine (Pposwash) (2)
7.16 Telephone (Ppostele) (2)
7.17 Air-conditioner (Pposairc) (2)
7.18 Computer (Pposcomp) (2)

8. Do you have a private study corner at home? (Pprivcor) (Put X in one lozenge only)
No (1)
Yes (2)

9. How much time do you spend each day doing work for your family? (Pfamtime) (Put X in one lozenge only)
I never do any work for my family (1)
Less than 1 hour (2)
From 1 to 2 hours (3)
From 2 to 3 hours (4)
More than 3 hours (5)

10. How many meals a day do you have? (Pmeal) (Put X in one lozenge only)
1 meal (1)
2 meals (2)
3 meals or more (3)

11. Which of the following shows the highest educational level of your mother? (Pmother) (Put X in one lozenge only)
Did not go to school and did not attend any continuation class (1)
Finished primary school (2)
Finished lower secondary school (3)
Finished upper secondary school or vocational school (4)
Finished university (5)
I do not have a mother (6)

12. Which of the following shows the highest educational level of your father? (Pfather) (Put X in one lozenge only)
Did not go to school and did not attend any continuation class (1)
Finished primary school (2)
Finished lower secondary school (3)
Finished upper secondary school or vocational school (4)
Finished university (5)
I do not have a father (6)

SOME QUESTIONS ABOUT YOUR SCHOOL

13. How many minutes does it take you every day to travel from home to school? (Ptravel)
_________________ minutes

14. How many days in the month of March were you absent from school? (Pabsent)
(Write the number in the lozenges below. Write "0" if you were not absent)
Days ____

15. Reasons for your absence? (Put X in the lozenges appropriate for your reasons)
15.1 I was not absent from school (Pabsnot) (1)
15.2 I was ill (Pabsill) (2)
15.3 (3) (Pabsfam) 15.4 (4) I had to work for my family (Pabswork) 15.5 (5) Bad weather or flood (Pabswthr) 15.6 (6) Others (Pabsoth) 397 Vietnam Reading and Mathematics Assessment Study 16. How many times have you repeated a class since you started school? (Pgrep) (Put X in one lozenge only) I have never repeated a class (1) I repeated a class once (2) (3) I repeated a class twice (4) I repeated a class three times or more 17. Do you repeat grade 5 this year? (Pg5rep) (Put X in one lozenge only) No (1) Yes (2) 18. Do you borrow books from your school library and/or class bookcase to read at home? (Pschlibr) (Put X in one lozenge only) There is no school library or class bookcase (1) No (2) Yes (3) 19. Which shift do you study? (Pshift) (Put X in one lozenge only. If you study from 6 to 8 shifts /week, then put X in the lozenge that reads "All day") All day (1) Morning (Shift one) (2) (3) Afternoon (Shift two) (4) Shift three (if applicable) 398 Vietnam Reading and Mathematics Assessment Study 20. Do you have the following items of study materials? (Put X in the lozenge appropriate for each item) I have the following items 20.01 Notebooks for writing (Pmatnotw) (2) 20.02 Notebooks for drafting (Pmatnotd) (2) 20.03 Black pencils (Pmatpenb) (2) 20.04 Colour pencils (Pmatpenc) (2) 20.05 Pocket calculator (Pmatcalc) (2) 20.06 Fountain pens or ball pens (Pmatpen) (2) 20.07 Small tables (Pmattab) (2) 20.08 White chalk or board markers (Pmatchlk) (2) 20.09 School bag (Pmatbag) (2) 20.10 Rulers (Pmatrule) (2) 21. Do you have the following books? (Put X in the lozenge appropriate for each item) I have the following items 21.01 Vietnamese 5. Volume 1 (Ptvvol1) (2) 21.02 Vietnamese 5. Volume 2 (Ptvvol2) (2) 21.03 Story Book 5 (Ptvstbk) (2) 21.04 Exercise Book for Vietnamese 5. Volume 1 (Ptvexbk1) (2) 21.05 Exercise Book for Vietnamese 5. Volume 2 (Ptvexbk2) (2) 21.06 Other books about Vietnamese (Ptvother) (2) 21.07 Vietnamese Dictionary (Ptvdict) (2) 21.08 Vietnamese Essays 5 (Ptvessay) (2) 21.09 Mathematics 5 (Ptmath5) (2) 21.10 Math Exercises 5 (Ptmathex) (2) 21.11 Exercise Book for Math 5. Volume 1 (Ptmex51) (2) 21.12 Exercise Book for Math 5. Volume 2 (Ptmex52) (2) 21.13 Other Math books (Ptmother) (2) 21.14 Moral Education 5 (Ptmor) (2) 399 Vietnam Reading and Mathematics Assessment Study 21.15 Moral Education Exercises 5 (Ptmorex) (2) 21.16 Nature and Society 5. Book 1. Science (Ptnssc1) (2) 21.17 Nature and Society 5. Book 2. History and Geography (Ptnshg) (2) 21.18 Exercise Book for Nature and Society 5 (Science) (Ptnsexs) (2) 21.19 Exercise Book for Nature and Society 5 (History) (Ptnsexh) (2) 21.20 Exercise Book for Nature and Society 5 (Geography) (Ptnsexg) (2) 21.21 Labour and Technology 5 (Pttech) (2) 21.22 Music 5 (Ptmus) (2) 21.23 Exercise Book for Music 5. Volume 1 (Ptmusex1) (2) 21.24 Exercise Book for Music 5. Volume 2 (Ptmusex2) (2) 21.25 Art 5 (Ptart) (2) 21.26 Health 5 (Pthlth) (2) 21.27 Exercise Book for Health 5 (Pthlthex) (2) 21.28 Informatics for Primary School Pupils (Ptinf) (2) 21.29 English for Primary School Pupils (Pteng) (2) 22. Do you have a chair to sit on in your classroom? (Psit) (Put X in one lozenge only) I sit on the floor of the classroom (1) I sit on a bench (2) I sit on a separate chair (3) 23. Do you have a desk to write on in your classroom? 
(Pwrite) (Put X in one lozenge only) I do not have a desk to write on (1) I write on the type of desk for 4 or 5 pupils (2) I write on the type of desk for 1 or 2 pupils (3) 400 Vietnam Reading and Mathematics Assessment Study SOME QUESTIONS ABOUT THE SUPPORT YOU RECEIVE WHEN STUDYING 24. In your home, is there anyone who reminds you to do your homework? (Phwkdone) (Put X in one lozenge only) I am not given any homework (1) Never (2) Sometimes (3) (4) Always 25. In your home, is there anyone who helps you to do your homework? (Phwkhelp) (Put X in one lozenge only) I am not given any homework (1) Never (2) Sometimes (3) (4) Always 26. In your home, do people pay attention to your work at school? (Phwkatt) (Put X in one lozenge only) Never (1) Sometimes (2) Always (3) 27. Besides your study time at school, how much time do you spend on extra tutorials for the following subjects? (Fill in the appropriate lozenge) 27.01 I do not take extra tutorials (Pextuno) 27.02 I take extra tutorials in Hour/week Vietnamese (Pextuvie) 27.03 I take extra tutorials in Math Hour/week (Pextumat) 401 Vietnam Reading and Mathematics Assessment Study 28. Do you have to pay money for your extra tutorials? (Pextupay) (Put X in one lozenge only) I do not take extra tutorials (1) Yes (2) No (3) (4) I don't know 29. Who gives you extra tutorials? (Pextutch) (Put X in one lozenge only) I do not take extra tutorials (1) I take extra tutorials with teachers from my school (2) I take extra tutorials with teachers from other schools (3) SOME QUESTIONS ABOUT YOUR VIETNAMESE LESSONS 30. Are you usually given homework in Vietnamese? (Phwvget) (Put X in one lozenge only) I am not given any homework (1) Once or twice a month (2) Once a week (3) (4) Three or four times a week 31. Does your teacher correct your Vietnamese homework? (Phwvcorr) (Put X in one lozenge only) I am not given any homework (1) My teacher never corrects homework (2) My teacher sometimes corrects homework (3) (4) My teacher often corrects homework (5) My teacher corrects all the homework 402 Vietnam Reading and Mathematics Assessment Study 32. Do you have Vietnamese textbooks for use in class? (Ptxvshar) (Put X in one lozenge only) I do not have Vietnamese textbooks (1) Only the teachers have the textbooks (2) I share the Vietnamese textbooks with more than two other pupils (3) (4) I share the Vietnamese textbook with another pupil (5) I use my own Vietnamese textbooks SOME QUESTIONS ABOUT YOUR MATHEMATICS CLASS 33. Are you given homework in Mathematics? (Phwmget) (Put X in one lozenge only) I am not given any homework (1) Once or twice a month (2) Once a week (3) (4) Three or four times a week 34. How often does your teacher correct your Mathematics homework? (Phwmcorr) (Put X in one lozenge only) I am not given any homework (1) My teacher never corrects homework (2) My teacher sometimes corrects homework (3) (4) My teacher often corrects homework (5) My teacher corrects all the homework 35. Do you have Math textbooks for use in class? (Ptxmshar) (Put X in one lozenge only) I do not have Math textbooks (1) Only the teachers have the textbooks (2) I share the Math textbooks with more than two other pupils (3) (4) I share the Math textbook with another pupil (5) I use my own Math textbooks 403 Vietnam Reading and Mathematics Assessment Study A QUESTION ABOUT YOUR SYLLABUS 36. Do you learn the subjects below in your class? 
(Put X in the lozenges indicating the subjects you learn) Yes 36.01 Vietnamese (Plrnviet) (2) 36.02 Mathematics (Plrnmath) (2) 36.03 Moral Education (Plrnmoed) (2) 36.04 Science, History, Geography (Plrnns) (2) 36.05 Technology (Plrntech) (2) 36.06 Music (Plrnmus) (2) 36.07 Art (Plrnart) (2) 36.08 Physical Education (Plrnpyed) (2) 36.09 Health (Plrnhlth) (2) YOU HAVE COMPLETED ALL THE QUESTIONS IN PART I. PUT YOUR PEN DOWN AND KEEP SILENCE WHILE WAITING FOR YOUR CLASSMATES TO FINISH THEIR QUESTIONS. 404 Vietnam Reading and Mathematics Assessment Study Part I: QUESTIONNAIRE FOR TEACHERS SECTION 1: GENERAL QUESTIONS NB: If your school does not use 5A, 5B, 5C, etc. for class names, please notify the investigator before completing the questions. 1. Fill in the following table to show which classes you teach Vietnamese and/or Mathematics and the number of pupils in those classes. (Put X in the box appropriate to each Grade 5 class you teach. For these classes, write the number of pupils in the boxes at the end of the row) 1.1 Vietnamese 1.2 Mathematics No Yes No of pupils No Yes No of pupils 1.1 5A a 5A c (1) (2) b (1) (2) d 1.2 5B a 5B c (1) (2) b (1) (2) d 1.3 5C a 5C c (1) (2) b (1) (2) d 1.4 5D a 5D c (1) (2) b (1) (2) d 1.5 5§ a 5§ c (1) (2) b (1) (2) d 1.6 5E a 5E c (1) (2) b (1) (2) d 1.7 5G a 5Gc (1) (2) b (1) (2) d 1.8 5H a 5Hc (1) (2) b (1) (2) d 1.9 5I a 5I c (1) (2) b (1) (2) d 1.10 5K a 5Kc (1) (2) b (1) (2) d 1.11 5L a 5L c (1) (2) b (1) (2) d 1.12 5M a 5Mc (1) (2) b (1) (2) d 405 Vietnam Reading and Mathematics Assessment Study List of variables from recode 1 to 54 (6 genaral variables and variables of question 1): Name of variable Name of variable Name of variable Name of variable 1. Not used in archive 15.TQ/1.3.1: Tvclassc 29.TQ/1.6.3: Tmclasse 43.TQ/1.10.1: Tvclassk 2. Not used in archive 16. TQ/1.3.2:Tvclascp 30. TQ/1.6.4:Tmclasep 44. TQ/1.10.2:Tvclaskp 3. Not used in archive 17.TQ/1.3.3: Tmclassc 31.TQ/1.7.1: Tvclassg 45.TQ/1.10.3: Tmclassk 4. Not used in archive 18.TQ/1.3.4: Tmclascp 32. TQ/1.7.2:Tvclasgp 46. TQ/1.10.4:Tmclaskp 5. Not used in archive 19.TQ/1.4.1: Tvclassd 33.TQ/1.7.3: Tmclassg 47.TQ/1.11.1: Tvclassl 6. Not used in archive 20. TQ/1.4.2:Tvclasdp 34. TQ/1.7.2:Tmclasgp 48. TQ/1.11.2:Tvclaslp 7. TQ/1.1.1: Tvclassa 21.TQ/1.4.3: Tmclassd 35.TQ/1.8.1: Tvclassh 49.TQ/1.11.3: Tmclassl 8. TQ/1.1.2: Tvclasap 22.TQ/1.4.4: Tmclasdp 36. TQ/1.8.2:Tvclashp 50. TQ/1.11.4:Tmclaslp 9. TQ/1.1.3:Tmclassa 23.TQ/1.5.1: Tvclasd1 37.TQ/1.8.3: Tmclassh 51.TQ/1.12.1: Tvclassm 10.TQ/1.1.4: Tmclasap 24. TQ/1.5.2:Tvclad1p 38. TQ/1.8.4:Tmclashp 52. TQ/1.12.2:Tvclasmp 11.TQ/1.2.1: Tvclassb 25.TQ/1.5.3: Tmclasd1 39.TQ/1.9.1: Tvclassi 53.TQ/1.12.3: Tmclassm 12.TQ/1.2.2: Tvclasbp 26.TQ/1.5.4: Tmclad1p 40. TQ/1.9.2:Tvclasip 54. TQ/1.12.4:Tmclasmp 13.TQ/1.2.3: Tmclassb 27.TQ/1.6.1: Tvclasse 41.TQ/1.9.3: Tmclassi 14.TQ/1.2.4: Tmclasbp 28. TQ/1.6.2:Tvclasep 42. TQ/1.9.4:Tmclasip 406 Vietnam Reading and Mathematics Assessment Study SOME QUESTIONS ABOUT YOURSELF 2. Are your male or female? (Tsex) (Put X in one box only) Male (1) Female (2) 3. How old are you? (Tage) (Write the number in the boxes below) Years old 4. Which ethnic group do you belong to? (Tethnic) (Put X in one box only) Kinh (1) Other (2) 5. Which box shows your highest educational level? (Ted) (Put X in one box only) Primary Education or Equivalent (1) Lower Secondary Education or Equivalent (2) Upper Secondary or Vocational Education (3) (4) University Education 6. In which type of training were you trained to be a teacher? 
(Tttr) (Put X in one box only) I am not trained to be a teacher (1) Below Teacher Training High School Level (2) Teacher Training High School Level (9 + 3) (3) Teacher Training High School Level (12 + 2) (4) (5) Teacher Training College Level (12 + 3) Teacher Training University Level (12 + 4) (6) 407 Vietnam Reading and Mathematics Assessment Study 7. How many years altogether have you been teaching? (Write the number in the boxes below. Round up to "1" if the duration is less than 1 year) 7.1 Number of years teaching grades 1-4 (Tyrs1) 7.2 Number of years teaching grade 5 (Tyrs2) 8. For how many years have you been working as a teacher in this school? (Write the number in the boxes below. Round up to "1" if the duration is less than 1 year) (Tyrshere) Years 9. Were you classified as an excellent teacher at school, district, province or national levels last school year? (Put X in one box only) 9.1 I am not an excellent teacher (Texno) (2) 9.2 Excellent Teacher at School Level (Texsch) (2) 9.3 Excellent Teacher at District Level (Texdis) (2) 9.4 Excellent Teacher at Provincial Level (Texpro) (2) 9.5 Excellent Teacher at National Level (Texnat) (2) 10. Are you from the locality where you teach? (Tlocal) No (1) (2) Yes 11. In general, do you think that the in-service training courses that you attended were effective in terms of improving your teaching? (Tinserv) (Put X in one box only) I do not attend any in-service course (1) Not effective (2) A little effective (3) (4) Effective (5) Very effective 408 Vietnam Reading and Mathematics Assessment Study 12. Are you married? (Tmarry) (Put X in one box only) Not yet (1) (2) Yes 13. How many children have you got? (Tchild) Children SOME QUESTIONS ABOUT YOUR GRADE 5 CLASSROOM NOTE: If you teach different grades, answer questions 14-18 below about the Grade 5 classroom that you teach the most hours in. 14. How many books are there in the bookcase (bookshelf) of the class you teach? (Tbookcls) (Write the number of books in the boxes below. Do not count newspapers and magazines. Write "0" if there is no bookcase (bookshelf) for the class) Books 15. How many of the following do you have in the classroom where you teach? (Write number in the boxes below. If you teach many classes in grade 5, write the number for the class you answered in question 1) Desks and chairs for pupils (Tdskch) 16. Are the pupils' desks/individual desks movable for different groupings of pupils? (Tdeskmov) No (1) (2) Yes 409 Vietnam Reading and Mathematics Assessment Study 17. Are there the following items in your classroom? (Put X in the box appropriate for each item) Yes 17.01 An useable board (black, white, green) (Tresbd) (2) 17.02 Chalk (or board marker) (Treschlk) (2) 17.03 Wall map (Treswcht) (2) 17.04 Cupboard (Trescpbd) (2) 17.05 Bookshelf (Tresbksh) (2) 17.06 Bookcase (Tresbkcs) (2) 17.07 Teacher's table (Tresttab) (2) 17.08 Teacher's chair (Trestchr) (2) 17.09 Electric fan (Tresfan) (2) 17.10 Electric lamps enough for lighting the classroom (Treslamp) (2) 18. While teaching Math or Vietnamese do you use the following items at school? (Put X in the box appropriate for each item) Yes 18.01 Vietnamese dictionary (Tresdict) (2) 18.02 Geometry teaching aids (compasses, rulers, etc) for writing on the board (Tresgeom) (2) 18.03 Teacher's Book (Vietnamese) (Tresguiv) (2) 18.04 Teacher's Book (Math) (Tresguim) (2) 18.05 Reference materials (Tresrefm) (2) 18.06 Pictures for illustration (Trespict) (2) SOME QUESTIONS ABOUT YOUR TEACHING 19. 
How many periods do you teach in a typical working week? (Tnperiod) (Including total number of periods teaching different subjects and different grades) (Write the number in the boxes below) Periods/week 410 Vietnam Reading and Mathematics Assessment Study 20. How many minutes are there in a typical period? (Tminteac) (Write the number in the boxes below) Minutes/period 21. On average how many hours a week do you spend for lesson preparation and marking pupils' work? (Thrsouts) (Write the number in the boxes below) Hours/week 22. How many times per year do you have meeting with pupils' parents to discuss pupils' learning progress and related matters? (Tmeetpar) (Put X in one box only) Never (2) Once/year (2) Once/term (2) Once/month or above (2) 23. On average, what is the percentage of parents who attend teacher-parent- meetings in a school year? (Tpctmeet) (Write the number in the boxes below) % 24. How many hours per day do you work (including provision of extra tutorials) to increase your income? (Texhrs) Extra work or tutorials Hours 411 Vietnam Reading and Mathematics Assessment Study SOME QUESTIONS ABOUT YOUR SCHOOL 25. How many times do inspectors or educational officers pay business visits to the class you teach? (Write the number in the boxes for each type and each year. Write "0" if inspectors do not pay business visits. If you do not teach at this school in the years listed below, write "NA" in the appropriate boxes) 25.1 (Tinspv98) 1998 times 25.2 (Tinspv99) 1999 times 25.3 (Tinspv00) 2000 times 26. When inspectors or educational officers pay business visits, which of the following activities do they often do? (Put X in the box appropriate for each item) Inspectors or educational officers are people who: No Yes 26.1 Give advice (Tinspadv) (1) (2) 26.2 Criticise (Tinspcri) (1) (2) 26.3 Initiate new ideas (Tinspide) (1) (2) 26.4 Clarify educational objectives (Tinspobj) (1) (2) 26.5 Explain curriculum content (Tinspcur) (1) (2) 26.6 Introduce new teaching materials (Tinspmat) (1) (2) 26.7 Provide information for teachers to improve themselves professionally (Tinspimp) (1) (2) 26.8 Contribute a little to the direct teaching in the classroom (Tinsplit) (1) (2) 26.9 Promote improvement in teaching methods (Tinspmth) (1) (2) 26.10 Encourage professional relationships with teachers at other schools (Tinspcon) (1) (2) 26.11 Train teachers in their specialisation (Tinspins) (1) (2) 26.12 Find faults and report to teachers' supervisor (Tinspflt) (1) (2) 412 Vietnam Reading and Mathematics Assessment Study 27. How often does your Principal observe and comment on your teaching? (Put X in one box only) (Tprinadv) Never (1) Once/year (2) Once/term (3) Once/month or above (4) I am the Principal (5) 28. Last month, how many times did your colleagues observe your teaching? (Ttobs) Times SOME QUESTIONS ABOUT YOUR PROFESSION AND YOUR ACCOMMODATION 29. How do you travel from home to your school everyday? (Ttravel) (Put X in one box only) On foot (1) By bicycle (2) By motorcycle (3) Other means (4) 413 Vietnam Reading and Mathematics Assessment Study 30. There are many factors to help increase teachers' satisfaction to their job. Please rank the importance of the following factors. 
(Put X in the box appropriate for each item) Not Important Very important important 30.01 Distance from home to school (Tsatdist) (1) (2) (3) 30.02 Location of school (urban, rural, remote, isolated) (Tsatloca) (1) (2) (3) 30.03 Teachers' housing provided (Tsatahou) (1) (2) (3) 30.04 Quality of teachers' housing (Tsatqhou) (1) (2) (3) 30.05 Conditions of school building and facilities (Tsatbldg) (1) (2) (3) 30.06 Condition of furniture in the classroom (Tsatcfur) (1) (2) (3) 30.07 Teachers' salary (Tsatsala) (1) (2) (3) 30.08 Timely payment of salary (Tsattpay) (1) (2) (3) Not Important Very important important 30.09 Learning progress of pupils (Tsatlean) (1) (2) (3) 30.10 Availability of articles in the classroom (e.g.: books, (1) (2) (3) paper, pens, etc.) (Tsatacsp) 30.11 Quality of school management and administration (1) (2) (3) (Tsatsman) 30.12 Good relationship with colleagues (Tsatrsta) (1) (2) (3) 30.13 Good relationship with local community (Tsatecom) (1) (2) (3) 30.14 Promotion opportunities (Tsatprom) (1) (2) (3) 30.15 Professional development opportunities through further study and/or training (1) (2) (3) (Tsatstud) 30.16 High status of teachers in society (Tsatstat) (1) (2) (3) 30.17 Policies of allowances for teachers (Tsatpall) (1) (2) (3) 31. Select three most important reasons from the list above (then rank according to level of importance) (Write the id number of the reason, 30._ in the boxes below) 31.1 30. The most important reason (Tsat1) 31.2 30. The second important reason (Tsat2) 31.3 30. The third important reason (Tsat3) 414 Vietnam Reading and Mathematics Assessment Study 32. Where do you live? (Tlive) (Put X in one box only) With my parents (1) In a rented house (2) In a living quarter (3) In my own house (4) 33. Do you have the following items in your home? (Put X in the box appropriate for each appliance) Yes 33.01 Daily newspapers (Thposnew) (2) 33.02 Weekly or monthly magazines (Thposmag) (2) 33.03 Radio (Thposrad) (2) 33.04 TV (Thpostv) (2) 33.05 Video cassette player (Thposvcr) (2) 33.06 Cassette (Thposcas) (2) 33.07 Telephone (Thpostel) (2) 33.08 Refrigerator (Thposref) (2) 33.09 Motorcycle (Thposmot) (2) 33.10 Bicycle (Thposbic) (2) 33.11 Running water (Thposwat) (2) 33.12 Electricity (network electricity, generator) (Thposelc) (2) 33.13 Desk (Thpostab) (2) 33.14 Chair (Thposchr) (2) 33.15 Bookshelf (Thposbkc) (2) 33.16 Clock (Thposwth) (2) 33.17 Reading lamp (Thposlmp) (2) 33.18 Wardrobe (Thposwlb) (2) 33.19 Stove (electric or gas) (Thposckr) (2) 33.20 Washing machine (Thposwam) (2) 33.21 Pocket calculator (Thposclr) (2) 415 Vietnam Reading and Mathematics Assessment Study 34. Are you satisfied with your accommodation? (Tsatacc) (Put X in one box only) Not satisfied (1) Satisfied (2) Very satisfied (3) PART 2: VIETNAMESE 35. How often do you set written tests in Reading Comprehension ? (Put X in one box only) (Ttestv) I do not test my pupils (1) Once/a year (2) Once/a term (3) Two or three times/term (4) Two or three times/month (5) (6) Once/a week or above PART 3: MATHEMATICS 36. In teaching math, how important do you find the following pupils' activities? (Put X in the box appropriate for each item) Not Important Very important important 36.01 Study in pairs or groups to solve math problems (Tacprs) (1) (2) (3) 36.02 Individual study (Tacalone) (1) (2) (3) 36.03 Prepare illustrations of a project for presentation in class (Tacproj) (1) (2) (3) 36.04 Use of practical equipment such as angle measure, calculator, ruler, tape measure, etc. 
(Tacequip) (1) (2) (3) 36.05 Homework (Tachass) (1) (2) (3) 36.06 Study and explain graphics taken from newspapers and magazines (Tacgraph) (1) (2) (3) 36.07 Learning by heart multiplication tables, formulae (Tacrote) (1) (2) (3) 36.08 Quizzes and tests, etc. (Tacquiz) (1) (2) (3) 416 Vietnam Reading and Mathematics Assessment Study 37. From the 8 activities listed above, select one activity that you consider the most important. (Tac) (Write the number in the box below) 36. 38. How important do you think the following goals are for teaching math ? (Put X in the box appropriate for each item) Not Important Very important important 38.01 Basic number skills (Tgoalnum) (1) (2) (3) 38.02 Problem solving (transfer of skills to everyday life and application of knowledge) (Tgoalpro) (1) (2) (3) 38.03 Thinking skills including different ways of thinking in math problem solving (Tgoalthk) (1) (2) (3) 38.04 Confidence in doing math exercises (Tgoalcon) (1) (2) (3) 38.05 Enjoy doing math exercises (Tgoalsat) (1) (2) (3) 38.06 Expansion of professional opportunities (Tgoalopp) (1) (2) (3) 38.07 Development of life skills (Tgoaldev) (1) (2) (3) 39. From the 7 goals stated above, select one that you consider the most important. (Tgoal) (Write the number in the box below) 38. 40. How often do you use the following methods? (Put X in the box appropriate for each item) Never or Sometimes Always Rarely 40.01 Use everyday problems (oral, written or exercises) (Tapprpro) (1) (2) (3) 40.02 Whole class teaching (Tapprwcl) (1) (2) (3) 40.03 Small group teaching (Tapprsmg) (1) (2) (3) 40.04 Individual teaching (Tapprind) (1) (2) (3) 40.05 Questions and answers (Tapprqa) (1) (2) (3) 40.06 The more association with everyday problems the better (Tapprlif) (1) (2) (3) 40.07 Basic skill training (Tapprskt) (1) (2) (3) 40.08 Explanations of the math problem solving process (Tapprmpr) (1) (2) (3) 40.09 Use of materials available in the locality (for example for the measurement of areas or volumes) (Tapprloc) (1) (2) (3) 417 Vietnam Reading and Mathematics Assessment Study 41. Do you often set written tests for Math? (Ttestm) (Put X in one box only) (1) I do not test my pupils (2) Once/a year (3) Once/a term (4) Two or three times/term (5) Two or three times/month (6) Once/a week or above 42. The purpose of this question is to identify needs for further training and development of elementary school teachers with a view to improving their knowledge and delivering skills. The description of competence according to the 5 levels applicable to this question serves as a basis for designing appropriate training courses for teachers. Please read carefully the description for each level as well as the nine specific aspects of competence and mark the description closest to the level of competence by 5th-grade teachers in your school. Description of the 5 levels: Pass: the teacher demonstrates the stated competence in his/her performance of duties in school Fair: the teacher has the basic knowledge, skills and experience so the performance is satisfactory and meet the requirements by the school on the stated competence. Good: apart from the basic knowledge and skills, the teacher must have additional in-depth knowledge, skills and experience related to the stated competence. 
The teacher not only has the awareness of but also clearly understands the causes of his/her shortcomings and takes conscious measures to overcome his/her shortcomings with regards to the competence while unceasingly improving the professional competence in question. Very good: the teacher reaches this level when his/her knowledge, skills and experience with regards to the stated competence are highly appreciated by colleagues. A very good teacher has to lead in activities related to the competence, be able to assist and guide other teachers in the same school and in other schools to make improvements in the competence. A very good teacher needs to have thorough understanding of the competence, be able to shoulder key responsibilities such as proposal and implementation of new initiatives, methods, and programmes that seek to improve that competence in the school. Excellent: this is the highest level that a teacher could reach with regards to the stated competence. Only 418 Vietnam Reading and Mathematics Assessment Study a small number of teachers can reach this level, typically as a result of accomplished in-depth training and perfection with many years of experience in the education profession. A teacher of this type can hold the responsibility of developing the competence in question in the organisation and can be classified as attaining a national standard in guiding the quality improvement of the competence. Very Pass Fair Good Excellent Good 42.1 apply the education guidelines and policies by the Party and the (1) (2) (3) (4) (5) Government in the teaching and education of students (Tproprti) 42.2 do self-assessment in order to improve professional competence (1) (2) (3) (4) (5) on a continual basis (Tprosimp) 42.3 mobilise students' parents and the community to participate in (1) (2) (3) (4) (5) educating students (Tprocomm) 42.4 listen to students' expression of opinions and support them to help (1) (2) (3) (4) (5) each other (Tprossup) 42.5 have demonstrated improvements in pedagogical level, (1) (2) (3) (4) (5) knowledge and skills over the past years (Tprocimp) 42.6 have enough knowledge to teach all subjects in the curriculum (such subjects as Music, Arts, Physical Education can be (1) (2) (3) (4) (5) excluded) (Tprocurr) 42.7 be able to assist colleagues to apply advanced teaching methods (1) (2) (3) (4) (5) (Tprocoll) 42.8 be able to apply psychological knowledge in order to design teaching and learning activities that are appropriate to different (1) (2) (3) (4) (5) stages of development of individual students (Tpropsyc) 42.9 be competent in classifying students according to their cognitive levels, to meet the different needs of good and weak students (1) (2) (3) (4) (5) (Tproable) 42.10 be competent in using advanced techniques correctly to identify (1) (2) (3) (4) (5) and nurture gifted and talented students (Tprotaln) 42.11 be competent in proposing delivery methods, teaching aids, as (1) (2) (3) (4) (5) well as in time management for each lesson/session (Tproobjs) 42.12 organise and deliver lessons in such a way that each student (1) (2) (3) (4) (5) understands and participates actively in the learning (Tproflex) 42.13 be able to use a multitude of delivery methods including discussion, experiments, modelling, using pictures, games, etc. 
(1) (2) (3) (4) (5) (Tproaids) Very Pass Fair Good Excellent Good 42.14 encourage students to ask and answer questions as an activity in (1) (2) (3) (4) (5) the learning process (Tprotcu) 42.15 be competent in using different questioning techniques to direct (1) (2) (3) (4) (5) and develop students' inference (Tproques) 42.16 use different assessment methods to train and improve students' (1) (2) (3) (4) (5) skills (Tprometh) 42.17 use assessment results and data to select appropriate teaching and (1) (2) (3) (4) (5) learning methods for the individual students (Tprodata) 42.18 maintain order and discipline in class while being flexible to hold (1) (2) (3) (4) (5) diversified activities for students (Tpromang) 42.19 take into consideration students' questions and opinions in the (1) (2) (3) (4) (5) design and development of lesson plans. (Tproinpt) 419 Vietnam Reading and Mathematics Assessment Study Please check to make sure that you have filled in all the items. Thank you for completing this questionnaire. MINISTRY OF EDUCATION AND TRAINING Study on 5th Grade Reading Comprehension and Mathematics QUESTIONNAIRE TO THE HEADMASTER CODE Province School Tool Number 5 April, 2001 420 Vietnam Reading and Mathematics Assessment Study QUESTIONNAIRE TO THE HEADMASTER Note: the information in the questionnaire should be information referring to the headmaster although another person could be filling it in. PERSONAL DETAILS 1. Are you male or female? (ssex) Male Female 2. How old are you? (sage) (Write a number in the boxes.) years old 3. How long does it take you to travel to the school? (stravel) (Write a number in the boxes.) minutes 4. Is the school an elementary or a primary school? (sprimsec) (Just mark X in one of the boxes.) primary school only primary and secondary school 5. Are you married? (smarry) (Just mark X in one of the boxes.) Yes No No 6. How many children do you have? (schild) Child/children 7. Are you Kinh or minority ethnic? (sethnic) Kinh Others 8. What is the highest education you have obtained? (seduc) (Just mark X in one of the boxes.) (1) primary or equivalent 421 Vietnam Reading and Mathematics Assessment Study (2) junior secondary (3) secondary or vocational (4) tertiary 9. What is the level of teacher training that you have received? (sttr) (Just mark X in one of the boxes.) (1) I did not receive teacher training. (2) Below Pedagogical High School (3) Pedagogical High School (9+3). (4) Pedagogical High School (12+2). (5) Pedagogical College (12+3). (6) Pedagogical University (12+4). 10. Did you receive special training at ministerial or departmental level on Education Management? (smanage) (Just mark X in one of the boxes.) No Yes. I finished a training course at ministerial or departmental level. 11. For a total of how many years have you been teaching (including years you serve as headmaster)? (Syrteach) (Write a number in the boxes. Round up to 1 if less than 1) years 12. In a normal week, how many periods do you teach? (snperiod) (Write a number in the boxes. Write 0 if you do not teach.) period(s)/week 13. For how many minutes do you teach in a period? (sminteac) (Write a number in the boxes.) minutes/period 14. For how many years have you been the headmaster of this elementary school (or deputy head master in charge of elementary group in this primary school)? (syrhead) 422 Vietnam Reading and Mathematics Assessment Study (Write a number in the boxes. Include this school year as well.) year(s) 15. 
What is the accumulated numbers of years that you have ever served as a headmaster (or deputy head master in charge of elementary group in a primary school)? (syrheada) (Write a number in the boxes. Include this school year as well.) year(s) QUESTIONS ON LIVING CONDITIONS 16. What kind of accommodation is yours? (saccomm) (Just mark X in one of the boxes.) (1) Shared accommodation with parent(s). (2) Rented accommodation (3) Condominium apartment (4) Private house 17. Which of the following items exist in your home? (Mark X in the corresponding box for each of the items.) Yes 17.01 Daily newspaper (Shposnew) 17.02 Weekly or monthly magazine (Shposmag) 17.03 Radio (Shposrad) 17.04 Television (Shpostv) 17.05 VCR (Shposvcr) 17.06 Cassette (Shposcas) 17.07 Telephone (Shpostel) 17.08 Refrigerator (Shposref) 17.09 Motorbike (Shposmot) 17.10 Bicycle (Shposbic) 17.11 Running water (Shposwat) 17.12 Electricity (from grid, generator) (Shposelc) 17.13 Desk (Shpostab) 17.14 Bookcase (Shposbkc) 423 Vietnam Reading and Mathematics Assessment Study 17.15 Watch (Shposwth) 17.16 Desk-lamp (Shposlmp) 17.17 Wardrobe (Shposwrb) 17.18 Cooker (electrical or gas) (Shposckr) 17.19 Washing Machine (Shposwam) 17.20 Computer (Shposcom) 18. Are you happy about the conditions regarding your accommodation? (Ssatacc) (Just mark X in one of the boxes.) Not happy Happy Very happy QUESTIONS ON YOUR SCHOOL 19. Is your school a state, a semi-public, or a private school? (Sstype) (Just mark X in one of the boxes.) State Semi-public Private 20. Does your school have only one location or does it have satellite campuses? (Smainsat) (Just mark X in one of the boxes.) just one location with satellite campuses 21. In what year was your school established? (Syrbegin) (Write the year in the boxes below. Guess-estimate if the school is more than five years old and you do not know for sure.) 22. What is the distance from your school to the following locations in terms of kilometres? (Write a number in the corresponding boxes for each of the locations. Round up to 1 if the distance is less than 1 km.) 22.1 the nearest health centre/clinic (Sdisclin) km 22.2 the nearest road accessible by car (Sdisroad) km 22.3 the nearest public library (Sdislibr) km 424 Vietnam Reading and Mathematics Assessment Study 22.4 the nearest bookstore (Sdisbook) km 22.5 the nearest junior secondary school (Sdissecs) (Write 0 if your school does provide junior secondary grades) km 22.6 the nearest market (Sdismak) km 23. Where is your school located? (Slocatio) (Just mark X in one of the boxes.) in a remote region in a rural region in a quasi-town in a town or city QUESTIONS ON TEACHERS IN THE SCHOOL 24. How many (permanent and temporary) teachers are working in your school this particular week? (Write a number in the corresponding boxes for each type of teachers. If the school does not have any teachers of the kind, write 0. If the number is a unit, make sure to include a zero just before that figure (for example, write "06" in stead of "6"). Add up the total number of teachers and write it in the bottom line of boxes. If you do teach, include yourself.) 24 Number of male permanent teacher(s) (Snpermmt) 25 Number of female permanent teacher(s) (Snpermft) 26 Number of male temporary teacher(s) (Sntempmt) 27 Number of female temporary teacher(s) (Sntempft) 28 Total number of teachers (Sntottch) 25. How many teachers in your school have the following educational attainments? (Write a number in the boxes. 
Each teacher is considered according to his/her highest attained education. Write 0 if there is no teacher of the type. Make sure that the total number is the same as the total in the question 24.) 25.1 Elementary or equivalent (Snpredt) teacher(s) 25.2 Junior secondary or equivalent (Snlsedt) teacher(s) 425 Vietnam Reading and Mathematics Assessment Study 25.3 Senior secondary or equivalent (Snusedt) teacher(s) 25.4 Tertiary (Snunedt) teacher(s) 25.5 Total number of teachers (Snedtott) 26. How many teachers in your school have finished the following teacher training programmes/courses? (Write a number in the boxes. Each teacher is considered according to his/her highest qualification. Write 0 if there is no teacher of the type. Make sure that the total number is the same as the total in the question 24.) 26.1 With no teacher training (Snttnone) teacher(s) 26.2 Below Pedagogical High School (Snttls) teacher(s) 26.3 Pedagogical High School (9+3) (Sntt9+3) teacher(s) 26.4 Pedagogical High School (12+2) (Sntt12+2) teacher(s) 26.5 Pedagogical College (12+3) (Sntt12+3) teacher(s) 26.6 Pedagogical University (12+4) (Snttuniv) teacher(s) 26.7 Total number of teachers (Sntttott) QUESTIONS ON CLASSES AND STUDENTS 27. How many students are studying in your school? (Write a number in the boxes. Use the total number of students, including those who are absent for this week.) 27.1 Male (Senrlboy) 27.2 female (Senrlgir) 426 Vietnam Reading and Mathematics Assessment Study 28. How many of the students are in grade 5? (Write a number in the boxes. Use the total number of students, including those who are absent for this week.) 28.1 Male students in 5th grade (Seng5boy) 28.2 Female students in 5th grade (Seng5gir) 29. How many classes are there in your school? (Snclatot) (Write a number in the boxes.) classes 30. How many 5th grade classes are there in your school? (Snclg5t) (Write a number in the boxes.) 5th grade classes QUESTIONS ON SCHOOL ACTIVITIES 31. How many shifts of classes (apart from classes for adults) are there? (If there is only one shift, just fill in the first row. If there are two shifts, fill in the first and the second row. If students attend 6 to 8 sessions per week, treat them as whole-day shift.) Number of students Number of classes 31.1 Morning-Shift 1 (a.m.) (Sshft1pu) (Sshft1cl) 31.2 Afternoon-Shift 2 (p.m.) (Sshft2pu) (Sshft2cl) 31.3 3rd-Shift 3 (Sshft3pu) (Sshft3cl) 31.4 Whole-day shift (Sshft4pu) (Sshft4cl) 31.5 How many satellite campuses does your school have? (Snsat) (Just mark X in one of the boxes.) no satellite campuses 1-2 satellite campus(es) 3-4 satellite campuses 427 Vietnam Reading and Mathematics Assessment Study 5-6 satellite campuses 7-8 satellite campuses 9-10 satellite campuses more than 10 satellite campuses 32. When was the last time your school received a comprehensiveinspection? (Syrinspe) (Just mark X in one of the boxes.) The school has never received a comprehensive inspection. Last inspection was prior to 1996. 1996 1997 1998 1999 2000 33. How many times has the school been visited by an inspection team since September, 1999? (Stinspe) (Write a number in the boxes. Write 0 if it has never been visited by an inspector.) times 35. How many times has the school been visited by an inspection team since September, 1999 upon the following purposes? (Write a number in each row of boxes. If the purpose of the inspection is many folds, count it for all of the purposes.) 35.1 comprehensive inspection (Stfinsp) times 35.2 thematic inspection (e.g. 
on mathematics) (Stsminsp) times 35.3 teacher inspection (Stininsp) times 35.4 to assist improvement in teaching skills (Statinsp) times 35.5 to provide advice to headmaster and/or other key managers (Stahinsp) times 35.6 to address crises or problems in the school (Stcrinsp) times 35.7 others (Stnoinsp) times 428 Vietnam Reading and Mathematics Assessment Study 36. From your perspective as a headmaster, how do you think of the importance of the following activities? (Just mark X in the appropriate box for each activity.) not important important very important 36.1 PR with the local community (Sactcomm) 36.2 Monitoring student progress (Sactprog) 36.3 Management tasks related to school functions (Sactadmi) 36.4 Observation of teacher delivery and discussion that follows (Sactless) 36.5 Activities aimed at the teacher's professional development (Sactprof) 36.6 Activities aimed at the headmaster's capacity development (Sactsede) 36.7 Extracurricular activities (Sactexcu) 36.8 Taking care of school facilities and resources and teachers' living conditions (Sactscon) 36.9 "Excellent Teacher" and "Excellent Student" contests (Sactcomp) 37. Among the nine activities mentioned above, select and rank three most important ones. (Write the corresponding number of the reason as listed in the previous question: 36_ in the boxes below.) 37.1 36 most important activity (Sact1) 37.2 36 second most important activity (Sact2) 37.3 36 third most important activity (Sact3) 38. Which of the following activities does your school hold? (Mark X in the appropriate box for each activity.) no once/year more than once 38.1 produce a school magazine (Sprogmag) 38.2 story telling contests (Sprogcom) 38.3 quiz (Sprogqui) 38.4 camping (Sprogexp) 38.5 annual school festival (Sprogcel) 38.6 education conference (Sprogcnl) 429 Vietnam Reading and Mathematics Assessment Study 39. How often do you have to deal with the following kinds/types of behaviour by pupils? (Mark X in the appropriate box.) never sometimes often 39.1 Late arrivals (Sbhplate) 39.2 Absenteeism (with no due reasons) (Sbhpabst) 39.3 Skipping Classes (Sbhpskcl) 39.4 Drop-out (Sbhpdrop) 39.5 Disturbance and trouble making in class (Sbhpdist) 39.6 Lying (Sbhpchea) 39.7 Swearing (Sbhpabus) 39.8 Vandalism (Sbhpvand) 39.9 Theft (Sbhpthef) 39.10 Bully (Sbhpbull) 39.11 Bullying or blackmail of teacher(s) (Sbhpinti) 39.12 Drug abuse (Sbhpdrug) 39.13 Fight (Sbhpfght) 39.14 Health problems (Sbhphlth) 40. Does the school often have to deal with the following kinds of behaviour by the teachers? (Mark X in the appropriate box.) never sometimes often 40.1 Late arrivals (Sbhtlate) 40.2 Absenteeism (with no due reasons) (Sbhtabst) 40.3 Skipped periods (Sbhtskcl) 40.4 Threatening or intimidating students (Sbhtbull) 40.5 Division/lack of solidarity (Sbhtnoco) 40.6 Swearing (Sbhtabus) 40.7 Drug abuse (Sbhtdrug) 40.8 Alcoholism (Sbhtalco) 40.9 Health problems (Sbhthlth) 430 Vietnam Reading and Mathematics Assessment Study 41. How many days of teaching and learning in your school were lost due to such causes as a delayed term, examination rounds, school festivals, ceremonies, typhoons, floods, etc. in the last school year? (Slostday) (Write a number in the boxes below. Write 0 if no day was lost.) day(s) QUESTIONS ON SCHOOL FACILITIES 42. How many of the following types of classrooms does your school have? (Write a number in the boxes.) 42.1 permanent classroom(s) (Scrmperm) 42.2 semi-permanent classroom(s) (Scrmsemi) 42.3 temporary classroom(s) (Scrmtemp) 43. 
What are the combined areas of permanent classrooms, semi-permanent classrooms, and temporary classrooms, respectively in terms of square metres? (Write a number in the boxes for each type of classrooms. Do not include open-air area that may be used for teaching.) 43.1 permanent classroom(s) (Sareaper) m2 43.1 semi-permanent classroom(s) (Sareasem) m2 43.2 temporary classroom(s) (Sareatem) m2 44. What are the areas of playground and subject rooms in terms of square metres? (Write a number in the boxes.) 44.1 playground(s) (Ssqplayg) m2 44.2 subject room(s) (Ssqspurp) m2 45. What is the overall condition of the classrooms? (Sbldgcon) (Just mark X in the appropriate box.) The whole school needs reconstruction. Some of the classrooms need major repair. Most of the classrooms need minor repair. Some of the classrooms need minor repair. They are in good conditions. 431 Vietnam Reading and Mathematics Assessment Study 46. Do not use this question How many toilets are there for boy students, girl students and staff, respectively? (Write in the boxes. If the answer is 0, just put 0.) Number of toilets for 46.1 boy students 46.2 girl students 46.3 staff 47. Does the school have the following facilities? (Mark X in the appropriate boxes.) yes 47.01 school library (Sreslibr) 47.02 meeting hall (Sreshall) 47.03 staff room (Sresstaf) 47.04 headmaster room (Sreshead) 47.05 archives room (as separate from head master room) (Sresstor) 47.06 school health room (Sresmedi) 47.07 emergency medical kit (Sresfaid) 47.08 sport/play amenities (Sresspor) 47.09 running water/storage tank/well/spring (Sreswatr) 47.10 drinking water (Sresdwat) 47.11 electricity (from generator/ grid) (Sreselec) 47.12 telephone (Srestele) 47.13 facsimile (Sresfax) 47.14 school garden (Sresgard) 47.15 typing machine (Srestype) 47.16 duplicating machine (Sresdupl) 47.17 amplifier (Sresmike) 47.18 tape recorder (Srestape) 47.19 overhead projector (Sresohp) 432 Vietnam Reading and Mathematics Assessment Study 47.20 television (Srestv) 47.21 video cassette recorder (Sresvcr) 47.22 photocopying machine (Sresphot) 47.23 computer (Srescomp) 47.24 fence surrounding the school (Sresfenc) 47.25 canteen (Srescafe) 47.26 musical instrument (Sresmeq) 47.27 sport implement (Sresseq) NOTE: the school library is the place where students of all classes and grades can come to read and borrow books. 48. Can students borrow books and bring them home? (Sborrow) (Just mark X in one of the boxes.) We do not have a school library. We do not lend books. We do lend books. 49. Do students' parents and/or the community make contributions to the school with respect to the following activities? (Mark X in the appropriate boxes.) 
yes 49.01 Building facilities for the school (such as classrooms,housing for teachers) (Sparbld) 49.02 Maintenance of facilities (classrooms, teacher housing) (Sparmain) 49.03 Purchase, maintenance, or repair of furniture and equipment (Sparfurn) 49.04 Purchase of textbooks (Sparbook) 49.05 Purchase of stationery (Sparstat) 49.06 Paying salaries to temporary staff (Spartsal) 49.07 Paying extra compensation to teachers apart from standard salaries (Spartads) 49.08 Paying salaries to supportive staff (Sparnsal) 49.09 Paying extra compensation to supportive staff (Sparnads) 49.10 Extracurricular activities including excursions (Sparexcu) 433 Vietnam Reading and Mathematics Assessment Study 49.11 Assisting in the teaching and/or watching of Students without being paid (Sparatch) 49.12 Providing meals at school (Sparmeal) 49.13 Providing land to develop the school and teacher housing (Sparhous) 49.14 Providing cultivation land to teachers (Sparfarm) 50. The purpose of this question is to identify needs for further training and development of elementary school teachers with a view to improving their knowledge and delivering skills. The description of competence according to the 5 levels applicable to this question serves as a basis for designing appropriate training courses for teachers. Please read carefully the description for each level as well as the nine specific aspects of competence and mark the description closest to the level of competence by 5th-grade teachers in your school. Description of the 5 levels: Pass: the teacher demonstrates the stated competence in his/her performance of duties in school Fair: the teacher has the basic knowledge, skills and experience so the performance is satisfactory and meet the requirements by the school on the stated competence. Good: apart from the basic knowledge and skills, the teacher must have additional in-depth knowledge, skills and experience related to the stated competence. The teacher not only has the awareness of but also clearly understands the causes of his/her shortcomings and takes conscious measures to overcome his/her shortcomings with regards to the competence while unceasingly improving the professional competence in question. Very good: the teacher reaches this level when his/her knowledge, skills and experience with regards to the stated competence are highly appreciated by colleagues. A very good teacher has to lead in activities related to the competence, be able to assist and guide other teachers in the same school and in other schools to make improvements in the competence. A very good teacher needs to have thorough understanding of the competence, be able to shoulder key responsibilities such as proposal and implementation of new initiatives, methods, and programmes that seek to improve that competence in the school. Excellent: this is the highest level that a teacher could reach with regards to the stated competence. Only a small number of teachers can reach this level, typically as a result of accomplished in-depth training and perfection with many years of experience in the education profession. A teacher of this type can hold the responsibility of developing the competence in question in the organisation and can be classified as attaining a national standard in guiding the quality improvement of the competence. 
434 Vietnam Reading and Mathematics Assessment Study Pass Fair Good Very Excellent Good 50.1 apply the education guidelines and policies by the Party and the Government in the teaching and education of students (1) (2) (3) (4) (5) (Sproprty) 50.2 do self-assessment in order to improve professional competence on a continual basis (Sprosimp) (1) (2) (3) (4) (5) 50.3 mobilise students' parents and the community to participate in educating students (Sprocomm) (1) (2) (3) (4) (5) 50.4 listen to students' expression of opinions and support them to help each other (Sprossup) (1) (2) (3) (4) (5) 50.5 have demonstrated improvements in pedagogical level, knowledge and skills over the past years (Sprocimp) (1) (2) (3) (4) (5) 50.6 have enough knowledge to teach all subjects in the curriculum (such subjects as Music, Arts, Physical Education (1) (2) (3) (4) (5) can be excluded) (Sprocurr) 50.7 be able to assist colleagues to apply advanced teaching methods (Sprocoll) (1) (2) (3) (4) (5) 50.8 be able to apply psychological knowledge in order to design teaching and learning activities that are appropriate to different (1) (2) (3) (4) (5) stages of development of individual students (Spropsyc) 50.9 be competent in classifying students according to their cognitive levels, to meet the different needs of good and weak (1) (2) (3) (4) (5) students (Sproable) 50.10 be competent in using advanced techniques correctly to identify and nurture gifted and talented students (Sprotaln) (1) (2) (3) (4) (5) 50.11 be competent in proposing delivery methods, teaching aids, as well as in time management for each lesson/session (Sproobjs) (1) (2) (3) (4) (5) 50.12 organise and deliver lessons in such a way that each student understands and participates actively in the learning (Sproflex) (1) (2) (3) (4) (5) Pass Fair Good Very Excellent Good 50.13 be able to use a multitude of delivery methods including discussion, experiments, modelling, using pictures, games, etc. (1) (2) (3) (4) (5) (Sproaids) 50.14 encourage students to ask and answer questions as an activity (1) (2) (3) (4) (5) in the learning process (Sprotcqu) 50.15 be competent in using different questioning techniques to (1) (2) (3) (4) (5) direct and develop students' inference (Sproques) 435 Vietnam Reading and Mathematics Assessment Study Pass Fair Good Very Excellent Good 50.16 use different assessment methods to train and improve students' (1) (2) (3) (4) (5) skills (Sprometh) 50.17 use assessment results and data to select appropriate teaching (1) (2) (3) (4) (5) and learning methods for the individual student (Sprodata) 50.18 maintain order and discipline in class while being flexible to (1) (2) (3) (4) (5) hold diversified activities for students (Spromang) 50.19 take into consideration students' questions and opinions in the (1) (2) (3) (4) (5) design and development of lesson plans. (Sproinpt) Please check to make sure that you have filled in all of the items. Thank you for completing the questionnaire. 436 Vietnam Reading and Mathematics Assessment Study Appendix 1.4: Description of ID system used in the study The IDs of schools in this data consisted of eight digits. The first seven digits were the same as the ID codes which were used in the population census conducted by General Statistics Office (GSO) in 1999. Region Province District Commune School The first digit indicates "regions." The second and the third digits indicate provinces within each region. 
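To make the coding scheme concrete, the following minimal Python sketch (a hypothetical helper, not part of the study's own software) splits an eight-digit school ID into the components listed above; the district, commune, and school digits are the ones described in the sentences that follow.

```python
def split_school_id(school_id: str) -> dict:
    """Decompose an 8-digit school ID into its hierarchical parts.

    Layout assumed from Appendix 1.4: 1 digit region, 2 digits province,
    2 digits district, 2 digits commune, 1 digit school; the first seven
    digits follow the 1999 GSO population census codes.
    """
    if len(school_id) != 8 or not school_id.isdigit():
        raise ValueError("school ID must be exactly eight digits")
    return {
        "region":   school_id[0],    # digit 1
        "province": school_id[1:3],  # digits 2-3 (within the region)
        "district": school_id[3:5],  # digits 4-5
        "commune":  school_id[5:7],  # digits 6-7
        "school":   school_id[7],    # digit 8
    }

# Example with a made-up ID: region 3, province 04, district 12, commune 07, school 5
print(split_school_id("30412075"))
```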
The fourth and the fifth digits indicate districts, and the sixth and the seventh digits indicate communes. The last digit indicates the school. 437 Vietnam Reading and Mathematics Assessment Study Appendix 1.5: Manual for coordinators VIETNAM CO-ORDINATION MANUAL Grade 5 National Study of Reading and Mathematics 2001 438 Vietnam Reading and Mathematics Assessment Study
CO-ORDINATION MANUAL FOR THE MAIN STUDY
Introduction
In April 2001, Vietnam will undertake a very large-scale study of Reading and Mathematics achievement in all provinces. The study has been developed around a series of policy research questions designated by the Ministry of Education and Training (MOET), and this study is an official undertaking of the MOET. The piloting of all test items and questionnaire items took place in October 2000. The revision of all questionnaires was made in November 2000. The initial revision of the tests was done in November, and the final version will be ready by 15 December 2000. The probability sample of schools was drawn in November 2000. Approximately 60 schools have been drawn in each of the 61 provinces, making a total of nearly 3,660 schools. The outlines of the analyses have been developed and all dummy tables for the study have been written. The quality of the study will now depend on the layout and printing of all instruments, the co-operation of the schools and provinces, and the care taken in data collection to ensure that there are as few missing data as possible. After that the data have to be entered, cleaned, weighted, and analysed, and finally the results written up. This manual is intended as a guide for those in charge of the study, to ensure that all is well planned and that the execution of the study runs smoothly.
Some basic facts about the study
The main data collection will take place on 12 and 13 April 2001. The first day will be used to establish the pupils to be tested within each school and to organise the testing room. The second day will be the day when the questionnaires and tests are completed by the pupils, teachers, and school heads. Within each school 20 Grade 5 children will be drawn at random. There will be no replacement of missing schools or missing pupils. Two Grade 5 teachers will be selected at random by the researchers for completing the teacher questionnaires and tests. Data collectors will be District Education Office personnel, but where more schools have to be tested in a district than there are district education officers, then supplementary personnel for data collection have to be drafted in. NO personnel from the schools should be used. 439 Vietnam Reading and Mathematics Assessment Study The training of the data collectors will be done in two waves. The first will be the training of 3 provincial co-ordinators from each province. This will be organised in five sessions in different parts of the country. The second wave will be the training of the district education officers and supplementary personnel in two sessions within each province. There are a number of tasks that you will need to undertake in order to ensure that the Vietnam Grade 5 study runs smoothly. These are explained below, and there is a checklist at the end of the manual. Please read this manual carefully. As you complete each task, make a tick in the appropriate checkbox in the checklist. Please make certain that you complete ALL tasks!
Master copies of Instruments
You should have a master copy of each of the following materials for the main survey.
(Title; number of master copies; instrument number)
1. Co-ordination Manual (this manual): 1 master copy, no instrument number
2. Manual for Data Collectors: 1 master copy, no instrument number
3. School Form: 1 master copy, instrument number 01
4. Pupil Name Form: 1 master copy, instrument number 02
5. Pupil Booklet: Questionnaire, Pupil Mathematics Test and Pupil Reading Test: 1 master copy, instrument number 03
6. Teacher Booklet: Questionnaire, Teacher Mathematics Test and Teacher Reading Test: 1 master copy, instrument number 04
7. School Head Questionnaire: 1 master copy, instrument number 05
Task 1: Check instruments: Ensure that you have all of the materials listed above and that all materials have been provided. You will receive the reading and math tests separately. For the math items, use the item analyses from the pilot math tests to ensure that the items are in graded difficulty from easy to difficult. Check that no items are half on one page and half on another; the entire item should be on the same page. Check again that the items 'make sense' and, if there are any doubts, go back to the test constructors and ask for changes.
Task 1(a): Check translation of all instruments: The instruments were prepared in both English and Vietnamese. Please conduct a further check on the translation from Vietnamese to English according to the procedures in Appendix A as soon as possible. It is most important that the procedures in the appendix be followed exactly, to ensure that the instruments are comparable and that those speaking only English know the specific content of each item and question.
Task 2: Assemble the booklets: Assemble the questionnaires and tests into a Pupil Booklet and a Teacher Booklet. Within each booklet have separate cover pages for each of the instruments. Ensure that the page numbering system you have used is appropriate. These are the final booklets that you will be handing to the printer for printing.
Task 3: Ministry blessing and letters from the Minister to province and school heads: It is essential that all involved in the study understand that this is an official Ministry study. To this end it is important to have a letter sent from the Minister or Vice-Minister to each provincial education 440 Vietnam Reading and Mathematics Assessment Study head informing that person about the study and which schools in his or her province have been selected in the national sample. At the same time, it is important to have an official letter (either from the Ministry or from the researchers, but in the name of the Ministry) sent to the head of each selected school, informing her or him that the school has been drawn in a national sample and what this will involve for the school. A School Form should be enclosed for the school head to complete and return. Since there is often a delay in having official letters prepared and dispatched, the researchers should take up this matter as soon as possible (and, if necessary, draft a letter for signature).
Task 4: Contracts and hardware: A contract will be needed for the printing, which will be a huge job. It may be possible to build into the contract the despatch of materials from the printing house to the provincial centres. Whatever the case, the co-ordinator must ensure that there is a contract, that there is room for storage of the materials, and that there is a way of getting the materials out throughout the country and back to a central place. This will require a contract for printing, a contract for despatch and return of materials, and a very large warehouse in which to store the materials.
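As a rough, back-of-the-envelope check on the scale of the printing and storage job, the sample design figures quoted in this manual (61 provinces with roughly 60 schools each, 20 pupils and 2 teachers tested per school, and one school head questionnaire per school) already imply the quantities involved; the sketch below simply multiplies these out and is illustrative only, before spare copies are added.

```python
# Back-of-the-envelope material counts implied by the sample design
# (61 provinces x ~60 schools, 20 pupils and 2 teachers per school).
provinces = 61
schools_per_province = 60
pupils_per_school = 20
teachers_per_school = 2

schools = provinces * schools_per_province        # 3,660 schools
pupil_booklets = schools * pupils_per_school      # 73,200 (about 74,000 once spares are added)
teacher_booklets = schools * teachers_per_school  # 7,320 (about 8,000 with spares)
head_questionnaires = schools                     # 3,660 (about 4,000 with spares)

print(schools, pupil_booklets, teacher_booklets, head_questionnaires)
```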
It would be desirable if the warehouse were such that the instrument checking at the central level and the data entry could also be conducted there. The warehouse must be large enough to house 74,000 pupil booklets (plus spares), about 8,000 teacher booklets, about 4,000 school head questionnaires, 4,000 school forms and 4,000 pupil forms. They should be stored in a systematic way such that any one booklet can be retrieved easily.

Task 5: Organising the benchmarks: In the pupil, teacher and school questionnaires there are several questions about the existence or not of material and human resources in the schools. For several of these there are existing Ministry benchmarks. Suggestions of the benchmarks to be used have been given in Chapter 7 of the proposed report. The co-ordinator should review these suggestions, as well as the questions in the questionnaires, together with the appropriate persons in the Ministry in order to decide on the final list of benchmarks to be used in the study. Of course, no benchmarks can be used for which there is no information in the questionnaires. Where it is believed that the information is important but there are no official benchmarks, then it is possible for the researchers and Ministry personnel to develop what they believe are reasonable benchmarks for Vietnam. The benchmark information to be used in the analyses proposed in Chapter 7 should be written onto one sheet of paper, each benchmark given a variable label, and given to those responsible for the data entry. The co-ordinator must ensure that these data are entered into the DEM structure file. The information should also be given to Messrs. Griffin, Postlethwaite and Ross.

[Note to Chris and Miyako: Task 6 was never conducted as it was stated here. Rather, Patrick had his groups and they did the exercise for the skill levels and then for the benchmarks. Maybe we should ask Patrick to write this section so that it is carried out in an identical way next time. So, I suggest that you ask Patrick to replace Task 6 below.]

Task 6: Organising the establishment of minimal and desirable levels of mastery for pupils in mathematics and reading: The co-ordinator should form two panels of subject-matter specialists and practising teachers. Within the Ministry there will be one or two persons known as specialists for primary reading and another one or two persons who are specialists in primary school mathematics. It is also desirable that three reading teachers and three math teachers be selected. Of the three teachers, one should come from a high achieving school, one from an average achieving school and one from a low achieving school. Thus, there are two groups: a math group consisting of the Ministry primary math specialists and three Grade 5 math teachers, and a reading group consisting of the Ministry primary reading specialists and the Grade 5 reading teachers. The first task for a group to undertake is to go through the test (reading or math according to the group) and select those items that they deem essential to have been mastered if the pupils are to be able to pursue their studies effectively in Grade 6. The second task is to go through those items now designated as essential and decide how many of them should be mastered if a pupil is to be classified as reaching the minimum mastery level. The third task is to go through the essential items again and decide how many should be mastered if a pupil is to be classified as reaching the desirable level of mastery.
A form for each subject should be prepared in which the essential items are given and then the numbers required for the minimum and desirable levels. An example of a form is given below for mathematics.

Essential items - Mathematics Grade 5 survey

Item number | Essential
Item 1 |
Item 2 |
Item 3 |
Item 4 |
Etc. to end of items |

Number required for minimum mastery = ??
Number required for desirable mastery = ??

Again the co-ordinator must ensure that this information is entered onto the DEM structure file. Copies of these forms should be made and filed so that they are easily available.

Task 7: Knowing and checking the sample of schools: The list of the schools drawn in the sample has been given to Mr. Vinh. Several copies of this document should be made and filed. Copies should also be sent to Drs. Ross and Griffin. If any errors are discovered about any school when contacting provincial and school heads, then these should be made known to Dr. Ross and his advice as to what to do should be followed.

Task 8: Preparing the data collectors' manual: This manual should be prepared in detail. An outline has already been prepared but it should be checked, re-checked and modified. It should cover every eventuality that the data collectors are likely to encounter in the schools while conducting the data collection. Of particular importance is ensuring that all questions in questionnaires have been answered and that there are no inconsistencies in the answers. To this end a special section should be prepared for insertion in the Data Collectors' manual covering the likely inconsistencies for which the data collectors should look when reviewing the answers to the questions before they leave the school. Where questions have not been answered, or inconsistent answers have been given, the data collector MUST bring these to the attention of the respondents before leaving the school and have the non-completed questions completed and any inconsistencies resolved. Preparing this section should begin as soon as possible.

Task 9: Preparing the central training: It has been suggested that 3 persons from each province should be trained in the initial training. In all, this makes 183 persons. It is these people who will in turn train the data collectors in each province. It has further been suggested that these 183 persons will be split into five groups for training purposes. These plans may change as the work proceeds. If possible, use a parallel set of pupil tests for this training. What are the key points to be taken into account in the training?

9.1. Ensure that the ID numbering system has been worked out in advance, and exactly how the IDs will be entered onto the booklets. If the IDs are to be entered by the data collectors then there must be training in this.

9.2. Sufficient copies of all instruments and school and pupil forms must be made available for the training. Probably the best place to check for missing, smudged or unusable pages in instruments is during the training sessions, as the data collectors prepare their own 'bundle' of instruments for the school in which they will test. Hence, there should be a sufficient stock of spare copies of instruments at each training session.

9.3. The data collectors should be made familiar with the content of the instruments. Probably the best way to do this is to have them pretend to be Grade 5 pupils and answer the pupil tests and complete the pupil questionnaire.
They should then pretend to be a Grade 5 teacher and complete the teacher questionnaire and tests, and then pretend to be a school principal and complete that questionnaire. If there is not time for all of this, then the co-ordinator must decide which selection of tests and questionnaires the trainees should answer.

9.4. Getting the co-operation of the schools. It is important that the schools co-operate fully in the testing. An explanation should be given to trainees (for them to pass on in the schools, if necessary) as to why the testing is taking place at all (note that reasons will also have been given in the Minister's letter to the school heads that has already been sent out). It is not uncommon for school heads to say that the random selection of pupils from their school is not a good one. The data collectors must say that this is possible, but that a) the study is not interested in individual schools per se, and b) it can happen that the representation from one school will appear to be low but in another it can appear to be high. The same applies to the random selection of the two teachers within each school. However, the end result of the study is to identify the major weaknesses in material and human resources so that plans can be made to improve the situation.

9.5. Ensuring that all pupils know how to answer the multiple-choice items. This is a matter of the data collectors taking great care with the explanations to be given when the pupils do the practice items. An explanation must be made of what the pupil does if he or she changes his or her mind about the answer to an item after having already marked an answer. Here the pupil must cross out CLEARLY the first answer and then cross the answer that he or she now wishes to select.

9.6. Full and consistent answers to all questionnaire items. The best procedure with pupils is to 'walk' them through the questionnaire. The data collector should say: 'Question 1 is about ??. Now cross the answer.' Then on to Question 2, and so on. While the pupils are answering, the data collector should walk round the classroom and ensure that the pupils are answering each and every question. It would be wise to show data collectors the kinds of errors that occurred in the pilot study (use some of the questionnaires with errors in them) and then lead them in detail through the special section on the types of inconsistent responses for which they should check.

9.7. Returning the completed instruments to a central point. A mechanism must be developed and tested for getting the completed instruments back to the warehouse in Ha Noi. What is important is that no completed school bundles are lost.

10. Planning the provincial training sessions: This is training over which the central personnel will have little control, although it is hoped that there will be sufficient resources available to allow 20 persons from the central team to attend several of the training sessions within provinces. The points to be taken care of in the training are the same as those mentioned above under point 9. It is suggested that the provincial training be staggered so that at least the first four training sessions, in two provinces, can be monitored by the central personnel; should problems occur they can be solved and the mechanism slightly amended for the other 118 training sessions.
Do not forget that each data collector should also be provided with additional spare copies of all of the above-mentioned forms, and basic stationery (pencils, erasers, a pencil sharpener) in case some pupils come to the testing room without them.

11. Training those who will conduct the spot checks: It is suggested that at least two schools in each province be visited while the data collection is taking place. These should be surprise visits and the data collectors must not know in advance that they will be visited. At the same time, all data collectors should be forewarned that the spot checks will be carried out and that they may well receive a visit. It is suggested that the persons to conduct the spot checks be selected by the co-ordinator and trained in the central training sessions. The co-ordinator should develop a detailed pro-forma to be completed by each person making a spot check. These should be returned to the co-ordinator.

12. Checking the instruments at the district level: The best place to check the fullness of the data return is at the district office. This is because, if there are any missing instruments or incomplete questions, it is easy for the district office to go back to the school for the missing instruments or data. Checks should be made that: a) all instruments have been returned; b) all IDs are correct; c) all questions in questionnaires have been completed; d) there are no inconsistent responses. Where there are problems with any of the above points, the district office should go straight back to the school and have the matters corrected.

13. Planning for the return of instruments to the warehouse: There will be a flood of instruments being returned from district or provincial offices to the warehouse within the space of a few days. Arrangements must be made to receive the instruments, catalogue them on arrival, and file them in the appropriate place in the warehouse. It would seem wise to file them in such a way that all schools from a particular province are in one place and also subdivided by district. The first task after filing and cataloguing all returned bundles will be to check them manually. This involves checking that the bundle for each school has been returned. The non-arrival of bundles must result in a chase being instituted to locate any missing bundle and get it to the warehouse. The second task is to check that the required number of pupil booklets has been returned. There will be a Pupil Name Form with each bundle and therefore it can be seen how many completed booklets there should be from any one school. The third task is to check that full and accurate IDs are on each instrument. Where this is not the case, the ID should be checked and written in clearly. The fourth task is to check that all questions on questionnaires have been answered. Where there are missing data, a mechanism must be in place to get the missing data from the pupil, teacher, or school head. When a bundle has been completed it can go for data entry. Again a catalogue should be kept of which bundles have been passed over for data entry and also when each is returned. What must be avoided is that some bundles end up not being entered, or that bundles are entered twice. About ?? people should be used for the manual checking and trained before the instruments begin to arrive.

14. Preparing and checking the DEM structure file: The DEM structure file can be prepared any time after January, i.e.
as soon as the final versions of the instruments are ready. Apart from the instruments, do not forget the Pupil Name Forms, the enumeration unit ID at the beginning of the Pupil questionnaire, the benchmark information, and the test items designated as essential. The enumeration ID may pose a problem. What is needed is a list of the enumeration IDs so that when the address of the pupil and his or her commune and district are read, the ID can be written onto the front of the pupil questionnaire and then later entered at the point of data entry. It is suggested that a check be made on the DEM file by the unit at IIEP. The structure file should be sent for a first check as soon as it has been prepared in Ha Noi. A final check can only be made when someone in Paris tries to enter data. To this end it would be useful also to photocopy about 250 pupil booklets, about 20 teacher booklets and about 20 school head questionnaires from the first sets of bundles that arrive, and send them to Paris.

15. Selecting and training the data enterers: The same procedures should be used as for the data entry of the pilot data. About 20 data enterers will be needed. They will work in two four-hour shifts each day. There should be 10 good machines and probably a spare one. WINDEM should be installed on each machine. It is estimated that the data can be entered by mid-June, but this may be somewhat optimistic. Again there should be one person who is responsible for cataloguing each bundle that comes for data entry and checking it off when it has been completed and the bundle has gone back to the storage part of the warehouse.

16. Cleaning the data: As the data entry is coming to an end, an international workshop will be held by IIEP in Vietnam on data cleaning. In theory, if all of the checks mentioned above have been made and if the data entry has been well conducted, then no cleaning will be required. Unfortunately, experience suggests that no data set is ever absolutely clean. Some are very unclean. At the workshop and following it, a series of small programs will be developed to deal with inconsistencies and the imputation of missing data. The cleaning in Vietnam will take place in June/July/August and it is hoped that this will be sufficient time.

17. Calculation of sampling weights and derived variables: As soon as a clean data set is available, it should be sent to Ken Ross so that he can calculate the sampling weights and add them to the file. At the same time it is hoped that Prof. Patrick Griffin can come to Vietnam and work with the Vietnamese in order to produce the Rasch scores for the achievement tests and also for designated independent variables (see the Recodes for Vietnam document) and then add the scores to the file.

18. Data analyses: The dummy tables have been prepared and a copy is with Thanh at the World Bank in Vietnam. Using IIEP/JACK, the Vietnamese team should compute the data for the dummy tables and then prepare the completed tables. At the same time this should also be done by Postlethwaite, Ross and Griffin. It is hoped that an international workshop can be held near the beginning of the data analysis phase of the work, so that the Vietnamese data can be used for learning how to compute the analyses, interpret the results and turn the results into policy recommendations.

19. Writing up the results: A preliminary document has been prepared by Postlethwaite.
This so far consists of 12 chapters, the first two of which are general and the rest of which present results. At present they have been written in outline form down to and including the dummy tables for each section of each chapter. There are too many analyses to expect that they can all be done quickly. However, a selection can be made of which analyses should have priority in Vietnam, and these could be done first. For example, for the international workshop it might be of interest to select the theme of the equitable distribution of human and material resources in Vietnam and use those dummy tables first. A second priority might be the identification of well-achieved objectives, averagely achieved objectives and poorly achieved objectives. The co-ordinator should obtain these chapters from Thanh, read them and suggest modifications where appropriate. Then he or she should set a priority order in which the analyses might be done. At the same time he or she should amend the text where there are errors or insufficient information.

20. Planning and dovetailing all of the activities: The co-ordinator should work out a detailed plan of the above activities stating the amount of personnel and money required for each activity, who will be responsible for each activity, and when it should begin and end. A tentative timetable has been presented below but this will need to be amended. A checklist should be produced so that the co-ordinator does not forget a task; this is easy to do with so very many different things to be taken into account at the same time.

Tentative timetable for Grade 5 survey

Note: the following timetable must be reviewed and altered where required. No account has been taken of the acquisition and disbursement of funds or of the visibility of expenditure; it is for those responsible for the survey to ensure that this is done. The person(s) responsible for each action are shown in parentheses.

2000

By Nov. 15
a) Finalisation of English version of the pupil, teacher, and school questionnaires with variable names (Neville P)
b) Finalisation of draft timetable (Neville P)
c) Finalisation of first draft of two manuals (Neville P)
d) Finalisation of final list of schools in the sample (Ken Ross)

Nov. 20-25
a) Produce Vietnamese versions of the above (Thanh)
b) Read, check and modify, where required, the chapters for the report (Thanh)
c) Finalise derived variables and recodes (Neville P)
d) Draft letters from Minister to province heads and school heads (Mr. Vinh)
e) Negotiate use of warehouse for 2001 (Mr. Chi)
f) Begin discussion of contract for the printing and distribution of all test instruments to provincial level (Mr. Chi)

Nov. 27-Dec. 1
a) Typing and layout of questionnaires in Vietnamese (Thanh)
b) Organise meeting with the appropriate persons in the Ministry to identify the benchmarks to be used (see Chapter 7 and questionnaires); have a list of benchmarks where values can be entered for data analysis (Thanh, Chi, Vinh)

Dec. 4-8
a) Write up the above and continue the negotiations for the contracts needed (Chi)
b) Finalise School and Pupil forms and manuals (Vinh)

Dec. 11-15
a) Finalisation of tests for pupils and teachers (Patrick G)
b) Control tests (Test comm)
c) Further work on chapters (Neville P)
d) Organising the writing up of the i) sampling, ii) questionnaire construction and iii) test construction (Neville P)

Rest of December
Finalise all booklets for pupils, teachers and school head.
The order within each booklet should be: i) questionnaire, ii) math test and iii) reading test. Send to printer together with school and pupil forms and manuals (Thanh)

2001

January
a) Have Minister's letters sent to all province heads and all school heads (Chi/Vinh)
b) Organise District Officers' training and inform district officers of the arrangements (Vinh)
c) Organise central training and inform the relevant people as well as the spot checkers (Vinh)
d) Prepare DEM structure files (Vinh team/Paris IIEP)

Feb. to 25 March
a) Enter IDs on the instruments (Vinh team)
b) Send instruments to provinces (Vinh team)
c) Arrange contract for computers for data entry (Chi)

25 March - 5 April
Central training in four sessions of three days each (Vinh)

April 9-11
Training of data collectors in each region (Vinh)

April 12-13
Main data collection

April 16-21
District officers' manual check of instruments and questions, and return to the schools to collect missing data (District officers under the guidance of Vinh)

Apr. 23-27
a) Return of all instruments to Ha Noi (District officers)
b) Have computers installed (Chi/Vinh)

May 1 - June 15
a) Check instruments in Ha Noi (Vinh team)
b) Train 50 data enterers (Vinh)
c) Enter data under supervision of Vinh (Vinh team)

Around mid-July
One-week workshop on data cleaning (Ministry and IIEP)

15 July to end of year
Data cleaning (Vinh team)

2002

January-February
a) Calculation of sample weights and adding them to the file (Ken Ross)
b) Compute Rasch scores and add to file (Patrick G)
c) Rasch selected variables (never done) (Vinh)
d) Organising the writing up of the data cleaning

February-April
Data analysis for completing dummy tables (Vinh team and Miyako)

February-May
Writing up by chapter (Neville P, Patrick G, and Miyako Ikeda)

June onwards
Organising the publication of parts of the write-ups (Chris Shaw and Ministry)

CHECK LIST FOR THE CO-ORDINATION CENTRE

Task | Action
1 Check master copies of all instruments
1a Check that both language versions accord
2 Assemble the booklets
3 Prepare letters from Ministry to provincial and school heads
4 Organise and prepare contracts and renting of buildings
5 Organise the benchmarks
6 Organise the establishment of minimal and desirable levels of mastery for both subject matters
7 Know and check the sample of schools
8 Prepare the data collectors' manual and the School and Pupil Name forms
9 Prepare the central training
10 Plan the provincial training sessions
11 Select and train the spot checkers; prepare list of schools to be spot checked
12 Check the instruments at the district level
13 Plan the return of instruments to the warehouse
14 Prepare and check the DEM structure file
15 Select and train the data enterers
16 Clean the data sets
17 Calculate the sampling weights and add to file; calculate the derived variables and add to file; do recodes
18 Analyse data
19 Write up the results
20 Plan and dovetail all of the above activities

TRANSLATION OF INSTRUMENTS

(a) Translation tips

It is important that the tests and questionnaires exist in both Vietnamese and English so that whoever is undertaking different parts of the data analyses knows exactly what is involved in the different items and questions with which they are working. To this end not only should the sense be identical but also the level of language difficulty used.
The main issue in translation is to achieve equivalence of difficulty between the language into which you are translating and the English version of the test. This is vital but, unfortunately, there is no simple or foolproof way of ensuring equivalence. The tips listed below may be of help.

- The translator must make every effort to pitch the words and sentence structures at the same difficulty level in the language into which they are translating as in the other language version. This is not an easy matter.
- If a question is a paraphrase item, the translator must ensure that the translated item is indeed a paraphrase and not verbatim (i.e. appropriate synonyms must be found).
- It is useful to examine differences in word frequency (or vocabulary load) in the part of the passage to which the item refers. Useful lists of frequency counts are available for many languages. If such lists are not available, Curriculum Centres often have lists of typical words used at different grade/standard levels that they have compiled to help in writing their curriculum materials for different grade levels.
- Independent translations should be made by at least two different expert translators familiar with age-appropriate linguistic demands. In cases of disagreement, consensus should be achieved either by direct negotiation between the two translators or by a third expert making the final choice.
- Particular attention should be paid to how the wording of questions matches the wording of the relevant sections of the text.
- Although the math items should not contain too many words, there are some items that do include quite a few words. These should be translated as simply as possible, since the math test is a math test and not a reading test.

(b) Translators to be used

The translators you select must have a very good command of English and be used to the vocabulary and language level of Grade 5 pupils in your country. They should also have previous experience in this kind of translation.

(c) Two forward translations and selected back translations

For each instrument, it is suggested that there should be two forward translators. Each translator should undertake the translation separately and the final versions should then be compared. Where there are discrepancies, the two translators should attempt to compromise and produce one final version. Where they are unable to compromise, the chief co-ordinator should decide on the final version.

Appendix C: SCHOOL AND PUPIL FORMS

This form must be returned to . . . ???

Grade 5 Reading and Mathematics National Study
Instrument 1: School Form

ID: Province / District / Commune / School

Name of School Head:
Name of School:
Full Address:
Telephone:

1. Total enrolment of school (include all shifts and all campuses):
2. Total enrolment of Grade 5 (include all shifts and campuses):
3. List of names of all Grade 5 math and reading teachers:
1.____________________________________
2._____________________________________
3.___________________________________ etc.

VIETNAM Grade 5 Study of Reading and Mathematics
Instrument 2: Pupil Name Form

ID (Office Use): Province / School

List of selected pupils for this school
Pupil ID | Pupil Name | Sex | Age | Comments

There is room for 20 pupils. If possible add the sex and age of each pupil. The comments column is for comments on the absence of a pupil from all testing sessions or from one of the sessions, or any other comment on a particular pupil.
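The identification numbers written on these forms follow the hierarchical numbering convention described at the beginning of this appendix section, with groups of digits standing for district, commune and school. As a purely illustrative aid for those preparing the ID system and the data entry, the sketch below shows how such an ID might be decomposed. The digit widths assumed here (three for province, two each for district and commune, one for school) are an inference from the eight-digit school IDs listed in Appendix 1.8, not an official specification, and the function name is hypothetical.

```python
# Illustrative sketch only: decompose an 8-digit school ID assumed to be of
# the form PPP DD CC S (province, district, commune, school), following the
# digit convention described in this appendix. Not the study's actual software.

def split_school_id(school_id: str) -> dict:
    """Split an 8-digit school ID into its hierarchical parts."""
    if len(school_id) != 8 or not school_id.isdigit():
        raise ValueError("expected an 8-digit numeric school ID")
    return {
        "province": school_id[0:3],   # assumed: first three digits
        "district": school_id[3:5],   # fourth and fifth digits
        "commune":  school_id[5:7],   # sixth and seventh digits
        "school":   school_id[7],     # last digit
    }

# Example using a school ID that appears in Appendix 1.8:
print(split_school_id("10505071"))
# {'province': '105', 'district': '05', 'commune': '07', 'school': '1'}
```

Working the decomposition out in advance in this way makes it easier to train data collectors to write consistent IDs onto the booklets (see point 9.1 above) and to check returned IDs at the district office.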
Appendix 1.6: Flow chart of data editing and data entry
[Flow chart not reproducible here. Roles shown in the original chart: librarian, editing team, editing team supervisor, data entry team, data entry team supervisor; activities shown: check instruments, edit, suggest and decide revised cleaning rules, data entry and re-entry, clean data and report.]

Appendix 1.7: Cleaning rules (Rules of Validation changes)

Rules of Validation Changes (Aug. 17, 2001)

Pupil booklet (Question; Variable; Validation criteria; Problem cases; Rule for change)
PQ2 (PBYEAR): No change.
PQ13 (PTRAVEL): Change to 150.
PQ27.02 (PEXTUVIE) and PQ27.03 (PEXTUMAT): criteria 0<=PQ27.02<=10 and 0<=PQ27.03<=10; problem cases PQ27.02=12, PQ27.02=48, PQ27.02=15, PQ27.02=32. Rule: check instruments only when PQ27.02 + PQ27.03 > 20; then change the answers with > 10 in order to make PQ27.02 + PQ27.03 <= 20.

Teacher booklet
TQ15 (TDSKCH): No change.
TQ19 (TNPERIOD): Change to 50.
TQ20 (TMINTEAC): criterion 30<=TQ20<=50; problem cases TQ20=0 (change to 40) and TQ20=55 (change to 50).
TQ21 (THRSOUTS): criterion 0<=TQ21<=30; problem case TQ21=35 (change to 30).
TQ24 (TEXHRS): criterion 0<=TQ24<=10; problem case TQ24=20 (change to 10).

School head questionnaire
SQ3 (STRAVEL): criterion 0<=SQ3<=120; problem case SQ3=180 (change to 120).
SQ21 (SYRBEGIN): No change.
SQ22.1-6 (SDISCLIN, SDISROAD, SDISLIBR, SDISBOOK, SDISSECS, SDISMAK): criterion 1<=SQ22.1<=150; problem case SQ22.1=0 (change to 1).
SQ31.1a (SSHFT2PU): criterion 0<=SQ31.1a<=1000; problem case SQ31.1a=1865 (compare with Sform).
SQ31.2a (SSHFT2PU): criterion 0<=SQ31.2a<=1000; problem case SQ31.2a=1582 (compare with Sform).
SQ34 (STINSPE): criterion 0<=SQ34<=10; problem case SQ34=16 (no change).
SQ35.2 (STSMINSP): criterion 0<=SQ35.2<=10; problem case SQ35.2=22 (no change).
SQ35.3 (STININSP): criterion 0<=SQ35.3<=10; problem case SQ35.3=21 (no change).
SQ35.4 (STATINSP): criterion 0<=SQ35.4<=10; problem case SQ35.4=12 (no change).
SQ35.5 (STAHINSP): criterion 0<=SQ35.5<=10; problem case SQ35.5=29 (no change).
SQ42.3 (SCRMTEMP): No change.
SQ44.01 (SSQPLAYG): criterion 0<=SQ44.01<=9000; problem case SQ44.01=9912 (no change).
SQ44.02 (SSQSPERM): No change.
SQ46.1 (STOILB): criterion 0<=SQ46.1<=10; change to 10.
SQ46.2 (STOILG): criterion 0<=SQ46.2<=10; change to 10.
SQ46.3 (STOILS): criterion 0<=SQ46.3<=5; change to 5.

General cleaning rules (consistency checks)
For each rule the original table lists: the basis questions; the required relationship ('If ... must be ...'); the problem cases; the rule for change; the justification where given; and whether the edit is made by hand or by computer (Hand, Computer1, Computer2).
SQ11, SQ14 & SQ15 (required: Q11>=Q15>=Q14):
- Q11 is smaller than Q15: change Q11 to the year answered in Q15 (Computer1).
- Q15 is smaller than Q14: change Q15 to the year answered in Q14 (Computer1).

SQ17.08, SQ17.12, SQ17.19 & SQ17.20 (required: if Q17.12=1 then Q17.08=1, Q17.19=1 and Q17.20=1):
- Q17.12 is 1 but Q17.08 is 2: change Q17.08 to 1 (Hand; other devices can be run by batteries).
- Q17.12 is 1 but Q17.19 is 2: change Q17.19 to 1 (Hand).
- Q17.12 is 1 but Q17.20 is 2: change Q17.20 to 1 (Hand).

SQ20 & SQ32 (required: if Q32=1 then Q20=1; if Q32=2,3,4,5,6 or 7 then Q20=2):
- Q32 is 1 but Q20 is 2: change Q20 to 1 (Hand; because Q32 requires the exact number of satellites).
- Q32 is 2,3,4,5,6 or 7 but Q20 is 1: change Q20 to 2 (Hand; ditto).

SQ24, SQ25 & SQ26 (required: Q24=Q25=Q26):
- The numbers in Q24, Q25 and Q26 differ: enter the median number (Computer1).

SQ24 (required: Q24.1+Q24.2+Q24.3+Q24.4=Q24.5):
- The total of Q24.1 to Q24.4 differs from Q24.5: change Q24.4 in order to adjust to Q24.5 (Computer1; because 'temporary female teacher' is the most unsettled category).

SQ25 (required: Q25.1+Q25.2+Q25.3+Q25.4=Q25.5):
- The total of Q25.1 to Q25.4 differs from Q25.5: change Q25.3 in order to adjust to Q25.5 (Computer1; because the common academic level is upper secondary and standard teacher training is 12+2).

SQ26 (required: Q26.1+Q26.2+Q26.3+Q26.4+Q26.5+Q26.6=Q26.7):
- The total of Q26.1 to Q26.6 differs from Q26.7: change Q26.4 in order to adjust to Q26.7 (Computer1).

SQ27 and SQ31a (required: Q27.1+Q27.2=Q31.1a+Q31.2a+Q31.3a+Q31.4a):
- The total of Q27.1 and Q27.2 differs from the total of Q31.1a to Q31.4a: check the School Form and contact the school again (Computer1).

SQ28 & SENRG5 (School Form) (required: Q28.1+Q28.2=SENRG5):
- The total of Q28.1 and Q28.2 differs from SENRG5: contact the school (Computer2).

SQ29 and SQ31b (required: Q31.1b+Q31.2b+Q31.3b+Q31.4b=Q29):
- The total of Q31.1b to Q31.4b differs from SQ29: contact the school (Computer1).

SQ33 & SQ35 (required: if Q33=1,2,3,4 or 5 then Q35.1=0; if Q33=7 then Q35.1>=1):
- Q33 is 1, 2, 3, 4 or 5 but Q35.1 is not 0: change Q35.1 to 0 (Computer1).
- Q33 is 7 but Q35.1 is 0: change Q35.1 to 1 (Computer1).

SQ34 & SQ35 (required: Q35.1+Q35.2+Q35.3+Q35.4+Q35.5+Q35.6+Q35.7>=Q34):
- The total of Q35.1-7 is less than Q34: change blanks in Q35.1-7 to 99; if there is no blank (missing value), adjust Q35.7 to Q34 (Computer1).
- Blanks in Q35.1-7: enter 0 only if the total of Q35.1-7 is equal to or bigger than Q34; if it is less than Q34, enter 99 (Computer1).
SQ42 & SQ43 Change Q43.1 to the average of the Computer1 Q42.1>0 Q43.1>0 Q42.1 is not 0 but Q43.1 is 0 school Change Q43.2 to the average of the Computer1 Q42.2>0 Q43.2>0 Q42.2 is not 0 but Q43.2 is 0 school Change Q43.3 to the average of the Computer1 Q42.3>0 Q43.3>0 Q42.3 is not 0 but Q43.3 is 0 school Change Q47.13 to 1 Because these Hand SQ47.11 & Q47.13=1 Q47.11=1 but Q47.13 is 2 devices can not be SQ47.13 Q47.19=1 Q47.11=1 but Q47.19 is 2 Change Q47.19 to 1 Hand run without electricity SQ47.14 Q47.11=1 Change Q47.22 to 1 Hand SQ47.22 Q47.22=1 Q47.11=1 but Q47.22 is 2 SQ47.23 Q47.23=1 Q47.11=1 but Q47.23 is 2 Change Q47.23 to1 Hand SQ47.01 and SQ48 Q48=1 Q47.01=1 Q48 is 1 but Q47.01 is 2 Change Q47.01 to 1 Hand Q48=2,3 Q47.01=2 Q48 is 2,3 but Q47.01 is 1 Change Q47.01 to 2 Hand Appendix1.8: Schools suspected of cheating Contact district office Math Read Scrutiny Scrutiny Provincial isolate/rural/ Provincial Provincial Provincial (very good/ good/ schoolID Province Name Commune Name School Name mean Math SD mean Read SD Math Read Read small Math Mean Math SD Read SD average/bad) score score suspicious? suspicious? Mean town/city NS=National Standard 10505071 H T©y Tßng B¹t Tßng B¹t 52.0 0.51 39.6 0.99 No No 40.05 11.44 41.36 10.92 rural average 10521211 H T©y Liªn Ph-¬ng Liªn Ph-¬ng 57.4 0.99 55.7 0.66 No No rural good 10703271 H¶i D-¬ng ChÝ Minh ChÝ Minh 54.8 0.52 No 42.93 10.88 44.01 10.44 rural very good 10909111 H-ng Yªn §ång Thanh §ång Thanh 54.4 0.93 56.7 0.59 No No 43.04 10.26 43.76 9.97 rural good 11111351 H Nam H-ng C«ng H-ng C«ng 57.6 0.68 50.8 0.64 No No 35.75 10.03 38.38 10.56 rural very good (NS) 11301191 Nam §Þnh Ng« QuyÒn TrÇn Phó 51.7 0.80 No 42.21 9.51 44.03 8.87 city good V ietnam 11307251 Nam §Þnh Yªn Ninh Yªn Ninh 57.8 0.55 55.9 0.49 No No rural good 11313271 Nam §Þnh Xu©n Ph-¬ng Xu©n Ph-¬ng 52.0 0.56 No rural good Reading 11319411 Nam §Þnh H¶i S¬n H¶i S¬n 55.7 0.59 50.9 0.75 No No rural good 11507251 Th¸i B×nh Thôy V¨n Thuþ V¨n 59.0 0.00 59.9 0.31 No No 44.23 9.43 45.17 8.63 rural very good (NS) 11507731 Th¸i B×nh Th¸i Thä Th¸i Thä 56.6 0.89 No rural very good and 11511531 Th¸i B×nh Vò §o i Vò §o i 56.6 0.88 No rural very good 20103051 H Giang Lòng Có Lòng Có 54.8 0.89 No 36.78 11.99 36.2 11.72 isolate bad Mathematics 20103091 H Giang Lòng T¸o Lòng T¸o 51.1 0.32 No isolate bad 20119591 H Giang VÜ Th-îng VÜ Th-îng 52.5 0.69 No rural average 20119612 H Giang §ång Yªn §«ng Phong 55.9 0.49 No rural average 20311092 Cao B»ng Ngäc Khª H÷u B×nh 41.9 0.37 No 37.14 11.19 34.36 13.07 isolate average Assessment 20321011 Cao B»ng Lª Lai T©n ViÖt 57.2 0.88 No rural average 20509411 L o Cai B¶n Phè B¶n Phè 51.8 0.72 No 43.52 8.73 43.46 9.01 rural good 20513131 L o Cai San S¶ Hå San S¶ Hå 44.7 0.49 No isolate good Study 20701051 B¾c C¹n S«ng CÇu S«ng CÇu 55.3 0.97 No 36.05 11.41 35.84 12.15 city good 457 20713051 B¾c C¹n Thanh V©n Thanh VËn 52.9 0.85 No rural average 20917251 L¹ng S¬n Nh©n Lý Nh©n Lý 45.7 0.92 No 31.64 10.29 34.42 12.06 rural good 458 21111551 Tuyªn Quang Ninh Lai Ninh Lai 52.4 0.90 No 32.94 11.63 33.33 11.91 rural very good V ietnam 21305071 Yªn B¸i L©m Th-îng L©m Th-îng 50.6 0.68 No 34.56 12.34 35.73 13.01 rural bad 21305211 Yªn B¸i M-êng Lai M-êng Lai 53.9 0.45 54.6 0.83 No No rural average Reading 21311151 Yªn B¸i Kiªn Th nh Kiªn Th nh 1 59.8 0.41 No isolate bad 21501191 Th¸i Nguyªn T©n ThÞnh Lª V¨n T¸m 58.6 0.76 55.5 0.69 No No 40.81 10.82 41.87 11.35 city very good 21511031 Th¸i Nguyªn S«ng CÇu S«ng CÇu 50.3 0.97 No small town good and 21715171 Phó Thä H-ng 
Long H-ng Long 56.4 0.88 No 41.04 10.87 41.38 10.58 rural good Mathematics 21723491 Phó Thä La Phï La Phï 51.5 0.94 No rural very good (NS) 22305371 B¾c Ninh § o Viªn § o Viªn 53.3 0.80 No 47.54 9.24 46.99 8.78 rural average 22305451 B¾c Ninh Chi L¨ng Chi L¨ng 57.6 0.60 55.3 0.57 No No rural good 22309211 B¾c Ninh Tr¹m Lé Tr¹m Lé 57.0 0.39 No rural good(NS) 22313071 B¾c Ninh H-ng M¹c H-ng M¹c 1 58.6 0.89 56.6 0.99 No No rural very good Assessment 22315031 B¾c Ninh V¹n Ninh V¹n Ninh 56.3 0.97 No rural very good 22315071 B¾c Ninh §¹i Lai §¹i Lai 59.7 0.49 56.0 0.22 No No rural very good 22315211 B¾c Ninh Nh©n Th¾ng Nh©n Th¾ng 58.6 0.82 57.7 0.57 No No rural very good Study 22315291 B¾c Ninh Quúnh Phó Quúnh Phó A 59.0 0.22 57.9 0.31 No No rural very good (NS) 22511471 Qung Ninh Qung An Qung An 2 53.1 0.91 No 45.95 9.41 46.92 8.83 isolate good 22517131 Qung Ninh §«ng X¸ §«ng X¸ 58.5 0.69 No rural very good 22525371 Qung Ninh Liªn VÞ Liªn Vi 56.7 0.93 No rural very good 30307111 Sn La ChiÒng Khoang ChiÒng Khoang 55.8 0.55 No 37.48 12.41 37.18 12.22 isolate good 30315151 Sn La §øa Mßn §õa Mßn 54.2 0.77 54.8 0.64 No No isolate good 30511431 Ho B×nh Cao Th¾ng Cao Th¾n 50.0 1.00 No 33.68 12.82 34 12.76 rural good 30511591 Ho B×nh Thanh N«ng Thanh N«ng 54.0 1.00 No rural good 40111171 Thanh Ho¸ Sn §iÖn Sn §iÖn 56.4 0.75 No 39.5 9.88 39.46 11.04 isolate average 40121391 Thanh Ho¸ NguyÖt Ên NguyÖt Ên 55.6 0.75 51.8 0.55 No No rural good 40517531 H TÜnh H-ng Tr¹ch H-ng Tr¹ch 54.7 0.73 55.7 0.73 No No 42.79 9.73 43.3 9.87 rural average 40517532 H TÜnh H-ng Tr¹ch H-ng Phóc 57.9 0.31 No isolate average 40519111 H TÜnh Kú Phó Kú Phó 52.2 0.99 No rural average 40907211 Qu¶ng TrÞ Gio H¶i Gio H¶i 52.3 0.99 No 38.9 9.43 39.41 10.42 rural good (NS) 41107031 Thõa Thiªn HuÕ H¶i D-¬ng Th¸i D-¬ng 57.3 0.73 No 42.15 9.31 42.4 9.48 rural average 41115191 Thõa Thiªn HuÕ A Ngo A Ngo 54.9 0.45 44.0 0.56 No No rural average 50701091 B×nh §Þnh GhÒnh R¸ng Quang Trung 2 51.7 0.47 54.3 0.66 No No 38.72 9.42 38.96 10.69 city very good 50701171 B×nh §Þnh Lª Hång Phong Lª Hång Phong 53.1 0.83 No city very good 50713332 B×nh §Þnh C¸t Th¾ng C¸t Th¾ng 2 56.8 0.62 No rural good 60101151 Kon Tum Vinh Quang Vinh Quang 1 58.5 0.61 No 33.28 12.68 33.8 13.83 rural very good (NS) 60301132 Gia Lai Tr B¸ Tr B¸ 2 55.1 0.76 No 39.39 10.29 38.81 12.21 city good 60305091 Gia Lai Hi Yang Hi Giang 55.0 0.89 No isolate good 60507151 §¾k L¾k Phó Xu©n Minh H 50.9 0.79 No 40.44 10.03 40.88 10.88 rural good 60509151 §¾k L¾k Thèng NhÊt Thèng NhÊt 56.8 0.55 54.0 0.69 No No rural very good (NS) 70103171 Hå ChÝ Minh City B×nh Tr-ng §«ng NguyÔn V¨n Trçi 56.7 0.59 No 42.51 8.78 44.3 9.24 small town good V ietnam 70307051 L©m §ång L¹c Xu©n L¹c Viªn 57.8 0.41 No 40.53 10.15 40.29 11.17 rural very good 70307171 L©m §ång Tu Tra KamBtte 59.2 0.83 No isolate average Reading 70313131 L©m §ång Léc Ng·i PTCS Léc Ng·i 54.7 0.67 No rural good 70913131 T©y Ninh Long ThuËn Long ThuËn B 52.1 0.31 53.8 0.89 No No 35.58 8.76 37 10.34 rural very good (NS) 71103471 B×nh D-¬ng Thíi Hßa Thíi Ho 42.2 0.90 50.4 0.90 No No 38.87 8.99 41.98 9.71 rural good and 80317211 §ång Th¸p X· Mü An H-ng B MÜ An H-ng 3 43.3 0.97 No 33.99 10.73 36.19 11.21 rural good Mathematics 80901092 VÜnh Long Ph-êng 5 ThiÒng §øc 53.6 0.82 54.9 0.37 No No 36.76 9.81 38.55 11.29 city very good (NS) 80909311 VÜnh Long Ng·i Tø Ng·i Tø A 57.7 0.67 59.3 0.92 No No rural average 81509211 CÇn Th X· T©n Ph-íc H-ng T©n Ph-íc H-ng 43.2 0.75 No 32.16 9.12 34.23 11.41 isolate good 82103072 B¹c Liªu X· Ninh Hßa C Ninh 
Ho 53.3 0.85 No 31.39 11.49 32.93 12.13 isolate good
82103191 B¹c Liªu X· Ph-íc Long B Ph-íc Long 54.1 0.79 No rural good
82107171 B¹c Liªu X· Long §iÒn §«ng Long §iÒn §«ng A1 47.7 0.81 No rural good
82313131 C Mau T©n ¢n TiÓu häc 2 55.0 0.97 No 33.38 10.38 34.37 11.49 rural good

Appendix 1.9: Ambiguous Questions

Questions likely to be misinterpreted by pupils, teachers, and school heads (Question; Variable; Problem case; Reason)

PQ4 (PVIET). Problem case: Kinh pupils answer "No" or "sometimes". Reason: some pupils answer whether or not they have conversations at home.
PQ13 (PTRAVEL). Problem case: 150-180 minutes. Reason: some pupils answer the distance from the home where their family lives, not from the home where they currently reside.
PQ27.2 & 27.3 (PEXTUVIE & PEXTUMAT). Problem case: 48 hours per week. Reason: some pupils answer the extra tuition hours combined with the regular class hours in school.
TQ5 (TED). Problem case: chose "completed primary". Reason: some teachers answer the level at which they are currently teaching.
TQ15 (TDSKCH). Problem case: only 1 or 2 students have a chair and desk. Reason: some teachers answer how many desks and chairs are in the classroom, but do not answer how many pupils have desks and chairs. In most schools, one desk can accommodate several students.
TQ21 (THRSOUTS). Problem case: 50 hours per week. Reason: some teachers answer preparation hours including teaching hours.
TQ24 (TEXHRS). Problem case: 40-50 hours per day. Reason: some teachers answer extra teaching hours per week.
SQ3 (STRAVEL). Problem case: 150-180 minutes. Reason: some school heads answer the distance from their hometown.
SQ8 (SEDUC). Problem case: "completed primary education". Reason: some school heads answer the educational level at which they are currently teaching.
SQ22.1-6 (SDISCLIN etc.). Problem case: 150-180 km. Reason: some school heads answer the distance in metres, not in km; some answer for the biggest library or clinic in their province rather than the nearest one.
SQ25.1 (SNPREDT). Problem case: 60 teachers are "completed primary education". Reason: some school heads answer the number of teachers in their school; some do not answer the FINAL educational qualification of teachers.
SQ31.4a & b (SSHFT4PU & SSHFT4CL). Problem case: the pupil number for the full day is the total pupil number of the morning shift and the afternoon shift. Reason: some school heads answer the TOTAL number of the other shifts because "full day" is placed at the bottom of the questionnaire.
SQ35 (STFINSP etc.). Problem case: 78 inspections. Reason: some school heads answer the number of teachers who were inspected; if there are 39 teachers at a school, they answer 39 teachers * 2 inspections = 78.
SQ46 (STOILB etc.). Problem case: 16 toilet rooms. Reason: some school heads answer the number of toilets, not the number of toilet rooms. One toilet room may have several toilets.

PQ = Pupil booklet; TQ = Teacher booklet; SQ = School head questionnaire

Chapter 2
PUPIL TEST DEVELOPMENT AND CALIBRATION

This chapter outlines the procedures for developing the pupil achievement tests. In all, four tests were developed: a pupil mathematics test, a teacher mathematics test, a pupil reading comprehension test, and a teacher reading comprehension test. The following sections in this chapter deal with the development and interpretation of these achievement tests:
1. Curriculum mapping
2. Specification and blueprints
3. Item selection and drafting
4. Panelling
5. Test Trials
6. Test Administration
7. Test Calibration (Item Response Modelling)
8. Test scores
9. Interpreting the tests: Competence Levels
10. Interpreting the tests: Benchmarks
11.
Combined reading and mathematics tests

Curriculum mapping

An early step involved a curriculum mapping exercise to identify those elements of curriculum outcomes that were considered important and which would, in all likelihood, remain after the intended curriculum change process. A new mathematics curriculum was introduced in 2000. It was important that the monitoring programme would enable the MoET (Ministry of Education and Training) to include the new curriculum in its goals, and to establish baseline achievement data so that the effect of the change in curriculum could later be established. For this reason the objectives were selected so that there was sufficient coverage of both the old and new curricula. A similar approach was taken with the development of the reading tests.

1 This chapter was written by Patrick Griffin, Assessment Research Centre, Faculty of Education, The University of Melbourne.

Language assessment in Vietnamese primary schools

It is important to point out that this Grade 5 study has represented a shift in assessment practice in Vietnamese schools. This has been especially the case in the assessment of reading comprehension. Assessing it in isolation from other language skills and competencies is unusual in Vietnamese primary schools. A one-off, multiple-choice test such as this one would not be part of the normal experience of Grade 5 pupils. Assessment in the subject 'Vietnamese Language' is normally carried out in accordance with MoET Circular 15/GD-DT dated 2 August 1995. The way in which language has been assessed until this survey has been described in Appendix 2.1. It will be seen that the form of assessment used in the study represents a new form of assessment, and its narrow focus is also somewhat new.

Primary Level Mathematics in Vietnam

The objectives of the mathematics curriculum in Vietnam were similar in some ways to the objectives of other national mathematics curricula. Despite this, the content of the Vietnamese curriculum differed in many significant ways because of the strong emphasis on calculation in number and routine operations. There was also an introduction to space and geometry, through the recognition of structures and calculation, in the middle primary grades. The new curriculum in 2000 also became a vehicle for changes in teaching methods, and pupils were encouraged to learn in a more active manner. This also implied that teachers were expected to encourage pupils to learn in this way. There was an attempt to shift away from hand calculation as an end in itself, but this was dependent on the adoption and provision of resources to enable this change to take place. There was also an intention to introduce more open 'problem solving' into the mathematics curriculum.

The new curriculum emphasises arithmetic, quantities and measuring, geometry and problem solving. These four strands continue throughout primary education from Grades 1 to 5. In Grade 3 some data/statistics are introduced and this, too, continues to the end of the primary curriculum. Lessons follow the prescribed textbook closely and teachers stress the rote learning of facts. 'Problem solving' is generally interpreted in terms of the number of steps in a mathematics task, and pupils are taught the sequence of steps involved in problem solving. Exploration and generic approaches to identifying problem solving strategies are not generally taught.
The structure of the mathematics test for Grade 5 therefore emphasised number, measurement, geometry and statistics. Problem solving was embedded throughout the test, using the content of the strands or sub-domains of mathematics, and emphasised items with two and three steps involved in their solution. Geometry and statistics were combined for the purposes of the test construction.

Early Literacy Education in Vietnam

Literacy education involves the formal and structured teaching of a combination of writing, reading, grammar, vocabulary and spelling, using set textbooks for each grade at the primary education level. The textbooks in language have been structured so that pupils only do writing tasks on a certain topic, for example, after they have read about it and learned the relevant spelling, grammar points, and the necessary vocabulary for that topic. These background facts and skills are then practised in writing tasks. This integrated approach to literacy education means that reading literacy focuses largely on familiar folk stories in the narrative genre.

Children learn how to read and write individual letters and words in Grade 1. In Grade 2, reading and writing focus on sentence and paragraph level texts. Each topic is covered in one class, mostly in the form of asking pupils to answer questions based on a picture. Grade 3 teachers spend two class periods on each topic. In the first class period, the focus is on an oral telling of stories that the pupils have already read and learned, and the second class period is for pupils to re-tell the same stories in their own words. Creativity is encouraged at this level in the form of the use of synonyms and paraphrasing.

At Grades 4 and 5 an important development occurs. The mode of teaching shifts from 'practising' to 'developing'. Two or three class periods are spent on each topic, with the third class period being used for comments on writing tasks that the pupils have done. Pupils in Grade 4 read longer and more complex materials and are given lessons in basic story structure, consisting of an introduction, body and conclusion. In writing, cohesion is emphasised by practising the use of connecting words and ideas and applying chronological or other logical sequences. Reading lessons commonly include texts that describe an object, an animal or a landscape; this exposes the pupils to a new, expository genre. At Grade 5, creative writing extends to a requirement for lively, expressive essays, and these in turn are encouraged through exposure to a similar style of narrative text. The subjects are generally restricted to people and activities in everyday contexts. Integrated lessons link reading and writing through exercises involving the use and development of compound sentences, figures of speech linked to comparison, association and so forth.

Before each writing task, teachers help their pupils construct a detailed outline for the essay. In the 'comment class', teachers do indeed 'comment' on the essay structure, cohesion, and use of vocabulary and syntax, and help pupils to correct the commonly found errors in their writing. In reading classes pupils are encouraged to read aloud and teachers 'comment' on their expression, pronunciation and reading errors. In this approach to reading and writing in Vietnam's schools, there is a reduced emphasis on documents such as advertisements, tables, charts, maps and so on.
In the new curriculum, these aspects of reading and writing were to be introduced. This study included document literacy in the reading test, and the results were expected to be used in forming and reforming the curriculum in the mother tongue. As a consequence, the reading test included three dimensions or strands: narrative, document and expository text types.

Specification and Blueprints

The first step in deciding what the tests would measure was to analyse the curriculum and select the learning outcomes and objectives to be assessed in the tests. Test development panels, one for reading and one for mathematics, identified common objectives across the old and new curricula. The test development panels consisted of subject-matter specialists, mostly from the National Institute of Educational Sciences. A table of specifications (or blueprint) for the test was then constructed for each subject. These were the blueprints for the test design and determined the selection of learning outcomes and, in turn, the items and source materials.

Pupil achievement was measured towards the end of the pupils' time in Grade 5. It was, in a way, the culmination of learning that had taken place up to the end of Grade 5 and represented an indication of the state of learning at the end of primary education in Vietnam. The pupils were to be administered tests in reading and mathematics. The achievement outcomes desired were:
- a total score;
- levels of reading and mathematics competence required for success if pupils were to become independent learners in junior secondary school (Grade 6 is the first grade of junior secondary school);
- levels of reading comprehension and mathematics which it was deemed all people in Vietnam should minimally possess if they were to be regarded as independent citizens.

The levels were not to be presented as scores, but as descriptions of the skills, knowledge and competencies that a person, regardless of age or education level, ought to demonstrate. These different aspects of reading and mathematics have been presented below, as well as the way in which the levels of mastery were arrived at. The curriculum implications of these levels, as well as the competence levels in general, feature as important aspects of learning and teaching in Vietnamese primary schools. In the following sections an explanation of the structure of the tests has been given. This has been followed by a description of how the subject matter specialists at the MoET identified the cut-scores on the test that corresponded to the two independent competency levels in the two learning areas.

The structure of the reading test

The tests were designed to provide a valid measure of basic reading comprehension skills for Grade 5 pupils in all of the 61 provinces participating in this study. Reading specialists from the NIES also reviewed the test items to ensure that they conformed to the national syllabus. The items were trial tested in Thanh Hoa province and subjected to both a classical item analysis and a Rasch scaling. The final test consisted of 60 items. Of the 60 reading items, 10 items were in common with the teacher test. Five items on the pupil test were set to be very easy in order to cater for those pupils who had not developed reading, or even word knowledge, skills. These items were to be drawn from previous projects focusing on Grade 3 achievement tests. Five items were to be set to cater for very high achievers.
These items should be new and should involve linking information from a range of contexts to provide material for inference. A further 30 items were expected to cover the levels and the content domains as specified in the blueprint provided below. All items were presented in a set multiple-choice format using four alternatives with one correct answer. It was proposed that the pupil test consist of items assessing reading comprehension over six levels of difficulty or competence. The levels of reading were specified to focus on three types of text, in the domains of narrative, expository and document literacy. These levels therefore formed the proposed dimension that underpinned the reading test. They were influenced by the analysis of the Grade 5 tests in the previous five-province study (Griffin, 2000).

Level 1: Pupils at this level can link words and pictures where the pictures depict common objects of a 'concrete' nature.
Level 2: Pupils at this level can link words to more abstract concepts such as prepositions of place and direction and, perhaps, ideas and concepts such as comparatives and superlatives (happiest, biggest, larger, etc.).
Level 3: Pupils at this level can link words from one setting to words in another setting (such as a short, sentence-length text), where there is a word match.
Level 4: Pupils at this level can deal with longer text passages, containing a sequence of ideas and content, where understanding is based on an accumulation of information by reading forward through the text.
Level 5: Pupils at this level can search backwards or forwards through a text, seeking confirmation of understanding or linking a piece of information to ideas or information previously encountered.
Level 6: Pupils at this level can link ideas from separate parts of a text and demonstrate an ability to infer an author's intention.

Figure 2.1: Proposed development of reading to underpin the test development

The 60-item test covered three main domains of reading as follows:
1. Narrative. Based on continuous texts in which the aim was to tell a story, whether fact or fiction.
2. Expository. Based on continuous texts that were designed to describe, explain, or otherwise convey factual information or an opinion to the reader.
3. Document. Based on structured information presented in the form of tables, maps, graphs, lists, or sets of instructions. The pupils were requested to search, locate, and process selected facts rather than read every word of a continuous text.

Finally, consistent with the usual restrictions placed on materials in the reading curriculum, the tests contained 13 reading passages ranging in length from 100 words to 350 words; these were mainly narrative folk tales or expository texts dealing with common tasks and procedures. The texts presented a mixture of familiar and unfamiliar materials to the pupils and teachers for comprehension exercises and matched the type of reading materials used for reading and writing instruction in Grade 5. The structure of the reading test has been summarised in Figure 2.2, which also shows how constraints such as word limits, text level, item numbers and the number of prompts were used in establishing a blueprint for the test, and how the design specifications were translated across levels and domains.
In the first row of Figure 2.2 the three domains of text types have been identified, and in the first column the levels within each dimension have been defined. The number of items in each dimension at each level has been embedded within each cell of the matrix.

Figure 2.2: The structure of the reading test (dimensions and item allocation)

Level 1 (3 items)
  Narrative: --
  Expository: --
  Document (3 items): Word/picture association involving nouns and/or adjectives, requiring the simple linkage of a picture to a word in order to answer the question.
Level 2 (3 items)
  Narrative (1 item): Word/picture association involving positional or directional prepositions, requiring the linkage of a picture to a position or a direction in order to answer the question.
  Expository (1 item): As for Narrative.
  Document (1 item): As for Narrative.
Level 3 (13 items)
  Narrative (6 items): Recognising the meaning of a single word and being able to express it as a synonym in order to answer the question.
  Expository (5 items): As for Narrative.
  Document (2 items): Linking a simple piece of information to an item or instruction.
Level 4 (20 items)
  Narrative (8 items): Linking information portrayed in sequences of ideas and content, when reading forward.
  Expository (10 items): As for Narrative.
  Document (2 items): Systematic search for information when reading forward.
Level 5 (12 items)
  Narrative (4 items): Seeking and confirming information when reading backwards through text.
  Expository (5 items): As for Narrative.
  Document (3 items): Linking more than one piece of information in different parts of a document.
Level 6 (9 items)
  Narrative (5 items): Linking ideas from different parts of text; making inferences from text or beyond text.
  Expository (2 items): As for Narrative.
  Document (2 items): Use of embedded lists and even subtle advertisements where the message is not explicitly stated.
Totals: Narrative 24 items, Expository 23 items, Document 13 items; 60 items in all.
Prompts: Narrative 1, 5, 7, 11, 12; Expository 3, 6, 8, 9, 10; Document 2, 4, 13.

The reading specialists panel selected or wrote all items in the reading tests and defined the cognitive skills and reading strategies involved in correctly answering each item. This latter step was undertaken in order to identify the skill levels referred to above. The test blueprint shown in Figure 2.2 was drafted as terms of reference for the item writers, who then selected prompts and matched them as closely as possible to the descriptions in the blueprint. They then wrote items to match the skills and competencies defined in the blueprint, while maintaining links to the Grade 5 reading aspects of the Vietnamese language and literacy curriculum.

The structure of the mathematics test

The mathematics test also consisted of 60 items covering three domains of mathematics. The mathematics item writing team, composed mainly of curriculum specialists from the National Institute for Educational Science (NIES), constructed the test to provide a measure of basic mathematics achievement consistent with the curriculum analysis provided earlier. The specialists from the NIES also reviewed the test items to ensure that they conformed to the national syllabus. The items were trial tested in Thanh Hoa province and subjected to both a classical item analysis and a Rasch scaling.
The final test consisted of 60 items, of which seven were in common with the teacher test. Five items in the pupil test were set at levels that were very easy, in order to cater for those pupils who had not developed mathematics, or even basic number, skills. These items were to be drawn from previous studies focusing on Grade 3 achievement tests. Five items were to be set to cater for very high achievers. A further 30 items were expected to cover the levels and the content domains as specified in the blueprint provided below. All items were presented in a set format using four alternatives with one correct answer.

It was further required, as part of the item writers' terms of reference, that the pupil test consist of items assessing mathematics competence over six levels. The levels of competence were specified to focus on the three areas of 'number', 'measurement' and 'space and data'. These levels therefore formed the proposed dimension that underpinned the mathematics test and were influenced by the analysis of the Grade 5 tests in the previous five-province study (Griffin, 2000). In specifying the test blueprint, the levels were expressed in terms of the item difficulty that item writers were expected to produce (see Figure 2.3). The items were set to assess mathematics achievement at six levels across the three dimensions described above, but not all levels were expected to be evident in every dimension of the test. The 'Number' and the 'Space/Data' tasks were not expected to have six levels each, and Measurement was expected to have four levels. Distractors for the items were written to enable diagnosis to be undertaken.

Level 1: Pupils at this level should complete tasks that involve the linking of patterns or shapes to simple digits. This is the easiest level of development and is likely to underpin all others.
Level 2: Pupils at this level should complete tasks that require recognising and naming basic shapes and units of measurement, as well as undertaking single operations using up to two-digit numbers.
Level 3: Pupils at this level should complete tasks that assess skills in the previous levels and recognise simple fractions in both numerical and graphic form. Identification of data in tabular form and basic calculations associated with simple measurement units would be expected, together with a basic understanding of numeration with simple computations.
Level 4: Pupils at this level should complete tasks that extend and complete number patterns, translate shapes and patterns, and convert measurement units when making simple one-step calculations.
Level 5: Pupils at this level should complete tasks that require combining operations in order to link information from tables and charts in performing calculations. This also applies to measurement units. Two- and three-step problems should be set, where the first step may be the identification of appropriate information to use in subsequent steps of the computation.
Level 6: Pupils at this level should complete tasks that emphasise data interpretation and computation, linking data from tables and graphic displays in order to undertake computations involving several steps and a mix of operations.

Figure 2.3: The proposed levels of competence in mathematics

The dimensions were defined as:
Number: operations and number line, square roots, rounding and place value, significant figures, decimals, fractions, percentages and ratio.
Measurement: measurement of distance, length, area, capacity, money and time.
Space and Data: geometric shapes; bar, pie and line graphs; and tables presenting data describing common phenomena for Grade 5 pupils.

The way in which these domains and an extended version of Bloom's knowledge, comprehension and application levels were operationalised has been presented in Figure 2.4, together with the number of items for each cell of the blueprint. Once again a series of levels was defined in order to assist item writers to spread the difficulty levels of the items in the mathematics test and to ensure that a full range of mathematics competence levels was established.

Figure 2.4: Description of the levels of mathematics competence (dimensions and item allocation)

Level 1 (6 items)
  Number (6 items): Number recognition, linking patterns to numbers.
  Measurement: 0 items.
  Space/Data: 0 items.
Level 2 (10 items)
  Number (4 items): Single operations using two-digit numbers.
  Measurement (4 items): Recognise units of measurement.
  Space/Data (2 items): Linking of patterns and graphs to single digits; recognise and name basic shapes.
Level 3 (12 items)
  Number (4 items): Simple fractions.
  Measurement (4 items): Basic calculations with simple measurement units.
  Space/Data (4 items): Identify data in tabular form.
Level 4 (12 items)
  Number (4 items): Extend and complete number patterns.
  Measurement (4 items): Convert measurement units when undertaking one-step computations.
  Space/Data (4 items): Translate shapes and patterns.
Level 5 (12 items)
  Number (4 items): Combining operations in order to link information from tables and charts when performing calculations.
  Measurement (4 items): Two- or three-step operations as in Number, using measurement units and conversion.
  Space/Data (4 items): Combining operations in order to link information from tables and charts when performing calculations.
Level 6 (8 items)
  Number (3 items): Combining operations in order to undertake computations involving several steps using a mixture of operations and combinations of fractions, decimals and whole numbers.
  Measurement (2 items): Combining operations in order to undertake computations involving several steps using a mixture of operations and translation of units.
  Space/Data (3 items): Linking data from tables and graphs in order to undertake computations involving several steps and with a mixture of operations.
Totals: Number 25 items, Measurement 18 items, Space/Data 17 items; 60 items in all.

Teacher tests

Two tests were also devised for teachers. Basically, all that was required of the teacher tests was that they produced a good range of achievement for teachers and that the difficulty of the tests overlapped with the pupil tests. Several items were selected to be common with the pupil tests so that the results could be directly compared. In the mathematics test there were six items common to the teacher and pupil tests. In the teacher reading test, ten items were common with the pupil test. Because the pupil curriculum was not important in terms of measuring teacher reading and mathematics achievement, the structures of the teacher tests were not dominated by the Vietnamese curriculum. The tests were devised to ensure that they were not so difficult that teachers could not complete them, but at the same time not so easy that teachers felt affronted by taking such a test. The teacher tests consisted of a total of 45 items each, and each was analysed as a single measurement; this was not the case with the pupil tests. A full description of the development of the teacher tests has been given in Chapter 3.

Item selection and drafting

Because of the number of pupils in the proposed sample, it was decided that only multiple-choice items would be used, on the grounds of cost and of the ease of scoring and recording answers. There were several sources of items that were taken into account.
Some publicly released items from the IEA-TIMSS study were modified to match the Vietnamese curriculum. Permission was obtained from the Southern Africa Consortium for Monitoring Education Quality (SACMEQ) management committee to use a selection of items from the SACMEQ tests. Other sources of items consisted of the tests used in the Five-Province Study and various other publicly released items and tests. In all, the item writers in mathematics did not have to write a large number of new items because of the global nature of the mathematics curriculum and the ease of modifying items to match the local curriculum. In reading comprehension, however, there was a greater demand placed on the item writers. The reading comprehension test was developed under greater local constraint than the mathematics test, because the reading curriculum in Vietnam is integrated with all other language components of the curriculum.

Panelling

Panelling is a process by which specialists examine the items and offer constructive advice to improve them. It is usually carried out in a meeting format during which the specialists examine the items and make notes that are passed to the test developer and discussed with the item writers. Specialist groups from the MoET, the NIES and from schools were recruited to panel, or scrutinise, all items. In reviewing the items the panels looked for obvious gender, race, culture and other forms of bias and sensitivity. The process of item review involved the considerations listed below; in addition, international test specialists reviewed each of the items that were prepared for the trials.
- Cognitive demand (knowledge, comprehension and application)
- Strand (Number, Measurement and Geometry in mathematics; Reading, Writing, Spelling and Word use in Vietnamese)
- Test balance
- Curriculum relevance
- Probability of a correct response by Grade 5 pupils
- Diagnostic value and interpretation of each distractor.

Test Trials

Three rotated forms of the pupil tests and one form of the teacher test were trialled. Both classical and Rasch analyses of the test data were undertaken, as well as a differential item functioning (DIF) analysis. As a result, it was possible to identify poorly performing test items; in the mathematics test a total of 76 items remained after the initial analysis and review. Of these, there were 38 Number items, 17 Space and Data items and 19 Measurement items. Recommendations for deletion were made, but the final decision regarding the omission of items was left to the Ministry of Education and Training representatives, who were advised to take into account the substantive interpretation of the test construct and the curriculum value of the item set. A meeting was then held by Ministry officials who reviewed the entire item pool in order to select the final test set.

The teacher mathematics and reading tests were trialled as a single trial form. The mathematics test was anchored to the pupil test at five points. The test was relatively easy for the teachers. The mean logit estimate for the teacher sample of 2.83 indicated that the average teacher ability was well above the difficulty of the test and well above the pupil ability level. A similar result was obtained with the teacher reading test using 10 anchor points. The ease of the tests was not considered to be a bad thing, as the tests would then not be threatening.
However, it was recommended that the item order be adjusted to reflect increasing difficulty throughout the test and to reflect the order shown on the variable map, which is a chart that illustrates the relationship between the item difficulty distribution and the pupil achievement distribution. The teacher test items, although relatively easy, worked well in terms of the fit of the data to the model. It was clear, however, that two of the misfitting items were link items with the pupil tests, and these had to be retained in the final pupil test as well as in the teacher test. The only changes recommended to the teacher test were a change in item order and the addition of some more difficult items at the end of the test.

Summary statistics for the trial tests have been presented in Table 2.1. It can be seen that almost 300 items were trialled, using samples of just under 400 pupils and 200 teachers. Item and person separation indices were employed as indices of reliability because they provide information beyond the classical equivalents: the person separation index can be regarded as an index of criterion validity and the item separation index can be regarded as a measure of construct validity. Information has also been provided about the overall fit of the data to the Rasch simple logistic model using the Infit and Outfit measures. These have been explained in more detail in later sections of this chapter.

Table 2.1: Summary Results of the Test Trials
Columns: Mean (SD); Item reliability; Case reliability; Item Infit (SD); Item Outfit (SD); Case Infit (SD); Case Outfit (SD)
Pupil Reading (110 items, 396 pupils): (1.39) 0.91 1.34 0.96 1.00 0.95 0.99 0.97 (.11) (.23) (.14) (0.41)
Pupil Mathematics (109 items, 394 pupils): 0.08 0.98 0.88 0.99 1.03 1.00 1.03 (0.83) (0.11) (0.28) (0.17) (0.43)
Teacher Reading (45 items, 197 teachers): 2.83 .94 .70 1.21 1.10 1.01 1.10 (.31) (0.88) (1.44) (0.95) (0.10) (0.83)
Teacher Mathematics (45 items, 199 teachers): 2.94 .94 0.65 1.03 0.96 0.99 0.96 (0.50) (0.77) (0.26) (0.44) (0.15) (0.60)

The teacher tests were anchored to the pupil tests, and the mean item difficulty illustrates the difference in difficulty level between the two tests. It was noted that the difference in ability between teachers and pupils exceeded the difference in the mean difficulty levels of the two tests. The teacher test was harder than the pupil test, but the teachers were even more able than the test suggested. Given that the testing of teachers is a sensitive issue, it was decided to maintain the teacher test broadly as it was, but to add a small number of items based on more demanding passages.

Test Calibration (Item Response Modelling)

In calibrating the tests, the important requirement was that the measure of 'achievement' or 'ability' be valid across all participating pupil and teacher sub-groups. This was not only possible but imperative for meaningful interpretations of achievement, whether at a national or a provincial level. Identifying the variable relied on calibration procedures both within and between samples, and this could be achieved with the simple logistic Rasch model (SLM), although other international studies such as TIMSS used a more advanced variation of this model and TIMSS-R used a multiple-parameter model. The simple logistic model has been shown to predict accurately both the behaviour of test items and that of persons.
Other, more complex models have consistently given theoretical and practical difficulties. For example, when guessing and discrimination are used as additional parameters, lengthy computations are required to score the test and the simple one-to-one relationship with the raw score is lost, although it is possible that these parameters might help to shed light on inter- and intra-provincial and regional differences in item behaviour. For the main study, however, the explanation of differences in achievement formed the main focus. Hence, emphasis was placed on the one-parameter model for calibration and interpretation purposes. What the Rasch SLM imposed, however, was a test design that had a proposed dominant underlying variable which was then operationalised in the items. That is, the items were deliberately selected according to their contribution to the interpretation of the construct. This was also a requirement of the TIMSS and TIMSS-R test designs.

According to the SLM, the probability of a given response to an item does not depend on which individuals attempt the item but on the pattern of responses given. The model does not depend on which items make up the test, or the order in which they appear, or on the responses to preceding items on the test. The SLM assumes that the individual's response to an item is conditioned only by the ability to answer questions in the content area of the test, and not by motivation, guessing tendency or any personal attribute other than ability in the domain of interest. The model assumes just one item parameter (difficulty) and a single person parameter (ability). The ability and difficulty parameter estimates are mapped onto a single interval scale, and both parameters are measured in the same units, called logits. The single scale enabled both persons and items to be placed on the same continuum defining an underlying variable, and enabled the underlying variable to be interpreted in terms of the skills required for the pupil to make a correct response.

The items on the test were scored right or wrong using a dichotomous score of one or zero respectively. Scoring each item in this manner treats the items as independent dichotomous items, in which each pupil n has an ability θ_n and each item i (for i = 1 to k) has a difficulty parameter δ_i representing the difficulty of attaining a score of 1 on that item. Each of these parameters governs the likelihood of a pupil with ability θ_n obtaining a score of 1 rather than 0. The analysis models the relationship between the pupil ability and the difficulty parameters of each of the items. Given that each item on each test has a maximum score of 1, the Rasch simple logistic model, implemented in the computer program Quest (Adams and Khoo, 1995), was employed to derive the estimates of item difficulty and person ability. The probability of a correct response was obtained from

Pr{x_ni = 1 | θ_n, δ_i} = exp(θ_n − δ_i) / (1 + exp(θ_n − δ_i))

These probabilities (that the score was x_ni = 1 for pupil n on item i) enabled the estimates of the abilities θ_n and the difficulty parameters δ_i to be obtained. These estimates were then simultaneously plotted on a chart called a variable map, which illustrates the relative position of the pupils against the difficulty levels assigned to each of the test items. This has been shown in Figure 2.5. When the pupil ability was at the same level as the item difficulty, the odds that the pupil would score x_ni = 1 for the item were 50/50.
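As an illustration of the response model only (not of the Quest program used in the study), a minimal sketch of the SLM probability in Python is given below; the ability and difficulty values are hypothetical.

```python
import math

def rasch_probability(theta: float, delta: float) -> float:
    """Probability that a person with ability theta answers an item
    of difficulty delta correctly, under the Rasch simple logistic model."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))

# Hypothetical values on the logit scale.
pupil_ability = 0.97     # e.g. a pupil near the mean ability reported for the reading test
item_difficulty = 0.97   # an item of exactly matching difficulty

p = rasch_probability(pupil_ability, item_difficulty)
print(f"P(correct) = {p:.2f}")  # 0.50: when ability equals difficulty the odds are 50/50
```

Raising the ability argument above the difficulty argument pushes the probability above 0.5, which is the sense in which the odds of success change as a pupil moves up the scale.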
This 50/50 property is an important characteristic of the Rasch analysis, and allowing the ability and difficulty parameters to be mapped onto the same scale has important implications. These have been discussed in the following sections.

Two measures of the accuracy of the test procedure were used. The first was the standard error of measurement for each of the item difficulty estimates. The second was a measure of the extent to which the data fitted the Rasch model. This measure is the mean squared difference between the expected (or modelled) and the observed score on each item, weighted by the variance of the expected scores. It is called the INFIT mean square, which stands for the Information Weighted Mean Squared residual goodness of fit statistic. The expected value of the INFIT is 1.0 and the accepted range of values lies between 0.77 and 1.30 (Adams and Khoo, 1995); when the item sets are all within these limits, this is taken as evidence of a single dominant dimension underpinning the test performances of the pupils.

Fit to the model

The expected outcome of a person-item interaction (E) is expressed as the probability p_ni of an observed score x_ni by person n on item i. The person indicator n can take the values from n = 1 to n = N, the number of people taking the test. The item indicator i can take the values from i = 1 to i = k, the number of items on the test. The variance of the expected score is obtained from

W_ni = p_ni (1 − p_ni)

and its standard deviation is √W_ni. The difference between the observed (O) and expected scores is given by (O − E) = (x_ni − p_ni). These differences are standardised in order to obtain a Z-score by dividing each difference by the standard deviation of the estimate:

Z_ni = (x_ni − p_ni) / √W_ni

The square of this value is distributed approximately as chi-squared. The average of these squared values (over all people or over all items) is called the OUTFIT, which stands for the 'outlier sensitive mean squared residual goodness of fit statistic'. The OUTFIT for an item is averaged over the N persons,

OUTFIT_i = (1/N) Σ(n=1..N) Z_ni² = (1/N) Σ(n=1..N) (x_ni − p_ni)² / W_ni

and the OUTFIT for a person is obtained by averaging over the k items,

OUTFIT_n = (1/k) Σ(i=1..k) Z_ni² = (1/k) Σ(i=1..k) (x_ni − p_ni)² / W_ni

There is a problem with using the OUTFIT, however, because it is sensitive to extreme scores; that is, unexpected person-item outcomes tend to affect OUTFIT measures. To counter these effects the squared differences between the observed and expected scores, or residuals (x_ni − p_ni)², are weighted by the variance W_ni. The resulting statistic is called the 'Information weighted mean squared residual goodness of fit statistic', or INFIT for short. It is calculated in much the same way as the OUTFIT except that each squared standardised residual is weighted by its variance:

INFIT = Σ Z_ni² W_ni / Σ W_ni = Σ (x_ni − p_ni)² / Σ W_ni

which gives the weighted INFIT test of fit

v_i = Σ(n=1..N) (x_ni − p_ni)² / Σ(n=1..N) W_ni

for item i when it is averaged over the N persons, and

v_n = Σ(i=1..k) (x_ni − p_ni)² / Σ(i=1..k) W_ni

for person n when it is averaged over the k items. Both fit estimates are sensitive to sample size. Obviously, with such a large sample very few items or persons will be shown to misfit, and guidelines for the upper and lower values are difficult to set. Both INFIT and OUTFIT have an expected value of 1.0 when the model fits the data.
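To make the weighting concrete, here is a small illustrative sketch (not the Quest implementation) that computes the item INFIT and OUTFIT mean squares from a matrix of dichotomous responses and the modelled probabilities; the data are simulated and hypothetical.

```python
import numpy as np

def infit_outfit_by_item(x: np.ndarray, p: np.ndarray):
    """x: N x k matrix of observed 0/1 scores.
    p: N x k matrix of modelled probabilities p_ni.
    Returns (infit, outfit) arrays of length k."""
    w = p * (1.0 - p)                             # variance W_ni of each expected score
    resid_sq = (x - p) ** 2                       # squared residuals (x_ni - p_ni)^2
    z_sq = resid_sq / w                           # squared standardised residuals Z_ni^2
    outfit = z_sq.mean(axis=0)                    # mean of Z_ni^2 over persons
    infit = resid_sq.sum(axis=0) / w.sum(axis=0)  # variance-weighted mean square
    return infit, outfit

# Hypothetical example: 5 pupils and 3 items on the logit scale.
abilities = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
difficulties = np.array([-0.5, 0.3, 1.2])
p = 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))
rng = np.random.default_rng(0)
x = (rng.random(p.shape) < p).astype(float)       # simulated responses

infit, outfit = infit_outfit_by_item(x, p)
print("INFIT:", np.round(infit, 2), "OUTFIT:", np.round(outfit, 2))
```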
Adams and Khoo (1995) have pointed out that 'a fit mean square of (1 ± x) indicates 100x percent more variation between the observed and the model-predicted response patterns than would be expected if the model and the data were compatible' (p.92).

Fit is useful for investigating how accurately the model can be used to predict performance. The relationship between ability and performance should be such that, as ability increases, the chance of success on each item also increases. When the relationship between ability (or difficulty) and performance breaks down, the fit statistic indicates the extent to which the relationship has been lost. If the loss of relationship is severe, then the same person-item interaction pattern is not operating for the item or person in question as for the other item and person interactions. If the loss of the relationship is repeated over many persons, then the item may not be acting as an indicator of either the item or the person location on the variable that is being measured by the test. The question, then, is whether or not the item should be excluded from the test. Usually misfitting items are omitted from the test for the purposes of calibration, but whether they should be omitted from the test permanently has not been resolved. Perhaps items should not be omitted from the test on this basis alone unless it is obvious that there is a problem with the item on substantive grounds. Serious misfit can almost always be understood and usually indicates an unanticipated problem, mostly with the quality of the item or its interaction with a specific context. Users of the Rasch model have sometimes been accused of letting the model determine and define domains for them, but this is seldom the case. It is often the case that the problem lies with the conceptualisation of the construct, but that is not the case in this study. 'Item misfit can be an indication that performances in the domain, as it was originally conceptualised, cannot be summarised in a single number' (Masters, 1998, p.1). Item misfit is sometimes an indication that performances in an area, although originally conceptualised as one domain, must be reported on more than one dimension. Messick (1993) has argued that all the important parts of a domain usually must be assessed, but this is not to say that performances in all parts of a domain must be summarised in a single number. This project and another (SACMEQ) have presented some unique data analysis and calibration problems. A comparison of the calibration results obtained using the computer programs RUMM2010 (Andrich, 2002) and Quest (Adams and Khoo, 1995) has shown that even misfitting persons may have to be removed from the analyses for the purposes of calibration and then included again for scoring.

Reliability Estimates

Traditional approaches to reliability estimation assume a classical measurement model. In this approach it is assumed that the raw score is composed of two components, a true score and an error component. Cronbach's approach is to calculate the ratio of the true score variance to the total variance, and this ratio is classically known as the reliability. It is usually called Cronbach's alpha and is calculated as

R_x = [k / (k − 1)] × [1 − Σ(i=1..k) s_i² / s_x²]

where k is the number of items, s_i² is the variance of the scores on item i, and s_x² is the variance of the total scores over all pupils. Both the Cronbach and the Rasch separation indices are estimates of the ratio of 'true' measure variance to the 'observed' measure variance.
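A minimal sketch of the classical calculation (illustrative only, using made-up item scores) is given below.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: N x k matrix of item scores (0/1 for these tests).
    Returns k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # s_i^2 for each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # s_x^2 of the raw total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 0/1 responses for 6 pupils on 4 items.
x = np.array([[1, 1, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0],
              [1, 1, 1, 0],
              [1, 0, 1, 1]], dtype=float)
print(round(cronbach_alpha(x), 2))
```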
A Rasch estimate of reliability allows two quantities to be investigated. The first is the pupil standard deviation with the measurement error removed, and the second is the average precision of the pupil measures. The ratio of the adjusted standard deviation (with the error removed) to the average precision (that is, the mean standard error of the measures) is called the separation index. In the Rasch case, therefore, separation indices can also be devised, and their interpretation has been discussed by Wright and Masters (1982), who showed that the item separation index can be used as an index of construct validity and the person separation index can be used as an index of criterion validity. In reliability form the Rasch estimate can be written as

R_x = 1 − (Σ(i=1..k) SE_i² / k) / S²

where SE_i is the standard error of each measure and S² is the observed variance of the measures.

Wright (2001) described reliability as a 'cryptic index because it amalgamates the distribution of the sample and the measurement characteristics of the test into one correlation reporting repeatability (not quality)'. He focused interest on two statistics. The first was the pupil standard deviation with measurement error removed (the sample error-adjusted standard deviation). The second was the mean precision of the pupil measures (the mean standard error). The ratio of the sample error-adjusted S.D. to the average measure S.E. is called the 'separation', and it provides an insight into the validity of the inferences based on the test scores.

Establishing validity using Rasch modelling

Wright and Masters (1982) showed that separating the items and identifying the skills underpinning each item could define the variable underpinning the test. Items that cluster together do not provide sufficient information about the variable to allow interpretation, but if a sequence of clusters can be identified, and each has a cohesive and interpretable meaning, the variable can be clearly identified. Once items have been calibrated along the variable, they can be interpreted in terms of the item writers' intentions. To achieve this a skills audit is undertaken. However, even the writers' intentions can sometimes be misleading, and a pilot study with pupils from the target population can identify the cognitive skills actually used by pupils obtaining the correct answer. Examination of the item score threshold locations provides information about the connections between an item and the underlying construct that the set of items was designed to measure. In addition to providing a 'map' of pupils' increasing understanding, examination of model fit can provide information about how justified it is to measure the underlying construct with the particular set of items chosen (Wilson, 1991). Good fit to the model suggests that the items are measuring the same one-dimensional construct, that is, that the assessment has construct validity.

Wright and Masters' (1982) process for defining the variable starts with determining the item variance and improving this estimate by adjusting for the calibration error in each item:

S_ai² = S_i² − S_ei²   (2)

where S_ai² is the overall item variance adjusted for error, S_i² is the observed item variance, and S_ei² is the mean of the item calibration (error) variances. This leads to the adjusted item standard deviation, S_ai. The average calibration error, S_ei, is given by the root mean square of the item calibration errors.
The ratio of the adjusted item standard deviation, S_ai, to the average calibration error, S_ei, provides a measure of the item standard deviation in units of error, and is termed the Item Separation Index, G_i:

G_i = S_ai / S_ei   (3)

This is a measure of how well the items are separated along the variable. The reliability of the item separation, R_i, for a particular sample can be defined as the ratio of the adjusted item variance, S_ai², to the observed variance, S_i², that is, the proportion of the variance that is not accounted for by estimation error:

R_i = S_ai² / S_i² = G_i² / (1 + G_i²)   (4)

Ideally this value, R_i, will be close to 1. It can be obtained from the Quest computer software (Adams & Khoo, 1995) used for the analysis in this study.

Once it has been established that the measured variable has direction and can define several statistically separate levels, the extent to which the variable meets the intentions of the assessment developer needs to be determined. If the items in the test or assessment adequately address the underlying variable intended by the assessment developer, then those items should have adequate fit to the model. The extent to which the set of items defines the variable, as described above, and fits the model is a measure of how well the items provide construct validity, that is, measure a single, interpretable, underlying trait.

A second, analogous set of questions relates to the extent to which the persons undertaking the assessment are distributed along the variable defined by the items. If the persons are clustered at the extremes of the variable there is a 'ceiling' or 'floor' effect: the difficulty levels of the items that define the variable do not match the ability measures of the persons being assessed. Where persons are clustered too closely together, any inferences drawn from the assessment about pupils' achievement may be compromised. Person statistics are derived in the same way as the item statistics and are termed the Test Reliability of Person Separation, R_p, and the person fit measures. These statistics are also provided by Quest (Adams & Khoo, 1995). The person measures provide additional information about the construct since they describe the extent to which the sample of test takers has responded in anticipated ways. If the responses of the pupils do not provide good fit, or the separation of persons along the variable is poor, it suggests that the assessment does not adequately represent the ability and understanding of the sample. It thus provides one approach to concurrent validity (Wright & Masters, 1982). A small computational sketch of these separation statistics is given below.
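The sketch below applies the definitions in equations (2) to (4) to a set of calibrated estimates and their standard errors; it is an illustration under those definitions, not the Quest output, and the numbers are hypothetical.

```python
import numpy as np

def separation_statistics(estimates: np.ndarray, standard_errors: np.ndarray):
    """Separation index G and separation reliability R for a set of
    calibrated item (or person) measures in logits."""
    observed_var = estimates.var(ddof=1)       # S_i^2, observed variance
    error_var = np.mean(standard_errors ** 2)  # S_ei^2, mean square calibration error
    adjusted_var = observed_var - error_var    # S_ai^2 = S_i^2 - S_ei^2
    g = np.sqrt(adjusted_var / error_var)      # G = S_ai / S_ei
    r = adjusted_var / observed_var            # R = S_ai^2 / S_i^2 = G^2 / (1 + G^2)
    return g, r

# Hypothetical item difficulties (logits) and their standard errors.
difficulties = np.array([-1.9, -1.2, -0.5, 0.1, 0.8, 1.6, 2.1])
errors = np.full(difficulties.shape, 0.01)     # very small errors, as in Table 2.2

g, r = separation_statistics(difficulties, errors)
print(f"separation G = {g:.1f}, reliability R = {r:.3f}")
```

With very small standard errors, as in the main study, the separation reliability approaches 1, which is why the large sample produced item separation reliabilities of 0.99.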
The consistency of pupils' responses is one approach to establishing concurrent validity. A second way of establishing this is to consider the performance of pupils on related measures. Rasch modelling also provides a method for undertaking this, through test equating. If a set of tests, or assessments, addresses the same substantive construct, then they may be linked to bring all test items and pupils onto a single measured scale, provided that the original tests provide reliable measures of the underlying construct. It is inappropriate to link tests that do not address the same substantive variable, or that are not statistically reliable. The initial step in the equating process is to establish the item difficulties of all items on each test. Provided there are common items across all test forms, it is possible to fix the difficulty levels of these common items and use this set as a basis for equating all the tests. This process is called anchoring, and this specific form, using item difficulties as a base, is known as common item equating. It is also possible to link through common person equating, where the same set of pupils undertakes several tests addressing the same construct. Once a set of tests has been anchored, the tests may be directly compared to establish differences and similarities in either the behaviour of the items or the behaviour of the test takers. If these behaviours are similar across a set of reliable tests addressing the same construct, concurrent validity is established.

Test Scores

The test data were analysed using both classical and Rasch model analyses. Classical analysis uses a test score as an estimate of ability. The test score in this instance was obtained by counting the number of correct items. Analysis according to the classical model assumes that this test score estimate of ability is a combination of a true score and error. A variance estimate was obtained for each of these, the true score and the error, and from these the reliability of the test data was obtained. The reliability is the ratio of the true score variance to the total variance of the test scores. In this model the aim of test construction is to optimise the correlation between the item score and the test total score (the discrimination index). This correlation is usually maximised when the percentage correct on each item is about 50 percent. Under the classical model, the test needs to be constructed according to a set of discrete outcomes or objectives representative of the curriculum in order to achieve this result.

This emphasis is relaxed with a Rasch model analysis of test data, but the design of the test makes more stringent demands on the developer. Using the Rasch approach, an underlying variable was hypothesised (as in the test blueprints outlined above) with a set of skill levels describing increasing difficulty of tasks or increasing levels of ability of pupils. Items were prepared to match each of the levels on that variable, and together the set of items was expected to provide indications across the range of difficulty associated with the skill and knowledge levels at Grade 5 in Vietnam's primary schools. In order to achieve this, a series of expert panels was established to nominate the levels of increasing difficulty and then to develop items for each of the levels. The variables in this study were 'Reading Comprehension' and 'Mathematics'. The curriculum strands (or sub-domains) described in the test blueprint section were regarded as curriculum emphases rather than as discrete variables. Sub-scores prepared for these sub-domains or strands were used more for curriculum purposes than to explain differences in pupil performance.

Pupil measures

It is evident that, in a test of 60 items covering six levels of ability and three sub-domains, it was not possible to have a large number of items in every cell of the test blueprint. Measures of the domains of reading and mathematics could still be derived from the test scores in each of the sub-domains, but their traditional (alpha) reliability would be expected to be low. Rasch separation reliabilities are also reported, and an explanation of these can be found in Wright and Masters (1982).
The analyses of the pupil performance therefore focused on two main measures: the reading and mathematics achievement measures. Two measures have been used in this report to describe pupil performance. The first is a transformed score in which the ability estimates were standardised to a mean of 500 and a standard deviation of 100. The second is a similar score, but derived when the pupil responses to the test items are combined with the teacher test item responses and mapped onto the same scale.

Reading and mathematics sub-scores

Despite the caution stated earlier, sub-scores were produced for narrative, expository and document literacy from the reading tests, and for number, measurement, and space and data from the mathematics tests. These sub-scores were also standardised to the metric of the overall test, which had a mean of 500 and a standard deviation of 100. This meant that each of the sub-scores could be compared with each other and with the overall test mean score.
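A minimal sketch of this kind of transformation is shown below; it assumes a simple linear standardisation of the logit estimates with hypothetical abilities, whereas the study's own scaling also involved the sampling weights described in Chapter 4.

```python
import numpy as np

def to_500_scale(logits: np.ndarray, target_mean: float = 500.0,
                 target_sd: float = 100.0) -> np.ndarray:
    """Linearly rescale logit ability estimates to a reporting scale
    with the chosen mean and standard deviation."""
    return target_mean + target_sd * (logits - logits.mean()) / logits.std(ddof=1)

# Hypothetical pupil ability estimates in logits.
abilities = np.array([-0.8, 0.1, 0.9, 1.3, 2.4])
print(np.round(to_500_scale(abilities), 0))
```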
Reading and mathematics levels

A primary purpose of the data analysis in the study was the identification of the skill levels (or levels of competence) in reading and mathematics displayed by the pupils. Each of these levels of competence was identified and described by a panel of specialists selected by the Ministry and representing curriculum, assessment and teaching specialists. The process for identifying and defining the competence levels has been described in the following sections of this chapter. The level reached by a pupil indicated the highest level of competence that the pupil had typically demonstrated. This information has consequences for curriculum planning at the school level as well as for the national curriculum. At the school level, for instance, teachers need to focus instruction on the next level in the list. At the national level, new resources (teaching and learning materials) and teacher training need to be produced in order that teachers may deal with the different levels appropriately.

The Reading Test: Calibration and Interpretation

The calibration estimates for the reading test items have been presented in Table 2.2. For each item, the summary statistics were as follows. The p-value represents the proportion of the sample with the correct answer; this is the mean item score. The item standard deviation is presented next, followed by the correlation between the item score and the test total score. The alpha reliability estimate of the test is then presented, together with an estimate of whether this value would be affected if the item were to be omitted from the test. The sample size and the test length combine to make the reliability estimate extremely stable, and omitting single items would have had no effect: the reliability remained at 0.92. These are the classical item analysis statistics. The next set of statistics arises from the item response analysis. These include the difficulty (logit) and its measurement error (SEM), and the infit and outfit estimates. The last set of data provides a descriptive analysis of each item, detailing the proportions of pupils selecting each alternative, the proportion making a multiple response and the proportion omitting the item. It can be seen that the test was relatively easy, in that the mean pupil ability measures were high. The measurement errors were very small, but this is appropriate given the very large sample size.

The INFIT values were all within the range of 0.7 to 1.3, and hence there was evidence of a dominant underlying dimension in the variable being measured. The mean item difficulty was arbitrarily set to zero. The variance of the item difficulty levels was 1.12, with a reliability of item separation of 0.99. The mean item INFIT was 0.99 with a variance of 0.01. There were no items with zero scores and no items with perfect scores. The mean pupil ability estimate was 0.97, indicating that the pupil ability level was slightly higher than the difficulty of the overall (easy) test. The variance of the pupil ability estimates was 1.54, which was slightly greater than the variance of the item difficulties. This indicates that the test was not well matched to the range of pupil abilities, particularly at the upper end of the ability range. The reliability of the pupil separation index was 0.91. The mean squared INFIT index for pupils was 1.00 with a variance of 0.03. This evidence indicates that a single dominant latent variable underpinned the set of items, and that the test successfully separated the pupils on the basis of ability (i.e. it possessed acceptable criterion validity) as well as demonstrating construct validity. On the latter point, however, there is no external evidence of the nature of the construct criterion.

On the basis of possible confusion in the English translation of the items, four items were eliminated from the reading test. These items appeared to have content problems in the English version and, as the report was to be published in English, it was decided to remove them from the test. Eliminating four items did not affect the pupil ability estimates, and had a negligible effect on the already high reliability and other parameters of the test.

Many of the characteristics of the test can be identified from the variable map, which has been presented for the pupil reading test as Figure 2.5. The chart has several sections. Working from the left of the figure, the first characteristic is a scale that ranges from -2.0 to +4.0. This is the logit scale and is the metric of the Rasch model analysis that enables pupil ability and item difficulty to be mapped onto the same scale. The distribution of pupil ability is presented next, and each 'X' represents approximately 60 pupils. It clearly shows that the range of ability is quite broad. Two cut points are represented on the variable map, one at approximately a scale value of 1.0 and the other at about -0.5. These are the 'independence' levels discussed later in this chapter (see page 30). A detailed explanation of the methods used to derive these cut points has been provided later (see pp. 31). The next component of the chart is the distribution of items, illustrating their relative difficulty. Item 54 is the most difficult (it has the highest logit value) and item 32 is the easiest item on the test (it has the lowest logit value). It is also clear that the range of item difficulty is not as broad as the range of pupil ability: many pupils were more able than the difficulty of the most difficult item. This indicates that the test was relatively easy. There was, however, a group of pupils at the lower end of the ability distribution well matched to the easier items. This outcome conformed with the results of the test trials.
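To illustrate how a variable map of this kind can be assembled, the sketch below gives a rough text rendering only (not the Quest chart) in which hypothetical abilities and difficulties are binned on the common logit scale, with pupils shown as X's and item numbers listed alongside.

```python
import numpy as np

def text_variable_map(abilities, difficulties, lo=-2.0, hi=4.0, step=0.5, pupils_per_x=60):
    """Print a crude variable map: pupils (as X's) on the left,
    item numbers on the right, both binned on the logit scale."""
    for top in np.arange(hi, lo - step, -step):
        n_pupils = np.sum((abilities >= top - step) & (abilities < top))
        items = " ".join(str(i + 1) for i, d in enumerate(difficulties)
                         if top - step <= d < top)
        xs = "X" * int(n_pupils // pupils_per_x)
        print(f"{top - step:5.1f} | {xs:<20} | {items}")

# Hypothetical data: 3000 pupil abilities and 10 item difficulties (logits).
rng = np.random.default_rng(1)
abilities = rng.normal(1.0, 1.2, size=3000)
difficulties = np.array([-1.9, -1.3, -0.8, -0.3, 0.1, 0.6, 1.0, 1.4, 1.9, 2.1])
text_variable_map(abilities, difficulties)
```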
The next section of the variable map in Figure 2.5 illustrates how the items on the test divided into the sub-domains of narrative, expository and document literacy. It can be seen that the distribution of item difficulties in each of the domains covered the range of the test, but none of the domains matched the range of pupil ability.

Table 2.2: Calibration estimates for the Reading test items
(A to D are the per-mille proportions of pupils choosing each alternative; MR = multiple response; Omit = item omitted.)
Item  p-value  SD  r-tot  a-omit  Logit  SEM  INFIT  OUTFIT  A  B  C  D  MR  Omit
READ01  0.63  0.48  0.40  0.92  0.14  0.01  1.02  1  245  626  031  083  010  004
READ02  0.84  0.36  0.28  0.92  -1.26  0.01  1.05  1.26  028  100  843  020  005  003
READ03  0.45  0.50  0.51  0.92  1.06  0.01  0.89  0.87  122  212  188  446  023  008
READ04  0.67  0.47  0.24  0.92  -0.1  0.01  1.2  1.23  232  041  670  038  008  010
READ05  0.81  0.39  0.35  0.92  -1.03  0.01  1.01  0.99  033  069  062  814  004  018
READ06  0.87  0.33  0.37  0.92  -1.54  0.01  0.95  0.88  073  872  021  023  006  005
READ07  0.79  0.41  0.33  0.92  -0.86  0.01  1.05  1.05  097  791  045  028  025  014
READ08  0.70  0.46  0.44  0.92  -0.26  0.01  0.97  0.94  109  114  699  052  014  012
READ09  0.58  0.49  0.48  0.92  0.37  0.01  0.93  0.89  075  251  581  067  001  024
READ10  0.68  0.47  0.36  0.92  -0.17  0.01  1.06  1.08  124  682  124  048  007  014
READ11  0.80  0.40  0.38  0.92  -0.93  0.01  0.98  0.95  801  066  095  027  006  005
READ12  0.74  0.44  0.41  0.92  -0.53  0.01  0.99  0.94  078  742  093  062  018  007
READ13  0.63  0.48  0.46  0.92  0.1  0.01  0.95  0.9  151  094  634  090  023  008
READ14  0.60  0.49  0.47  0.92  0.29  0.01  0.95  0.91  091  141  133  597  003  035
READ15  0.79  0.41  0.37  0.92  -0.83  0.01  1.00  1.00  061  096  788  045  005  005
READ16  0.89  0.31  0.46  0.92  -1.78  0.01  0.84  0.58  039  020  894  040  002  005
READ17  0.67  0.47  0.34  0.92  -0.11  0.01  1.08  1.11  672  129  074  110  008  007
READ18  0.82  0.38  0.41  0.92  -1.08  0.01  0.94  0.85  064  044  821  061  003  007
READ19  0.82  0.38  0.39  0.92  -1.09  0.01  0.96  0.93  823  081  034  049  003  011
READ20  0.84  0.37  0.41  0.92  -1.2  0.01  0.94  0.83  003  835  032  097  002  004
READ21  0.35  0.48  0.45  0.92  1.58  0.01  0.92  0.91  237  187  214  346  008  009
READ22  0.57  0.50  0.33  0.92  0.45  0.01  1.11  1.14  185  107  567  126  004  012
READ23  0.74  0.44  0.35  0.92  -0.5  0.01  1.05  1.06  164  048  034  738  008  007
READ24  0.85  0.35  0.40  0.92  -1.35  0.01  0.94  0.81  007  853  024  043  003  007
READ25  0.73  0.44  0.29  0.92  -0.47  0.01  1.12  1.21  078  147  732  032  002  008
READ26  0.29  0.46  0.45  0.92  1.88  0.01  0.91  0.88  103  154  431  294  009  010
READ27  0.72  0.45  0.54  0.92  -0.39  0.01  0.85  0.78  136  720  098  029  004  012
READ28  0.83  0.37  0.46  0.92  -1.18  0.01  0.88  0.74  833  058  029  071  004  004
READ29  0.75  0.43  0.44  0.92  -0.61  0.01  0.94  0.89  079  100  058  754  002  007
READ30  0.50  0.50  0.50  0.92  0.8  0.01  0.9  0.89  138  120  213  497  019  014
READ31  0.77  0.42  0.43  0.92  -0.71  0.01  0.94  0.95  052  080  770  089  003  006
READ32  0.90  0.30  0.43  0.92  -1.88  0.01  0.86  0.6  903  038  021  031  002  004
READ33  0.43  0.49  0.28  0.92  1.15  0.01  1.16  1.24  427  006  259  215  029  009
READ34  0.42  0.49  0.29  0.92  1.18  0.01  1.13  1.24  141  233  422  182  007  015
READ35  0.90  0.30  0.34  0.92  -1.89  0.01  0.94  0.87  016  018  052  903  001  009
READ36  0.87  0.33  0.39  0.92  -1.54  0.01  0.94  0.8  020  063  031  872  001  012
READ37  0.84  0.37  0.48  0.92  -1.25  0.01  0.85  0.69  024  841  063  060  001  009
READ38*  0.12  0.32  -0.21  0.92  3.25  0.01  1.47  4.16  116  192  104  561  014  013
READ39*  0.59  0.49  0.38  0.92  0.32  0.01  1.05  1.05  151  591  100  088  011  059
READ40*  0.22  0.42  0.02  0.92  2.35  0.01  1.36  2.14  103  221  128  509  003  035
READ41*  0.24  0.42  0.28  0.92  2.24  0.01  1.08  1.23  110  545  071  236  005  033
READ42  0.64  0.48  0.55  0.92  0.07  0.01  0.86  0.79  151  077  101  639  008  024
READ43  0.55  0.50  0.48  0.92  0.53  0.01  0.93  0.91  145  120  551  154  008  021
READ44  0.39  0.49  0.33  0.92  1.37  0.01  1.09  1.17  216  187  387  176  009  025
READ45  0.57  0.50  0.48  0.92  0.43  0.01  0.94  0.91  303  063  571  044  004  016
READ46  0.85  0.36  0.43  0.92  -1.34  0.01  0.9  0.78  051  852  036  041  002  017
READ47  0.66  0.47  0.51  0.92  -0.04  0.01  0.9  0.84  065  660  145  105  003  022
READ48  0.37  0.48  0.38  0.92  1.47  0.01  1.01  1.07  163  156  274  367  016  023
READ49  0.67  0.47  0.40  0.92  -0.1  0.01  1.02  1.06  057  670  098  142  006  025
READ50  0.61  0.49  0.49  0.92  0.21  0.01  0.93  0.89  092  103  614  158  005  029
READ51  0.57  0.49  0.44  0.92  0.41  0.01  0.98  0.97  060  128  194  574  003  042
READ52  0.87  0.34  0.49  0.92  -1.5  0.01  0.83  0.58  869  030  026  045  001  029
READ53  0.62  0.49  0.40  0.92  0.17  0.01  1.03  1.06  024  236  620  083  003  034
READ54  0.26  0.44  0.39  0.92  2.09  0.01  0.94  1.01  274  193  223  259  005  045
READ55  0.49  0.50  0.37  0.92  0.86  0.01  1.07  1.09  248  485  076  142  002  047
READ56  0.80  0.40  0.42  0.92  -0.9  0.01  0.95  0.9  797  091  044  027  004  037
READ57  0.50  0.50  0.41  0.92  0.77  0.01  1.02  1.02  216  150  088  503  003  040
READ58  0.29  0.45  0.40  0.92  1.9  0.01  0.96  0.97  338  223  094  291  006  048
READ59  0.62  0.48  0.35  0.92  0.15  0.01  1.08  1.1  080  624  019  052  005  050
READ60  0.50  0.50  0.38  0.92  0.8  0.01  1.04  1.07  498  114  078  253  005  052
All items: mean p-value 0.64; alpha 0.92.
* Items eliminated from the test.

Figure 2.5: Variable Map of the Pupil Reading Test (N = 72666, L = 56 (four items omitted)). The figure plots the pupil ability distribution (each 'X' representing approximately 60 pupils) and the item difficulties on the common logit scale from -2.0 to +4.0, with separate columns for all reading items and for the Narrative, Document and Expository sub-domains.

Treating perfect and zero scores

There are several approaches to the treatment of perfect and zero scores when calibrating test data using the Rasch model. Strategies are needed for this set of circumstances because calibration using item response modelling ignores the data linked to perfect or zero scores. This is because the calibration process involving the Rasch model takes the logarithm of the ratio of successes to failures, and a score with no successes or with no failures cannot therefore be calibrated. The procedures used in most software programs tend to eliminate these cases from the data file. However, this has a serious effect on the calculation of means and standard errors, because the mean logit score can be affected and because the number of cases is also reduced, thus impacting on the computation of errors. Strategies that have been tried include the following.
1. Elimination. This procedure treats the perfect and zero scores as missing data. This is clearly an incorrect procedure: ability estimates are generally inflated and the number of cases is reduced, thus affecting sampling error calculations.
2. Rescoring. In this procedure the score patterns of the extreme score cases are altered. For those cases with perfect scores the hardest item on the test is rescored as incorrect, thus lowering the raw scale tally.
For those cases with zero scores the easiest item on the test is rescored as correct. This strategy makes the assumption that the extreme items are closest to the ability levels of the relevant pupils: the difficulty of the easiest item is closest to the ability of the zero scorers, and the difficulty of the hardest item is assumed to be closest to the ability of the most able pupils, the perfect scorers. Rescoring is therefore assumed to have a minimal effect on the ability estimates of the extreme scorers and a minimal effect on the overall ability estimates.
3. Extrapolation. In this procedure the test characteristic curve is used to relate the raw scores to the logits. Using this approach, the asymptotic behaviour of the test characteristic curve indicates the likely logit score associated with a zero score or a perfect total on the test. However, it assumes that all those with a zero score have an ability that is below the lower bound of the test's measuring capacity, and that all cases with zero scores have low and equal estimates of ability below the lower bound limit of the test. The ability of cases with perfect scores is assumed to be beyond the upper limit of the test, and all cases with perfect scores are assumed to have high and equal estimates of ability beyond the upper bound limit of the test. The first part of this assumption is likely to be true, but the assumed equality of ability within the high and low scoring groups is more difficult to defend. It is, however, argued to be a reasonable set of assumptions given the absence of further information from the test. It is also only appropriate when a missing response is treated as incorrect, as perfect scores for cases with missing data would inflate the ability estimates.

The procedure adopted in this study involved extrapolation based on the test characteristic curve. The extrapolated scores were set at -5.5 for the lower bound estimates associated with zero scores and at +6.0 for the upper bound estimates associated with perfect scores. The characteristic curves for the pupil and teacher tests are shown in Figures 2.6 and 2.7, and the score-to-logit conversion tables are shown in Table 2.3.
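A minimal sketch of this treatment is given below; it uses the -5.5 and +6.0 extrapolated values quoted above, but the conversion table in the example is a hypothetical stand-in for the study's own score-to-logit tables (Table 2.3).

```python
def raw_score_to_logit(raw: int, max_score: int, conversion: dict,
                       zero_logit: float = -5.5, perfect_logit: float = 6.0) -> float:
    """Convert a raw test score to a logit ability estimate.
    Zero and perfect scores cannot be calibrated directly, so they are
    assigned extrapolated lower- and upper-bound values."""
    if raw <= 0:
        return zero_logit
    if raw >= max_score:
        return perfect_logit
    return conversion[raw]

# Hypothetical conversion table for a 5-item test (illustration only).
toy_table = {1: -1.6, 2: -0.5, 3: 0.5, 4: 1.6}

for score in range(6):
    print(score, raw_score_to_logit(score, 5, toy_table))
```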
Figure 2.6: Teacher test calibration Figure 2.7: Pupil test calibration 487 Vietnam Reading and Mathematics Assessment Study Table 2.3: Score To Logit Conversion for tests Pupil Teacher Raw score Read Logit Math Logit Read Logit Math Logit 0 1 -4.55 -4.85 2 -3.83 -4.11 3 -3.39 -3.65 -2.93 4 -3.07 -3.32 -2.57 5 -2.81 -3.04 6 -2.60 -2.81 7 -2.41 -2.61 -1.74 8 -2.24 -2.43 -1.52 9 -2.09 -2.26 -1.43 -1.32 10 -1.95 -2.11 -1.25 -1.13 11 -1.82 -1.96 -1.09 -0.95 12 -1.69 -1.82 -0.93 -0.78 13 -1.58 -1.70 -0.78 -0.62 14 -1.47 -1.57 -0.64 -0.46 15 -1.36 -1.45 -0.5 -0.31 16 -1.26 -1.34 -0.37 -0.17 17 -1.16 -1.23 -0.23 -0.03 18 -1.06 -1.12 -0.1 0.11 19 -0.96 -1.02 0.03 0.25 20 -0.87 -0.92 0.15 0.38 21 -0.78 -0.82 0.28 0.5 22 -0.69 -0.72 0.4 0.63 23 -0.61 -0.62 0.52 0.76 24 -0.52 -0.53 0.64 0.88 25 -0.44 -0.43 0.77 1.01 26 -0.35 -0.34 0.89 1.13 27 -0.27 -0.25 1.02 1.26 28 -0.18 -0.16 1.14 1.38 29 -0.10 -0.07 1.27 1.51 30 -0.02 0.02 1.4 1.64 31 0.07 0.11 1.54 1.77 32 0.15 0.20 1.68 1.91 33 0.23 0.29 1.82 2.05 34 0.32 0.38 1.97 2.2 35 0.41 0.47 2.13 2.36 36 0.49 0.57 2.3 2.53 37 0.58 0.66 2.48 2.71 38 0.67 0.75 2.67 2.91 39 0.76 0.85 2.89 3.13 40 0.85 0.94 3.13 3.38 41 0.94 1.04 3.42 3.69 42 1.04 1.14 3.77 4.07 43 1.14 1.25 4.24 4.59 44 1.24 1.35 5 5.43 45 1.35 1.46 46 1.46 1.58 47 1.57 1.70 48 1.69 1.82 49 1.82 1.95 50 1.96 2.09 51 2.10 2.24 52 2.26 2.40 53 2.43 2.57 54 2.63 2.77 55 2.85 2.99 56 3.12 3.26 57 3.44 3.59 58 3.89 4.04 59 4.63 4.77 60 488 Vietnam Reading and Mathematics Assessment Study Interpreting the tests: Competence levels In addition to ability measures and the transformed score (the 500 score) reported in Volume 2, other measures related to curriculum and educational outcomes were derived from the data. The first of these has been referred to as the competence levels and related directly to the definition of criterion-referenced interpretation of tests. Glaser (1963) first defined criterion-referenced performance and development in terms of the tasks performed. However, this definition lost the idea of multiple tasks that form a cohesive and developmental continuum, and the misinterpretation of the concept in the 1970s led to the distortion of the concept. Glaser later clarified criterion referencing as 'the development of procedures whereby assessments of proficiency could be referred to stages along progressions of increasing competence.'(1981, p.935). The words "stages along progressions of increasing competence" are of immense importance in test design and calibration. However, criterion referencing is regarded now as a means of interpretation rather than as a means of test design. Criterion referenced interpretation is the correct term rather than criterion referenced testing. It is also an excellent framework within which to use item response modelling. Combining the ideas of criterion-referenced interpretation with an item response modelling directly links the position of a person or an item on a variable (as shown in the variable map) to an interpretation of what a pupil, or groups of pupils, can do, rather than focussing on a score or the performance relative to a percentage or a group. It also orients the use of the test data towards substantive interpretation of the measurement rather than reporting a score or grade. The procedure gives meaning to test scores. It is this application that it used here and the substantive interpretation of the levels of increasing competence that is addressed now. 
The hypothesised underlying constructs presented in Figures 2.1 and 2.3 were examined using the variable map generated by the Rasch model analysis. It can be seen from each of the following variable maps that several items grouped together at different points along the uni-dimensional scale, and the major question was whether these clusters could be interpreted as having something in common. Each item was reviewed for the skills involved in responding to it, and this is a matter of substantive interpretation. The process requires an understanding of, or empathy with, 'how the pupils think' when they are responding to the items. Experienced teachers are very good at this, and those dealing with Vietnamese language instruction, who are accustomed to working with the marking scheme, readily identified the levels within the test.
The variable map shows that items can be grouped according to similar difficulty levels. Given that the ability of the pupils is matched to the difficulty of the items, and that items and pupils are mapped onto the same scale, the pupils can also be grouped within the same 'ability'/'difficulty' range as the items that have similar difficulty levels. This grouping of items (and pupils) identifies a kind of 'transition point', where an increase in item difficulty is associated with a change in the kind of cognitive skill required to achieve a correct answer. Recall the relationship described above: when ability and difficulty are equal the odds of success are even (50/50). From this it can be deduced that, if the pupil were to improve a little, he or she would have a better than even chance of succeeding on items in this group. It could be argued that the main task of a teacher is to increase the odds of success at each of these competency levels to better than 50/50. If this improvement is close to the transition point, then the pupils are beginning to exhibit ability associated with a change in cognitive skill.
The skill level demonstrated by the pupil was defined by the set of cognitive skills demanded by the group of items. Curriculum and teaching specialist panels appointed by the Ministry of Education undertook the content analysis of the skills and competencies required to succeed on the identified set of items. This led to an understanding of the kinds of skills demonstrated by pupils at each level of the continuum underpinning the pattern of item difficulty estimates on the underlying variable (reading or mathematics). Moreover, the 50/50 odds at the transition points could be linked to a change in the required cognitive skill, and this could be translated directly into an implication for teaching. If the skill changed, then this had an implication for a change in teaching, and discussions with curriculum specialists were held to identify the kind of instruction needed to progress the pupil along the variable. A summary description of these skills can then be assigned to each item and pupil group.
The first point (item grouping) is justified on statistical and conceptual grounds if the items have behaved in a cohesive manner that enables an interpretation of a variable underpinning the test. This is sometimes described as a Rasch-like manner because it is also a requirement of the Rasch model analysis. The second point (labelling the skills) is based on conceptual rather than on statistical grounds.
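The relationship between ability, difficulty and the odds of success can be written down directly. The short sketch below assumes the dichotomous Rasch model, in which the odds of success equal exp(ability - difficulty); it simply illustrates the 50/50 transition described above and is not part of the study's analysis.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model probability that a pupil of the given ability answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def odds_of_success(ability, difficulty):
    """Odds of success, p / (1 - p); under the Rasch model this equals exp(ability - difficulty)."""
    p = p_correct(ability, difficulty)
    return p / (1.0 - p)

# When ability equals difficulty the odds are even (50/50);
# a small improvement in ability tips the odds above 1.
print(p_correct(0.5, 0.5), odds_of_success(0.5, 0.5))   # 0.5, 1.0
print(p_correct(0.8, 0.5), odds_of_success(0.8, 0.5))   # about 0.57, about 1.35
```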
If the items within a group do not suggest a meaningful and unifying set of skills or competencies, the set may need to be 'adjusted' to make the interpretation clearer. That is, some items may need to be omitted because, despite statistically appropriate qualities, they may not be conceptually relevant to the underlying construct or to identifiable and comprehensible levels within the construct. This is a far more powerful reason for omitting items from a test than a misfit analysis. Under these circumstances, they might not belong in the test at all. These procedures can, at times, also identify gaps in the item set.
There is a further advantage to this procedure. If the content analysis 'back translates' to match or closely approximate the original hypothesised construct used to design and construct the test, it can also be used as evidence of construct validity. When this is linked to the index of item separation there are two pieces of evidence for the construct validity of the test (see Wright and Masters, 1982, p. 94). The technique of 'levels' has been used sparingly but has emerged in several international studies. Greaney and others used the procedure in their report on the 'Education For All' project (Greaney, Khandker and Alam, 1999), in which they cited Griffin and Forwood's (1990) application of this strategy in adult literacy.
To assist in this procedure the logit values of the item difficulties were ordered in increasing order of difficulty. Each item was also analysed for the underpinning cognitive skill involved in obtaining the correct answer. The results of these analyses have been presented in Figure 2.8 and Table 2.4. The difficulties of the test items were also plotted in increasing order of difficulty and the sets of items were examined to identify specific clusters or groupings. The two criteria described above were used. First, there have to be identifiable sets of items, and these sets need to have a common substantive interpretation of the underpinning skill. Grouping items on the variable map is a first step, but it is imprecise because of the constraints of printers and line feeds, which may place items of different difficulty on the same physical line merely because of a hardware restriction. Nevertheless it is a good first step, as an inspection of the variable map can often identify broad categories and clusters.
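As a mechanical complement to this inspection, the ordering of item difficulties and the search for natural breaks can be sketched as below. The logit values in the example are hypothetical, and in the study the cut points were settled by the specialist panels' substantive review of the item skills rather than by gap size alone.

```python
def candidate_breaks(difficulties, n_levels=6):
    """Order item difficulties and propose the widest gaps as candidate level boundaries.

    A mechanical first pass only: the final cut points in the study were set by
    substantive review of the skills underlying the items, not by gap size alone.
    """
    ordered = sorted(difficulties)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    # take the (n_levels - 1) widest gaps as tentative boundaries
    cut_idx = sorted(i for _, i in sorted(gaps, reverse=True)[: n_levels - 1])
    cuts = [(ordered[i] + ordered[i + 1]) / 2.0 for i in cut_idx]
    levels, start = [], 0
    for i in cut_idx + [len(ordered) - 1]:
        levels.append(ordered[start : i + 1])   # items falling between successive cut points
        start = i + 1
    return cuts, levels

# Hypothetical logit values in the spirit of Table 2.4 (not the actual estimates).
demo = [0.0, 0.1, 0.35, 0.8, 0.9, 1.4, 1.5, 2.1, 2.2, 2.9, 3.1, 3.8, 4.0]
cuts, levels = candidate_breaks(demo, n_levels=4)
print(cuts)
print(levels)
```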
The chart in Figure 2.5 illustrated where the 491 Vietnam Reading and Mathematics Assessment Study Table 2.4: Skills audit for each of the 60 Pupil reading test items Item # logit Cognitive skill underpinning the correct response 1 2.04 Match exact words and paraphrase from Chinese origin 2 0.63 Locating information from text 3 2.97 Combining several ideas (format-all of the above) 4 1.8 Understanding implications 5 0.87 Understanding author's main purpose 6 0.35 Locating information from text 7 1.04 Locating information from text 8 1.64 Integrating reading and math skills 9 2.28 Integrating reading and math skills 10 1.73 Understanding meaning of vocabulary 11 0.96 Locating information from text 12 1.37 Locating information from text (format - negative questions, using background knowledge) 13 2 Locating information from text (format - negative questions) 14 2.2 Locating information from text (understanding signal words-"prediction" in the stem) 15 1.06 Understanding meaning of words 16 0.11 Exact match of text with adjacent text 17 1.79 Understanding meaning of sentences 18 0.82 Locating information from text 19 0.8 Understanding relationship between events in text 20 0.7 Locating information from text 21 3.51 Combining several ideas and using outside knowledge 22 2.35 Locating specific information from text 23 1.4 Locating information from text 24 0.54 Understanding meaning of word 25 1.43 Locating information from text 26 3.81 Combining several ideas (format-all of the above) 27 1.51 Understanding meaning of sentences 28 0.71 Locating information from text 29 1.29 Locating information from text 30 2.71 Inferring meaning from context (format-negative question) 31 1.19 Locating information from text 32 0.01 Match exact words and paraphrase 33 3.07 Deducing meaning from context 34 3.1 Requiring interpretation beyond text level, unfamiliar topic 35 0 Matching word and visual stimulus 36 0.35 Matching word and visual stimulus 37 0.65 Locating information from text 38 2.39 Locating specific information from text (too many details in long options ) 39 2.23 Locating information from text & illustration 40 2.65 Locating specific information from text 41 4.18 Link a concept to a visual stimulus and bring outside knowledge to the solution 42 1.97 Locating information from text 43 2.44 Locating specific information from text 44 3.29 Understanding author's main purpose on the basis of the title 45 2.33 Locating specific information from text 46 0.55 Understanding relationship between events in text 47 1.86 Understanding implications 48 3.39 Understanding figurative meaning of word (format-negative question) 49 1.8 Locating information from text 50 2.11 Inferring meaning from context (option d attracts some above average pupils) 51 2.32 Locating specific information from text 52 0.39 Locating information from text 53 2.08 Locating information from context 54 4.03 Combining several ideas and using outside knowledge (format-all of the above) 55 2.77 Understanding main idea, choosing a title 56 1 Locating information from text 57 2.68 Locating specific information from text 58 3.82 Combining several ideas, requiring interpretation beyond text level 59 2.05 Understanding author's main purpose 60 2.71 Understanding figurative meaning 492 Vietnam Reading and Mathematics Assessment Study difficulty of items changed. The question then arose that if the difficulty increased for sets of items, did the nature of the underpinning skill also alter? The two sets of information were then explored in unison. 
Natural breaks in difficulty were identified and then the items and the cognitive descriptions were examined to determine whether a set with a common substantive interpretation could be found. A panel of specialists from the Vietnamese Ministry of Education and Training and the National Institute for Educational Science joined the item writers for this exercise. Together they identified the breaks in the variable and then offered the substantive interpretation of the levels of competence. These have been presented in Figure 2.9.

Figure 2.8: Relative difficulties of reading items and cut points for competence levels

Reading Skill Levels
Level 1: Matches text at word or sentence level aided by pictures. Restricted to a limited range of vocabulary linked to pictures.
Level 2: Locates text expressed in short repetitive sentences and can deal with text unaided by pictures. Type of text is limited to short sentences and phrases with repetitive patterns.
Level 3: Reads and understands longer passages. Can search backwards or forwards through text for information. Understands paraphrasing. Expanding vocabulary enables understanding of sentences with some complex structure.
Level 4: Links information from different parts of the text. Selects and connects text to derive and infer different possible meanings.
Level 5: Links inferences and identifies an author's intention from information stated in different ways, in different text types and in documents where the message is not explicit.
Level 6: Combines text with outside knowledge to infer various meanings, including hidden meanings. Identifies an author's purposes, attitudes, values, beliefs, motives, unstated assumptions and arguments.

Figure 2.9: Interpretation of the reading levels from the analysis of reading test item sets

These identified reading skill levels should bear a close resemblance to the proposed levels in the test blueprint. A comparison of the proposed and obtained levels has been presented in Figure 2.10.

Level 1
Proposed: Pupils at this level should be assessed in terms of linking words and pictures where the pictures depict common objects of a 'concrete' nature.
Derived: Matches text at word or sentence level aided by pictures. Restricted to a limited range of vocabulary linked to pictures.
Level 2
Proposed: Pupils at this level were expected to be able to demonstrate an ability to link words to more abstract concepts such as prepositions of place and direction and, perhaps, ideas and concepts such as comparatives and superlatives (happiest, biggest, larger, etc.).
Derived: Locates text expressed in short repetitive sentences and can deal with text unaided by pictures. Type of text is limited to short sentences and phrases with repetitive patterns.
Level 3
Proposed: Pupils at this level were expected to be able to demonstrate an ability to link words from one setting to words in another setting (such as a short, sentence-length text), where there was a word match.
Derived: Reads and understands longer passages. Can search backwards or forwards through text for information. Understands paraphrasing. Expanding vocabulary enables understanding of sentences with some complex structure.
Level 4
Proposed: Pupils at this level were expected to be able to demonstrate an ability to deal with longer text passages, containing a sequence of ideas and content, where understanding is based on an accumulation of information by reading forward through the text.
Derived: Links information from different parts of the text. Selects and connects text to derive and infer different possible meanings.
Level 5
Proposed: Pupils at this level were expected to be able to demonstrate that they had reached a level where they can search backwards or forwards through a text seeking confirmation of understanding or linking a piece of information to ideas or information previously encountered.
Derived: Links inferences and identifies an author's intention from information stated in different ways, in different text types and in documents where the message is not explicit.
Level 6
Proposed: Pupils at this level were expected to be able to demonstrate that they could link ideas from separate parts of a text and demonstrate an ability to infer an author's intention.
Derived: Combines text with outside knowledge to infer various meanings, including hidden meanings. Identifies an author's purposes, attitudes, values, beliefs, motives, unstated assumptions and arguments.

Figure 2.10: A comparison of the proposed and derived skill levels

While the levels in the proposed model and the observed test performance were not identical, there was a sufficiently good match to indicate that the test design was successful and that the item writers were able to prepare items to match a test blueprint. Given the high item separation index and the match between the proposed and derived skill levels, there was strong evidence of construct validity in the reading test.

Interpreting the tests: Benchmarks
Two benchmark levels were also established. They were based on the pupil's ability to cope with reading and mathematics tasks encountered in specific circumstances. The first benchmark was based on a pupil's ability to use the reading and mathematics skills deemed necessary to function in Vietnamese society. Those below this benchmark were described as 'pre-functional'. A second benchmark was based on an estimation of a pupil's ability to cope with the reading and mathematics tasks in the next grade of education, Grade 6, which is the first grade of secondary education.
The two benchmarks helped to identify three groups of pupils. Those below the first benchmark would need considerable help to enable them to function and participate fully in Vietnamese society. Those above this benchmark but below the second would need assistance to help them cope with the reading and mathematics involved in secondary education. Pupils above the second benchmark were expected to be able to cope with the reading and mathematics involved in secondary education.
Benchmark 1: Pupils who had not yet reached this benchmark, of demonstrating the reading or mathematics required for everyday activities in Vietnamese society, were described as pre-functional. The label used in the tables is 'Pre-functional'. It does not mean that a pupil is illiterate or non-numerate. There are basic skills that these pupils can demonstrate, but the skill level is not yet deemed by experts to be sufficient to enable the person to be an effective member of Vietnamese society. Pupils who could demonstrate the kinds of skills needed to cope with life in Vietnam were those above this lower benchmark. These pupils were designated as functional in terms of their capacity to participate independently in Vietnamese society. The label used in the tables was 'Functional'.
However, it was deemed that they would need some remedial assistance to be able to cope with the reading and mathematics required at the Grade 6 level.
Benchmark 2: Pupils whose performances were above the second benchmark were described as demonstrating the kinds of skills desirable for learning independently at the next level of schooling, without needing remedial assistance. The label used in the tables was 'Independent'.

Establishing Standards and Cut Scores
In testing, it is common to interpret the scores for a range of purposes. These purposes include pupil placement, selection, diagnosis and monitoring growth. Placement and selection based on test scores have been common for decades, and a range of approaches have been used to identify the test score most appropriate for each of these decisions. In an educational context, standards are important. When standards are linked to assessments or measurements a cut-score or decision point is required, but the process of arriving at this decision point has been problematic. Before approaching this problem, it is important to make a distinction between cut-scores and performance standards because, in educational practice, setting standards usually involves establishing cut-scores for a test. A cut-score is defined as a point on the score scale, and a standard is defined as a level of performance or competence. Judgement-based methods of cut-score definition, such as Angoff's summed probabilities, determine the cut-score independently of performance data. It is decided on the basis of a review and scrutiny of the items themselves, leading to the judgement that the lowest acceptable limit, or cut-score, should be set at some agreed-upon value.

The Angoff technique
In 1971 Angoff presented a technique that involved asking expert judges to state the probability that the 'minimally acceptable person' would answer each item correctly. In effect, the judges would think of a number of minimally acceptable persons, instead of only one such person, and would estimate the proportion of minimally acceptable persons who would answer each item correctly. The sum of these probabilities, or proportions, would then represent the minimally acceptable score (Angoff, 1971, p. 515). In doing so, the cut-off score is made independent of performance data. The probability assigned to an item for the 'minimally acceptable person' is called a Minimum Pass Level (MPL). With a number of judges independently making these judgements it would be possible to decide by consensus on the nature of the scaled score conversion without actually administering the test. The MPLs are averaged over judges to get the item MPL, and the item MPLs are summed over the items in the test to get a passing score.
Two decisions were put to the specialist panels in Vietnam: how much mathematics and reading comprehension is needed for each of two purposes. The first was the amount needed for a person to perform everyday activities as an independent citizen of Vietnam. The second related to how much reading and mathematics was needed for a pupil to learn independently in lower secondary school. For the first decision the panel was asked to identify an item that represented the threshold of minimum numeracy and reading ability for a person to perform as an effective citizen. These thresholds were defined as the level of numeracy and reading comprehension needed for an individual to cope in the community with everyday numeracy and reading tasks.
The probability of success on this item was set at 0.5. Each item on the test was then compared with the level of difficulty of this item using a 10-point scale. Values on the scale were treated as probabilities, and the sum of the probabilities was used as an estimate of the score corresponding to the basic (functional) level. In mathematics, item 10 was chosen as the representative item for the minimally competent pupil, while in reading the representative item was number 16.
To facilitate the process, the judges were first asked to select two items. The first was to be representative of the difficulty associated with a successful transition from Grade 5 to Grade 6. This item was judged to have a probability of success of 0.5 for pupils at that level of ability. All other items were then rated on a 10-point scale relative to this item, which was assigned a rating of 5. Easier items were assigned a score of less than 5 and harder items a score of more than 5. Each score represented the probability of success on the item for the minimally competent pupil: a score of 1 represented a difficult item with a low probability of success, and a score of 10 represented an easy item with a relatively high chance of success. These ratings were also treated as the probability of success for a pupil at the ability level required for a successful transition from Grade 5 to Grade 6. In mathematics item 17 was chosen as the representative item; in reading the relevant item was item 58.

17. What percentage of the following shape does the coloured part occupy?
A. 2%
B. 4%
Figure 2.11: Sample item representing the transition skill level

This modification of the Angoff procedure was used to estimate the cut scores for the relevant reading and mathematics skill levels and then for the skill level needed to 'progress' from Grade 5 to Grade 6 with an optimal chance of success in Grade 6.

10. The following addition operation: 5429 + 1395 equals:
A. 6714
B. 6814
C. 6724
D. 6824
Figure 2.12: Sample item representative of the functional skill level

The panels defined the minimum (functional) skill required for reading as the ability to identify information in a simple text using set procedures learned by rote. For progress from Grade 5 to Grade 6 with the skills developed in the primary grades, the pupil could be expected to read a passage, differentiate between facts and opinions, and identify the main idea. The pupil should also show evidence of using a range of reading techniques. Examples of items that enabled pupils to demonstrate these skills are shown below. The procedure was used to establish the relevant cut scores for reading, which were identified as 22.2 and 38.7 respectively. These were high given that the test was reduced to a total of 57 items after deleting three items. The sample 'essential' and 'progress' items are illustrated below.

Essential: Ability to identify information in a simple text.

THE ARROGANT PEACOCK
Once, there was a very beautiful and very arrogant peacock. All day, he stretched his neck, spread his wings, danced with his wide-open tail and considered himself to be the most beautiful. One day, the Peacock arrived at the lakeside, full of himself, serious and dignified. All of a sudden, he saw in the lake another bird looking as like him as two peas in a pod. He stopped and showed off his big colourful fan tail.
Immediately, the bird in the lake also stopped and showed off a similar tail. Growing angry, the Peacock glowered and raised his crest. The bird in the lake also glowered and raised his crest. Very angrily, the Peacock plunged into the lake intending to catch the other bird. However, there was no bird in the water. After some struggling, when he was about to sink, the Peacock fortunately caught a tree root and was able to get out of the water. Looking back into the lake and seeing the bird in the lake soaking wet, shivering with chattering teeth, he laughed with satisfaction. A racket-tailed bird who witnessed the scene broke into laughter and told the Peacock: "Uncle Peacock, haven't you realised that the bird in the lake is the very image of yourself?"

6. Where did the Peacock see the bird looking like him?
A. in the forest
B. in the middle of the field
C. in the lake
D. on a tree branch
Figure 2.13: Sample item representing the basic skill level

Independent Learner: Ability to read a passage, differentiate between facts and opinions and identify the main idea; use a range of reading techniques.

THE SPARROW
I was returning from a hunt and was walking along the path into the garden. My dog was running in front of me. Suddenly it stopped and started crawling. It seemed to scent something. I looked along the path and saw a shiny golden young sparrow with a pinch of fluffy feathers on its head. It had fallen down from the nest above. The hunting dog slowly approached. All of a sudden, from high on a nearby branch, an old sparrow with a shining black chest darted down like a falling rock and dropped right in front of the dog's muzzle. Its feathers were all fluffed out. It cried despairingly and heart-rendingly. It hopped two or three steps closer to the dog's wide-open muzzle, full of teeth. The old sparrow had darted down to save its baby. It covered the baby with its body and was itself shaking with terror. Its voice was small but sounded fierce, thick and hoarse. In its eyes the dog was a gigantic devil, yet it was prepared to sacrifice itself. Of course it could have hidden up in the tree, but a power greater than its desire to hide had swept it down to earth. My dog stopped and stepped backwards ... We must understand why it did so: there was a great power in front of it. I hurriedly called the embarrassed dog away, filled with admiration for the sparrow.

58. Why can we say: "The story has a happy ending"?
A. because the dog stepped backwards when it saw the old sparrow.
B. because the young sparrow escaped death.
C. because the old sparrow did not have to worry any more.
D. because of all these reasons in A, B, C
Figure 2.14: Sample item representing the Grade 5-6 transition skill level

This procedure enables cut scores to be set independently of performance data. They may be decided on the basis of a review and scrutiny of the items themselves, leading to the judgement that the lowest acceptable score (cut-score) should be set at some agreed-upon value.
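The aggregation of the judges' ratings into a cut score can be sketched as follows. This is a minimal illustration, not the panels' actual procedure or data: the ratings shown are hypothetical, and the conversion of a 1-10 rating to a probability by dividing by 10 is an assumption, chosen to be consistent with a rating of 5 corresponding to a 0.5 probability of success.

```python
def angoff_cut_score(ratings_by_judge):
    """Modified Angoff cut score from 1-10 judge ratings.

    `ratings_by_judge` maps each judge to a list of ratings, one per item, where a
    rating of 5 corresponds to a 0.5 probability of success for the borderline pupil.
    Ratings are converted to probabilities, averaged over judges for each item, and
    the item probabilities are summed to give a cut score on the raw-score scale.
    """
    judges = list(ratings_by_judge.values())
    n_items = len(judges[0])
    item_probs = []
    for i in range(n_items):
        probs = [judge[i] / 10.0 for judge in judges]   # rating -> assumed probability
        item_probs.append(sum(probs) / len(probs))      # mean over judges (the item MPL)
    return sum(item_probs)                              # summed MPLs = cut score

# Hypothetical ratings for a 5-item test from three judges (not the study's data).
demo_ratings = {
    "judge_1": [9, 7, 5, 4, 2],
    "judge_2": [8, 7, 6, 3, 2],
    "judge_3": [9, 6, 5, 4, 3],
}
print(round(angoff_cut_score(demo_ratings), 2))   # cut score in raw-score units
```

Summing the averaged item probabilities in this way is what produced the raw-score cut points reported for the study, such as the values of 22.2 and 38.7 for reading.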
With a number of judges independently making these judgements it was possible to decide by 499 Vietnam Reading and Mathematics Assessment Study Table 2.5: Mean Estimates of success on reading and mathematics tests for benchmarks Reading independent Functional Mathematics Independent Functional Items Items READ01 .6 .3 PM01 .9 .5 READ02 .7 .5 PM02 .9 .5 READ03 .6 .4 PM03 .6 .4 READ04 .4 .3 PM04 .9 .4 READ05 .5 .2 PM05 .6 .4 READ06 .9 .5 PM06 .7 .4 READ07 .7 .5 PM07 .6 .4 READ08 .7 .4 PM08 .8 .5 READ09 .7 .5 PM09 .7 .4 READ10 .7 .4 PM10 .9 .5 READ11 .9 .6 PM11 .9 .5 READ12 .6 .3 PM12 .6 .3 READ13 .6 .3 PM13 .7 .4 READ14 .7 .4 PM14 .5 .3 READ15 .7 .4 PM15 .7 .4 READ16 .9 .5 PM16 .5 .3 READ17 .7 .4 PM17 .5 .3 READ18 .7 .4 PM18 .6 .4 READ19 .5 .3 PM19 .5 .4 READ20 .9 .6 PM20 .5 .3 READ21 .4 .2 PM21 .4 .3 READ22 .5 .3 PM22 .4 .3 READ23 .9 .6 PM23 .5 .3 READ24 .8 .5 PM24 .4 .3 READ25 .9 .6 PM25 .4 .2 READ26 .8 .6 PM26 .4 .2 READ27 .6 .3 PM27 .4 .3 READ28 .7 .5 PM28 .4 .3 READ29 .7 .5 PM29 .5 .3 READ30 .5 .2 PM30 .4 .3 READ31 .8 .4 PM31 .5 .3 READ32 .8 .5 PM32 .4 .1 READ33 .2 .1 PM33 .6 .4 READ34 .2 .1 PM34 .5 .4 READ35 .9 .6 PM35 .4 .2 READ36 .9 .6 PM36 .3 .3 READ37 .9 .6 PM37 .5 .4 READ38 .6 .3 PM38 .4 .3 READ39 .6 .3 PM39 .4 .3 READ40 .5 .3 PM40 .5 .3 READ41 .4 .2 PM41 .4 .3 READ42 .7 .5 PM42 .4 .3 READ43 .5 .3 PM43 .5 .3 READ44 .1 .1 PM44 .4 .3 READ45 .8 .5 PM45 .4 .3 READ46 .7 .5 PM46 .5 .3 READ47 .5 .3 PM47 .4 .3 READ48 .5 .3 PM48 .4 .3 READ49 .7 .4 PM49 .4 .2 READ50 .4 .2 PM50 .4 .3 READ51 .8 .4 PM51 .4 .2 READ52 .9 .6 PM52 .4 .3 READ53 .6 .3 PM53 .3 .2 READ54 .4 .2 PM54 .2 .1 READ55 .5 .3 PM55 .3 .1 READ56 .9 .6 PM56 .2 .1 READ57 .9 .6 PM57 .2 .1 READ58 .5 .2 PM58 .2 .2 READ59 .4 .2 PM59 .1 .1 READ60 .6 .3 PM60 .2 .1 all 38.7 22.2 28.5 18.3 Logits +0.56 -0.36 +0.09 -0.46 500 Vietnam Reading and Mathematics Assessment Study consensus on the nature of the scaled score conversion without actually administering the test. The estimates were averaged over judges to get each item estimate. These were then summed over the items in the test to get a required score. The probability estimates have been reported only to one decimal place since this was the accuracy required of the judges (Table 2.5). From Table 2.6 it can be seen that 10.7 percent of pupils had not yet reached a level of functional benchmark in reading and 2.8 percent had not reached the functional benchmark in mathematics. The standard errors of sampling (SE) for these figures were 0.30 and 0.13. The sampling in this study was so designed that it is possible to estimate these percentages as actual numbers of pupils. The equivalent numbers for 10.7 percent in reading were 205,763 and for 2.8 percent in mathematics were 54,739. Table 2.6: Percentages and sampling errors of pupils reaching functionality levels in reading and mathematics Functionality Read Math % SE % SE Independent Reached a level of reading and mathematics to enable 51.3 0.58 79.9 0.41 independent learning in Grade 6 Functional Reached the level for functional participation in Vietnamese 38.0 0.45 17.3 0.36 society Pre functional Below the level considered to be a minimum for functional 10.7 0.3 2.8 0.13 purposes in Vietnamese society Test Stability The stability of the test across provinces was also evident in the mathematics test (see Figure 2.15). 
Plotting the weighted mean logit and the weighted mean infit values for each province, together with the standard errors for these indices, shows that the relationship between ability and performance was consistent across provinces. While the mean ability varied a great deal across provinces, the infit value remained relatively stable. That is, the analysis of fit to the Rasch model supported the proposition that the tests were measuring substantively the same variable in each of the provinces, so that the differences in performance were real differences, and not differences contrived by external factors unrelated to the test. It demonstrates that the test provided a fair measure of ability across all provinces and that differences in ability are explained by the other factors outlined in Volume 1 of this report. This evidence, added to the validity information provided above, lends further support to the validity of the test.

Figure 2.15: Relationship between fit and score on the Reading test

Further tests of fit for the ethnicity and location independent variables were also explored for both reading and mathematics. The means of both ability and infit for these factors have been shown in Figure 2.16. It is clear that infit is not related to ethnicity or to location. However, ability estimates are related to ethnicity and to location. This supports the assertion that the tests were fair and unbiased for people from different locations and ethnic backgrounds and that the test is able to detect real differences among these groups. It was evident that the tests were measuring the same variables for sub-groups and that differences in the achievement measures represent real differences and not an artefact of bias in the test against low-scoring groups.

Figure 2.16: Indices of score and fit against location and ethnicity

The Mathematics Test: Calibration and Interpretation
The calibration statistics for the pupil mathematics test have been presented in Table 2.7. It can be seen that the summary follows the same pattern as for the reading comprehension test. For each item, the summary statistics were as follows. The p-value represents the proportion of the sample with the correct answer; this is the mean item score. The item standard deviation is presented next, followed by the correlation between the item score and the test total score. The alpha reliability estimate of the test is then presented, together with an estimate of whether this value would be affected if the item were omitted from the test. The sample size and the test length combine to make the reliability estimate extremely stable, and omitting single items would have had no effect; the reliability remained at 0.92. These are the classical item analysis statistics. The next set of statistics comes from the item response analysis. They include the difficulty (logit) and its measurement error (SEM), and the INFIT and OUTFIT estimates. The last set of data provides the descriptive analysis of the item, detailing the proportions of pupils selecting each alternative, the proportion making a multiple response and the proportion omitting the item. It can be seen that the measurement errors were again very small. The INFIT values are all within the range of 0.7 to 1.3, and this is evidence of a dominant underlying dimension in the variable being measured. The mean item difficulty is arbitrarily set to zero.
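The classical indices just listed can be reproduced with a few lines of code. The sketch below is illustrative only: the response matrix is hypothetical, and the study's own statistics were produced with the item analysis software cited in the references. The item-total correlation here uses the full test total, as described above.

```python
import numpy as np

def classical_item_stats(responses):
    """Classical item statistics of the kind reported in Table 2.7 for a 0/1 scored
    pupils-by-items response matrix: p-value, standard deviation, item-total
    correlation, and alpha if the item were omitted, plus the overall alpha."""
    X = np.asarray(responses, dtype=float)
    total = X.sum(axis=1)

    def alpha(matrix):
        # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)
        k = matrix.shape[1]
        return k / (k - 1) * (1 - matrix.var(axis=0, ddof=1).sum()
                              / matrix.sum(axis=1).var(ddof=1))

    stats = []
    for j in range(X.shape[1]):
        stats.append({
            "p": X[:, j].mean(),                                  # proportion correct
            "sd": X[:, j].std(ddof=1),                            # item standard deviation
            "r_tot": np.corrcoef(X[:, j], total)[0, 1],           # item-total correlation
            "alpha_if_omitted": alpha(np.delete(X, j, axis=1)),   # alpha without this item
        })
    return stats, alpha(X)

# Tiny hypothetical response matrix (6 pupils x 4 items), not the survey data.
demo = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
        [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 1, 1]]
item_stats, overall_alpha = classical_item_stats(demo)
print(round(overall_alpha, 2))
for s in item_stats:
    print({k: round(float(v), 2) for k, v in s.items()})
```

The same quantities appear, item by item, in the first columns of Table 2.7.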
The variance of the item difficulty levels was 1.69, with a reliability of item separation of 0.99. The mean item INFIT was 1.00 with a variance of 0.01. There were no items with zero scores and no items with perfect scores. The mean pupil ability estimate was 0.86, indicating that the pupil ability level was slightly higher than the difficulty of the overall (easy) test. The variance of the pupil ability estimates was 1.44, somewhat smaller than the variance of the item difficulties; together with the difference in means, this indicates that the test was not well matched to the range of pupil abilities, particularly at the upper end of the ability range. The reliability of the pupil separation index was 0.91. The pupil mean-squared INFIT index was 0.99 with a variance of 0.02. This evidence indicates that the test was well matched to the majority of the pupil sample and that a single dominant latent variable underpinned the set of items. It also indicated that the test successfully separated the pupils on the basis of ability (i.e. that it possessed acceptable criterion validity) as well as demonstrating construct validity. On the latter point, however, there is no external evidence of the nature of the construct criterion. Eight pupils scored zero on the test and 70 pupils had perfect scores. No items were eliminated from the mathematics test.
The variable map for the pupil mathematics test has been presented as Figure 2.17. Two cut points are represented, one at a scale value of approximately 0.2 and the other at about -0.5. These are the 'independence' levels discussed previously (see page 30). Item 59 was the most difficult and item 4 the easiest item on the test (note the logit values in Table 2.7). It is also clear that the range of item difficulty was not as broad as the range of pupil ability. Many pupils' abilities exceeded the difficulty of the most difficult item, indicating that the test was relatively easy. At the same time, though, a group of pupils at the lower end of the ability distribution was well matched to the easier items. This outcome is consistent with the results of the test trials. It is also evident from the chart that the test components were not of equivalent difficulty. The sub-domains increased in order of difficulty from 'number' to 'measurement' to 'space/data' item sub-sets, and the latter is considerably more difficult than the other item sets.

Competence levels
Following the same procedures as were outlined for the reading test, the competence levels in the pupil mathematics test were also identified. The cognitive skills underpinning the items have been described in Table 2.8 and the item relative difficulty levels have been shown in Figure 2.18. Six skill levels were identified using the ordered difficulty clusters and the cognitive skills embedded within each item. Natural breaks in difficulty were identified and then the items and the cognitive descriptions were examined to determine whether a set with a common substantive interpretation could be found. A panel of specialists from the Vietnamese Ministry of Education and Training and the National Institute for Educational Science joined the item writers for this exercise. Together they identified the breaks in the variable and then offered the substantive interpretation of the skill levels. These derived levels have been shown in Figure 2.19. As with the reading test, the obtained levels were then compared with the proposed levels in the test blueprint.
This comparison has been shown in Figure 2.20. For the mathematics test, the match was not as close as was found for the reading comprehension test. There are a number of possible explanations for this. The use and modification of available test items from other studies such as the IEA-TIMSS or the SACMEQ tests may have influenced the item selection. Adherence to local practices in mathematics assessment may have also dominated the test design. It may also have been a matter of interpretation of the initial blue print (see Figure 2.3). The stability of the test across provinces was also evident in the mathematics test. Plotting the weighted mean logit and the weighted mean infit values for each province, together with standard errors for these indices shows that the 504 Vietnam Reading and Mathematics Assessment Study Table 2.7: Calibration estimates for the mathematics test items Item p SD r-tot a-omit Logit SEM infit outfit a b c d MR omit PM01 0.89 0.32 0.28 0.92 -1.63 0.01 1 1.06 024 018 041 886 012 019 PM02 0.95 0.23 0.21 0.92 -2.52 0.02 0.98 1.16 005 026 946 012 001 009 PM03 0.92 0.27 0.25 0.92 -2.11 0.02 0.97 1.14 014 924 019 022 001 020 PM04 0.96 0.21 0.15 0.92 -2.71 0.02 1.02 1.34 005 006 955 020 002 012 PM05 0.80 0.40 0.36 0.92 -0.9 0.01 0.99 0.93 028 069 803 072 001 027 PM06 0.92 0.27 0.21 0.92 -2.12 0.02 1.01 1.32 024 924 033 009 001 010 PM07 0.72 0.45 0.49 0.92 -0.34 0.01 0.89 0.81 057 078 717 093 001 053 PM08 0.64 0.48 0.39 0.92 0.1 0.01 1.03 1.04 287 035 023 640 004 010 PM09 0.85 0.35 0.40 0.92 -1.3 0.01 0.91 0.76 052 041 853 016 002 036 PM10 0.94 0.25 0.14 0.92 -2.31 0.02 1.05 1.67 004 023 034 935 001 003 PM11 0.92 0.27 0.23 0.92 -2.04 0.01 0.99 1.18 018 035 919 022 001 005 PM12 0.80 0.40 0.45 0.92 -0.86 0.01 0.89 0.81 028 074 797 048 007 046 PM13 0.84 0.37 0.24 0.92 -1.16 0.01 1.07 1.43 074 836 061 016 001 012 PM14 0.69 0.46 0.42 0.92 -0.2 0.01 0.98 0.95 693 018 028 036 002 061 PM15 0.90 0.29 0.25 0.92 -1.85 0.01 1 1.12 905 021 052 015 001 007 PM16 0.84 0.37 0.30 0.92 -1.15 0.01 1.01 1.18 024 044 078 835 001 019 PM17 0.36 0.48 0.52 0.92 1.54 0.01 0.87 0.87 048 035 100 364 002 010 PM18 0.35 0.48 0.37 0.92 1.59 0.01 1.08 1.12 009 618 354 010 002 008 PM19 0.70 0.46 0.45 0.92 -0.22 0.01 0.94 0.93 017 182 697 092 001 010 PM20 0.82 0.39 0.32 0.92 -0.99 0.01 1 1.18 030 815 073 062 001 019 PM21 0.68 0.47 0.43 0.92 -0.14 0.01 0.97 0.97 031 978 682 153 004 052 PM22 0.64 0.48 0.42 0.92 0.1 0.01 1 0.99 093 097 107 640 003 061 PM23 0.53 0.50 0.32 0.92 0.65 0.01 1.14 1.24 113 120 224 534 003 006 PM24 0.68 0.47 0.39 0.92 -0.14 0.01 1.02 1 092 683 092 093 001 038 PM25 0.74 0.44 0.46 0.92 -0.46 0.01 0.91 0.87 987 082 737 069 003 022 PM26 0.33 0.47 0.48 0.92 1.76 0.01 0.91 0.92 300 313 325 040 001 020 PM27 0.37 0.48 0.42 0.92 1.53 0.01 1 1.03 152 149 365 317 002 015 PM28 0.87 0.34 0.26 0.92 -1.46 0.01 1.03 1.18 028 029 870 033 011 030 PM29 0.77 0.42 0.45 0.92 -0.7 0.01 0.91 0.84 022 169 024 775 002 009 PM30 0.70 0.46 0.46 0.92 -0.22 0.01 0.93 0.87 015 697 024 240 002 022 PM31 0.77 0.42 0.33 0.92 -0.65 0.01 1.04 1.09 136 767 045 023 001 028 PM32 0.52 0.50 0.51 0.92 0.71 0.01 0.91 0.89 521 201 175 040 036 028 PM33 0.88 0.33 0.36 0.92 -1.52 0.01 0.94 0.77 068 026 087 022 001 008 PM34 0.81 0.39 0.42 0.92 -0.95 0.01 0.92 0.84 036 087 809 044 001 023 PM35 0.54 0.50 0.47 0.92 0.6 0.01 0.95 0.94 172 544 052 194 002 036 PM36 0.64 0.48 0.38 0.92 0.09 0.01 1.05 1.05 640 108 084 117 001 051 PM37 0.47 0.50 0.45 0.92 0.95 0.01 1 0.99 254 095 153 474 002 023 PM38 0.62 0.48 0.25 0.92 0.18 0.01 1.19 1.34 
281 035 624 037 002 021 PM39 0.47 0.50 0.50 0.92 0.98 0.01 0.92 0.89 188 187 469 083 001 073 PM40 0.83 0.38 0.41 0.92 -1.1 0.01 0.91 0.84 029 091 829 026 001 025 PM41 0.70 0.46 0.45 0.92 -0.22 0.01 0.95 0.91 697 031 083 158 002 028 PM42 0.36 0.48 0.18 0.92 1.53 0.01 1.3 1.48 087 440 364 053 002 054 PM43 0.30 0.46 0.17 0.92 1.93 0.01 1.3 1.56 032 493 296 142 001 036 PM44 0.37 0.48 0.58 0.92 1.48 0.01 0.8 0.76 384 059 375 155 002 025 PM45 0.54 0.50 0.37 0.92 0.62 0.01 1.08 1.09 095 176 116 540 001 071 PM46 0.42 0.49 0.53 0.92 1.25 0.01 0.88 0.86 146 179 417 199 001 058 PM47 0.46 0.50 0.57 0.92 1.02 0.01 0.83 0.79 094 200 460 164 001 081 PM48 0.57 0.50 0.53 0.92 0.47 0.01 0.88 0.82 083 139 568 127 001 082 PM49 0.57 0.49 0.44 0.92 0.46 0.01 0.99 0.98 026 029 320 572 002 051 PM50 0.55 0.50 0.31 0.92 0.58 0.01 1.15 1.2 112 085 169 547 002 085 PM51 0.28 0.45 0.40 0.92 2.04 0.01 0.97 1.12 439 108 093 278 001 081 PM52 0.64 0.48 0.45 0.92 0.11 0.01 0.96 0.9 086 075 637 118 001 083 PM53 0.40 0.49 0.36 0.92 1.35 0.01 1.08 1.13 092 250 398 154 003 102 PM54 0.63 0.48 0.46 0.92 0.18 0.01 0.96 0.92 625 135 084 039 004 113 PM55 0.41 0.49 0.51 0.92 1.29 0.01 0.9 0.88 305 068 409 124 002 093 PM56 0.43 0.50 0.33 0.92 1.18 0.01 1.14 1.22 427 431 040 025 005 072 PM57 0.60 0.49 0.45 0.92 0.31 0.01 0.97 0.94 139 044 599 106 008 104 PM58 0.56 0.50 0.45 0.92 0.51 0.01 0.98 0.97 123 101 108 562 001 105 PM59 0.18 0.39 0.26 0.92 2.7 0.01 1.13 1.41 172 499 185 045 002 097 PM60 0.26 0.44 0.28 0.92 2.16 0.01 1.14 1.31 460 260 105 061 003 111 505 Vietnam Reading and Mathematics Assessment Study Logit Pupils All Mathematics Items Number Measure Space 5.0 X 4.0 XXX XXXX XXXXX 3.0 XXXXXX XXXXXXX XXXXXXX 59 59 XXXXXXXX XXXXXXXX 2.0 XXXXXXXX 51 60 51 60 XXXXXXXXXXXXXXXXX 43 43 XXXXXXXXX 26 26 XXXXXXXXXXXXXXXXXX 17 18 27 42 44 17 27 42 18 44 XXXXXXXXX 53 55 53 55 XXXXXXXXXXXXXXXXXXX 46 56 46 56 1.0 XXXXXXXXXXXXXXXXXXX 37 39 47 39 37 47 XXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXX 23 32 35 45 50 23 32 50 45 35 XXXXXXXXXXXXXXXXXXX 48 49 58 49 48 58 XXXXXXXXXXXXXXXXXXXX 57 57 0.0 XXXXXXXXXXXXXXXXXXX 8 22 36 38 52 54 8 22 38 36 52 54 XXXXXXXXXXXXXXXXXX 21 24 21 24 XXXXXXXXXXXXXXXX 7 14 19 30 41 7 14 19 30 41 XXXXXXXXXXXXXX 25 25 XXXXXXXXXXX 29 31 29 31 XXXXX 5 12 5 12 -1.0 XXXXXXX 20 34 20 34 XXXXX 13 16 40 13 40 16 XXX 9 9 X 1 28 33 1 33 28 X -2.0 X 15 15 3 6 11 3 6 11 10 10 2 2 4 4 -3.0 22 21 17 Figure 2.17 Variable Map of the Pupil Mathematics Test (N = 72666 L = 60) 506 Vietnam Reading and Mathematics Assessment Study Table 2.8: Cognitive skills identified in pupil mathematics test items Item Logit 1 -1.64 Equality and inequality sign, single operation 2 -2.52 Decimal fractions and place value single operation 3 -2.11 Multiple by a single numeral single operation apply rule of long multiplication 4 -2.71 Order of decimal fractions use of place value comparison of units 5 -0.9 Estimation of height of a common object 6 -2.12 Addition of decimal fractions with carrying single operation 7 -0.34 Place value requiring direct interpretation of digit value 8 0.1 Understanding terms for following number verbal prompt dependent 9 -1.3 Grouping using division, single operation, knowledge of equations 10 -2.31 Simple addition with carrying single operation 11 -2.04 Single operation subtraction with carrying 12 -0.86 Finding areas and comparing areas using two operations 13 -1.16 Division single operation, place value, completion of equation, understanding of terms quotient 14 -0.2 Number sentence, equivalent fractions, division 15 -1.85 
Analogue clock face recognition of time visual prompt 16 -1.15 Subtraction of decimals in verbal prompt 17 1.54 Percentage fractions from a visual prompt order fraction, convert to percent 18 1.59 Embedded figures square in a an irregular shape 19 -0.22 Conversion of weight measure Kg to g, two operations 20 -0.99 Subtraction with verbal prompt, distracting information 21 -0.14 Number sentence, multiplication and estimation, more than two operations 22 0.1 Understanding calendar months measurement, verbal dependent 23 0.65 Measurement of familiar object with an irregular shape 24 -0.14 BODMAS, three steps, two operations order of operations, common factors and brackets 25 -0.46 Division, understanding quotient, counting, two operations 26 1.76 Calculation of elapsed time in hours, subtract and then add 27 1.53 BODMAS order of operations X, + _ 28 -1.47 Recognition and counting, visual prompt, single operation, understanding of height 29 -0.7 Subtraction of vulgar fractions involving LCD 30 -0.22 Equations, number sentences, division and subtraction two operations 31 -0.65 Conversion of meters to cms. and addition, two operations 32 0.71 Solving simple equations using addition and subtraction than using division with answers 33 -1.52 Multiple by 10, single operation 34 -0.95 Conversion of vulgar fraction to decimal 35 0.6 Visual prompt recognition of embedded figures, trapezium 36 0.09 Volume, spatial prompt, perception of a cube and count 37 0.95 Conversion of mixed area measures to a single metric 38 0.18 Estimation from a visual prompt involving volume 39 0.98 Problem solving using multiplication by a fraction and then subtraction, order of operations 40 -1.1 Division, verbal prompt, place value 41 -0.22 Count, multiple, and convert measures Kg to tonnes, three operations 42 1.53 Problem solving, verbal prompt, using division, subtraction 43 1.93 Recognition of right angles in an irregular shape 44 1.48 Understanding average, calculation using an average, subtraction verbal prompt order of Operations 45 0.62 Recognise shapes, calculate perimeter, add and subtract, visual prompt, unfamiliar Irregular shape 46 1.25 Visual prompt conversion of weight measures, subtraction and addition 47 1.02 Problem solving using addition and conversion of volume units- order of operations important 48 0.47 Visual prompt, formulae, multiplication and subtraction 49 0.46 Visual prompt, addition, multiplication and subtraction using units of currency 50 0.58 Convert fraction to percentage embedded in a verbal prompt 51 2.04 Problem solving involving relationship between length and volume, multiple applications of a single operation. 52 0.11 Estimation from a pie chart, convert area to a number visual prompt 53 1.35 Visual prompt compass directions, map reading, order of sequential steps 54 0.18 Recognition of rotated image from a visual prompt 55 1.29 Sequential addition, counting arithmetic progression with 7 terms 56 1.18 Transformation reflection in an unfamiliar contextual visual prompt 57 0.31 Recognition of a net to box transformation based on a visual prompt 58 0.51 Visual prompt, counting addition and subtraction 59 2.7 Embedded and dependent number pattern 60 2.16 Spatial perception three dimensions, involving rotation 507 Vietnam Reading and Mathematics Assessment Study Figure 2.18: Pupil mathematics test and item difficulty order and skill levels relationship between ability and performance was consistent across provinces (see Figure 2.21). 
That is, the fit analysis supports the proposition that the tests were measuring substantively the same variable in each of the provinces, so that the differences in performance, as evidenced by the ability parameter, were real differences and not differences contrived by external factors unrelated to the test. The mean infit value is stable across provinces but the mean ability measure varies considerably. This indicated that the mathematics test was measuring the same construct across provinces and that the differences in achievement were real differences, explained by the background factors detailed in Volume 1 of this report, and not by an unfair test.

Mathematics Competency Levels
Level 1: Reads, writes and compares natural numbers, fractions and decimals. Uses single operations of +, -, x and : on simple whole numbers; works with simple measures such as time; recognises simple 3D shapes.
Level 2: Converts fractions with a denominator of 10 to decimals. Calculates with whole numbers using one operation (x, -, + or :) in a one-step word problem; recognises 2D and 3D shapes.
Level 3: Identifies place value; determines the value of a simple number sentence; understands equivalent fractions; adds and subtracts simple fractions; carries out multiple operations in correct order; converts and estimates common and familiar measurement units in solving problems.
Level 4: Reads, writes and compares larger numbers; solves problems involving calendars and currency, area and volume; uses charts and tables for estimation; solves inequalities; transformations with 3D figures; knowledge of angles in regular figures; understands simple transformations with 2D and 3D shapes.
Level 5: Calculates with multiple and varied operations; recognises rules and patterns in number sequences; calculates the perimeter and area of irregular shapes; measurement of irregular objects; recognises transformed figures after reflection; solves problems with multiple operations involving measurement units, percentages and averages.
Level 6: Problem solving with periods of time, length, area and volume; embedded and dependent number patterns; develops formulae; recognises 3D figures after rotation and reflection, and embedded figures and right angles in irregular shapes; uses data from graphs and tables.

Figure 2.19: Mathematics competency levels identified in the Pupil Mathematics Test

Level 1
Proposed: The tasks at this level involved the linking of patterns or shapes to simple digits. This is the easiest level of development and is likely to underpin all others.
Derived: Reads, writes and compares natural numbers, fractions and decimals. Uses single operations of +, -, x and : on simple whole numbers; works with simple measures such as time; recognises simple 3D shapes.
Level 2
Proposed: Recognise and name basic shapes and units of measurement, as well as undertaking single operations using up to two-digit numbers.
Derived: Converts fractions with a denominator of 10 to decimals. Calculates with whole numbers using one operation (x, -, + or :) in a one-step word problem; recognises 2D and 3D shapes.
Level 3
Proposed: Items should assess all the skills in the previous levels and recognise simple fractions in both numerical and graphic form. Identification of data in tabular form and basic calculations associated with simple measurement units. A basic understanding of numeration with simple computations would be expected.
Derived: Identifies place value; determines the value of a simple number sentence; understands equivalent fractions; adds and subtracts simple fractions; carries out multiple operations in correct order; converts and estimates common and familiar measurement units in solving problems.
Level 4
Proposed: Extend and complete number patterns, translate shapes and patterns, and convert measurement units when making simple one-step calculations.
Derived: Reads, writes and compares larger numbers; solves problems involving calendars and currency, area and volume; uses charts and tables for estimation; solves inequalities; transformations with 3D figures; knowledge of angles in regular figures; understands simple transformations with 2D and 3D shapes.
Level 5
Proposed: Combining operations in order to link information from tables and charts in performing calculations. This also applies to measurement units. Two- and three-step problems should be set where the first step may be the identification of appropriate information to use in subsequent steps of the computation.
Derived: Calculates with multiple and varied operations; recognises rules and patterns in number sequences; calculates the perimeter and area of irregular shapes; measurement of irregular objects; recognises transformed figures after reflection; solves problems with multiple operations involving measurement units, percentages and averages.
Level 6
Proposed: The tasks had an emphasis on data interpretation and computation, linking data from tables and graphic displays in order to undertake computations involving several steps and a mix of operations.
Derived: Problem solving with periods of time, length, area and volume; embedded and dependent number patterns; develops formulae; recognises 3D figures after rotation and reflection, and embedded figures and right angles in irregular shapes; uses data from graphs and tables.

Figure 2.20: Comparison of the proposed and obtained skill levels for the Pupil Mathematics Test

Figure 2.21: Relationship between fit and score on the mathematics test across provinces

Once the tests of reading and mathematics had been calibrated, there was an additional question about the relative difficulty of the two tests. The differences in the benchmark achievements could have been due to the mathematics test being easier, or perhaps to the reading panel having much higher expectations than the mathematics panel members. One way of addressing this question was to combine all items and jointly calibrate them using common person equating. This assumed that the items were measuring the same underlying variable. If all the items could be shown to fit the model, this comparison could be regarded as appropriate. It would also mean that the items may be providing a general measure of achievement. Figure 2.22 shows the parallel distributions of pupils and items when the two tests were jointly calibrated; the analysis showed that the mathematics test had a slightly wider range of difficulty than the reading items. When the total pool of 116 items was mapped onto the same scale, both INFIT and OUTFIT indices fell within an acceptable range, as shown in Table 2.9, where the item response details from the calibration of the total item pool have been presented. Items 1 to 60 are the mathematics test items and items 61 to 120 are the reading test items.
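The logic of the joint calibration can be illustrated with a rough sketch. The code below stands in for the full common-person equating: it pools the responses of the same pupils to two sets of items, forms crude centred log-odds difficulty estimates rather than a proper Rasch calibration (which in the study was carried out with the software cited in the references), and then compares the mean and spread of the two item pools, which is the kind of comparison summarised in Figure 2.22 and Table 2.9. The response data shown are hypothetical.

```python
import math

def log_odds_difficulty(responses):
    """Rough item difficulty estimates: centred log-odds of an incorrect response.

    A crude stand-in for a full joint Rasch calibration; it is enough to compare the
    difficulty distributions of two item pools answered by the same pupils.
    """
    n_pupils = len(responses)
    n_items = len(responses[0])
    d = []
    for j in range(n_items):
        correct = sum(row[j] for row in responses)
        correct = min(max(correct, 0.5), n_pupils - 0.5)   # guard against 0% / 100% items
        d.append(math.log((n_pupils - correct) / correct))
    mean_d = sum(d) / n_items
    return [x - mean_d for x in d]                          # centre difficulties at zero

def compare_pools(responses, n_first_pool):
    """Split the jointly estimated difficulties into two pools and summarise each."""
    d = log_odds_difficulty(responses)
    pools = {"pool_1": d[:n_first_pool], "pool_2": d[n_first_pool:]}
    for name, vals in pools.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
        print(name, "mean", round(mean, 2), "variance", round(var, 2))

# Hypothetical combined responses (8 pupils x 6 items: the first 3 items stand for
# one test, the last 3 for the other).
demo = [
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 1],
]
compare_pools(demo, n_first_pool=3)
```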
From this analysis, it is evident that the mathematics test contains more easy items compared with the reading test, but the mathematics test is also slightly harder overall than the reading test and has a wider spread of difficulty. The differences are not large and the two tests could be regarded as identical in terms of difficulty and width. Certainly the differences in the tests could not explain the differences in expectations reported in Chapter 2 of Volume 2. It seems that the differences in benchmark performances are most likely attributable to lower expectations in mathematics than in reading. In a sense, this is understandable given the importance of the language in the culture of Vietnam. 510 Vietnam Reading and Mathematics Assessment Study Combined Reading and Mathematics tests Pupil Ability Mathematics Items Reading items X 4.0 X X X XX 3.0 XX XXXXX XXXXXX 59 XXXXXX XXXXXXXXX 2.0 XXXXXXX 51 60 114 118 XXXXXXXXXXX 26 43 86 XXXXXXXXXXXXXXXX 17 18 27 42 81 XXXXXXXXXXXX 44 53 104 108 XXXXXXXXXXXXXXXX 46 55 56 93 94 1.0 XXXXXXXXXXXXXXXX 37 39 47 63 115 XXXXXXXXXXXXXXXXXXXX 32 90 117 120 XXXXXXXXXXXXXXXX 23 35 45 48 50 58 82 103 XXXXXXXXXXXXXXXXXXX 49 57 69 74 105 111 XXXXXXXXXXXXXX 8 22 36 38 52 54 61 73 102 110 113 119 0.0 XXXXXXXXXXXXXXXX 21 24 64 70 77 107 109 XXXXXXXXXXX 7 14 19 30 41 68 87 XXXXXXXXXXXX 25 72 83 85 89 XXXXXXXX 29 31 91 XXXXXXX 5 12 20 34 65 67 71 75 116 -1.0 XXXX 13 16 40 78 79 80 88 97 XXX 9 62 84 106 X 28 33 66 96 112 X 1 76 15 92 95 -2.0 3 6 11 10 2 4 -3.0 Mean 0.89 (1.12) Mean 0.05 (1.29) Mean -0.05 (1.03) Figure 2.22 Comparative distributions of reading and math items. 511 Vietnam Reading and Mathematics Assessment Study Table 2.9: Item concurrent calibration of the mathematics and reading items Item % Correct Logit INFIT Outfit Source 1 88.63 -1.57 1 1.01 2 94.63 -2.45 0.98 1.09 3 92.35 -2.05 0.97 1.07 4 95.50 -2.64 1.02 1.33 5 80.34 -0.84 1 0.95 6 92.38 -2.05 1.01 1.25 7 71.74 -0.28 0.9 0.82 8 64.01 0.15 1.03 1.03 9 85.28 -1.24 0.92 0.78 10 93.54 -2.24 1.04 1.52 11 91.89 -1.98 1 1.13 12 79.74 -0.8 0.92 0.82 13 83.58 -1.09 1.07 1.33 14 69.34 -0.14 1 0.96 15 90.47 -1.78 1 1.1 16 83.53 -1.09 1.01 1.14 17 36.40 1.56 0.9 0.92 18 35.36 1.62 1.07 1.11 19 69.74 -0.16 0.96 0.93 20 81.52 -0.93 1 1.11 21 68.23 -0.08 0.99 0.99 22 63.97 0.15 1.02 1.01 23 53.42 0.69 1.13 1.22 24 68.29 -0.08 1.04 1.02 25 73.74 -0.4 0.93 0.89 26 32.52 1.77 0.93 0.95 27 36.53 1.55 1.02 1.05 28 86.98 -1.4 1.03 1.16 29 77.45 -0.64 0.91 0.83 30 69.67 -0.16 0.95 0.88 31 76.68 -0.58 1.04 1.06 32 52.09 0.76 0.92 0.9 33 87.57 -1.46 0.94 0.76 34 80.94 -0.88 0.93 0.85 35 54.38 0.64 0.97 0.96 36 64.04 0.15 1.05 1.05 37 47.43 0.99 1 0.99 38 62.44 0.23 1.2 1.33 39 46.85 1.02 0.94 0.93 40 82.88 -1.03 0.93 0.86 41 69.71 -0.16 0.96 0.92 42 36.45 1.56 1.28 1.44 43 29.58 1.94 1.26 1.5 44 37.47 1.5 0.82 0.8 45 54.02 0.66 1.09 1.1 46 41.69 1.28 0.9 0.89 47 46.03 1.06 0.86 0.83 48 56.83 0.52 0.9 0.86 49 57.17 0.5 0.99 0.98 50 54.68 0.63 1.15 1.19 51 27.77 2.05 0.98 1.16 52 63.70 0.17 0.97 0.92 53 39.76 1.38 1.08 1.14 54 62.52 0.23 0.97 0.93 55 40.87 1.32 0.92 0.91 56 43.05 1.21 1.11 1.18 57 59.91 0.37 0.98 0.94 58 56.18 0.55 1 0.99 59 18.48 2.69 1.11 1.4 60 26.01 2.16 1.12 1.31 mean 0.05 1.00 1.04 var 1.65 0.01 0.03 512 Vietnam Reading and Mathematics Assessment Study Item % correct logit INFIT Outfit Source 61 62.58 0.23 1.01 0.98 62 84.27 -1.15 1.04 1.28 63 44.64 1.13 0.92 0.9 64 66.98 -0.01 1.17 1.22 65 81.45 -0.92 1 0.99 66 87.23 -1.42 0.94 0.87 67 79.13 -0.75 1.03 1.03 68 69.89 -0.17 0.95 0.93 69 58.13 
0.46 0.91 0.87 70 68.21 -0.08 1.05 1.09 71 80.14 -0.82 0.98 0.97 72 74.22 -0.43 0.99 0.98 73 63.37 0.19 0.96 0.93 74 59.74 0.37 0.95 0.91 75 78.83 -0.73 0.99 1.02 76 89.39 -1.65 0.87 0.66 77 67.18 -0.02 1.09 1.14 78 82.07 -0.97 0.95 0.88 79 82.25 -0.98 0.98 0.97 80 83.54 -1.09 0.95 0.88 81 34.61 1.66 0.98 0.97 82 56.68 0.53 1.1 1.14 83 73.78 -0.4 1.05 1.08 84 85.32 -1.24 0.94 0.84 85 73.25 -0.37 1.11 1.23 86 29.41 1.95 0.98 0.96 87 72.02 -0.3 0.89 0.85 88 83.31 -1.07 0.9 0.79 89 75.44 -0.5 0.96 0.92 90 49.70 0.88 0.9 0.88 91 76.98 -0.61 0.96 1.02 92 90.28 -1.76 0.89 0.66 93 42.72 1.23 1.18 1.27 94 42.20 1.26 1.13 1.22 95 90.30 -1.76 0.95 0.91 96 87.22 -1.42 0.94 0.82 97 84.12 -1.14 0.87 0.73 102 63.87 0.16 0.88 0.83 103 55.08 0.61 0.94 0.92 104 38.65 1.44 1.1 1.17 105 57.06 0.51 0.95 0.93 106 85.19 -1.23 0.92 0.84 107 66.01 0.04 0.92 0.88 108 36.74 1.54 1.03 1.07 109 67.04 -0.01 1.03 1.11 110 61.36 0.29 0.94 0.9 111 57.40 0.49 0.99 1 112 86.88 -1.39 0.86 0.65 113 61.97 0.26 1.03 1.07 114 25.91 2.17 1.01 1.05 115 48.53 0.94 1.09 1.11 116 79.68 -0.79 0.96 0.95 117 50.27 0.85 1.04 1.04 118 29.14 1.97 1.03 1.02 119 62.40 0.24 1.07 1.11 120 49.76 0.88 1.06 1.08 mean read -0.05 0.99 0.97 var read 1.06 0.01 0.02 513 Vietnam Reading and Mathematics Assessment Study References Adams, R. J. and S. T. Khoo (1995). Quest Interactive Item Analysis Software. Melbourne, ACER. Andrich, D. (2002). RUMM: A Rasch model Analysis Program. Perth, University of Western Australia. Angoff (1971). Scales, norms and equivalent scores. Educational Measurement. R.L.Thorndike. Washington, American Council on Education.: 508-600. Berliner, D. (2001). "Learning about learning from expert teachers." International Journal of educational Research 35: 433-434. Brock, P. (2000). Standards of professional practice for accomplished teaching in Australian classrooms. Canberra, Australian College of Education. Brock, P. (2002). "Towards establishing and implementing a standards framework in the NSW quality teaching awards process: A personal perspective." Unicorn 28(1): 10-15. Glaser, R. (1963). "Instructional Technology and the measurement of learning outcomes: some questions." American Psychologist 18: 519- 521. Glaser, R. (1981). "The future of testing: A research agenda for cognitive psychology and psychometrics." American Psychologist 36: 923-936. Glaser, R. (1990). Expertise. The Blackwell dictionary of cognitive psychology. M. W. Eysenk, A. N. Ellis, E. Hunt and P. Johnson-Laird. Oxford, England:, Blackwell Reference. Greaney, V., S. R. Khandker, et al. (1999). Bangladesh: Assessing Basic Skills. Dhaka, University Pres Ltd. Griffin, P. and Forwood (1990) Griffin, P. (1998). Vietnamese National Study of Pupil Achievement in Mathematics and Vietnamese. Hanoi, National Institute for Education and Science. Griffin, P. (2001). Performance Assessment of Higher Order Competencies. Annual Conference of the American Education Association, Seattle. Griffin, P., Smith, P. and Ridge, N. (2001). The Literacy Profiles in Practice: An Assessment Approach. Portsmouth, Heinemann. Hambleton, R. K. and Swaminathan H. (1979). Item Response Theory. Principles and Applications. Boston, Mass. Kluwer -Nijhoff. Lord F M (1980) Applications of Item Response Theory to Practical Testing Problems, , Hillsdale, N.J.: Lawrence Erlbaum 514 Vietnam Reading and Mathematics Assessment Study Masters, G. (1982). "A Rasch model for partial credit scoring." Psychometrica 47: 149-174. 
Messick, S., A. Beaton and F. Lord (1983). NAEP Reconsidered: A New Design for a New Era. Princeton, New Jersey, Educational Testing Service.
Wilson, M. (1991). "Unobserved categories." Rasch Measurement 5(1): 128.
Wilson, M. (1999). Measurement of Developmental Levels. Advances in Measurement in Educational Research and Assessment. G. N. Masters and J. P. Keeves. New York: Pergamon.
Wright, B. and G. Masters (1982). Rating Scale Analysis. Chicago, MESA Press.
Wright, B. and M. Stone (1979). Best Test Design. Chicago, MESA Press.
Wu, M., R. J. Adams, et al. (1998). ConQuest: Generalised Item Response Modelling Software. Melbourne: ACER.

Appendix 2.1: Normal scoring and marking for Vietnamese primary exams

There are two types of tests: regular tests (R) and periodic tests (P). There are at least four regular tests (Ri, where i = Sep, ..., May) in each semester: one for reading, one for dictation, one for vocabulary and syntax, and one for essay writing. These tests can take the form of 15-minute written tests, oral tests, or practice exercises. In Grade 1, there are two periodic tests in the second semester of the school year. Pupils from Grades 2 to 5 are required to do two periodic tests each school semester (Pm, where m = 1, 2, 3, 4). Each periodic test consists of a reading component and a writing component. The writing component at Grades 2 and 3 includes a dictation and a 25-minute essay. At Grades 4 and 5, pupils do a dictation, one or two exercises in vocabulary and syntax, and a 40-minute essay. The result of each periodic test is calculated by averaging the marks for the reading and writing components. Any decimal is rounded up to the nearest whole mark if the writing component is marked higher than the reading component, or rounded down if the reading component is marked higher.

The marking method is analytical, with specific scores assigned to each sub-component. In the teaching and assessment of reading, pupils are marked on their reading speed and their answers to comprehension questions. Pupils are marked on a 1-10 scale, with one being the lowest and ten the highest, and no decimal point is used in the mark. Five marks are assigned for the essay, three for the dictation and two for the exercises. One mark is deducted for every three spelling mistakes in the dictation, though identical spelling mistakes are counted only once. For essays, the introduction is awarded one mark, the conclusion one mark, and the body of the text three marks.

A pupil's overall achievement in the subject is calculated four times in a school year according to the following formulae:

1. Achievement as at mid-semester 1 (S1m) is based on the September and October assessments:

   S_{1m} = \left( \sum_{i=\mathrm{Sep}}^{\mathrm{Oct}} R_i + P_1 \right) / (n_r + 2)

   where n_r is the number of regular assessments for the first semester. The corresponding calculation for November and December is

   S_{1m} = \left( \sum_{i=\mathrm{Nov}}^{\mathrm{Dec}} R_i + P_2 \right) / (n_r + 2)

2. Achievement as at the end of Semester 1 (S1e):

   S_{1e} = \left( \sum_{j=\mathrm{Sep}}^{\mathrm{Dec}} R_j + 0.5 R_{\mathrm{Jan}} + 2 P_2 \right) / (n_r + 2)

3. Overall assessment for Semester 1 (S1):

   S_1 = (S_{1m} + 2 S_{1e}) / 3

4. Achievement as at mid-semester 2 (S2m), based on the January and February assessments:

   S_{2m} = \left( \sum_{i=\mathrm{Jan}}^{\mathrm{Feb}} R_i + 2 P_3 \right) / (n_r + 2)
5. Achievement as at the end of Semester 2 (S2e):

   S_{2e} = \left( \sum_{j=\mathrm{Mar}}^{\mathrm{Apr}} R_j + 0.5 R_{\mathrm{May}} + 2 P_4 \right) / (n_r + 2)

6. Achievement for the whole of Semester 2 (S2):

   S_2 = (S_{2m} + 2 S_{2e}) / 3

7. Achievement for the whole school year (S):

   S = (S_1 + 2 S_2) / 3

This collapses to a single annual scoring formula that weights performances at the end of the school year much more heavily than performances at the start of the year:

   S = \frac{1}{9(n_r + 2)} \left[ \sum_{r=\mathrm{Sep}}^{\mathrm{Oct}} R_r + 2 \sum_{r=\mathrm{Nov}}^{\mathrm{Feb}} R_r + 4 \sum_{r=\mathrm{Mar}}^{\mathrm{Apr}} R_r + R_{\mathrm{Jan}} + 2 R_{\mathrm{May}} + 2 P_1 + 4 (P_2 + P_3) + 8 P_4 \right]

The weighting assumes that performances at the end of the year are more demanding than performances at the beginning. While this is most likely true, the Rasch procedure adopted in this project automatically weights for the difficulty, or amount of demand, that each sub-task places on the pupil and estimates the pupil's ability from the tasks performed and the quality shown in each performance.

Each score (Rr and Pj) is limited to values between 0 and 10, and the final score is converted to a mark within this range and then to a grade as shown in Table A2.

Table A2: Score Conversion to Pupil Classifications
Classification   Grade   Score Range
Excellent        A       9.0 - 10
Good             B       7.0 - 8.9
Average          C       5.0 - 6.9
Weak             D       4.9 or below

Chapter 3
THE DEVELOPMENT OF THE TEACHER TESTS, EQUATING PUPIL AND TEACHER RESULTS, AND INTERNATIONAL BENCHMARKING

Introduction

In Chapter 2, the development and calibration of the pupil achievement tests were described. This chapter describes:
1. the teacher tests;
2. the equating of the pupil and teacher results; and
3. the international benchmarking.

1 This chapter was written by Patrick Griffin, Assessment Research Centre, Faculty of Education, University of Melbourne.

The Teacher Tests

Two tests were developed for teachers. Like the pupil tests, there was a reading test and a mathematics test. The main concern was that the tests should allow variation in teacher reading and mathematics skills to be demonstrated. Trials of the tests in Thanh Hoa province showed that they were very easy for the teachers. As a result, some additional items were added to increase the difficulty, but the tests remained relatively easy. This was not considered a problem, as the exercise was unlikely to cause controversy if the teachers did not feel threatened by the tests. Designing tests for teachers did, however, present a dilemma: while the project team was content to allow the tests to be relatively easy, the ease of the tests would be likely to reduce variance and hence lower the correlations with pupil performances.

Each of the teacher tests contained 45 items. Five items in the mathematics test were common with the pupil mathematics test, and 10 items in the reading test were common with the pupil reading test. In Table 3.1 the calibration details for the teacher reading test have been presented, and the variable map for the test has been given in Figure 3.1. The table shows, for each item, the item-total correlation (R-tot), the reliability of the test if the item is omitted (A-omit), the proportion correct (p), the difficulty estimate (logit) and its standard error (SEM), the infit and outfit, the percentage choosing each alternative, the proportion making multiple responses (MR), and the proportion not responding to the item at all (omit). The last two indices often indicate confusion with an item. Omission behaviour is low and tends to be constant throughout the test, and the multiple response index does not reach any value that identifies a specific item as confusing.
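As an illustration of how the classical indices reported in Tables 3.1 and 3.2 are obtained, the sketch below computes the proportion correct (p), a corrected item-total correlation (R-tot) and the alpha-if-item-omitted (A-omit) from a small 0/1 response matrix. It is a minimal sketch with invented data, not the analysis code used in the study.

```python
# Illustrative sketch (not the study's analysis code): classical item statistics
# of the kind reported in Tables 3.1 and 3.2, computed from a 0/1 response matrix.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy) if sx and sy else 0.0

def cronbach_alpha(matrix):
    """Coefficient alpha for a persons-by-items 0/1 matrix."""
    k = len(matrix[0])
    item_vars = []
    for i in range(k):
        col = [row[i] for row in matrix]
        m = sum(col) / len(col)
        item_vars.append(sum((v - m) ** 2 for v in col) / (len(col) - 1))
    totals = [sum(row) for row in matrix]
    mt = sum(totals) / len(totals)
    total_var = sum((t - mt) ** 2 for t in totals) / (len(totals) - 1)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var) if total_var else 0.0

def item_statistics(responses):
    """Return (p, r-tot, alpha-if-omitted) for each item of a 0/1 matrix."""
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    stats = []
    for i in range(n_items):
        item = [person[i] for person in responses]
        p = sum(item) / len(item)
        rest = [t - x for t, x in zip(totals, item)]      # total score minus the item
        r_tot = pearson(item, rest)
        a_omit = cronbach_alpha([[person[j] for j in range(n_items) if j != i]
                                 for person in responses])
        stats.append((p, r_tot, a_omit))
    return stats

if __name__ == "__main__":
    demo = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
    for idx, (p, r, a) in enumerate(item_statistics(demo), start=1):
        print(f"item {idx}: p={p:.2f}  r-tot={r:.2f}  alpha-if-omitted={a:.2f}")
```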
Calibrating the Teacher Reading Test

The Variable Map for Teacher Reading

From the variable map (Figure 3.1) it can be seen that the distribution of teacher achievement sat far higher than the distribution of item difficulty, demonstrating that the reading test was easy for the teachers. For example, the item of average difficulty on the test (item 25) was located at the lowest ability level of the teachers; almost all of the teachers in the sample had abilities higher than the difficulty of the average item. There was nevertheless a considerable number of items within the teachers' range of ability, and a skills audit of these items helped to identify the higher levels of competency demonstrated by both teachers and pupils when the tests were concurrently equated. The ease of the test could also be seen in the item statistics: the mean item facility was 0.83, and many items showed a proportion correct of more than 0.95.

Similar analyses were conducted for the teacher mathematics test. In Table 3.2 the calibration results for the teacher mathematics test have been presented: the item-total correlation (R-tot), the reliability of the test if the item is omitted (A-omit), the proportion correct (p), the difficulty estimate (logit) and its standard error (SEM), the infit and outfit for each item, the percentages choosing each alternative, the proportion making multiple responses (MR), and the proportion not responding to the item at all (omit) have all been presented.
The latter indices often indicate 520 Vietnam Reading and Mathematics Assessment Study Table 3.1: The Item Calibration Reading Teacher r-tot A omit Mean sd p logit sem infit outfit a b c d mr omit TCHRD01 0.2453 0.7623 0.9437 0.2305 0.94 -0.95 0.05 0.99 0.73 2.3 94.4 0.5 2.6 0.2 0.1 TCHRD02 0.1956 0.7634 0.8638 0.3431 0.86 0.09 0.04 1.04 1 0.3 12.6 86.4 0.3 0.2 0.2 TCHRD03 0.2722 0.7604 0.8532 0.354 0.85 0.19 0.03 0.99 0.87 2.8 3.6 6.6 85.3 1.4 0.2 TCHRD04 0.2516 0.7613 0.8671 0.3395 0.86 0.06 0.04 1 0.88 11.4 0.6 86.7 0.8 0.2 0.2 TCHRD05 0.1369 0.7651 0.9649 0.1841 0.96 -1.46 0.07 1.04 0.95 1.8 0.8 0.5 96.5 0.1 0.2 TCHRD06 0.1756 0.7649 0.9868 0.1143 0.98 -2.5 0.11 0.99 0.65 98.7 0.7 0.1 0.4 0 0.1 TCHRD07 0.2207 0.764 0.9811 0.1363 0.98 -2.12 0.09 0.98 0.64 0.7 0.3 98.1 0.7 0 0.2 TCHRD08 0.1596 0.7651 0.9868 0.1143 0.98 -2.5 0.11 0.99 0.76 0.3 0.4 0.4 98.7 0.1 0.1 TCHRD09 0.0918 0.7687 0.7671 0.4227 0.76 0.81 0.03 1.14 1.22 0.5 76.7 21.6 0.6 0.4 0.2 TCHRD10 0.1585 0.7645 0.9452 0.2275 0.94 -0.98 0.05 1.03 1.09 1 0.6 3.1 94.5 0.1 0.7 TCHRD11 0.3021 0.7597 0.893 0.3091 0.89 -0.2 0.04 0.96 0.83 89.3 1.4 1.1 7.5 0 0.6 TCHRD12 0.3042 0.7587 0.6792 0.4668 0.68 1.31 0.03 0.97 0.95 3.2 15.3 12.1 67.9 0.2 1.3 TCHRD13 0.1932 0.7645 0.9822 0.1324 0.98 -2.18 0.09 0.99 0.72 0.1 1 98.2 0.6 0.1 0.1 TCHRD14 0.1991 0.7636 0.9537 0.21 0.95 -1.16 0.06 1.01 0.83 3.1 95.4 0.2 0.4 0.8 0.1 TCHRD15 0.2595 0.7609 0.7295 0.4443 0.73 1.04 0.03 1.01 0.98 8.2 1 17 72.9 0.6 0.3 TCHRD16 0.1955 0.7637 0.8115 0.3911 0.81 0.52 0.03 1.05 1.04 16 81.2 1.4 0.8 0.5 0.2 TCHRD17 0.3194 0.7581 0.7788 0.4151 0.78 0.74 0.03 0.96 0.87 18.7 1 1.5 77.9 0 0.9 TCHRD18 0.1965 0.7635 0.8402 0.3664 0.84 0.3 0.03 1.04 1.07 2.9 84 1.4 10.4 0.7 0.7 TCHRD19 0.1925 0.7648 0.6053 0.4888 0.60 1.68 0.03 1.06 1.09 1.7 60.5 3.3 33.5 0.4 0.5 TCHRD20 0.2818 0.7599 0.796 0.403 0.79 0.63 0.03 0.99 0.91 3.2 0.4 16 79.6 0 0.7 TCHRD21 0.3406 0.7568 0.7157 0.4511 0.71 1.12 0.03 0.94 0.88 25.6 0.7 0.5 71.6 0.6 1.1 TCHRD22 0.3758 0.7552 0.7499 0.4331 0.75 0.92 0.03 0.91 0.82 9.7 75 113.1 1 0.1 1.1 TCHRD23 0.3463 0.7567 0.7627 0.4254 0.76 0.84 0.03 0.94 0.86 76.3 0.8 13.7 7.9 0.1 1.2 TCHRD24 0.2834 0.7598 0.6638 0.4724 0.66 1.39 0.03 0.99 0.97 18.4 6.2 66.4 6.4 0.1 2.5 TCHRD25 0.3007 0.76 0.9083 0.2886 0.90 -0.39 0.04 0.94 0.88 2.5 90.8 1.7 2.8 0 2.2 TCHRD26 0.2081 0.7635 0.7235 0.4473 0.72 1.07 0.03 1.05 1.06 2 2.4 72.3 21.3 0 1.9 TCHRD27 0.3136 0.7614 0.9631 0.1886 0.96 -1.41 0.07 0.91 0.68 96.3 0 1.4 0.8 0.2 1.3 TCHRD28 0.2542 0.7611 0.782 0.4129 0.78 0.72 0.03 1.01 0.98 78.2 13.3 5.7 1 0.2 1.6 TCHRD29 0.2908 0.763 0.9815 0.1349 0.98 -2.14 0.09 0.91 0.52 0.8 98.1 0.2 0.2 0.1 0.5 TCHRD30 0.2786 0.7625 0.9713 0.167 0.97 -1.68 0.07 0.93 0.71 1.6 97.1 0.2 0.1 0.1 0.8 TCHRD31 0.317 0.7603 0.9386 0.2401 0.93 -0.85 0.05 0.92 0.81 1.9 1.8 93.9 1.2 0.6 0.7 TCHRD32 0.3582 0.7589 0.931 0.2534 0.93 -0.72 0.05 0.9 0.67 2 0.5 3.3 93.1 0 1.1 TCHRD33 0.2521 0.7613 0.8716 0.3346 0.87 0.02 0.04 0.99 1.02 87.2 9 0.1 2.8 0 0.8 TCHRD34 0.2875 0.7599 0.8619 0.345 0.86 0.11 0.04 0.97 0.9 86.2 2.6 5.7 4.1 0.3 1 TCHRD35 0.1878 0.7652 0.4196 0.4935 0.42 2.54 0.03 1.06 1.08 28.5 20.2 7.3 42 0.1 2 TCHRD36 0.229 0.7629 0.3948 0.4888 0.39 2.66 0.03 1.01 1.05 52 39.5 1.1 5.9 0.2 1.3 TCHRD37 0.1293 0.7659 0.8646 0.3422 0.86 0.09 0.04 1.07 1.48 5.4 6.2 0.3 86.5 0.1 1.5 TCHRD38 0.3756 0.7614 0.9783 0.1458 0.97 -1.97 0.08 0.86 0.42 97.8 0.9 0.1 0.1 0 1.1 TCHRD39 0.1478 0.7657 0.8167 0.387 0.81 0.48 0.03 1.08 1.22 81.7 1.5 0.5 14.7 0.3 1.3 TCHRD40 0.3377 0.7597 0.9384 0.2404 0.93 -0.84 0.05 
0.91 0.73 93.8 0.9 0.9 2.8 0 1.6 TCHRD41 0.3193 0.7608 0.9544 0.2085 0.95 -1.18 0.06 0.91 0.72 0.4 95.4 0.5 2.2 0 1.5 TCHRD42 0.0021 0.7752 0.4539 0.4979 0.45 2.38 0.03 1.25 1.34 22 45.4 5.6 24.4 1 1.5 TCHRD43 0.3678 0.7567 0.8569 0.3502 0.85 0.16 0.04 0.92 0.78 4.8 85.7 1.1 6.2 0.3 2 TCHRD44 0.1953 0.7643 0.6918 0.4618 0.69 1.24 0.03 1.07 1.11 3.6 1.2 23.6 69.2 0.5 1.9 TCHRD45 0.2461 0.762 0.5102 0.4999 0.51 2.12 0.03 1.01 1.01 9.1 51 6.6 31 0.1 2.2 0.77 36.77 confusion associated with an item. Omission behaviour tends to increase towards the end of the test but the multiple response index did not appear to reach any value that identifies any specific item as confusing. Teachers were also assessed in reading and mathematics. The teacher and pupil tests used different sets of items and the teacher test was designed to be more difficult than the pupil test, but a sub set of items was developed that were common to both teacher and pupil tests. These common items enabled all the items as well as the pupils and teachers to be linked and mapped onto a common scale even though completely different groups of people (teachers 521 Vietnam Reading and Mathematics Assessment Study 5.0 XXXX XXXXXXXXXX 4.0 XXXXXXXXXXXXXX XXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXX 36 3.0 35 XXXXXXXXXXXXXXXXXXX 42 XXXXXXXXXXXXXXXXXXXX 45 XXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXX XXXXXXXXXXXXX 19 2.0 XXXXXXXXXXX XXXXXXXXX 24 XXXXXXX 12 44 XXXXX 15 21 26 XXXX 22 XXX 9 17 23 1.0 XXX 3 20 28 XXX 16 39 X 18 X 43 32 34 37 X 1 11 0.0 25 4 33 31 40 10 14 30 41 -1.0 5 27 2 29 38 7 13 -2.0 6 8 Figure 3.1: Variable map of the teacher reading test 522 Vietnam Reading and Mathematics Assessment Study Table 3.2: Teacher Math Test Items R-tot A omit p Logit sem Infit outfit a b c d mr omit TCHMA01 0.1553 0.8521 0.99 -2.65 0.12 1.03 1.29 0.10 0.3 0.1 98.9 0 0.6 TCHMA02 0.1423 0.8522 0.99 -3.01 0.14 1.02 1.10 0.00 0.4 0.1 99.1 0 0.4 TCHMA03 0.2564 0.8506 0.91 -0.39 0.05 1.07 1.03 1.80 3.1 2 91.6 0 1.5 TCHMA04 0.1755 0.8519 0.98 -2.45 0.11 1.02 0.89 98.6 0.2 0.1 0.2 0.5 0.4 TCHMA05 0.1632 0.8521 0.99 -2.69 0.12 1.01 1.38 0.40 0.1 98.9 0.2 0 0.4 TCHMA06 0.2223 0.8511 0.94 -0.90 0.06 1.07 0.97 1.40 1.4 94.5 2 0 0.7 TCHMA07 0.2732 0.8504 0.94 -0.76 0.05 1.03 0.93 2.50 1 93.8 2.1 0 0.6 TCHMA08 0.3839 0.8478 0.73 1.26 0.03 0.98 0.91 12.1 5.2 72.8 8.3 0 1.5 TCHMA09 0.3166 0.85 0.95 -1.18 0.06 0.97 0.70 2.20 0.4 0.3 95.6 0.2 1.3 TCHMA10 0.2638 0.8508 0.79 0.83 0.03 1.11 1.15 10.3 1.2 7.1 79.3 0 2.2 TCHMA11 0.2059 0.8516 0.98 -1.97 0.09 1.00 1.11 0.10 97.8 0.6 0.5 0 1 TCHMA12 0.2589 0.8511 0.98 -2.00 0.09 0.98 0.80 0.10 1.3 97.9 0.2 0 0.5 TCHMA13 0.1739 0.852 0.99 -2.75 0.12 1.00 1.15 0.20 0 98.9 0.2 0 0.7 TCHMA14 0.2346 0.8512 0.97 -1.75 0.08 1.00 0.93 0.60 1.5 97.4 0.1 0 0.3 TCHMA15 0.403 0.848 0.91 -0.32 0.05 0.93 0.74 2.80 91.1 4.3 0.7 0.1 1 TCHMA16 0.3852 0.8478 0.75 1.10 0.03 0.98 0.92 6.50 75.3 6.1 10.2 0.2 1.8 TCHMA17 0.2466 0.8522 0.57 2.11 0.03 1.11 1.15 6.10 57.3 15.5 16.3 0 4.7 TCHMA18 0.2868 0.85 0.89 -0.04 0.04 1.05 1.07 2.20 88.9 2.9 3.7 1.5 0.8 TCHMA19 0.3427 0.8491 0.65 1.68 0.03 1.01 0.97 27.7 65.5 2.1 3.7 0.1 0.9 TCHMA20 0.3697 0.8485 0.90 -0.18 0.04 0.96 0.88 3.00 90.1 1.7 1.7 0 3.5 TCHMA21 0.185 0.8517 0.97 -1.54 0.07 1.06 1.31 0.80 0.8 96.8 1.2 0 0.4 TCHMA22 0.3904 0.8476 0.76 1.01 0.03 0.98 0.89 18.4 76.7 2.8 0.8 0 1.3 TCHMA23 0.3422 0.8489 0.77 0.99 0.03 1.03 1.02 77 2.9 4.8 11 0.2 4.3 TCHMA24 0.2809 0.8502 0.92 -0.47 0.05 1.04 0.94 2.4 92.1 0.2 4.3 0 0.9 TCHMA25 0.3388 0.849 0.77 0.95 0.03 1.03 0.99 13 0.8 77.6 7.4 0.2 1.1 TCHMA26 
0.4325 0.8465 0.78 0.93 0.03 0.93 0.80 2.6 12.6 77.8 2.7 0 4.3 TCHMA27 0.3129 0.8495 0.81 0.68 0.03 1.05 1.04 81.3 1.2 15.3 0.6 0 1.7 TCHMA28 0.3955 0.8478 0.87 0.14 0.04 0.95 0.81 7.1 87.2 1.4 1 0 3.2 TCHMA29 0.3869 0.8486 0.93 -0.73 0.05 0.92 0.68 1.4 93.6 0.5 2.8 0 1.7 TCHMA30 0.3134 0.8503 0.96 -1.49 0.07 0.93 0.84 0.3 0.3 1 96.7 0.5 1.3 TCHMA31 0.4677 0.8456 0.77 0.94 0.03 0.90 0.81 0.5 10.2 77.7 8.6 0.1 3 TCHMA32 -0.0546 0.858 0.68 1.61 0.03 1.39 2.27 3.8 17.8 3.8 66.5 0.3 7.7 TCHMA33 0.3753 0.8501 0.98 -1.99 0.09 0.88 0.34 97.9 0.2 0.4 0.3 0.1 1.3 TCHMA34 0.1472 0.8548 0.65 1.70 0.03 1.25 1.31 30.1 65.2 1.5 1.8 0.1 1.4 TCHMA35 0.299 0.8504 0.64 1.72 0.03 1.09 1.08 2.8 24.5 64.7 4.8 0 3.2 TCHMA36 0.5037 0.8453 0.85 0.30 0.04 0.85 0.64 3.9 2.4 85.7 2.2 0 5.8 TCHMA37 0.5023 0.8444 0.69 1.45 0.03 0.85 0.76 1.2 14.3 69.6 9.5 0 5.4 TCHMA38 0.464 0.8462 0.85 0.32 0.04 0.89 0.71 6.2 85.5 0.6 4.4 0 3.3 TCHMA39 0.3918 0.8477 0.63 1.79 0.03 0.96 0.92 1.2 31.7 63.5 0.7 0 3 TCHMA40 0.3354 0.849 0.87 0.14 0.04 1.00 1.02 1.7 2.8 5.4 87.3 0 2.8 TCHMA41 0.4809 0.8457 0.84 0.42 0.04 0.88 0.7 1.9 5.8 84.4 2.3 0 5.6 TCHMA42 0.4195 0.8469 0.80 0.76 0.03 0.94 0.89 5.2 1.6 80.1 5 0 8.1 TCHMA43 0.2862 0.8505 0.73 1.25 0.03 1.10 1.15 1 1 72.9 17.5 1.8 6.7 TCHMA44 0.494 0.8451 0.81 0.69 0.03 0.87 0.71 2.9 6 81.2 3.7 0 6.3 TCHMA45 0.4528 0.8458 0.62 1.83 0.03 0.89 0.87 122.3 62.7 12.3 7.8 0.1 4.8 0.85 37.25 0.99 0.97 0.13 2.52 and pupils) completed the different tests. Each of the teacher and pupil tests were linked within mathematics and reading using a set of common items. The common linking items have been presented in Table 3.3. The codes in parenthesis are the sequence numbers of the items in the data file and the item identification in the variable maps reported in this chapter. 523 Vietnam Reading and Mathematics Assessment Study 14 12 10 8 Series1 6 4 2 0 >95 >90 >85 >80 >75 >70 >65 >60 <59 Figure 3.2: Distribution of item difficulty Table 3.3: Common linking items between mathematics and reading tests Mathematics Reading Pupil Teacher Pupil Teacher 45 (45) 3 1 (61) 1 55 (55) 7 2 (62) 2 48 (48) 12 3 (63) 3 56 (56) 34 4 (64) 4 58 (58) 40 5 (65) 5 6 (66) 29 7 (67) 30 8 (68) 31 9 (69) 32 10 (70) 33 Equating pupils and teachers Mapping all the items and people when they have not all completed the same test onto the same scale is called equating. In this case it enabled the scores on the different tests to be matched and interpreted on the same scale by bringing different measures of the same construct into alignment. It enabled 524 Vietnam Reading and Mathematics Assessment Study XXXXXXXX 5.0 XXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXX 4.0 XXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX 3.0 XXXXXXXXXXXXXX XXXXXXXXXXXX 62 XXXXXXXXXX XXXXXXXX 80 84 90 XXXXXXX 64 2.0 XXXXXX 82 XXXXXXXX 53 88 XXX 61 XXX 67 68 70 71 76 XX 55 87 XXXX 52 72 79 89 X 86 1.0 X 81 83 XX 73 X 48 63 X 57 65 85 X 60 69 0.0 X 74 51 54 66 75 -1.0 59 56 78 49 -2.0 46 50 58 47 Figure 3.3: The variable map for the teacher mathematics test all the items in each item pool, and the pupils and teachers who attempt them, to be described with the same units of measurement. Their performances, on the different sub-sets of items, were mapped onto the same continuum. Test 525 Vietnam Reading and Mathematics Assessment Study equating made it possible to convert the scores of the teacher test into the scores of the pupil test and vice versa so that the interpretation of the scores on the two tests is independent of which test is used. 
For this to happen both tests must have been able to be mapped onto, and be shown to measure, the same variable. The pupils' performances could then be interpreted in the same substantive framework, or levels on a construct, as the teacher performance regardless of the sub-set of items or different tests they completed. This is an important property of test equating and item response modelling is the process that makes it possible. An advantage of the Rasch model is that it simplifies the process. Equivalence Two tests are equivalent if the performance or scores can be directly translated from one to the other so that the choice of test is independent of the performance to be measured. If the same variable underpins both tests, it is a matter of indifference which one is used to obtain a measure of a person's position on the variable. Different sets of items can be used to assess different groups of people. Calibration is the process of fixing the position of the test items on the variable (fixing the difficulty estimates) so that we can understand and use both the position and the accuracy of the estimates. Item response model (IRM) (specifically Rasch model) equating methods can be used to establish equivalence of tests and subsets used with a range of groups. IRM equating methods are preferred to classical test model equating methods because they (a) give item statistics that are not group dependent, (b) yield scores describing examinee proficiency that are independent of test difficulty, and (c) do not require strictly parallel tests for establishing equivalence (Hambleton & Swaminathan, 1985; Lord, 1980; Wright & Stone, 1979). In this project the teacher tests and pupil tests were not parallel and were deliberately constructed to be quite different in difficulty level. Calibration In calibrating a measuring instrument designed to measure person ability, a series of items are used and each item is expected to make specific demands on the person responding to the item. The objective is always to compare the capacity of the person relative to the demands of the item. The comparison between the demands of the task and the capacity of the person is the focus of the calibration. This comparison enables the likelihood of success of a person on an item to be estimated. Calibrating items is a process that estimates the difficulty of each item and 526 Vietnam Reading and Mathematics Assessment Study establishes the test's accuracy as a measuring device. Once any sub set of items was calibrated their relative difficulty measures could be used to estimate the difficulties of others drawn from the same item pool. This was because the estimates were used as fixed estimates of item difficulty to compare other items or groups of persons to them. They became an anchored set of estimates. This is an important step because it was important to know both the level of difficulty and the accuracy of measurements when decisions are to be based on information they provide. When the items are calibrated and all other items compared to this set it is called anchoring. It is possible to establish either an anchored set of persons or items for the estimation of other person or item parameters. This is done in order to establish the difficulty of new items or the ability of new groups of persons relative to the estimates of those we have established in previous calibrations. It is possible to either anchor on (fix the estimates of) persons or items. 
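A minimal sketch of this anchoring arithmetic is given below: the difficulties of a common item set are fixed from one calibration, the freely estimated difficulties of the same items from a second calibration are compared with them, and the mean difference is applied as a translation constant to all estimates on the second scale. The item labels loosely echo the common pupil mathematics items, but all logit values are invented for illustration.

```python
# Illustrative sketch of common-item anchoring (not the project's Quest runs):
# the anchor set keeps its difficulties from calibration A, and calibration B is
# shifted so that the common items have the same mean difficulty on both scales.

def translation_constant(anchor_a, anchor_b):
    """anchor_a, anchor_b: dicts of item_id -> logit difficulty from two runs."""
    common = sorted(set(anchor_a) & set(anchor_b))
    if not common:
        raise ValueError("no common items to link the calibrations")
    diffs = [anchor_a[i] - anchor_b[i] for i in common]
    return sum(diffs) / len(diffs)          # add this to the scale-B estimates

def rescale(estimates, shift):
    """Apply the shift to any scale-B logit estimates (items or persons)."""
    return {k: v + shift for k, v in estimates.items()}

if __name__ == "__main__":
    # hypothetical logits for five common items under the two calibrations
    cal_a = {"M45": 0.66, "M48": 0.52, "M55": 1.32, "M56": 1.21, "M58": 0.55}
    cal_b = {"M45": 0.10, "M48": -0.05, "M55": 0.70, "M56": 0.63, "M58": 0.02}
    shift = translation_constant(cal_a, cal_b)
    print(f"translation constant: {shift:.2f} logits")
    pupils_b = {"pupil_1": -0.40, "pupil_2": 1.10}   # invented person estimates
    print(rescale(pupils_b, shift))
```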
In this project the sub set of ten reading comprehension items that were common to the teacher and pupil tests were used as an anchor set to link the teacher and pupil tests and the five common mathematics items were used as an anchor to map the teacher and pupil mathematics tests onto the same scale. Equating tests can be achieved in several ways but the end result allowed an estimation of how much to 'shift' or 'translate' the scores on one test to bring the difference in ability and difficulty measures into alignment with those of another test. The usual practice is to set the average difficulty of items on one test to zero. Then, if the average of another test set of items is shown to be slightly higher (say 1.0), then we need to shift the person ability estimates on the second test by that amount. Links between tests containing identical sets of items are called a 'common item equating design'. This process uses the calibration of items with one sample of two sub groups, pupils and teachers. The pupil sub group took the common items and some others unique to the pupils. The other, teachers, group answered the common items and all remaining items to establish the relative difficulties of all items and then the relative ability of all the pupils and teachers, regardless of which item set was answered. Linkages were established using the item sets that are common across each of systems - the common items. The end result is a type of symmetry in that it is a matter of indifference which item set a pupil or teacher may have completed, the estimate of ability would have been the same for that pupil or teacher. Symmetry of test equating means that once equivalence between tests has been established, it does not matter subsequently which one is used as a base. To link different samples of people a common sets of items or a common test is needed. This procedure is called common-item equating. The same result can be achieved through the use of one sub set of items for which the calibrated item difficulties are held constant (anchored) and all other sets are brought into alignment with this 'anchor set'. 527 Vietnam Reading and Mathematics Assessment Study Another method of calibrating and equating is through the simultaneous calibration of all items using a computer program (such as RUMM, QUEST, CONQUEST or WINSTEPS) that allows items not taken by sets of pupils or teachers to be treated as missing data. This is called concurrent equating and is the equivalent of every possible set of anchor items and persons. An issue of concern, however, with concurrent equating is that the effect of missing blocks of data on the estimates and on the errors of those estimates is unknown and the experience of this project is that different estimation procedures can affect the difficulty estimates of the items. In 2002, work in this project involving comparisons of RUMM (Andrich, 2001), Quest (Adams and Koo, 1995) and Conquest (Wu, Adams and Wilson, 1998) showed that concurrent equating was particularly sensitive to misfitting persons and items. The recommendation is that both must be omitted from the calibration phase of the data and then re-entered into the data file for scoring. Adams and Khoo's Quest program (1995) was used to equate the tests after setting all missing responses to zero and treating these as incorrect. 
This was considered appropriate as trials of the test identified an appropriate amount of time for the test administration and data collectors to check all tests and answers sheets for missing data and to follow up. Under these circumstances, it was assumed that pupils and teachers refrained from responding to an item because they were unsure or could not do the item. All missing responses were therefore coded as incorrect. Under these circumstances, there was no necessity to extract, for calibration purposes, a subset of pupils with complete data. Combined Reading Tests When the pupil and teacher data were combined for analysis the combined data set contained 79845 persons and 95 items. Throughout these analyses the probability of success was fixed at 0.5 because of the curriculum implications. The overall item difficulty was 0.0 with a standard deviation of 1.32. and a separation reliability index of 1.00. The mean INFIT was 0.98 and outfit 0.94 with standard deviations of 0.08 and 0.18 respectively. These data provide support for the assumption that the teacher and pupil tests were measuring the same construct which was defined in Chapter 2. The mean ability of the combined group was 1.28,with a standard deviation of 1.16 logits indicating that the addition of the teacher group raised the mean ability level and increased the dispersion as would be expected. The separation reliability was 0.90 which also indicates that the addition of teachers to the sample increases the separation of persons on the variable as might be expected. The mean person infit was 1.00 and the mean person outfit was 0.98 with standard deviation of 0.17 and 0.30 respectively. There were 162 persons with perfect scores. In Figure 3.4 the items numbered from 61 to 120 represent the pupil reading items and the items numbered from 121 to 155 represent the teacher reading test items. 528 Vietnam Reading and Mathematics Assessment Study It was previously shown (see Chapter 2) that the variable map and the item difficulty levels could be interpreted as levels of competence using the skills audit of the items and the relative difficulty of groups of items. The same six levels of reading competence were used and both teachers and pupils were mapped against the set of levels. The results of these analyses have been shown in Figures 3.5 and 3.6. Given that the mathematics competence level for independent learning was set at about level 5, there is evidence that approximately two percent of teachers may be below this level of competence. While this is a low percentage, this has serious implications for the education of Grade 5 pupils in those regions. If this figure were to be generalised to a national base, the number of pupils affected by this phenomenon would be large. 
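The banding of scaled scores into competence levels, and the calculation of the proportion of a group below the independent-learning benchmark, can be sketched as follows. The cut-points and teacher scores used here are hypothetical placeholders, not the skill-audit cut-points from the study.

```python
# Illustrative sketch: banding logit scores into six competence levels and
# counting the proportion of a group below a benchmark level. The cut-points
# below are hypothetical placeholders, not the study's skill-audit boundaries.
from bisect import bisect_right

CUTS = [-1.5, -0.5, 0.5, 1.5, 2.5]      # boundaries between levels 1..6 (invented)

def level(logit):
    """Return a competence level from 1 to 6 for a logit score."""
    return bisect_right(CUTS, logit) + 1

def percent_below(scores, benchmark_level):
    below = sum(1 for s in scores if level(s) < benchmark_level)
    return 100.0 * below / len(scores)

if __name__ == "__main__":
    teacher_logits = [3.1, 2.4, 4.0, 1.9, 2.8, 0.4, 3.3]   # invented values
    print(f"{percent_below(teacher_logits, benchmark_level=5):.1f}% below level 5")
```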
Figure 3.4: Variable map of the joint calibration of the pupil and teacher reading tests

Figure 3.5: Pupil and teacher levels on the reading test

Figure 3.6: Scaled score distribution for teachers and pupils

Score distributions for teachers and pupils show that teacher reading scores are considerably higher than those for pupils. The overlap, however, is of concern: there were many teachers with reading levels lower than those of some pupils.

When the data were combined for analysis using an anchoring procedure, the combined data set for the calibration of the mathematics test contained 79,845 persons and 100 items. Throughout these analyses the probability of success was fixed at 0.5 because of the later curriculum implications, rather than the achievement focus. The overall item difficulty was 0.0, with a standard deviation of 1.47 and a separation reliability index of 1.00. The mean INFIT was 0.99 and the mean OUTFIT 1.00, with standard deviations of 0.09 and 0.21 respectively. These data support the assumption that the teacher and pupil tests were measuring the same construct, as defined in the previous tables and charts. The mean ability of the combined group was 0.97, with a standard deviation of 1.25 logits, indicating that the addition of the teacher group raised the mean ability level and increased the dispersion, as would be expected. The separation reliability was 0.91, which also indicates that the addition of teachers to the sample increased the separation of persons on the variable. The mean person infit was 0.99 and the mean person outfit was 1.03, with standard deviations of 0.17 and 0.68 respectively. There were 397 persons with perfect scores. In Figure 3.7, item numbers from 1 to 60 represent the pupil items and items 156 to 195 represent the unique teacher items. The common data file treated the teacher items as the same variables as the pupil items and created blocks of missing data for all items that were not administered to the pupils and for those not administered to the teachers. Items that were administered but not responded to were coded as incorrect, as were those with multiple responses.
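The INFIT and OUTFIT mean squares quoted above can be computed as sketched below for a dichotomous Rasch model, given person abilities, item difficulties and a 0/1 response matrix with missing entries. All values in the example are invented.

```python
# Illustrative sketch: item INFIT and OUTFIT mean squares for a dichotomous
# Rasch model, given person abilities (theta), item difficulties (b) and a
# 0/1 response matrix in which None marks a missing pupil-item interaction.
from math import exp

def p_correct(theta, b):
    return 1.0 / (1.0 + exp(-(theta - b)))

def item_fit(responses, thetas, difficulties):
    """responses[n][i] is 0, 1 or None; returns (infit, outfit) per item."""
    fits = []
    for i, b in enumerate(difficulties):
        sq_resid, info, std_sq, count = 0.0, 0.0, 0.0, 0
        for theta, row in zip(thetas, responses):
            x = row[i]
            if x is None:
                continue
            p = p_correct(theta, b)
            w = p * (1.0 - p)                  # information for this encounter
            sq_resid += (x - p) ** 2           # squared residual
            info += w
            std_sq += (x - p) ** 2 / w         # standardised squared residual
            count += 1
        infit = sq_resid / info                # information-weighted mean square
        outfit = std_sq / count                # unweighted mean square
        fits.append((infit, outfit))
    return fits

if __name__ == "__main__":
    thetas = [1.2, 0.3, -0.6, 2.0]                         # invented abilities
    difficulties = [-0.5, 0.4, 1.1]                        # invented difficulties
    responses = [[1, 1, 0], [1, 0, None], [0, 0, 0], [1, 1, 1]]
    for i, (inf, out) in enumerate(item_fit(responses, thetas, difficulties), 1):
        print(f"item {i}: infit={inf:.2f} outfit={out:.2f}")
```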
The results of the analyses have been presented as a variable map in Figure 3.7.

Figure 3.7: Variable map for the combined teacher and pupil mathematics tests

It was previously shown (see Chapter 2) that the variable map and the item difficulty levels could be interpreted as levels of competence using the skills audit of the items and the relative difficulty of groups of items. The same six levels of competence were used, and both teachers and pupils were mapped against this set of levels. The results of these analyses have been given in Figures 3.8 and 3.9. They revealed some interesting and disturbing overlap between the mathematics competence levels of teachers and pupils. Given that the mathematics competence level for independent learning was set at about Level 5, as in reading, there is evidence that approximately one percent of teachers may be below this level of competence. While this is a low percentage, it has serious implications for the education of Grade 5 pupils in the regions concerned. If this figure were generalised to a national base, the number of pupils affected would be large.

Figure 3.8: Comparison of teacher and pupil levels of mathematics competence

Figure 3.9: Comparison of mathematics score distributions for pupils and teachers

A similar pattern could be observed when scaled scores were mapped onto the same scale for pupils and teachers.

International Benchmarking

The teacher reading tests used in Vietnam contained items in common with the Population B tests in the IEA Reading Literacy Study. The mathematics tests did not contain any items in common with international tests, but several items were modelled on international testing program items and could be considered parallel items. It was through these links and parallels that the Vietnam pupil performances could be linked to international studies and placed in an international context.

Two tests can be said to be equated if the system of scores of one test can be converted to the system of scores of the other, and vice versa; that is, if they can be mapped onto, and interpreted in terms of, the same variable. In order to link the Vietnamese test items to relevant international tests, an anchoring procedure was used, linking the tests via the common items in the case of the teacher and pupil reading tests. The only requirement for the equating was that there were sufficient links between all items, pupils and teachers, without necessarily requiring every pupil or teacher to answer every item on every test.
The linkage requirements had to make it possible for the responses to every item to be compared directly and unambiguously with the responses to every other item. This was possible with the reading tests, but not with the mathematics tests; for the latter, a system of pairwise comparisons was used to establish the relative difficulty of the items, followed by an anchoring procedure. This requirement for complete linkage, or 'connectedness' as Linacre (1986) calls it, meant that a system of pairwise comparisons (Andrich, 1996) could establish the relative difficulties of a set of items. Once this was determined, the set could be linked to the total pool of items through a process of common item equating. When this condition was met, it was possible to estimate the item difficulties independently of the item set, context or group being assessed, and to link the Vietnam tests (and hence the samples of pupils and teachers) to international studies such as TIMSS. Because the mathematics test had no items in common with the TIMSS tests, a different procedure was used for the links to TIMSS.
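A simple way to check this kind of connectedness is to treat items and respondent groups as nodes of a graph, joined whenever a group answered an item, and to verify that the graph forms a single connected component. The sketch below uses a hypothetical design; it is an illustration of the requirement, not the procedure used in the project.

```python
# Illustrative sketch of a 'connectedness' check for a linking design: items and
# respondent groups form a graph with an edge wherever a group answered an item;
# the design supports equating only if the whole graph is one connected component.
from collections import defaultdict, deque

def is_connected(design):
    """design: dict group_name -> set of item ids answered by that group."""
    graph = defaultdict(set)
    for group, items in design.items():
        for item in items:
            graph[group].add(item)
            graph[item].add(group)
    nodes = set(graph)
    if not nodes:
        return True
    seen, queue = set(), deque([next(iter(nodes))])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph[node] - seen)
    return seen == nodes

if __name__ == "__main__":
    # hypothetical design: pupils and teachers share items 61-70 as a link set
    design = {"pupils":   set(range(1, 121)),
              "teachers": set(range(61, 71)) | set(range(121, 156))}
    print("connected" if is_connected(design) else "NOT connected")
```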
A series of comparisons of items was used to establish relative difficulties of the items, and the procedure involved a panel of judges (n=20) ranking a set of items (k=10) from the hardest to the most difficult and then converting this set of rankings to a pairwise comparison of preferences using a simple computer algorithm. Ranking the items implied that every 535 Vietnam Reading and Mathematics Assessment Study item was compared to every other item. The proportion judging each item to be easier was used to estimate of the probability of success. The difficulty of the first item in the set was set to zero and each tem was compared in turn to that item. Then the difficulty of item 2 was set to zero an each item was compared to it. The matrix of relative difficulties was then used to order and then standardise the estimates of item difficulty with a mean (or sum) of zero and a standard deviation of 1 logit. This was done to ensure that distributions of logits was consistent with those from the calibration data set. Each set of estimates was transformed using this process to ensure that the logit distributions were equivalent. The archived values of item estimates were then obtained for the TIMSS items in the item set. Three sets of item difficulty estimates were obtained. These were the TIMSS estimates, the pairwise estimates and the calibrated Vietnamese mathematics (VM) test item difficulties. Both the Vietnamese and TIMSS item estimates were also compared using a common metric with a mean of 500 and a standard deviation of 100. Each of the calibrated estimates (TIMSS and VM) were regressed onto the pairwise estimates as the anchor set. Codes used for the items reflect the archiving codes for used for the TIMSS items and the VM item numbers on the Vietnam test. The pairwise comparisons item parameter estimates have been presented in the column headed PAIR in Table 3.4. IEA500 VIET500 PAIR VIETLOGIT VIETLOGIT IEAK2 429 1.87 1.23 IEAM5 623 0.21 0.24 IEAK5 450 -0.73 -0.44 IEAD12 597 -0.22 0.23 IEAJ7 547 -0.65 -0.40 VPM23 514 -0.79 0.18 -0.20 VPM45 547 1.05 0.62 0.79 VPM58 539 0.81 0.51 0.71 VPM57 523 0.55 0.31 0.71 VPM48 536 1.01 0.47 0.84 Table 3.4: Approximations of item estimates after pairwise comparisons The pairwise estimates were regressed onto each of the calibrated values for the TIMSS and Vietnamese test item parameters. The regression models for each of these were then combined to form the relationship between the IEA scale and the Vietnamese Math Grade 5 scale. For average Grade 5 pupils in Vietnam the equivalent score on the IEA Grade 4 (Population A) scale would be 570. 536 Vietnam Reading and Mathematics Assessment Study Ai= 72.2Vi +570 where the Vi represents the scaled VM item, i, difficulty estimate and Ai represents the value on the IEA 500 mean scale. The relationships between the pairwise and the calibrated item estimates are different. If they had been measures of the same variable, the coefficient of the pairwise estimates (P) would be identical across the two procedures. This suggests that the Vietnamese tests and the TIMSS tests were not directly comparable. However given the interest in the comparison the somewhat tenuous procedure was used. By assuming identical relationship based on the mean pairwise difficulty estimates, (setting this at zero), the translation constant would be 74, or 0.74 standard deviations. 
A shift of 0.74 standard deviations is substantial: translated into ability estimates in mathematics, it would place the Vietnamese Grade 5 pupils well ahead of the Grade 3 and 4 pupils and close to the levels of the Grade 5 pupils in the international sample. While the international tests used different parameters for each of the tests, it was nevertheless possible to map them all onto the same underpinning scale, and it is on this scale that they are compared. While this is not surprising, given the grade difference, it provides an indication that Vietnamese Grade 5 pupils are, on average, at about the same level as Grade 4 pupils in Hong Kong (587), the Irish Republic (550), the Czech Republic (567) and the Netherlands (577). However, these comparisons are tenuous at best and should be treated with caution. No direct comparisons to the Vietnamese Grade 5 population were available, and the comparisons presented above were indirect, may well be somewhat unstable, and might not be directly interpretable.

Calibrating an Anchor Set for the Reading Test

For the reading tests there were no items in the pupil test common to the IEA Reading Literacy instruments. This reflected a decision by the Vietnamese test development team to focus on pupil achievement relative to the local curriculum. However, there were common items across the teacher and pupil tests, and the teacher test had items in common with the IEA Reading Literacy Population A and Population B tests. Population A was defined as the pupils who were 9 years of age in the eighth month of the school year; Population B was defined as the pupils who were 14 years of age in the eighth month of the school year. In general terms this would have meant the 4th or 9th grade. Again this is indicative of the lack of a direct comparison, but the links could be established and have been illustrated in Figure 3.10.

Figure 3.10: Links across the IEA and Vietnamese reading tests (diagram showing the links VP to VT, VT to IA, VT to IB and IA to IB between the Vietnamese pupil reading test (VP), the Vietnamese teacher reading test (VT), and the IEA Reading Literacy Population A (IA) and Population B (IB) tests)

In Figure 3.10 the links have been shown between the tests and the populations used in the Vietnamese study and in the IEA Reading Literacy Study. The diagram shows how the Vietnamese teacher reading test (VT) could be linked to the Population A and B reading tests from the IEA Reading Literacy Study (Elley, 1992) and, from this link, how the comparative performance of the Grade 5 Vietnamese pupils could be established. The procedure was as follows:

1. Select a random sample of 800 from each country and combine them into a single file.
2. Calibrate these to identify item difficulty, coding missing and multiple responses as wrong.
3. Identify the difficulty estimates of the link items.
4. Select a similar random sample of teachers from the Vietnamese data (VT) and calibrate all items on this sub-sample.
5. Identify the difficulty parameters of the link items.
6. Plot (regress) the estimates from the IEA tests onto the estimates from the Vietnam teacher test.
7. Check the slope and intercept (translation constant) as in Wright and Stone (1979); a sketch of this check is given after the list.
8. Compute the adjustment to the parameters for the link between the VT reading test and the IEA tests.
9. Make adjustments to the link set for the VT reading test as per the translation.
10. Set up the relevant anchor file for the VT reading file and recalibrate the VT reading test.
11. Set up the anchor file for the calibration of the VP reading test.
12. Write out the VP reading case estimates.
13. Convert to 500/100 scores using the logit mean and SD from the IEA calibration.
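Steps 6 to 9 and 13 of this procedure can be sketched as follows: the IEA difficulty estimates of the link items are regressed onto the Vietnamese estimates, the slope is checked against the expected value of 1.0, the intercept is taken as the translation constant, and person logits are then expressed on the 500/100 reporting scale. All values, including the IEA logit mean and standard deviation, are invented placeholders.

```python
# Illustrative sketch of the link check: an ordinary least-squares line through
# the link-item difficulty estimates gives the slope (ideally close to 1) and a
# translation constant, which is then used to put person logits onto the IEA
# 500/100 reporting scale. All numbers are invented.

def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def to_500_scale(logit, iea_mean_logit, iea_sd_logit):
    """Convert a logit on the IEA scale to the 500/100 reporting metric."""
    return 500.0 + 100.0 * (logit - iea_mean_logit) / iea_sd_logit

if __name__ == "__main__":
    viet_link = [-0.42, 0.10, 0.55, 1.05, 1.60]       # invented link-item logits
    iea_link = [-0.38, 0.20, 0.49, 1.12, 1.55]        # invented IEA logits
    slope, shift = ols(viet_link, iea_link)
    print(f"slope={slope:.2f} (expect ~1), translation constant={shift:.2f} logits")
    # if the unit-slope check holds, only the intercept is applied as the shift
    pupil_logit_iea = 0.95 + shift
    # 0.0 and 1.0 below stand in for the IEA calibration's logit mean and SD
    print(f"pupil score on 500/100 scale: {to_500_scale(pupil_logit_iea, 0.0, 1.0):.0f}")
```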
This procedure provided a basis for comparison of the teacher and pupil test performances within an international context and enabled the Populations A and B of the IEA Reading Literacy Study to be mapped onto a single scale. However, these comparisons should not be regarded as definitive. There are too many differences between the populations in terms of age and maturity, and there has to be some doubt as to whether the pupils and teachers are drawn from the same population for the purposes of Rasch calibration.

In a similar fashion to that shown above, the relationship between the Population B items of the IEA Reading Literacy Study and the Vietnam teacher reading test was computed to be VT = 1.88 IB - 0.42. The quality of the link was then examined by fitting a 95 percent confidence band around the line of best fit for the regression of the IEA item difficulty parameter estimates onto the Vietnam item difficulty parameter estimates. Two pieces of evidence were sought. First, the line of best fit should have a unit slope, indicating that the same relative order of item difficulties was identified regardless of the item pool in which the items were used. Second, the intercept would indicate the relative mean difference in difficulty between the items and thus produce a translation shift to enable the Vietnam teacher ability estimates to be computed on the IEA standard scale (500/100). For this to be achieved it was necessary to show that the items were measuring a consistent construct. The evidence for this was the stability of the rank order of the item difficulty estimates: if this held, the item estimates would all fall within the confidence band around the regression line. Only one item in the link set fell outside the confidence limits, and accordingly the link was assumed to be satisfactory for the purposes of this comparison. The slope of the regression of the IEA calibration onto the Vietnamese calibration, however, again raises doubt that the tests were measuring the same construct, since the expected slope would be 1.0.

Figure 3.11: Comparison of items for the Vietnam teacher test and the IEA Population B test

Accepting the limitations of the procedure, the following international comparisons were obtained. In the initial calibration of the teacher and pupil data sets the mean pupil case estimate was 0.95 logits and the mean teacher reading logit was 2.61 logits. Incorporating these data into Figure 3.12 below, the translation constant would be 1.66 logits. This would yield a mean pupil logit score on the IEA Population B scale of 0.22 logits, which converts to a standardised international scale score of 414. In the IEA study other countries tested Grades 3 and 4; New Zealand tested Grade 5. From the chart it can be seen that the reading performances are comparable to those of several countries, including New Zealand, but are most closely related to the Grade 4 performances of Hong Kong, Ireland and Greece.

Figure 3.12: Link between IEA Population B and Vietnamese Teacher Test

Some general issues concerning the data

There are several difficulties that can arise when calibrating test data sets. The first is the issue of perfect or zero scores. Like many issues, there is no single agreed way to treat the problem. One thing is certain, however.
The data for a pupil with a perfect score must be retained in the data file and not excluded simply because the Rasch model is unable to calibrate that pupil's ability. The same is true of items with zero or perfect scores (items that everyone answers incorrectly or correctly). Pupils who omit items, or who do not reach specific items in the test, can also provide data that may lead to incorrect ability and difficulty estimates. When these pupils provide a correct answer to every item they reach or attempt, the problem of perfect scores arises with a smaller number of items in the scale. These pupils pose a difficulty in estimating their ability if all omitted responses are treated as missing data; the estimate for these pupils is quite different if omitted responses are treated as wrong.

There are several approaches to the treatment of non-response data. Each responds to different observations and makes different assumptions about pupil non-response:

1. treating non-response as missing data and ignoring it in estimating ability;
2. treating non-response as wrong in all circumstances;
3. treating skipped-item non-response as wrong and unreached non-response as missing data;
4. averaging the number of items reached for a given sub-group and then applying treatment 2;
5. dropping all items with a non-response rate above a given level and applying treatment 2;
6. averaging the number of items reached for a given sub-population and then applying treatment 3;
7. dropping all items with a non-response rate above a given level and applying treatment 3.

Each approach was tested in the SACMEQ project (Griffin, 2001) on the assumption that raw scores were expressed as proportions. These were derived either as a percentage score with a base defined by the number of items remaining under each of the treatments above, or as a latent trait score such as that derived through the Rasch model; in the latter case, the raw score or proportion correct was used as sufficient data for the estimation of pupil ability. If only raw scores were used to define achievement or ability, these treatments of non-response behaviour could be evaluated directly.

In using the Rasch model, the person-item interaction becomes the basic source of information for estimating item difficulty and pupil ability. Where evidence of this interaction is not available, specific assumptions need to be made about it and procedures followed to impute the probability of each possible outcome of the interaction, had it occurred. Where the evidence of the interaction is not present (in the form of an answer) it is nevertheless often assumed that the interaction did occur and that the pupil was unable to answer the item; under this assumption, the non-response is scored as incorrect. However, this cannot always be assumed to be the case. It may be that the interaction did not occur at all, and under this assumption the non-response is treated as simple missing data. In some treatments, the response pattern to other adjacent items is used as pseudo or contextual evidence and a response to the interaction is imputed (hot decking), especially using Rasch methodology. Under these assumptions, the ability of the pupil is estimated from the items answered and the difficulty of the item is estimated from pupil responses; the probability of the pupil's correct response is then computed as a pseudo-response and treated as a score for the pupil-item interaction.
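The imputation idea described above can be sketched as follows: where a pupil-item interaction is missing, the Rasch model probability of success, computed from the pupil's ability and the item's difficulty, is used as a pseudo-response. The values below are invented.

```python
# Illustrative sketch of pseudo-response imputation: missing pupil-item
# interactions are replaced by the Rasch model probability of a correct answer
# rather than being scored as 0 or 1.
from math import exp

def rasch_probability(theta, b):
    """Probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def impute_pseudo_responses(row, theta, difficulties):
    """Replace None entries in a response row with model probabilities."""
    return [rasch_probability(theta, b) if x is None else x
            for x, b in zip(row, difficulties)]

if __name__ == "__main__":
    difficulties = [-1.0, 0.0, 0.8, 1.6]     # invented item difficulties
    row = [1, None, 1, None]                 # missing interactions marked None
    print(impute_pseudo_responses(row, theta=0.5, difficulties=difficulties))
```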
Guessing patterns could be imputed in similar ways. Unfortunately, there are no definitive approaches to identifying guessing behaviour at the individual pupil level. For each approach, estimates of guessing behaviour can be compared using a two or three parameter logistic model and an additional criterion can then be applied to the data. In this study, the level of missing data was not high even at the end of each test. Hence it was assumed that the 'not reached' category did not apply to sufficiently large numbers to cause consideration in scoring the test. It was assumed that a pupil or teacher by item interaction had occurred for every item and that the result of this interaction resulted in a response from the pupil. As a result of the previous investigation for SACMEQ, it was decided that, where the response was not recorded and appeared as missing data, it was coded as incorrect. This simplified the procedure and avoided inflation of ability estimates because of missing data effects that occur especially with a paired comparisons estimation procedure. References Adams, R. J. and S. T. Khoo (1995). Quest Interactive Item Analysis Software. Melbourne, ACER. Andrich, D. (2002). RUMM: A Rasch model Analysis Program. Perth, University of Western Australia. Elley, W. B. (1992) How in the World do Students Read? Hamburg: IEA Glaser, R. (1963). "Instructional Technology and the measurement of learning outcomes: some questions." American Psychologist 18: 519-521. Griffin, P. (2000). Vietnamese National Study of Pupil Achievement in Mathematics and Vietnamese. Hanoi, National Institute for Education and Science. Griffin, P. (2001). Performance Assessment of Higher Order Competencies. Annual Conference of the American Education Association, Seattle. Masters, G. (1982). "A Rasch model for partial credit scoring." Psychometrica 47: 149-174. 542 Vietnam Reading and Mathematics Assessment Study Messick S M, Beaton A E, Lord F M 1983 NAEP Reconsidered: A New Design for a New Era. Educational Testing Service, Princeton, New Jersey Wilson, M. (1999). Measurement of Developmental Levels. Advances in Measurement in Educational Research and Assessment. G. N. Masters and J. P. Keeves. New York: Pergamon. Wright, B. and G. Masters (1982). Rating Scale Analysis. Chicago, MESA Press. Wright, B. and M. Stone (1979). Best Test Design. Chicago, MESA Press. Wu, M., R. J. Adams, et al. (1998). ConQuest: generalised Item Response Modelling Software. Melbourne: ACER. 543 Vietnam Reading and Mathematics Assessment Study Chapter 4 SAMPLE DESIGN PROCEDURES: THE VIETNAM GRADE 5 SURVEY T his chapter describes the sample design procedures that were developed for a large-scale survey of the quality of primary education in Vietnam. The data collection for the survey was extensive - involving some 72,660 pupils, 7,178 teachers, and 3,631 school heads in 3,660 schools across 61 Provinces. The main aim of the study was to assess the quality of primary school education in terms of human and material resource inputs to schooling, and the educational achievements of the pupils. The initial data collection for the study was also required to deliver baseline data which could be used as the foundation for later studies that would monitor educational quality at different time points. Some Constraints on Sample Design Sample designs in the field of education are usually prepared amid a network of competing constraints. 
These designs need to adhere to established survey sampling theory and, at the same time, give due recognition to the financial, administrative, and socio-political settings in which they are to be applied. The "best" sample design for a particular project is one that provides levels of sampling accuracy that are acceptable in terms of the main aims of the project, while simultaneously limiting cost, logistic, and procedural demands to manageable levels. The major constraints that were established prior to the preparation of the sample design for the Vietnam Grade 5 Survey have been listed below.

1 This chapter was written by Kenneth N. Ross, Mioko Saito, Stephanie Dolata, and Miyako Ikeda of the International Institute for Educational Planning (UNESCO).

(a) Target Population
The target population was to be concerned with Grade 5 pupils enrolled in full-time mainstream primary education in Vietnam.

(b) Bias Control
The members of the defined target population were to have a known and non-zero probability of selection into the sample, so that any potential for bias in sample estimates due to variations from "epsem sampling" (equal probability of selection method) could be addressed through the use of appropriate sampling weights (Kish, 1965).

(c) Sampling Errors
At the design phase of this study the researchers considered a range of issues related to the precision that would be required of the sample estimates of population parameters. The initial proposal was to undertake a survey that would place specific limits on the sampling accuracy of estimates generated at the national level. These limits were to require sample estimates for pupils to have sampling errors equal to, or smaller than, those of a simple random sample of 400 pupils - that is, an effective sample size of 400 pupils (Ross, 1985). However, the Ministry of Education research team decided that the policy impact of the study would be far greater if specific limits were placed on the sampling accuracy of estimates generated at the provincial level. This decision had major implications for the required total sample size and transformed the survey into one of the largest educational research surveys ever conducted.

(d) Administrative and Financial Costs
The major administrative and financial costs associated with the data collection were linked with the mobilization and extensive training of a field force in each of Vietnam's 61 provinces.

(e) Other Constraints
The instrumentation for the study required pupils and teachers to complete, accurately and independently, a questionnaire and tests of reading and mathematics achievement. In order to ensure maximal validity in the collection of this information from pupils, it was necessary to administer the instruments in "group testing sessions" so as to prevent the possibility of pupils being able to discuss and compare their responses.

The Specification of the Target Population

This report is concerned with two main aspects of the sample design. The first part provides a detailed account of the steps taken to prepare the sample design in a manner that optimised the validity of the data collection and, at the same time, addressed the constraints listed above. The second part provides a description of the methods that were used to calculate sampling errors, together with detailed information about response rates by strata (for pupils, teachers, and schools).
Throughout the report, each statistic presented has been accompanied by its sampling error (SE). In this chapter the concept of sampling error has been explained, and this has been illustrated with a discussion concerning the calculation of sampling errors for the pupil reading test at the national and provincial levels. Before the above two aspects of the sample design could be addressed, it was necessary to develop a clear description of the target population for the survey. This required the preparation of operational definitions of the desired, defined, and excluded populations, and the use of these to develop a sampling frame for the survey.

(a) The Desired Target Population
The desired target population for this study was centred around a grade description, and not an age description, of pupils attending schools in Vietnam. The grade level selected was Grade 5. This grade level represented the final year of primary schooling and therefore, because of the high retention rates up to the end of primary schooling, provided an appropriate target population for investigating the contribution of primary education towards the acquisition of reading and mathematics skills by a broad cross-section of the population. After extensive consultation with Ministry of Education staff concerning the key purposes of the study, it was agreed that the desired target population should be described as follows: "All pupils at Grade 5 level in 2001 who are attending registered mainstream primary schools in the 61 provinces of Vietnam."

While the emphasis in the definition of the desired target population was placed on pupils, the study was also concerned with reporting estimates describing schools and teachers. However, an emphasis on pupils in the target population definition was retained because estimates for schools and teachers were weighted so as to provide estimates of teacher and school variables "for the average pupil" - rather than estimates for teachers and schools as distinct target populations in themselves. The weighting procedures used to establish these estimates have been described in a later section of this chapter.

(b) The Defined Target Population
The use of the word "mainstream" in the definition of the desired target population automatically removed pupils attending special schools, prior to any consideration of which pupils should be excluded from the desired target population in order to form the defined target population. During the initial planning stages of the study, there was considerable discussion about whether the excluded population should include schools situated in small provinces. However, after consultation with Ministry of Education staff, it was decided that it would be both desirable and feasible to aim for full national coverage of the 61 provinces. While all provinces were included in the defined target population, it was decided that small schools with fewer than 20 Grade 5 pupils would be excluded. This decision was taken because these small schools (a) represented a very small component of the total population of pupils, and (b) were known to be mostly located in very isolated areas that were associated with high data collection costs. That is, it was understood that the allocation of these small schools to the excluded population had the potential to reduce data collection costs - without the risk of leading to major distortions in the study population.
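As a minimal illustration of how the defined and excluded populations were formed from the sampling frame, the following sketch (in Python; the school records shown are invented, and the 20-pupil cut-off is the only figure taken from the text) splits a small hypothetical frame at the cut-off and reports the excluded percentage.

```python
# Hypothetical sampling-frame records: (school_id, Grade 5 enrolment).
frame = [("S001", 112), ("S002", 45), ("S003", 12), ("S004", 88), ("S005", 9)]

CUTOFF = 20  # schools with fewer than 20 Grade 5 pupils were excluded

defined = [(sid, n) for sid, n in frame if n >= CUTOFF]
excluded = [(sid, n) for sid, n in frame if n < CUTOFF]

desired_pupils = sum(n for _, n in frame)
excluded_pupils = sum(n for _, n in excluded)

print(f"Defined schools: {len(defined)}, excluded schools: {len(excluded)}")
print(f"Excluded pupils: {excluded_pupils} "
      f"({100 * excluded_pupils / desired_pupils:.2f}% of the desired population)")
```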
In Table 4.1 numerical descriptions of the desired, defined, and excluded populations have been presented. From the figures in the final row of the table it may be seen that only 3,964 pupils in 469 schools were excluded to form the defined target population. This excluded population represented only 0.22 percent of the desired target population of 1,812,053 pupils. The defined target population - from which the national sample was selected - contained 1,808,089 pupils in 14,173 schools. 548 Vietnam Reading and Mathematics Assessment Study Table 4.1:Vietnam Grade 5 pupils: Desired, Defined, and Excluded Population Province Desired Defined Excluded Schools Pupils Schools Pupils Schools Pupils % Pupils Ha Noi 257 42402 242 42340 15 62 0.15 Hai Phong 220 37181 214 37123 6 58 0.14 Ha Tay 355 54907 353 54904 2 3 0.00 Hai Duong 274 40428 273 40416 1 12 0.03 Hung Yen 161 26974 161 26974 0 0 0.00 Ha Nam 135 18484 135 18484 0 0 0.00 Nam Dinh 292 42620 290 42617 2 3 0.01 Thai Binh 293 35936 291 35933 2 3 0.01 Ninh Binh 150 23045 147 23033 3 12 0.05 Ha Giang 213 10795 138 10315 75 480 4.45 Cao Bang 251 11628 209 11084 42 544 4.67 Lao Cai 228 16177 193 15835 35 342 2.11 Bac Kan 134 7538 126 7459 8 79 1.05 Lang Son 248 21839 236 21744 12 95 0.43 Tuyen Quang 188 21019 188 21019 0 0 0.00 Yen Bai 235 17444 219 17319 16 125 0.72 Thai Nguyen 201 25482 197 25456 4 26 0.10 Phu Tho 296 32064 294 32053 2 11 0.03 Vinh Phuc 174 27009 169 26976 5 33 0.12 Bac Giang 260 41357 260 41357 0 0 0.00 Bac Ninh 138 22815 138 22815 0 0 0.00 Quang Ninh 216 22637 206 22507 10 130 0.57 Lai Chau 183 10683 137 10323 46 360 3.37 Son La 254 20690 230 20531 24 159 0.77 Hoa Binh 239 21492 234 21438 5 54 0.25 Thanh Hoa 704 98121 701 98100 3 21 0.02 Nghe An 676 86518 663 86428 13 90 0.10 Ha Tinh 312 38070 311 38067 1 3 0.00 Quang Binh 244 21630 237 21568 7 62 0.29 Quang Tri 161 13637 154 13574 7 63 0.46 Thua Thien - Hue 232 25394 228 25365 4 29 0.11 Da Nang 79 13124 78 13119 1 5 0.04 Quang Nam 263 32817 248 32693 15 124 0.38 Quang Ngai 228 28892 222 28801 6 91 0.31 Binh Dinh 226 33546 224 33523 2 23 0.07 Phu Yen 143 19830 141 19794 2 36 0.18 Khanh Hoa 176 23664 174 23636 2 28 0.18 Gia Lai 241 26135 234 26052 7 83 0.32 Kon Tum 97 8574 95 8539 2 35 0.41 Dak Lak 397 49182 391 49129 6 53 0.11 Ho Chi Minh 433 81358 422 81293 11 65 0.08 Lam Dong 239 26398 231 26306 8 92 0.35 Ninh Thuan 118 11415 107 11296 11 119 1.04 Binh Phuoc 115 16849 115 16849 0 0 0.00 Tay Ninh 285 24360 273 24309 12 51 0.21 Binh Duong 109 15410 108 15410 1 0 0.00 Dong Nai 274 49666 271 49666 3 0 0.00 Binh Thuan 212 25868 207 25819 5 49 0.19 Ba Ria - Vung Tau 130 19616 129 19616 1 0 0.00 Long An 242 29952 240 29936 2 16 0.05 Dong Thap 297 36518 295 36498 2 20 0.05 549 Vietnam Reading and Mathematics Assessment Study Table 4.1 (Cont'd): Vietnam Grade 5 pupils: Desired, Defined, and Excluded Population Province Desired Defined Excluded Schools Pupils Schools Pupils Schools Pupils % Pupils An Giang 378 43200 375 43164 3 36 0.08 Tien Giang 235 35809 234 35809 1 0 0.00 Vinh Long 244 24976 242 24961 2 15 0.06 Ben Tre 187 29163 187 29163 0 0 0.00 Kien Giang 248 31634 244 31591 4 43 0.13 Can Tho 310 40983 310 40983 0 0 0.00 Tra Vinh 195 21721 191 21677 4 44 0.20 Soc Trang 246 26488 243 26438 3 50 0.02 Bac Lieu 140 17814 139 17795 1 19 0.11 Ca Mau 231 31075 229 31067 2 8 0.02 Vietnam 14642 1812053 14173 1808089 469 3964 0.22 The Stratification Procedures The stratification procedures adopted for the study employed explicit and implicit strata. 
The explicit stratification variable, "province", was applied by separating the sampling frame that described the defined target population into separate provincial lists of schools prior to undertaking the sampling. The implicit stratification variable was "school size" - as measured by the number of pupils in the defined target population within each school.

The main reason for choosing province as the explicit stratification variable was that the Ministry of Education wanted the 61 provinces to form reporting domains for the study. That is, the Ministry wanted to have reasonably accurate estimates of population characteristics for each province. There were two other reasons for selecting province as the main stratification variable. First, the use of province as the explicit stratification variable was expected to provide an increment in sampling precision due to between-province differences in important educational variables - especially between predominantly urban and predominantly rural provinces. Second, this approach provided a broad geographical coverage for the sample - which was necessary in order to spread the fieldwork across Vietnam in a manner that prevented the occurrence of excessive administrative demands in particular provinces.

The use of school size as an implicit stratification variable within provinces offered increased sampling precision because it provided a way of sorting the schools from "very rural" (small schools) to "very urban" (large schools). This kind of sorting was known to be linked to the main criterion variables for the study - with rural schools likely to have lower resource levels and lower pupil achievement scores than urban schools.

Sample Design Framework

The general sample design framework adopted for the study consisted of a stratified two-stage cluster sample design. At the first stage, schools were selected within strata with probability proportional to the number of pupils in the defined target population. At the second stage, a simple random sample of a fixed number of pupils was selected within each selected school. In order to establish the number of schools and pupils required to satisfy the specified sampling accuracy standards, it was necessary to know (a) the minimum cluster size (the minimum number of pupils within a school that would be completing any single item or test), and (b) the coefficient of intraclass correlation (the estimated average size of the coefficient of intraclass correlation for the items and tests).

(a) Minimum Cluster Size
The value of the minimum cluster size referred to the smallest number of pupils within a school that would be completing any single item or test. The value selected for this study was guided by a consideration of the following issues. It was important that the minimum cluster size was set at a level that permitted test administration within schools to be carried out in an environment that ensured that: (i) the test administrator was able to conduct the testing according to the standardized procedures specified for the study; (ii) the sample members were comfortable and unlikely to be distracted; (iii) the sample members responded carefully and independently to the tests and questionnaires; and (iv) the testing did not place an excessive administrative burden on schools.
After consideration of the four requirements listed above, and in consultation with staff of the Ministry of Education, it was decided to limit the sample in each selected school to a simple random sample of 20 pupils.

(b) Coefficient of Intraclass Correlation
The coefficient of intraclass correlation (rho) provides a measure of the tendency of pupil characteristics to be more homogeneous within schools than would be the case if pupils were assigned to schools at random. The size of rho may be estimated from previous surveys which have employed similar target populations, similar sample designs, and similar criterion variables. The values of rho for educational achievement measures are usually higher for education systems where pupils are allocated differentially to schools on the basis of performance - either administratively through test results, or structurally through socio-economic differentiation among school catchment zones.

In general terms, a relatively larger value of rho means that, for a fixed total number of sample members (pupils in this study), a larger number of primary sampling units (schools in this study) need to be selected in order to obtain the same sampling precision as would be obtained for a relatively lower value of rho. That is, higher values of rho normally require larger numbers of schools to be selected into the sample. It is important to note that values of rho tend to be higher for pupils-within-classes than for pupils-within-schools, and therefore caution needs to be exercised when employing "intact classes" as the final stage of a sample design.

The following formula may be used for estimating the value of rho in situations where two-stage cluster sampling is employed using (approximately) equal-sized clusters (Ross, 1985):

estimated rho = (b.s(a)^2 - s^2) / ((b - 1).s^2)

where s(a)^2 is the variance of the cluster means, s^2 is the variance of the element values, and b is the cluster size.

A prior small-scale study of reading levels of Grade 5 pupils in five provinces of Vietnam carried out by the Ministry of Education had shown that the coefficient of intraclass correlation (rho) was around 0.50 - which was an extremely high value by world standards. This estimate of rho was expected to be higher than the value that would occur within provinces, because it was known that between-province variation was rather high. Therefore, in the absence of further information about the likely magnitude of the coefficient of intraclass correlation, it was decided to use an estimated value of 0.3 in order to guide decisions concerning the within-province sample designs.
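The formula above can be illustrated with a short sketch. The following Python fragment is a minimal illustration only - the school scores are invented and the function name is ours - showing how rho might be estimated from (approximately) equal-sized clusters.

```python
import numpy as np

def estimated_rho(scores_by_school):
    """Estimate the coefficient of intraclass correlation from (approximately)
    equal-sized clusters: rho = (b*s_a^2 - s^2) / ((b - 1)*s^2)."""
    b = np.mean([len(s) for s in scores_by_school])               # cluster size
    all_scores = np.concatenate(scores_by_school)
    s2 = np.var(all_scores, ddof=1)                               # variance of element values
    sa2 = np.var([np.mean(s) for s in scores_by_school], ddof=1)  # variance of cluster means
    return (b * sa2 - s2) / ((b - 1) * s2)

# Hypothetical reading scores for three schools of five pupils each.
schools = [np.array([440., 470., 480., 500., 510.]),
           np.array([460., 480., 500., 520., 540.]),
           np.array([480., 500., 520., 540., 560.])]
print(round(estimated_rho(schools), 2))   # about 0.22 for these illustrative values
```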
(c) Sample Design Tables
In Appendix 4.1 to this chapter, a set of sample design tables has been presented for various values of the minimum cluster size and various values of the coefficient of intraclass correlation. The construction of these tables has been described by Ross (1987). It is important to remember that the tables refer specifically to two-stage sample designs that employ simple random sampling of equal-sized clusters. Nevertheless, they provide a good starting point for estimating the number of schools and pupils that are required in order to meet the sample design standards specified for many educational research studies.

The sample design tables do not allow for gains in sampling precision that are associated with an effective choice of strata, and therefore they may tend to provide conservative estimates of the numbers of schools and pupils that are required. However, the tables may be used in a manner that does make some adjustment for losses in sampling precision associated with a disproportionate allocation of the sample across strata by (i) applying the sample design tables separately to the strata, and then (ii) using the formula for estimating the variance of a stratified sample in order to combine the sampling error estimates for each stratum (Ross, 1991).

To illustrate the use of these tables, the fourth and fifth columns of the tables list a variety of two-stage samples that would result in an effective sample size of 400. That is, these columns describe sample designs which provide 95 percent confidence limits of ±0.1s for means and ±5 percent for percentages (where s is the value of the pupil standard deviation). In each table, "a" has been used to describe the number of schools, "b" has been used to describe the minimum cluster size, and "n" has been used to describe the total sample size.

For example, consider the intersection of the fourth and fifth columns of figures with the third row of figures in the tables. The pair of values a=112 and n=560 indicates that if rho is equal to 0.1 and the minimum cluster size, b, is equal to 5, then the two-stage cluster sample design with an effective sample size of 400 would be five pupils selected from each of 112 schools - which would result in a total sample size of 560 pupils. The effect of a different value of rho, for the same minimum cluster size, may be examined by considering the corresponding rows of the tables for rho=0.2, 0.3, etc. For example, in the case where rho=0.3, a total sample size of 880 pupils, obtained by selecting 5 pupils from each of 176 schools, would be needed to meet the required sampling standard.

The rows of the tables that correspond to a minimum cluster size of 1 refer to the effective sample size. That is, they describe the size of a simple random sample which has equivalent accuracy. Therefore, the pairs of figures in the fourth and fifth columns of the tables all refer to sample designs which have accuracy equivalent to a simple random sample of size 400. The second and third columns refer to an equivalent sample size of 1,600, and the final two pairs of columns refer to equivalent sample sizes of 178 and 100, respectively.

(d) The Numbers of Schools and Pupils Required for this Study
Using 0.3 as the value of rho and a minimum cluster size of 20, in combination with the sample design tables, suggested that 134 schools per province would be required to provide an effective sample size of 400 pupils within provinces. This figure was far beyond the data collection resources of the provincial offices of education. It was therefore decided to relax the error constraints for provincial estimates from an effective sample size of 400 pupils to an effective sample size of 178 pupils - which would provide 95 percent confidence limits of ±7.5 percent for percentages and ±0.15s for means. The figures listed in the sixth and seventh columns of Appendix 4.1 (for a rho value of 0.3) indicated that 60 schools per stratum would be required to achieve an effective sample size of 178.
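Assuming that the sample design tables rest on the usual design-effect relationship for equal-sized clusters, deff = 1 + (b - 1) x rho, the figures quoted above (134 and 60 schools per province) can be reproduced with a short sketch. The Python fragment below is an illustration under that assumption, not a description of how Appendix 4.1 was actually generated.

```python
import math

def schools_required(n_effective, b, rho):
    """Number of schools needed in a two-stage cluster design so that the sample
    has the same accuracy as a simple random sample of n_effective pupils."""
    deff = 1 + (b - 1) * rho          # design effect for equal-sized clusters
    return math.ceil(deff * n_effective / b)

# With rho = 0.3 and a minimum cluster size of 20 pupils per school, as in the text:
print(schools_required(400, 20, 0.3))  # -> 134 schools per province
print(schools_required(178, 20, 0.3))  # -> 60 schools per province
```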
553 Vietnam Reading and Mathematics Assessment Study (e) The Allocation of the Sample Across Strata In Table 4.2 the allocation of the sample across the 61 provinces that form the strata for the sample design has been presented. This table described the defined population of schools and pupils in each of the 61 strata - and indicated that the same planned sample (of 60 schools and 1,200 pupils) would be selected for each stratum. Table 4.2: Strata and Sample Allocation for the Defined Target Population Sample Allocation of Schools Province Population of Schools Population of Pupils Proportionate Planned N % N % N N % Ha Noi 242 1.7% 42340 2.3% 86 60 1.6% Hai Phong 214 1.5% 37123 2.1% 75 60 1.6% Ha Tay 353 2.5% 54904 3.0% 111 60 1.6% Hai Duong 273 1.9% 40416 2.2% 82 60 1.6% Hung Yen 161 1.1% 26974 1.5% 55 60 1.6% Ha Nam 135 1.0% 18484 1.0% 37 60 1.6% Nam Dinh 290 2.0% 42617 2.4% 86 60 1.6% Thai Binh 291 2.1% 35933 2.0% 73 60 1.6% Ninh Binh 147 1.0% 23033 1.3% 47 60 1.6% Ha Giang 138 1.0% 10315 0.6% 21 60 1.6% Cao Bang 209 1.5% 11084 0.6% 22 60 1.6% Lao Cai 193 1.4% 15835 0.9% 32 60 1.6% Bac Kan 126 0.9% 7459 0.4% 15 60 1.6% Lang Son 236 1.7% 21744 1.2% 44 60 1.6% Tuyen Quang 188 1.3% 21019 1.2% 43 60 1.6% Yen Bai 219 1.5% 17319 1.0% 35 60 1.6% Thai Nguyen 197 1.4% 25456 1.4% 52 60 1.6% Phu Tho 294 2.1% 32053 1.8% 65 60 1.6% Vinh Phuc 169 1.2% 26976 1.5% 55 60 1.6% Bac Giang 260 1.8% 41357 2.3% 84 60 1.6% Bac Ninh 138 1.0% 22815 1.3% 46 60 1.6% Quang Ninh 206 1.5% 22507 1.2% 46 60 1.6% Lai Chau 137 1.0% 10323 0.6% 21 60 1.6% Son La 230 1.6% 20531 1.1% 42 60 1.6% 554 Vietnam Reading and Mathematics Assessment Study Table 4.2 (Cont'd):Strata and Sample Allocation for the Defined Target Population Sample Allocation of Schools Province Population of Schools Population of Pupils Proportionate Planned N % N % N N % Hoa Binh 234 1.7% 21438 1.2% 43 60 1.6% Thanh Hoa 701 4.9% 98100 5.4% 199 60 1.6% Nghe An 663 4.7% 86428 4.8% 175 60 1.6% Ha Tinh 311 2.2% 38067 2.1% 77 60 1.6% Quang Binh 237 1.7% 21568 1.2% 44 60 1.6% Quang Tri 154 1.1% 13574 0.8% 27 60 1.6% Thua Thien - Huu 228 1.6% 25365 1.4% 51 60 1.6% Da Nang 78 0.6% 13119 0.7% 27 60 1.6% Quang Nam 248 1.7% 32693 1.8% 66 60 1.6% Quang Ngai 222 1.6% 28801 1.6% 58 60 1.6% Binh Dinh 224 1.6% 33523 1.9% 68 60 1.6% Phu Yen 141 1.0% 19794 1.1% 40 60 1.6% Khanh Hoa 174 1.2% 23636 1.3% 48 60 1.6% Gia Lai 234 1.7% 26052 1.4% 53 60 1.6% Kon Tum 95 0.7% 8539 0.5% 17 60 1.6% Dak Lak 391 2.8% 49129 2.7% 99 60 1.6% Ho Chi Minh 422 3.0% 81293 4.5% 165 60 1.6% Lam Dong 231 1.6% 26306 1.5% 53 60 1.6% Ninh Thuan 107 0.8% 11296 0.6% 23 60 1.6% Binh Phuoc 115 0.8% 16849 0.9% 34 60 1.6% Tay Ninh 273 1.9% 24309 1.3% 49 60 1.6% Binh Duong 108 0.8% 15410 0.9% 31 60 1.6% Dong Nai 271 1.9% 49666 2.7% 101 60 1.6% Binh Thuan 207 1.5% 25819 1.4% 52 60 1.6% Ba Ria - Vung Tau 129 0.9% 19616 1.1% 40 60 1.6% Long An 240 1.7% 29936 1.7% 61 60 1.6% Dong Thap 295 2.1% 36498 2.0% 74 60 1.6% An Giang 375 2.6% 43164 2.4% 87 60 1.6% Tien Giang 234 1.7% 35809 2.0% 72 60 1.6% Vinh Long 242 1.7% 24961 1.4% 51 60 1.6% Ben Tre 187 1.3% 29163 1.6% 59 60 1.6% Kien Giang 244 1.7% 31591 1.7% 64 60 1.6% Can Tho 310 2.2% 40983 2.3% 83 60 1.6% Tra Vinh 191 1.3% 21677 1.2% 44 60 1.6% Soc Trang 243 1.7% 26438 1.5% 54 60 1.6% Bac Lieu 139 1.0% 17795 1.0% 36 60 1.6% Ca Mau 229 1.6% 31067 1.7% 63 60 1.6% Total 14173 1808089 3660 3660 555 Vietnam Reading and Mathematics Assessment Study The Construction of the Sampling Frame The next step in the sample design required the preparation of a sampling frame 
for the members of the defined target population. The sampling frame provided a "listing" of the pupils in the defined target population without actually creating a physical list consisting of an entry for each and every pupil. For this study, the sampling frame needed to provide complete coverage of the defined target population without being contaminated by incorrect entries, duplicate entries, or entries that referred to elements that were not part of the defined target population. The information used to construct the sampling frame was based on data that had been collected by the Ministry of Education in 2001 for the School Census. This type of sampling frame was ideally suited to the application of two-stage cluster sampling with probability proportional to size selection at the first stage (Ross, 1991).

In Table 4.3 a section of the sampling frame associated with the first stratum has been presented. This stratum contained a total of 242 schools and 42,340 pupils. The section of the sampling frame shown in Table 4.3 presents only the first 40 schools. Each row of information in the sampling frame included the identification number of the school, the school name, and the commune and province in which the school was located. The last column of figures in the sampling frame referred to the number of pupils in the defined target population for each school.

The Selection of the Sample

In educational survey research, the primary sampling units that are most often employed (schools) are rarely equal in size. This variation in size causes difficulties with respect to the control of the total sample size when schools are selected with equal probability at the first stage of a multi-stage sample design. For example, consider a two-stage sample design in which a simple random sample of "a" schools is selected from a list of A schools, and then a fixed fraction of pupils, say 1/k, is selected from each of the "a" schools. This design would provide an epsem, or "equal probability of selection method" (Kish, 1965, p. 21), sample of pupils because the probability of selecting a pupil is a/(Ak), which is constant for all pupils in the population. However, the actual size of the sample would depend upon the size of the schools that were selected.

One method of obtaining greater control over the sample size is to stratify the schools according to size and then select samples of schools within each stratum. A more widely applied alternative is to employ probability proportional to size (PPS) sampling of schools within strata, followed by the selection of a simple random sample of a fixed number of pupils within selected schools. This approach provides control over the sample size and results in epsem sampling of pupils within strata. The PPS sampling approach was implemented for the Vietnam Grade 5 Survey by employing the IIEP's SAMDEM software (Sylla et al., 2003). This software was based on the application of the "lottery" method of selection - which has been described, along with a hypothetical example, in the following section.

The lottery method of PPS selection: A hypothetical example

An often-used approach for the application of probability proportional to size (PPS) sampling is to employ the "lottery method". For example, consider a situation where two schools are to be selected with probability proportional to size from each stratum of the hypothetical population of 600 pupils described in Table 4.4.
The application of the lottery method commences with the allocation, to each school, of a number of lottery tickets equal to the number of pupils in the defined target population. For example, the first school listed in Table 4.4 has 45 pupils and therefore is allocated tickets numbered 1 to 45, and the second school has 60 pupils and therefore is allocated tickets numbered 46 to 105. Since a PPS sample of two schools is to be selected from the first stratum, there are two "winning tickets" required. 557 Vietnam Reading and Mathematics Assessment Study Table 4.3: The First 40 Schools in the First Stratum of the Sampling Frame for the Vietnam Grade 5 Survey New ID School Old ID School Old ID School ID Province Province Commune School Name Location MOS 00013101 09-0106-144 09-0106-144 1 H Néi City CÇu DÒn Hïng V-¬ng 1 30 00071101 09-0111-240 09-0111-240 1 H Néi City Kh-¬ng Mai NguyÔn Tr-êng Té 1 36 00105101 09-0102-028 09-0102-028 1 H Néi City Ph-êng NghÜa §«" NghÜa §« 1 44 00142101 09-0101-017 09-0101-017 1 H Néi City Ph-êng Qu¸n Th¸nh NguyÔn Tri Ph-¬ng 1 46 00121101 09-0104-075 09-0104-075 1 H Néi City ¤ Chî Dõa §¹i La 1 48 00140101 09-0102-031 09-0102-031 1 H Néi City Ph-êng Quan Hoa §o n ThÞ §iÓm 1 53 00045101 09-0107-159 09-0107-159 1 H Néi City §ång Xu©n Kim §ång 1 56 00080101 09-0101-009 09-0101-009 1 H Néi City Ph-êng Kim M· V¹n Phóc 1 57 00020101 09-0107-150 09-0107-150 1 H Néi City Cöa §«ng Lª V¨n T¸m 1 59 00061101 09-0107-151 09-0107-151 1 H Néi City H ng M· Thanh Quan 1 61 00039101 09-0105-113 09-0105-113 1 H Néi City §"ng D §"ng D 2 62 00195101 09-0108-171 09-0108-171 1 H Néi City Tø Liªn Tø Liªn 1 64 00212101 09-0110-221 09-0110-221 1 H Néi City Yªn MÜ Yªn MÜ 2 64 00158103 09-0111-239 09-0111-239 1 H Néi City Thanh Xuan B3/4c Phan Phï Tiªn 1 66 00181101 09-0105-107 09-0105-107 1 H Néi City Tr©u Quú N«ng nghiÖp 1 2 66 00094101 09-0102-030 09-0102-030 1 H Néi City Ph-êng Mai DÞch HERMANN 1 67 00008101 09-0105-079 09-0105-079 1 H Néi City Bå §Ò Bå §Ò 2 72 00024101 09-0110-222 09-0110-222 1 H Néi City Duyªn H Duyªn H 2 72 00026101 09-0105-093 09-0105-093 1 H Néi City §«ng H §«ng H 2 80 00216102 09-0105-091 09-0105-091 1 H Néi City Yªn Viªn Yªn Viªn 2 81 00007102 09-0109-175 09-0109-175 1 H Néi City B¾c S¬n B¾c S¬n B 2 84 00160101 09-0106-134 09-0106-134 1 H Néi City Thanh L-¬ng Thanh L-¬ng 1 84 00086101 09-0106-136 09-0106-136 1 H Néi City Lª §¹i H nh V©n Hå 1 85 00070101 09-0111-237 09-0111-237 1 H Néi City Kh-¬ng §×nh Kh-¬ng §×nh 1 86 00143101 09-0108-172 09-0108-172 1 H Néi City Qu¶ng An Qu¶ng An 2 87 00134101 09-0107-153 09-0107-153 1 H Néi City Phóc T©n Phóc T©n 1 88 00117101 09-0108-170 09-0108-170 1 H Néi City NhËt T©n NhËt T©n 1 89 00200101 09-0105-110 09-0105-110 1 H Néi City V¨n §øc V¨n §øc 2 90 00016105 09-0101-005 09-0101-005 1 H Néi City Ph-êng Cèng VÞ NguyÔn Siªu 1 91 00052101 09-0105-087 09-0105-087 1 H Néi City Giang Biªn Giang Biªn 2 91 00209104 09-0110-209 09-0110-209 1 H Néi City VÜnh Tuy VÜnh Tuy 2 93 00192101 09-0106-135 09-0106-135 1 H Néi City Tr-¬ng §Þnh Trung HiÒn 1 94 00194101 09-0110-220 09-0110-220 1 H Néi City Tø HiÖp Tø HiÖp 2 94 00019101 09-0112-252 09-0112-252 1 H Néi City Cæ NhuÕ Cæ NhuÕ A 2 95 00185101 09-0102-026 09-0102-026 1 H Néi City Ph-êng Trung Ho Trung Ho 1 96 00210101 09-0103-049 09-0103-049 1 H Néi City Vâng La Vâng La 2 98 00140102 09-0102-025 09-0102-025 1 H Néi City Ph-êng Quan Hoa Quan Hoa 1 99 00181102 09-0105-106 09-0105-106 1 H Néi City Tr©u Quú Tr©u Quú 2 100 00065101 09-0110-218 09-0110-218 1 H Néi City Ho ng LiÖt Ho ng LiÖt 2 
102 00116101 09-0111-236 09-0111-236 1 H Néi City Nh©n ChÝnh Nh©n ChÝnh 1 102

Table 4.4: Hypothetical Population for the Illustration of Probability Proportional to Size Selection

Stratum  School  Class  Pupils (School)  Pupils (Class)  Cumulative  "Tickets"
1        1       1      45               20              20          1-45
                 2                       25              45
         2       3      60               15              60          46-105
                 4                       20              80
                 5                       25              105
         3       6      95               25              130         106-200
                 7                       30              160
                 8                       25              185
                 9                       15              200
Sub-total (Stratum 1): 3 schools, 9 classes, 200 pupils
2        4       10     45               10              10          1-45
                 11                      15              25
                 12                      20              45
         5       13     110              20              65          46-155
                 14                      25              90
                 15                      30              120
                 16                      35              155
         6       17     120              35              190         156-275
                 18                      40              230
                 19                      45              275
         7       20     125              50              325         276-400
                 21                      75              400
Sub-total (Stratum 2): 4 schools, 12 classes, 400 pupils
Total: 7 schools, 21 classes, 600 pupils

The ratio of the number of tickets to the number of winning tickets, known as the "sampling interval", is 200/2 = 100. That is, each ticket in the first stratum should have a 1 in 100 chance of being drawn as a winning ticket. Note that, in the case of the second stratum, the sampling interval would be 400/2 = 200. The winning tickets for the first stratum are drawn by using a random start - constant interval procedure, whereby a random number in the interval 1 to 100 is selected as the first winning ticket and the second winning ticket is obtained by adding an increment of 100 to this number. With a random start of 65, the winning ticket numbers would be 65 and 165. This would result in the selection of School 2 (which holds tickets 46-105) and School 3 (which holds tickets 106-200). The chance of selecting any school is proportional to the number of tickets held, and therefore each of these schools is selected with probability proportional to the number of pupils in the defined target population. The winning tickets for the second stratum are similarly selected using a random start - constant interval approach in which the random start is a random number between 1 and 200 and the constant interval is 200.

The Calculation of Sampling Weights

Consider a population of pupils which may be described according to the notation presented in Table 4.5. The following discussion is based on the use of two-stage sampling procedures in which the first stage of sampling consists of the PPS selection of schools, followed by the selection of a simple random sample of pupils in selected schools. From Stratum h of the population, select ah schools with PPS, and then select a simple random sample of nhi pupils within each selected school. For this sample design, the probability of selecting pupil k in class j from school i within Stratum h is the product of the probability of selecting the pupil's school at the first stage and the probability of selecting pupil k within school i at the second stage:

p = (ah x Nhi / Nh) x (nhi / Nhi) = (ah x nhi) / Nh

This application of PPS sampling removes the influence of school size, Nhi, from the calculation of the probability of selecting pupil k. Note that, if the value of nhi is constant within strata, then the numerator of the above equation is constant and equal to nh within strata. In this special case, p = nh / Nh is a constant for all pupils within a particular stratum. The application of complex multi-stage sampling results in unequal probabilities of selection, and therefore sampling weights need to be added to the pupil data files prior to the estimation of population characteristics.
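A minimal sketch of the lottery method, and of the selection probability that motivates the weights discussed below, is given here for the hypothetical Stratum 1 of Table 4.4. It is written in Python purely for illustration; the random start is fixed at 65 so that the result matches the worked example above, and the variable names are ours.

```python
# Hypothetical Stratum 1 from Table 4.4: (school, number of Grade 5 pupils).
stratum = [("School 1", 45), ("School 2", 60), ("School 3", 95)]
a_h = 2          # schools to be selected with PPS
n_hi = 20        # pupils to be selected per selected school (as in the survey)

N_h = sum(size for _, size in stratum)                # 200 "lottery tickets"
interval = N_h / a_h                                  # sampling interval = 100
start = 65                                            # random start in 1..interval
winners = [start + k * interval for k in range(a_h)]  # tickets 65 and 165

# Walk the cumulative ticket ranges to find which school holds each winning ticket.
selected = []
for ticket in winners:
    cumulative = 0
    for name, size in stratum:
        cumulative += size
        if ticket <= cumulative:
            selected.append(name)
            break
print(selected)                                       # ['School 2', 'School 3']

# Probability of selecting a pupil: p = (a_h x N_hi / N_h) x (n_hi / N_hi) = a_h x n_hi / N_h
p = a_h * n_hi / N_h
print(p, 1 / p)                                       # 0.2, i.e. a raising factor of 5.0
```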
The calculation of sampling weights follows the classical (Horvitz-Thompson) procedure of assigning each pupil a weight that is proportional to the reciprocal of the probability of including that pupil in the sample. The reciprocals of these probabilities are sometimes referred to as "raising factors" because they refer to the number of elements in the population that are "represented" by the various sample elements.

raising factor = Nh / (ah x nhi)

These raising factors are often multiplied by a constant so that the "weighted sample size" is equal to the achieved sample size. In this case the constant would be n/N, and the sampling weights would be as follows.

weight = (Nh x n) / (ah x nhi x N)

Table 4.5: Notation used in Discussion of Sample Designs

                                       Schools          Classes          Pupils
Coverage of units                      Total   Sample   Total   Sample   Total   Sample
Population                             A       a        B       b        N       n
Stratum h                              Ah      ah       Bh      bh       Nh      nh
School i (Stratum h)                   -       -        Bhi     bhi      Nhi     nhi
Class j (School i in Stratum h)        -       -        -       -        Nhij    nhij

Note 1: The notation conventions for sample designs have been listed in Table 4.5. The table entries describe the number of "units" (schools, classes, or pupils) associated with each of four levels of "coverage" (Population, Stratum h, School i, or Class j).
Note 2: For example, the symbol A has been used to refer to the total number of schools ("units") in the Population ("coverage"), whereas the symbol Ah has been used to describe the total number of schools ("units") in Stratum h ("coverage"). Similarly, the symbol n has been used to refer to the number of pupils in the sample, whereas the symbol nhij has been used to refer to the number of pupils in the sample associated with Class j (situated in School i within Stratum h).
Note 3: In the special situation where intact classes (whole classes) are employed in a sample design, the total number of pupils in Class j, Nhij, would be equal to the number of pupils in the sample associated with Class j, nhij. Similarly, for sample designs that employ intact schools, the value of Nhi would be equal to nhi.

One of the consequences of this approach to weighting is that the weighted mean score for a school variable refers to a school characteristic experienced by "the average pupil" - but not necessarily to a characteristic of "the average school". Similarly, the weighted mean score for a teacher variable refers to a teacher characteristic experienced by "the average pupil" - but not necessarily to a characteristic of "the average teacher".

In most "real" school system sampling situations, the number of students in the defined target population within each school listed on the sampling frame is slightly different from the actual number of students. This occurs because sampling frames are usually developed from data collected at some earlier time - often a year prior to the selection of the sample of schools. That is, rather than finding Nhi students in school i within stratum h, we often find Nhi(actual). In addition, due to occasional absenteeism on the day of data collection, instead of being able to test nhi students in a sample school we often only manage to collect data from nhi(actual). Given these two deviations, the actual probability (assuming random loss of data) of selecting a student in school i within stratum h may be written as follows.
p = (ah x Nhi / Nh) x (nhi(actual) / Nhi(actual)) = (ah x Nhi x nhi(actual)) / (Nh x Nhi(actual))

In this case we have:

revised raising factor = (Nh x Nhi(actual)) / (ah x Nhi x nhi(actual))

To obtain the "revised weights" we multiply the revised raising factor by the achieved total sample size, and then divide by the sum of the revised raising factors across all students in the achieved sample. In the Vietnam Grade 5 Survey the revised weights were referred to as "pweight2" on the data files. A further weight, "pweight3", was prepared by post-stratifying the data using more recent estimates of the total population of students in the defined target population in each province. The raising factor linked to this sampling weight, labelled RF3 on the data file, provided a mechanism for estimating population totals for important independent variables. For example, by using RF3 it was possible to estimate the total numbers of pupils in the defined target population who were attending isolated, rural, and urban schools; or who had their own reader, were sharing a reader, or were without a reader.

Operational Procedures for the Selection of Pupils within Schools

A critical component of the sample design for this study was concerned with the selection of pupils within selected schools. It was decided that these selections should be placed under the control of trained data collectors - after they had been provided with materials that would ensure that a simple random sample of pupils was selected in each selected school. The data collectors were informed that it was not acceptable to permit school principals or classroom teachers to have any influence over the sampling procedures within schools. These groups of people may have had a vested interest in selecting particular kinds of pupils, and this may have resulted in major distortions of sample estimates (Brickell, 1974).

For the Vietnam Grade 5 Survey the data collector initially explained to the School Head in a selected school that a "mechanical procedure" would be used to select the sample of 20 pupils. The data collector then applied the following set of instructions in order to ensure that a simple random sample of pupils was selected.

Step 1: Obtain Grade 5 register(s) of attendance
These registers were obtained for all Grade 5 pupils that attended normal (not "special") classes. In multiple-session schools, both morning and afternoon registers were obtained.

Step 2: Assign sequential numbers to all Grade 5 pupils
A sequential number was then placed beside the name of each Grade 5 pupil.
Example: Consider a school with one session and a total of 48 pupils in Grade 5. Commence by placing the number "1" beside the first pupil on the Register; then place the number "2" beside the second pupil on the Register; ... etc. ...; finally, place the number "48" beside the last pupil on the Register.
Another example: Consider a school with 42 pupils in the morning session and 48 pupils in the afternoon session of Grade 5. Commence by placing the number "1" beside the first pupil on the morning register; ... etc. ...; then place a "42" beside the last pupil on the morning register; then place a "43" beside the first pupil on the afternoon register; ... etc. ...; finally place a "90" beside the last pupil on the afternoon register.

Step 3: Locate the appropriate set of selection numbers
In Appendix 4.2, sets of "selection numbers" have been listed for a variety of school sizes.
(Note that only the sets relevant for school sizes in the range 21 to 400 have been presented in Appendix 4.2.) For example, if a school had 48 pupils in Grade 5, then the appropriate set of selection numbers was listed under the "R48" heading. Similarly, if a school had 90 Grade 5 pupils, then the appropriate set of selection numbers was listed under the "R90" heading.

Step 4: Use the appropriate set of selection numbers
After locating the appropriate set of selection numbers, these were used to select the sample of 20 pupils. The first selection number was used to locate the Grade 5 pupil with the same sequential number on the Register(s). The second selection number was used to locate the Grade 5 pupil with the same sequential number on the Register(s). This process was continued until the full set of 20 selection numbers had been used.
Example: From Appendix 4.2 we see that in a school with a total of 50 pupils in Grade 5, the first pupil selected has sequential number "2"; the second pupil selected has sequential number "4"; ... etc. ...; the twentieth pupil selected has sequential number "50".

The Response Rates Achieved for the Study

Following the execution of the sample design, it was possible to review the performance of the fieldwork phase of the study by considering the response rates achieved for the schools and pupils. The planned and achieved sample sizes for schools, pupils, teachers, and school heads have been presented in Table 4.6. The overall response rates were 99.9 percent for schools, 99.3 percent for pupils, 98.6 percent for teachers, and 99.8 percent for school heads. These were, by world standards, exceptionally high response rates - given the magnitude of this data collection and the speed with which it was undertaken.
564 Vietnam Reading and Mathematics Assessment Study Table 4.6:The Planned and Achieved Samples of Schools and Pupils Schools Pupils Teachers School heads Province Planned Achieved % % % % response Planned Achieved response Planned Achieved response Planned Achieved response Ha Noi 60 60 100 1200 1191 99.25 120 120 100 60 60 100 Hai Phong 60 60 100 1200 1193 99.42 120 119 99.17 60 60 100 Ha Tay 60 60 100 1200 1196 99.67 120 120 100 60 60 100 Hai Duong 60 60 100 1200 1197 99.75 120 120 100 60 60 100 Hung Yen 60 60 100 1200 1194 99.5 120 117 97.5 60 60 100 Ha Nam 60 60 100 1200 1200 100 120 120 100 60 60 100 Nam Dinh 60 60 100 1200 1200 100 120 120 100 60 60 100 Thai Binh 60 60 100 1200 1200 100 120 119 99.17 60 60 100 Ninh Binh 60 60 100 1200 1182 98.5 120 119 99.17 60 60 100 Ha Giang 59 58 98.31 1200 1139 94.92 118 108 91.53 59 59 100 Cao Bang 59 59 100 1200 1194 99.5 118 107 90.68 59 59 100 Lao Cai 60 58 96.67 1200 1141 95.08 120 112 93.33 60 58 96.67 Bac Kan 60 60 100 1200 1191 99.25 120 118 98.33 60 60 100 Lang Son 60 60 100 1200 1196 99.67 120 118 98.33 60 60 100 Tuyen Quang 60 60 100 1200 1194 99.5 120 118 98.33 60 60 100 Yen Bai 60 60 100 1200 1183 98.58 120 115 95.83 60 60 100 Thai Nguyen 60 60 100 1200 1188 99 120 118 98.33 60 60 100 Phu Tho 60 60 100 1200 1199 99.92 120 120 100 60 60 100 Vinh Phuc 60 60 100 1200 1200 100 120 120 100 60 60 100 Bac Giang 60 60 100 1200 1195 99.58 120 120 100 60 60 100 Bac Ninh 60 60 100 1200 1199 99.92 120 120 100 60 60 100 Quang Ninh 60 59 98.33 1200 1167 97.25 120 116 96.67 60 59 98.33 Lai Chau 58 58 100 1200 1190 99.17 116 111 95.69 58 58 100 Son La 60 60 100 1200 1162 96.83 120 117 97.5 60 59 98.33 Hoa Binh 60 60 100 1200 1194 99.5 120 119 99.17 60 60 100 Thanh Hoa 60 60 100 1200 1195 99.58 120 120 100 60 60 100 Nghe An 60 60 100 1200 1186 98.83 120 119 99.17 60 60 100 Ha Tinh 60 60 100 1200 1200 100 120 120 100 60 59 98.33 Quang Binh 60 60 100 1200 1197 99.75 120 119 99.17 60 60 100 Quang Tri 59 59 100 1200 1193 99.42 118 116 98.31 59 59 100 Thua Thien - Hue 60 60 100 1200 1199 99.92 120 119 99.17 60 60 100 Da Nang 53 53 100 1200 1195 99.58 106 106 100 53 53 100 Quang Nam 60 60 100 1200 1192 99.33 120 119 99.17 60 60 100 Quang Ngai 60 60 100 1200 1197 99.75 120 119 99.17 60 59 98.33 Binh Dinh 60 60 100 1200 1198 99.83 120 120 100 60 60 100 Phu Yen 60 60 100 1200 1195 99.58 120 120 100 60 60 100 Khanh Hoa 60 60 100 1200 1187 98.92 120 118 98.33 60 60 100 Kon Tum 56 56 100 1200 1191 99.25 112 108 96.43 56 56 100 Gia Lai 60 60 100 1200 1192 99.33 120 114 95 60 60 100 Dak Lak 60 60 100 1200 1191 99.25 120 119 99.17 60 60 100 Ho Chi Minh 60 60 100 1200 1194 99.5 120 120 100 60 60 100 Lam Dong 60 60 100 1200 1188 99 120 116 96.67 60 59 98.33 Ninh Thuan 58 58 100 1200 1195 99.58 116 114 98.28 58 58 100 Binh Phuoc 60 60 100 1200 1195 99.58 120 118 98.33 60 60 100 Tay Ninh 60 60 100 1200 1199 99.92 120 118 98.33 60 60 100 Binh Duong 59 59 100 1200 1197 99.75 118 117 99.15 59 59 100 Dong Nai 60 60 100 1200 1197 99.75 120 120 100 60 60 100 Binh Thuan 60 60 100 1200 1199 99.92 120 120 100 60 60 100 Ba Ria - Vung Tau 60 60 100 1200 1195 99.58 120 120 100 60 59 98.33 Long An 59 59 100 1200 1195 99.58 118 117 99.15 59 59 100 Dong Thap 60 60 100 1200 1196 99.67 120 119 99.17 60 60 100 An Giang 60 60 100 1200 1190 99.17 120 120 100 60 60 100 Tien Giang 60 60 100 1200 1195 99.58 120 120 100 60 60 100 Vinh Long 60 60 100 1200 1197 99.75 120 118 98.33 60 60 100 Ben Tre 60 60 100 1200 1193 99.42 120 120 100 60 60 100 Kien Giang 60 60 100 1200 1185 98.75 120 118 
98.33 60 60 100 Can Tho 60 60 100 1200 1191 99.25 120 118 98.33 60 60 100 Tra Vinh 60 60 100 1200 1198 99.83 120 120 100 60 60 100 Soc Trang 59 59 100 1200 1193 99.42 118 118 100 59 59 100 Bac Lieu 60 60 100 1200 1190 99.17 120 120 100 60 60 100 Ca Mau 60 60 100 1200 1185 98.75 120 120 100 60 60 100 Total 3639 3635 99.89 73200 72660 99.26 7278 7178 98.63 3639 3631 99.78

The Calculation of Sampling Errors

The sample design employed in this study departed markedly from the usual "textbook model" of simple random sampling. This departure demanded that special steps be taken in order to calculate measures of the stability of the sample estimates derived from the data. It was decided that the most appropriate approach would be to use the Jackknife procedure to make the necessary calculations. This procedure was applied after all data had been collected, cleaned, and analysed. In the following paragraphs, a brief overview has been presented of the notions of "sampling error" and "the accuracy of individual sample estimates". This has been followed by a description of some of the results of applying the Jackknife procedure to the pupil reading test used in the survey.

(a) The Notion of Sampling Error
Consider a probability sample of n elements that is used to calculate the sample mean, x, as an estimate of the population mean, X. If an infinite set of samples of size n were drawn independently from this population and the sample mean calculated for each of these samples, then the average of the resulting sampling distribution of sample means, the expected value of x, could be denoted by E(x). The accuracy of the sample statistic, x, as an estimator of the population parameter, X, may be summarized in terms of the mean square error (MSE). The MSE is defined as the average of the squares of the deviations of all possible sample estimates from the value being estimated (Hansen et al., 1953).

MSE(x) = E(x - X)^2 = E(x - E(x))^2 + (E(x) - X)^2 = variance of x + (bias of x)^2

A sample design is unbiased if E(x) = X. It is important to remember that "bias" is not a property of a single sample, but of the entire sampling distribution, and that it belongs neither to the selection nor the estimation procedure alone, but to both jointly. For most well-designed samples in survey research, bias is usually very small - tending towards zero with increasing sample size. The accuracy of sample estimates is therefore generally assessed in terms of the variance of x, denoted var(x), which quantifies the sampling stability of the values of x around their expected value E(x).

(b) The Accuracy of Individual Sample Estimates
In educational settings the researcher is usually dealing with a single sample of data and not with all possible samples from a population. The variance of sample estimates as a measure of sampling accuracy cannot therefore be calculated exactly. Fortunately, for many probability sample designs, statistical theory may be used to derive formulae which provide estimates of the variance based on the internal evidence of a single sample of data. For a simple random sample of n elements drawn without replacement from a population of N elements, the variance of the sample mean may be estimated from a single sample of data by using the following formula:

var(x) = ((N - n) / N) x (s^2 / n)
where s^2 is the usual sample estimate of the variance of the element values in the population (Kish, 1965, p. 41). For sufficiently large values of N, the value of the finite population correction, (N - n)/N, tends towards unity. The variance of the sample mean in this situation may therefore be estimated by s^2/n.

The sampling distribution of the sample mean is approximately normally distributed in many survey research situations. The approximation improves with increased sample size - even though the distribution of elements in the parent population may be far from normal. This characteristic of sampling distributions is known as the Central Limit Theorem, and it applies not only to the sample mean but also to most estimators commonly used to describe survey research results (Kish, 1965). From a knowledge of the properties of the normal distribution we know that we can be "68 percent confident" that the range x ± se(x) includes the population mean, where x is the sample mean obtained from a single sample and se(x), often called the standard error, is the square root of var(x). Similarly, the range x ± 1.96 se(x) will include the population mean with 95 percent confidence. While the above discussion has concentrated mostly on sample means derived from simple random samples, the same approach may be used to establish confidence limits for many other statistics derived from various types of sample designs. For example, confidence limits may be calculated for complex statistics such as correlation coefficients, regression coefficients, and multiple correlation coefficients (Ross, 1978).

(c) Comparison of the Accuracy of Probability Samples
The accuracy of probability samples is usually considered by examining the variance associated with a particular sample estimate for a given sample size. This approach to the evaluation of sampling accuracy has generally been based on the recommendation put forward by Kish (1965) that the simple random sample design should be used as a standard for quantifying the accuracy of sample designs that incorporate such complexities as stratification and clustering. Kish introduced the term "deff" (design effect) to describe the ratio of the variance of the sample mean for a complex sample design (denoted c) to the variance of the sample mean for a simple random sample (denoted srs) of the same size. That is,

deff = var(xc) / var(xsrs)

For the kinds of complex sample designs that are commonly used in educational research, the values of deff for many statistics are often much greater than unity. Consequently, the accuracy of sample estimates may be grossly overestimated if formulae based on simple random sampling assumptions are used to calculate sampling errors. The potential for arriving at false conclusions by using incorrect sampling error calculations has been illustrated in a study carried out by Ross (1976).
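The formulae in sections (b) and (c) can be illustrated with a short sketch. The numbers below are hypothetical and are used only to show the arithmetic; the sketch is not part of the procedures actually used in the study.

```python
import math

def srs_standard_error(s2, n, N):
    """Standard error of a sample mean under simple random sampling without
    replacement, including the finite population correction (N - n) / N."""
    return math.sqrt(((N - n) / N) * (s2 / n))

# Hypothetical values: s = 100 score points, n = 1,200 pupils, N = 26,000 pupils.
se = srs_standard_error(100.0 ** 2, 1200, 26000)
print(round(se, 2))                         # about 2.8 score points

# 95 percent confidence limits for a hypothetical sample mean of 500.
mean = 500.0
print(mean - 1.96 * se, mean + 1.96 * se)

# Design effect: ratio of the complex-design variance to the SRS variance.
var_complex = 8.0 ** 2                      # hypothetical Jackknife variance estimate
deff = var_complex / se ** 2
print(round(deff, 1))                       # about 8 for these illustrative numbers
```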
(d) Error Estimation for Complex Probability Samples
The computational formulae required to estimate the variance of descriptive statistics, such as sample means, are available for some probability sample designs which incorporate complexities such as stratification and cluster sampling. However, for many commonly employed statistics, the required formulae are not readily available for sample designs which depart markedly from the model of simple random sampling. These formulae are either enormously complicated or, ultimately, they prove resistant to mathematical analysis (Frankel, 1971).

In the absence of suitable formulae, a variety of empirical techniques have emerged in recent years which provide "approximate variances that appear satisfactory for practical purposes" (Kish, 1978, p. 20). The most frequently applied empirical techniques may be divided into two broad categories: Subsample Replication and Taylor's Series Approximation. In Subsample Replication a total sample of data is used to construct two or more subsamples, and then a distribution of parameter estimates is generated by using each subsample. The subsample results are analysed to obtain an estimate of the parameter, as well as a confidence assessment for that estimate (Finifter, 1972, p. 114). The main approaches to using this technique have been Independent Replication (Deming, 1960), Jackknifing (Tukey, 1958), and Balanced Repeated Replication (McCarthy, 1966). In this study it was decided to apply the Jackknife procedure by using the IIEPJACK software developed as a joint project of the Westat Corporation and the IIEP. This software permitted the calculation of sampling errors for all summary statistics reported for the survey.

The Jackknife approach to the calculation of sampling errors for the Vietnam Grade 5 Survey data required repeated analyses to be made for each statistic, such that each repetition represented an analysis with one of the sample schools removed. In this study this required 3,660 full runs of the analysis (each with a slightly reduced data set) for each tabulation that was required. Some preliminary test runs with the fastest desktop computers available showed that generating a single basic tabulation with appropriate sampling error estimates would take between 1 and 2 hours. Given that some hundreds of tabulations were required for the study report, and that many tabulations had to be created and then revised on a number of occasions, this large amount of processing time was problematic.

Following consultations with several professional sampling statisticians, it was decided to "reconstruct" the data file so that it could be presented to the IIEPJACK software in a manner that reduced data processing time dramatically - without preventing the construction of accurate estimates of sampling error. The sample design was based on 61 strata (provinces), and 60 schools had been selected with PPS from each province. The first step in "reconstructing" the data file was to allocate these 3,660 schools to 60 "baskets". The first basket was filled by randomly selecting one school from each stratum. The second basket was filled in the same manner - and so on, until there were 60 baskets, each containing 61 schools. Each basket therefore represented one of 60 replications of a sampling procedure that consisted of 61 strata with one school selected with probability proportional to size from each stratum. This "reconstruction" of the sample data provided 60 replicates - but with no stratification of the replicates. The reconstructed data file was then analysed using the IIEPJACK software under the conditions that (a) "baskets" were the primary sampling units, and (b) no stratification was applied. When the data were analysed after the "baskets" were formed, the required tabulations were produced in 2 to 3 minutes, rather than 1 to 2 hours. This massive improvement in processing speed flowed from the reduction in the number of repeated analyses required for the Jackknife procedure.
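The logic of the delete-one Jackknife that IIEPJACK applies can be sketched in a few lines. The fragment below is a generic illustration of one common form of the Jackknife variance estimator applied to primary sampling units (schools, or "baskets" of schools); it is not the IIEPJACK algorithm itself, and the basket means shown are invented.

```python
import numpy as np

def jackknife_se(psu_values):
    """Delete-one Jackknife standard error for a mean computed over R primary
    sampling units. One common form of the estimator is:
    var = (R - 1) / R * sum_r (theta_(-r) - theta_full)^2."""
    y = np.asarray(psu_values, dtype=float)
    R = len(y)
    theta_full = y.mean()
    # Recompute the estimate with each PSU dropped in turn.
    theta_drop = np.array([np.delete(y, r).mean() for r in range(R)])
    var = (R - 1) / R * np.sum((theta_drop - theta_full) ** 2)
    return np.sqrt(var)

# Hypothetical mean reading scores for 10 "baskets" (each basket = one school per stratum).
baskets = [498, 505, 493, 510, 502, 496, 507, 499, 501, 504]
print(round(jackknife_se(baskets), 2))   # about 1.6 for these illustrative values
```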
Table 4.7 presents the IIEPJACK calculations of provincial means on the pupil Reading test, together with their standard errors of sampling. The two panels of the table show the results of applying a "traditional" Jackknife analysis with 3,660 replicated analyses (undertaken by dropping one school at a time) and the "basket approach" with 60 replicated analyses (undertaken by dropping one basket at a time). The "traditional" panel took more than an hour to prepare with the IIEPJACK software, whereas the "basket approach" panel took around 2 minutes.

It may be seen that the mean scores for the provinces and for Vietnam overall were exactly the same for both approaches. There were some very small differences between the traditional and basket approaches in the estimated sampling errors for the overall Vietnam mean and for several of the 61 provincial means. The differences for provinces occurred where there were some missing schools.

The final row of figures for the basket approach showed that the overall mean for Vietnam was 500, the standard error was 1.30, and the effective sample size was around 5,900. This information indicated that we could be 95 percent confident that the population mean lay in the range

500 ± 2(1.30), that is, from 497.4 to 502.6.

This overall level of sampling accuracy was excellent because it was equivalent in accuracy to a simple random sample of 5,900 pupils. This was far in excess of the benchmark of an effective sample size of 400 pupils that is normally applied at the national level in most well-designed educational survey research studies.

The effective sample sizes for the provinces were a little disappointing because most of them were below the original planned benchmark of 178 pupils - which was selected because it provided 95 percent confidence limits of plus or minus 7.5 percent for percentages and plus or minus 0.15 of a student standard deviation for means. The average effective sample size across the 61 provinces was 114 pupils, which showed that, on average, the provincial estimates provided 95 percent confidence limits of slightly better than plus or minus 10 percent for percentages and slightly better than plus or minus 0.2 of a student standard deviation for means.
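As a rough arithmetic check of these provincial figures, the ordinary simple random sampling formulae can be applied to the average effective sample size of 114 pupils (assuming, for the percentage case, a proportion near 50 percent):

```latex
% Rough check of the provincial precision claims, using the average
% effective sample size of 114 (a percentage near 50 percent is assumed).
\[
  1.96\sqrt{\frac{0.5(1-0.5)}{114}} \approx 0.092
  \quad\Longrightarrow\quad \pm 9.2 \mbox{ percentage points (slightly better than } \pm 10\%\mbox{)},
\]
\[
  \frac{1.96}{\sqrt{114}} \approx 0.18
  \quad\Longrightarrow\quad \pm 0.18 \mbox{ of a student standard deviation (slightly better than } \pm 0.2\mbox{)}.
\]
```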
Table 4.7: Two Approaches to the Calculation of Sampling Errors

                        Traditional Approach (3,660 replications)    Basket Approach (60 replications)
Province                Mean (PRD500)    SE    DEFF      ESS         Mean (PRD500)    SE    DEFF      ESS
Ha Noi                      555.12     7.92    7.58   156.84             555.12     7.92    7.58   156.84
Hai Phong                   520.00     8.49    9.45   126.18             520.00     8.49    9.45   126.18
Ha Tay                      516.20    10.03   11.79   100.96             516.20    10.03   11.79   100.96
Hai Duong                   545.14    11.31   13.06    91.48             545.14    11.31   13.06    91.48
Hung Yen                    536.37     9.42   11.66   102.36             536.37     9.42   11.66   102.36
Ha Nam                      488.08     6.75    7.41   161.99             488.08     6.75    7.41   161.99
Nam Dinh                    538.91     7.14    8.49   141.22             538.91     7.14    8.49   141.22
Thai Binh                   553.34    10.19   13.54    88.62             553.34    10.19   13.54    88.62
Ninh Binh                   473.55     6.53    7.35   160.90             473.55     6.53    7.35   160.90
Ha Giang                    465.78     9.21   11.88    95.74             465.78     9.21   11.86    95.89
Cao Bang                    455.30    12.86   16.21    73.59             455.30    12.82   16.12    74.00
Lao Cai                     530.53     8.33   12.44    91.76             530.53     8.33   12.44    91.71
Bac Kan                     471.45    11.91   16.03    74.23             471.45    11.91   16.03    74.23
Lang Son                    454.17     9.88   12.99    92.09             454.17     9.88   12.99    92.09
Tuyen Quang                 446.96     9.51   12.30    96.51             446.96     9.51   12.30    96.51
Yen Bai                     473.30    13.53   15.35    77.02             473.30    13.53   15.35    77.02
Thai Nguyen                 527.01    11.71   14.13    84.09             527.01    11.71   14.13    84.09
Phu Tho                     519.97    10.33   12.63    94.80             519.97    10.33   12.63    94.80
Vinh Phuc                   489.57     9.61   11.31   106.00             489.57     9.61   11.31   106.00
Bac Giang                   486.08     7.74    8.92   133.88             486.08     7.74    8.92   133.88
Bac Ninh                    571.09    10.36   14.22    84.25             571.09    10.36   14.22    84.25
Quang Ninh                  578.41    12.52   18.73    62.32             578.41    12.52   18.73    62.30
Lai Chau                    506.51     8.92   13.35    89.15             506.51     8.91   13.32    89.32
Son La                      482.67    12.01   15.90    73.06             482.67    12.01   15.91    73.04
Hoa Binh                    460.87    11.22   14.24    83.83             460.87    11.22   14.24    83.83
Thanh Hoa                   498.17     8.80   10.40   114.96             498.17     8.80   10.40   114.96
Nghe An                     499.49    10.80   13.80    85.86             499.49    10.80   13.80    85.86
Ha Tinh                     532.93     9.99   11.78   101.84             532.93     9.99   11.78   101.84
Quang Binh                  528.84     9.35   13.08    91.29             528.84     9.35   13.08    91.29
Quang Tri                   497.45     7.37    8.40   142.02             497.45     7.35    8.36   142.74
Thua Thien - Hue            523.38     9.86   13.22    90.71             523.38     9.86   13.22    90.71
Da Nang                     549.77     8.55    9.34   127.90             549.77     7.87    7.91   151.00
Quang Nam                   490.97     8.03    9.26   128.48             490.97     8.03    9.26   128.48
Quang Ngai                  473.15     9.13   10.71   111.79             473.15     9.13   10.71   111.79
Binh Dinh                   494.29    10.61   14.55    82.33             494.29    10.61   14.55    82.33
Phu Yen                     487.36     6.08    6.19   192.96             487.36     6.08    6.19   192.96
Khanh Hoa                   481.39     5.71    5.44   218.11             481.39     5.71    5.44   218.11
Kon Tum                     454.06    15.25   20.08    59.31             454.06    14.53   18.23    65.32
Gia Lai                     498.25    11.45   14.59    81.57             498.25    11.45   14.59    81.57
Dak Lak                     509.59     9.76   12.27    96.89             509.59     9.76   12.27    96.89
Ho Chi Minh                 541.15     7.77    8.01   149.03             541.15     7.77    8.01   149.03
Lam Dong                    508.74     9.72   11.40   104.22             508.74     9.72   11.40   104.22
Ninh Thuan                  456.06     7.20    7.74   154.44             456.06     7.13    7.57   157.87
Binh Phuoc                  471.95     8.03    9.06   131.73             471.95     8.03    9.06   131.73
Tay Ninh                    473.54     6.81    8.23   145.77             473.54     6.81    8.23   145.77
Binh Duong                  516.92     7.36    8.37   142.97             516.92     7.35    8.36   143.17
Dong Nai                    505.96     5.00    4.22   283.58             505.96     5.00    4.22   283.58
Binh Thuan                  474.22     6.95    8.28   144.35             474.22     6.95    8.28   144.35
Ba Ria - Vung Tau           511.07     6.16    6.93   172.36             511.07     6.16    6.93   172.36
Long An                     478.31     7.41   10.15   117.70             478.31     6.80    8.56   139.65
Dong Thap                   470.96     9.01   11.61   103.00             470.96     9.01   11.61   103.00
An Giang                    460.50     9.53   12.36    96.15             460.50     9.53   12.36    96.15
Tien Giang                  499.76     9.09   11.48   104.11             499.76     9.09   11.48   104.11
Vinh Long                   491.96     9.59   10.58   113.13             491.96     9.59   10.58   113.13
Ben Tre                     491.08     6.87    7.69   154.95             491.08     6.87    7.69   154.95
Kien Giang                  449.39     9.72   13.13    90.23             449.39     9.72   13.13    90.23
Can Tho                     457.87     9.71   12.56    94.79             457.87     9.71   12.56    94.79
Tra Vinh                    434.88     8.11   11.30   106.04             434.88     8.11   11.30   106.04
Soc Trang                   432.98     8.07   11.85   100.64             432.98     8.05   11.81   100.92
Bac Lieu                    452.77    14.11   21.82    54.39             452.77    14.11   21.82    54.39
Ca Mau                      463.29    10.28   13.53    87.52             463.29    10.28   13.53    87.52
Vietnam                     500.00     1.36   13.47  5390.23             500.00     1.30   12.31  5900.56

These results for provinces arose because the intraclass correlation coefficient that was assumed for provinces before the sample was selected was lower than the values that were calculated after the data were collected. In fact, the average value of the intraclass correlation coefficient across the 61 provinces was 0.51 - much larger than the prior estimate of 0.30. Future studies of the quality of education in Vietnam will be able to profit from the values of the intraclass correlation coefficient that are now known for all provinces in the areas of reading and mathematics.

12. Conclusion

This report has described the sample design procedures employed during 2001 for a national survey of Grade 5 pupils in Vietnam. The information generated by this survey about the magnitude of the values of the intraclass correlation coefficient will provide sound guidance for the design of future sample surveys at both the provincial and national levels.

The main stages required to prepare the sample design (Design Constraints, Specification of Target Populations, Stratification, Use of Sample Design Tables, Construction of Sampling Frames, School and Pupil Selection, Calculation of Sampling Weights and Sampling Errors) were presented in association with some results drawn from the survey. The sample design used for the survey addressed all prior administrative constraints. The calculation of sampling errors for the pupil reading test demonstrated that the sample design had satisfied the prior sampling error specifications at the national level - but had not quite reached the proposed benchmark of an effective sample size of 178 pupils at the provincial level.

Throughout this report all estimates of percentages and means have been accompanied by their appropriate standard errors of sampling. These statistics were calculated using the IIEPJACK software - so that all complexities in the sample design (clustering, multiple stages of sampling, stratification, and disproportionate selection across strata) were adjusted for in the calculations.
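The drop in provincial precision can be traced with the standard planning approximation deff ≈ 1 + (b - 1)roh, where b is the number of pupils tested per school. The sketch below applies this approximation to a provincial design of 60 schools with about 20 pupils per school - an assumption on our part, though it is consistent with the roh = 0.3 design tables in Appendix 4.1, which give a = 60 schools for the ±7.5 percent target. It illustrates the approximation only and is not a reproduction of the IIEPJACK calculations.

```python
def effective_sample_size(pupils_per_school, schools, roh):
    """Approximate effective sample size of a two-stage cluster sample,
    using the planning approximation deff = 1 + (b - 1) * roh."""
    n = pupils_per_school * schools
    deff = 1 + (pupils_per_school - 1) * roh
    return n / deff

# Assumed provincial design: 60 schools and about 20 pupils per school.
planned = effective_sample_size(20, 60, roh=0.30)   # roh assumed at the design stage
actual = effective_sample_size(20, 60, roh=0.51)    # average roh estimated from the data
print(f"planned ESS = {planned:.0f}, ESS with roh = 0.51 = {actual:.0f}")
# prints: planned ESS = 179, ESS with roh = 0.51 = 112
```

These rough figures sit close to the planned provincial benchmark of 178 pupils and the observed average effective sample size of 114 pupils reported above.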
Appendix 4.1: Sample Design Tables for rho = 0.1, 0.2, 0.3
(b = cluster size; a = number of clusters required; n = total sample size, a × b)

                        95% Confidence Limits for Means / Percentages
Cluster size   ±0.05s / ±2.5%   ±0.1s / ±5.0%   ±0.15s / ±7.5%   ±0.2s / ±10.0%
b                 a        n       a       n       a       n        a       n

roh = 0.1
1 (SRS)        1600     1600     400     400     178     178      100     100
2               880     1760     220     440      98     196       55     110
5               448     2240     112     560      50     250       28     140
10              304     3040      76     760      34     340       19     190
15              256     3840      64     960      29     435       16     240
20              232     4640      58    1160      26     520       15     300
30              208     6240      52    1560      24     720       13     390
40              196     7840      49    1960      22     880       13     520
50              189     9450      48    2400      21    1050       12     600

roh = 0.2
1 (SRS)        1600     1600     400     400     178     178      100     100
2               960     1920     240     480     107     214       60     120
5               576     2880     144     720      65     325       36     180
10              448     4480     112    1120      50     500       28     280
15              406     6090     102    1530      46     690       26     390
20              384     7680      96    1920      43     860       24     480
30              363    10890      91    2730      41    1230       23     690
40              352    14080      88    3520      40    1600       22     880
50              346    17300      87    4350      39    1950       22    1100

roh = 0.3
1 (SRS)        1600     1600     400     400     178     178      100     100
2              1040     2080     260     520     116     232       65     130
5               704     3520     176     880      79     395       44     220
10              592     5920     148    1480      66     660       37     370
15              555     8325     139    2085      62     930       35     525
20              536    10720     134    2680      60    1200       34     680
30              518    15540     130    3900      58    1740       33     990
40              508    20320     127    5080      57    2280       32    1280
50              503    25150     126    6300      56    2800       32    1600

Appendix 4.1 (Cont'd): Sample Design Tables for rho = 0.4, 0.5, 0.6

                        95% Confidence Limits for Means / Percentages
Cluster size   ±0.05s / ±2.5%   ±0.1s / ±5.0%   ±0.15s / ±7.5%   ±0.2s / ±10.0%
b                 a        n       a       n       a       n        a       n

roh = 0.4
1 (SRS)        1600     1600     400     400     178     178      100     100
2              1120     2240     280     560     125     250       70     140
5               832     4160     208    1040      93     465       52     260
10              736     7360     184    1840      82     820       46     460
15              704    10560     176    2640      79    1185       44     660
20              688    13760     172    3440      77    1540       43     860
30              672    20160     168    5040      75    2250       42    1260
40              664    26560     166    6640      74    2960       42    1680
50              660    33000     165    8250      74    3700       42    2100

roh = 0.5
1 (SRS)        1600     1600     400     400     178     178      100     100
2              1200     2400     300     600     134     268       75     150
5               960     4800     240    1200     107     535       60     300
10              880     8800     220    2200      98     980       55     550
15              854    12810     214    3210      95    1425       54     810
20              840    16800     210    4200      94    1880       53    1060
30              827    24810     207    6210      92    2760       52    1560
40              820    32800     205    8200      92    3680       52    2080
50              816    40800     204   10200      91    4550       51    2550

roh = 0.6
1 (SRS)        1600     1600     400     400     178     178      100     100
2              1280     2560     320     640     143     286       80     160
5              1088     5440     272    1360     122     610       68     340
10             1024    10240     256    2560     114    1140       64     640
15             1003    15045     251    3765     112    1680       63     945
20              992    19840     248    4960     111    2220       62    1240
30              982    29460     246    7380     110    3300       62    1860
40              976    39040     244    9760     109    4360       61    2440
50              973    48650     244   12200     109    5450       61    3050

Appendix 4.1 (Cont'd): Sample Design Tables for rho = 0.7, 0.8, 0.9

                        95% Confidence Limits for Means / Percentages
Cluster size   ±0.05s / ±2.5%   ±0.1s / ±5.0%   ±0.15s / ±7.5%   ±0.2s / ±10.0%
b                 a        n       a       n       a       n        a       n

roh = 0.7
1 (SRS)        1600     1600     400     400     178     178      100     100
2              1360     2720     340     680     152     304       85     170
5              1216     6080     304    1520     136     680       76     380
10             1168    11680     292    2920     130    1300       73     730
15             1152    17280     288    4320     129    1935       72    1080
20             1144    22880     286    5720     128    2560       72    1440
30             1136    34080     284    8520     127    3810       71    2130
40             1132    45280     283   11320     126    5040       71    2840
50             1130    56500     283   14150     126    6300       71    3550

roh = 0.8
1 (SRS)        1600     1600     400     400     178     178      100     100
2              1440     2880     360     720     161     322       90     180
5              1344     6720     336    1680     150     750       84     420
10             1312    13120     328    3280     146    1460       82     820
15             1302    19530     326    4890     145    2175       82    1230
20             1296    25920     324    6480     145    2900       81    1620
30             1291    38730     323    9690     144    4320       81    2430
40             1288    51520     322   12880     144    5760       81    3240
50             1287    64350     322   16100     144    7200       81    4050

roh = 0.9
1 (SRS)        1600     1600     400     400     178     178      100     100
2              1520     3040     380     760     170     340       95     190
5              1472     7360     368    1840     164     820       92     460
10             1456    14560     364    3640     162    1620       91     910
15             1451    21765     363    5445     162    2430       91    1365
20             1448    28960     362    7240     162    3240       91    1820
30             1446    43380     362   10860     161    4830       91    2730
40             1444    57760     361   14440     161    6440       91    3640
50             1444    72200     361   18050     161    8050       91    4550
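For readers who wish to extend these tables to other values of roh or other cluster sizes, the entries above appear to follow the standard planning rule in which the simple random sample size is inflated by deff = 1 + (b - 1)roh and divided by the cluster size, rounding up to a whole number of clusters. The sketch below regenerates the tables on that assumption; it matches the printed values to within rounding and is offered only as an illustration, not as the original computation.

```python
import math

# Simple random sample sizes implied by the four precision targets
# (taken from the "1 (SRS)" rows above): ±0.05s/±2.5%, ±0.1s/±5%, ±0.15s/±7.5%, ±0.2s/±10%.
SRS_SIZES = [1600, 400, 178, 100]
CLUSTER_SIZES = [1, 2, 5, 10, 15, 20, 30, 40, 50]

def design_table(roh):
    """Rows of (b, [(a, n), ...]) using the approximation deff = 1 + (b - 1) * roh."""
    rows = []
    for b in CLUSTER_SIZES:
        deff = 1 + (b - 1) * roh
        cells = []
        for n_srs in SRS_SIZES:
            # Smallest number of clusters whose total sample reaches n_srs * deff;
            # the rounding to 6 decimals guards against floating-point error.
            a = math.ceil(round(n_srs * deff / b, 6))
            cells.append((a, a * b))
        rows.append((b, cells))
    return rows

# Example: regenerate the roh = 0.3 panel; the b = 20 row gives a = 60 schools
# and n = 1,200 pupils for the ±0.15s / ±7.5% target - the design used per province.
for b, cells in design_table(0.3):
    print(b, cells)
```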