It is important that the definitions of the criteria are understood within a broader context, and read in conjunction with other principles and guidance on how to conduct evaluations in ways that will be useful and of high quality. They should be contextualized — understood in the context of the individual evaluation, the intervention being evaluated, and the stakeholders involved.
The evaluation questions (what you are trying to find out and what you intend to do with the answers) should inform how the criteria are specifically interpreted and analysed. The use of the criteria depends on the purpose of the evaluation. The criteria should not be applied mechanistically. Instead, they should be covered according to the needs of the relevant stakeholders and the context of the evaluation.
More or less time and resources may be devoted to the evaluative analysis for each criterion depending on the evaluation purpose. Data availability, resource constraints, timing, and methodological considerations may also influence how and whether a particular criterion is covered.
Relevance assessment involves looking at differences and trade-offs between different priorities or needs. It requires analysing any changes in the context to assess the extent to which the intervention can be or has been adapted to remain relevant.
Coherence concerns the compatibility of the intervention with other interventions in a country, sector or institution. Note: the extent to which other interventions (particularly policies) support or undermine the intervention, and vice versa. This includes complementarity, harmonisation and co-ordination with others, and the extent to which the intervention is adding value while avoiding duplication of effort.

In both conditions, half of the feedback comments were focused on only one aspect of the essay at a time.
For positive feedback, at least two aspects were mentioned in 30 percent of the cases, with a maximum of 4 aspects per comment. For negative feedback, this percentage was somewhat lower: only 23 percent, with a maximum of 3 aspects per comment. In the remaining cases, the feedback segment was left blank by the student.
An in-depth analysis of the content of feedback showed considerable differences in the probability that a particular aspect was mentioned; see Tables 3 and 4 for an overview of the results. Table 3 shows that there were no significant differences between conditions for the proportion of positive feedback. The results were rather different for negative feedback, in which the condition affected the proportion of feedback in three of the five categories; see Table 4.
Figure 2 shows the results for positive (left pane) and negative (right pane) feedback in more comprehensible terms: the proportion of feedback for each of the five categories. Below, these results for positive and negative feedback are presented systematically per feedback category.

Figure 2. Effects of condition (criteria vs. CJ) on the estimated probability of positive feedback (left pane) and negative feedback (right pane) for each of the five categories: content and structure, grammatical control, coherence and unity, vocabulary, and miscellaneous.
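As an aside for readers who want a concrete sense of how such category proportions can be compared between conditions, the sketch below fits a simple logistic regression on invented data. The variable names, counts, and the flat (non-multilevel) model are assumptions made for illustration; they do not reproduce the study's actual data or analysis.

```python
# Illustrative sketch only: invented data, not the study's analysis
# (which may have used a multilevel model with additional predictors).
import pandas as pd
import statsmodels.formula.api as smf

# One row per feedback comment: the condition the student was in and whether
# the comment mentioned the category of interest (1 = yes, 0 = no).
df = pd.DataFrame({
    "condition": ["criteria"] * 40 + ["CJ"] * 40,
    "mentions_content": [1] * 18 + [0] * 22 + [1] * 26 + [0] * 14,
})

# Logistic regression of category mention on condition; the condition effect
# tests whether the estimated probability differs between the two groups.
model = smf.logit("mentions_content ~ C(condition)", data=df).fit(disp=False)
print(model.params)

# Raw estimated probabilities per condition (0.45 vs. 0.65 in this toy data).
print(df.groupby("condition")["mentions_content"].mean())
```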
First, when providing positive feedback, students in both conditions were equally likely to provide feedback on the content and structure of the text, with a probability of 0.
Second, the proportion of positive feedback on aspects related to grammatical control in the criteria condition was 0., although the proportion of feedback on grammar decreased in the CJ condition to only 0. The proportion of grammar feedback in the criteria condition was 0. Third, the probability of feedback on coherence and unity was 0. When students commented on weaknesses in the text, the probability that they focused on coherence and unity was only 0.
Fourth, the results for feedback on vocabulary are quite comparable to the results for feedback on grammar. The proportion of positive feedback on vocabulary was 0. Specifically, the proportion of grammar feedback in the criteria condition was 0. Fifth, students in both conditions hardly provided feedback on aspects that could not be categorized under any of the other four evaluation criteria, with a probability of 0.

The average scores are presented in logits, which represent the probability that a particular text is judged as being of higher quality than a random text from the same pool of texts.
In other words, the probability of producing a high-quality text was generally higher for students in the CJ condition (0.; see Figure 3).
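As a rough illustration of how a logit scale translates into comparison outcomes, the sketch below assumes the Bradley-Terry model for paired comparisons (cited in the reference list); the quality estimates used here are invented, not the study's.

```python
# Illustrative sketch only: how a quality estimate expressed in logits maps to
# the probability of "winning" a paired comparison under the Bradley-Terry
# model. The theta values are invented, not the study's estimates.
import math

def win_probability(theta_a: float, theta_b: float) -> float:
    """Probability that text A is judged better than text B,
    given their estimated qualities theta_a and theta_b (in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta_a - theta_b)))

print(win_probability(0.0, 0.0))  # 0.50: equal quality, a coin flip
print(win_probability(1.0, 0.0))  # ~0.73: one logit above an average text
print(win_probability(2.0, 0.0))  # ~0.88: two logits above an average text
```

Under this reading, a one-logit advantage corresponds to being preferred in roughly three out of four comparisons against an average text.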
The present study aimed to investigate the differential learning effects of an instructional approach in which students apply analytic, teacher-designed criteria to the evaluation of essays written by peers vs. an approach in which they compare those essays with one another (comparative judgment).
This was tested in a small-scale authentic classroom situation, showing some interesting and promising findings. First, there was no difference in the reliability and validity of the judgments students made in each of the two conditions, indicating that both types of peer assessment equally support students in making evaluative judgments of the quality of their peers' essays. However, there were some differences between conditions in the content of the peer feedback they provided.
Compared to the criteria condition, students in the comparative judgment condition focused relatively more on aspects that were related to the content and structure of the text, and less so on aspects that were related to grammar and vocabulary. This was only the case for feedback targeted to aspects that needed improvement. For feedback on strengths, there appeared to be no difference between conditions. A second important finding of this study is that there appeared to be only a moderate effect of condition on the quality of students' own writing.
Students in the comparative judgment condition wrote texts of somewhat higher quality than the students in the criteria condition. This difference was not significant in this sample, but that may be due to the relatively small sample size (cf. Wasserstein and Lazar).
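To illustrate why a genuine difference can fail to reach significance in a sample of this size, the following minimal power sketch may help; the effect size, group size, and alpha level are hypothetical values chosen for illustration and are not taken from the study.

```python
# Illustrative sketch only: why a real difference can fail to reach
# significance in a small sample. Effect size, group size, and alpha are
# hypothetical values, not taken from the study.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sided t-test with 20 students per condition and a moderate
# effect (Cohen's d = 0.5): well below the conventional 80% target.
power = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05)
print(f"power with n = 20 per group: {power:.2f}")

# Sample size per group that would be needed to reach 80% power.
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"n per group for 80% power: {n_needed:.0f}")
```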
Two main conclusions can be drawn from these results. First and foremost, the instructional approaches influence the aspects of the text to which students pay attention when providing feedback. Although students in this study were all primarily focused on the content and structure of the text, especially when they provided positive feedback, they were directed more toward the lower-level aspects of the text when they had to provide suggestions for improvement based on an analytic list of criteria.
However, when comparing essays, students stayed focused on the higher-order aspects when identifying aspects that needed improvement. This finding might be due to the holistic approach in the process of comparative judgment, which allows students to make higher-level judgments regarding the essay's communicative effectiveness.
Although it is not necessarily a bad thing to provide feedback on lower-level aspects, feedback on higher-level aspects is generally associated with improved writing performance (Underwood and Tregidgo). In this respect, the feedback in the comparative judgment condition can be more meaningful for the feedback receiver. Ultimately, this can also have an effect on feedback givers themselves, as the way they evaluate texts and diagnose strengths and weaknesses in a peer's work may have an important influence on how they conceptualize and regulate quality in their own writing (Nicol and Macfarlane-Dick; Nicol et al.).
Second, conclusions regarding the effect of instructional approach on students' own writing performance are somewhat harder to draw based on the results of the present study. Although students in the comparative judgment condition on average wrote texts of higher quality than students in the criteria condition, this was definitely not the case for all students. Even when controlling for individual writing knowledge and writing self-efficacy, differences in writing quality were still larger within conditions than between conditions.
Moreover, as the present study took place in an authentic classroom situation, which constrained the number of participating students, and as it is not ethical to exclude students from possible learning opportunities, it was deliberately decided not to implement a control condition in which students completed the same writing task without being presented with examples.
As a result, students in both conditions actively engaged with a range of examples of varying quality. As this process seems to be a necessary condition for students to develop a mental representation of what constitutes quality (Lin-Siegler et al.), both conditions may have offered comparable learning opportunities in this respect.
More research is needed to examine whether the active use of shared criteria and examples in a peer assessment affects students' learning and performance above and beyond the instructional approach (teacher-designed criteria or comparative judgment). Another opportunity for further research is to investigate how many examples, and of which quality, are necessary for students to learn. A possible explanation for the small effects of the learning-by-comparison condition on students' writing quality in this study may be that an improved understanding of writing quality does not easily transfer to one's own writing, at least not in the short term.
Further research is needed to understand what instructional factors can foster this transfer. For instance, the learning effects might be stronger once the peer assessment is routinely and systematically implemented in the curriculum.
According to Sadler, any feedback-enhanced intervention in which students are engaged in the process of assessing quality must be carried out long enough for it to be viewed by learners as normal and natural.
To our knowledge, there is no research yet that investigates how the number of peer assessments performed over the course of a curriculum affects students' performance. The role of the teacher in the transfer from understanding to performance may be a crucial factor as well. Key aspects of pedagogical interventions that successfully promote students' learning include a combination of direct instruction, modeling, scaffolding and guided practice (Merrill). This implies that a peer assessment on its own may not be sufficient to improve writing.
A more effective implementation of any type of peer assessment may be one in which teachers discuss the results from the peer assessment with students and show how they can use the information from the peer assessment during their own writing process (Sadler; Rust et al.). At the end of the present study, students in the comparative judgment condition indeed indicated that they missed explicit cues on whether they had made the right choices during their comparisons. While acknowledging the importance of teachers, Sadler remarks that teachers should hold back from being too directive in guiding students' learning process.
He states that students assume that teachers are the only agents who can provide effective feedback on their work, and that they need a considerable period of practice and adaptation to build trust in the feedback they give to and receive from peers, especially when they do this in a more holistic manner.
When teachers are too directive in this procedure and keep focusing on analytic criteria instead of on the quality of texts as a whole, students' own learning process might be inhibited. Instead, he argues that teachers should guide the process more indirectly, for instance by monitoring students' evaluation process from a distance and by providing meta-feedback on the quality of students' peer feedback.
Together, this implies that a combination of both instructional methods might be more effective than either of them alone, and that teachers play an important role in bringing criteria and examples together in such a way that students engage in deep learning processes.
Although the present study provides important insights into how students evaluate the work of their peers and what aspects they take into account during these evaluations, the results do not provide any insight into how they evaluate their own work during writing.
Theories on evaluative judgment suggest that an improved understanding of what constitutes quality improves not only how students evaluate the work of their peers but also how they evaluate their own work (Boud; Tai et al.). Although writing researchers have already acquired a decent understanding of how novice and more advanced writers plan their writing, there is not much information yet on how students evaluate and revise their writing.
This is especially relevant for developing writers, as being able to monitor and control the quality of one's own product during writing is one of the most important predictors of writing quality (Flower and Hayes). Given the small effects of peer assessment on writing quality in this research, it might very well be that students made changes in their writing process that did not yet show in the quality of their products. To further our understanding of the learning effects of peer assessment in the context of writing, research should therefore take into account both the process and the product of writing.
To summarize, the present study has taken a first but promising step toward unraveling how analyzing examples of varying quality might foster students' understanding and performance in writing. It has been demonstrated that students analyze example texts quite differently when comparing them than when applying teacher-designed criteria. In particular, when providing feedback in a comparative approach, students focus more on higher-level aspects of their peers' texts.
Although the results are not conclusive as to whether the effects of learning by comparison also transfer to students' own writing performance, they do suggest that comparative judgment can be a powerful instructional tool in today's practice. It inherently activates students to engage with a range of examples of varying quality, and does so in a highly feasible and efficient manner.
Follow-up research is needed to get a firmer grip on the potential learning effects of comparative judgment, both to contrast its effects with those of other instructional approaches, such as linking example texts to analytic criteria (which is now regularly used in educational practice), and to identify the contextual factors needed for an optimal implementation in practice.
This study was carried out in accordance with the guidelines of the University of Antwerp. All subjects gave written informed consent in accordance with the Declaration of Helsinki. All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Bartholomew, S. Adaptive comparative judgment as a tool for assessing open-ended design problems and model eliciting activities.
Using adaptive comparative judgment for student formative feedback and learning during a middle school design project.
Bell, A. Students' perceptions of the usefulness of marking guides, grade descriptors and annotated exemplars. Higher Educ.
Bloxham, S. Generating dialogue in assessment feedback: exploring the use of interactive cover sheets.
Boud, D. Sustainable assessment: rethinking assessment for the learning society. Studies Contin. Educ.
Bouwer, R. Effect of genre on the generalizability of writing scores.
In Sluijsmans and M. Segers (eds). Culemborg: Phronese, 92–.
Bradley, R. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39.
Brookhart, S. Appropriate criteria: key to effective rubrics.
Bruning, R. Examining dimensions of self-efficacy for writing.
Carless, D. The development of student feedback literacy: enabling uptake of feedback. Higher Educ.
Managing dialogic use of exemplars.
D-PAC [Computer software]. Antwerp: University of Antwerp.
Fielding, A. London: DfES.
Flower, L. In Gregg and E. Steinberg (eds). Hillsdale, NJ: Erlbaum, 3–.

These criteria govern the direction of the evaluation and provide structure and justification for the judgments you make.
We often work backwards from the judgments we make, discovering what criteria we are using on the basis of what our judgments look like. For instance, suppose we have jotted down our tentative judgments about sustainable management practices. If we were to analyze these judgments, asking ourselves why we made them, we would see that we used the following criteria: wellbeing of the logging industry, conservation of resources, wellbeing of the environment, and cost.
Once you have identified the criteria informing your initial judgments, you will want to determine what other criteria should be included in your evaluation.
For example, in addition to the criteria you've already come up with (wellbeing of the logging industry, conservation of resources, wellbeing of the environment, and cost), you might include the criterion of preservation of the old growth forests. In deciding which criteria are most important to include in your evaluation, it is necessary to consider the criteria your audience is likely to find important.
Let's say we are directing our evaluation of sustainable management methods toward an audience of loggers. If we look at our list of criteria--wellbeing of the logging industry, conservation of resources, wellbeing of the environment, cost, and preservation of the old growth forests--we might decide that wellbeing of the logging industry and cost are the criteria most important to loggers.
At this point, we would also want to identify additional criteria the audience might expect us to address: perhaps feasibility, labor requirements, and efficiency. Once you have developed a long list of possible criteria for judging your subject (in this case, sustainable management methods), you will need to narrow the list, since it is impractical and ineffective to use all possible criteria in your essay. To decide which criteria to address, determine which are least dispensable, both to you and to your audience.
Your own criteria were: wellbeing of the logging industry, conservation of resources, wellbeing of the environment, cost, and preservation of the old growth forests. Those you anticipated for your audience were: feasibility, labor requirements, and efficiency. In the written evaluation, you might choose to address those criteria most important to your audience, with a couple of your own included. For example, your list of indispensable criteria might look like this: wellbeing of the logging industry, cost, labor requirements, efficiency, conservation of resources, and preservation of the old growth forests.
Stephen Reid, English Professor

Warrants, to use a term from argumentation, come on the scene when we ask why a given criterion should be used or should be acceptable in evaluating the particular text, product, or performance in question. When we ask WHY a particular criterion should be important (let's say, strong performance in an automobile engine, a quickly moving plot in a murder mystery, or an outgoing personality in a teacher), we are getting at the assumptions (i.e., the warrants) that underlie it.