Tour of research on student evaluations of teaching [22-24]: Heilman and Okimoto 2007, Punyanunt-Carter and Carter 2015, and Young et al. 2009

Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

22.

Heilman and Okimoto 2007 "Why Are Women Penalized for Success at Male Tasks?: The Implied Communality Deficit" reports on three experiments regarding evaluations of fictional vice presidents of financial affairs. The experiments do not concern student evaluations of teaching, so it's not clear to me that Holman et al. 2019 should classify this article under "Evidence of Bias in Standard Evaluations of Teaching".

---

23.

Punyanunt-Carter and Carter 2015 "Students' Gender Bias in Teaching Evaluations" reported that 58 students in an introductory communication course were asked to complete a survey about a male professor or about a female professor. The article did not report inferential statistics, and, given the reported percentages and sample sizes, it's not clear to me that this study should be classified as finding bias.

For example, here are results from the first question, about instructor effectiveness, for which the article reported only the percentage of students of each gender who agreed or strongly agreed that the instructor was effective:

For the female professor:
82% of 17 males, so 14 of 17
47% of 15 females, so 7 of 15

For the male professor:
69% of 13 males, so 9 of 13
69% of 13 females, so 9 of 13

Overall, that's 21 of 32 (66%) for the female professor and 18 of 26 (69%) for the male professor, producing a p-value of 0.77 in a test for the equality of proportions.
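As a rough check on that p-value, here is a minimal sketch in Python of a pooled, two-sided z-test for the equality of two proportions, applied to the overall counts above. The function name is made up for illustration, and the sketch assumes no continuity correction; a chi-square test with a correction, or Fisher's exact test, would give a somewhat different p-value.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for equality of two proportions (no continuity correction)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Female professor: 21 of 32; male professor: 18 of 26
z, p_value = two_proportion_z_test(21, 32, 18, 26)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With samples this small, the normal approximation is only approximate, so the result is best read as a ballpark figure.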

---

24.

Young et al. 2009 "Evaluating Gender Bias in Ratings of University Instructors' Teaching Effectiveness" asked graduate and undergraduate students to rate, on 25 items, "a memorable college or university teacher of their choice" (p. 4). Results indicated that "Female students rated their female instructors significantly higher on pedagogical characteristics and course content characteristics than they rated their male instructors. Also, male students rated male instructors significantly higher on the same two factors. Interpersonal characteristics of male and female instructors were not rated differently by the male and female students" (p. 9).

I'm not sure how much to make of the finding quoted above based on this study, given results in Table 4 of the article. The p-value section of Table 4 has a column for each of the three factors (interpersonal characteristics, pedagogical characteristics, and course content characteristics) and has seven rows, for student gender (A), student level (B), instructor gender (C), AxB, AxC, BxC, and AxBxC. So the table has 21 p-values, only 2 of which are under 0.05; the average of the 21 p-values is 0.52.
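For context on those counts, here is a minimal sketch of what 21 tests would be expected to produce if every null hypothesis were true. It assumes the 21 tests are independent, which is an assumption made only for illustration: the tests come from the same respondents and the three factors are likely correlated. Under a true null, a p-value from a continuous test is uniformly distributed between 0 and 1, so its expected value is 0.5.

```python
import math

n_tests = 21   # p-values in Table 4
alpha = 0.05   # conventional significance threshold

# Expected number of p-values below alpha if all nulls are true
expected_below_alpha = n_tests * alpha

def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent trials."""
    return 1 - sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# Chance of seeing 2 or more p-values below alpha by chance alone
prob_two_or_more = prob_at_least(2, n_tests, alpha)

print(f"expected count below {alpha}: {expected_below_alpha:.2f}")
print(f"P(at least 2 of {n_tests} below {alpha}): {prob_two_or_more:.2f}")
```

Under these assumptions, about 1 of the 21 p-values would be expected to fall under 0.05, and 2 or more would occur a bit more than a quarter of the time.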

---

Comments are open if you disagree, but I don't think that any of these three studies provide sufficient evidence to undercut the use of student evaluations in employment decisions.
