
科学学研究 (Studies in Science of Science), 2023, Vol. 41, Issue 11: 1948-1957.

• Theory and Methods of Science of Science •

Reliability Analysis of Peer Review in the Selection of Science and Technology Award Winners

LI Qiang (李强)1, MENG Xianfei (孟宪飞)2

  1. Institutes of Science and Development, Chinese Academy of Sciences (中国科学院科技战略咨询研究院)
    2. Office of Scientific R&D, Tsinghua University (清华大学科研院)
  • Received: 2022-08-23  Revised: 2022-12-08  Online: 2023-11-15  Published: 2023-11-15
  • Corresponding author: LI Qiang (李强)
  • Supported by:
    Research on the Operating Mechanisms of Representative International S&T Awards; Research on Classified Evaluation Standards and Indicator Methods for S&T Achievements


Abstract: With the institutionalized development of science and technology (S&T) awards in China and the large scale on which award activities are now conducted, the fairness of selection outcomes and the scientific soundness of selection methods have drawn growing attention from all sectors of society. Focusing on the peer-review method commonly used in S&T award selection, this study addresses two questions. First, when multiple experts must score or rank multiple nominees to select winners, how reliable is the peer review, and can its results reflect real differences among the evaluated candidates? Second, for the judgments of the same panel of experts, will the shortlists produced by scoring and by ranking differ? If each round of expert scoring in award selection is regarded as an experiment, then the scores to be examined are the experimental indicator data, so a two-way analysis of variance (ANOVA) model can be constructed to assess, separately, the influence of uncontrollable factors such as achievement quality and expert preference, and of controllable factors such as selection criteria and decision rules. A sample analysis of the ratio data and ordinal data generated in the anonymous online review and expert panel meeting stages in four disciplines from 2017 to 2019 shows that aggregating expert scores and promoting projects by total score produces large systematic errors. Under the rule that a project advances with more than half of the votes, the average reliability of panel evaluation is only 28.6% (maximum 60%, minimum 6.5%), meaning that selection depends more on factors other than the projects themselves; as a result, of the 115 projects awarded over those years, at least 8 that should have been eliminated advanced and won because of rank reversal. By comparing the results computed under different selection methods, the study proposes measures to reduce systematic error and raise the reliability of peer review, such as ordering by rank sum and independent expert voting, and, drawing on the characteristics and development trends of renowned international S&T awards, discusses the feasibility of abolishing the oral defense and replacing expert panel meetings with anonymous online review.
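
The reliability analysis described above rests on a standard two-way ANOVA decomposition of an experts × projects score matrix. The following Python code is a minimal sketch of that decomposition, not the paper's implementation: it splits total variance into a project effect, an expert effect, and residual error, and reports the share of variance attributable to true project differences as one plausible reliability index (the paper's exact reliability formula is not given in the abstract, and all data here are simulated).

# Sketch of the two-way ANOVA decomposition on an experts x projects
# score matrix: variance is split into a project effect (real quality
# differences), an expert effect (rater leniency/severity), and residual error.
import numpy as np
from scipy import stats

def two_way_anova_reliability(scores: np.ndarray):
    """scores: (n_experts, n_projects) matrix of ratio-scale ratings."""
    r, c = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)          # per-expert means
    col_means = scores.mean(axis=0)          # per-project means

    ss_total   = ((scores - grand) ** 2).sum()
    ss_expert  = c * ((row_means - grand) ** 2).sum()
    ss_project = r * ((col_means - grand) ** 2).sum()
    ss_error   = ss_total - ss_expert - ss_project

    ms_project = ss_project / (c - 1)
    ms_error   = ss_error / ((r - 1) * (c - 1))
    f_project  = ms_project / ms_error
    p_value    = stats.f.sf(f_project, c - 1, (r - 1) * (c - 1))

    # One plausible reliability index (an assumption, not the paper's formula):
    # the share of total variance explained by true project differences.
    reliability = ss_project / ss_total
    return f_project, p_value, reliability

# Hypothetical panel: 7 experts scoring 5 projects on a 100-point scale.
rng = np.random.default_rng(0)
true_quality = np.array([90, 85, 80, 78, 75], dtype=float)
leniency = rng.normal(0, 5, size=(7, 1))    # per-expert severity/leniency
scores = true_quality + leniency + rng.normal(0, 6, size=(7, 5))
print(two_way_anova_reliability(scores))

A low reliability value under this kind of index corresponds to the abstract's finding that panel outcomes can depend more on expert preference and decision rules than on the projects themselves.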

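The rank-reversal problem the abstract reports, where shortlisting by total raw score and shortlisting by rank sum disagree, can be shown with a toy example (invented data, not from the paper): a single expert who uses the scale far more generously than the others can dominate the score totals even when the within-expert rankings point elsewhere.

# Toy illustration of rank reversal: total-score and rank-sum shortlists
# can promote different projects when experts use the scale differently.
import numpy as np

# 3 experts x 4 projects; expert 3 scores much more generously and divergently.
scores = np.array([
    [70, 72, 68, 65],
    [71, 73, 69, 66],
    [95, 60, 94, 93],
], dtype=float)

total_score_order = np.argsort(-scores.sum(axis=0))

# Within-expert ranks (1 = best), summed across experts; lower sum is better.
ranks = scores.shape[1] - np.argsort(np.argsort(scores, axis=1), axis=1)
rank_sum_order = np.argsort(ranks.sum(axis=0))

print("by total score:", total_score_order)  # project indices, best first
print("by rank sum:   ", rank_sum_order)

Here project 1 places second by rank sum yet last by total score: the generous expert's raw points swamp the totals, which is the kind of inversion behind the at least 8 projects the study finds should have been eliminated, and why it recommends rank-sum ordering and independent expert voting.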