1 Introduction

The question of how to evaluate transdisciplinary research has been debated for some time. In this debate, we can distinguish two dimensions that evaluation might address, informed by different, although complementary, goals that both relate to success (see also Lawrence et al., 2022). One dimension addresses impact: does the transdisciplinary project lead to the expected practical solutions and/or societal changes? The other dimension addresses quality: does the transdisciplinary project meet the specific quality requirements of transdisciplinary research? In this chapter, we focus on the latter.

To narrow down the topic, we distinguish between those who initiate an evaluation and those who conduct it (see Table 5.1). This chapter focuses on external evaluations conducted by third parties. This focus matters because it concerns funding structures and review processes that are decisive, for instance, both for whether transdisciplinary research obtains funding and for which kind of transdisciplinary projects are funded. Current funding structures and review processes are still considered among the major barriers to scaling up transdisciplinary research (e.g. Koier & Horlings, 2015; Schneider et al., 2023). It therefore makes sense to learn from experiences and to provide new avenues and guidelines for funding agencies and review panels dealing with transdisciplinary research evaluations.

Table 5.1 Internal vs. external evaluation, self-evaluation vs. third-party evaluation

In recent decades, the scholarly discourse on the evaluation of transdisciplinary research has yielded a considerable number of highly differentiated sets of criteria (some examples are Bergmann et al., 2005; Defila & Di Giulio, 1999; Jahn & Keil, 2015; reviews of the subject are provided, for instance, by Belcher et al. [2016], Boix Mansilla et al. [2006], Klein [2008], Pohl et al. [2011], Schuck-Zöller et al. [2017], and Steelman et al. [2021]). Most of these lists of criteria have been developed by scholars who investigate and are involved in inter- and transdisciplinary research. Accordingly, they are informed by the concerns, terminology, and theoretical approaches of this community. The concern that the special nature of transdisciplinary research is not appropriately captured by evaluations runs like a thread through this discourse, as does the tacit assumption that providing elaborate lists of criteria is a remedy for transdisciplinary research not being sufficiently valued.

There is, as a result, no shortage of proposals on how evaluation processes for transdisciplinary projects should be conducted (see also, e.g. the results of the comprehensive review by Laursen et al. [2022]). But quite often they are not evidence-based. Too little is known about the evaluation processes actually taking place and about the dynamics that arise in these processes, one recent exception being an experience-based report by Gerhardus et al. (2016). Improving these processes depends on having more knowledge about the challenges faced by those involved in evaluating transdisciplinary research and about what they perceive to be supportive. Hence, what is missing in the discourse is a scholarly engagement with the actual evaluation practice of ‘well-meaning and well-informed actors’, that is, actors who value transdisciplinary research, are experienced in dealing with it, and are aware of the specific nature and requirements of such research. This could uncover promising paths both for review processes and for future research.

In this chapter, we concentrate on evaluation processes that are characterized by ‘well-meaning and well-informed actors’. In this first section, we explain our approach, defining our understanding of transdisciplinary research and of its specific quality. We conclude the section by identifying the challenges of evaluating the quality of transdisciplinary research. Based on this, in the subsequent sections we present experiences and results from three case studies in which we accompanied processes of evaluating transdisciplinary research, before finally drawing conclusions for funders and for researchers.

We proceed from the following definitions. While in a multidisciplinary approach experts from different fields explore the same topic but do not relate their perspectives to one another, in an interdisciplinary approach scholars from at least two academic disciplines collaborate with the aim of producing integrated results, a synthesis (e.g. Andersen & Wagenknecht, 2013; Holbrook, 2013; Zweekhorst et al., 2001; for the scholarly discourse, e.g. Defila & Di Giulio, 1998; Hvidtfeldt, 2018; Klein, 2010; Vermeulen & Witjes, 2020). In transdisciplinary research, in addition to scholars from different academic disciplines, actors from outside academia participate in the research process (e.g. Bogner, 2012; Gibbons et al., 1994; Mielke et al., 2016; Pohl & Hirsch Hadorn, 2007; Regeer & Bunders, 2003). These actors contribute substantially to the research: they are not just a source of information, data, and/or feedback but are involved in co-designing the research and in co-producing the integrated knowledge. Such an actor-oriented understanding of transdisciplinary research is what Mobjörk (2010) refers to as ‘participatory transdisciplinarity’ (in contrast to ‘consulting transdisciplinarity’).

According to these definitions, the specific quality of inter- and transdisciplinary research can be described by three terms which all denote processes that must take place (based on Di Giulio & Defila, 2017; see also Bergmann & Schramm, 2008; Defila & Di Giulio, 1999; Huutoniemi, 2010; Jahn & Keil, 2015; Klein, 1990; Pohl et al., 2011; Röbbecke et al., 2004):

  • Consensus: Those participating in the research have to arrive at a shared problem framing. They need to develop joint research goals they all equally want to reach and shared research questions they all equally want to answer. They have to reach a joint understanding about the theoretical and methodical approach for dealing with these questions, and to develop a common language. Consensus does not mean that individuals should abandon their different perspectives and replace them with a ‘group perspective’, or that their different perspectives should dissolve into just one perspective. Rather, they have to develop a shared point of view: not an identical point of view, but one with which they can all identify to a certain extent, from which they are prepared to proceed, and to which they can relate their findings. A shared problem framing and the like have to be developed by applying methods of (cognitive) consensus-building.

  • Integration: The research must lead to common outputs (results and products). In other words, those participating in the research have to develop common answers to their shared research questions by integrating, from the very start, the findings from the different disciplines and/or non-academic fields involved. To this end, findings and approaches have to be selected in terms of their contribution to the common answers; they then have to be reprocessed, related, and integrated. The common result is the integrated knowledge produced in this process, the so-called synthesis. The synthesis has to be achieved by applying methods of knowledge-integration.

  • Diffusion: As a rule, the audience of inter- and transdisciplinary research is neither disciplinary nor purely scholarly, and neither are the users of the products (products can be publications, but tool kits, recommendations, technologies, materials, etc., are products as well). The research outputs (results and products) must feed into different academic and non-academic discourses and fields of practice. This means that the results must be translated in order to fit the ‘logic’ of the targeted discourses and to be accessible to the different target audiences and their perspectives. This is not simply a matter of the language used, nor is it only about disseminating results and products and promoting reception on the part of the audience.

Defining transdisciplinary research implies defining which actors from outside academia are to be involved in the research. This should be done by considering the intended aim of involving them and, related to that, the contribution to the research expected from them. This can best be captured by referring to the concepts of credibility, salience, and legitimacy, which are part of the discourse on scientific policy advice (e.g. Cash et al., 2003; Hastie, 2007). ‘Credibility’ refers to the scientific legitimacy of the knowledge that is produced; it denotes the scientific adequacy of the evidence and arguments. ‘Salience’ refers to the practical legitimacy of the knowledge that is provided; it denotes the relevance to the needs of decision-makers. ‘Legitimacy’ refers to the political/societal legitimacy of the results; it denotes the perception that the knowledge production has been respectful of stakeholders’ divergent values and beliefs, unbiased in its conduct, and fair in its treatment of opposing views and interests. Drawing on these concepts, we suggest distinguishing three goals of participation, leading to three types of participating actors and three types of contributions:

  • Participation of uncertified experts to increase credibility: Participation can serve the goal of broadening the knowledge that is considered in framing problems, in investigating problems, and in providing answers and solutions. That is, one goal of participation is to ensure that the relevant knowledge is considered and integrated in the research regardless of whether it is academic or non-academic. In this case, the participating actors are ‘experience-based experts’ (or ‘uncertified experts’) with regard to the topic being investigated (while the participating scholars are ‘certified experts’) (Collins & Evans, 2002). They contribute by providing expertise.

  • Participation of (future) users to increase salience: Participation can serve the goal of including first-hand experiences about actual needs and usability. So, one goal of participation is to ensure that the outputs of research (knowledge and products) can in fact be used, that they answer practical needs, and that they are linked to users’ options for action. In this case, the participating actors are (future) users (including those who have agency and/or practitioners) in the field that is being explored. They contribute by providing practical experience and knowledge about the practice.

  • Participation of stakeholders to increase societal legitimacy: Participation can serve the goal of strengthening the societal legitimacy of the research and its outputs. In other words, one goal of participation is to ensure that the production of knowledge and its outputs are sensitive to socio-political interests, fair in the treatment of opposing views and interests, and that they consider and respect divergent values and beliefs. In this case, the participating actors are stakeholders and actors representing (affected) groups in civil society. They mirror the relevant socio-political interests in the field and contribute by providing their everyday experiences, feelings, and concerns.

One actor may of course belong to more than one of these groups. For instance, an actor may simultaneously be both an uncertified expert and a (future) user.

Taking seriously the three process requirements (i.e. the specific quality of transdisciplinary research) and the differentiations regarding participation (i.e. goals, criteria of involvement, contributions) has the following implications for the tasks of evaluation:

  • The evaluation needs to assess whether the different perspectives covered by the participants are integrated, i.e. whether processes aimed at consensus and integration take place (and are conducted according to the state of the art) and whether the research is informed by their results.

  • The evaluation must assess whether the research has the potential to produce the intended impact by generating appropriate results and products and by conducting activities of dissemination that support the diffusion of the outputs.

  • The evaluation has to assess whether the participation is expedient, i.e. whether the ‘right’ actors are involved and whether they contribute substantially (according to the specific goals) to the research.

The challenges that have to be mastered in evaluating the quality of transdisciplinary research present themselves as follows:

  • Being unable to rely primarily on quantitative and indirect indicators for measuring quality: Whether a transdisciplinary project meets the specific quality requirements of transdisciplinary research often cannot be judged simply by using quantitative criteria and indicators (e.g. Stokols et al., 2003). That is, not using qualitative criteria and indicators in evaluating transdisciplinary research would impair the quality of the evaluation. This is reinforced by the fact that an evaluation of transdisciplinary research that relies only on indirect indicators, and does not also include direct indicators targeting the processes, will be unable to judge the specific quality of such research (e.g. Love et al., 2022; Steelman et al., 2021; Wagner et al., 2011). This also means that an individual reviewer’s judgment carries considerable weight and that individual reviewers thus bear a high responsibility.

  • Doing justice to diversity and coping with the lack of common ground: Transdisciplinary projects do not have standardized procedures. The methods used depend not only on the disciplinary background of the scholars involved but also on the background (and possibly vulnerability) of the non-academic actors and on the different types of goals pursued by involving them. The methods must be appropriate in relation both to the goals and questions of a given project and to the people who are involved. That is, each project is far more singular than projects that take place in a disciplinary context while, at the same time, there is no shared and agreed-upon body of methods or state-of-the-art approaches that can be used as a common point of reference in evaluating transdisciplinary projects, making ‘evaluation a custom task’ (Koier & Horlings, 2015, p. 47; see also, e.g. Laursen et al., 2022, and the different contributions in Stoll-Kleemann & Pohl, 2007).

  • Navigating between the need for a reliable and robust research plan and the inevitable limits of plannability: Because transdisciplinary research processes must remain open and be informed by the results of the ongoing processes of consensus-building and knowledge-integration, a flexible research plan is indispensable (e.g. Defila et al., 2016; Verwoerd et al., 2023; see also Dahl Gjefsen et al., this volume). The more a transdisciplinary project’s research plan is fixed from the very beginning, the lower its transdisciplinary quality is likely to be. But this in turn very often conflicts with the expectations and requirements of funding bodies, which expect carefully and thoroughly worked-out research plans (e.g. Lawrence et al., 2022; Vermeulen & Witjes, 2020). And it adds to the difficulty of evaluating a project, because the less a transdisciplinary project’s research plan is fixed, the more demanding its evaluation.

The guiding question in our chapter is how evaluation processes can be improved so as to support the actors involved in coping with these challenges and with the second-order challenges that might arise from managing the first-order ones. One rather obvious way of approaching the first-order challenges is to arrange for review panels to take funding decisions as a group or to agree as a group on how to react to mid-term or final reports. This might, however, lead to second-order challenges with regard to the dynamics and interaction in the review panel.

2 Experiential and Empirical Background

Our evidence is presented in three case studies. In all three, we, as certified experts, supported the process of the external evaluation of transdisciplinary projects (without ourselves being involved in evaluating the proposals/projects). All three processes were characterized by ‘well-meaning and well-informed actors’ (and in all three, some of the members of the review panel were certified experts of inter-/transdisciplinary research). As we have shown above, the question of how to evaluate transdisciplinary research, and of which criteria to use, is not new. The novelty of our approach is that in all three case studies, we applied a transdisciplinary design to answer this question for a specific research program. In the following, we describe the three case studies by summarizing both our role and our methodical approach before going into the details of our experiences and results in the subsequent sections.

Case study 1 (CS-1) is the accompanying research project to the funding program ‘Research for sustainable development’ (WfNE) in Lower Saxony, managed by the Volkswagen Foundation. WfNE had three rounds of funding (2014, 2015, 2017). The accompanying project was funded by the Ministry of Science and Culture of Lower Saxony. It had three principal investigators: the two of us and Claudia Binder. We were in charge of the research question devoted to the appropriate evaluation of transdisciplinary research. Working on this question covered not only investigating the topic but also contributing to the development of the quality criteria that were used in making funding decisions in WfNE. The practitioners with whom we collaborated in our part of the project were the Volkswagen Foundation, the Ministry of Science and Culture, and the interdisciplinary group of scholars responsible for reviewing the research proposals. We observed the discussions of the reviewers (tape recordings), we interviewed the reviewers as well as members of the foundation and the ministry (qualitative interviews) in order to learn about their experiences in conducting the evaluation, and we asked the applicants how they experienced the process of submission and evaluation (qualitative interviews; online questionnaire). Subsequently, we discussed the empirical results with the members of the foundation and the ministry involved in the management of the research program and provided a collaboratively revised list of evaluation criteria. This new list was used in the second round of funding, and again we observed the discussions of the reviewers and asked the different actors about their experiences in using this list and about their judgment of the adequacy and applicability of each of the criteria. In CS-1, a transdisciplinary collaboration took place with uncertified experts and (future) users but not with stakeholders (those affected by the evaluation, the applicants).

Case study 2 (CS-2) is another accompanying research project we were in charge of. Over the 2015–2019 period, the Federal State of Baden-Württemberg funded projects running as real-world laboratories (two rounds of funding). During this time, the real-world laboratories program had two accompanying projects. We led one of them (Defila & Di Giulio, 2018; see also Schäpke, Chapter 6, this volume). In this project, we contributed to the development of the quality criteria used in the mid-term and in the final evaluation and to the improvement of the corresponding evaluation processes. The practitioners with whom we collaborated in this project were the interdisciplinary group of scholars responsible for reviewing the ongoing research, and the research teams conducting real-world laboratories. In 2016, the reviewers evaluated the mid-term reports of the research teams in the first round of funding. Both the reviewers and those being evaluated criticized the procedure and the result, which initiated a process of reflection and revision. We were in charge of designing and facilitating this process. We analyzed the evaluation reports the reviewers had produced as well as the critique the research teams had voiced, with a view to the coherence and consistency of the evaluation and to how the reviewers had justified and interpreted the criteria. Based on the results of the analysis, we suggested how the evaluation process could be improved (criteria and procedure) for the second round of funding. Both the suggested criteria and the suggested procedure were subjected to a participatory process with the research teams (both rounds of funding) and a feedback process with the review panel. That is, the research teams participated in the development of the quality criteria that were then applied to their own projects. At a later stage, part of this process was repeated in order to develop the criteria for the final evaluation of the projects (both rounds of funding). In CS-2, a transdisciplinary collaboration took place with stakeholders (those affected by the evaluation, the research teams).

Case study 3 (CS-3) is situated in the same funding context as CS-2, the real-world laboratories program funded by the Federal State of Baden-Württemberg. A third round of funding real-world laboratories started in 2021; in 2023, the projects had the possibility of submitting a proposal for a two-year extension (starting in 2024). We are mandated to provide methodical support and coaching for the teams conducting the projects. In addition, we had a time-limited mandate (2022) to support the process of setting up the mid-term evaluation and the evaluation of the proposals for the two-year extensions. The practitioners with whom we collaborated in fulfilling this mandate were the Ministry of Science, Research and Arts Baden-Württemberg, the professional agency in charge of organizing the evaluation, and the research teams conducting real-world laboratories. In contrast to CS-2, we did not interact with the review panel. We provided a first input into the process by reminding the funder (including the professional evaluation agency in charge of organizing the evaluation) of the process that had taken place in the 2015–2019 period and by providing the materials that had been produced in that process. Based on this, the funders decided how they would like to proceed and which evaluation criteria they would like to apply. We provided feedback on their concepts and helped to design and facilitate an online workshop in which the research teams had the opportunity to discuss and comment on the criteria. Based on the research teams’ feedback, the criteria were revised and handed over to the review panel for the final decision. That is, in CS-3 too, the research teams participated in developing the quality criteria that were then applied to their own projects. In CS-3, a transdisciplinary collaboration took place with stakeholders (those affected by the evaluation, the research teams).

In the following section, we report on what we learned from these case studies, focusing on three topics: the practical requirements with regard to evaluation criteria, the interdisciplinary nature of the process of evaluating transdisciplinary research, and the benefits of a transdisciplinary approach to developing criteria and procedures for evaluating transdisciplinary research. We draw mainly on the experiences and results of CS-1, in which we collected empirical data, but complement this with the experiences in CS-2 and/or CS-3.

3 Requirements for Practicable Criteria to Evaluate the Quality of Transdisciplinary Research

In 2014, WfNE was launched (CS-1). The aim of the program was to fund research looking into issues of sustainability without further limitations regarding the topics to address or the scientific fields invited for submission. In the first round of funding, two related but not identical sets of criteria were used (see Table 5.2). One of them was published in the call for projects, the other resulted from specifying these criteria for the review process. Both lists were provided by the funder (ministry and foundation).

Table 5.2 The two sets of criteria used in the first round of funding of WfNE in 2014. The list in the left-hand column was communicated to the applicants, the list in the right-hand column was communicated to the review panel

We asked the applicants (qualitative interviews, lasting about 15 minutes) what they believed were important criteria for evaluating sustainability research and whether the different criteria published in the call (Table 5.2, left-hand column) made sense to them. We asked the reviewers (qualitative interviews, lasting about an hour) whether the different criteria they were asked to use (Table 5.2, right-hand column) were adequate for the purpose of assessing and selecting proposals for funding. And we asked both groups whether they felt any criteria were lacking. The applicants voiced a number of difficulties with the criteria as published in the call, and the reviewers expressed difficulties with the criteria as specified for evaluating the project proposals. The analysis of the data was informed by two questions: What criteria do the respondents use in judging the suitability of criteria? What do the respondents suggest in terms of revising the criteria?

The findings from the first question (see Table 5.3) show that the main criteria the applicants and the reviewers used to assess the criteria differ, although with some overlap, and that they are informed by the respondents’ respective roles.

Table 5.3 The main criteria applicants and reviewers used in judging the suitability of the criteria

In the interviews, both groups of respondents made suggestions for how to change the criteria. Based on these suggestions, we reprocessed the list of criteria. This was a collaborative process with the funder that resulted in a new list of criteria, which was used in the second round of funding in 2015 (see Table 5.4). The major changes were the following: the new criteria did not predetermine a specific theoretical approach to sustainable development, actual or perceived redundancies were eliminated, and the criteria were presented and operationalized in the format of questions. The questions used to specify each criterion were not meant to be applied cumulatively but rather to define the conceptual space to be considered in applying the criterion to an individual proposal.

Table 5.4 The revised criteria used in the second round of funding of WfNE in 2015, showing the entire list (left-hand column) and how selected criteria were operationalized (right-hand column)

Again, we asked both the applicants (this time via an online questionnaire) and the reviewers (qualitative interviews as before) about their experiences and how they judged the adequacy of the criteria. After that, we discussed the results and the criteria in a meeting with the funder and the review panel. Applicants, reviewers, and the funder were satisfied with the new set of criteria and judged it to be suitable (the funder took the final decision on the criteria). In the third round of funding in 2017, this list of criteria was published as part of the call for projects.

The process in CS-2 (evaluation of mid-term reports, starting in 2016) was again initiated by the actors (reviewers, project teams) not being satisfied with a first list of nine criteria and resulted in a collaboratively agreed list of seven. The process was designed as follows: Based on our analyses of the evaluation reports, on the critique by the research teams, and on our experiences in CS-1, we provided a first list of revised criteria that were operationalized in the format of questions. In workshops, the research teams discussed and revised this list, and the result of this process was subsequently discussed with and revised by the reviewers (the review panel took the final decision on the criteria).

At a meta-level, the lists of criteria resulting from these processes (CS-1, CS-2) can be characterized as follows: The lists differ with regard to the content of the criteria; that is, both lists are tailored to the individual funding program and cannot simply be transferred to another funding context. Although the criteria in both lists are in line with the core requirements of transdisciplinary research as formulated in the scholarly discourse, the language in which they are formulated is not entirely the technical language this academic community uses. If, in our role as certified experts, we had provided the lists of criteria, these would have been, at least in part, formulated differently. Only one of the six criteria in CS-1 addresses the transdisciplinary process directly (two if we also count the criterion addressing the interdisciplinarity of the consortium). The list in CS-2 has two criteria that address the transdisciplinary process directly (knowledge-integration, participation) and two that address processes of diffusion and of generating societal impact. In both lists, all the criteria are qualitative, and their number is limited. In both funding contexts, the actors considered it useful to have criteria in the format of questions and to specify these by further questions.

4 The Interdisciplinary Nature of the Process of Evaluating Transdisciplinary Research

The process of evaluation in CS-1 (research proposals) was rather complex. Roughly, it worked as follows: First, applicants submitted a full proposal, which was evaluated by an interdisciplinary panel of reviewers. Each member of the panel was assigned several proposals, which they read against the criteria provided. The panel then met for a one-day discussion to decide which of the proposals they deemed eligible for funding (roughly 30% of the submitted proposals). These applicants were invited to present their projects in a two-day colloquium open to the public. During and directly after this colloquium, the reviewers met for several rounds of discussion and decided which of the projects should receive funding (approximately half of the eligible projects). In the interviews, which took place some months after this process, the reviewers described in hindsight how they had experienced it. The following is based on the main points voiced in these interviews by several of the reviewers.

Neither the broad spectrum of research fields nor the diversity of disciplines covered by the submitted proposals was represented in the review panel. In other words, the review process in CS-1 was of a multidisciplinary nature, both with regard to the composition of the team of reviewers and with regard to the disciplinary backgrounds of the proposals each reviewer had to assess. This meant that the reviewers were constantly forced to move out of their individual comfort zones:

Sometimes, I found it difficult to judge to what extent the disciplinary state of research was well presented and whether the research question was sufficiently related to it, which is also one of the criteria. And we always had only one statement on this topic [by a member of the panel] to draw on […] and sometimes none. (Interview with reviewer)

Review decisions in CS-1 had to be backed by the entire group, which meant that the reviewers had to integrate their different perspectives and reach a decision they all agreed with. Thus, the review process in CS-1 was intended to be interdisciplinary: it aimed at an integrated judgment. In this process, the reviewers experienced the problems that tend to characterize any interdisciplinary collaboration: problems of bridging different disciplinary worldviews, as well as the problem that the processes of consensus-building and of knowledge-integration are not always carefully designed and supported:

When I think of the […] engineers and the sociologists, these are two different worlds, aren't they? We did have diametrical sensitivities and perceptions and also judgements. (Interview with reviewer)

And then it started, how can I say, a fundamental discussion about the relationship between certain sciences and the pecking order in sciences, and who's better now than the other and so on. (Interview with reviewer)

But I also got the impression that in this interdisciplinary communication there might have been a little more exchange at some points. (Interview with reviewer)

Achieving an integrated judgment, that is, succeeding in the interdisciplinary integration of perspectives, was experienced as individually rewarding and regarded as adding substantially to the quality of the funding decisions:

And I also experience it to be enriching, because one does learn from each other, that is, one learns how other people do actually look at the proposals with their disciplinary backgrounds, that is, what do they read in this proposal, which I read with a specific lens and perspective and with regard to which I have a specific perception and judgement. (Interview with reviewer)

Of course, in one’s own review, in the course of the individual preparation, there were always a few questions left unanswered, but the group served this purpose to discuss these questions in the group. And this always worked, that in the group these questions could be answered very quickly. Well, I was not left alone with anything. (Interview with reviewer)

And, of course, there might always be projects, which, if I had been the only person to decide whether to fund them or not, might not have been funded. But that is the advantage of considering different perspectives in deciding and of deciding with different people. (Interview with reviewer)

That there were always several people who were discussing and deciding on a proposal, that maybe the bias which one has or where one had to exceed the personal comfort zone or expertise, perhaps, then hopefully was compensated for. (Interview with reviewer)

One of the risks of not striving for and achieving an interdisciplinary, integrated judgment is ending up with unbalanced funding decisions that might systematically privilege some approaches and/or knowledge systems and/or topics. Another risk manifested itself in CS-2. In the first round of funding, the reviewers wrote individual comments on the project teams’ mid-term reports, and these comments were used to produce the mid-term reviews without first being subjected to an in-depth discussion and an interdisciplinary integration of perspectives in the (interdisciplinary) group of reviewers. Our comparative analysis of these mid-term reviews found that the different evaluation criteria had been interpreted differently by the members of the review panel, resulting in mid-term reviews that were inconsistent and sometimes even contradictory.

Our case studies confirm that the process of evaluating transdisciplinary research is inevitably multidisciplinary. But they also show that this process is not always interdisciplinary, meaning that it is not always organized in such a way as to lead to integrated judgments, although this, if successful, improves the quality of the evaluation and the decision-making. Aiming at integrated judgments is time-consuming, because it requires reviewers to engage in intensive interdisciplinary processes of consensus-building and of knowledge-integration, and it is demanding, because these processes must be designed and moderated. In such a process, reviewers learn from each other and broaden their horizons. This might strengthen what Misra et al. (2015) call a ‘transdisciplinary orientation’, because it provides them with a positive experience of interdisciplinary collaboration, and such experiences may well add to reviewers’ willingness to engage in such time-consuming and cognitively challenging processes.

But the interdisciplinary interaction should not be limited to the group of reviewers, as the applicants in CS-1 emphasized. In a funding context that is open to any scientific field, it is quite a challenge to ensure comprehensibility for a broad spectrum of disciplines. In such a context, the applicants cannot know what information they need to explain in their proposals and what information the reviewers will be able to infer. This problem can be eased by an oral exchange between reviewers and applicants. That is, an interdisciplinary process of evaluating transdisciplinary research ideally plans for such an exchange (and removes the review panel’s anonymity). The value such an exchange could have can be illustrated by the experience in CS-3. Evaluation criteria might be interpreted differently by scholars from different disciplinary backgrounds. This became obvious in how the project teams discussed the evaluation criteria suggested by the funder (mid-term evaluation) in the online workshop. Discussing criteria with those who have to comply with them makes it possible to identify the criteria that need to be reformulated (or explained) in order to avoid misunderstandings.

5 The Potential of Adopting a Transdisciplinary Approach in Setting up the Evaluation of Transdisciplinary Research

In all three case studies, the process by which the external evaluation of transdisciplinary projects was developed (criteria and procedure) shows transdisciplinary elements by involving actors who play different roles. The three case studies differ in terms of the intensity of the transdisciplinary collaboration with the different actors involved (see Table 5.5).

Table 5.5 Intensity with which the uncertified experts, (future) users, stakeholders, and certified experts were involved in the development of the criteria and procedures for the external evaluation of transdisciplinary projects

An intensive collaboration of review panels and funders (uncertified experts, (future) users) in developing criteria and procedures seems an obvious thing to do. But in many cases, this is not done systematically. Rather, as a rule, the funder provides the criteria to be used, and the review panel can modify these criteria to a certain extent. This was the case in CS-1, in which the review panel’s deep dissatisfaction with the criteria led to the transdisciplinary process of revising them. The funder had not planned this, and it would not have happened systematically without the involvement of certified experts who designed and facilitated the process. In CS-1, the scholarly knowledge about transdisciplinarity, the practical needs and experiences of funders and reviewers, as well as the experience-based expertise of the funders and reviewers fed into the process and its result in a transdisciplinary way. This increased the credibility and the salience of the evaluation. The stakeholders’ perspectives were included through extractive methods and via feedback.

While in CS-1 the applicants (the stakeholders) did not actively participate in the collaborative development of the criteria, they did in CS-2 and CS-3. In CS-2, they were involved because they had criticized their mid-term review. Again, such a process was not planned by the funder, and it would not have been possible without the involvement of certified experts who designed and facilitated the process. In CS-3, the stakeholders were involved because, based on the previous experience (CS-2), the funder wanted this to happen in order to improve the quality and transparency of the evaluation. The certified experts were involved as consultants; they reminded the funder about the previous achievements (CS-2) and thus ensured that the current process built on what had been learned and developed in the past. In CS-2 and CS-3, the scholarly knowledge about transdisciplinarity as well as the stakeholders’ concerns, experiences, and interests fed into the process and its result in a transdisciplinary way. This increased the legitimacy of the evaluation. The funders’ and the reviewers’ perspectives also fed into the process, but there was no point at which all actors engaged in a direct discussion and collaboration with each other.

One might ask whether it is reasonable to involve applicants and project teams in developing criteria and procedures that will be applied to their own proposals and projects. In CS-2 and CS-3, this worked out well and led to criteria on which all actors agreed. Furthermore, in CS-3, the project teams were asked, in an online workshop setting, what they expected from the mid-term evaluation. The answers, collected on a whiteboard, covered four dimensions:

  • Expectations of the evaluation’s quality: fair; taking the individuality of the projects into account; efficient; transparent with regard to the criteria; considering both qualitative and quantitative dimensions.

  • What should be taken into account in evaluating the projects: the special character of the research format (real-world laboratories); external factors that influence progress but are beyond the reach of the projects; what can realistically be achieved by the mid-term; that activities aimed at including stakeholders or at achieving long-term impact should be acknowledged, even if not all of them are successful.

  • What the evaluation should yield for the projects: opportunity to reflect and learn about the progress of the project; getting (constructive) feedback, food for thought, and suggestions with a view to the second phase; opportunity to question and revise the design and plan for the second phase; making visible the efforts of the first phase; a special focus on the methods used to implement participation.

  • Expectations about how the evaluation should contribute to the broader discourse about the research format of real-world laboratories.

Based on our experiences in all three case studies, the benefits of including certified experts and stakeholders in the collaborative development of the external evaluation of transdisciplinary projects can be summarized as follows:

  • The certified experts contribute expertise not only in relation to the topic of transdisciplinarity and of evaluating transdisciplinary research, but also with regard to how the inter- and transdisciplinary processes of consensus-building and knowledge-integration related to the evaluation could be designed and facilitated. Compared to the other actors involved in the process, the certified experts are neutral with regard to the set of criteria and the evaluation procedure. In procedures that extend over a period of time, the certified experts can serve as a measure of quality assurance for the process. They encourage self-reflection by questioning practices or by presenting results from the accompanying research.

  • The stakeholders contribute, of course, their concerns, experiences, expectations, and interests. Considering the stakeholders’ perspectives adds to their commitment to high-quality transdisciplinary research. In addition, being in most cases academics themselves, the stakeholders broaden the spectrum of disciplinary perspectives considered in formulating the criteria. This in turn could help funders and reviewers do justice to the diversity of disciplines, non-academic actors, topics, and approaches represented in a transdisciplinary research program and to the individuality of transdisciplinary projects.

Finally, a transdisciplinary approach should not focus solely on developing the criteria but should target the entire process of evaluation, as reviewers in CS-1 emphasized:

Because I think that actually the evaluation procedures should be carried out exactly in this way. That is: as transparent as possible and in compliance with comprehensible criteria, but also be willing to review and change the adopted process at any time, and where there is a need for change and the possibility to change to then actually do so. (Interview with reviewer)

In emphasizing the benefits of applying a transdisciplinary approach to the evaluation of transdisciplinary research, we do not advocate a democratic approach: the final decisions about both criteria and procedures rest with the funder or the review panel.

6 Conclusion

The processes in interdisciplinary review panels that evaluate transdisciplinary research face the same problems, can yield the same added value, and need the same support as any form of interdisciplinary collaboration. Thus, such processes should meet the same quality requirements as any other interdisciplinary collaboration and must be carefully designed and facilitated in order to lead to shared problem framings and integrated results. A high-quality evaluation of transdisciplinary research requires time-consuming processes in which reviewers from different disciplines interact with each other and with the applicants. Reviewers must be willing to leave their individual comfort zones and to enter an interdisciplinary collaboration. But they must not be left alone to deal with the challenges they encounter while doing this. Taking decisions in a group supports the individual reviewers and eases their responsibility, and it might strengthen their willingness to leave their comfort zone and to admit to personal uncertainties. The community of scholars doing research on the evaluation of transdisciplinary research should insist that the procedural quality of such processes is respected, and it should examine the dynamics of these processes.

A transdisciplinary approach has the potential to add to the credibility, salience, and legitimacy of how the quality of transdisciplinary research is evaluated. Funders should consider involving applicants and project teams (stakeholders) in developing the quality criteria that are then applied to their own projects, and they should consider collecting and addressing the expectations of those affected by mid-term evaluations in order to enhance the beneficial impact of such evaluations (see also Defila & Di Giulio, 2020). This requires a reconsideration of the relationship between those who conduct an evaluation and those who are evaluated, such that the usually hierarchical relationship is replaced with one based on partnership. Funders should also consider involving certified experts of inter- and transdisciplinarity, for instance in the format of an accompanying research project, who examine and support the ongoing process(es). The certified experts in turn should be sensitive to their role when they engage in such processes without themselves being reviewers. A collaboration with funders, reviewers, and applicants is a transdisciplinary collaboration. Certified experts engaging in such transdisciplinary collaborations must be aware that they cannot impose their criteria and theories on the uncertified experts, (future) users, or stakeholders but need to enter a process of consensus-building and knowledge-integration with them.

The scholarly approach of certified experts to the topic of how to evaluate the quality of transdisciplinary research should be reconsidered. One question on which to reflect is whether the perspectives of uncertified experts, (future) users, and stakeholders are sufficiently considered. In developing suggestions for how to evaluate transdisciplinary research that are actually to be used in funding decisions and in mid-term or final evaluations, not only the scholarly knowledge about transdisciplinarity has to be taken into account, but also the expertise of the uncertified experts (funders and reviewers), what (future) users (funders and reviewers) actually need from a practical perspective, and what bothers stakeholders (applicants, project teams). If these perspectives do not feed into the scholarly discussion, the suggestions emerging from it will not be sufficiently linked to the practice of doing evaluations. This question also touches on the language used in the scholarly discourse about the evaluation of transdisciplinary research; it has to be asked whether the terminology used is laden to too great an extent with the theories and terms of the academic community rather than linking to the language of uncertified experts, (future) users, and stakeholders.

A second question arises from the fact that in an actual evaluation of transdisciplinary research it is impossible to use a large number of criteria covering all the aspects that are, according to the knowledge of certified experts, important for achieving high-quality transdisciplinary research. Furthermore, the sets of criteria that are used cannot be limited to mirroring transdisciplinary quality but also have to mirror the funding context and the funders’ worldviews and policies. Against this background, the topic that has dominated the scholarly debate for quite some time now has to be questioned: All the highly differentiated lists of criteria that have been developed (including those developed primarily for internal self-evaluation) are useful as a source of inspiration for external evaluations. But they cannot be more than that, because they are too differentiated and comprehensive for the purpose of an external evaluation. The question thus is whether it would be better to stop developing ever more elaborate sets of criteria and to turn to other questions instead.

We might summarize our conclusions and the lessons we have learned in the form of three general messages:

  • Message 1: Criteria that can be used in taking funding decisions or in mid-term or final evaluations have themselves to meet criteria, and these criteria should be informed by the expertise and the practical needs of those applying them and consider the concerns of those affected by them.

  • Message 2: There are enough suitable sets of criteria that can (and have to) be adapted for specific evaluations. The scholarly debate on inter- and transdisciplinarity should now move forward and focus on the process of evaluation itself and on how this process should be designed and supported.

  • Message 3: It makes sense to adopt a transdisciplinary approach to develop evaluation criteria for transdisciplinary research and to improve the evaluation process. The evaluation process in turn must itself meet the same quality criteria as any other inter- or transdisciplinary process.

We are convinced that the quality of how transdisciplinary research is evaluated can be improved by developing criteria to assess the quality of evaluations as well as by transdisciplinary collaborations. What our case studies did not cover and thus remains to be investigated is what role non-academics can and should play in this.