Apart from the substantive question of which indicator to choose, a relevant question is whether the indicator is appropriate. After all, not all indicators are equal. Indicators can be rated on how ‘good’ they are; in other words, criteria can be drawn up for choosing among the possible indicators. One example is the reliability of the data used for the indicator. However, the literature differs on what these criteria are. The table below summarises a number of criteria for indicators from four different sources.
Criteria for indicators, per source:

RBA method (Friedman, 2005)
• Communication power: can the indicator be effectively communicated to all stakeholders? Is it easy to understand what the indicator means? Do all members of the community experience the indicator in the same way?
• Proxy power: is the indicator a good proxy for other indicators, so that a maximum of 2 or 3 indicators is sufficient?
• Data power: is there sufficient data that is valid and reliable, as well as regularly available, to be able to monitor progress?

Waardevol (Drooge et al., 2011)
• Measurable: is the indicator measurable and is its interpretation unambiguous?
• Available and reliable: is the necessary data available and reliable?
• Manipulable: how easy is it to manipulate scores?
• Validity: does the indicator measure what we want to measure?

PIPA method (Drooge & Spaapen, 2017)
Indicators:
• function to monitor whether the project will reach the goals set;
• need to enable learning during the project, so as to stimulate changes and improvements;
• should reflect characteristics of the specific project;
• should be realistic to use, which includes that collecting evidence should be financially justified;
• need to be endorsed by policy and politics.

Scholtes et al. (2011)
• Relevance: is the indicator acceptable to the salient stakeholders as a relevant indicator?
• Validity: does the indicator measure what it is supposed to measure?
• Robustness: is the indicator reliable?
• Feasibility: is data collection for the indicator possible?
There are similarities between the criteria mentioned by the various sources, such as the reliability and validity of the data and the relevance and communication power of the indicator. But there are also differences. What the different sources fail to make clear is that the criteria mentioned concern different ‘things’. Criteria like availability and continuity concern the data source on which an indicator relies. That data source could be an administrative system (financial data), a bibliographic system (publications, citations) or a stakeholder panel, such as a focus group asked to complete a questionnaire on, for example, satisfaction with and the relevance of the research. Criteria such as reliability, manipulability and validity concern the data itself. A financial system may be continuously available to provide information, but the data itself, for whatever reason, may not be reliable or may be manipulated to appear more ‘favourable’. Here we can already distinguish three ‘layers’ at which quality criteria can be set: indicator, data and data source.
Another important distinction is that between a criterion and an indicator. A criterion is a maxim that has to be met; an indicator is the measure of whether you have met that maxim. For example, for athletes to participate in the Olympics, the maxim is that they must perform at the required level. In that case, the Olympic qualifying standard is the norm against which this is measured, and the indicator is the measured distance jumped or time run. One maxim used for applied research, for example, is that results must be useful for real-life practice; an indicator is then required to show that this is the case, for example stakeholder statements about that usefulness. Another example is a criterion such as ‘results should be accessible’; in that case, an indicator could be the relative percentage of publications published via open access. This distinction is not always made, which leads to confusion (van Vliet et al., 2020). Sometimes the distinction is difficult to make. ‘Stakeholder satisfaction’, for example, can be interpreted as a criterion, with the follow-up question ‘how do I demonstrate this?’ (in other words, which indicator do I choose?), but also as an indicator, with the follow-up question ‘what am I demonstrating with this?’ (in other words, which criterion is met?).
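To make the distinction concrete, the sketch below keeps the criterion and the indicator apart for the open-access example: the indicator is the computed open-access percentage, and the criterion is the requirement that results be accessible, judged against an explicit norm. The names, data and the 50% norm are hypothetical illustrations, not values taken from the sources cited here.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    title: str
    open_access: bool


def open_access_share(publications: list) -> float:
    """Indicator: the relative percentage of publications published via open access."""
    if not publications:
        return 0.0
    oa_count = sum(1 for p in publications if p.open_access)
    return 100.0 * oa_count / len(publications)


def criterion_results_accessible(publications: list, target_pct: float = 50.0) -> bool:
    """Criterion: 'results should be accessible', here judged against a purely
    illustrative norm of 50% open access (the norm itself is a policy choice)."""
    return open_access_share(publications) >= target_pct


# Usage: two of three publications are open access, so the indicator is 66.7%
# and the hypothetical 50% norm for the criterion is met.
pubs = [
    Publication("Study A", open_access=True),
    Publication("Study B", open_access=False),
    Publication("Study C", open_access=True),
]
print(f"Indicator: {open_access_share(pubs):.1f}% open access")
print(f"Criterion met: {criterion_results_accessible(pubs)}")
```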
The difference between criterion and indicator has to be carefully considered: it is the difference between what requirements you wish to meet and how you choose to prove that you have met them. At the same time, it is important to use them as a two-pronged approach to an evaluation: both formulating the requirements the research should meet and choosing how to provide evidence for this. It is not exceptional to find criteria without evidence in research evaluations and, conversely, to be presented with evidence for which it is unclear what it is evidence of. What is proposed here is to be explicit about which criteria are distinguished and which indicators are considered to be evidence for each criterion.
With this distinction, it can be argued that a quality criterion like relevance mainly applies at the level of the criterion itself: is ‘results should be accessible’ a relevant criterion for demonstrating the status of the research? Moreover, a criterion should be formulated in such a way that it can be made demonstrable. A criterion such as ‘The research has continuous effects on professional practice and society’ (Pijlman, 2017) is so general and all-encompassing that it is difficult to demonstrate with an indicator. On the basis of this analysis, the following ‘stratification’ of criteria for indicators can be proposed:
Aspect | Quality criteria for indicators
Criteria | Relevance, demonstrability
Indicators | Proxy power, communication power, acceptance
Data | Reliability, manipulability, validity
Data sources | Availability, continuity
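Purely as an illustration of how this stratification might be organised, the sketch below models the four layers as a small data structure. The field names are hypothetical and the boolean ‘checks’ merely stand in for the quality criteria in the table; they do not prescribe how those criteria should actually be assessed.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative data model of the four layers; each layer carries its own quality criteria.


@dataclass
class DataSource:
    name: str             # e.g. a financial system, bibliographic database or stakeholder panel
    available: bool       # quality criterion: availability
    continuous: bool      # quality criterion: continuity


@dataclass
class DataSet:
    source: DataSource
    reliable: bool            # quality criterion: reliability
    hard_to_manipulate: bool  # quality criterion: manipulability
    valid: bool               # quality criterion: validity


@dataclass
class Indicator:
    name: str             # e.g. 'percentage of open-access publications'
    data: DataSet
    proxy_power: bool               # quality criterion: proxy power
    communication_power: bool       # quality criterion: communication power
    accepted_by_stakeholders: bool  # quality criterion: acceptance


@dataclass
class Criterion:
    statement: str        # e.g. 'results should be accessible'
    relevant: bool        # quality criterion: relevance
    demonstrable: bool    # quality criterion: demonstrability
    indicators: List[Indicator] = field(default_factory=list)
```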
The ‘stratification’ of the different quality criteria for indicators already gives more guidance on what to look out for and what trade-offs to make. It is prudent to draw up a limited number of criteria against which to assess indicators. In doing so, you should take care not to remove precisely those indicators that are highly relevant but not fully robust or for which the data is difficult to collect, as this tends to leave only the usual suspects of ‘money, people, publications and patents’ (Brouns, 2016).
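A minimal sketch of that trade-off is given below: by weighting relevance more heavily than robustness and feasibility, highly relevant indicators are not dropped solely because their data is less robust or harder to collect. All scores, weights and the threshold are hypothetical, chosen only to illustrate the point.

```python
# Illustrative shortlisting of indicators; scores are on a 1-5 scale and entirely made up.


def keep_indicator(relevance: int, robustness: int, feasibility: int) -> bool:
    """Return True if the indicator stays on the shortlist; relevance is weighted triple."""
    return 3 * relevance + robustness + feasibility >= 12


candidates = {
    "number of publications":               (2, 5, 5),  # a 'usual suspect': robust, easy to collect
    "stakeholder statements on usefulness": (5, 2, 2),  # highly relevant, less robust
    "website visits to project page":       (1, 3, 3),  # easy to collect, barely relevant
}
for name, (relevance, robustness, feasibility) in candidates.items():
    verdict = "keep" if keep_indicator(relevance, robustness, feasibility) else "drop"
    print(f"{name}: {verdict}")
```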
Harry van Vliet, August 2021
Sources
Brouns, M. (2016). Van Olympus naar agora. Een frisse blik op praktijkgericht onderzoek. Thema(4), 69-74.
Drooge, L. van, & Spaapen, J. (2017). Evaluation and monitoring of transdisciplinary collaborations. The Journal of Technology Transfer. https://doi.org/10.1007/s10961-017-9607-7
Drooge, L. van, Vandeberg, R., Zuijdam, F., Mostert, B., Meulen, B. van der, & Bruins, E. (2011). Waardevol. Indicatoren voor Valorisatie. Den Haag.
Friedman, M. (2005). Trying hard is not good enough. Santa Fe, New Mexico: Fiscal Policy Studies Institute.
Pijlman, H., Andriessen, D., Goumans, M., Jacobs, G., Majoor, D., Cornelissen, A., & de Jong, H. (2017). Advies werkgroep Kwaliteit van Praktijkgericht Onderzoek en het lectoraat. Den Haag: Vereniging Hogescholen.
Scholtes, A., van Vught, F., de Haas, M., et al. (2011). The EDUPROF project: developing indicators of applied research. Brussel: European Union, Education and Culture DG.
Van Vliet, H., Wakkee, I., Fukkink, R., Teepe, R., & van Outersterp, D. (2020). Rapporteren over doorwerking van Praktijkgericht Onderzoek. Amsterdam: Hogeschool van Amsterdam.