Implementing Decisions of International Human Rights Institutions – Evidence from the United Nations Human Rights Committee: A Rejoinder to Ullmann and von Staden

Written by Vera Shikhelman

Evaluating the effectiveness of institutions is a difficult task, one that raises numerous theoretical and methodological challenges. This is especially true when evaluating the effectiveness of quasi-judicial institutions such as the United Nations Human Rights Committee (‘HRC’), as my 2019 EJIL article attempted to do. I would therefore first like to thank Andreas J. Ullmann and Andreas von Staden for providing me with food for thought in their reply to my article (published in the latest issue of EJIL) and for opening an important discussion about how to evaluate the effectiveness of non-judicial institutions. Given the space limitation, I will not be able to answer each and every point Ullmann and von Staden raised. I therefore focus my rejoinder on three main subjects: the possibility of bias in the dataset, the construction of the dataset, and the levels of analysis. I am very interested in additional comments and discussion, and welcome emails on the subject.
The first, and perhaps most significant, issue raised by Ullmann and von Staden is that the dataset used for the study could have been biased. This is mainly because not all states answer the requests for follow-up information. Constructing a dataset that includes only communications for which states responded to the request for follow-up information might therefore be problematic when one’s aim is to assess the general implementation of the HRC’s views in individual communications.
As I mentioned in the article, this was indeed a concern, and I addressed it by comparing the states in the dataset used for the article with a previous dataset of mine (which included all communications filed against states during a roughly equivalent period of time). I tested whether the states in the two datasets differed in their human rights scores and polity scores (using a t-test, given that the number of observations was larger than 30). I performed the t-test at the level of the communications. The analysis showed no statistically significant difference between the two datasets, and thus the dataset used for the article was not necessarily biased. I agree that the states may differ in other relevant characteristics besides the human rights and polity scores for which I tested. However, since it is impossible to test every possible difference between states, the researcher has to make certain methodological choices. I decided to test for human rights and polity scores because I assumed there was a high probability that these two characteristics would influence a state’s decision to file a follow-up report. I was also more comfortable using these two specific variables because they are highly correlated with the other variables I used (r > 0.7 in most cases, including variables such as GDP and government effectiveness). Therefore, even if there was a certain bias in those variables, it is probably not a very significant one.
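For readers who want to see the shape of this balance check, the sketch below illustrates it under stated assumptions: the two CSV files and all column names are hypothetical stand-ins, not the article’s actual files or variable names.

```python
# Minimal sketch of the balance check described above. File and column
# names are hypothetical stand-ins for the article's actual data.
import pandas as pd
from scipy import stats

follow_up = pd.read_csv("followup_communications.csv")  # communications with follow-up grades
all_comms = pd.read_csv("all_communications.csv")       # all communications, comparable period

for var in ["human_rights_score", "polity_score"]:
    # Welch's t-test at the level of the communication (n > 30 in each group)
    t, p = stats.ttest_ind(follow_up[var].dropna(),
                           all_comms[var].dropna(),
                           equal_var=False)
    print(f"{var}: t = {t:.2f}, p = {p:.3f}")

# Correlation of the two tested variables with other covariates
# (the r > 0.7 observation mentioned above)
covariates = ["gdp", "government_effectiveness"]
print(all_comms[["human_rights_score", "polity_score"] + covariates].corr())
```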
It is also important to note in this regard that, with a few exceptions, in recent years communications have usually been filed against states with better human rights records (see Shikhelman, ‘Access to Justice in the United Nations Human Rights Committee’, 39 Michigan Journal of International Law (2018) 453). This problematic situation can by itself explain why we might not see a significant difference between the states against which communications are filed and the states eventually replying to the follow-up requests.
To further support their concern about possible bias, Ullmann and von Staden compare the number of communications filed against states in my dataset with the number of communications included in the dataset they constructed. They argue that there is a considerable mismatch between the frequency with which states appear in their dataset and in mine. For instance, whereas in their dataset two of the states against which the most communications have been filed are the Republic of Korea and Jamaica, these states do not appear in my dataset at all.
I disagree with Ullmann and von Staden’s conclusion from this comparison. One of the main explanations for the difference between the datasets is that, whereas their dataset includes all the communications filed since 1977, my dataset is much more limited in time: all but three of the communications in it were filed after 2002. Since the patterns of filing communications against particular states have changed considerably over time, this can explain the differences they observe. Comparing such different time periods therefore amounts to comparing apples and oranges.
Moreover, turning to the specific examples, almost all of the communications against Jamaica were filed between 1984 and 1991, and in 1997 Jamaica gave notice of its denunciation of the Optional Protocol. Citing Jamaica as a counter-example to my dataset, which generally begins in 2002, is therefore simply irrelevant.
The example of the Republic of Korea is also not a very good one. Most of the communications against the Republic of Korea were eventually joined by the HRC into two large communications (Jung et al. (1593-1603/2007), CCPR/C/98/D/1593-1603/2007 (Apr. 30, 2010), and Min-Kyu Jeong et al. (1642-1741/2007), CCPR/C/101/D/1642-1741/2007 (Apr. 27, 2011)). These communications were joined because they had been filed by the same lawyers and concerned the same subject matter. Counting them separately, as Ullmann and von Staden did, is therefore methodologically wrong, and their absence from my dataset is not crucial.
As my final point regarding the bias concern, I shall briefly note that it is not possible to use the suggested Heckman correction with this dataset, since the dependent variable is ordinal. For reasons beyond the scope of this rejoinder, in my opinion equivalent methods would also have too many disadvantages.
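To make the constraint concrete: the textbook Heckman correction models a continuous outcome in the second stage, whereas a compliance grade is an ordered category, for which an ordered logit is the natural model. The sketch below fits such a model under stated assumptions; the column names and grade ordering are hypothetical illustrations, not the article’s actual coding.

```python
# Minimal sketch: the compliance grade is ordinal, so an ordered logit is
# the natural model; a textbook Heckman correction assumes a continuous
# second-stage outcome. All names here are hypothetical stand-ins.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("followup_grades.csv")
df["grade"] = pd.Categorical(df["grade"],
                             categories=["E", "C", "B", "A"],  # assumed ordering, for illustration
                             ordered=True)

model = OrderedModel(df["grade"],
                     df[["human_rights_score", "polity_score", "years_since_views"]],
                     distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```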
Another point Ullmann and von Staden raise is that I was not completely clear about how the dataset was constructed. This is of special importance, since the compliance of a state with the views in a communication is often reviewed several times before the review is officially complete. I would like to clarify that I chose to include in my dataset only the final grade given to the state. The main reason is that compliance with the views in certain communications is reviewed significantly more often than compliance with the views in others before the inquiry is closed (the pattern is entirely unclear). To prevent over-representation of certain communications in the dataset, I therefore included only the final grade granted. Ullmann and von Staden are right that, generally speaking, compliance with remedies improves over time. For that reason, I controlled in the regression for the time that had passed between the adoption of the views and the time the grade was given.
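The construction rule can be expressed compactly: keep the last review per communication and compute the elapsed time as a control. The sketch below illustrates this under stated assumptions; the file and column names (including the remedy identifier) are hypothetical stand-ins.

```python
# Minimal sketch of the construction rule described above: keep only the
# final grade per (communication, remedy) and compute elapsed time.
# File and column names are hypothetical stand-ins.
import pandas as pd

reviews = pd.read_csv("followup_reviews.csv",
                      parse_dates=["views_date", "grade_date"])

# A communication's compliance may be reviewed several times before the
# inquiry is closed; keep only the most recent review.
final = (reviews.sort_values("grade_date")
                .groupby(["communication_id", "remedy_id"], as_index=False)
                .last())

# Time elapsed between adoption of the views and the final grade (in years),
# used as a control in the regression.
final["years_since_views"] = (
    (final["grade_date"] - final["views_date"]).dt.days / 365.25
)
```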
The final point made by Ullmann and von Staden that I would like to address is that there are actually three relevant levels of analysis in the data: the level of the remedy, the level of the communication and the level of the state. In my article, however, I directly account only for the first two levels. They suggest using a multilevel mixed-effects model that accounts for the level of the state as well. According to the literature, however, such models perform best when there are more than 50 groups and the group size is at least 30 (Maas and Hox, ‘Robustness issues in multilevel regression analysis’, 58 Statistica Neerlandica (2004) 127, at 137). Since the dataset used for my article contained only 28 states, with a maximum group size of 10 communications per state, using a multilevel model as suggested was problematic.
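The check behind this argument is simple to run: count the state-level groups and their sizes and compare them against the Maas and Hox rule of thumb. The sketch below does this under stated assumptions; the file and column names are hypothetical stand-ins.

```python
# Minimal sketch: count state-level groups and group sizes against the
# Maas and Hox rule of thumb (> 50 groups, group size >= 30).
# File and column names are hypothetical stand-ins.
import pandas as pd

final = pd.read_csv("final_grades.csv")

group_sizes = final.groupby("state")["communication_id"].nunique()
print(f"number of state groups: {group_sizes.size}")   # 28 in the article's data
print(f"largest group size:     {group_sizes.max()}")  # 10 in the article's data

enough_groups = group_sizes.size > 50 and group_sizes.min() >= 30
print("multilevel model advisable:", enough_groups)
```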
I should note that I was indeed hesitant at first to use all three levels of analysis, since the dataset itself is relatively small. In retrospect, however, it would perhaps have been more correct methodologically not simply to ignore the level of the state, as I did, but to account for this level by clustering the standard errors by state as well (and not only by communication). I therefore reran the regression with standard errors clustered two ways, by communication and by state. The statistical significance of the results generally did not change. The only variable that became less statistically significant in some specifications following this adjustment was having a national of the state serving as a Committee member (i.e. having a national as a Committee member is less significantly associated with compliance).
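For readers unfamiliar with two-way clustering, the sketch below shows the mechanics under stated assumptions, illustrated with OLS for simplicity (the article’s actual model is ordinal); the file and column names are hypothetical stand-ins.

```python
# Minimal sketch of two-way clustered standard errors (by communication
# and by state), shown with OLS for simplicity. Names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

final = pd.read_csv("final_grades.csv")
X = sm.add_constant(final[["human_rights_score", "polity_score",
                           "years_since_views", "national_on_committee"]])
y = final["grade_numeric"]

# A two-column group array requests two-way clustering in statsmodels.
groups = np.column_stack([
    pd.factorize(final["communication_id"])[0],
    pd.factorize(final["state"])[0],
])

result = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})
print(result.summary())
```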
In the Jewish tradition there is a saying: ‘Just as one knife can only be sharpened by another, so too the wits of a scholar can only be sharpened by his fellow.’ Although I disagree with Ullmann and von Staden on certain points, I agree with them on others. Most of all, I would like to thank them for the opportunity to rethink and reconsider different theoretical and methodological challenges, both in my article and in general.