{"id":1328,"date":"2023-12-01T17:20:44","date_gmt":"2023-12-01T16:20:44","guid":{"rendered":"http:\/\/steveharoz.com\/blog\/?p=1328"},"modified":"2024-03-15T10:07:22","modified_gmt":"2024-03-15T09:07:22","slug":"more-precise-measures-increase-standardized-effect-sizes","status":"publish","type":"post","link":"http:\/\/steveharoz.com\/blog\/2023\/more-precise-measures-increase-standardized-effect-sizes\/","title":{"rendered":"More precise measures increase standardized effect sizes"},"content":{"rendered":"\n<p>My last post \u2013 <a href=\"http:\/\/steveharoz.com\/blog\/2023\/simulating-how-replicate-trial-count-impacts-cohens-d-effect-size\/\" data-type=\"post\" data-id=\"1290\">Simulating how replicate trial count impacts Cohen\u2019s d effect size<\/a> \u2013 focused mostly on how parameters of within-subjects experiments impact effect size. Here, I&#8217;ll clarify how measurement precision in between-subjects experiments can substantially influence standardized effect sizes. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">More replicate trials = better precision<\/h2>\n\n\n\n<p>A precise measurement is always an aim of an experiment, although practical and budgetary limitations often get in the way. Attentive subjects, carefully calibrated equipment, and a well controlled environment can all improve measurement precision. Averaging together many replicate trials is another approach for improving precision, and it can also be easily simulated.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>To show that replicate trial count is a proxy for precision, I ran a simulation that varied the number of replicate trials. For each subject, I sampled 100 measurements from a normal distribution. Then I used either (a) just the first measurement, (b) the average of the first 10 measurements, or (c) the average of all 100 measurements as that subject&#8217;s overall measurement. With more replicate trials, the variability of the overall measure reduces. 
More replicate trials = better precision.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"6000\" height=\"7200\" src=\"http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2.png\" alt=\"Averaging more replicate trials is equivalent to a more precise measurement\" class=\"wp-image-1354\" srcset=\"http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2.png 6000w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2-250x300.png 250w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2-853x1024.png 853w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2-768x922.png 768w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2-1280x1536.png 1280w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2-1707x2048.png 1707w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-CLM-2-2-624x749.png 624w\" sizes=\"auto, (max-width: 6000px) 100vw, 6000px\" \/><\/a><\/figure>\n\n\n\n<p>Notice that the blue lines are much more tightly packed with 100 replicate trials than with 10 or 1.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Simulating the same effect with different measurement precision<\/h2>\n\n\n\n<p>To see the impact of increasing precision, I ran a simulation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Two conditions: A and B<\/li>\n\n\n\n<li>A normally distributed continuous response<\/li>\n\n\n\n<li>Condition B has responses 0.2 higher than condition A (the \u201ctrue\u201d effect)<\/li>\n\n\n\n<li>Normally distributed individual differences in the baseline (sd = 0.1)<\/li>\n\n\n\n<li>Normally distributed response noise (sd = 1)<\/li>\n<\/ul>\n\n\n\n<p>The formula for a response is:<\/p>\n\n\n\n<p><code>response = 
subject_baseline + 0.2*(condition == B) + trial_response_noise<\/code><\/p>\n\n\n\n<p>For each replicate trial count (1, 5, 25, 125, 625, and 3125), I ran 5000 simulations and plotted the distribution of Cohen&#8217;s d<sub>a<\/sub>, which is the classic formula for Cohen&#8217;s d based on subject means. <a href=\"https:\/\/osf.io\/jfa5d\"><strong>Here is the code<\/strong><\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Results<\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist.png\"><img loading=\"lazy\" decoding=\"async\" width=\"7200\" height=\"6000\" src=\"http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist.png\" alt=\"\" class=\"wp-image-1334\" srcset=\"http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist.png 7200w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist-300x250.png 300w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist-1024x853.png 1024w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist-768x640.png 768w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist-1536x1280.png 1536w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist-2048x1707.png 2048w, http:\/\/steveharoz.com\/blog\/wp-content\/uploads\/2023\/12\/precision-dist-624x520.png 624w\" sizes=\"auto, (max-width: 7200px) 100vw, 7200px\" \/><\/a><\/figure>\n\n\n\n<p>The exact same effect measured under different between-subjects experiments yields a different effect size. As the replicate trial count (measurement precision) increases, the effect size increases. <\/p>\n\n\n\n<p>The reason is that increasing the measurement precision reduces the standard deviation of the subject means. 
Since Cohen&#8217;s d is <code>(mean(a) - mean(b)) \/ sd(all)<\/code>, and the standard deviation is in the denominator, higher precision results in a lower standard deviation, which results in a higher standardized effect size.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Be careful comparing effect sizes across experiments<\/h2>\n\n\n\n<p>Measurement precision can be impacted by many factors that are often tough to pin down. So it&#8217;s worth having a certain degree of skepticism when interpreting nominal effect size values or comparing them across experiments. Even when experiments test the same effect, differences in precision can yield very different standardized effect sizes.<\/p>\n\n\n\n<p>The sensitivity to precision raises concerns about the validity of some meta-analyses. If all of the experiments in a meta-analysis are carefully performed direct replications, then precision should be similar and the results can be pooled. But I don&#8217;t see how indirect replications or unrelated studies on a similar phenomenon can be used in a meta-analysis. Any difference in experiment design could change the precision, yielding subtly or wildly different standardized effect sizes. <\/p>\n\n\n\n<p>If the simulated precision levels above were each different experiments in a meta-analysis, it&#8217;d be tough to draw a conclusion due to the high variance. And if one were a replication of another, there could be drama due to the order-of-magnitude difference in effect sizes. So keep that in mind when judging replicability using effect size values or a meta-analysis with high variance between experiments.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Takeaways<\/h2>\n\n\n\n<p><strong>Standardized effect size values have little meaning outside of a specific experiment<\/strong>. Effect size is a function of the experiment design and analysis approach. 
So describing an effect outside of the context of the experiment just isn&#8217;t meaningful.<\/p>\n\n\n\n<p><strong>Small, medium, and large are not meaningful categories<\/strong>. If the nominal value can vary with the experiment design, then arbitrary demarcations of categories don&#8217;t add any practical clarity for anyone.<\/p>\n\n\n\n<p><strong>Focus on uncertainty instead of point estimates<\/strong>. 95% confidence intervals of an effect size are much more interesting to me than an estimate of a specific value. An effect size with a 95% CI of [0.4, 0.6] is much more interesting to me than an effect size with a 95% CI of [-1, 3].<\/p>\n\n\n\n<p><em>Thanks to <a href=\"http:\/\/aaroncaldwell.us\">Aaron Caldwell<\/a> and <a href=\"https:\/\/matthewbjane.com\/\">Matthew B Jan\u00e9<\/a> for helpful comments<\/em>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>This post is part 2 of a series on statistical power and methodological errors. See also:<\/strong><\/p>\n\n\n\n<p><a href=\"http:\/\/steveharoz.com\/blog\/2023\/simulating-how-replicate-trial-count-impacts-cohens-d-effect-size\/\" data-type=\"post\" data-id=\"1290\">Simulating how replicate trial count impacts Cohen\u2019s d effect size<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/steveharoz.com\/blog\/2024\/wrong-conclusions-built-on-statistical-errors\/\" data-type=\"post\" data-id=\"1361\">Invalid Conclusions Built on Statistical Errors<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My last post \u2013 Simulating how replicate trial count impacts Cohen\u2019s d effect size \u2013 focused mostly on how parameters of within-subjects experiments impact effect size. Here, I&#8217;ll clarify how measurement precision in between-subjects experiments can substantially influence standardized effect sizes. 
More replicate trials = better precision A precise measurement is always an aim of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1329,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"federate","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1328","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/posts\/1328","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/comments?post=1328"}],"version-history":[{"count":15,"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/posts\/1328\/revisions"}],"predecessor-version":[{"id":1401,"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/posts\/1328\/revisions\/1401"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/media\/1329"}],"wp:attachment":[{"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/media?parent=1328"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/categories?post=1328"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/steveharoz.com\/blog\/wp-json\/wp\/v2\/tags?post=1328"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}