Have you ever pooled many health surveys that have a complex sampling design and include variables such as primary sampling unit (PSU), stratum, and sampling weights, only to find that some of these surveys did not have all of these variables (e.g., PSU and sampling weights only, not stratum)? Here’s an efficient solution: pack the datasets in a list and apply functions that will deliver the results you’re looking for.
In a pooled dataset of many health surveys, some of which may not have all three variables (PSU, stratum, and sampling weights), a complete-case analysis would keep only the datasets with data for all three variables. This can lead to a substantial reduction in sample size, because we would need to exclude the datasets with, for example, PSU and sampling weights but no data for the stratum variable.
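To make that cost concrete, here is a minimal sketch in Python of what a complete-case filter does to a pooled collection. Everything in it is an assumption for illustration: the survey names, the column names (`psu`, `stratum`, `weight`), and the values are hypothetical, and plain dictionaries stand in for real survey files.

```python
# Hypothetical pooled surveys: each maps a column name to its values.
# Survey names, column names, and numbers are illustrative only.
surveys = {
    "survey_a": {"psu": [1, 1, 2], "stratum": [1, 1, 2], "weight": [1.2, 0.8, 1.0]},
    "survey_b": {"psu": [1, 2, 2, 3], "weight": [0.9, 1.1, 1.0, 1.0]},  # no stratum
    "survey_c": {"psu": [1, 2], "stratum": [1, 1], "weight": [1.0, 1.0]},
}

DESIGN_VARS = ("psu", "stratum", "weight")


def available_design_vars(columns):
    """Return the design variables that are present in one survey."""
    return [name for name in DESIGN_VARS if name in columns]


def n_rows(columns):
    """Number of respondents, taken from the length of the first column."""
    return len(next(iter(columns.values())))


# Which design variables does each survey actually carry?
available = {name: available_design_vars(cols) for name, cols in surveys.items()}

# A complete-case analysis keeps only surveys with all three variables.
complete = [name for name, found in available.items() if len(found) == len(DESIGN_VARS)]

total_rows = sum(n_rows(cols) for cols in surveys.values())
kept_rows = sum(n_rows(surveys[name]) for name in complete)

print(available)
print(f"complete-case keeps {kept_rows} of {total_rows} respondents")
```

In this toy collection, `survey_b` is dropped entirely even though it still carries PSU and sampling weights, so every one of its respondents is lost.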
My solution? Analyze each dataset independently, leveraging whatever variables it has (e.g., PSU and sampling weights only, not stratum). Unfortunately, analyzing one dataset at a time is inefficient. However, having faced this issue several times in over seven years of professional experience in research and dozens…