Stratified sampling and cluster sampling are two of the four types of probabilistic sampling. I suggest we combine them. It might seem odd, yet it really helps when managing random samples.
One thing is certain about Romania and how it manages population records: you cannot draw a random sample of citizens from a database holding contact information for all Romanian residents. That means you cannot claim a proper probabilistic sample in which each person has an equal chance of being selected, or at the very least one in which you can compute the selection probability of each individual drawn. Personally, I had the opportunity to collaborate with D.E.P.A.B.D. (The Directorate for Persons Record and Databases Management), which was responsible for randomly extracting addresses following an algorithm I supplied. I needed a sample of 5000 addresses of Romanian residents aged 50 and above. Even so, I had to design the sample as a multi-stratified cluster sample, select the localities myself, and specify how many addresses I needed from each locality. The collaboration went reasonably well, apart from the long time it took to complete. What surprised me, though, was that after several weeks of waiting I finally received the database of addresses, only to find that for some rural localities, where the concept of a street is a foreign one, they had been unable to perform the sampling. I panicked... We eventually found a solution that preserved the conditions for a probabilistic sample in these administrative units as well, but it delayed us by 2 weeks.
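To make the idea of a stratified cluster design more concrete, here is a minimal sketch in Python. It is not the algorithm I supplied to D.E.P.A.B.D., only an illustration under simple assumptions: localities are the clusters, each locality belongs to one stratum, clusters are drawn by simple random sampling within each stratum, and a fixed number of addresses is requested from each selected locality. The function name, the dictionary keys and the example data are all hypothetical.

```python
import random
from collections import defaultdict


def plan_stratified_cluster_sample(localities, clusters_per_stratum,
                                   addresses_per_cluster, seed=0):
    """Pick localities (clusters) within each stratum and compute, for every
    selected locality, how many addresses to request and the inclusion
    probability of any single address in it.

    Assumption: each locality is a dict with the hypothetical keys
    'name', 'stratum' and 'eligible_addresses' (a positive integer).
    """
    rng = random.Random(seed)

    # Group the localities by stratum.
    by_stratum = defaultdict(list)
    for loc in localities:
        by_stratum[loc["stratum"]].append(loc)

    plan = []
    for stratum in sorted(by_stratum):
        locs = by_stratum[stratum]
        k = min(clusters_per_stratum, len(locs))
        # Stage 1: simple random sample of clusters, without replacement.
        for loc in rng.sample(locs, k):
            # Stage 2: number of addresses to draw inside the locality.
            m = min(addresses_per_cluster, loc["eligible_addresses"])
            # P(address selected) = P(locality selected) * P(address | locality)
            p = (k / len(locs)) * (m / loc["eligible_addresses"])
            plan.append({
                "name": loc["name"],
                "stratum": stratum,
                "addresses": m,
                "inclusion_prob": p,
            })
    return plan


# Made-up localities, for illustration only.
example = [
    {"name": "Locality A", "stratum": ("Region 1", "urban"), "eligible_addresses": 300},
    {"name": "Locality B", "stratum": ("Region 1", "urban"), "eligible_addresses": 200},
    {"name": "Locality C", "stratum": ("Region 1", "urban"), "eligible_addresses": 100},
    {"name": "Locality D", "stratum": ("Region 1", "rural"), "eligible_addresses": 80},
    {"name": "Locality E", "stratum": ("Region 1", "rural"), "eligible_addresses": 50},
]

for row in plan_stratified_cluster_sample(example, 2, 10):
    print(row)
```

The point of keeping the inclusion probability next to each locality is that the selection chances no longer need to be equal, as long as they are known: the inverse of that probability can later serve as a design weight.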
Now, before going into detail on multi-stratification, I wish to highlight some particularities of Romania and the way its territory is organized and managed. I will use existing data published by several authorities. I also have some data of my own, and I noticed discrepancies compared to what one might find on INSSE, provided you are patient enough to process their files. By the way, INSSE’s structuring of the files containing population details at settlement level is severely lacking. I could never figure out why the SIRUTA codes (a unique code for each settlement in Romania), managed by an entity responsible for organizing the territorial management of the country, do not appear consistently across the INSSE files, with INSSE preferring to publish text documents instead. If you are lucky enough to find Excel documents, you can be sure the same settlement will be written sometimes with diacritics and sometimes without, and for rural areas you will only find data on communes (not villages). One wonders... which is why I am building up my reserves of patience for the data from the census that is just beginning (dated December 1st 2021), and I hope they have learned how to create smart files.
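When a file has no SIRUTA codes and the only key is the settlement name, one workaround is to reduce every name to a diacritic-free, lowercase form before matching. Below is a minimal sketch using only Python's standard library; the function name normalize_name is my own, and this is a fallback rather than a substitute for a proper code. Names are not guaranteed to be unique across the country, so in practice I would assume the normalized name has to be paired with at least the county.

```python
import unicodedata


def normalize_name(name):
    """Return a lowercase, diacritic-free version of a settlement name."""
    # NFKD splits letters such as 'ș' or 'ă' into a base letter
    # followed by a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and collapse repeated or surrounding whitespace.
    return " ".join(stripped.lower().split())


# The same town spelled with comma-below, with cedilla, and without diacritics.
variants = ["Brașov", "Braşov", "Brasov"]
print({normalize_name(v) for v in variants})
```

The approach also covers the older cedilla forms of ș and ț, which still turn up in files and would otherwise look like different settlements to a plain string comparison.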
Some data about Romania
Population: around 20 million
Area: 238,397 km²
Population density: 84.4 inhabitants/km²
41 counties
7 historical regions (București, Ardeal, Banat/Crișana/Maramureș, Moldova, Muntenia, Oltenia, Dobrogea) or 8 development regions (NUTS 2) defined by INSSE, somewhat more balanced (București-Ilfov, Nord-Vest, Centru, Nord-Est, Sud-Est, Sud-Muntenia, Sud-Vest Oltenia, Vest).
In 2016 there were 3181 territorial administrative units (according to a file published by the Romanian authorities on Eurostat), called LAUs (local administrative units) in European lingo. These AUs, short for administrative units, each manage several settlements and can be municipalities, towns, or communes. There are no self-managed villages. Villages, over 10,000 in total, are assigned for management to towns or to other villages, the latter serving as communes. Of all the villages, only those acting as communes appear among the 3181 LAUs; the remaining 10 thousand or so are subordinated to a town or a commune.
I would also like to draw your attention to population density. I provided a figure above, but without comparing it with data from other countries one cannot say whether Romania is crowded or rather sparsely populated.
Below you’ll find a map from Eurostat. Blue areas are low density, where people have plenty of space and live far apart, whereas orange areas are denser. As you can see, there’s plenty of space to go around in our country: Romania’s settlements are rather scattered.
Click here for part II.