The National System of Information and Single Registry of State Beneficiaries (Sistema Nacional de Información y Registro Único de Beneficiarios del Estado, SINIRUBE) integrates the social databases of 34 public institutions and 42 social assistance programs in Costa Rica.
This IT platform identifies the beneficiaries of all social programs quickly, using homogeneous criteria, and avoids duplication. Its aim is to ensure that public funds reach the people in need, improving the performance of the social sector and contributing to poverty reduction in the country. However, inconsistent collection methodologies and the lack of automated error detection mechanisms in the collection process jeopardize the integrity of the data collected.
In 2021, more than 555,000 people from 313,000 households in ten regions across the country took part in the social protection and promotion programs of the Instituto Mixto de Ayuda Social (IMAS), with a total investment of CRC 188 billion (approx. US$268 million). An estimated 30%-50% of the data captured through the system contain errors, so feeding them into data lakes for decision-making would undermine the decisions built on them.
Fighting poverty and inequality largely depends on implementing accurate, efficient, equitable, and transparent social policies. Consequently, social institutions undertake design, planning, and evaluation processes that require accurate and reliable data.
Verifying the quality of data related to social development is therefore of utmost importance. Consolidating a social registry involves thousands of people collecting and processing millions of data points, so errors and inaccuracies are recurrent. In addition, social registry data are used in countless studies, actions, and decisions regarding public policy, such as poverty measurement, budget allocation to combat it, and the targeting of resources, to mention just a few examples.
It is thus all the more important to ensure that social data correctly represent what they were designed to measure. Hence the importance of a data quality analysis system that identifies, documents, and reports errors and anomalies in social records so that they can be corrected and avoided. In this way, we contribute from the outset of social policy design to improving the quality of life of the poorest and most vulnerable populations.
Responses to the problem vary from entity to entity, but the most common method appears to be manual validation of a representative sample of the data universe.
One practical example of data quality analysis is the process that the Ministry of Labor and Social Security (Ministerio de Trabajo y Seguridad Social, MTSS) has implemented for Bono Proteger to identify, suspend, and recover payments credited to people who are not entitled to them. Each time a Bono Proteger payment is made, review filters are applied to the databases (spreadsheets) to validate that beneficiaries comply with the program requirements. A random manual review then identifies those who do not meet the established requirements, or whose status has changed so that they no longer need the bonus. In those cases, an administrative investigation is initiated, including the possibility of a hearing so that individuals may exercise their right of defense and provide the evidence they deem appropriate. If the process determines that the person complied with the requirements, they can receive the subsequent payments. Otherwise, a procedure to recover the credited payments is initiated.
In this process and many others, the effort, resources, and time required to identify data quality issues outweigh the gains.
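The review step described above (rule-based filters followed by a random manual check) can be sketched roughly as follows. This is only an illustration: the field names `id_number` and `formal_income`, the two rules, and the function names are hypothetical, since the text does not state the actual Bono Proteger requirements or how MTSS implements its filters.

```python
import random

# Hypothetical rules: the actual Bono Proteger requirements are not given in the text.
RULES = {
    "missing_id": lambda r: not r.get("id_number"),
    "has_formal_income": lambda r: r.get("formal_income", 0) > 0,
}

def failed_rules(record):
    """Return the names of the hypothetical rules that a record violates."""
    return [name for name, violates in RULES.items() if violates(record)]

def review_batch(records, sample_size, seed=None):
    """Split a payment batch into rule-flagged records and a random manual sample."""
    flagged, passed = [], []
    for record in records:
        problems = failed_rules(record)
        if problems:
            flagged.append((record, problems))
        else:
            passed.append(record)
    # Records that pass every filter are still sampled at random for manual review.
    rng = random.Random(seed)
    manual_sample = rng.sample(passed, min(sample_size, len(passed)))
    return flagged, manual_sample
```

Records that fail a filter would go to the administrative investigation, while the random sample goes to human reviewers. Even in this small form, the manual step grows with the size of the batch, which is the cost the text points to.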
QualIA, developed by the company ProsperIA, is a program that identifies and analyzes atypical data, or data with a high probability of error, in social registers and questionnaires, streamlining and optimizing the mechanisms used to improve data quality. The project included the development of probabilistic and predictive models to improve data quality in social assistance application forms; the models were validated through a random sampling study of anonymous SINIRUBE databases.
The solution provides one or more algorithms that support three different functions and analysis capabilities. These functions are deployed through an application programming interface (API), which SINIRUBE will use for six months either on its own servers or through the web infrastructure that ProsperIA has built around the API.
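To make the idea of flagging atypical data concrete, here is a minimal sketch that uses a robust modified z-score based on the median absolute deviation (MAD). It is not QualIA's method: the text only says that ProsperIA built probabilistic and predictive models, and this simpler statistical check is swapped in purely for illustration. The function name `flag_atypical`, the 3.5 threshold, and the sample incomes are assumptions.

```python
from statistics import median

def flag_atypical(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds the threshold."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # No spread around the median: this score cannot rank values, so flag nothing.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [0.6745 * d / mad > threshold for d in deviations]

# Hypothetical reported monthly household incomes; the last looks like a typing error.
incomes = [200, 210, 190, 205, 195, 5000]
flags = flag_atypical(incomes)
```

In this example only the last value is flagged. A production system would need to combine many fields and learn what counts as plausible from the data, which is presumably where the predictive models mentioned above come in.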
Law 8968 for the Protection of Individuals against Processing of their Personal Data is in force. Section III sets out the data processing security and confidentiality standards and expectations. SINIRUBE and this project observe and adhere to the existing legal standards.
The agencies and/or institutions collaborating in developing the QualIA prototype and pilot have signed a cooperation agreement allowing the exchange of anonymous data.
Costa Rica
Social inclusion
Costa Rica, Costa Rica
SINIRUBE
Development of the Model
This is a practical AI ethics self-assessment tool for entrepreneurs, which supports an analysis of a technological solution based on AI and data handling.
This document presents the Final Report of the algorithmic audit of the Laura system, carried out by Eticas Research and Consulting.
Together with the OECD, we published the data science manual, which seeks to provide technical recommendations to teams developing AI systems.