
Pilot

Qualia

Costa Rica

The National System of Information and Single Registry of State Beneficiaries (Sistema Nacional de Información y Registro Único de Beneficiarios del Estado, SINIRUBE) integrates the social databases of 34 public institutions, covering 42 social assistance programs in Costa Rica.

This IT platform identifies the beneficiaries of all social programs promptly and with homogeneous criteria, avoids duplication, and is intended to ensure that public funds reach the people in need, improving the performance of the social sector and contributing to poverty reduction in the country. However, inconsistent collection methodologies and the lack of automated error-detection mechanisms in the collection process jeopardize the integrity of the data collected.

In 2021, more than 555,000 people from 313,000 households in ten regions across the country took part in the social protection and promotion programs of Instituto Mixto de Ayuda Social (IMAS), with a total investment of CRC 188 billion (approximately USD 268 million). An estimated 30%-50% of the data captured through the system contain errors, so feeding them into data lakes for decision-making would undermine those efforts.

Problem to be solved

Fighting poverty and inequality largely depends on implementing accurate, efficient, equitable, and transparent social policies. Consequently, social institutions undertake design, planning, and evaluation processes that require accurate and reliable data.

Verifying the quality of social development data is therefore of utmost importance. Consolidating a social registry involves thousands of people collecting and processing millions of data points, so errors and inaccuracies are recurrent. In addition, social registry data are used in countless studies, actions, and public policy decisions, such as poverty measurement, budget allocation to combat poverty, and the targeting of resources, to mention just a few examples.

It is therefore all the more important to ensure that social data correctly represent what they were designed to measure. Hence the importance of a data quality analysis system that identifies, documents, and reports errors and anomalies in social records so that they can be corrected and avoided. In this way, from the very outset of social policies, we contribute to improving the quality of life of the poorest and most vulnerable populations.

Populations affected by the problem
  1. Vulnerable populations whose social assistance processes may be affected by the procedural and technical difficulties of the SINIRUBE systems. For example, the process may reject an individual for any of the following reasons:
    • There is no unique code for households. It is not possible to assign multiple households to a single house, so if two or more households share the same house, their registration may be repeated.
    • Repeated individuals. Foreign and undocumented individuals may appear multiple times in the register with different ID numbers, for example as undocumented, as holders of a residence permit, and as residents.
    • Unidentified households. A household may become unidentified when some of its members change residence after the social inclusion form (FIS) is completed.
  2. Personnel of SINIRUBE and related institutions, whose work is delayed or paralyzed because they must rely on manual data validation methods, which create bottlenecks in the processes and fail to guarantee accurate data validation.
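
The "repeated individuals" issue above can be illustrated with a minimal sketch. It is not SINIRUBE's or QualIA's method: it simply assumes that a normalized name plus a birth date is a usable matching key, and the field names (`id_number`, `name`, `birth_date`) and sample records are hypothetical.

```python
from collections import defaultdict
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def find_possible_duplicates(records):
    """Return lists of ID numbers that share a normalized name and birth date."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), rec["birth_date"])
        groups[key].append(rec["id_number"])
    # Only groups with more than one distinct ID are candidate duplicates.
    return [sorted(set(ids)) for ids in groups.values() if len(set(ids)) > 1]


# Hypothetical records: the same person registered under two different ID types.
records = [
    {"id_number": "TEMP-001", "name": "María  Pérez", "birth_date": "1990-04-12"},
    {"id_number": "RES-884", "name": "maria perez", "birth_date": "1990-04-12"},
    {"id_number": "1-1111-1111", "name": "Juan Mora", "birth_date": "1985-01-30"},
]
possible_duplicates = find_possible_duplicates(records)
print(possible_duplicates)
```

Candidates flagged this way would still need human review, since two different people can share a name and a birth date.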

The current response to this problem, considering the related institutions

The response to the problem varies by entity, but the most common method appears to be manual validation of a representative sample of the data universe.

In the case of Bono Proteger, the Ministry of Labor and Social Security (Ministerio de Trabajo y Seguridad Social, MTSS) has implemented a practical data quality analysis to identify, suspend, and recover accreditations that do not correspond to the program. Each time a Bono Proteger payment is made, review filters are applied to the databases (spreadsheets) to validate compliance with the requirements for program beneficiaries. A random, manual review identifies those who do not meet the established requirements or whose status has changed so that they no longer need the bonus. An administrative investigation is then initiated, including the possibility of a hearing so that individuals may exercise their right of defense and provide the evidence they deem appropriate. If the process determines that the person complied with the requirements, they can receive the subsequent payments. Otherwise, an accreditation recovery procedure is initiated.
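
The two review steps described above (rule-based filters followed by a random manual check) could be sketched as follows. This is only an illustration: the rules are placeholders and not the actual Bono Proteger requirements, and the function and field names are hypothetical.

```python
import random


def apply_review_filters(beneficiaries, rules):
    """Return (beneficiary_id, failed_rule_names) for every record failing a rule."""
    flagged = []
    for record in beneficiaries:
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            flagged.append((record["beneficiary_id"], failed))
    return flagged


def draw_manual_sample(beneficiaries, sample_size, seed=None):
    """Draw a simple random sample (without replacement) for manual review."""
    rng = random.Random(seed)
    return rng.sample(beneficiaries, min(sample_size, len(beneficiaries)))


# Placeholder rules, NOT the real program requirements.
rules = {
    "has_id_number": lambda r: bool(r.get("id_number")),
    "status_still_active": lambda r: r.get("status") == "active",
}

# Hypothetical beneficiary rows, as they might be exported from a spreadsheet.
beneficiaries = [
    {"beneficiary_id": 1, "id_number": "1-1111-1111", "status": "active"},
    {"beneficiary_id": 2, "id_number": "", "status": "active"},
    {"beneficiary_id": 3, "id_number": "3-3333-3333", "status": "changed"},
]

flagged = apply_review_filters(beneficiaries, rules)
manual_sample = draw_manual_sample(beneficiaries, sample_size=2, seed=42)
print(flagged)
```

Fixing the seed makes the sample reproducible, which can help when a review has to be audited later.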

In this process and many others, the effort, resources, and time required to identify data quality issues outweigh the gains.

Proposal to solve the problem using AI

QualIA, a program developed by the company ProsperIA, identifies and analyzes atypical data or data with a high probability of error in social registers and questionnaires, streamlining and optimizing the mechanisms used to improve data quality. The project included the development of probabilistic and predictive models to improve data quality in social assistance application forms; the models were validated through a random sampling study of anonymized SINIRUBE databases.

The solution provides one or more algorithms that support three different functions and analysis capabilities. These functions are deployed through an application programming interface (API), which SINIRUBE will use for six months, either on its own servers or through the web infrastructure built by ProsperIA around the API.

What security considerations, national laws, or standards should be considered to use each source of information?

Law 8968 for the Protection of Individuals against Processing of their Personal Data is in force. Section III sets out the data processing security and confidentiality standards and expectations. SINIRUBE and this project observe and adhere to the existing legal standards.

The agencies and/or institutions collaborating in developing the QualIA prototype and pilot have signed a cooperation agreement allowing the exchange of anonymous data.

Progress/results to March 2022
  1. A review of state-of-the-art methodologies for data quality analysis and their applicability to the SINIRUBE case was conducted. The findings were shared with the top management of the institution and of IMAS.
  2. Probabilistic and predictive models were developed to improve data quality in social assistance application forms.
  3. Based on the previous point, the SINIRUBE data quality analysis API was designed and tested, including the following functionalities:
    • Identification of implausible records
    • Identification of implausible variables and values
    • Imputation and recommendation of probable values
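
To make the three functionalities more concrete, here is a minimal sketch built on simple robust statistics. It is a stand-in for illustration only and not the probabilistic and predictive models developed for QualIA: the cross-field rule, the field names, the modified z-score threshold of 3.5, and median imputation are all assumptions.

```python
from statistics import median


def implausible_records(records):
    """Indices of records breaking a hypothetical cross-field consistency rule."""
    return [i for i, r in enumerate(records) if r["children"] > r["household_size"]]


def implausible_values(values, threshold=3.5):
    """Indices whose modified z-score (median/MAD based) exceeds the threshold.

    Assumes at least one non-missing value; None marks a missing entry.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [
        i for i, v in enumerate(values)
        if v is not None and 0.6745 * abs(v - med) / mad > threshold
    ]


def impute(values, threshold=3.5):
    """Replace missing and implausible entries with the median of plausible ones."""
    bad = set(implausible_values(values, threshold))
    plausible = [v for i, v in enumerate(values) if v is not None and i not in bad]
    fill = median(plausible)
    return [fill if (v is None or i in bad) else v for i, v in enumerate(values)]


# Hypothetical form data.
households = [
    {"household_size": 4, "children": 2},
    {"household_size": 2, "children": 5},  # more children than members
]
incomes = [250, 300, None, 280, 99999, 310]

print(implausible_records(households))  # [1]
print(implausible_values(incomes))      # [4]
print(impute(incomes))                  # [250, 300, 290.0, 280, 290.0, 310]
```

A production system would more likely recommend a probable value for human confirmation than overwrite the original entry.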

Goals for 2022-II
  1. Present the progress and a demo to the new management of the institutions.
  2. Obtain approval for and carry out the randomized controlled trial (RCT) pilot in a public institution to fine-tune the models and hand over governance of the API to SINIRUBE.
  3. Implement the model validation contingency plan, ideally with data from two public institutions.
  4. Once the pilot test is finalized, deliver the implemented API to SINIRUBE as a scalable web service infrastructure.

Main implementation challenges
  1. Change of government and authorities, with the risk that the solution and the identified pilots are deprioritized. Also, the appointment of a new official SINIRUBE project lead.
  2. Limited SINIRUBE resources for the development of the on-site pilot.
  3. SINIRUBE's urgency to close the project and present results.

Main AI challenges identified
  1. Limited availability and openness of institutions to collaborate with resources and data. Models could thus be biased, as they are subject to hermetic cultures in data management and AI projects.
  2. Should the contingency plan for the on-site pilot be activated, the gap between models built using ENAHO data and models built using RIS Digital data will need to be assessed. This may delay pilot development and compromise the results, as the models would potentially not draw on the same data architecture.

Hub: Costa Rica

Sector: Social inclusion

Location: Costa Rica

Executing Entity: SINIRUBE

State: Development of the Model

Contact: fairlac@iadb.org
