Approach & Methodology
Insights in Global Health is an outgrowth of the Virtue Foundation Actionable Data Initiative. Harnessing advancements in technology and machine learning, the Foundation has created a first-of-its-kind mapping-and-matching global health platform for local nonprofits and healthcare organizations.
This compendium represents the first product of this initiative, providing a curated view of demand-side data and enabling volunteer medical professionals, governments, and stakeholders to better identify where healthcare services are available and where additional resources are needed. Each of the 24 chapters presents a brief country overview, a map depicting the locations of healthcare facilities, and a curated list of nonprofit organizations and healthcare facilities. QR codes associated with each listing link back to the web platform, providing access to further information about the organizations as well as the ability to interact with the data in a customizable manner.
Nonprofit Data Collection and Curation
Using specific keywords and medical specialty descriptors, a pipeline for querying and identifying nonprofit websites was created for targeted regions. Forty-three separate medical specialties, 7 generic terms, and 4 nonprofit keywords were combined to produce a total of 5,076 unique query combinations, which were executed on various search engines and social media platforms. This resulted in 1,060,451 candidate nonprofit web pages, which were subsequently indexed using custom crawlers built with open-source Python libraries and run as a distributed Spark application on parallel workers on Amazon Web Services. This list was complemented with known public resources, such as the United Nations database for NGOs.
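One plausible way to build such query combinations is a Cartesian product over the term lists. The sketch below is illustrative only: the term lists, the `build_queries` helper, and the per-region query template are assumptions, and it does not attempt to reproduce the 5,076 figure, since the text does not say how that count was derived.

```python
from itertools import product

# Hypothetical placeholder terms; the actual lists are not given in the text.
SPECIALTIES = ["cardiology", "ophthalmology", "pediatric surgery"]
GENERIC_TERMS = ["clinic", "medical mission"]
NONPROFIT_KEYWORDS = ["nonprofit", "NGO"]


def build_queries(specialties, generic_terms, nonprofit_keywords, region):
    """Return the sorted set of unique query strings for one region."""
    queries = set()
    for specialty, generic, keyword in product(
        specialties, generic_terms, nonprofit_keywords
    ):
        # Assumed template: one term of each kind, followed by the region name.
        queries.add(f"{specialty} {generic} {keyword} {region}")
    return sorted(queries)


queries = build_queries(SPECIALTIES, GENERIC_TERMS, NONPROFIT_KEYWORDS, "Ghana")
```

Collecting the strings in a set guarantees that repeated terms in the input lists cannot yield duplicate queries.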
A decision-tree script extracted the domains, deduplicated web pages, and created a recursive multilevel indexing tree, identifying 66,121 unique candidate nonprofit websites. Further data, including contact information, donation links, and other metadata, were captured using regular expressions and pattern-matching techniques. To reduce noise and the likelihood that a collected website did not represent an actual healthcare nonprofit organization, machine learning methods were employed to filter the data.
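The domain extraction, deduplication, and pattern-matching steps can be illustrated with a simplified sketch. It groups pages under a flat site key rather than building the recursive multilevel tree described above, and the helper names and the email pattern are assumptions, not the Foundation's actual code.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple email pattern for illustration; real addresses vary more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def site_key(url):
    """Reduce a page URL to a lowercase host name without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def group_pages_by_site(urls):
    """Map each unique site to its deduplicated list of page URLs."""
    sites = {}
    for url in urls:
        key = site_key(url)
        if not key:
            continue  # skip strings that do not parse as absolute URLs
        pages = sites.setdefault(key, [])
        if url not in pages:
            pages.append(url)
    return sites


def extract_emails(text):
    """Return unique email-like strings in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))
```

Keying on the host name means that many candidate pages collapse into far fewer candidate websites, which is the kind of reduction the text reports (1,060,451 pages to 66,121 sites).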
To this end, a training set of 11,877 websites was manually labeled by the Virtue Foundation volunteer team. An auto-tuned word N-gram text model using token occurrences, optimized for sensitivity over precision, achieved the best performance on this training set. In addition to predicting whether or not a website represents a nonprofit, the classifier was also able to determine whether the organization’s activities were concentrated on healthcare. The inference process applied to the 66,121 candidate websites returned 3,052 organizations as healthcare nonprofits. Predicting whether a nonprofit was involved in healthcare proved challenging, as numerous healthcare-related websites belonging to educational organizations, publications, and for-profit entities have a high likelihood of being incorrectly classified as providing healthcare. Therefore, all 3,052 organizations underwent further manual review to establish legitimacy, identify healthcare services provided, confirm countries of activity, and find additional relevant information. At completion, the total number of nonprofit organizations was narrowed down to 1,610. Due to space constraints, only 1,070 nonprofit organizations were ultimately included in the book, based on their quality and relevance. The companion online platform provides a more comprehensive and regularly updated dataset.
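To make the terms concrete, the sketch below shows what word N-gram token occurrences look like and how sensitivity and precision are computed at a given decision threshold. It is not the Foundation's model: the function names, the tokenization rule, and the example scores are assumptions chosen for illustration.

```python
import re
from collections import Counter


def word_ngrams(text, n_max=2):
    """Count word n-grams of length 1..n_max in lowercased text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def sensitivity_and_precision(labels, scores, threshold):
    """Compute sensitivity (recall) and precision at a score threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y and s >= threshold)
    fn = sum(1 for y, s in pairs if y and s < threshold)
    fp = sum(1 for y, s in pairs if not y and s >= threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


# Made-up labels (1 = healthcare nonprofit) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
strict = sensitivity_and_precision(labels, scores, 0.5)
lenient = sensitivity_and_precision(labels, scores, 0.3)
```

Lowering the threshold in this toy example catches every true nonprofit at the cost of more false positives, which is the trade-off implied by favoring sensitivity and then relying on manual review to remove the misclassified sites.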
Healthcare Facility Data Collection and Curation
Healthcare facility data was primarily sourced from the OpenStreetMap humanitarian data layer. Given the abundant, and at times outdated, hospital listings in the OpenStreetMap dataset, uniform filtering based on building footprint, facility name, and online presence was applied to limit the data to hospitals and facilities with the highest impact and capacity. Area-based filtering was employed to exclude buildings whose square footage was too small for a hospital. Keyword filtering was then used to exclude non-hospitals by name (e.g., “health post”), accounting for linguistic differences.
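A minimal sketch of the area and keyword filters follows. The area cutoff, the excluded terms, the record fields, and the decision to drop facilities with no footprint are all assumptions made for illustration; the text does not state the actual values or rules.

```python
# Hypothetical values: the text gives neither the area cutoff nor the keyword list.
MIN_FOOTPRINT_M2 = 500.0
EXCLUDED_NAME_TERMS = ("health post", "poste de santé", "puesto de salud")


def is_candidate_hospital(facility):
    """Keep a facility only if it passes both the area and the name filter."""
    area = facility.get("footprint_m2")
    # Assumption: facilities with an unknown footprint are dropped, not kept.
    if area is None or area < MIN_FOOTPRINT_M2:
        return False
    name = (facility.get("name") or "").casefold()
    return not any(term in name for term in EXCLUDED_NAME_TERMS)


facilities = [
    {"name": "Regional Hospital", "footprint_m2": 4200.0},
    {"name": "Village Health Post", "footprint_m2": 900.0},
    {"name": "Small Clinic", "footprint_m2": 120.0},
    {"name": "Unmapped Hospital", "footprint_m2": None},
]
kept = [f["name"] for f in facilities if is_candidate_hospital(f)]
```

Listing the excluded terms in several languages is one simple way to account for the linguistic differences mentioned above.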
Lastly, to establish activity, a scoring system was derived for each candidate facility by searching for related websites, local directories, government reports, social media posts, etc. This approach was premised on the assumption that principal hospitals are more likely to be referenced online, whether by individuals, governments, or nonprofits. Public APIs, including Bing Maps, OpenStreetMap Nominatim, and GeoNames, were called to capture and externally validate additional details. The purpose of these integrations was to (1) reverse-geocode hospital coordinates to return missing addresses, and (2) validate the location of hospitals in close proximity to country borders. Filtering was complemented by several rounds of manual curation and review.
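The idea of scoring a facility by its online references can be sketched as a weighted sum over evidence types. The weights, the evidence categories, and the function name below are hypothetical; the text does not publish the actual scoring rules.

```python
# Hypothetical evidence types and weights, chosen only for illustration.
EVIDENCE_WEIGHTS = {
    "website": 3,
    "government_report": 3,
    "local_directory": 2,
    "social_media": 1,
}


def activity_score(evidence_counts):
    """Sum the weights of the evidence types found, counting each type once."""
    return sum(
        weight
        for source, weight in EVIDENCE_WEIGHTS.items()
        if evidence_counts.get(source, 0) > 0
    )


# A facility with two related websites and five social media posts.
score = activity_score({"website": 2, "social_media": 5})
```

Counting each evidence type only once keeps a facility with many social media posts from outscoring one that appears in an official report, which is one possible design choice among several.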
The data contained in this compendium represent only the first steps toward improving the nonprofit and healthcare facility landscape in low-income countries. Much work remains to be done to better our understanding of the granularity specific to each region and healthcare system. The Foundation’s development of a vulnerability index based upon macro-level health statistics, bed capacity, and population mobility in targeted regions is a step in this direction. Additionally, insights from social media activity can help identify acute medical conditions in real time and facilitate rapid assistance where needed. Information sources such as public satellite data and ground images obtained from online user activity can be further used in conjunction with machine-learning algorithms to validate the location of hospitals, estimate facility area, and even predict the number of beds needed. Together, these and other features will enhance the global marketplace for the exchange of healthcare services.