Smart Solutions to Inequalities Created by Data Gap

Data can improve quality of life. Skillful collection and use of data has the potential to optimize fields from healthcare to education to finance. It can also help solve social problems: organizations are mining data to address entrenched issues such as homelessness and human trafficking.

Yet, data also has the power to marginalize. As use of data grows, segments of the population that are not integrated into data collection systems are falling behind.

According to a recent report from the Center for Data Innovation, our society faces a growing “data divide.” This report explains that, much like the digital divide, which saw social and economic disadvantages result from lack of access to technology, the data divide results in social and economic inequalities from a lack of collection or use of data about an individual or community. Communities experiencing better data collection benefit from better health care, increased access to financial services, education and civic participation, while communities in “data deserts” may be overlooked.

For example, advanced research hospitals collect massive amounts of information on newborns. Yet babies born at smaller hospitals may leave with no digital footprint at all. Similarly, advanced schools can use data to create personalized learning environments, while schools in poorer neighborhoods may struggle even to offer Internet services.

Data Determines Representation

Even before the emergence of “big data,” communities were impacted by the kind of information collected about them. As the CDI report explains, decennial census data is used to apportion congressional seats among states, as well as to distribute federal funding. But these data collection systems have their flaws: civil rights organizations point out that population counts of minority and low-income populations often are inaccurate.

This can be due to collection methods as well as characteristics of the target population. According to civil and human rights coalition The Leadership Conference, collection methods such as telephone calls and mailings may not reach populations with lower education levels, lower literacy, or difficulty with the English language. Furthermore, these populations may distrust government programs out of fear that law enforcement or immigration officials will deport or incarcerate them.

The end result of less census participation? Less representation. Lack of representation leads poverty-stricken areas into a downward spiral—schools are inadequately funded, leading to lower quality education, leading to worse job prospects and increased social problems. Accurate data can stem this cycle.

Issues with Public Data

Issues of under-representation become exacerbated when trying to implement data solutions on a local scale. For example, Governing features the story of Ted Smith, chief of civic innovation for the city of Louisville, KY, who cannot find complete data sets to cross-reference prisoners with mental health issues and those with substance abuse problems. Smith explains, “The world of data is not perfect…and when it comes to case management, certain data sets can be scarce.”

Moreover, some kinds of data collected by government entities do not have the same sensitivity or accuracy as data collected through advanced technology such as Internet searches or wearables. Publicly collected data suffers not only from inaccurate collection methods but also from outdated systems. One much-publicized example is the Office of Personnel Management, which manually processes the retirement papers of government workers in a limestone cave in Pennsylvania. This kind of reliance on paperwork leads to increased processing times, not to mention massive storage issues.

Public data collection about underrepresented populations can often be inaccurate. As consultant Mike Meikle explains in an article in Social Work Today, social work data is “very dirty…a lot of unnecessary information, incorrect entries, and even duplicate information in these systems because they’ve never really been customized for the social services world.” Therefore, utilizing this data for programs like Smith’s prisoner project in Louisville can prove difficult.


Innovative social entrepreneurs and researchers are working around the data gap by looking in unlikely places, creating their own data, or partnering with government agencies to leverage already collected data. Their solutions provide a means to track and assist populations that would otherwise not be counted.

Unlikely Data Sources

When specific, individually reported data can’t be found, anonymous data can be used to extrapolate findings. For example, in a paper presented at KDD 2014, researcher Varoon Bashyakarla and his team evaluated personal ads on Craigslist to track men who advertise for sex with men but may not identify as gay. This population could be less likely to use protection and may not have access to resources in the case of AIDS/HIV infection. If federal funds for AIDS/HIV treatment are targeted only at communities with large gay populations, under-the-radar populations may not be served.

Creating Proxy Data

Another model of introducing data to non-tracked populations is to expand the definition of “usable data.” In an article reporting LendUp’s $50 million funding, TechCrunch explains that the startup aims to redefine payday lending and make the loan experience more fair and transparent for unbanked Americans.

In LendUp’s model, customers start out with a small loan, which is awarded based on both traditional data from financial sources and meaningful, yet untraditional, data such as on-time rent and bill payments. Cofounder Jake Rosenberg explains in The New York Times that, as customers engage more with LendUp’s products and financial education courses, they move up in credit status. Eventually LendUp reports this internal credit building to major credit bureaus, giving customers the opportunity to enter the financial market.

Leveraging Existing Data

As reported in a 1776 article, when government entities open up their data collection to researchers, social entrepreneurs, and even large companies, technology solutions can be brought to the populations that need them most, and at a pace far faster than government agencies typically operate. One example is the U.S. Department of Health and Human Services’ HHS IDEA Lab, which was founded in 2013 to improve how the Department delivers on its mission.

Startups Bridge the Gap

The growing gap between the data “haves” and “have-nots” will need to be addressed with policy at the federal, state and local levels. In its recent report, the Center for Data Innovation recommends four tactics to close the data gap:

  1. Continue government data collection programs that focus on hard-to-reach and underrepresented communities.
  2. Ensure that funding programs aimed at closing the digital divide consider the impact on data poverty.
  3. Ensure that digital literacy programs help individuals to understand data-producing technologies, such as social media and the Internet of things.
  4. Encourage civic leaders in low-income neighborhoods to understand the benefits of data and how to integrate technology solutions into grant proposals.

But policy reform moves slowly compared to the speed of data creation, and the policies needed to adequately address the data gap are vast, ranging from updating Census collection methods to providing funding for internet access in low-income schools and healthcare centers. As policy solutions unfold over time, innovative startups and non-profits can begin to bridge the gap now by considering populations that may not be reflected in traditional—or digital—data.

Emily Brown

Emily works in urban planning, helping cities to become more competitive. She was named as one of 40 under 40 economic developers, and has taken part in a successful Kickstarter…