T.P. Caruso & Associates

Envisioning a digital infrastructure for a Learning Health System

Tag Archives: Health Data Cloud

No PHI

A health information cloud (HICloud) doesn’t need Protected Health Information (PHI) to support health information exchange. You can keep PHI private and locked away in your own secure databases. Then, if you are authorized, you can use a portion of that PHI (e.g., First Name, Last Name, Birth Date and Last Four Digits of Social Security Number), or you can input your own PHI with a password, to get all of that individual’s health information that is in the cloud.

Such a HICloud is highly secure against abuse by individuals who want to link health information to PHI (for insurance purposes, employment evaluation, etc.) because the PHI is not there to be found linked to anything. You have to provide the PHI to make the link. More importantly, even with the PHI (so you know Name, Birth Date and Last Four Digits of SSN), you still have to be authorized to receive information for any PHI you might enter. You can’t find everyone with a particular symptom, nor can you find everyone of a certain age who lives in a particular place, and you can’t associate that information with health information, because that identifying information is not in the HICloud.
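
As a rough illustration of the two conditions described above (you must supply the PHI, and you must also be authorized), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `phi_lookup_key` and `HICloudSketch`, the idea of turning the PHI portion into a keyed hash, and the salt are hypothetical and are not a description of any existing system.

```python
import hashlib
import hmac


def phi_lookup_key(first, last, birth_date, ssn_last4, salt):
    """Turn the PHI portion into an opaque key; the PHI itself is never stored."""
    material = "|".join(
        [first.strip().lower(), last.strip().lower(), birth_date, ssn_last4]
    )
    return hmac.new(salt, material.encode("utf-8"), hashlib.sha256).hexdigest()


class HICloudSketch:
    """Hypothetical store: holds health information chunks but no PHI."""

    def __init__(self):
        self._chunks = {}  # lookup key -> list of health information chunks
        self._grants = {}  # lookup key -> requester ids the individual approved

    def store(self, key, chunk):
        self._chunks.setdefault(key, []).append(chunk)

    def authorize(self, key, requester_id):
        self._grants.setdefault(key, set()).add(requester_id)

    def fetch(self, key, requester_id):
        # Knowing the PHI (and so the key) is not enough: a grant is required too.
        if requester_id not in self._grants.get(key, set()):
            raise PermissionError("requester is not authorized for this PHI")
        return list(self._chunks.get(key, []))
```

In this sketch a requester who knows the name, birth date and SSN digits can compute the key, but `fetch` still refuses unless the individual has granted that requester access.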

A few challenges arise from such a HICloud, but they are surmountable. For instance, date-related information is needed for proper aggregation of health information, but relative information can be used instead, such as a stamp indicating that a particular “chunk” of health information created today was created 20 days after the previous chunk. Age information can be hidden within categories like 0-4, 5-9, 10-14 … older than 80.
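
Both ideas are easy to sketch in Python. The function names below are hypothetical, and one detail is an assumption: an age of exactly 80 is placed in the top “older than 80” category, since the post does not say where that boundary falls.

```python
from datetime import date


def age_band(age_years):
    """Replace an exact age with a five-year category."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years >= 80:  # assumption: exactly 80 joins the top category
        return "older than 80"
    low = (age_years // 5) * 5
    return f"{low}-{low + 4}"


def relative_stamps(chunk_dates):
    """Replace calendar dates with 'days since the previous chunk'."""
    ordered = sorted(chunk_dates)
    if not ordered:
        return []
    # The first chunk has no predecessor, so its stamp is 0.
    return [0] + [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
```

For example, chunks created on `date(2011, 9, 1)` and `date(2011, 9, 21)` would carry the stamps `[0, 20]`, which preserves the ordering and spacing needed for aggregation without revealing either calendar date.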

The biggest challenge is the use case in which a care provider needs to get health information about an unconscious individual who needs emergency health intervention. If the individual has the necessary PHI on their person, then there is no problem, but what if all that is known about the individual is where they live, how tall they are, the color of their hair, and the location of a birthmark? A service could be provided on a separate system, not directly connected to the HICloud, that allows emergency medical personnel to search for PHI. When the PHI is thought to be found, special authorization could be provided to use that PHI to aggregate that individual’s health information from the HICloud.

Secure and privately maintained PHRs, EHRs, and EMRs can contain the necessary PHI for accessing all the health information in the HICloud, but only for those with official authorization from the individual who owns the PHI. Even access to deidentified information about each individual requires consent from that individual, in case that individual feels insecure sharing even deidentified information.

Such a capability makes sense and is possible. Furthermore, it makes health information exchange much easier than through a network of health information exchanges, and it provides a means of paying for the costs of health information storage and exchange, since those with authorized access to the consented deidentified information can pay for that access. In addition, a portion of the funds received from organizations searching the consented deidentified information can flow back to the individual providing that information, as a further incentive for sharing. Entire business models can be built around payment for access, health information exchange, and individual consent.

What’s holding us back from building a HICloud that contains NO PHI?

Filed under Think Tanks
Sep 15, 2011

Let’s Think Different

I wonder whether the author of the Healthcare Technology News blog post called “Think Differently – the sequel” thinks that the ONC and its PCAST Workgroup are actually “Thinking Differently”? Though the post was a good summary of the PCAST HIT Report and the PCAST Workgroup Report released in mid-April, it suggested that this was thinking differently. Would Steve Jobs (Mr. Think Differently) say that they are thinking differently? I think not. Certainly PCAST thought differently when they published their treatise about health information exchange; however, the ONC and its delegates in the HIT Policy Committee are constrained by legacy thinking. How do we best go from where we are now to there? That’s legacy thinking, and that’s what ONC is doing with the PCAST Health IT Report.

The Biomedical Informatics Think Tank™ thinks differently: knowing where we are going does not require knowledge of where we are, only of what we want to be able to do with this new exchange architecture. We are conceiving a technology that will create a Health Data Cloud (see my latest blog posts: Why Build a Health Data Cloud and A Health Data Cloud is a Powerful Tool for Health Research), which will attain the objectives set out in the PCAST Health IT Report: Realizing the Full Potential of Health Information Technology to Improve Healthcare for Americans: The Path Forward:

  1. “Every American will have electronic health records and will have the ability to exercise privacy preferences for how those records are accessed, consistent with law and policy.
  2. Subject to privacy and security rules, a clinician will be able to view all patient data that is available and necessary for treatment. The data will be available across organizational boundaries.
  3. Subject to privacy and security rules, authorized researchers and public health officials will be able to leverage patient data in order to perform multi-patient, multi-entity analyses.”

The technology will be based on the latest thinking, with a mathematical foundation for medical semantics, privacy and security in a cloud, social networks that empower individuals, and health ontology. We start our thinking with what we want and think about the best way to meet those needs with technology that will serve us for a long time into the future. That thinking is not constrained by current ideas about EHRs and time frames to get new technology standards implemented, a clear constraint of the Health IT Policy Committee’s PCAST Workgroup. We will be part of a transformation of health information exchange through our efforts, just as Apple has been part of the transformation of personal computing since the 1970s. We think differently, and we invite you to think differently with us.

Here’s some different thinking: ONC should invest in a major effort, exceeding the scale of The Human Genome Project (HGP), to define a New Exchange Architecture with a Universal Exchange Language, and then they can actually build and manage the Health Data Cloud that will be required. This is just as important as the HGP and is, in fact, a continuation of the personalized medicine movement that is partially driven by the results of the HGP. The HGP cost taxpayers $2.7 billion (HGP Frequently Asked Questions, Oct 2010), and much more was contributed towards this same goal by international government agencies from the UK and other countries, as well as by private and non-profit organizations. HITECH put ten times this amount, $27 billion, into motivating healthcare providers to move into the 21st century of health IT and electronic health record technology. Only 10% of this funding, the same $2.7 billion the HGP cost, could transform health information exchange with a New Exchange Architecture.

Let’s think differently!

Jun 30, 2011

Bad Googling in a Health Information Cloud

In our recently published white paper about Secure Aggregation we nickname the method for aggregating data “Bad Googling”. The method pulls down out of the data cloud a set of data about the patient in whom we are interested, so it is a sensitive method, but it also pulls down data from a number of other patients, so it is non-specific. No obvious relationship exists between the disaggregated “chunks” of data of any of the patients. In fact, this “Bad Googling” pulls down chunks of data for so many patients that it is statistically impossible to identify the information about a particular patient in this data. However, the set of data can be further analyzed and, with partial decryption, the information about a particular patient can be identified and aggregated according to its context. These chunks of data for a particular patient can only be discovered by testing the ability to decrypt them.

Patient data is protected not only by customary or new prediction schemes, but by being a needle in a haystack. To be more precise, the method finds not just the dozens of needles in that haystack that correspond to one patient, but several million needles that look no different from any other needle. Any method that has to decrypt and examine all the chunks of patient data in the world in order to aggregate them would be too slow for clinical emergencies. By knowing where to find the millions of needles, the time for decrypting and examining the data is reduced to within tolerable limits.

Large groups of chunks of disaggregated patient data are assigned to arbitrary clubs, and could readily be switched between clubs on a regular basis for added security. The chunks of data that we pull down by “Bad Googling” belong to the same club, which was assigned to each patient in this set to serve as a “non-unique identifier”. As a result, “Bad Googling” pulls down our patient’s data along with other patients’ data.
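
The retrieval pattern described here (pull down a whole club, then keep only the chunks that your key can open) can be sketched in Python. This is a toy under stated assumptions, not the white paper’s method: the names `seal`, `CloudSketch` and `aggregate` are hypothetical, and an HMAC tag check stands in for “testing the ability to decrypt”, so the payloads in this sketch are not actually encrypted.

```python
import hashlib
import hmac


def seal(patient_key, payload):
    """Stand-in for encryption: attach a tag that only the key holder can verify."""
    tag = hmac.new(patient_key, payload, hashlib.sha256).digest()
    return (tag, payload)


class CloudSketch:
    """Hypothetical cloud that indexes chunks only by a non-unique club id."""

    def __init__(self):
        self._by_club = {}

    def put(self, club_id, sealed_chunk):
        self._by_club.setdefault(club_id, []).append(sealed_chunk)

    def pull_club(self, club_id):
        # "Bad Googling": returns every chunk in the club, for all its patients.
        return list(self._by_club.get(club_id, []))


def aggregate(cloud, club_id, patient_key):
    """Keep only the chunks whose tag verifies under this patient's key."""
    mine = []
    for tag, payload in cloud.pull_club(club_id):
        expected = hmac.new(patient_key, payload, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            mine.append(payload)
    return mine
```

The club id narrows the search to one part of the haystack, which is what keeps the trial step fast enough, while the cloud itself never learns which chunks in the club belong together.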

From such a club the method randomly assigns to each patient in the set one of a very large number of possible hidden identifiers. This hidden identifier represents a specific graph structure, one of the large number of hidden identifiers that could be derived from any particular club. Though the hidden identifier is assigned to the patient, it remains hidden because the method never assigns to a patient a specific string of characters with a one-to-one association to the hidden identifier. The hidden identifier always remains solely in the control of the patient, and the patient never releases it to anyone.

Note that a club is actually defined as a unique collection of node names, rather like a set of atoms (carbon, oxygen, nitrogen, sulfur, etc.) from which one can build a molecular graph structure. We say that any graph built from these node names is a hidden identifier. Each hidden identifier actually represents one of a very large number of possible ways in which a graph can be generated from a club. Thus C6H12O6 can represent a number of molecular graph structures, the most common being called glucose, galactose and fructose (see molecular graph structures).
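
To get a feel for how quickly the number of graphs grows, here is a small Python sketch. It rests on a simplifying assumption that is not taken from the white paper: every node name in the club is distinct, and any simple undirected graph on those names counts as a candidate, with no chemistry-style valence rules. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def count_labeled_graphs(n_nodes):
    """Each of the C(n, 2) possible edges is either present or absent."""
    return 2 ** comb(n_nodes, 2)


def enumerate_graphs(node_names):
    """Yield every simple undirected graph on the club's node names as an edge set."""
    pairs = list(combinations(node_names, 2))
    for mask in range(2 ** len(pairs)):
        yield frozenset(pair for i, pair in enumerate(pairs) if (mask >> i) & 1)
```

Under this assumption a club of only 3 node names already yields 8 candidate graphs and a club of 4 yields 64; the count doubles with every additional possible edge, so the space of hidden identifiers grows very quickly with the size of the club.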

Apr 28, 2011