Goal: Graph the social and professional affiliations of prescribers (physicians, RNs) and identify the influencers, opinion leaders, and physicians with the most clout in a particular cluster.
Proposal: The Physician Affinity Graph will use publicly available datasets to build a network graph linking physicians to other entities involved in health care services. Machine learning (ML) will be used to cluster physicians based on the features and dimensions they share with these entities.
Methodology: The approach depends heavily on how many publicly available datasets can be linked to a physician. We will rely mainly on the federal-level Medicare and Medicaid claims data released under the Freedom of Information Act (FOIA). The physician ID in these data will be used to uniquely identify a doctor. The data detail the type of treatment, disease area, health care provider, and insurance claims, along with geographical information (state, city, etc.).
Other datasets that we will add to the graph incrementally, depending on their availability, are:
- Pharmaceutical companies (data made public through the Sunshine Act; source: ProPublica)
- Medical associations (in conversation with the AMA about its Physician masterlist dataset for academic use) and professional groups around disease areas (still being researched)
- Government license/registration data (available on individual state government sites)
- Hospital affiliations, journals and publications (can be extracted from the Medicare/Medicaid claims data)
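As an illustration of how records from separate sources could be joined on the physician ID, here is a minimal Python sketch. The field names (`physician_id`, `specialty`, `state`, `company`, `amount`) and the sample rows are hypothetical placeholders, not the actual schema of the Medicare/Medicaid or Sunshine Act releases.

```python
from collections import defaultdict

# Hypothetical sample rows; real field names will depend on each source.
claims = [
    {"physician_id": "P001", "specialty": "Cardiology", "state": "NY"},
    {"physician_id": "P002", "specialty": "Oncology", "state": "CA"},
]
pharma_payments = [
    {"physician_id": "P001", "company": "PharmaCo A", "amount": 1200.0},
    {"physician_id": "P001", "company": "PharmaCo B", "amount": 300.0},
    {"physician_id": "P003", "company": "PharmaCo A", "amount": 50.0},
]


def link_by_physician(claims_rows, payment_rows):
    """Merge rows from both sources into one profile per physician ID."""
    profiles = defaultdict(
        lambda: {"specialty": None, "state": None, "companies": set()}
    )
    # Claims rows supply the descriptive attributes of each physician.
    for row in claims_rows:
        profile = profiles[row["physician_id"]]
        profile["specialty"] = row["specialty"]
        profile["state"] = row["state"]
    # Payment rows add the set of companies linked to each physician.
    for row in payment_rows:
        profiles[row["physician_id"]]["companies"].add(row["company"])
    return dict(profiles)


profiles = link_by_physician(claims, pharma_payments)
```

A physician who appears in only one source (such as `P003` here) still gets a profile, with the missing attributes left as `None`, so gaps in coverage stay visible rather than being dropped silently.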
1. Create a list of data sources from which to procure publicly available healthcare claims data, pharmaceutical relationship (Sunshine Act) data, and hospital/practice/association affiliation datasets.
2. Analyze the format in which each source makes its data available (e.g. CSV, JSON, API, website) and scrape the data into a repository.
3. Create a database model from these data sources to analyze physician relationships with hospitals, medical colleges, medical associations, pharmaceutical companies, journals and other professional networks.
4. Munge and analyze the raw data collected above and create a network graph.
5. Implement the work using a combination of Python, R, D3, Neo4j and GitHub.
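The graph-building step could be sketched roughly as follows, using only the Python standard library. The sketch assumes affiliations arrive as (physician, entity) pairs, weights each physician pair by the number of entities they share, and groups physicians with connected components. Connected components are only a simple stand-in for the ML clustering the proposal has in mind, and all IDs, entity names, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (physician, entity) affiliation edges.
affiliations = [
    ("P001", "hospital:General"),
    ("P002", "hospital:General"),
    ("P001", "assoc:Cardiology Society"),
    ("P002", "assoc:Cardiology Society"),
    ("P003", "hospital:General"),
    ("P004", "hospital:Lakeside"),
]


def project_affinity(edges):
    """Weight each physician pair by the number of entities they share."""
    members = defaultdict(set)
    for physician, entity in edges:
        members[entity].add(physician)
    weights = defaultdict(int)
    for physicians in members.values():
        for a, b in combinations(sorted(physicians), 2):
            weights[(a, b)] += 1
    return dict(weights)


def find_clusters(edges, weights, min_shared=1):
    """Group physicians connected by at least `min_shared` shared entities."""
    neighbors = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_shared:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, result = set(), []
    for start in sorted({physician for physician, _ in edges}):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        result.append(component)
    return result


affinity = project_affinity(affiliations)
loose = find_clusters(affiliations, affinity, min_shared=1)
tight = find_clusters(affiliations, affinity, min_shared=2)
```

Raising `min_shared` splits loosely connected physicians into their own clusters, which gives a crude way to explore how strong an affiliation must be before two physicians count as part of the same group.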
Long term goals:
- Develop an influence-score-based ranking of physicians
- Develop a Key Opinion Leader (KOL) recommendation engine for payers, pharmaceutical companies, patients, research and academia
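One possible starting point for the influence score is weighted degree centrality, sketched below in Python. This is an assumption made for illustration, not a scoring method the proposal has chosen: each physician's score is the sum of their affinity edge weights, normalized by the highest total. The `affinity` weights and physician IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical affinity weights: number of entities shared by each pair.
affinity = {
    ("P001", "P002"): 2,
    ("P001", "P003"): 1,
    ("P002", "P003"): 1,
    ("P001", "P004"): 1,
}


def influence_scores(weights):
    """Weighted degree per physician, scaled so the top score is 1.0."""
    totals = defaultdict(int)
    for (a, b), weight in weights.items():
        totals[a] += weight
        totals[b] += weight
    max_total = max(totals.values(), default=0)
    return {
        physician: (total / max_total if max_total else 0.0)
        for physician, total in totals.items()
    }


def rank_physicians(scores):
    """Order physicians by descending score, breaking ties by ID."""
    return sorted(scores, key=lambda physician: (-scores[physician], physician))


scores = influence_scores(affinity)
ranking = rank_physicians(scores)
```

Restricting `affinity` to the pairs inside one cluster would yield a per-cluster ranking, which is one way a KOL recommendation could be produced for a given disease area or region.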