Another opportunity to apply data mining techniques in the healthcare industry is helping health insurers detect fraudulent transactions, so that other patients can receive better and more affordable healthcare services. Insurance fraud occurs when individuals deceive an insurance company to obtain money to which they are not entitled. It happens when someone puts false information on an insurance application, or when false or misleading information is given, or important information is omitted, in an insurance transaction or claim.
Data that can be collected from the patients:
Patient Name | Age – Gender | Date of Service | Location | Service Provider | Problem | Diagnosis Reports | Services Utilized | Cost of Service | No. of Visits
X | 24 – M | 3/8/2016 | HYD | Apollo | Heart | A | | 80,000 | 3
Y | 30 – F | 5/10/2016 | BANG | Yashoda | Eyes | B | | 2,50,000 | 4
In addition to the data collected from patients, we will gather a second dataset containing the details of all hospitals in the locality: the diagnoses they handle, their quality ratings, average number of patients, doctors' availability, and average service cost.
Location | Hospitals | Diagnosis | Quality Ratings | Average No. of Patients | Doctors' Availability | Average Service Cost
HYD | Apollo | Heart | 2 | 10 | Mon, Tue | 50,000
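As a rough illustration of how the patient data and the hospital data could be linked, the sketch below looks up the hospital row for a claim and derives two comparison values. The field names, the `enrich_claim` helper, and the dictionary layout are hypothetical choices made for this example. It also assumes the sample date 3/8/2016 is written day/month/year, which the text does not state.

```python
from datetime import date

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Hospital reference data keyed by (location, hospital, diagnosis).
# The single entry mirrors the sample hospital row from the text.
HOSPITALS = {
    ("HYD", "Apollo", "Heart"): {
        "quality_rating": 2,
        "avg_patients": 10,
        "doctor_days": {"Mon", "Tue"},
        "avg_cost": 50_000,
    },
}


def enrich_claim(claim, hospitals=HOSPITALS):
    """Attach hospital-level comparison values to one claim.

    Returns None when there is no reference row for the claim's hospital.
    """
    key = (claim["location"], claim["provider"], claim["problem"])
    ref = hospitals.get(key)
    if ref is None:
        return None
    weekday = DAY_NAMES[claim["date_of_service"].weekday()]
    return {
        # Relative deviation of the claimed cost from the hospital average.
        "cost_deviation": (claim["cost"] - ref["avg_cost"]) / ref["avg_cost"],
        # Was the doctor scheduled on the day the service was claimed?
        "doctor_available": weekday in ref["doctor_days"],
        "quality_rating": ref["quality_rating"],
    }


# Patient X from the sample data; the date is ASSUMED to be 3 August 2016.
claim_x = {
    "location": "HYD",
    "provider": "Apollo",
    "problem": "Heart",
    "date_of_service": date(2016, 8, 3),
    "cost": 80_000,
}
features_x = enrich_claim(claim_x)
```

Under the day/month/year assumption, patient X's claim is well above the hospital average and falls on a day outside the listed availability; read as month/day/year instead, the availability check would come out differently, so the date format has to be settled before this comparison is trusted.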
Evaluation: We cannot classify with full accuracy whether a transaction is fraudulent, because of the challenges in collecting the hospital data.
We can use data mining algorithms such as decision trees and naïve Bayes classification to classify whether a claim is fraudulent, based on:
- The deviation of the incurred cost from the average cost at that hospital, in that locality, and at the same hospital in other localities.
- A comparison of the patient's rating of the service quality with the ratings given by other patients.
- The services utilized and the doctor's availability on the date of service.
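To make the classification step more concrete, here is a minimal sketch of a categorical naïve Bayes classifier with add-one smoothing, written from scratch. The three features are discretized versions of the criteria listed above; the feature names, the labels, and all six training rows are invented purely for illustration and are not real claims data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class and per-feature value frequencies from labeled rows."""
    class_counts = Counter(labels)
    # value_counts[label][feature][value] -> number of training rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[label][feature][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label with the highest posterior log-probability."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for feature, value in row.items():
            seen = value_counts[label][feature][value]
            n_values = len(feature_values[feature])
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((seen + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Synthetic training data (hypothetical, for illustration only).
train_rows = [
    {"cost_band": "high", "doctor_available": False, "rating_gap": "large"},
    {"cost_band": "high", "doctor_available": False, "rating_gap": "small"},
    {"cost_band": "high", "doctor_available": True, "rating_gap": "large"},
    {"cost_band": "normal", "doctor_available": True, "rating_gap": "small"},
    {"cost_band": "normal", "doctor_available": True, "rating_gap": "large"},
    {"cost_band": "normal", "doctor_available": False, "rating_gap": "small"},
]
train_labels = ["fraudulent"] * 3 + ["genuine"] * 3

model = train_naive_bayes(train_rows, train_labels)
suspect = {"cost_band": "high", "doctor_available": False, "rating_gap": "small"}
routine = {"cost_band": "normal", "doctor_available": True, "rating_gap": "small"}
```

With this toy data, `predict(model, suspect)` yields `"fraudulent"` and `predict(model, routine)` yields `"genuine"`. A real model would need many labeled claims, and a library implementation of decision trees or naïve Bayes would normally be preferable to hand-written code.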
We also have to consider small deviations, because that is where most fraudulent transactions occur: the charges are padded cautiously, for example by inflating the costs incurred for medicines, consultation, and so on.
We will also use regression analysis to check whether the average cost for a diagnosis is increasing. The hospital may have upgraded its service in response to ratings or feedback and modified the diagnosis steps, which also affects the cost.
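One simple way to run such a check is an ordinary least-squares line fitted to the average cost over time, where a positive slope suggests that costs for the diagnosis are rising. The sketch below assumes monthly averages are available; the `fit_line` helper and the six monthly figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly average costs for one diagnosis at one hospital.
months = [1, 2, 3, 4, 5, 6]
avg_costs = [50_000, 50_500, 51_200, 52_000, 52_400, 53_100]

slope, intercept = fit_line(months, avg_costs)
cost_is_increasing = slope > 0
# Trend-adjusted expectation for the next month (month 7).
expected_next = intercept + slope * 7
```

Comparing a new claim against `expected_next` rather than a fixed historical average would keep a genuine price increase from being mistaken for a suspicious deviation.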
Even with these analytics, in my view the success of classification for this opportunity will depend on domain knowledge and manual analysis, such as comparing the date of service with the doctor's availability.
Integration: We can integrate the model into the insurance claims functionality. Whenever a customer submits a claim, the application must be designed so that it collects all the details necessary for our analysis. We then clean the data for any missing values, and based on the classification, the finance section of the company will take the appropriate decision.
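The claim-handling flow described here could be wired together roughly as follows. This is only a sketch: the required field names, the status strings, and the `route_claim` function are assumptions, and `stub_classifier` is a placeholder standing in for whatever trained model is actually used.

```python
REQUIRED_FIELDS = (
    "patient_name", "date_of_service", "location",
    "provider", "problem", "cost",
)


def missing_fields(claim):
    """List required fields that are absent or empty in the claim."""
    return [f for f in REQUIRED_FIELDS if claim.get(f) in (None, "")]


def route_claim(claim, classify):
    """Check the claim for missing values, classify it, and pick a next step."""
    missing = missing_fields(claim)
    if missing:
        # Incomplete claims go back to the customer before any analysis.
        return {"status": "incomplete", "missing": missing}
    if classify(claim) == "fraudulent":
        # Flagged claims are handed to the finance section for a decision.
        return {"status": "finance_review", "missing": []}
    return {"status": "proceed", "missing": []}


def stub_classifier(claim):
    """Placeholder model: flags costs above an arbitrary threshold."""
    return "fraudulent" if claim["cost"] > 75_000 else "genuine"


complete_claim = {
    "patient_name": "X", "date_of_service": "3/8/2016", "location": "HYD",
    "provider": "Apollo", "problem": "Heart", "cost": 80_000,
}
partial_claim = {"patient_name": "Y", "location": "BANG", "cost": 250_000}

result_complete = route_claim(complete_claim, stub_classifier)
result_partial = route_claim(partial_claim, stub_classifier)
```

Passing the classifier in as a function keeps the routing logic independent of the model, so a decision tree or naïve Bayes model can be swapped in without changing the claims workflow.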