This is a data analysis scenario: you have a lot of data on hospitals across the United States and you want a simple analysis that determines which hospital has the lowest mortality rate in a given state for a given condition.

The data originally used for the analysis came from https://www.medicare.gov/, but I have simplified the scenario to get the main point across.

Assume you have hospital data that reports 30-day mortality rates for heart attack, heart failure and pneumonia for hospitals across the United States, each identified by state and hospital name. Not every record is complete, so the analysis should consider only hospitals that have a value for the condition in question. The data is in a .csv file called outcome-of-care-measures.csv, which you have already downloaded. The task is to write an R function that takes a state and a condition and returns the hospital with the lowest mortality rate for that condition in that state.
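
To make the idea of "complete data" concrete, here is a minimal sketch on a made-up data frame. The hospital names, the rates and the "Not Available" placeholder are all invented for illustration and are not taken from the Medicare file. The sketch assumes that a missing rate is stored as text that is not a number, so converting the column to numeric turns it into NA.

```r
# Hypothetical toy data: names, rates and the "Not Available" marker are assumptions.
toy <- data.frame(
    Hospital.Name = c("ALPHA GENERAL", "BETA MEMORIAL", "GAMMA CLINIC"),
    State         = c("TX", "TX", "TX"),
    Rate          = c("14.2", "Not Available", "12.9"),
    stringsAsFactors = FALSE
)

# Text that is not a number becomes NA; suppressWarnings hides the coercion warning.
toy$Rate <- suppressWarnings(as.numeric(toy$Rate))

# Keep only the rows where the rate is present.
complete <- toy[!is.na(toy$Rate), ]
print(complete$Hospital.Name)
```

Only the hospitals that survive this filter should take part in the comparison.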

For example, given the state Texas (TX) and the condition heart attack, the function should simply return the hospital in Texas with the lowest 30-day mortality rate from heart attack.

Below are the columns in the file. Our analysis uses only three outcome columns (11, 17 and 23), together with Hospital.Name (2) and State (7) to identify each hospital.

[1] "Provider.Number"
[2] "Hospital.Name"
[3] "Address.1"
[4] "Address.2"
[5] "Address.3"
[6] "City"
[7] "State"
[8] "ZIP.Code"
[9] "County.Name"
[10] "Phone.Number"
[11] "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"
[12] "Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"
[13] "Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"
[14] "Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"
[15] "Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"
[16] "Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"
[17] "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"
[18] "Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"
[19] "Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"
[20] "Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"
[21] "Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"
[22] "Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"
[23] "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
[24] "Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
[25] "Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
[26] "Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
[27] "Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
[28] "Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"
[29] "Hospital.30.Day.Readmission.Rates.from.Heart.Attack"
[30] "Comparison.to.U.S..Rate...Hospital.30.Day.Readmission.Rates.from.Heart.Attack"
[31] "Lower.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Heart.Attack"
[32] "Upper.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Heart.Attack"
[33] "Number.of.Patients...Hospital.30.Day.Readmission.Rates.from.Heart.Attack"
[34] "Footnote...Hospital.30.Day.Readmission.Rates.from.Heart.Attack"
[35] "Hospital.30.Day.Readmission.Rates.from.Heart.Failure"
[36] "Comparison.to.U.S..Rate...Hospital.30.Day.Readmission.Rates.from.Heart.Failure"
[37] "Lower.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Heart.Failure"
[38] "Upper.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Heart.Failure"
[39] "Number.of.Patients...Hospital.30.Day.Readmission.Rates.from.Heart.Failure"
[40] "Footnote...Hospital.30.Day.Readmission.Rates.from.Heart.Failure"
[41] "Hospital.30.Day.Readmission.Rates.from.Pneumonia"
[42] "Comparison.to.U.S..Rate...Hospital.30.Day.Readmission.Rates.from.Pneumonia"
[43] "Lower.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Pneumonia"
[44] "Upper.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Pneumonia"
[45] "Number.of.Patients...Hospital.30.Day.Readmission.Rates.from.Pneumonia"
[46] "Footnote...Hospital.30.Day.Readmission.Rates.from.Pneumonia"
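
The runs of dots in the column names above are what you would expect when read.csv cleans up header text: by default it passes the headers through make.names, which replaces every character that is not valid in an R name with a dot. The sketch below shows this on two header strings. The raw header text is my assumption about what the file might contain, not something copied from it.

```r
# Assumed raw header text (an assumption, not copied from the file).
raw <- c("Hospital Name",
         "Hospital 30-Day Death (Mortality) Rates from Heart Attack")

# make.names() replaces each character that is not valid in an R name with ".".
clean <- make.names(raw)
print(clean)
```

Spaces, hyphens and parentheses each become a dot, which is why two dots appear around "Mortality".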

The file is large, so we want simple, efficient code for this task. The conditions we analyze in the file are "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack", "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure" and "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia".

Let's call our function best. It determines which hospital has the lowest mortality rate for a given state and condition.

best <- function(state, outcome) {
    
    # outcome is the condition of interest and state is the two-letter state code.
    # The outcomes vector maps short, easy-to-type names to the long column
    # names used in the file.
    outcomes <- c("heart attack" = "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack",
                  "heart failure" = "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure",
                  "pneumonia" = "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia")
    if(outcome %in% names(outcomes)){
        
        # The function only runs for the predefined outcomes: "heart attack",
        # "heart failure" or "pneumonia". Anything else stops with 'Invalid Outcome'.
        
        filedata <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
        
        # The file is read into filedata, assuming it is in your R working directory.
        
        if (state %in% filedata[["State"]]){
            
            # Check that the user supplied a state code that appears in the file
            # (for example, TX for Texas); otherwise there is nothing to analyze.
            
            state_vect <- filedata[["State"]] == state
            # state_vect is a logical vector that is TRUE for rows belonging to the given state
            
            outcome <- outcomes[[outcome]]
            # outcome now holds the corresponding column name as it appears in the file
            
            workdata <- filedata[state_vect, c("Hospital.Name", "State", outcome)]
            # workdata is the subset of the original data selected by state_vect
            
            workdata[[outcome]] <- suppressWarnings(as.numeric(workdata[[outcome]]))
            # Data cleaning: convert the outcome column to numeric. Non-numeric entries
            # become NA (the coercion warning is suppressed) and are excluded from the analysis.
            
            good <- !is.na(workdata[[outcome]])
            # good is a logical vector that is TRUE where the outcome is not NA
            
            workdata <- workdata[good, ]
            # Re-subset workdata to keep only the rows without NAs
            
            sortdata <- workdata[order(workdata[[outcome]], workdata[["Hospital.Name"]]), ]
            # sortdata is workdata in ascending order of the outcome, then hospital name.
            # The hospital with the smallest outcome comes first and, if there is a tie,
            # it is broken by alphabetical order of the hospital name.
            
            sortdata[1, "Hospital.Name"]
            # Finally, return the hospital name in row 1: the hospital with the lowest mortality rate.
            
        }
        else{
            stop('Invalid State')
            # If the user supplies an unknown state, stop with the message 'Invalid State'.
        }
    }
    else{
        stop('Invalid Outcome')
        # If the user supplies an unknown condition, stop with the message 'Invalid Outcome'.
    }
}
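
The tie-breaking step is the easiest one to get wrong, so here is a small sketch of it in isolation. It uses a made-up data frame (the names and rates are invented for illustration) that is assumed to be already filtered to one state and free of NAs.

```r
# Hypothetical toy data, already filtered to one state and free of NAs.
toy <- data.frame(
    Hospital.Name = c("GAMMA CLINIC", "ALPHA GENERAL", "BETA MEMORIAL"),
    Rate          = c(12.9, 12.9, 14.2),
    stringsAsFactors = FALSE
)

# order() sorts by the first key and uses the second key only to break ties.
ranked <- toy[order(toy$Rate, toy$Hospital.Name), ]

# The first row is the hospital with the lowest rate (alphabetically first on a tie).
winner <- ranked[1, "Hospital.Name"]
print(winner)
```

Two hospitals share the lowest rate here, and the alphabetically first one is returned. With the real file in the working directory, the full function would be called as best("TX", "heart attack").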

[Image: snapshot of the code run]

Thanks for reading. Comments and suggestions are welcome.
