Reflection Paper
The PowerPoint slides for all the modules that were learned are attached.
- Based on your experience and expectations, how would you characterize what you have learned in this course? What were your significant learning moments of the different modules?
- What suggestions do you have to improve what was covered during the modules? Are there any gaps?
- What “ah-ha moments” have you experienced while working on the module materials?
- What have you learned about the concepts of data analysis in healthcare?
- How will you think differently about reported healthcare data since taking this course?
- To what extent have you been successful in achieving the learning outcomes listed in the modules? What is still unclear? What do you still need to work on?
- Compose a two-page paper in MS Word. Cite ALL sources according to APA format.
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 1, Introduction to Data Analysis
Susan White, PhD, RHIA, CHDA
ahima.org
© 2019 AHIMA
Learning Objectives
Understand types of data analysis
Review types of data
Explore the skills required for a career in healthcare data analytics
Data Analysis
Healthcare is a data-driven business
Data collected
Diagnostic tests
Services provided
Costs and payment
Diagnosis and procedure codes
What is data analysis?
Task of transforming, summarizing or modeling data to allow the end user to make meaningful conclusions
[Figure: multiple data inputs are transformed into information]
Primary vs Secondary Analysis
Primary data analysis is the use of data for its primary purpose
Example: The primary use of billing and claims data is to determine services rendered and payment due from a patient or third-party payer
Performing an analysis of typical payment received from a payer for emergency visits is a primary use
Secondary data analysis is the use of data beyond its primary purpose
Example: ICD-10 diagnosis codes are assigned to a patient to record diseases present or discovered during an encounter
Using a profile of the most common ICD-10 diagnosis categories for the purposes of determining the patient load by service line is a secondary use.
Be aware of the primary use and evaluate whether the secondary use is valid and reliable
Types of Statistical Analysis
Descriptive statistics
Characterizes the distribution of the data
Estimates the center or ‘typical’ value
Measures the spread or variation in the data
Inferential statistics
Using sample data to make conclusions or decisions regarding a population
Not practical to observe the entire population
Often accompanied by a probability of making an incorrect decision based on the sample
Structured vs Unstructured Data
Structured data
AKA Discrete data
Data stored in fields that may be delineated
Values can be listed and validated
Examples
Patient age
CPT code
Laboratory test values
Unstructured data
Free form text captured in narrative form
May be stored in a database field, but the content is not limited to values of a variable
Examples
Progress notes in an EHR
Comments in a patient satisfaction survey
Radiologist’s report of an x-ray result
Qualitative Data
Qualitative data
Describes observations about a subject
Typically free text or comments
May be recoded or placed into categories for analysis
Example:
A nurse describes a patient as having pale skin tone.
Survey question: What do you like most about this course?
Data scales typically used for recoding qualitative data:
Nominal – categories without a natural order
Diagnosis codes
Clinical units
Colors
Ordinal – categories with a natural order
Patient satisfaction surveys
Patient severity scores
Evaluation and management code levels
Quantitative Data
Quantitative data
Naturally numeric
May be categorical (ordinal or nominal)
Data scales found in quantitative analysis
Interval – numeric values where the distance between two values has meaning, but there is no true zero and the interpretation is not preserved when multiplying/dividing
Temperature
Dates
Ratio – numeric values where zero has meaning and multiplying/dividing values has meaning
Length of stay
Age
Weight
Variable Scales/Data Type
Overview of Data Type and Statistics
Inferential Statistics – CMS
Exploratory Data Analysis and Data Mining
Exploratory Data Analysis (EDA)
Used to uncover patterns in data
Typically a secondary use of data
Primarily graphical analysis (plots, trends, etc.)
Data Mining
Also looking for patterns in data
Adds in descriptive statistics and more formal statistical techniques
May be used for benchmarking and identifying high and low performers
Predictive Modeling
Historical data is used to build models to determine the most likely future outcome
Data mining is used to identify the potentially best predictors
May be a simple function (linear regression) or a more involved model (neural networks)
Examples
Used by CMS for pre-payment reviews to fight fraud
Used by credit card companies to prevent fraud
Used by providers to identify missed charges
Data Analyst Skills
Must be able to combine:
Content knowledge (coded data, healthcare business process, etc.)
Understanding of the strengths and weaknesses of various data elements
Data acquisition skills through querying databases or effectively writing specifications for queries
Ability to identify the appropriate statistical technique to apply
Familiarity with analytic software to produce the required output
Present the analysis to the end user so that it may be the basis for business decisions
Opportunities for HIM Professionals
HIM Professionals are uniquely positioned to:
Understand data structures and coding systems
Understand available data and methods for integration
Communicate with both finance and IT staff
Act as a business analyst—far more valuable than a pure data analyst
Entry Level Health Data Analyst Responsibilities
Working with data
Identify, analyze, and interpret trends or patterns in complex data sets
In collaboration with others, interpret data and develop recommendations on the basis of findings
Perform basic statistical analyses for projects and reports
Reporting Results
Develop graphs, reports, and presentations of project results, trends, data mining
Create and present quality dashboards
Generate routine and ad hoc reports
Mid-level Health Data Analyst Responsibilities
Work collaboratively
with data and reporting and the database administrator to help produce effective production management and utilization management reports in support of performance management related to utilization, cost, and risk with the various health plan data
to monitor data integrity and quality of reports on a monthly basis
with data and reporting in monitoring financial performance in each health plan
Develop and maintain
claims audit reporting and processes
contract models in support of contract negotiations with health plans
Develop, implement, and enhance evaluation and measurement models for the quality, data and reporting, and data warehouse department programs, projects, and initiatives for maximum effectiveness
Act as a business analyst
Recommend improvements to processes, programs, and initiatives by using analytical skills and a variety of reporting tools
Determine the most appropriate approach for internal and external report design, production, and distribution, specific to the relevant audience
Senior-level Health Data Analyst Responsibilities
Understand and address the information needs of governance, leadership, and staff to support continuous improvement of patient care processes and outcomes
Lead and manage efforts to enhance the strategic use of data and analytic tools to improve clinical care processes and outcomes continuously
Work to ensure the dissemination of accurate, reliable, timely, accessible, actionable information (data analysis) to help leaders and staff actively identify and address opportunities to improve patient care and related processes
Work actively with information technology to select and develop tools to enable facility governance and leadership to monitor the progress of quality, patient safety, service, and related metrics continuously throughout the system
Senior-level Health Data Analyst Responsibilities
Engage and collaborate with information technology and senior leadership to create and maintain:
a succinct report (e.g., dashboard),
a balanced set of system assessment measures, that conveys status and direction of key system-wide quality and patient safety initiatives for the trustee quality and safety committee and senior management;
present this information regularly to the quality and safety committee of the board to ensure understanding of information contained therein
Actively support the efforts of divisions, departments, programs, and clinical units to identify, obtain, and actively use quantitative information needed to support clinical quality monitoring and improvement activities
Function as an advisor and technical resource regarding the use of data in clinical quality improvement activities
Lead analysis of outcomes and resource utilization for specific patient populations as necessary
Lead efforts to implement state-of-the-art quality improvement analytical tools (i.e., statistical process control)
Play an active role, including leadership, where appropriate, on teams addressing system-wide clinical quality improvement opportunities
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 2, Data in Healthcare
Susan White, PhD, RHIA, CHDA
Learning Objectives
Compare and contrast reliability and validity
Categorize types of healthcare data
Connect the health care data flow to the data types and uses
Illustrate commonly used sources of external data
Data Quality
Validity
Accuracy of the data
Ability of the data to measure the attribute it is intended to measure
Reliability
Repeatability or reproducibility of the results
Types of Validity
Face validity
Does the metric appear to measure the quantity it was intended to measure?
Often assessed via expert opinion
Weakest form of validity measure, but should be the first step in assessing the validity of a new test or metric
Content validity
Are the components of the metric necessary and sufficient to measure the quantity?
In survey design, content validity ensures that there are no irrelevant questions
Construct validity
Is the measurement tool capturing the construct to be measured?
In survey design, this may be measured by asking similar questions about a topic (or construct) to ensure consistency in the responses
Criterion validity
Does the metric agree with an accepted gold standard for measuring the same quantity?
A new less expensive laboratory test may be compared against another accepted test for measuring the same quantity. If the test results agree, then the new test has criterion validity
Types of Reliability
Inter-rater reliability – measures the reproducibility or consistency of the metric between two different raters
Intra-rater reliability – measures the reproducibility or consistency of the metric between two different time points using the same rater
Statistics to measure reliability
Kappa statistic or Cohen’s Kappa
Measures inter- or intra-rater reliability
0.41 to 0.60 – moderate
0.61 to 0.80 – substantial
0.81 to 1.00 – almost perfect
Cronbach’s Alpha
Measures internal consistency between questions
Acceptable level >= 0.70
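As a rough illustration of the kappa statistic described above, the sketch below computes Cohen's kappa for two raters using only the Python standard library. The function name `cohens_kappa` and the two lists of category labels are hypothetical and do not come from the text.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per subject."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: two coders classify the same 10 records into categories X, Y, Z.
coder_1 = ["X", "X", "Y", "Y", "X", "Z", "Z", "X", "Y", "X"]
coder_2 = ["X", "X", "Y", "X", "X", "Z", "Y", "X", "Y", "X"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

With these made-up ratings the coders agree on 8 of 10 records, and the resulting kappa falls in the 0.61 to 0.80 band that the scale above labels substantial.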
Types of Healthcare Data
Internal data
Electronic health records
Claims and billing data
Patient satisfaction surveys
External data
Registries (may be both internal/external)
Statewide databases
Medicare claims data
Diagnostic Data
Transitioned from ICD-9-CM to ICD-10-CM on 10/1/2015
Even after the transition, both coding systems (ICD-9-CM and ICD-10-CM) are used for data profiling and analysis
ICD was designed as a disease tracking system, but is used in the US as a payment driver under prospective payment systems
Diagnostic Data – IPPS
CMS pays for inpatient services provided to Medicare patients via an inpatient prospective payment system (IPPS)
Payment is based on diagnosis related groups (DRG) – ICD-10 diagnosis and procedure codes are combined with other demographic data to ‘group’ patients in the DRGs for determination of payment
DRGs are further grouped into major diagnostic categories (MDCs)
ICD-10 and DRG codes are all updated based on the federal fiscal year starting on October 1.
Diagnostic Data
Procedural Data – ICD-10-PCS
Procedural Data – CPT
Pharmacy Data
National Drug Codes (NDC)
FDA website
http://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm
Therapeutic Classification Groups
OVID Field Guide
http://resourcecenter.ovid.com/site/products/fieldguide/ipab/List_of_AHFS_Pharmacologic-.jsp
RxNorm
National Library of Medicine
Administrative Data
Revenue Codes
Place of Service Codes
Claims Processing Codes
Relative Value Unit Data
Revenue Codes
Four-digit code
Used to categorize charges into ‘departments’ on UB-04 or 837I billing records
NOT necessarily the same department found in provider accounting system
Standard across providers
Allows comparison of departmental charges and costs across providers
http://www.resdac.org/sites/resdac.org/files/Revenue%20Center%20Table.txt
Place of Service Codes
Used on professional claims (CMS-1500, formerly HCFA-1500, or 837P) to specify the type of location where the service was performed
Healthcare Data Flow
Claims Data
UB-04 Claim form (CMS-1450)
Hospital services
Submitted via 837I transaction set
5010 format
CMS-1500 Claim Form
Physician services
Submitted via 837P transaction set
5010 format
Departmental Databases
Laboratory Information System (LIS)
May use Logical Observation Identifiers Names and Codes (LOINC)
Radiology Information System (RIS)
Images available through Picture Archiving and Communication System (PACS)
Patient Accounts Database
Includes financial data
Charges
Payments
Accounts receivable/accounts payable
Payroll
General ledger
May be called a practice management system in a physician office
Other Internal Data
Registries
Cancer
Trauma
Birth
Diabetes
Implants
Transplants
Immunizations
External Data
Medicare
Inpatient
Outpatient
Part B Utilization (Physician)
State Databases
HCUP
Medicare Claims Data
MedPAR File
All Medicare inpatient claims for a given federal fiscal year (10/1 – 9/30)
Data source for many of the labs that accompany the text
One record for each inpatient stay
Used as the basis for IPPS DRG relative weight changes
Standard Analytic Outpatient File
All Medicare outpatient claims for a given calendar year
Multiple files that must be combined to summarize at the claim level
An extract of this file (HOPPS) is the basis for changes to OPPS APC relative weights
Part B Utilization File
Summary file by calendar year
Includes information by specialty and for top HCPCS codes:
Allowed services (volume)
Allowed charges
Payment amount
CMS Payment Rule Impact Files
Released annually
Inpatient prospective payment (IPPS)
Outpatient prospective payment (OPPS)
Includes data elements that may be used for benchmarking
Hospital Demographics
Urban/rural setting
Region
Ownership
Teaching/non-teaching status
Number of beds
Operational Statistics
Volume
Average daily census
Payment adjustment factors
Ratio of cost to charge for cost estimation
Case mix index
Medicare percentage
Value based purchasing performance
Payment level (current and projected)
Data.medicare.gov
Central repository for Medicare ‘compare’ databases
State Databases
Utah
Office of Healthcare Statistics
Hospital utilization
Ambulatory surgery center utilization
Query tools to locate specific data
Massachusetts
Massachusetts Community Health Information Profile (MassCHIP)
Standard reports – ‘instant topics’
Downloadable query software for producing custom reports
HCUP
http://hcupnet.ahrq.gov/
Data elements
Statistics on Hospital Stays
Readmission Rates
Emergency Department Use
AHRQ Quality Indicators
Online query system
HCUP Sample Query
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 3, Tools for Data Organization, Analysis, and Presentation
Susan White, PhD, RHIA, CHDA
Learning Objectives
Compare and contrast database structures
Categorize types of statistical software
Illustrate commonly used data visualization methods
Data Organization Using Databases
Healthcare data is complex and often multi-dimensional
Provider
Patients
Insurance companies
Services
Providing an organizational structure for the data can facilitate more efficient analysis and reporting
Database – self-describing collection of integrated records.
Self-describing – contains a description of its own structure
Integrated – data elements are related to each other
Database Vocabulary
Tables – two-dimensional arrays of data
Rows = records
Columns = variables or attributes
RDBMS – Relational Database Management System
Software that is used to hold and maintain data tables and their relationships
SQL – Structured Query Language
Programming language used to communicate with a relational database
ERD – Entity Relationship Diagram
Diagram that shows how tables in an RDBMS relate
Hierarchy of a Relational Database
Tables are rows and columns of values
Envision a tab in a spreadsheet
Fields are the columns in a spreadsheet
In a patient database, fields may be age, gender, admission date, etc.
Data elements or records are the rows in a spreadsheet
In a patient database, rows may represent patients or services provided to patients
A unique row identifier in a table is called the primary key
Cannot be duplicated within the same table
Used to link tables together
Data Dictionary
Detailed roadmap of the database
Should include
Name of computer or software program that contains the data element
Type of data in the field
Length of data in the field
Edits placed on the data field
Values allowed to be placed in the data field
A clear definition of each value
Structured Query Language
SQL
Tool to use and maintain databases
Select data
Update data
Insert rows into a table
Delete rows from a table
SQL Example
Retrieve the records for all patients from Milwaukee
SELECT PATIENT_LNAME, PATIENT_FNAME FROM PATIENT WHERE PATIENT_CITY = 'Milwaukee';
Key words in the query: SELECT, FROM, WHERE
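To try the query above without a database server, one option is Python's built-in `sqlite3` module with an in-memory database. This is only a sketch: the `PATIENT_ID` primary key column and the three sample rows are assumptions made for illustration, while the table name and the other column names follow the slide.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PATIENT ("
    " PATIENT_ID INTEGER PRIMARY KEY,"  # hypothetical primary key column
    " PATIENT_LNAME TEXT,"
    " PATIENT_FNAME TEXT,"
    " PATIENT_CITY TEXT)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO PATIENT VALUES (?, ?, ?, ?)",
    [
        (1, "Doe", "Jane", "Milwaukee"),
        (2, "Roe", "Rich", "Madison"),
        (3, "Poe", "Ann", "Milwaukee"),
    ],
)
# Same SELECT as the slide; the city is passed as a bound parameter.
rows = conn.execute(
    "SELECT PATIENT_LNAME, PATIENT_FNAME FROM PATIENT"
    " WHERE PATIENT_CITY = ? ORDER BY PATIENT_ID",
    ("Milwaukee",),
).fetchall()
print(rows)
conn.close()
```

Passing the city as a parameter rather than pasting it into the SQL string is a common habit that avoids quoting problems.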
Statistical Software Packages
R
Command line with a menu-driven add-in (R Commander)
Open source with a large user base on-line
May open Excel files for analysis
Statistical Analysis System (SAS)
Command line program
Excellent for manipulating large datasets
SPSS
Menu driven statistical software
Most common in academic settings
Microsoft Excel
Spreadsheet software with extensive statistical functions
Excellent for summarizing data quickly
Commonly found in business setting
R
Open source program that may be installed on both Windows- and Mac-based computers
Used to demonstrate examples throughout the text
Many on-line tutorials and video demonstrations of the capabilities
Open source allows the users to expand the functionality of the software
Free to use
SAS Syntax
SAS is a programming language much like SQL
Key words:
Data – used to name and create a dataset
Proc – declare which analytic procedure will be used
Set – declare which dataset will be the subject of the analysis
Run – designates the end of the command and starts the calculation
Syntax: always end commands with a ‘;’
Graphical Displays of Data
Types of graphical comparisons
Group summary
Trends or changes over time
Relative size of groups
Relationships between variables
Bar Graph or Chart
Group summary
Comparison of counts or averages across groups
Two variables: admissions, age category.
One bar for each gender
Line Graphs or Chart
Trends or changes over time
Look for trends/patterns
Should not be used for connecting unrelated points
Pie Chart
Compares relative size of groups
Used to represent relative proportions of a total
Note that this is different than a bar chart – in a pie chart categories must be part of a bigger set or population
Scatter Diagrams
Used to display the relationship between two continuous variables
Should not be used if either variable is categorical
Infographics
Conveys a message or story using a combination of graphs and text
Primary types:
Cause and effect
Chronological
Quantitative
Directional
Product
Tables versus Graphs
Tables have several advantages over graphs such as:
Present more information than a graph
Display the exact values
Require less work to create
Graphs also have advantages over tables such as:
Catch the attention of the reader
Show trends easily
Bring out facts or relationships that stimulate thinking
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 4, Analyzing Categorical Variables
Susan White, PhD, RHIA, CHDA
Learning Objectives
Compare and contrast rates and proportions commonly used in healthcare
Relate the rates and proportions to the appropriate statistical methods
Illustrate commonly used descriptive and inferential statistics
Categorical Variables
Data elements that represent categories
Nominal (no natural order)
Ordinal (ordered)
Healthcare Examples
Gender
Discharge status
Dead/Alive
Rates and Proportions
Commonly used in healthcare
Mortality rates
Infection rates
Complication rates
Readmission rates
Must understand the numerator and denominator of each rate
Numerator – count of subjects that meet the criteria to be measured
Denominator – count of subjects that could meet the criteria to be measured
Example – 30 day readmission rate for COPD
Numerator – patients discharged with a principal diagnosis of COPD and readmitted within 30 days
Denominator – patients discharged with a principal diagnosis of COPD
Be aware of any exclusion criteria
Adults only?
Gender specific rates
Immune-compromised patients
Census Statistics
Inpatient census
Number of patients in the facility at a set point in time
Typically measured at midnight
Daily inpatient census
Number of patients in the facility at a set point in time plus any patients that were both admitted and discharged during that day
Resources are expended to treat patients that are admitted and discharged on the same day
May be a more relevant statistic than inpatient census for monitoring resource consumption
Inpatient service day
Inpatient service day for a particular day is equal to the daily inpatient census for that day
Average daily inpatient census
The number of inpatient service days averaged over a set time period.
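Formula: average daily inpatient census = total inpatient service days for the period ÷ number of days in the period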
Example 1
The hospital inpatient census at midnight on January 15th is 102. Fifteen patients are admitted, three patients are discharged and one patient is admitted and subsequently dies on January 16th.
What is the inpatient census for January 16th?
102 + 15 – 3 = 114
How many inpatient service days were provided on January 16th?
102 + 15 – 3 + 1 = 115
(Note that the one patient admitted and discharged on the same day is included in the inpatient service days, but not present for the January 16th inpatient census count.)
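The arithmetic in Example 1 can be sketched as a small Python function. The function and parameter names are hypothetical. As in the slide's calculation, `admissions` counts only patients still in the facility at the census time, and `same_day_stays` counts patients admitted and discharged (or deceased) on the same day.

```python
def census_and_service_days(prior_census, admissions, discharges, same_day_stays):
    """Return (inpatient census, inpatient service days) for one day."""
    # Census: patients present at the census-taking time (typically midnight).
    census = prior_census + admissions - discharges
    # Service days add back patients admitted and discharged on the same day.
    service_days = census + same_day_stays
    return census, service_days


# Figures from Example 1 (January 16th).
census, service_days = census_and_service_days(
    prior_census=102, admissions=15, discharges=3, same_day_stays=1
)
print(census, service_days)
```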
Example 2
If the number of inpatient service days for the first calendar quarter of 2015 was 9,015, what was the average daily inpatient census for the quarter (round to the nearest 0.1)?
Step 1: Find the number of days in the period. The first calendar quarter is January (31 days), February (28 days), and March (31 days). Number of days = 31 + 28 + 31 = 90
Step 2: Divide the [inpatient service days] by the [number of days] in the period. 9,015/90 = 100.2 patients per day
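The two steps of Example 2 can be checked with a short Python sketch. The helper names `days_in_quarter` and `average_daily_census` are hypothetical. `calendar.monthrange` is used so that leap years are handled without hard-coding February.

```python
import calendar


def days_in_quarter(year, quarter):
    """Number of calendar days in the given quarter (1-4) of a year."""
    first_month = 3 * (quarter - 1) + 1
    # monthrange returns (weekday of the first day, number of days in the month).
    return sum(
        calendar.monthrange(year, month)[1]
        for month in range(first_month, first_month + 3)
    )


def average_daily_census(service_days, days):
    """Inpatient service days averaged over the days in the period."""
    return service_days / days


days = days_in_quarter(2015, 1)
adc = average_daily_census(9015, days)
print(days, round(adc, 1))
```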
Utilization Rates
Cesarean section rate:
Number of c-sections performed divided by number of deliveries
Note the denominator is the number of deliveries (not mothers) and includes both c-section and vaginal births
Inpatient occupancy rate
Inpatient service days divided by the number of bed days in the period
The number of bed days is the number of beds available for each day in the period measured
If the number of beds changes during the period, then that change must be reflected in the number of bed days.
Mortality Rates
Gross mortality rate – number of patients that died divided by the number of patients discharged during the period
Net mortality rate – number of patients that died at least 48 hours after admission divided by the number of patients discharged during the period
The net rate excludes patients that died within 48 hours of admission from the numerator.
Autopsy Rates
Gross autopsy rate – number of autopsies performed divided by the number of patients that died while in the hospital during a period
Net autopsy rate – number of autopsies performed divided by the number of bodies available for autopsy
The net rate excludes bodies that might be taken to the coroner for investigations from the denominator.
Example 3
If the number of inpatient service days for the first calendar quarter of 2015 was 9,015 and the facility had 120 beds available until closing 20 beds on March 1st, what was the inpatient bed occupancy rate for the quarter (round to the nearest 0.1)?
Step 1: Find the number of bed days.
Month | Beds Available | Days | Bed Days |
January | 120 | 31 | 3,720 |
February | 120 | 28 | 3,360 |
March | 100 | 31 | 3,100 |
Total | | 90 | 10,180 |
Step 2: Divide inpatient service days by number of bed days:
9,015/10,180 = 0.886 or 88.6%
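The bed-day calculation in Example 3 can be sketched in Python as follows. The variable name `bed_schedule` and its list-of-pairs layout are assumptions for illustration. Each pair holds the beds available and the number of days they were available.

```python
# (beds available, days) for January, February, and March 2015, as in Example 3.
bed_schedule = [(120, 31), (120, 28), (100, 31)]

# Bed days reflect the change in bed count partway through the period.
bed_days = sum(beds * days for beds, days in bed_schedule)

inpatient_service_days = 9015
occupancy_rate = inpatient_service_days / bed_days
print(bed_days, round(100 * occupancy_rate, 1))
```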
Example 4
AMC Hospital discharged 256 patients during November. Ten of those patients died, and 2 more died on the same day as they were admitted (12 deaths in total). What are the gross and net mortality rates for AMC Hospital (round to the nearest 0.1 of a percent)?
Gross mortality rate
= 12/256 = 0.047 = 4.7%
Net mortality rate
= (12 – 2)/256 = 0.039 = 3.9%
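A minimal Python sketch of Example 4 follows, using the definitions given on the Utilization Rates slide (the net rate removes early deaths from the numerator only). The function name `mortality_rates` and its parameter names are hypothetical.

```python
def mortality_rates(discharges, deaths, early_deaths):
    """Return (gross, net) mortality rates as proportions.

    early_deaths: deaths that occurred within 48 hours of admission,
    which the slide's net rate excludes from the numerator.
    """
    gross = deaths / discharges
    net = (deaths - early_deaths) / discharges
    return gross, net


# Figures from Example 4: 256 discharges, 12 deaths, 2 of them on the day of admission.
gross, net = mortality_rates(discharges=256, deaths=12, early_deaths=2)
print(round(100 * gross, 1), round(100 * net, 1))
```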
Example 5
AMC Hospital discharged 256 patients during November. Ten of those patients died, and 2 more died on the same day as they were admitted (12 deaths in total). Four bodies were autopsied. One body was transferred to the coroner’s office for a criminal investigation. What are the gross and net autopsy rates for AMC Hospital (round to the nearest 0.1 of a percent)?
Gross autopsy rate
= 4/12 = 0.333 = 33.3%
Net autopsy rate
= 4/(12-1) = 4/11 = 0.364 = 36.4%
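Example 5 can be sketched the same way. The function name `autopsy_rates` and its parameter names are hypothetical. Here the net rate changes the denominator, since bodies released to the coroner are not available for a hospital autopsy.

```python
def autopsy_rates(autopsies, deaths, unavailable_bodies):
    """Return (gross, net) autopsy rates as proportions."""
    gross = autopsies / deaths
    # Net rate: only bodies actually available for autopsy count in the denominator.
    net = autopsies / (deaths - unavailable_bodies)
    return gross, net


# Figures from Example 5: 4 autopsies, 12 deaths, 1 body sent to the coroner.
gross_autopsy, net_autopsy = autopsy_rates(autopsies=4, deaths=12, unavailable_bodies=1)
print(round(100 * gross_autopsy, 1), round(100 * net_autopsy, 1))
```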
Census/Utilization Rates
Population Health and Epidemiology Rates
Epidemiology – study of patterns in disease occurrence and spread
Incidence rate –
Number of new cases of a disease divided by the population at risk for acquiring the disease
Prevalence rate –
Number of cases of the disease (both new and existing) divided by the population at risk for acquiring the disease
Point prevalence – prevalence of a disease at a particular point in time
Period prevalence – prevalence of a disease during a time period (month, year, etc.)
Example 6
The Department of Health in Center City is interested in determining the effectiveness of the flu vaccine. They determined that there were 100 new flu cases during the month of January. The population of Center City was 15,000 during that month. What is the incidence rate of flu for Center City in January?
Incidence rate = 100 new cases / 15,000 residents = 0.0067, or 6.7 per 1,000
Example 7
The officials in Center City wanted to further study the impact of the flu on the population. There were 54 residents with the flu on January 31st and 10 residents with the flu on January 1st. Use this data and the fact that there were 100 new cases during the month of January to determine the point prevalence for January 31st and the period prevalence for the month of January.
(Note that period prevalence includes anyone with the disease during the period. Since 10 residents were sick with the flu on Jan 1st, they are included in the numerator of the period prevalence for January).
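The Center City figures from Examples 6 and 7 can be worked through with a short Python sketch. It assumes, as Example 6 does, that the whole population of 15,000 is at risk. The helper name `rate_per_1000` and the variable names are hypothetical.

```python
def rate_per_1000(cases, population_at_risk):
    """Express a rate as cases per 1,000 people at risk."""
    return 1000 * cases / population_at_risk


population = 15_000
new_cases_in_january = 100
cases_on_jan_1 = 10   # existing cases at the start of the period
cases_on_jan_31 = 54  # cases present on the last day of the period

incidence = rate_per_1000(new_cases_in_january, population)
# Point prevalence: everyone with the disease at one point in time.
point_prevalence = rate_per_1000(cases_on_jan_31, population)
# Period prevalence: existing cases plus new cases during the period.
period_prevalence = rate_per_1000(cases_on_jan_1 + new_cases_in_january, population)

print(round(incidence, 1), round(point_prevalence, 1), round(period_prevalence, 1))
```

Under these assumptions the point prevalence on January 31st is 54/15,000 and the period prevalence for January is 110/15,000.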
Descriptive Statistics: Proportions
Each subject either has or does not have the attribute to be counted (dead/alive, success/failure, yes/no)
Recode each observation as a binary variable (two values):
If attribute is present = 1
If attribute is not present = 0
The mean of the 0s and 1s is the proportion of subjects with the attribute
Simple example:
What proportion of patients are female?
Patient genders: M, M, F, F, M
Recode F = 1; M = 0
Recoded gender data: 0, 0, 1, 1, 0
Mean = 2/5 = 0.4 or 40% of patients are female
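The recoding step above takes only a few lines of Python. The variable names are hypothetical. The data are the five patient genders from the slide.

```python
genders = ["M", "M", "F", "F", "M"]

# Recode to a binary variable: 1 if the attribute (female) is present, else 0.
recoded = [1 if gender == "F" else 0 for gender in genders]

# The mean of the 0s and 1s is the proportion with the attribute.
proportion_female = sum(recoded) / len(recoded)
print(recoded, proportion_female)
```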
Descriptive Statistics
Frequency distribution
Appropriate for both nominal and ordinal categorical data
Typically the counts and percentages for each category are presented
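A frequency distribution of this kind can be sketched with `collections.Counter` from the Python standard library. The discharge status values below are made up for illustration and do not come from the text.

```python
from collections import Counter

# Hypothetical discharge statuses for eight patients.
statuses = ["Home", "Home", "SNF", "Home", "Expired", "SNF", "Home", "Home"]

counts = Counter(statuses)
total = sum(counts.values())

# One row per category: (category, count, percentage of total).
frequency_table = [
    (category, count, round(100 * count / total, 1))
    for category, count in counts.most_common()
]
for row in frequency_table:
    print(row)
```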
Charts or Graphs
Since this subset of CPT codes is ordinal, the bar chart is a better representation.
Pie charts are a good choice for nominal data.
Contingency Tables
Used to display and analyze the relationship between two categorical variables
Notice in table below:
20/32 = 62.5% of female patients were discharged home
10/24 = 41.7% of male patients were discharged home
Is this just a random occurrence or is this evidence that there is a significant relationship between gender and being discharged to home?
A hypothesis test may be used to answer that question
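One test that is commonly applied to a 2x2 table like this is the Pearson chi-square test of independence. The slide does not name a specific test, so treat the choice as an assumption. The sketch below uses no continuity correction, and the "not discharged home" counts (12 and 14) are derived by subtraction from the totals quoted above. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: female, male. Columns: discharged home, not discharged home.
table = [[20, 12], [10, 14]]
chi2, p_value = chi_square_2x2(table)
print(round(chi2, 2), round(p_value, 2))
```

With these counts the p-value is above 0.05, so at that alpha level the sketch would not reject the null hypothesis of no relationship between gender and discharge to home.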
Ranks and Percentiles
Ranks and percentiles may be used to describe ordinal data
Ranks – the position of a value after the sample is ordered using order of magnitude – usually ascending (increasing) order
Percentile
AKA percentile rank
Points that divide the sample into 100 equal parts
Important percentile ranks:
25th percentile
AKA first quartile
25% of the values in the sample are less than the 25th percentile
50th percentile
AKA median or second quartile
50% of the values in the sample are less than the 50th percentile (50% are also greater)
75th percentile
AKA 3rd quartile
75% of the values in the sample are less than the 75th percentile
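Ranks and quartiles can be sketched with the Python standard library's `statistics.quantiles`. The lengths of stay below are hypothetical. Statistical packages use slightly different interpolation rules for percentiles, so other software may report slightly different quartile values for the same data.

```python
import statistics

# Hypothetical lengths of stay in days.
lengths_of_stay = [3, 1, 8, 2, 12, 4, 2, 6, 5]

# Ranks: position of each value once the sample is sorted in ascending order.
ordered = sorted(lengths_of_stay)
ranks = {position: value for position, value in enumerate(ordered, start=1)}

# n=4 returns the three cut points: the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(lengths_of_stay, n=4)
print(ranks)
print(q1, q2, q3)
```

The middle cut point equals the sample median, matching the slide's description of the 50th percentile.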
Inferential Statistics
Hypothesis Testing Basics
Hypothesis test – statistical technique used to determine if the evidence (values) present in a random sample is strong enough to make a conclusion about the population
Null hypothesis (Ho) – status quo, requires no action
Example: Ho for Table 4.3 is that there is no relationship between gender and discharge to home
Alternative hypothesis (H1 or Ha) – complement of the null hypothesis, often referred to as the research hypothesis
Example: H1 for Table 4.3 is that there is a relationship between gender and discharge to home
Data are gathered from a random sample of a population to determine if the null hypothesis can be rejected
Hypothesis Testing Basics
Test statistic –
A statistic that is calculated to determine if the data values are consistent with the null hypothesis
Must be compared to a known probability distribution to determine the probability of making an error in deciding whether or not to reject the null hypothesis.
Type I error – incorrectly rejecting the null hypothesis when it is true
The alpha level or acceptable level of this error is set by the analyst prior to the start of the analysis of the data
The p-value is the smallest alpha level for which the null hypothesis would be rejected
If the p-value is smaller than the pre-set alpha level, then there is sufficient evidence to reject the null hypothesis
Type II error – incorrectly NOT rejecting the null hypothesis when it is false
This error may be controlled by the type of hypothesis test and the sample size used in the study
Hypothesis Testing Steps
Determine the null and alternative hypotheses
Set the acceptable type I error or alpha level
Select the appropriate test statistic
Compare the test statistic to a critical value based on the alpha level and the distribution of the test statistic
Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not reject the null hypothesis.
Inferential Statistics:
Proportions
Used to determine if a population proportion is higher or lower than a standard
May be interested in a one-sided or two-sided hypothesis
Two-sided alternative: important to know if population of interest is higher or lower than standard
One-sided alternative: only concerned about higher or lower, not both
Two sided: Ho: p = p0 vs. Ha: p ≠ p0 (p0 is the standard)
One sided: Ho: p ≤ p0 vs. Ha: p > p0 (or Ho: p ≥ p0 vs. Ha: p < p0)
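One-sample Z-test for proportions
Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$, where $\hat{p}$ is the sample proportion, $p_0$ is the null hypothesis value, and $n$ is the sample size
[Figure: standard normal curve with a "Reject Ho" region in each tail]
Reject null hypothesis when Z is extreme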
Example: One-sample Z-test for proportions
Follow 5 basic steps in hypothesis testing:
1. Determine the null and alternative hypotheses:
Null hypothesis (Ho): p = 85 percent or 0.85
Alternative hypothesis (Ha): p ≠ 0.85
2. Set the acceptable type I error or alpha level
The company leaders are willing to accept a five percent error rate. Alpha = 0.05
3. Select the appropriate test statistic
Z is the appropriate test statistic
Example: One-sample Z-test for proportions (continued)
4. Compare the test statistic to a critical value based on alpha and the distribution of the test statistic
5. Reject the null hypothesis if the calculated test statistic is more extreme than the critical value. If not, then do not reject the null hypothesis
Since Ha is a two sided alternative (≠) we select the critical value associated with the alpha level divided by two. We want to protect against both higher and lower alternatives. We reject Ho if Z > 1.960 or Z < −1.960. Z = −1.36 is not less than −1.960, therefore do not reject the null hypothesis
Conclusion: The observed 70% rate from the sample of 10 employees is not sufficient evidence to reject the null hypothesis
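The five steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual one-sample Z statistic with the null value in the standard error; the function name is hypothetical. Without intermediate rounding it gives Z of about −1.33, while the slide's −1.36 appears consistent with rounding the standard error to 0.11. The decision is the same either way.

```python
import math

def one_sample_z_for_proportion(p_hat, p0, n):
    """Z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

# Values from the example: Ho: p = 0.85, observed rate 70% in a sample of 10.
z = one_sample_z_for_proportion(p_hat=0.70, p0=0.85, n=10)

critical_value = 1.960  # two-sided test at alpha = 0.05
reject_null = abs(z) > critical_value

print(f"Z = {z:.2f}, reject Ho: {reject_null}")
```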
Confidence Interval for Proportions
Confidence interval: a range of values based on a sample that contains a population value with a set level of confidence.
Common in political surveys
President’s approval rating is 60% +/- 5%
AKA margin of error (+/-5%)
The width of a confidence interval is a function of the proportion value and the sample size
Widest confidence interval (large margin of error) when p = 50%
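Larger sample size results in a narrower confidence interval for a proportion
Confidence interval: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$, with $z_{\alpha/2} = 1.96$ for a 95% confidence interval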
Example: Confidence Interval for Proportions
Conclusion: Based on the sample, we are 95% sure that the range from 42% to 98% covers the true vaccination rate.
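As a rough check on the interval, the sketch below assumes the sample is the one from the Z-test example (70% observed in a sample of 10) and uses the standard normal-approximation interval with 1.96 as the multiplier. The function name is hypothetical.

```python
import math

def proportion_confidence_interval(p_hat, n, z=1.96):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Assumed inputs: 70% observed rate, n = 10, 95% confidence (z = 1.96).
lower, upper = proportion_confidence_interval(0.70, 10)

print(f"95% CI: {lower:.0%} to {upper:.0%}")
```

With so small a sample the margin of error is wide, which is why the interval spans most of the plausible range.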
Two-sample Z-test for Proportions
Used to determine if the proportion for a particular attribute is higher or lower when comparing two populations
Is the mortality rate at Hospital A higher or lower than that at Hospital B?
May be a one-sided or two-sided test, depending on whether the goal is to determine which population is higher or lower
Two-sample Z-test for Proportions
If Z > standard normal critical value at α/2, or Z < -(standard normal critical value at α/2), then reject Ho
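Test statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ (difference in sample proportions divided by standard error)
$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ (overall proportion when two samples are pooled together; $x_1$ and $x_2$ are the counts with the attribute in each sample)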
Example: Two-sample Z-test for Proportions
Is the mortality rate for MS-DRG 292 (p1) different from that in MS-DRG 293 (p2)?
1. Determine the hypotheses:
Ho: p1 = p2 ; Ha: p1 ≠ p2
2. Set the alpha level = 0.05
3. Select the appropriate test statistic: Z-test
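The remaining steps depend on sample counts that are not reproduced in this text. The sketch below therefore uses hypothetical counts (12 deaths in 200 cases versus 5 deaths in 250 cases), purely to illustrate the pooled two-sample Z statistic; the function name is also an assumption.

```python
import math

def two_sample_z_for_proportions(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for Ho: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall proportion with both samples combined
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# Hypothetical counts, NOT taken from the MS-DRG example.
z = two_sample_z_for_proportions(x1=12, n1=200, x2=5, n2=250)

critical_value = 1.960  # two-sided test at alpha = 0.05
reject_null = abs(z) > critical_value

print(f"Z = {z:.2f}, reject Ho: {reject_null}")
```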
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 5, Analyzing Continuous Variables
Susan White, PhD, RHIA, CHDA
Learning Objectives
Compare and contrast commonly used measures of central tendency
Compare and contrast commonly used measures of variation or spread of values
Illustrate appropriate inferential statistics to use for continuous data
Continuous Variables
Data elements that represent naturally numeric values and can take an infinite number of values
Interval (no true zero)
Ratio
Healthcare Examples
Length of stay
Charge
Systolic blood pressure
Age
Time to code records
Descriptive Statistics
Measures of Central Tendency
Mean
Arithmetic average
Sum of values divided by the number of values
Median
Middle value
If even number of values, average of two middle values
Less influenced by extreme values or outliers than the mean
Mode
Most frequent value
Descriptive Statistics
Measures of Variation
Range
Maximum value minus minimum value
Interquartile range
Difference between the third and first quartile
Variance
Average squared deviation from the mean
Unit of measure is “squared units”
Standard deviation
Square root of the variance
Unit of measure is same as unit of measure in sample
Descriptive Statistics
Example
Calculate the mean, median and mode of the following sample length of stay data:
2, 4, 6, 3, 1, 2, 5
Mean: $\bar{x} = (2 + 4 + 6 + 3 + 1 + 2 + 5)/7 = 23/7 \approx 3.3$
Median
Sort values: 1, 2, 2, 3, 4, 5, 6
Median or middle value = 3
Mode
2, since it is the most frequent value
Note: The mode is rarely used for continuous variables that have many unique values and is presented here for demonstration purposes.
Descriptive Statistics
Example
Calculate the range, variance and standard deviation of the following sample length of stay data:
2, 4, 6, 3, 1, 2, 5
Range = 6 – 1 = 5
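Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{19.43}{6} \approx 3.2$
Standard deviation: $s = \sqrt{s^2} = \sqrt{3.2} \approx 1.8$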
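The descriptive statistics for the length of stay sample can be reproduced with Python's standard `statistics` module, as a quick check on the hand calculations. `statistics.variance` and `statistics.stdev` use the sample (n − 1) formulas.

```python
import statistics

# Length of stay sample from the example.
los = [2, 4, 6, 3, 1, 2, 5]

mean = statistics.mean(los)          # 23 / 7
median = statistics.median(los)      # middle value of the sorted sample
mode = statistics.mode(los)          # most frequent value
value_range = max(los) - min(los)    # maximum minus minimum
variance = statistics.variance(los)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(los)      # square root of the sample variance

print(f"mean={mean:.1f} median={median} mode={mode} "
      f"range={value_range} variance={variance:.1f} sd={std_dev:.1f}")
```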
Review: Hypothesis Testing Steps
Determine the null and alternative hypotheses
Set the acceptable type I error or alpha level
Select the appropriate test statistic
Compare the test statistic to a critical value based on the alpha level and the distribution of the test statistic
Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not reject the null hypothesis.
Inferential Statistics
One Sample t-test
One-sample t-test
Used to test if a population value is different from a standard or benchmark
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Compare to a t-distribution to determine critical value
May be one sided or two sided
Anatomy of test statistic:
Numerator: distance from sample mean to null hypothesis value
Denominator: standard error of the sample mean (SEM)
Inferential Statistics
One Sample t-test – Example
Suppose the researcher that collected the length of stay (LOS) data in the previous examples would like to determine if the population LOS is longer than a standard of 3 days.
Step 1: Determine the null and alternative hypotheses
Ho: µ ≤ 3
Ha: µ > 3
Step 2: Set the acceptable type 1 error rate (AKA alpha level).
Set α = 0.05
Step 3: Select the appropriate test statistic: t-test
Inferential Statistics
One Sample t-test -Example
Step 3 (con’t)
Recall from previous slides:
$\bar{x} = 3.3$
s = 1.8
n = 7
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{3.3 - 3}{1.8/\sqrt{7}} = 0.44$
Step 4: Compare test statistic to critical value.
T-test statistic critical value comes from the t-distribution with n-1 degrees of freedom
T-distribution is symmetric around zero much like standard normal (bell curve); width is defined by the degrees of freedom. (see Figure 5.1 in text)
Inferential Statistics
One Sample t-test – Example
Step 4 (con’t): t= 0.44; df = n – 1 = 7 -1 = 6, one sided test at α=0.05, critical value = 1.943
Step 5: Reject the null hypothesis if the test statistic is more extreme than the critical value. 0.44 is not greater than 1.943, so do not reject the null hypothesis; there is not sufficient evidence to conclude that the LOS is longer than the standard
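A minimal sketch of the same one-sided test, computed from the raw sample rather than the rounded summary values. Because nothing is rounded along the way, the statistic comes out near 0.42 instead of 0.44; the critical value 1.943 is taken from the example rather than computed.

```python
import math
import statistics

los = [2, 4, 6, 3, 1, 2, 5]  # length of stay sample from the example
mu0 = 3                      # standard (null hypothesis value) in days

n = len(los)
sem = statistics.stdev(los) / math.sqrt(n)   # standard error of the mean
t_stat = (statistics.mean(los) - mu0) / sem

critical_value = 1.943  # one-sided, alpha = 0.05, df = n - 1 = 6 (from the example)
reject_null = t_stat > critical_value

print(f"t = {t_stat:.2f}, reject Ho: {reject_null}")
```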
Inferential Statistics
Confidence Interval for Population Mean
Recall that a confidence interval is a range of values that is likely to cover the true population value with a pre-defined probability or level of confidence
A 100(1-α)% confidence interval for the population mean is centered at the sample mean and has a width that is dependent on the confidence level and standard error of the mean
Higher level of confidence requires a wider interval
Large sample size results in a narrower interval
Width of confidence interval is a measure of the precision of the estimate of the population mean
A narrower interval is more precise
Inferential Statistics
Confidence Interval for Population Mean
Formulate a 95% confidence interval for the LOS data presented in the previous example:
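$\bar{x} = 3.3$
s = 1.8
n = 7
95% CI, so α = 0.05; α/2 = 0.025
df = 6
Critical value (table 5.1) = 2.447
95% CI: $\bar{x} \pm t_{\alpha/2,\,df} \times s/\sqrt{n} = 3.3 \pm 2.447 \times 1.8/\sqrt{7} = 3.3 \pm 1.7$, or (1.6, 5.0)
We are 95% sure that the range 1.6 to 5.0 days includes the true population mean LOS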
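The interval can also be computed directly from the sample. This sketch hard-codes the critical value 2.447 quoted in the example; with unrounded inputs the limits are about 1.62 and 4.95, which agree with (1.6, 5.0) after rounding.

```python
import math
import statistics

los = [2, 4, 6, 3, 1, 2, 5]  # length of stay sample from the example
critical_value = 2.447       # t critical value for df = 6, alpha/2 = 0.025 (from the example)

sem = statistics.stdev(los) / math.sqrt(len(los))  # standard error of the mean
margin = critical_value * sem
lower = statistics.mean(los) - margin
upper = statistics.mean(los) + margin

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```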
Inferential Statistics
Paired t-test
Paired t-test
Used to compare pre/post test population values or matched pairs
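Test statistic: $t = \dfrac{\bar{d} - D_0}{s_d/\sqrt{n}}$
Where d = difference between the pre/post values or the pairs, $\bar{d}$ and $s_d$ are the sample mean and standard deviation of the differences, and $D_0$ is the null hypothesis value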
Compare to a t-distribution to determine critical value
May be one sided or two sided
Anatomy of test statistic:
Numerator: distance from sample mean difference to null hypothesis value (usually zero)
Denominator: standard error of the sample mean difference (SEM)
Inferential Statistics
Paired t-test – Example
The transition from ICD-9 to ICD-10 is predicted to cause an increase in the amount of time required to code medical records. A pilot study was conducted using a random sample of 10 records to determine if the time required was significantly different. Each record was coded using the two coding systems by one coder. The values are recorded in the table.
Step 1: Determine the null and alternative hypotheses:
Ho: D = 0
Ha: D ≠ 0
Step 2: Set the alpha level: 0.01
ID | ICD-9 Time | ICD-10 Time | d |
1 | 10 | 15 | 5 |
2 | 11 | 12 | 1 |
3 | 15 | 10 | -5 |
4 | 30 | 36 | 6 |
5 | 5 | 7 | 2 |
6 | 10 | 13 | 3 |
7 | 8 | 5 | -3 |
8 | 11 | 19 | 8 |
9 | 21 | 19 | -2 |
10 | 18 | 23 | 5 |
Inferential Statistics
Paired t-test – Example
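Step 3: Select the appropriate test statistic: paired t-test
Step 4: Compare the test statistic to the critical value
$t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}} = \dfrac{2.0}{4.24/\sqrt{10}} = 1.49$
Compare to the t-distribution with df = 9, α/2 = 0.005: critical value = 3.25
1.49 is not greater than 3.25
Step 5: Do not reject Ho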
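The paired test can be reproduced from the coding-time table. The sketch below computes the differences, their mean and standard deviation, and the t statistic; the critical value 3.25 is taken from the example rather than computed.

```python
import math
import statistics

# Coding times from the table (same record, same coder, both code sets).
icd9 = [10, 11, 15, 30, 5, 10, 8, 11, 21, 18]
icd10 = [15, 12, 10, 36, 7, 13, 5, 19, 19, 23]

differences = [after - before for before, after in zip(icd9, icd10)]

n = len(differences)
mean_d = statistics.mean(differences)
sd_d = statistics.stdev(differences)       # sample standard deviation of d
t_stat = (mean_d - 0) / (sd_d / math.sqrt(n))

critical_value = 3.25  # two-sided, alpha = 0.01, df = 9 (from the example)
reject_null = abs(t_stat) > critical_value

print(f"mean d = {mean_d}, t = {t_stat:.2f}, reject Ho: {reject_null}")
```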
Inferential Statistics
Two Sample t-test
Used to test if two population means are different
Test statistic is complex
Denominator is standard error pooled across the two samples
Use statistical software to calculate
Compare to a t-distribution to determine critical value
May be one sided or two sided
Anatomy of test statistic:
Numerator: distance between the two sample means
Denominator: pooled standard error of the difference between the two sample means
Inferential Statistics
Two Sample t-test – Example
An analyst wanted to find out if the charges for CHF patients admitted through the emergency department (ED) are different from those admitted through other sources. The data for the sample may be found in the Chapter 5 Data file. The summary statistics from the samples appear in the table below:
Step 1: State hypotheses:
Ho: µ1= µ2
Ha: µ1 ≠ µ2
Step 2: Set the alpha level = 0.05
Step 3: Determine the test statistic: T-test
Inferential Statistics
Two Sample t-test – Results from R
Step 4: Compare test statistic to critical value
t = 2.2363 with a p-value = 0.03189
Step 5: Reject null hypothesis if test statistic is more extreme than critical value or the p-value is less than alpha
Reject the null hypothesis (0.03 < 0.05) and conclude that the charges for patients admitted through the emergency department differ from those for patients admitted through other sources
Inferential Statistics
ANOVA
Used to test if more than two population means are different
Test statistic: F-test
Best to use software to compute
Compare to an F-distribution to determine critical value
Anatomy of test statistic:
Numerator: variance between comparison groups
Denominator: variance within comparison groups
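To make the anatomy of the F statistic concrete, here is a small from-scratch sketch. The three groups are hypothetical values chosen so the arithmetic is easy to follow; they are not the MS-DRG data, and in practice statistical software would be used as the slide suggests.

```python
def one_way_anova_f(groups):
    """F = (variance between groups) / (variance within groups)."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data for three comparison groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f}")
```

A large F means the group means are spread out relative to the variation inside each group.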
Inferential Statistics
ANOVA
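Source | Sum of Squares | Degrees of Freedom | Mean Squares | Test statistic: F |
Between groups | SSB | k − 1 | MSB = SSB/(k − 1) | F = MSB/MSW |
Within groups | SSW | N − k | MSW = SSW/(N − k) | |
Total | SSB + SSW | N − 1 | | |
(k = number of comparison groups, N = total sample size)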
Inferential Statistics
ANOVA – Example
The Medicare severity-adjusted diagnosis-related group (MS-DRG) system is designed so that the level of resources required to treat a patient, as measured by charges per patient, differs among the MS-DRGs within a family: no complication or comorbidity, complication or comorbidity (CC), or major complication or comorbidity (MCC). An analyst was asked to test whether that relationship held at her facility. A sample of 80 cases was selected for the three congestive heart failure MS-DRGs: 291 (MCC), 292 (CC), and 293 (no CC or MCC). Since three populations of patients are compared, the analyst used R to generate summary statistics and the ANOVA table below.
Inferential Statistics
ANOVA – Example
Step 1: State the hypotheses
Ho: µ291= µ292= µ293
Ha: At least two of the population means are unequal
Step 2: Set the acceptable error level: α=0.05
Step 3: Determine the appropriate test statistic: F-test
Step 4: Compare p-value to α=0.05
Step 5: Conclude to reject Ho since p < 0.0001 < 0.05
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 6, Analyzing the Relationship between Two Variables
Susan White, PhD, RHIA, CHDA
Learning Objectives
Illustrate appropriate inferential statistics to use for assessing the relationship between two categorical variables
Compare and contrast sensitivity and specificity
Compare and contrast the two types of correlation statistics
Categorical Variables
Descriptive Statistics
Contingency tables
Used to display and analyze the relationship between two categorical variables
Notice in table below:
20/32 = 62.5% of female patients were discharged home
10/24 = 41.7% of male patients were discharged home
Inferential Statistics
Is this just a random occurrence or is this evidence that there is a significant relationship between gender and being discharged to home?
A hypothesis test may be used to answer that question
Example: Chi-squared Test of Independence
Step | Response |
1. Determine the null and alternative hypotheses | Ho: Discharged to Home and Gender are independent H1: Discharged to Home and Gender are not independent |
2. Set the acceptable type I error or alpha level | The analyst is willing to accept a 5% chance or probability of rejecting the null hypothesis when it is true. Alpha = 5% or 0.05 |
3. Select the appropriate test statistic | Chi-squared |
Example: Chi-squared Test of Independence
Test statistics typically compare the value observed in the sample to the null hypothesis value.
If gender and discharged home were independent, then we would expect the distribution of subjects among the four cells (Male/female x home/not home) to be uniform and not have a pattern.
In other words, the proportion of males sent home should be similar to the proportion of the females sent home if the null hypothesis were indeed true.
The basis of the chi-squared test statistic is the observed and expected frequencies in each of the table cells
Example: Chi-squared Test of Independence
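Test statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, summed over all cells of the contingency table, where O is the observed frequency and E is the expected frequency (row total × column total / grand total)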
Example: Chi-squared Test of Independence
Last two steps in hypothesis test:
Compare the test statistic to a critical value based on the alpha level and the distribution of the test statistic
Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not reject the null hypothesis.
Chi-squared test statistic follows the Chi-squared distribution with (r-1)x(c-1) degrees of freedom. r = rows in contingency table and c = columns
Chi-squared distribution is always non-negative
Degrees of freedom define the shape
Since alpha was set to be 0.05 (5%), reject H0 if the test statistic is greater than 3.841
$\chi^2$ = 2.39 which is not greater than 3.841
Do not reject H0
Conclusion: The sample data do not provide sufficient evidence to reject H0; we cannot conclude that there is a significant relationship between gender and the likelihood of being discharged to the home setting
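The test statistic can be reproduced from the counts quoted earlier (20 of 32 female patients and 10 of 24 male patients discharged home). The sketch below builds the expected frequencies from the row and column totals and sums (O − E)²/E over the four cells; the critical value 3.841 is taken from the example.

```python
# Observed counts. Rows: female, male. Columns: discharged home, not home.
observed = [[20, 12],
            [10, 14]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, observed_count in enumerate(row):
        # Expected count if gender and discharge status were independent.
        expected = row_totals[i] * column_totals[j] / grand_total
        chi_squared += (observed_count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # alpha = 0.05, df = 1 (from the example)
reject_null = chi_squared > critical_value

print(f"chi-squared = {chi_squared:.2f}, df = {degrees_of_freedom}, reject H0: {reject_null}")
```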
Sensitivity and Specificity
Measures the accuracy of predictions made by categorical variables
When using one categorical variable (smoking status) to predict another categorical variable (cancer status)
Sensitivity – the number in the sample with the indicator present and a positive test divided by the number of those with the indicator present
Specificity – the number in the sample without the indicator and a negative test divided by the number of those without the indicator
Sensitivity/Specificity Example
A health plan wishes to use accessing their patient portal as a predictor of whether or not a patient will seek care at an emergency room during the year. That is, they believe that patients that do not access the patient portal are more likely to experience an ER visit. They collected the following data based on enrollees during the previous plan year. Calculate the sensitivity and specificity of patient portal use as a predictor of ER use.
Note that the contingency table is set up so that ‘no’ for patient portal access and ‘yes’ for ER visit are in cell ‘A’ (upper left hand corner). This is because the health plan believes that patients that do not use the patient portal are MORE likely to experience an ER visit.
ER Visit During Previous Year? | ||
Patient Portal Access? | Yes | No |
No | 30 | 23 |
Yes | 15 | 86 |
Sensitivity/Specificity Example
ER Visit During Previous Year? | ||
Patient Portal Access? | Yes | No |
No | A: 30 | B: 23 |
Yes | C: 15 | D: 86 |
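Using the cell labels in the table, the calculation can be sketched as follows. It assumes the conventional layout in which the columns hold the outcome (ER visit) and the rows hold the predictor (no portal access counts as a "positive" prediction), so sensitivity is A/(A + C) and specificity is D/(B + D).

```python
# Cell counts from the labeled contingency table.
a = 30  # no portal access, ER visit     (predicted positive, actually positive)
b = 23  # no portal access, no ER visit  (predicted positive, actually negative)
c = 15  # portal access, ER visit        (predicted negative, actually positive)
d = 86  # portal access, no ER visit     (predicted negative, actually negative)

# Assumed conventional definitions based on the column (outcome) totals.
sensitivity = a / (a + c)  # share of ER visitors flagged by the predictor
specificity = d / (b + d)  # share of non-visitors correctly not flagged

print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
```

Under these assumptions, lack of portal access flags about two thirds of the enrollees who had an ER visit and correctly clears about 79% of those who did not.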
Descriptive Statistics – Correlation
Pearson’s correlation coefficient (r)
Measures the linear association between two continuous variables
Spearman’s Rho (r)
Measures the linear association between two ordinal variables or one ordinal and one continuous variable
Correlation between two variables does not imply causation – only that the two have a relationship or are ‘associated’
Be aware that correlation measures the linear association of two variables
They may be related in a non-linear way that may result in misleading values for the correlation coefficients
Descriptive Statistics –
Pearson’s Correlation Coefficient
Used for measuring the linear association between two continuous variables
Values from -1 to +1
Positive value means that both variables increase/decrease together
Example: Charges and length of stay
Negative value means that one variable increases as the other decreases
Example: Experience and time to code a medical record
Descriptive Statistics –
Pearson’s Correlation Coefficient
Example of negative correlation
More experienced coders require less time to code records – in general
Descriptive Statistics –
Pearson’s Correlation Coefficient
Example of positive correlation
Longer lengths of stay result in higher charges – in general
Descriptive Statistics –
Pearson’s Correlation Coefficient Example
Descriptive Statistics –
Spearman’s Rho Correlation Coefficient
Used for measuring the linear association between two ordinal variables or an ordinal and continuous variable
Operates on the ranks for the paired values and not the actual variable values
Typically rank ties are broken with average ranks
Values from -1 to +1
Positive value means that both variables increase/decrease together
Example: patient severity level and charges
Negative value means that one variable increases as the other decreases
Example: Grade in elementary school and time to run 100 yards
Same formula as Pearson’s r, but use ranks instead of actual values
If there are no ties in the ranks, may use $\rho = 1 - \dfrac{6\sum D_i^2}{n(n^2 - 1)}$ (where $D_i$ is the difference between the ranks of the ith pair of variables and n is the sample size)
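The no-ties shortcut is easy to sketch. The severity levels and charges below are hypothetical values invented for illustration, and the simple ranking helper only works when there are no tied values.

```python
def ranks_without_ties(values):
    """Rank values from 1 (smallest) to n; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]

def spearman_rho_no_ties(x, y):
    """rho = 1 - 6 * sum(D_i^2) / (n * (n^2 - 1)), valid only without ties."""
    n = len(x)
    rank_differences = [
        rank_x - rank_y
        for rank_x, rank_y in zip(ranks_without_ties(x), ranks_without_ties(y))
    ]
    sum_d_squared = sum(d ** 2 for d in rank_differences)
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical paired data: patient severity level and charges.
severity = [1, 2, 3, 4, 5]
charges = [5200, 4100, 9800, 7600, 15000]

rho = spearman_rho_no_ties(severity, charges)
print(f"Spearman's rho = {rho:.2f}")
```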
Inferential Statistics –
T-test for correlations
Used to test the null hypothesis that the correlation coefficient is zero
Same formula for both Pearson’s and Spearman’s correlation coefficients: $t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
Note that the sample size is in the numerator of the test statistic
For very large samples, the test may reject the hypothesis of 0 correlation when the value of the sample correlation is not practically significant
Inferential Statistics –
T-test for correlations – Example
Test the hypothesis that the correlation between length of stay and charges in the previous example is different from zero.
Step 1: State the null and alternative hypotheses
Ho: r ≤ 0
Ha: r > 0
Note: In practice, a one sided test of significance is used for r. If the sample value is > 0, then the alternative hypothesis is ‘>0’. If the sample value is negative, then the alternative hypothesis is ‘<0’.
Step 2: Set the acceptable alpha level = 0.05
Inferential Statistics –
T-test for correlations – Example
Step 3: Determine the test statistic and calculate the value
T-test for correlations
r = 0.93
Step 4: Compare the test statistic to the critical value
The critical value from the t-distribution with d.f. = n − 2 = 3 and alpha = 0.05 is 2.353
t = 4.71 > 2.353
Step 5: Reject the null hypothesis since 4.71 > 2.353 and conclude that the correlation between LOS and charge is greater than zero
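A short sketch of the test, assuming the usual statistic t = r·sqrt(n − 2)/sqrt(1 − r²) with n = 5 (implied by d.f. = 3). Plugging in the rounded r = 0.93 gives roughly 4.38; the 4.71 quoted above presumably comes from the unrounded correlation. Either value exceeds the critical value, so the decision does not change.

```python
import math

def t_for_correlation(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.93  # rounded sample correlation from the example
n = 5     # assumed from d.f. = n - 2 = 3

t_stat = t_for_correlation(r, n)
critical_value = 2.353  # one-sided, alpha = 0.05, df = 3 (from the example)
reject_null = t_stat > critical_value

print(f"t = {t_stat:.2f}, reject Ho: {reject_null}")
```

Because n sits in the numerator, the same r produces a larger t as the sample grows, which is why very large samples can flag correlations that are too small to matter in practice.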
Inferential Statistics
Simple Linear Regression
Used to formulate a functional relationship between two continuous variables
A linear function of the independent variable (X) is estimated to predict values of the dependent variable (Y)
Slope-intercept form of a line:
Y = a + bX
a is the y-intercept
b is the slope of the line
If variables are positively correlated, the slope of the line is positive
If variables are negatively correlated, the slope of the line is negative
Inferential Statistics
Simple Linear Regression – Example
Least squares regression
Minimizes the sum of the squared vertical distances from each point to the line
Vertical distance called the ‘error’ or ‘residual’
Least square line provides a line that comes as close as possible to all points, but may not actually intersect with any of them
Inferential Statistics
Simple Linear Regression – Example
Slope of line is 4,443
Interpretation: The expected charge increase for each additional day is $4,443
Intercept of line is $7,801
Interpretation: The expected charge with a zero day stay is $7,801
Zero stay is not realistic, but intercept gives an estimate of the fixed cost of admitting a patient while the slope represents the variable cost.
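The least squares estimates have a simple closed form, sketched below on a tiny hypothetical data set (not the LOS and charge sample, which is not reproduced here). The second part applies the fitted line reported above, charge = 7,801 + 4,443 × LOS, to predict the expected charge for a given stay; the function names are assumptions.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical points that lie exactly on Y = 1 + 2X.
intercept, slope = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])

def expected_charge(los_days, intercept=7801, slope=4443):
    """Predicted charge using the intercept and slope reported in the example."""
    return intercept + slope * los_days

print(intercept, slope, expected_charge(3))
```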
Regression Hypothesis Tests
Two hypothesis tests are presented in this table
Ho: Intercept = 0 vs H1: Intercept ≠ 0
P-value = 0.121, so do not reject Ho
Even though the intercept is not statistically different from zero (do not reject the null hypothesis that it is equal to zero), the intercept is typically kept in the model
Ho: Slope = 0 vs H1: Slope ≠ 0
P-value = 0.021, so reject Ho and conclude that the slope is not equal to zero
The interpretation here is that LOS gives us useful information about the charge since the slope of the regression line is non-zero
Regression Assumptions
Residuals
Difference between the actual value of the dependent variable and the value predicted using the regression equation
The vertical (y-axis) distance from an individual point to the regression line
Must test the following assumptions regarding the residuals:
Independence
Normally distributed
Mean of zero
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 7, Study Design and Sample Selection
Susan White, PhD, RHIA, CHDA
Learning Objectives
Compare and contrast types of studies
Compare and contrast the four most common sampling techniques
Determine the appropriate sample size for attribute and variable studies
Types of Studies – Descriptive
Descriptive studies – performed to generate hypotheses for more formal studies
Cross-sectional study – describes the characteristics of a population at a specific point in time
Often used for prevalence studies
Applied descriptive studies
Data mining
Exploratory data analysis
Types of Studies – Analytic
Analytic studies – more formal studies designed to test specific hypotheses
Case-control study – involves both a case group (subjects with the attribute under investigation) and a control group (those without the attribute)
Members of the case and control groups are often matched based on demographics
Typically a retrospective study
May not be used to determine cause and effect; can calculate odds ratio
Weakness – dependent on subjects’ ability to recall events
Cohort study – involves case and control groups, but groups are identified before the study is performed
Prospective study
May not be used to determine cause and effect; can calculate relative risk
May take a long time to complete
Not useful if the attribute studied is rare
Types of Studies – Experimental
Allow the determination of a cause and effect relationship between variables
Randomized Control Trials (RCT)
Used to determine the effectiveness of new drugs/treatment protocols
Blinded studies
Single blind – the subject does not know if they are assigned to the case or control group
Double blind – neither the subject nor the researcher knows if the subject is assigned to the case or control group
Triple blind – the subject, researcher, and analyst are all blinded as to the group assignment of the subject
Why select a sample?
Often population is too large to collect data from every unit of analysis or subject
Statistical inference is used to make conclusions about a population based on a sample
Vocabulary:
Population or universe – all subjects that are under study and eligible to be sampled
Sample – selected subset of the population
Sampling frame – A listing of all of the subjects in the population
Variable of interest – Quantity to be estimated (denial rate, coding error rate, overpayment, underpayment, etc)
Statistically Valid Sample
Large enough to provide information with sufficient precision to meet the goals of the analysis
Probability sample where each item has an equal chance of being selected
Must be reproducible
Defining the Variable of Interest
What is the percent of lab orders that are not signed by a physician during 2012?
Universe – all lab orders during 2012
What is the amount over/under paid due to incorrect E/M level assignment during January?
Universe –
E/M services billed during January
E/M services provided during January
Must refine question to determine if billed date or service date should be used for defining the universe
What is the coding accuracy rate for secondary diagnosis codes on inpatient accounts during the first quarter?
Universe –
All secondary diagnoses coded during first quarter
All inpatient accounts during first quarter
Must refine question to determine if diagnosis codes or charts are the unit of analysis
Simple Random Sampling
It is the statistical equivalent of drawing sampling units from a hat.
Each sampling unit (claim, chart, etc.) must have the same probability of selection.
Note that some random number generators will allow the user to set a ‘seed’. If that feature is available, the analyst should always set a seed. This will ensure that the sample can be replicated.
A simple random sample is not appropriate if the frame cannot be listed or if it is important that the sample contain particular (rare) subsets of the population.
Random Number Generators
All random number generators are based on mathematical functions that need a ‘seed’ or starting point
The use of a seed ensures that two samples drawn using the same seed and the same software will be based on the same series of random numbers, making the sample reproducible
Using software to generate random numbers:
RAND() in Excel does not allow a seed
Random Number Generation in R does allow a seed
Simple Random Sampling
Steps
Method 1:
The members of the sampling frame should be assigned a random number between 0 and 1
The frame may then be sorted by the random number
The first ‘n’ will be the simple random sample of size ‘n’
Method 2:
Assign a sequence number from 1 to ‘N’ to each member of the sampling frame (N is the population size)
Use a random number generator (e.g., R) to select ‘n’ distinct random numbers from 1 to ‘N’; the members with those sequence numbers form the sample
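Method 2 can be sketched with Python's standard `random` module. The frame of 100 sequence numbers, the sample size, and the seed value are all hypothetical; the point is that fixing the seed makes the draw reproducible.

```python
import random

def simple_random_sample(frame, n, seed):
    """Draw n distinct members from the frame using a seeded generator."""
    rng = random.Random(seed)     # seeded so the sample can be replicated
    return rng.sample(frame, n)   # sampling without replacement

# Hypothetical sampling frame: sequence numbers 1..N with N = 100.
frame = list(range(1, 101))

sample_a = simple_random_sample(frame, n=10, seed=2020)
sample_b = simple_random_sample(frame, n=10, seed=2020)

print(sample_a)
print(sample_a == sample_b)  # same seed, same sample
```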
Systematic Random Sampling
A systematic random sample is a simple random sample that is selected using a particular technique. If the population includes ‘N’ members and we wish to draw a sample of size ‘n’, then a systematic random sample could be selected by choosing every (N/n)th member of the population as the sample.
The selection should start at random from a member between the 1st and N/nth member.
NOTE: If N/n is not a whole number, then round down to the next lower whole number to determine the sampling interval.
In order to ensure that a systematic random sample is truly random, the population should not be sorted in an order that might bias the sample.
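A minimal sketch of the procedure, assuming a hypothetical frame of 103 members and a sample size of 10, so the interval N/n rounds down to 10. The function name and seed are illustrative.

```python
import random

def systematic_sample(frame, n, seed):
    """Pick a random start in the first interval, then every k-th member."""
    interval = len(frame) // n           # N/n rounded down to a whole number
    rng = random.Random(seed)            # seeded for reproducibility
    start = rng.randrange(interval)      # zero-based position within the first interval
    return [frame[start + i * interval] for i in range(n)]

# Hypothetical frame: sequence numbers 1..103, not sorted in any biasing order.
frame = list(range(1, 104))
sample = systematic_sample(frame, n=10, seed=7)

print(sample)
```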
Stratified Random Sampling
Population is divided into unique subsets or strata
Strata should be mutually exclusive and exhaustive. In other words, each of the members of the population should be in one and only one stratum.
A simple random sample is then selected from each of the strata
The size of the sample in each stratum may be equal or may be assigned proportionally according to the relative size of each stratum
Stratified sampling is appropriate when the quantity to be estimated may vary among natural subgroups (strata) of the population
Typical strata in healthcare may be:
CPT® Code (E/M levels)
Physician
Specialty
Clinic
Stratified Random Sampling
Example
Example: An analyst wishes to select a stratified random sample of 90 from a population of 1,000 E/M visits. The distribution of E/M visits in the population is:
Level 1: 55
Level 2: 183
Level 3: 236
Level 4: 309
Level 5: 217
Level | Population Count (N) | % of Population | Sample Size (n) |
1 | 55 | | |
2 | 183 | | |
3 | 236 | | |
4 | 309 | | |
5 | 217 | | |
Totals | 1,000 | 100% | 90 |
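One way to complete the table is proportional allocation: each stratum receives a share of the 90 sampled visits equal to its share of the population. The sketch below uses the counts from the example; the largest-remainder rounding rule is an assumption chosen so that the stratum sample sizes always add up to the target, and the function name is hypothetical.

```python
def proportional_allocation(strata_counts, total_sample):
    """Allocate total_sample across strata in proportion to stratum size.

    Uses integer arithmetic and the largest-remainder rule so the
    allocations sum exactly to total_sample.
    """
    population = sum(strata_counts.values())
    base = {}
    remainders = {}
    for stratum, count in strata_counts.items():
        base[stratum] = (count * total_sample) // population
        remainders[stratum] = (count * total_sample) % population
    leftover = total_sample - sum(base.values())
    # Give one extra unit to the strata with the largest remainders.
    for stratum in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        base[stratum] += 1
    return base

em_visits = {1: 55, 2: 183, 3: 236, 4: 309, 5: 217}
allocation = proportional_allocation(em_visits, 90)
for level, count in em_visits.items():
    share = 100 * count / sum(em_visits.values())
    print(f"Level {level}: {share:.1f}% of population, sample size {allocation[level]}")
```

For this example the allocation is 5, 16, 21, 28, and 20 visits for Levels 1 through 5, which matches simple rounding of each stratum's proportional share.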
Cluster Sampling
The population is divided into subsets much like the strata in stratified sampling
Clusters should be mutually exclusive and exhaustive
Clusters are selected at random
All members of each selected cluster are included in the sample
Cluster sampling is appropriate when it is difficult to access all of the population
Cluster Sampling
Example
The director of the emergency department would like to audit the accuracy of charge capture for the first quarter of 2020. Unfortunately, she is not able to obtain a full listing of the patients that pass through the ED for a sampling frame. Instead, a cluster sample will be drawn using date of service as the cluster. Select 10 dates via simple random sampling to produce a cluster sample.
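The date-based cluster sample can be sketched as follows. The visit records, function names, and seed are hypothetical; the only facts taken from the example are the quarter (first quarter of 2020) and the number of clusters (10 dates).

```python
import random
from datetime import date, timedelta

def q1_2020_dates():
    """List every calendar date in the first quarter of 2020 (the clusters)."""
    start, end = date(2020, 1, 1), date(2020, 3, 31)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def select_cluster_dates(n_clusters, seed):
    """Pick n_clusters dates by simple random sampling without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(q1_2020_dates(), n_clusters))

def cluster_sample(visits, chosen_dates):
    """Keep every visit whose date of service falls on a chosen date."""
    chosen = set(chosen_dates)
    return [visit for visit in visits if visit[1] in chosen]

audit_dates = select_cluster_dates(10, seed=2020)

# Hypothetical visit records: (account id, date of service).
visits = [("ED-001", audit_dates[0]), ("ED-002", date(2020, 4, 1))]
audited = cluster_sample(visits, audit_dates)
```

Only the list of dates is needed as a sampling frame; every ED visit on a selected date is then pulled for the audit.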
Non-probability Sampling
Random sample not required if:
Study is exploratory or a focused review
Example: If we wish to determine educational opportunities for improving documentation, we may sample accounts with few secondary diagnoses to determine if there is a pattern in the types of diagnosis codes most likely to be missed
Typically, this sample is driven by some exploratory data analysis or data mining to help ‘steer’ the sample to subjects most likely to have the issue of interest
Non-probability Sampling
Convenience sampling
Example – sample first ‘n’ customers that enter the hospital cafeteria
Judgment sampling
Use exploratory data analysis based on experience or history
AKA focused review
Example – Know from history that the customer satisfaction in cafeteria is lowest at lunch time because of long lines. Select sample at that time to try to improve process.
Quota sampling
Subjects divided into groups
Judgment sample used within each group
Example – may select the first 10 male and first 10 female customers to enter the cafeteria
Sample Size – Attribute Studies
Attribute studies are designed to measure a proportion or rate
Examples of attribute studies:
Claim denial rate
Correct coding rate
Sample size is dependent on:
The expected proportion
Based on a small pilot study
Set to 0.5 for largest sample
Resources available to perform the study
OIG current recommendation for a pilot study is 30
Attribute Study Sample Size Example
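A commonly used sample size formula for estimating a proportion is n = z² p(1 − p) / E², where p is the expected proportion, E is the desired margin of error, and z is the standard normal value for the chosen confidence level. The sketch below assumes that formula, a 95% confidence level, and no finite population correction; the specific inputs are illustrative and are not taken from the slides.

```python
import math
from statistics import NormalDist

def attribute_sample_size(expected_p, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a proportion within +/- margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    n = (z ** 2) * expected_p * (1 - expected_p) / margin_of_error ** 2
    return math.ceil(n)                                   # always round up

# Expected proportion set to 0.5 gives the largest (most conservative) sample.
print(attribute_sample_size(0.5, 0.05))   # 385
# A pilot study suggesting a 10% denial rate requires a smaller sample.
print(attribute_sample_size(0.1, 0.05))   # 139
```

The comparison shows why setting the expected proportion to 0.5 is the conservative choice when no pilot estimate is available.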
Sample Size – Variable Studies
Variable studies are designed to measure a ratio-scale quantity
Examples of variable studies:
Length of stay
Charge amounts
Lab values
Sample size is dependent on:
Standard deviation of the quantity to be measured
Based on a small pilot study
Resources available to perform the study
OIG current recommendation for a pilot study is 30
Variable Study Sample Size Example
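A commonly used sample size formula for estimating a mean is n = (z · s / E)², where s is the standard deviation estimated from a pilot study and E is the desired margin of error in the same units. The sketch below assumes that formula and a 95% confidence level; the length-of-stay figures are hypothetical.

```python
import math
from statistics import NormalDist

def variable_sample_size(std_dev, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a mean within +/- margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return math.ceil((z * std_dev / margin_of_error) ** 2)

# Hypothetical pilot study: length of stay has a standard deviation of 3 days,
# and the mean should be estimated to within half a day.
print(variable_sample_size(3.0, 0.5))   # 139
```

Doubling the standard deviation roughly quadruples the required sample, which is why a reasonable pilot estimate of variability matters.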
Sample Size and Precision
In both types of studies, attribute or variable, a higher level of precision requires a larger sample size
A higher level of precision is equivalent to requiring a narrower confidence interval for a set confidence level
Note that increasing ‘n’ in both the proportion and mean confidence interval formulas results in narrower intervals (all other variables held constant)
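The relationship between ‘n’ and interval width can be checked numerically. The sketch assumes the standard large-sample margins of error, z·sqrt(p(1 − p)/n) for a proportion and z·s/sqrt(n) for a mean; the input values are illustrative.

```python
import math
from statistics import NormalDist

def z_value(confidence=0.95):
    """Two-sided standard normal critical value for the confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_margin(p, n, confidence=0.95):
    """Half-width of the confidence interval for a proportion."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)

def mean_margin(std_dev, n, confidence=0.95):
    """Half-width of the confidence interval for a mean."""
    return z_value(confidence) * std_dev / math.sqrt(n)

for n in (100, 400, 1600):
    print(n, round(proportion_margin(0.5, n), 4), round(mean_margin(3.0, n), 4))
```

Each fourfold increase in ‘n’ cuts the margin of error in half, so added precision becomes progressively more expensive.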
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 8, Exploratory Data Applications
Susan White, PhD, RHIA, CHDA
Learning Objectives
Illustrate case mix index analysis techniques
Compare and contrast case mix measurement in outpatient vs inpatient setting
Explore relative value unit analysis
Exploratory Data Analysis
AKA Data Mining
Using statistical techniques to find patterns in data
Typically, a mixture of graphical displays and descriptive statistics
Many practical applications in improving healthcare operations
HIM professionals are uniquely positioned to perform this analysis because they understand the data and the underlying operational and reimbursement implications of patterns
Case Mix Analysis
Case Mix Index (CMI) – average MS-DRG weight for all patients
May be calculated for subsets of patients such as Medicare/Medicaid/selected MS-DRGs
May exclude portions such as transplants (very high weight MS-DRGs) or transfers (reduced payment and short stays)
Single number that may be used as a proxy for measuring the resource intensity of a hospital’s patients
Medicare CMI is the primary driver of the inpatient Medicare revenue
Frequently a key performance indicator for a hospital and a key driver of the revenue budget
Case Mix Index
Example
1. Multiply the number of cases in each MS-DRG by the relative weight
2. Sum the values from step 1
3. Sum the number of discharges
4. Divide total relative weights by the number of discharges
Note: This is the weighted average of the relative weights for each MS-DRG.
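The calculation steps can be sketched as follows. The case counts and relative weights are made-up illustration values, not actual MS-DRG weights, and the function name is hypothetical.

```python
def case_mix_index(drg_volumes):
    """Compute CMI from a list of (number of cases, relative weight) pairs."""
    # Multiply cases by relative weight for each MS-DRG and sum the products.
    total_weight = sum(cases * weight for cases, weight in drg_volumes)
    # Sum the number of discharges.
    total_discharges = sum(cases for cases, _ in drg_volumes)
    # Divide total relative weight by total discharges.
    return total_weight / total_discharges

# Hypothetical hospital with three MS-DRGs: (cases, relative weight).
volumes = [(10, 1.0), (5, 2.5), (5, 0.5)]
print(case_mix_index(volumes))   # 1.25
```

Restricting `volumes` to a payer or excluding transplant and transfer cases gives the subset CMIs described on the Case Mix Analysis slide.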
MS-DRG Families
MS-DRGs may be broken into families with two or three members:
No CC
CC (not present in all families)
MCC
The MS-DRG weight system is designed to assign higher weights to MS-DRGs that require a higher resource intensity
MCC MS-DRGs are assigned higher weights than no CC MS-DRGs in the same family
Example families: COPD; Pacemaker Replacement
CC/MCC Capture Rates
Example:
This value can be compared to HCUP data using a z-test for proportions to determine if the sample rate is higher/lower than the national rate
In general, hospitals with higher CC/MCC capture rates have higher CMI
An unusually high CC/MCC capture rate may be indicative of a compliance issue (over-coding) and should also be investigated
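The comparison to a national rate can be sketched as a one-sample z-test for a proportion. The sketch assumes the national rate is treated as a fixed benchmark; the hospital counts and the 60% benchmark are hypothetical values, not HCUP figures.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z_test(successes, n, benchmark_rate):
    """Two-sided z-test of a sample proportion against a fixed benchmark rate."""
    p_hat = successes / n
    standard_error = math.sqrt(benchmark_rate * (1 - benchmark_rate) / n)
    z = (p_hat - benchmark_rate) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 130 of 200 cases in an MS-DRG family were coded with a CC or MCC,
# compared with an assumed national capture rate of 60%.
z, p_value = one_sample_proportion_z_test(130, 200, 0.60)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With these illustrative inputs the p-value is above 0.05, so the 65% capture rate would not be judged significantly different from the benchmark at the 5% level.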
CMI Shifts
Significant shifts in CMI should be investigated to determine the root cause
Potential causes:
New service lines
Surgeon vacation schedules
Holidays
Natural disasters (hurricanes, tornados, etc.)
Other DRG Systems
AP-DRGs
All-patient DRGs
AKA “New York Grouper”
Three character, numeric
Weights are calibrated for all patients and not only Medicare
APR-DRGs
All patient refined DRGs
3M proprietary grouping system
Three character, numeric, followed by a digit (1-4) for severity of illness and a digit (1-4) for risk of mortality
Ambulatory Payment Classifications (APC)
CMS uses APCs to pay for services in the hospital outpatient and ambulatory surgery settings.
Challenges of APCs
Claim may have more than one payable APC
Assignment of CPT/HCPCS codes to APCs may change each year
More of a fee schedule than a true prospective payment system
Can use APC weights to calculate a service mix index (SMI)
Note that this measures the average resource intensity for the services provided and not for the typical case
Methods of Analysis
Validation of utilization patterns
Specialty specific codes
Comparison to hospitals with like service mix (trauma center, transplants, etc.)
RVU Analysis
Work RVUs may be used to benchmark physician productivity
Part of the CMS Physician Fee Schedule
RVU – Physician Productivity
Dr. Kana billed the lowest number of wRVUs during July
When productivity is adjusted for the fact that Dr. Kana is a 40% FTE (cFTE = 0.4), she is actually the second most productive physician
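The FTE adjustment can be sketched by dividing billed wRVUs by clinical FTE. Only Dr. Kana's cFTE of 0.4 comes from the example; the wRVU totals and the other physicians are hypothetical values chosen to reproduce the pattern described.

```python
def adjusted_productivity(wrvus, cfte):
    """wRVUs per 1.0 clinical FTE."""
    return wrvus / cfte

# Hypothetical July data: physician -> (wRVUs billed, clinical FTE).
physicians = {
    "Dr. Kana": (200.0, 0.4),
    "Physician A": (450.0, 1.0),
    "Physician B": (520.0, 1.0),
    "Physician C": (380.0, 0.8),
}

# Lowest to highest by raw wRVUs billed.
raw_rank = sorted(physicians, key=lambda name: physicians[name][0])
# Highest to lowest by wRVUs per clinical FTE.
adjusted_rank = sorted(
    physicians,
    key=lambda name: adjusted_productivity(*physicians[name]),
    reverse=True,
)

print("Fewest wRVUs billed:", raw_rank[0])
print("Ranking after cFTE adjustment:", adjusted_rank)
```

In this illustration Dr. Kana bills the fewest wRVUs but ranks second once productivity is expressed per full-time equivalent.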
RVU – Other Uses
Average cost per RVU
Physician compensation per work RVU (wRVU)
Malpractice expense per Malpractice RVU (mRVU)
Overhead or practice expense per Practice Expense RVU (peRVU)
Break-Even Conversion Factor (BECF)
A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 9, Benchmarking and Analyzing Externally Reported Data
Susan White, PhD, RHIA, CHDA
Learning Objectives
Explain the types of benchmarking
Link benchmarking to value-based purchasing programs
Discuss healthcare report cards
Types of Benchmarking
Benchmarking – comparing performance to a standard
Internal benchmarking – comparison to internal goals or year-over-year
External benchmarking – comparison to external norms or competitors
Benefits
Identify strong or weak areas
Part of quality improvement culture
Benchmarking Steps
1. Identify the issue to benchmark
2. Locate internal data related to the issue
3. Analyze internal data
4. Identify external data available for benchmarking
5. Collect public domain data or purchase data, if appropriate
6. Compare internal and external data
7. Determine whether a performance gap exists
8. Communicate benchmarking findings
9. Establish performance-level targets and action plans for achievement
10. Implement plans; monitor and communicate progress
11. Recalibrate benchmarks as necessary
12. Repeat the process
Hospital Value-Based Purchasing Program (HVBP)
CMS HVBP is an example of a formal benchmarking program
HVBP includes four domains
Process of care
Outcomes
Patient experience
Efficiency of care
Generates Total Performance Score (TPS) that is used to determine an incentive payment added to Medicare inpatient payments for participating hospitals
Dashboards and Scorecards
Method to represent performance in terms of key performance indicators (KPI)
Guide management decisions
Include a combination of indicators measured on a ‘per unit’ basis for comparability across time
Categories may include:
Clinical
Operational
Financial
Example Dashboard – Medicare Spending
Example Dashboard – Medicare Chronic Conditions
National Quality Forum (NQF)
Provides a framework for endorsing healthcare quality measures by:
Convening working groups to foster quality improvement in both the public and private sectors;
Endorsing consensus standards for performance measurement;
Ensuring that consistent, high-quality performance information is publicly available; and
Seeking real-time feedback to ensure measures are meaningful and accurate.
Endorsement of a quality measure requires the following steps:
Measure is proposed and supported with scientific evidence
Validity and reliability of the measure are established
Feasibility is tested, typically via pilot testing; includes cost and potential administrative burden for data collection
Usability is assessed: does the measure provide enough feedback so that users can improve performance?
Assessment of related or competing measures
Medicare Quality Measures
Data.medicare.gov
Hospital Compare
Nursing Home Compare
Physician Compare
Home Health Compare
Dialysis Facility Compare
Data provided in online query and comparison format as well as a bulk download of national statistics
Hospital Compare
Example
Risk adjustment
Quality measurement should include an adjustment for the risk of an adverse outcome
Patient level adjustment
Age/gender
Comorbidities
Provider level adjustment
Teaching status
Location (urban/rural)
Socio-economic attributes of patient mix
Payer mix
Used to compare actual performance to expected performance based on the risk factors
SIR – standardized infection ratio (observed infection rate divided by the expected infection rate)
SRR – standardized readmission ratio
SMR – standardized mortality ratio
For all standardized ratios, a value greater than one means that the facility’s rate is higher than expected given the risk attributed to its patient mix
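The observed-to-expected comparison can be sketched as a simple ratio. The function names and the counts are hypothetical; in practice the expected value comes from a risk adjustment model, which is not reproduced here.

```python
def standardized_ratio(observed, expected):
    """Observed divided by risk-adjusted expected (e.g., SIR, SRR, SMR)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected

def interpret(ratio):
    """Plain-language reading of a standardized ratio."""
    if ratio > 1:
        return "higher than expected"
    if ratio < 1:
        return "lower than expected"
    return "as expected"

# Hypothetical ICU: 12 observed infections against 10 expected after risk adjustment.
sir = standardized_ratio(12, 10)
print(sir, interpret(sir))   # 1.2 higher than expected
```

A point estimate alone does not show whether the difference from 1.0 is statistically meaningful; that requires a confidence interval around the ratio.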
CMS Risk Adjustment – CLABSI in ICU
CLABSI = central line-associated bloodstream infections
Observed and expected infection rates are calculated for each hospital
Expected rates are risk adjusted
The graph depicts the SIR, or observed-to-expected ratio (O/E), for each hospital
O/E = 1.0 means that the hospital’s infection rate is equal to that expected after risk adjustment
The dark shaded areas represent the 95% confidence interval for the O/E
