# Reflection Paper

The PowerPoint slides for all modules covered in the course are attached.

• Based on your experience and expectations, how would you characterize what you have learned in this course? What were your significant learning moments of the different modules?
• What suggestions do you have to improve what was covered during the modules? Are there any gaps?
• What “ah-ha moments” have you experienced while working on the module materials?
• What have you learned about the concepts of data analysis in healthcare?
• How will you think differently about reported healthcare data since taking this course?
• To what extent have you been successful in achieving the learning outcomes listed in the modules? What is still unclear? What do you still need to work on?

1. Compose a two-page paper in MS Word. Cite ALL sources according to APA format.

A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 1, Introduction to Data Analysis

Susan White, PhD, RHIA, CHDA

ahima.org


Learning Objectives

Understand types of data analysis

Review types of data

Explore the skills required for a career in healthcare data analytics


Data Analysis

Healthcare is a data-driven business

Data collected

Diagnostic tests

Services provided

Costs and payment

Diagnosis and procedure codes

What is data analysis?

Task of transforming, summarizing or modeling data to allow the end user to make meaningful conclusions


[Figure: several data inputs are transformed into information]

Primary vs Secondary Analysis

Primary data analysis is the use of data for its primary purpose

Example: The primary use of billing and claims data is to determine the services rendered and the payment due from a patient or third-party payer

Performing an analysis of typical payment received from a payer for emergency visits is a primary use

Secondary data analysis is the use of data beyond its primary purpose

Example: ICD-10 diagnosis codes are assigned to a patient to record diseases present or discovered during an encounter

Using a profile of the most common ICD-10 diagnosis categories for the purposes of determining the patient load by service line is a secondary use.

Be aware of the primary use and evaluate whether the secondary use is valid and reliable


Types of Statistical Analysis

Descriptive statistics

Characterizes the distribution of the data

Estimates the center or ‘typical’ value

Measures the spread or variation in the data

Inferential statistics

Using sample data to make conclusions or decisions regarding a population

Not practical to observe the entire population

Often accompanied with a probability of making an incorrect decision based on the sample
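The descriptive measures named above (center and spread) can be illustrated with a short Python sketch. The length-of-stay values are hypothetical and are not taken from the slides; only the standard-library `statistics` module is used.

```python
import statistics

# Hypothetical lengths of stay (days) for a small sample of discharges.
length_of_stay = [2, 3, 3, 4, 5, 7, 11]

# Center: the 'typical' value
center_mean = statistics.mean(length_of_stay)
center_median = statistics.median(length_of_stay)

# Spread: variation in the data
spread_sd = statistics.stdev(length_of_stay)  # sample standard deviation
spread_range = max(length_of_stay) - min(length_of_stay)

print(f"mean={center_mean}, median={center_median}, "
      f"sd={spread_sd:.2f}, range={spread_range}")
```

The mean (5) sits above the median (4) here because one long stay pulls the average upward, which is why both measures of center are usually reported together.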


Structured vs Unstructured Data

Structured data

AKA Discrete data

Data stored in fields that may be delineated

Values can be listed and validated

Examples

Patient age

CPT code

Laboratory test values

Unstructured data

Free form text captured in narrative form

May be stored in a database field, but the content is not limited to the values of a variable

Examples

Progress notes in an EHR

Comments in a patient satisfaction survey

Radiologist’s report of an x-ray result


Qualitative Data

Qualitative data

May be recoded or placed into categories for analysis

Example:

A nurse describes a patient as having pale skin tone.

Data scales typically used for recoding qualitative data:

Nominal – categories without a natural order

Diagnosis codes

Clinical units

Colors

Ordinal – categories with a natural order

Patient satisfaction surveys

Patient severity scores

Evaluation and management code levels


Quantitative Data

Quantitative data

Naturally numeric

May be categorical (ordinal or nominal)

Data scales found in quantitative analysis

Interval – numeric values where the distance between two values has meaning, but there is no true zero and the interpretation is not preserved when multiplying/dividing

Temperature

Dates

Ratio – numeric values where zero has meaning and multiplying/dividing values has meaning

Length of stay

Age

Weight


Variable Scales/Data Type


Overview of Data Type and Statistics


Inferential Statistics – CMS


Exploratory Data Analysis and Data Mining

Exploratory Data Analysis (EDA)

Used to uncover patterns in data

Typically a secondary use of data

Primarily graphical analysis (plots, trends, etc.)

Data Mining

Also looking for patterns in data

Adds in descriptive statistics and more formal statistical techniques

May be used for benchmarking and identifying high and low performers


Predictive Modeling

Historical data is used to build models to determine most likely outcome in future

Data mining is used to identify the potentially best predictors

May be a simple function (linear regression) or a more involved model (neural networks)

Examples

Used by CMS for pre-payment reviews to fight fraud

Used by credit card companies to prevent fraud

Used by providers to identify missed charges


Data Analyst Skills

Must be able to combine:

Content knowledge (coded data, healthcare business process, etc.)

Understanding of the strengths and weaknesses of various data elements

Data acquisition skills through querying databases or effectively writing specifications for queries

Ability to identify the appropriate statistical technique to apply

Familiarity with analytic software to produce the required output

Present the analysis to the end user so that it may be the basis for business decisions


Opportunities for HIM Professionals

HIM Professionals are uniquely positioned to:

Understand data structures and coding systems

Understand available data and methods for integration

Communicate with both finance and IT staff

Act as a business analyst—far more valuable than a pure data analyst


Entry Level Health Data Analyst Responsibilities

Working with data

Identify, analyze, and interpret trends or patterns in complex data sets

In collaboration with others, interpret data and develop recommendations on the basis of findings

Perform basic statistical analyses for projects and reports

Reporting Results

Develop graphs, reports, and presentations of project results, trends, data mining

Create and present quality dashboards

Generate routine and ad hoc reports


Mid-level Health Data Analyst Responsibilities

Work collaboratively with data and reporting and the database administrator to:

help produce effective production management and utilization management reports in support of performance management related to utilization, cost, and risk with the various health plan data

monitor data integrity and quality of reports on a monthly basis

monitor financial performance in each health plan

Develop and maintain

claims audit reporting and processes

contract models in support of contract negotiations with health plans

Develop, implement, and enhance evaluation and measurement models for the quality, data and reporting, and data warehouse department programs, projects, and initiatives for maximum effectiveness

Recommend improvements to processes, programs, and initiatives by using analytical skills and a variety of reporting tools

Determine the most appropriate approach for internal and external report design, production, and distribution, specific to the relevant audience


Senior-level Health Data Analyst Responsibilities

Understand and address the information needs of governance, leadership, and staff to support continuous improvement of patient care processes and outcomes

Lead and manage efforts to enhance the strategic use of data and analytic tools to improve clinical care processes and outcomes continuously

Work to ensure the dissemination of accurate, reliable, timely, accessible, actionable information (data analysis) to help leaders and staff actively identify and address opportunities to improve patient care and related processes

Work actively with information technology to select and develop tools to enable facility governance and leadership to monitor the progress of quality, patient safety, service, and related metrics continuously throughout the system


Senior-level Health Data Analyst Responsibilities (continued)

Engage and collaborate with information technology and senior leadership to create and maintain:

a succinct report (e.g., dashboard),

a balanced set of system assessment measures, that conveys status and direction of key system-wide quality and patient safety initiatives for the trustee quality and safety committee and senior management;

present this information regularly to the quality and safety committee of the board to ensure understanding of information contained therein

Actively support the efforts of divisions, departments, programs, and clinical units to identify, obtain, and actively use quantitative information needed to support clinical quality monitoring and improvement activities

Function as an advisor and technical resource regarding the use of data in clinical quality improvement activities

Lead analysis of outcomes and resource utilization for specific patient populations as necessary

Lead efforts to implement state-of-the-art quality improvement analytical tools (i.e., statistical process control)

Play an active role, including leadership, where appropriate, on teams addressing system-wide clinical quality improvement opportunities


A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 2, Data in Healthcare

Susan White, PhD, RHIA, CHDA


Learning Objectives

Compare and contrast reliability and validity

Categorize types of healthcare data

Connect the healthcare data flow to the data types and uses

Illustrate commonly used sources of external data


Data Quality

Validity

Accuracy of the data

Ability of the data to measure the attribute it is intended to measure

Reliability

Repeatability or reproducibility of the results


Types of Validity

Face validity

Does the metric appear to measure the quantity it was intended to measure?

Often assessed via expert opinion

Weakest form of validity measure, but should be the first step in assessing the validity of a new test or metric

Content validity

Are the components of the metric necessary and sufficient to measure the quantity?

In survey design, content validity ensures that there are no irrelevant questions

Construct validity

Is the measurement tool capturing the construct to be measured?

In survey design, this may be measured by asking similar questions about a topic (or construct) to ensure consistency in the responses

Criterion validity

Does the metric agree with an accepted gold standard for measuring the same quantity?

A new less expensive laboratory test may be compared against another accepted test for measuring the same quantity. If the test results agree, then the new test has criterion validity


Types of Reliability

Inter-rater reliability – measures the reproducibility or consistency of the metric between two different raters

Intra-rater reliability – measures the reproducibility or consistency of the metric between two different time points using the same rater

Statistics to measure reliability

Kappa statistic or Cohen’s Kappa

Measures inter-rater or intra-rater reliability

0.41 to 0.60 – moderate

0.61 to 0.80 – substantial

0.81 to 1.00 – almost perfect

Cronbach’s Alpha

Measures internal consistency between questions

Acceptable level >= 0.70
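As an illustration of the kappa statistic described above, the following Python sketch computes Cohen's kappa for two raters from first principles (observed agreement corrected for chance agreement). The two coders and their discharge-status labels are hypothetical, and the function name is illustrative rather than taken from the text.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, per label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical audit: two coders assign a discharge status to the same 10 records.
coder_1 = ["home", "home", "snf", "home", "snf", "home", "home", "snf", "home", "home"]
coder_2 = ["home", "home", "snf", "home", "home", "home", "home", "snf", "snf", "home"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

The coders agree on 8 of 10 records, but chance alone would produce 58% agreement, so kappa is about 0.52, which falls in the "moderate" band listed above.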


Types of Healthcare Data

Internal data

Electronic health records

Claims and billing data

Patient satisfaction surveys

External data

Registries (may be both internal/external)

Statewide databases

Medicare claims data


Diagnostic Data

Transitioned from ICD-9-CM to ICD-10-CM on 10/1/2015

Even after the transition, both coding systems are used for data profiling and analysis

ICD was designed as a disease tracking system, but used in the US as a payment driver under prospective payment systems


Diagnostic Data – IPPS

CMS pays for inpatient services provided to Medicare patients via an inpatient prospective payment system (IPPS)

Payment is based on diagnosis related groups (DRG) – ICD-10 diagnosis and procedure codes are combined with other demographic data to ‘group’ patients in the DRGs for determination of payment

DRGs are further grouped into major diagnostic categories (MDCs)

ICD-10 and DRG codes are all updated based on the federal fiscal year starting on October 1.


Diagnostic Data


Procedural Data – ICD-10-PCS


Procedural Data – CPT


Pharmacy Data

National Drug Codes (NDC)

FDA website

Therapeutic Classification Groups

OVID Field Guide

RxNorm

National Library of Medicine


Revenue Codes

Place of Service Codes

Claims Processing Codes

Relative Value Unit Data


Revenue Codes

Four-digit code

Used to categorize charges into ‘departments’ on UB-04 or 837I billing records

NOT necessarily the same department found in provider accounting system

Standard across providers

Allows comparison of departmental charges and costs across providers


Place of Service Codes

Used on professional claims (CMS-1500, formerly HCFA-1500, or 837P) to specify the type of location where the service was performed


Healthcare Data Flow


Claims Data

UB-04 Claim form (CMS-1450)

Hospital services

Submitted via 837I transaction set

5010 format

CMS-1500 Claim Form

Physician services

Submitted via 837P transaction set

5010 format


Departmental Databases

Laboratory Information System (LIS)

May use Logical Observation Identifiers Names and Codes (LOINC)

Images available through Picture Archiving and Communication System (PACS)

Patient Accounts Database

Includes financial data

Charges

Payments

Accounts receivable/accounts payable

Payroll

General ledger

May be called a practice management system in a physician office


Other Internal Data

Registries

Cancer

Trauma

Birth

Diabetes

Implants

Transplants

Immunizations


External Data

Medicare

Inpatient

Outpatient

Part B Utilization (Physician)

State Databases

HCUP


Medicare Claims Data

MedPAR File

All Medicare inpatient claims for a given federal fiscal year (10/1 – 9/30)

Data source for many of the labs that accompany the text

One record for each inpatient stay

Used as the basis for IPPS DRG relative weight changes

Standard Analytic Outpatient File

All Medicare outpatient claims for a given calendar year

Multiple files that must be combined to summarize at the claim level

An extract of this file (HOPPS) is the basis for changes to OPPS APC relative weights

Part B Utilization File

Summary file by calendar year

Includes information by specialty and for top HCPCS codes:

Allowed services (volume)

Allowed charges

Payment amount


CMS Payment Rule Impact Files

Released annually

Inpatient prospective payment (IPPS)

Outpatient prospective payment (OPPS)

Includes data elements that may be used for benchmarking

Hospital Demographics

Urban/rural setting

Region

Ownership

Teaching/non-teaching status

Number of beds

Operational Statistics

Volume

Average daily census

Ratio of cost to charge for cost estimation

Case mix index

Medicare percentage

Payment level (current and projected)


Data.medicare.gov

Central repository for Medicare ‘compare’ databases


State Databases

Utah

Office of Healthcare Statistics

Hospital utilization

Ambulatory surgery center utilization

Query tools to locate specific data

Massachusetts

Massachusetts Community Health Information Profile (MassCHIP)

Standard reports – ‘instant topics’


HCUP

http://hcupnet.ahrq.gov/

Data elements

Statistics on Hospital Stays

Emergency Department Use

AHRQ Quality Indicators

Online query system


HCUP Sample Query


A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 3, Tools for Data Organization, Analysis, and Presentation

Susan White, PhD, RHIA, CHDA


Learning Objectives

Compare and contrast database structures

Categorize types of statistical software

Illustrate commonly used data visualization methods


Data Organization Using Databases

Healthcare data is complex and often multi-dimensional

Provider

Patients

Insurance companies

Services

Providing an organizational structure for the data can facilitate more efficient analysis and reporting

Database – self-describing collection of integrated records.

Self-describing – contains a description of its own structure

Integrated – data elements are related to each other


Database Vocabulary

Tables – two-dimensional arrays of data

Rows = records

Columns = variables or attributes

RDBMS – Relational Database Management System

Software that is used to hold and maintain data tables and their relationships

SQL – Structured Query Language

Programming language used to communicate with a relational database

ERD – Entity Relationship Diagram

Diagram that shows how the tables in a relational database relate to each other


Hierarchy of a Relational Database

Tables are rows and columns of values

Envision a tab in a spreadsheet

Fields are the columns in a spreadsheet

In a patient database, fields may be age, gender, admission date, etc.

Data elements or records are the rows in a spreadsheet

In a patient database, rows may represent patients or services provided to patients

A unique row identifier in a table is called the primary key

Cannot be duplicated within the same table


Data Dictionary

Should include

Name of computer or software program that contains the data element

Type of data in the field

Length of data in the field

Edits placed on the data field

Values allowed to be placed in the data field

A clear definition of each value


Structured Query Language

SQL

Tool to use and maintain databases

Select data

Update data

Insert rows into a table

Delete rows from a table


SQL Example

Retrieve the records for all patients from Milwaukee

SELECT PATIENT_LNAME, PATIENT_FNAME FROM PATIENT WHERE PATIENT_CITY = 'Milwaukee';

Key words in the query: SELECT, FROM, WHERE
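To show the query running end to end, here is a minimal Python sketch using the standard-library `sqlite3` module. The table contents and the `PATIENT_ID` primary-key column are assumptions added for illustration; only the table name and the three columns in the query come from the slide.

```python
import sqlite3

# In-memory database with a hypothetical PATIENT table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PATIENT ("
    "PATIENT_ID INTEGER PRIMARY KEY, "  # assumed unique row identifier
    "PATIENT_LNAME TEXT, PATIENT_FNAME TEXT, PATIENT_CITY TEXT)"
)
conn.executemany(
    "INSERT INTO PATIENT (PATIENT_LNAME, PATIENT_FNAME, PATIENT_CITY) "
    "VALUES (?, ?, ?)",
    [
        ("Rivera", "Ana", "Milwaukee"),
        ("Chen", "Li", "Madison"),
        ("Okafor", "Sam", "Milwaukee"),
    ],
)

# Same query as the slide, with the city supplied as a bound parameter.
rows = conn.execute(
    "SELECT PATIENT_LNAME, PATIENT_FNAME FROM PATIENT WHERE PATIENT_CITY = ?",
    ("Milwaukee",),
).fetchall()
conn.close()

print(rows)
```

Passing the city as a parameter instead of pasting it into the SQL string is a common habit that avoids quoting mistakes; the query itself is unchanged.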


Statistical Software Packages

R

Open source with a large user base on-line

May open Excel files for analysis

Statistical Analysis System (SAS)

Command line program

Excellent for manipulating large datasets

SPSS

Microsoft Excel

Spreadsheet software with extensive statistical functions

Excellent for summarizing data quickly


R

Open source program that may be installed on both Windows- and Mac-based computers

Used to demonstrate examples throughout the text

Many on-line tutorials and video demonstrations of the capabilities

Open source allows the users to expand the functionality of the software

Free to use


SAS Syntax

SAS is a programming language much like SQL

Key words:

Data – used to name and create a dataset

Proc – declare which analytic procedure will be used

Set – declare which dataset will be the subject of the analysis

Run – designates the end of the command and starts the calculation

Syntax: always end commands with a ‘;’


Graphical Displays of Data

Types of graphical comparisons

Group summary

Trends or changes over time

Relative size of groups

Relationships between variables


Bar Graph or Chart

Group summary

Comparison of counts or averages across groups

One bar for each gender


Line Graphs or Chart

Trends or changes over time

Look for trends/patterns

Should not be used for connecting unrelated points


Pie Chart

Compares relative size of groups

Used to represent relative proportions of a total

Note that this is different than a bar chart – in a pie chart categories must be part of a bigger set or population


Scatter Diagrams

Used to display the relationship between two continuous variables

Should not be used if either variable is categorical


Infographics

Conveys a message or story using a combination of graphs and text

Primary types:

Cause and effect

Chronological

Quantitative

Directional

Product


Tables versus Graphs

Tables have several advantages over graphs such as:

Display the exact values

Require less work to create

Graphs also have advantages over tables such as:

Catch the attention of the reader

Show trends easily

Bring out facts or relationships that stimulate thinking


A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 4, Analyzing Categorical Variables

Susan White, PhD, RHIA, CHDA


Learning Objectives

Compare and contrast rates and proportions commonly used in healthcare

Relate the rates and proportions to the appropriate statistical methods

Illustrate commonly used descriptive and inferential statistics


Categorical Variables

Data elements that represent categories

Nominal (no natural order)

Ordinal (ordered)

Healthcare Examples

Gender

Discharge status


Rates and Proportions

Commonly used in healthcare

Mortality rates

Infection rates

Complication rates

Must understand the numerator and denominator of each rate

Numerator – count of subjects that meet the criteria to be measured

Denominator – count of subjects that could meet the criteria to be measured

Example – 30 day readmission rate for COPD

Numerator – patients discharged with a principal diagnosis of COPD and readmitted within 30 days

Denominator – patients discharged with a principal diagnosis of COPD

Be aware of any exclusion criteria

Gender specific rates

Immune-compromised patients


Census Statistics

Inpatient census

Number of patients in the facility at a set point in time

Typically measured at midnight

Daily inpatient census

Number of patients in the facility at a set point in time plus any patients that were both admitted and discharged during that day

Resources are expended to treat patients that are admitted and discharged on the same day

May be a more relevant statistic than inpatient census for monitoring resource consumption

Inpatient service day

Inpatient service days for a particular day are equal to the daily inpatient census for that day

Average daily inpatient census

The number of inpatient service days averaged over a set time period.

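Formula: Average daily inpatient census = total inpatient service days in the period ÷ number of days in the period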


Example 1

The hospital inpatient census at midnight on January 15th is 102. Fifteen patients are admitted, three patients are discharged and one patient is admitted and subsequently dies on January 16th.

What is the inpatient census for January 16th?

102 + 15 – 3 = 114

How many inpatient service days were provided on January 16th?

102 + 15 – 3 + 1 = 115

(Note that the one patient admitted and discharged on the same day is included in the inpatient service days, but not present for the January 16th inpatient census count.)


Example 2

If the number of inpatient service days for the first calendar quarter of 2015 was 9,015, what was the average daily inpatient census for the quarter (round to the nearest 0.1)?

Step 1: Find the number of days in the period. The first calendar quarter is January (31 days), February (28 days), and March (31 days). Number of days = 31 + 28 + 31 = 90

Step 2: Divide the inpatient service days by the number of days in the period. 9,015/90 = 100.2 patients per day
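The arithmetic in Examples 1 and 2 can be checked with a small Python sketch. The function names are illustrative; the inputs are the figures given in the two examples.

```python
def inpatient_census(previous_census, admissions, discharges):
    """Patients present at the census-taking time (typically midnight)."""
    return previous_census + admissions - discharges


def inpatient_service_days(census, same_day_stays):
    """Daily inpatient census: the census plus patients admitted and
    discharged (or deceased) on the same day."""
    return census + same_day_stays


def average_daily_census(service_days, days_in_period):
    """Inpatient service days averaged over the period."""
    return service_days / days_in_period


# Example 1: January 16th
census_jan16 = inpatient_census(102, admissions=15, discharges=3)
service_days_jan16 = inpatient_service_days(census_jan16, same_day_stays=1)

# Example 2: first calendar quarter of 2015 (31 + 28 + 31 days)
adc_q1 = average_daily_census(9015, 31 + 28 + 31)

print(census_jan16, service_days_jan16, round(adc_q1, 1))
```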


Utilization Rates

Cesarean section rate:

Number of c-sections performed divided by number of deliveries

Note the denominator is the number of deliveries (not mothers) and includes both c-section and vaginal births

Inpatient occupancy rate

Inpatient service days divided by the number of bed days in the period

The number of bed days is the number of beds available for each day in the period measured

If the number of beds changes during the period, then that change must be reflected in the number of bed days.

Mortality Rates

Gross mortality rate – number of patients that died divided by the number of patients discharged during the period

Net mortality rate – number of patients that died at least 48 hours after admission divided by the number of patients discharged during the period

The net rate excludes patients that died within 48 hours of admission from the numerator.

Autopsy Rates

Gross autopsy rate – number of autopsies performed divided by the number of patients that died while in the hospital during a period

Net autopsy rate – number of autopsies performed divided by the number of bodies available for autopsy

The net rate excludes bodies that might be taken to the coroner for investigations from the denominator.


Example 3

If the number of inpatient service days for the first calendar quarter of 2015 was 9,015 and the facility had 120 beds available until closing 20 beds on March 1st, what was the inpatient bed occupancy rate for the quarter (round to the nearest 0.1)?

Step 1: Find the number of bed days.

| Month | Beds Available | Days | Bed Days |
|---|---|---|---|
| January | 120 | 31 | 3,720 |
| February | 120 | 28 | 3,360 |
| March | 100 | 31 | 3,100 |
| Total | | | 10,180 |

Step 2: Divide inpatient service days by number of bed days:

9,015/10,180 = 0.886 or 88.6%
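The bed-day calculation in Example 3 can be reproduced with a short Python sketch. The dictionary layout is simply one convenient way to hold the month-by-month bed counts from the example.

```python
# (beds available, days in month) for each month of the first quarter of 2015
bed_schedule = {
    "January": (120, 31),
    "February": (120, 28),
    "March": (100, 31),  # 20 beds closed on March 1st
}

# Bed days: beds available multiplied by days, summed over the period
bed_days = sum(beds * days for beds, days in bed_schedule.values())

inpatient_service_days = 9015
occupancy_rate = inpatient_service_days / bed_days

print(bed_days, f"{occupancy_rate:.1%}")
```

Keeping the bed count per month makes the mid-period change explicit, which is the point of the example: a single bed count for the whole quarter would misstate the denominator.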


Example 4

AMC Hospital discharged 256 patients during November. Ten of those patients died at least 48 hours after admission and 2 died on the same day as they were admitted, for 12 deaths in total. What are the gross and net mortality rates for AMC Hospital (round to the nearest 0.1 of a percent)?

Gross mortality rate

= 12/256 = 0.047 = 4.7%

Net mortality rate

= (12 – 2)/256 = 0.039 = 3.9%


Example 5

AMC Hospital discharged 256 patients during November. Ten of those patients died at least 48 hours after admission and 2 died on the same day as they were admitted, for 12 deaths in total. Four bodies were autopsied. One body was transferred to the coroner’s office for a criminal investigation. What are the gross and net autopsy rates for AMC Hospital (round to the nearest 0.1 of a percent)?

Gross autopsy rate

= 4/12 = 0.333 = 33.3%

Net autopsy rate

= 4/(12-1) = 4/11 = 0.364 = 36.4%
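Examples 4 and 5 can be verified together in a few lines of Python. The variable names are illustrative; the net mortality rate follows the definition used in these slides, which removes early deaths from the numerator only.

```python
discharges = 256       # November discharges, deaths included
deaths_total = 12      # 10 later deaths plus 2 same-day deaths
deaths_under_48h = 2
autopsies = 4
coroner_cases = 1      # bodies not available for a hospital autopsy

gross_mortality = deaths_total / discharges
net_mortality = (deaths_total - deaths_under_48h) / discharges
gross_autopsy = autopsies / deaths_total
net_autopsy = autopsies / (deaths_total - coroner_cases)

for name, rate in [
    ("gross mortality", gross_mortality),
    ("net mortality", net_mortality),
    ("gross autopsy", gross_autopsy),
    ("net autopsy", net_autopsy),
]:
    print(f"{name}: {rate:.1%}")
```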


Census/Utilization Rates


Population Health and Epidemiology Rates

Epidemiology – study of patterns in disease occurrence and spread

Incidence rate –

Number of new cases of a disease divided by the population at risk for acquiring the disease

Prevalence rate –

Number of cases of the disease (both new and existing) divided by the population at risk for acquiring the disease

Point prevalence – prevalence of a disease at a particular point in time

Period prevalence – prevalence of a disease during a time period (month, year, etc.)


Example 6

The Department of Health in Center City is interested in determining the effectiveness of the flu vaccine. They determined that there were 100 new flu cases during the month of January. The population of Center City was 15,000 during that month. What is the incidence rate of flu for Center City in January?

Incidence rate = 100/15,000 = 0.0067, or 6.7 per 1,000


Example 7

The officials in Center City wanted to further study the impact of the flu on the population. There were 54 residents with the flu on January 31st and 10 residents with the flu on January 1st. Use this data and the fact that there were 100 new cases during the month of January to determine the point prevalence for January 31st and the period prevalence for the month of January.

(Note that period prevalence includes anyone with the disease during the period. Since 10 residents were sick with the flu on January 1st, they are included in the numerator of the period prevalence for January.)
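The worked answer for Example 7 did not survive in this text, so the following Python sketch applies the definitions above to the stated figures. It assumes that the Center City population of 15,000 from Example 6 is the population at risk for both examples; the helper name is illustrative.

```python
population = 15_000   # Center City population, taken from Example 6
new_cases = 100       # new flu cases during January
cases_on_jan_1 = 10   # existing cases at the start of the period
cases_on_jan_31 = 54  # cases present on the last day of the period


def per_1000(cases, population_at_risk):
    """Express a rate as cases per 1,000 people at risk."""
    return cases / population_at_risk * 1000


incidence = per_1000(new_cases, population)
point_prevalence = per_1000(cases_on_jan_31, population)
# Period prevalence counts everyone who had the disease during the period:
# cases that already existed on January 1st plus the new cases.
period_prevalence = per_1000(cases_on_jan_1 + new_cases, population)

print(f"incidence: {incidence:.1f} per 1,000")
print(f"point prevalence (Jan 31): {point_prevalence:.1f} per 1,000")
print(f"period prevalence (January): {period_prevalence:.1f} per 1,000")
```

Under that assumption the point prevalence is 3.6 per 1,000 and the period prevalence is about 7.3 per 1,000, slightly above the incidence rate because the 10 existing cases are included.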


Descriptive Statistics: Proportions

Each subject either has or does not have the attribute to be counted (dead/alive, success/failure, yes/no)

Recode each observation as a binary variable (two values):

If attribute is present = 1

If attribute is not present = 0

The mean of the 0s and 1s is the proportion of subjects with the attribute

Simple example:

What proportion of patients are female?

Patient genders: M, M, F, F, M

Recode F = 1; M = 0

Recoded gender data: 0, 0, 1, 1, 0

Mean = 2/5 = 0.4 or 40% of patients are female
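The recoding step above translates directly into Python; this sketch uses the five genders from the slide.

```python
genders = ["M", "M", "F", "F", "M"]

# Recode to a binary variable: 1 if the attribute (female) is present, else 0
is_female = [1 if g == "F" else 0 for g in genders]

# The mean of the 0s and 1s is the proportion with the attribute
proportion_female = sum(is_female) / len(is_female)

print(is_female, proportion_female)
```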


Descriptive Statistics

Frequency distribution

Appropriate for both nominal and ordinal categorical data

Typically the counts and percentages for each category are presented


Charts or Graphs

Since this subset of CPT codes is ordinal, the bar chart is a better representation.

Pie charts are a good choice for nominal data.


Contingency Tables

Used to display and analyze the relationship between two categorical variables

Notice in table below:

20/32 = 62.5% of female patients were discharged home

10/24 = 41.7% of male patients were discharged home

Is this just a random occurrence or is this evidence that there is a significant relationship between gender and being discharged to home?

A hypothesis test may be used to answer that question
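The slide does not say which hypothesis test is meant. One common choice for a 2x2 contingency table is Pearson's chi-square test of independence, sketched below in plain Python as an assumption rather than as the text's own method. The cell counts are derived from the proportions quoted above (20 of 32 females and 10 of 24 males discharged home), and no continuity correction is applied.

```python
import math

# Counts implied by the slide; "other" is every discharge status except home.
observed = {
    "female": {"home": 20, "other": 32 - 20},
    "male": {"home": 10, "other": 24 - 10},
}

row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}
col_totals = {
    status: sum(observed[sex][status] for sex in observed)
    for status in ("home", "other")
}
grand_total = sum(row_totals.values())

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / grand total.
chi_square = 0.0
for sex, cells in observed.items():
    for status, count in cells.items():
        expected = row_totals[sex] * col_totals[status] / grand_total
        chi_square += (count - expected) ** 2 / expected

# With 1 degree of freedom, the upper-tail probability has a closed form.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.3f}")
```

With these counts the statistic is about 2.39 and the p-value is above 0.05, so at a 5% alpha level this sample would not be enough evidence of a relationship between gender and discharge to home.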


Ranks and Percentiles

Ranks and percentile may be used to describe ordinal data

Ranks – the position of a value after the sample is ordered using order of magnitude – usually ascending (increasing) order

Percentile

AKA percentile rank

Points that divide the sample into 100 equal parts

Important percentile ranks:

25th percentile

AKA first quartile

25% of the values in the sample are less than the 25th percentile

50th percentile

AKA median or second quartile

50% of the values in the sample are less than the 50th percentile (50% are also greater)

75th percentile

AKA 3rd quartile

75% of the values in the sample are less than the 75th percentile
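As a quick illustration of the quartiles listed above, this Python sketch uses `statistics.quantiles` from the standard library on a hypothetical set of ordinal severity scores. Different software packages interpolate percentiles slightly differently, so the first and third quartiles may not match another tool exactly.

```python
import statistics

# Hypothetical patient severity scores (ordinal, 1 = least severe)
scores = [3, 5, 1, 4, 2, 5, 3, 2, 4, 3, 5]

# Ranks: position of each value after sorting in ascending order
ranked = sorted(scores)

# n=4 returns the three cut points that split the sample into quarters
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(ranked)
print(f"25th={q1}, 50th (median)={q2}, 75th={q3}")
```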


Inferential Statistics
Hypothesis Testing Basics

Hypothesis test – statistical technique used to determine if the evidence (values) present in a random sample is strong enough to make a conclusion about the population

Null hypothesis (Ho) – status quo, requires no action

Example: Ho for Table 4.3 is that there is no relationship between gender and discharge to home

Alternative hypothesis (H1 or Ha) – complement of the null hypothesis, often referred to as the research hypothesis

Example: H1 for Table 4.3 is that there is a relationship between gender and discharge to home

Data is gathered from a random sample of a population to determine if the null hypothesis can be rejected


Hypothesis Testing Basics

Test statistic –

A statistic that is calculated to determine if the data values support rejecting the null hypothesis

Must be compared to a known probability distribution to determine the probability of making an error in deciding whether or not to reject the null hypothesis.

Type I error – incorrectly rejecting the null hypothesis when it is true

The alpha level or acceptable level of this error is set by the analyst prior to the start of the analysis of the data

The p-value is the smallest alpha level for which the null hypothesis would be rejected

If the p-value is smaller than the pre-set alpha level, then there is sufficient evidence to reject the null hypothesis

Type II error – incorrectly NOT rejecting the null hypothesis when it is false

This error may be controlled by the type of hypothesis test and the sample size used in the study


Hypothesis Testing Steps

Determine the null and alternative hypotheses

Set the acceptable type I error or alpha level

Select the appropriate test statistic

Compare the test statistic to a critical value based on the alpha level and the distribution of the test statistic

Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not reject the null hypothesis.


Inferential Statistics:
Proportions

Used to determine if a population proportion is higher or lower than a standard

May be interested in a one-sided or two-sided hypothesis

Two-sided alternative: important to know if the population of interest is higher or lower than the standard

One-sided alternative: only concerned about higher or lower, not both

Two-sided: Ho: p = standard; Ha: p ≠ standard

One-sided: Ho: p = standard; Ha: p > standard (or Ha: p < standard)

One-sample Z-test for proportions

Reject the null hypothesis (Ho) when Z is extreme, that is, when Z falls in a rejection region in the tail of the distribution

Example: One-sample Z-test for proportions

Follow 5 basic steps in hypothesis testing:

1. Determine the null and alternative hypotheses:

Null hypothesis (Ho): p = 85 percent or 0.85

Alternative hypothesis (Ha): p ≠ 0.85

2. Set the acceptable type I error or alpha level

The company leaders are willing to accept a five percent error rate. Alpha = 0.05

3. Select the appropriate test statistic

Z is the appropriate test statistic

Example: One-sample Z-test for proportions (continued)

4. Compare the test statistic to a critical value based on alpha and the distribution of the test statistic

5. Reject the null hypothesis if the calculated test statistic is more extreme than the critical value. If not, then do not reject the null hypothesis

Since Ha is a two sided alternative (≠) we select the critical value associated with the alpha level divided by two. We want to protect against both higher and lower alternatives. We reject Ho if Z > 1.960 or Z < −1.960. Z = −1.36 is not less than −1.960, therefore do not reject the null hypothesis

Conclusion: The observed 70% rate from the sample of 10 employees is not sufficient evidence to reject the null hypothesis
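
The five steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual one-sample statistic Z = (sample proportion − p0) / √(p0(1 − p0)/n); the function name `one_sample_z` is hypothetical. Without intermediate rounding the statistic comes out near −1.33; the −1.36 quoted above is consistent with rounding the standard error to 0.11 before dividing. The decision is the same either way.

```python
import math

def one_sample_z(p_hat, p0, n):
    """Z statistic for testing a sample proportion against a standard p0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

# Values from the example: 70% observed in a sample of 10, standard of 85%
z = one_sample_z(p_hat=0.70, p0=0.85, n=10)
critical_value = 1.960  # two-sided test at alpha = 0.05
reject_null = abs(z) > critical_value
print(f"Z = {z:.2f}, reject Ho: {reject_null}")
```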

Example: One-sample Z-test for proportions (continued)

Confidence Interval for Proportions

Confidence interval: a range of values based on a sample that contains a population value with a set level of confidence.

Common in political surveys

President’s approval rating is 60% +/- 5%

AKA margin of error (+/-5%)

The width of a confidence interval is a function of the proportion value and the sample size

Widest confidence interval (large margin of error) when p = 50%

Larger sample size results in a narrower confidence interval for a proportion

The critical value is 1.96 for a 95% confidence interval

Example: Confidence Interval for Proportions

Conclusion: Based on the sample, we are 95% confident that the range from 42% to 98% covers the true vaccination rate.
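
The interval in this conclusion can be reproduced with a short calculation. This is a sketch, assuming the interval is the sample proportion ± 1.96 × √(p(1 − p)/n) and that it was built from the same sample as the preceding Z-test example (70% observed in a sample of 10); the function name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Confidence interval for a proportion; z = 1.96 gives 95% confidence."""
    margin_of_error = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error

# Assumed inputs: 70% observed rate, sample of 10
lower, upper = proportion_ci(p_hat=0.70, n=10)
print(f"95% CI: {lower:.0%} to {upper:.0%}")
```

Rerunning the function with a larger n and the same proportion shows the narrowing effect of sample size described on the previous slide.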

Two-sample Z-test for Proportions

Used to determine if the proportion for a particular attribute is higher or lower when comparing two populations

Is the mortality rate at Hospital A higher or lower than that in Hospital B?

May be a one-sided or two-sided test depending on the desire to determine which population is higher or lower

Two-sample Z-test for Proportions

If Z > standard normal critical value at α/2, or Z < -(standard normal critical value at α/2), then reject Ho

Difference in sample proportions divided by standard error

Overall proportion when two samples are pooled together
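
The pieces named above (difference in sample proportions, standard error, pooled proportion) fit together as in the following sketch. It assumes the common pooled form of the statistic, Z = (p1 − p2) / √(p(1 − p)(1/n1 + 1/n2)), where p is the pooled proportion. The counts are hypothetical and are not taken from the MS-DRG example; `two_sample_z` is a hypothetical name.

```python
import math

def two_sample_z(x1, n1, x2, n2):
    """Z statistic for comparing two proportions using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall proportion when samples are pooled
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# Hypothetical counts: 12 deaths in 200 cases versus 6 deaths in 200 cases
z = two_sample_z(x1=12, n1=200, x2=6, n2=200)
reject_null = abs(z) > 1.960  # two-sided test at alpha = 0.05
print(f"Z = {z:.2f}, reject Ho: {reject_null}")
```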

Example: Two-sample Z-test for Proportions

Is the mortality rate for MS-DRG 292 (p1) different from that in MS-DRG 293 (p2)?

1. Determine the hypotheses:

Ho: p1 = p2 ; Ha: p1 ≠ p2

2. Set the alpha level = 0.05

3. Select the appropriate test statistic: Z-test

Example: Two-sample Z-test for Proportions

A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 5, Analyzing Continuous Variables

Susan White, PhD, RHIA, CHDA

Learning Objectives

Compare and contrast commonly used measures of central tendency

Compare and contrast commonly used measures of variation or spread of values

Illustrate appropriate inferential statistics to use for continuous data

Continuous Variables

Data elements that represent naturally numeric values that can take infinite values

Interval (no true zero)

Ratio

Healthcare Examples

Length of stay

Charge

Systolic blood pressure

Age

Time to code records

Descriptive Statistics
Measures of Central Tendency

Mean

Arithmetic average

Sum of values divided by the number of values

Median

Middle value

If even number of values, average of two middle values

Less influenced by extreme values or outliers than the mean

Mode

Most frequent value

Descriptive Statistics
Measures of Variation

Range

Maximum value minus minimum value

Interquartile range

Difference between the third and first quartile

Variance

Average squared deviation from the mean

Unit of measure is “squared units”

Standard deviation

Square root of the variance

Unit of measure is same as unit of measure in sample

Descriptive Statistics
Example

Calculate the mean, median and mode of the following sample length of stay data:

2, 4, 6, 3, 1, 2, 5

Mean: (2 + 4 + 6 + 3 + 1 + 2 + 5) / 7 = 23 / 7 ≈ 3.3

Median

Sort values: 1, 2, 2, 3, 4, 5, 6

Median or middle value = 3

Mode

2, since it is the most frequent value

Note: The mode is rarely used for continuous variables that have many unique values and is presented here for demonstration purposes.
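
The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch using the length of stay values from this example; the variable names are illustrative.

```python
import statistics

los = [2, 4, 6, 3, 1, 2, 5]  # sample length of stay data from the example

mean_los = statistics.mean(los)      # sum of values divided by number of values
median_los = statistics.median(los)  # middle value after sorting
mode_los = statistics.mode(los)      # most frequent value

print(f"mean = {mean_los:.1f}, median = {median_los}, mode = {mode_los}")
```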

Descriptive Statistics
Example

Calculate the range, variance and standard deviation of the following sample length of stay data:

2, 4, 6, 3, 1, 2, 5

Range = 6 – 1 = 5

Sample variance

s² = Σ(x − x̄)² / (n − 1) = 19.43 / 6 ≈ 3.2

Standard deviation

s = √3.2 ≈ 1.8
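
The same measures of variation can be computed with the standard library. This sketch uses the sample formulas (dividing by n − 1), which is what `statistics.variance` and `statistics.stdev` implement; variable names are illustrative.

```python
import statistics

los = [2, 4, 6, 3, 1, 2, 5]  # sample length of stay data from the example

value_range = max(los) - min(los)          # maximum minus minimum
sample_variance = statistics.variance(los)  # divides by n - 1
sample_sd = statistics.stdev(los)           # square root of the sample variance

print(f"range = {value_range}, variance = {sample_variance:.1f}, sd = {sample_sd:.1f}")
```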

Review: Hypothesis Testing Steps

Determine the null and alternative hypotheses

Set the acceptable type I error or alpha level

Select the appropriate test statistic

Compare the test statistic to a critical value based on the alpha level and the distribution of the test statistic

Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not reject the null hypothesis.

Inferential Statistics
One Sample t-test

One-sample t-test

Used to test if a population value is different from a standard or benchmark

Test statistic: t = (x̄ − µ0) / (s / √n), where µ0 is the null hypothesis value

Compare to a t-distribution to determine critical value

May be one sided or two sided

Anatomy of test statistic:

Numerator: distance from sample mean to null hypothesis value

Denominator: standard error of the sample mean (SEM)

Inferential Statistics
One Sample t-test – Example

Suppose the researcher that collected the length of stay (LOS) data in the previous examples would like to determine if the population LOS is longer than a standard of 3 days.

Step 1: Determine the null and alternative hypotheses

Ho: µ ≤ 3

Ha: µ > 3

Step 2: Set the acceptable type 1 error rate (AKA alpha level).

Set α = 0.05

Step 3: Select the appropriate test statistic: t-test

Inferential Statistics
One Sample t-test -Example

Step 3 (con’t)

Recall from previous slides:

x̄ = 3.3

s = 1.8

n = 7

t = (x̄ − µ0) / (s / √n) = (3.3 − 3) / (1.8 / √7) = 0.44

Step 4: Compare test statistic to critical value.

T-test statistic critical value comes from the t-distribution with n-1 degrees of freedom

T-distribution is symmetric around zero much like standard normal (bell curve); width is defined by the degrees of freedom. (see Figure 5.1 in text)

Inferential Statistics
One Sample t-test – Example

Step 4 (con’t): t= 0.44; df = n – 1 = 7 -1 = 6, one sided test at α=0.05, critical value = 1.943

Step 5: Reject the null hypothesis if the test statistic is more extreme than the critical value. 0.44 is not greater than 1.943, so do not reject the null hypothesis; there is not sufficient evidence to conclude that the LOS is longer than the standard
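
The worked example can be reproduced from the raw length of stay values. This is a sketch, assuming the usual one-sample statistic t = (sample mean − standard) / (s / √n); the critical value 1.943 is taken from the slide rather than computed. Without rounding the mean and standard deviation first, t is about 0.42 instead of 0.44, and the decision is unchanged.

```python
import math
import statistics

los = [2, 4, 6, 3, 1, 2, 5]  # length of stay sample from the earlier example
standard = 3                 # null hypothesis value in days

n = len(los)
standard_error = statistics.stdev(los) / math.sqrt(n)  # SEM
t_stat = (statistics.mean(los) - standard) / standard_error

critical_value = 1.943  # one-sided, alpha = 0.05, df = n - 1 = 6 (from the slide)
reject_null = t_stat > critical_value
print(f"t = {t_stat:.2f}, reject Ho: {reject_null}")
```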

Inferential Statistics
Confidence Interval for Population Mean

Recall that a confidence interval is a range of values that is likely to cover the true population value with a pre-defined probability or level of confidence

A 100(1−α)% confidence interval for the population mean is centered at the sample mean and has a width that is dependent on the confidence level and standard error of the mean

Higher level of confidence requires a wider interval

Large sample size results in a narrower interval

Width of confidence interval is a measure of the precision of the estimate of the population mean

A narrower interval is more precise

Inferential Statistics
Confidence Interval for Population Mean

Formulate a 95% confidence interval for the LOS data presented in the previous example:

s = 1.8

n = 7

95% CI, so α = 0.05; α/2 = 0.025

df = 6

Critical value (table 5.1) = 2.447

95% CI: x̄ ± critical value × s / √n = 3.3 ± 2.447 × (1.8 / √7) = 3.3 ± 1.7

(1.6, 5.0)

We are 95% confident that the range from 1.6 to 5.0 days includes the true population mean LOS
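
The interval can be recomputed from the raw data as a check. This sketch assumes the interval is the sample mean ± critical value × s / √n and takes the critical value 2.447 from the slide; variable names are illustrative.

```python
import math
import statistics

los = [2, 4, 6, 3, 1, 2, 5]  # length of stay sample from the earlier example
critical_value = 2.447       # t-distribution, df = 6, alpha/2 = 0.025 (from the slide)

sample_mean = statistics.mean(los)
standard_error = statistics.stdev(los) / math.sqrt(len(los))
margin_of_error = critical_value * standard_error

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(f"margin = {margin_of_error:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```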

Inferential Statistics
Paired t-test

Paired t-test

Used to compare pre/post test population values or matched pairs

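Test statistic: t = (d̄ − D0) / (s_d / √n)

Where d = difference between the pre/post values or the pairs, d̄ = mean of the differences, s_d = standard deviation of the differences, and D0 = null hypothesis value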

Compare to a t-distribution to determine critical value

May be one sided or two sided

Anatomy of test statistic:

Numerator: distance from sample mean difference to null hypothesis value (usually zero)

Denominator: standard error of the sample mean difference (SEM)

Inferential Statistics
Paired t-test – Example

The transition from ICD-9 to ICD-10 is predicted to cause an increase in the amount of time required to code medical records. A pilot study was conducted using a random sample of 10 records to determine if the time required was significantly different. Each record was coded using the two coding systems by one coder. The values are recorded in the table.

Step 1: Determine the null and alternative hypotheses:

Ho: D = 0

Ha: D ≠ 0

Step 2: Set the alpha level: 0.01

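| ID | ICD-9 Time | ICD-10 Time | d |
|---|---|---|---|
| 1 | 10 | 15 | 5 |
| 2 | 11 | 12 | 1 |
| 3 | 15 | 10 | -5 |
| 4 | 30 | 36 | 6 |
| 5 | 5 | 7 | 2 |
| 6 | 10 | 13 | 3 |
| 7 | 8 | 5 | -3 |
| 8 | 11 | 19 | 8 |
| 9 | 21 | 19 | -2 |
| 10 | 18 | 23 | 5 |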

Inferential Statistics
Paired t-test – Example

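Step 3: Select the appropriate test statistic: paired t-test

Step 4: Compare the test statistic to the critical value

t = d̄ / (s_d / √n) = 2.0 / (4.24 / √10) = 1.49

Compare to the t-distribution with df = 9, α/2 = 0.005: critical value = 3.25

1.49 is not greater than 3.25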

Step 5: Do not reject Ho
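
The paired calculation can be reproduced from the ten records in the table. This is a sketch, assuming the statistic is the mean difference divided by its standard error with a null value of zero; the critical value 3.25 is taken from the slide.

```python
import math
import statistics

icd9_time = [10, 11, 15, 30, 5, 10, 8, 11, 21, 18]
icd10_time = [15, 12, 10, 36, 7, 13, 5, 19, 19, 23]

# d = ICD-10 time minus ICD-9 time for each record
differences = [new - old for old, new in zip(icd9_time, icd10_time)]

mean_difference = statistics.mean(differences)
standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
t_stat = mean_difference / standard_error

critical_value = 3.25  # two-sided, alpha = 0.01, df = 9 (from the slide)
reject_null = abs(t_stat) > critical_value
print(f"mean d = {mean_difference}, t = {t_stat:.2f}, reject Ho: {reject_null}")
```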

Inferential Statistics
Two Sample t-test

Used to test if two population means are different

Test statistic is complex

Denominator is standard error pooled across the two samples

Use statistical software to calculate

Compare to a t-distribution to determine critical value

May be one sided or two sided

Anatomy of test statistic:

Numerator: distance between the two sample means

Denominator: pooled standard error of the difference between the two sample means

Inferential Statistics
Two Sample t-test – Example

An analyst wanted to find out if the charges for CHF patients admitted through the emergency department (ED) are different from those admitted through other sources. The data for the sample may be found in the Chapter 5 Data file. The summary statistics from the samples appear in the table below:

Step 1: State hypotheses:

Ho: µ1= µ2

Ha: µ1 ≠ µ2

Step 2: Set the alpha level = 0.05

Step 3: Determine the test statistic: T-test

Inferential Statistics
Two Sample t-test – Results from R

Step 4: Compare the test statistic to the critical value

t = 2.2363 with a p-value = 0.03189

Step 5: Reject the null hypothesis if the test statistic is more extreme than the critical value or the p-value is less than alpha

Reject the null hypothesis (0.03 < 0.05) and conclude that the charges for patients admitted through the emergency department differ from those for patients admitted through other sources

Inferential Statistics
ANOVA

Used to test if more than two population means are different

Test statistic: F-test

Best to use software to compute

Compare to an F-distribution to determine critical value

Anatomy of test statistic:

Numerator: variance between comparison groups

Denominator: variance within comparison groups

Inferential Statistics
ANOVA

ANOVA table columns: Sum of Squares, Degrees of Freedom, Mean Squares, Test statistic (F)
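
The columns of the ANOVA table relate to one another as in the sketch below. It assumes the standard one-way ANOVA definitions (mean square = sum of squares divided by degrees of freedom, F = mean square between groups divided by mean square within groups). The three small groups are hypothetical values chosen for easy arithmetic, not data from the text, and `anova_f` is a hypothetical name.

```python
import statistics

def anova_f(groups):
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    mean_square_between = ss_between / df_between
    mean_square_within = ss_within / df_within
    return mean_square_between / mean_square_within

# Hypothetical charges (in thousands of dollars) for three comparison groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = anova_f(groups)
print(f"F = {f_stat:.2f}")
```

In practice the F statistic is compared to an F-distribution with (groups − 1, observations − groups) degrees of freedom, which statistical software reports as a p-value.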

Inferential Statistics
ANOVA – Example

The Medicare severity-adjusted diagnosis-related group (MS-DRG) system is designed so that the level of resources required to treat a patient, as measured by charges per patient, differs across the levels of an MS-DRG family: no complication or comorbidity, complication or comorbidity (CC), and major complication or comorbidity (MCC). An analyst was asked to test whether that relationship held at her facility. A sample of 80 cases was selected for the three congestive heart failure MS-DRGs: 291 (MCC), 292 (CC), and 293 (no CC or MCC). Since three populations of patients are compared, the analyst used R to generate summary statistics and the ANOVA table below.

Inferential Statistics
ANOVA – Example

Step 1: State the hypotheses

Ho: µ291= µ292= µ293

Ha: At least two of the population means are unequal

Step 2: Set the acceptable error level: α=0.05

Step 3: Determine the appropriate test statistic: F-test

Step 4: Compare p-value to α=0.05

Step 5: Conclude to reject Ho since p < 0.0001 < 0.05

A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 6, Analyzing the Relationship between Two Variables

Susan White, PhD, RHIA, CHDA

Learning Objectives

Illustrate appropriate inferential statistics to use for assessing the relationship between two categorical variables

Compare and contrast sensitivity and specificity

Compare and contrast the two types of correlation statistics

Categorical Variables

Descriptive Statistics

Contingency tables

Used to display and analyze the relationship between two categorical variables

Notice in table below:

20/32 = 62.5% of female patients were discharged home

10/24 = 41.7% of male patients were discharged home

Inferential Statistics

Is this just a random occurrence or is this evidence that there is a significant relationship between gender and being discharged to home?

A hypothesis test may be used to answer that question

Example: Chi-squared Test of Independence

| Step | Response |
|---|---|
| 1. Determine the null and alternative hypotheses | Ho: Discharged to Home and Gender are independent; H1: Discharged to Home and Gender are not independent |
| 2. Set the acceptable type I error or alpha level | The analyst is willing to accept a 5% chance or probability of rejecting the null hypothesis when it is true. Alpha = 5% or 0.05 |
| 3. Select the appropriate test statistic | Chi-squared |

Example: Chi-squared Test of Independence

Test statistics typically compare the value observed in the sample to the null hypothesis value.

If gender and discharged home were independent, then we would expect the distribution of subjects among the four cells (Male/female x home/not home) to be uniform and not have a pattern.

In other words, the proportion of males sent home should be similar to the proportion of the females sent home if the null hypothesis were indeed true.

The basis of the chi-squared test statistic is the observed and expected frequencies in each of the table cells

Example: Chi-squared Test of Independence

Example: Chi-squared Test of Independence

Test statistic: χ² = Σ (observed − expected)² / expected, summed over all cells of the contingency table

Example: Chi-squared Test of Independence

Last two steps in hypothesis test:

Compare the test statistic to a critical value based on the alpha level and the distribution of the test statistic

Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not reject the null hypothesis.

Chi-squared test statistic follows the Chi-squared distribution with (r-1)x(c-1) degrees of freedom. r = rows in contingency table and c = columns

Chi-squared distribution is always non-negative

Degrees of freedom define the shape

Since alpha was set to be 0.05 (5%), reject H0 if the test statistic is greater than 3.841

χ² = 2.39, which is not greater than 3.841

Do not reject H0

Conclusion: The sample data do not provide sufficient evidence to reject H0, so we cannot conclude that there is a significant relationship between gender and the likelihood of being discharged to the home setting
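
The test statistic can be reproduced from the contingency table counts. This is a sketch, assuming the standard formula (sum over cells of (observed − expected)² / expected, with expected = row total × column total / grand total). The "not home" counts of 12 and 14 are derived from the totals quoted earlier (20 of 32 female and 10 of 24 male patients discharged home); the critical value 3.841 is taken from the slide.

```python
observed = [
    [20, 12],  # female: discharged home, not discharged home
    [10, 14],  # male: discharged home, not discharged home
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * column_totals[j] / grand_total
        chi_squared += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # alpha = 0.05, 1 degree of freedom (from the slide)
reject_null = chi_squared > critical_value
print(f"chi-squared = {chi_squared:.2f}, df = {degrees_of_freedom}, reject H0: {reject_null}")
```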

Sensitivity and Specificity

Measures the accuracy of predictions made by categorical variables

When using one categorical variable (smoking status) to predict another categorical variable (cancer status)

Sensitivity – the number in the sample with the indicator present and a positive test divided by the number of those with the indicator present.

Specificity – the number in the sample without the indicator and a negative test divided by the number of those without the indicator

Sensitivity/Specificity Example

A health plan wishes to use accessing their patient portal as a predictor of whether or not a patient will seek care at an emergency room during the year. That is, they believe that patients that do not access the patient portal are more likely to experience an ER visit. They collected the following data based on enrollees during the previous plan year. Calculate the sensitivity and specificity of patient portal use as a predictor of ER use.

Note that the contingency table is set up so that ‘no’ for patient portal access and ‘yes’ for ER visit are in cell ‘A’ (upper left hand corner). This is because the health plan believes that patients that do not use the patient portal are MORE likely to experience an ER visit.

| Patient Portal Access? | ER Visit During Previous Year: Yes | ER Visit During Previous Year: No |
|---|---|---|
| No | 30 | 23 |
| Yes | 15 | 86 |

Sensitivity/Specificity Example

| Patient Portal Access? | ER Visit During Previous Year: Yes | ER Visit During Previous Year: No |
|---|---|---|
| No | A: 30 | B: 23 |
| Yes | C: 15 | D: 86 |
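
Using the cell labels A through D, the two measures can be computed as follows. This is a sketch, assuming the conventional definitions: sensitivity = A / (A + C), the share of enrollees with an ER visit who did not access the portal, and specificity = D / (B + D), the share of enrollees without an ER visit who did access the portal.

```python
# Cell counts from the contingency table
a = 30  # no portal access, ER visit (predictor positive, outcome present)
b = 23  # no portal access, no ER visit
c = 15  # portal access, ER visit
d = 86  # portal access, no ER visit (predictor negative, outcome absent)

sensitivity = a / (a + c)
specificity = d / (b + d)
print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
```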

Descriptive Statistics – Correlation

Pearson’s correlation coefficient (r)

Measures the linear association between two continuous variables

Spearman’s Rho (ρ)

Measures the linear association between two ordinal variables or one ordinal and one continuous variable

Correlation between two variables does not imply causation – only that the two have a relationship or are ‘associated’

Be aware that correlation measures the linear association of two variables

They may be related in a non-linear way that may result in misleading values for the correlation coefficients

Descriptive Statistics –
Pearson’s Correlation Coefficient

Used for measuring the linear association between two continuous variables

Values from -1 to +1

Positive value means that both variables increase/decrease together

Example: Charges and length of stay

Negative value means that one variable increases as the other decreases

Example: Experience and time to code a medical record

Descriptive Statistics –
Pearson’s Correlation Coefficient

Example of negative correlation

More experienced coders require less time to code records – in general

Descriptive Statistics –
Pearson’s Correlation Coefficient

Example of positive correlation

Longer lengths of stay result in higher charges – in general

Descriptive Statistics –
Pearson’s Correlation Coefficient Example

Descriptive Statistics –
Spearman’s Rho Correlation Coefficient

Used for measuring the linear association between two ordinal variables or an ordinal and continuous variable

Operates on the ranks for the paired values and not the actual variable values

Typically rank ties are broken with average ranks

Values from -1 to +1

Positive value means that both variables increase/decrease together

Example: patient severity level and charges

Negative value means that one variable increases as the other decreases

Example: Grade in elementary school and time to run 100 yards

Same formula as Pearson’s r, but use ranks instead of actual values

If there are no ties in the ranks, may use ρ = 1 − 6ΣDi² / (n(n² − 1)), where Di is the difference between the ranks of the ith pair of variables and n is the sample size
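
The no-ties shortcut can be sketched as follows, assuming the standard formula ρ = 1 − 6ΣDi² / (n(n² − 1)). The severity levels and charges are hypothetical values with no tied ranks, and the helper names are hypothetical.

```python
def ranks_without_ties(values):
    """Rank values from 1 (smallest) to n; valid only when no values are tied."""
    rank_of = {value: rank for rank, value in enumerate(sorted(values), start=1)}
    return [rank_of[value] for value in values]

def spearman_rho_no_ties(x, y):
    n = len(x)
    rank_differences = [rx - ry for rx, ry in zip(ranks_without_ties(x), ranks_without_ties(y))]
    sum_squared_differences = sum(d ** 2 for d in rank_differences)
    return 1 - 6 * sum_squared_differences / (n * (n ** 2 - 1))

# Hypothetical data: patient severity level and charges
severity = [1, 2, 3, 4, 5]
charges = [12000, 9000, 15000, 21000, 18000]
rho = spearman_rho_no_ties(severity, charges)
print(f"rho = {rho:.2f}")
```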

Inferential Statistics –
T-test for correlations

Used to test the null hypothesis that the correlation coefficient is zero

Same formula for both Pearson’s and Spearman’s correlation coefficients

Note that the sample size is in the numerator of the test statistic

For very large samples, the test may reject the hypothesis of 0 correlation when the value of the sample correlation is not practically significant

Inferential Statistics –
T-test for correlations – Example

Test the hypothesis that the correlation between length of stay and charges in the previous example is different from zero.

Step 1: State the null and alternative hypotheses

Ho: r ≤ 0

Ha: r > 0

Note: In practice, a one sided test of significance is used for r. If the sample value is > 0, then the alternative hypothesis is ‘>0’. If the sample value is negative, then the alternative hypothesis is ‘<0’.

Step 2: Set the acceptable alpha level = 0.05

Inferential Statistics –
T-test for correlations – Example

Step 3: Determine the test statistic and calculate the value

T-test for correlations

= 0.93

Step 4: Compare the test statistic to the critical value

The critical value from the t-distribution with d.f. = n − 2 = 3 and alpha = 0.05 is 2.353

t = 4.71 > 2.353

Step 5: Reject the null hypothesis since 4.71 > 2.353 and conclude that the correlation between LOS and charge is not zero

Inferential Statistics
Simple Linear Regression

Used to formulate a functional relationship between two continuous variables

A linear function of the independent variable (X) is estimated to predict values of the dependent variable (Y)

Slope-intercept form of a line:

Y = a + bX

a is the y-intercept

b is the slope of the line

If variables are positively correlated, the slope of the line is positive

If variables are negatively correlated, the slope of the line is negative

Inferential Statistics
Simple Linear Regression – Example

Least squares regression

Minimizes the sum of the squared vertical distances from the points to the line

Vertical distance called the ‘error’ or ‘residual’

Least square line provides a line that comes as close as possible to all points, but may not actually intersect with any of them

Inferential Statistics
Simple Linear Regression – Example

Slope of line is 4,443

Interpretation: The expected charge increase for each additional day is \$4,443

Intercept of line is \$7,801

Interpretation: The expected charge with a zero day stay is \$7,801

Zero stay is not realistic, but intercept gives an estimate of the fixed cost of admitting a patient while the slope represents the variable cost.
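
As a small illustration of how the fitted line is used, the sketch below plugs a length of stay into Y = a + bX with the intercept and slope quoted on this slide. The function name `expected_charge` is hypothetical, and the prediction is only as good as the fitted model.

```python
INTERCEPT = 7801  # a: expected charge at a zero-day stay (fixed cost estimate)
SLOPE = 4443      # b: expected charge increase per additional day (variable cost)

def expected_charge(length_of_stay_days):
    """Predicted charge from the fitted line Y = a + bX."""
    return INTERCEPT + SLOPE * length_of_stay_days

three_day_charge = expected_charge(3)
print(f"Expected charge for a 3-day stay: {three_day_charge:,}")
```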

Regression Hypothesis Tests

Two hypothesis tests are presented in this table

Ho: Intercept = 0 vs H1: Intercept ≠ 0

P-value = 0.121: do not reject Ho

Even though the intercept is not statistically different from zero (do not reject the null hypothesis that it is equal to zero), the intercept is typically kept in the model

Ho: Slope = 0 vs H1: Slope ≠ 0

P-value = 0.021: reject Ho and conclude that the slope is not equal to zero

The interpretation here is that LOS gives us useful information about the charge since the slope of the regression line is non-zero

Regression Assumptions

Residuals

Difference between the actual value of the dependent variable and the value predicted using the regression equation

The vertical (y-axis) distance from an individual point to the regression line

Must test the following assumptions regarding the residuals:

Independence

Normally distributed

Mean of zero

A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 7, Study Design and Sample Selection

Susan White, PhD, RHIA, CHDA

Learning Objectives

Compare and contrast types of studies

Compare and contrast the four most common sampling techniques

Determine the appropriate sample size for attribute and variable studies

Types of Studies – Descriptive

Descriptive studies – performed to generate hypotheses for more formal studies

Cross-sectional study – describes the characteristics of a population at a specific point in time

Often used for prevalence studies

Applied descriptive studies

Data mining

Exploratory data analysis

Types of Studies – Analytic

Analytic studies – more formal studies designed to test specific hypotheses

Case-control study – involves both a case group (subjects with the attribute under investigation) and a control group (those without the attribute)

Members of the case and control groups are often matched based on demographics

Typically a retrospective study

May not be used to determine cause and effect; can calculate odds ratio

Weakness – dependent on subjects’ ability to recall events

Cohort studies – involve case and control groups, but the groups are identified before the study is performed

Prospective study

May not be used to determine cause and effect; can calculate relative risk

May take a long time to complete

Not useful if the attribute studied is rare

Types of Studies – Experimental

Allow the determination of a cause and effect relationship between variables

Randomized Controlled Trials (RCT)

Used to determine the effectiveness of new drugs/treatment protocols

Blinded studies

Single blind – subject does not know if they are assigned to the case or control group

Double blind – neither the subject nor the researcher knows if the subject is assigned to the case or control group

Triple blind – subject, researcher and analysts are all blinded as to the group assignment of the subject

Why select a sample?

Often population is too large to collect data from every unit of analysis or subject

Statistical inference is used to make conclusions about a population based on a sample

Vocabulary:

Population or universe – all subjects that are under study and eligible to be sampled

Sample – selected subset of the population

Sampling frame – A listing of all of the subjects in the population

Variable of interest – Quantity to be estimated (denial rate, coding error rate, overpayment, underpayment, etc)

Statistically Valid Sample

Large enough to provide information with sufficient precision to meet the goals of the analysis

Probability sample where each item has an equal chance of being selected

Must be reproducible

Defining the Variable of Interest

What is the percent of lab orders that are not signed by a physician during 2012?

Universe – all lab orders during 2012

What is the amount over/under paid due to incorrect E/M level assignment during January?

Universe –

E/M services billed during January

E/M services provided during January

Must refine question to determine if billed date or service date should be used for defining the universe

What is the coding accuracy rate for secondary diagnosis codes on inpatient accounts during the first quarter?

Universe –

All secondary diagnoses coded during first quarter

All inpatient accounts during first quarter

Must refine question to determine if diagnosis codes or charts are the unit of analysis

Simple Random Sampling

It is the statistical equivalent of drawing sampling units from a hat.

Each sampling unit (claim, chart, etc.) must have the same probability of selection.

Note that some random number generators will allow the user to set a ‘seed’. If that feature is available, the analyst should always set a seed. This will ensure that the sample can be replicated.

A simple random sample is not appropriate if the frame cannot be listed or if it is important that the sample contain particular (rare) subsets of the population.

Random Number Generators

All random number generators are based on mathematical functions that need a ‘seed’ or starting point

The use of a seed ensures that two independent samples drawn using the same software and the same seed will result in the same series of random numbers and a reproducible sample

Using software to generate random numbers:

RAND() in Excel does not allow a seed

Random Number Generation in R does allow a seed

Simple Random Sampling
Steps

Method 1:

The members of the sampling frame should be assigned a random number between 0 and 1

The frame may then be sorted by the random number

The first ‘n’ will be the simple random sample of size ‘n’

Method 2:

Assign a sequence number from 1 to ‘N’ to each member of the sampling frame (N is the population size)

Use a random number generator (e.g., R) to select ‘n’ random numbers from 1 to ‘N’
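
Both methods can be sketched with Python's standard `random` module, which accepts a seed so the sample can be replicated. The sampling frame, the seed value, and the function names are hypothetical; the slides use R, and Python is swapped in here only for illustration.

```python
import random

def sample_by_sorting(frame, n, seed):
    """Method 1: give each member a random number in [0, 1), sort, keep the first n."""
    rng = random.Random(seed)
    keyed = [(rng.random(), member) for member in frame]
    return [member for _, member in sorted(keyed)][:n]

def sample_by_sequence_number(frame, n, seed):
    """Method 2: draw n distinct sequence numbers from 1..N and look up the members."""
    rng = random.Random(seed)
    sequence_numbers = rng.sample(range(1, len(frame) + 1), n)
    return [frame[number - 1] for number in sequence_numbers]

# Hypothetical sampling frame of 100 claims
frame = [f"claim_{i}" for i in range(1, 101)]
first_draw = sample_by_sequence_number(frame, n=10, seed=2024)
second_draw = sample_by_sequence_number(frame, n=10, seed=2024)
print(first_draw == second_draw)  # the same seed reproduces the same sample
```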

Systematic Random Sampling

A systematic random sample is a random sample that is selected using a particular technique. If the population includes ‘N’ members and we wish to draw a sample of size ‘n’, then a systematic random sample could be selected by choosing every N/nth member of the population as the sample.

The selection should start at random from a member between the 1st and N/nth member.

NOTE: If N/n is not a whole number, then round down to the next lower whole number to determine the sampling interval.

In order to ensure that a systematic random sample is truly random, the population should not be sorted in an order that might bias the sample.
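
The selection rule described above can be sketched as follows: compute the interval N/n rounded down, pick a random start within the first interval, then take every interval-th member. The frame, seed, and function name are hypothetical. Because rounding the interval down can yield a few extra positions, the sketch keeps only the first n.

```python
import random

def systematic_sample(frame, n, seed):
    """Select every (N // n)-th member after a random start in the first interval."""
    interval = len(frame) // n             # N/n rounded down to a whole number
    rng = random.Random(seed)              # seed makes the sample reproducible
    start = rng.randint(1, interval)       # random start between the 1st and (N/n)th member
    positions = range(start, len(frame) + 1, interval)
    return [frame[position - 1] for position in positions][:n]

# Hypothetical frame of 1,000 sequence numbers, sample of 90
frame = list(range(1, 1001))
sample = systematic_sample(frame, n=90, seed=7)
print(len(sample), sample[:3])
```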

Stratified Random Sampling

Population is divided into unique subsets or strata

Strata should be mutually exclusive and exhaustive. In other words, each of the members of the population should be in one and only one stratum.

A simple random sample is then selected from each of the strata

The size of the sample in each stratum may be equal or may be assigned proportionally according to the relative size of each stratum

Stratified sampling is appropriate when the quantity to be estimated may vary among natural subgroups (strata) of the population

Typical strata in healthcare may be:

CPT® Code (E/M levels)

Physician

Specialty

Clinic

Stratified Random Sampling
Example

Example: An analyst wishes to select a stratified random sample of 90 from a population of 1,000 E/M visits. The distribution of E/M visits in the population is:

Level 1: 55

Level 2: 183

Level 3: 236

Level 4: 309

Level 5: 217

Stratified Random Sampling
Example

Example: An analyst wishes to select a stratified random sample of 90 from a population of 1,000 E/M visits. The distribution of E/M visits in the population is:

| Level | Population Count (N) | % of Population | Sample Size (n) |
|---|---|---|---|
| 1 | 55 | | |
| 2 | 183 | | |
| 3 | 236 | | |
| 4 | 309 | | |
| 5 | 217 | | |
| Totals | 1,000 | 100% | 90 |

Stratified Random Sampling
Example

Example: An analyst wishes to select a stratified random sample of 90 from a population of 1,000 E/M visits. The distribution of E/M visits in the population is:
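
A proportional allocation for this example can be sketched as follows, assuming each stratum receives the total sample size multiplied by its share of the population, rounded to the nearest whole number. With these counts simple rounding happens to add up to exactly 90; in general the rounded sizes may need a small adjustment to hit the target. Variable names are illustrative.

```python
population_counts = {1: 55, 2: 183, 3: 236, 4: 309, 5: 217}  # E/M visits by level
total_sample_size = 90

population_size = sum(population_counts.values())

# Each stratum's sample size is proportional to its share of the population
allocation = {
    level: round(total_sample_size * count / population_size)
    for level, count in population_counts.items()
}

for level, count in population_counts.items():
    share = count / population_size
    print(f"Level {level}: {share:.1%} of population, sample size {allocation[level]}")
```

A simple random sample of the allocated size would then be drawn within each level.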

Cluster Sampling

The population is divided into subsets much like the strata in stratified sampling

Clusters should be mutually exclusive and exhaustive

Clusters are selected at random

All members of each selected cluster are included in the sample

Cluster sampling is appropriate when it is difficult to access all of the population

Cluster Sampling
Example

The director of the emergency department would like to audit the accuracy of charge capture for the first quarter of 2020. Unfortunately, she is not able to obtain a full listing of the patients that pass through the ED for a sampling frame. Instead, a cluster sample will be drawn using date of service as the cluster. Select 10 dates via simple random sampling to produce a cluster sample.
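
Selecting the ten dates can be sketched with the standard library. This assumes the first quarter of 2020 runs from January 1 through March 31 and uses a hypothetical seed so the draw can be repeated; every ED encounter on a selected date would then be audited.

```python
import random
from datetime import date, timedelta

first_day = date(2020, 1, 1)
last_day = date(2020, 3, 31)

# Each date of service in the quarter is one cluster
number_of_days = (last_day - first_day).days + 1
all_dates = [first_day + timedelta(days=offset) for offset in range(number_of_days)]

rng = random.Random(42)  # hypothetical seed for a reproducible selection
selected_dates = sorted(rng.sample(all_dates, 10))

for selected in selected_dates:
    print(selected.isoformat())
```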

Non-probability Sampling

Random sample not required if:

Study is exploratory or a focused review

Example: If we wish to determine educational opportunities for improving documentation, we may sample accounts with few secondary diagnoses to determine if there is a pattern in the types of diagnosis codes most likely to be missed

Typically, this sample is driven by some exploratory data analysis or data mining to help ‘steer’ the sample to subjects most likely to have the issue of interest

Non-probability Sampling

Convenience sampling

Example – sample first ‘n’ customers that enter the hospital cafeteria

Judgment sampling

Use exploratory data analysis based on experience or history

AKA focused review

Example – Know from history that the customer satisfaction in cafeteria is lowest at lunch time because of long lines. Select sample at that time to try to improve process.

Quota sampling

Subjects divided into groups

Judgment sample used within each group

Example – may select first 10 male and 10 female customers to cafeteria

Sample Size – Attribute Studies

Attribute studies are designed to measure a proportion or rate

Examples of attribute studies:

Claim denial rate

Correct coding rate

Sample size is dependent on:

The expected proportion

Based on a small pilot study

Set to 0.5 for largest sample

Resources available to perform the study

OIG current recommendation for a pilot study is 30

Attribute Study Sample Size Example
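
The worked numbers for this slide are not reproduced in the text. As a stand-in, the sketch below uses the common normal-approximation formula n = z²·p(1−p)/E². The formula, the 95% confidence level, and the margin of error E are assumptions, and no finite population correction is applied.

```python
from math import ceil
from statistics import NormalDist

def attribute_sample_size(expected_proportion, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a proportion within +/- margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p = expected_proportion
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

if __name__ == "__main__":
    # p = 0.5 maximizes p * (1 - p), so it yields the largest (most conservative) n.
    print(attribute_sample_size(0.5, 0.05))
    # A hypothetical pilot estimate of a 10% denial rate needs a smaller sample.
    print(attribute_sample_size(0.1, 0.05))
```

With a 5% margin of error at 95% confidence, this gives 385 when the proportion is set to 0.5 and 139 when a pilot suggests 0.1.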

Sample Size – Variable Studies

Variable studies are designed to measure a ratio-scale quantity

Examples of variable studies:

Length of stay

Charge amounts

Lab values

Sample size is dependent on:

Standard deviation of the quantity to be measured

Based on a small pilot study

Resources available to perform the study

OIG current recommendation for a pilot study is 30

Variable Study Sample Size Example
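
The worked numbers for this slide are not reproduced in the text. A minimal sketch follows, assuming the common formula n = (z·s/E)², where s is the pilot standard deviation and E is the desired margin of error. The formula, the 95% confidence level, and the length-of-stay figures are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def variable_sample_size(pilot_std_dev, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a mean within +/- margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return ceil((z * pilot_std_dev / margin_of_error) ** 2)

if __name__ == "__main__":
    # Hypothetical pilot: length of stay has a standard deviation of 2.5 days,
    # and the mean should be estimated to within half a day.
    print(variable_sample_size(2.5, 0.5))
```

For the hypothetical pilot values shown, the result is 97 cases; a larger standard deviation or a tighter margin of error raises it.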

Sample Size and Precision

In both types of studies, attribute or variable, a higher level of precision requires a larger sample size

A higher level of precision is equivalent to requiring a narrower confidence interval for a set confidence level

Note that increasing ‘n’ in both the proportion and mean confidence interval formulas results in narrower intervals (all other variables held constant)
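
The effect of n on interval width can be checked numerically. The sketch below assumes the normal-approximation interval for a proportion, whose half-width is z·sqrt(p(1−p)/n); the function name and the sample values are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_half_width(p_hat, n, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, round(proportion_half_width(0.5, n), 4))
```

Because n sits under a square root, quadrupling the sample size only halves the interval width, which is why extra precision gets expensive quickly.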

A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 8, Exploratory Data Applications

Susan White, PhD, RHIA, CHDA

Learning Objectives

Illustrate case mix index analysis techniques

Compare and contrast case mix measurement in outpatient vs inpatient setting

Explore relative value unit analysis

Exploratory Data Analysis

AKA Data Mining

Using statistical techniques to find patterns in data

Typically, a mixture of graphical displays and descriptive statistics

Many practical applications in improving healthcare operations

HIM professionals are uniquely positioned to perform this analysis because they understand the data and the underlying operational and reimbursement implications of patterns

Case Mix Analysis

Case Mix Index (CMI) – average MS-DRG weight for all patients

May be calculated for subsets of patients such as Medicare/Medicaid/selected MS-DRGs

May exclude portions such as transplants (very high weight MS-DRGs) or transfers (reduced payment and short stays)

Single number that may be used as a proxy for measuring the resource intensity of a hospital’s patients

Medicare CMI is the primary driver of the inpatient Medicare revenue

Frequently a key performance indicator for a hospital and a key driver of the revenue budget

Case Mix Index
Example

1. Multiply the number of cases in each MS-DRG by the relative weight

2. Sum the values from step 1

3. Sum the number of discharges

4. Divide the total relative weight (step 2) by the number of discharges (step 3)

Note: This is the weighted average of the relative weights for each MS-DRG.
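
The calculation above can be sketched in a few lines of Python. The MS-DRG labels, case counts, and relative weights below are hypothetical placeholders, not published CMS values.

```python
def case_mix_index(drg_volumes):
    """Compute CMI from a mapping of MS-DRG -> (number of cases, relative weight)."""
    # Multiply cases by relative weight for each MS-DRG, then sum the products.
    total_weight = sum(cases * weight for cases, weight in drg_volumes.values())
    # Sum the number of discharges.
    total_discharges = sum(cases for cases, _ in drg_volumes.values())
    if total_discharges == 0:
        raise ValueError("at least one discharge is required")
    # Divide the total relative weight by the number of discharges.
    return total_weight / total_discharges

if __name__ == "__main__":
    # Hypothetical volumes and weights for three unnamed MS-DRGs.
    example = {"DRG A": (10, 1.5), "DRG B": (20, 0.8), "DRG C": (5, 2.4)}
    print(round(case_mix_index(example), 4))
```

The same function works for any subset, for example Medicare-only discharges or a population with transplants excluded, by filtering the input before calling it.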

MS-DRG Families

MS-DRGs may be broken into families with two or three members:

No CC

CC (not present in all families)

MCC

The MS-DRG weight system is designed to assign higher weights to MS-DRGs that require a higher resource intensity

MCC MS-DRGs are assigned higher weights than no CC MS-DRGs in the same family

Example families: COPD; Pacemaker Replacement

CC/MCC Capture Rates

Example:

This value can be compared to HCUP data using a z-test for proportions to determine if the sample rate is higher/lower than the national rate

In general, hospitals with higher CC/MCC capture rates have higher CMI

An unusually high CC/MCC capture rate may indicate a compliance issue (over-coding) and should also be investigated
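
A one-sample z-test for a proportion, as mentioned above, might look like the following sketch. It treats the national rate as a fixed, known benchmark, which is a simplifying assumption, and the counts and benchmark value are hypothetical rather than taken from HCUP.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z_test(successes, n, benchmark_rate):
    """Return (z, two-sided p-value) comparing a sample rate to a benchmark rate."""
    p_hat = successes / n
    # The standard error uses the benchmark rate, as assumed under the null hypothesis.
    standard_error = sqrt(benchmark_rate * (1 - benchmark_rate) / n)
    z = (p_hat - benchmark_rate) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical: 130 of 200 cases in an MS-DRG family were coded with a CC or MCC,
    # compared against an assumed national capture rate of 60%.
    z, p_value = one_sample_proportion_z_test(130, 200, 0.60)
    print(round(z, 3), round(p_value, 3))
```

A small p-value suggests the facility's capture rate differs from the benchmark; whether that reflects documentation quality, patient mix, or over-coding still requires chart review.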

CMI Shifts

Significant shifts in CMI should be investigated to determine the root cause

Potential causes:

New service lines

Surgeon vacation schedules

Holidays

Other DRG Systems

AP-DRGs

All-patient DRGs

AKA “New York Grouper”

Three-character, numeric

Weights are calibrated for all patients and not only Medicare

APR-DRGs

All patient refined DRGs

3M proprietary grouping system

Three-character numeric code followed by a digit (1-4) for severity of illness and a digit (1-4) for risk of mortality

Ambulatory Patient Classifications (APC)

CMS uses APCs to pay for services in the hospital outpatient and ambulatory surgery settings.

Challenges of APCs

Claim may have more than one payable APC

Assignment of CPT/HCPCS codes to APCs may change each year

More of a fee schedule than a true prospective payment system

Can use APC weights to calculate a service mix index (SMI)

Note that this measures the average resource intensity for the services provided and not for the typical case

Methods of Analysis

Validation of utilization patterns

Specialty specific codes

Comparison to hospitals with like service mix (trauma center, transplants, etc.)

RVU Analysis

Work RVUs may be used to benchmark physician productivity

Part of the CMS Physician Fee Schedule

RVU – Physician Productivity

Dr. Kana billed the lowest number of wRVUs during July

When productivity is adjusted for the fact that Dr. Kana is a 40% FTE (cFTE = 0.4), she is actually the second most productive physician
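
The FTE adjustment can be illustrated with a short sketch. Only Dr. Kana's cFTE of 0.4 comes from the slide; the other physicians and all wRVU figures are hypothetical, chosen so that the pattern described above appears.

```python
def productivity_ranking(physicians):
    """Rank physicians by wRVUs per clinical FTE, highest first.

    physicians maps name -> (wRVUs billed, clinical FTE between 0 and 1).
    """
    adjusted = {name: wrvus / cfte for name, (wrvus, cfte) in physicians.items()}
    return sorted(adjusted.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical July data; only Dr. Kana's cFTE of 0.4 is taken from the slide.
    july = {
        "Physician A": (500, 1.0),
        "Physician B": (450, 1.0),
        "Physician C": (380, 0.8),
        "Dr. Kana": (196, 0.4),
    }
    for name, wrvus_per_cfte in productivity_ranking(july):
        print(f"{name}: {wrvus_per_cfte:.0f} wRVUs per cFTE")
```

Dividing by cFTE puts part-time and full-time physicians on the same scale, so the lowest raw total can still rank near the top.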

RVU – Other Uses

Average cost per RVU

Physician compensation per work RVU (wRVU)

Malpractice expense per Malpractice RVU (mRVU)

Overhead or practice expense per Practice Expense RVU (peRVU)

Break-Even Conversion Factor (BECF)

A Practical Approach to Analyzing Healthcare Data, Fourth Edition
Chapter 9, Benchmarking and Analyzing Externally Reported Data

Susan White, PhD, RHIA, CHDA

Learning Objectives

Explain the types of benchmarking

Discuss healthcare report cards

Types of Benchmarking

Benchmarking – comparing performance to a standard

Internal benchmarking – comparison to internal goals or year-over-year

External benchmarking – comparison to external norms or competitors

Benefits

Identify strong or weak areas

Part of quality improvement culture

Benchmarking Steps

1. Identify the issue to benchmark

2. Locate internal data related to the issue

3. Analyze internal data

4. Identify external data available for benchmarking

5. Collect public domain data or purchase data, if appropriate

6. Compare internal and external data

7. Determine whether a performance gap exists

8. Communicate benchmarking findings

9. Establish performance-level targets and action plans for achievement

10. Implement plans; monitor and communicate progress

11. Recalibrate benchmarks as necessary

12. Repeat the process

Hospital Value-Based Purchasing (HVBP) Program

CMS HVBP is an example of a formal benchmarking program

HVBP includes four domains

Process of care

Outcomes

Patient experience

Efficiency of care

Generates a Total Performance Score (TPS) that is used to determine an incentive payment added to Medicare inpatient payments for participating hospitals

Dashboards and Scorecards

Method to represent performance in terms of key performance indicators (KPI)

Guide management decisions

Include a combination of indicators measured on a ‘per unit’ basis for comparability across time

Categories may include:

Clinical

Operational

Financial

Example Dashboard – Medicare Spending

Example Dashboard – Medicare Chronic Conditions

National Quality Forum (NQF)

Provides a framework for endorsing healthcare quality measures by:

Convening working groups to foster quality improvement in both the public and private sectors;

Endorsing consensus standards for performance measurement;

Ensuring that consistent, high-quality performance information is publicly available; and

Seeking real-time feedback to ensure measures are meaningful and accurate.

Endorsement of a quality measure requires the following steps:

Measure is proposed and supported with scientific evidence

Validity and reliability of the measure are established

Feasibility is tested typically via pilot testing; includes cost and potential administrative burden for data collection

Usability is assessed: does the measure provide enough feedback for users to improve performance?

Assessment of related or competing measures

Medicare Quality Measures

Data.medicare.gov

Hospital Compare

Nursing Home Compare

Physician Compare

Home Health Compare

Dialysis Facility Compare

Data provided in online query and comparison format as well as a bulk download of national statistics

Hospital Compare
Example

Quality measurement should include an adjustment for the risk of an adverse outcome

Age/gender

Comorbidities

Teaching status

Location (urban/rural)

Socio-economic attributes of patient mix

Payer mix

Used to compare actual performance to expected performance based on the risk factors

SIR – standardized infection ratio (observed infection rate divided by the expected infection rate)

SMR – standardized mortality ratio (observed mortality rate divided by the expected mortality rate)

For all standardized ratios, a value greater than one means the facility's observed rate is higher than expected given the risk attributed to its patient mix
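
The observed-to-expected comparison can be sketched as follows. The function names and the infection figures are hypothetical, and the interpretation is a point comparison only; it says nothing about statistical significance.

```python
def standardized_ratio(observed, expected):
    """Observed value divided by the risk-adjusted expected value (e.g., SIR or SMR)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected

def interpret(ratio):
    """Describe a standardized ratio relative to the risk-adjusted expectation."""
    if ratio > 1:
        return "higher than expected"
    if ratio < 1:
        return "lower than expected"
    return "as expected"

if __name__ == "__main__":
    # Hypothetical unit: 12 observed infections against 8.0 expected after risk adjustment.
    sir = standardized_ratio(12, 8.0)
    print(sir, interpret(sir))
```

Here the ratio is 1.5, meaning the unit recorded 50% more infections than its patient mix would predict.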

CMS Risk Adjustment – CLABSI in ICU

CLABSI = central line-associated bloodstream infections

Observed and expected infection rates are calculated for each hospital

The graph depicts the SIR, or observed-to-expected ratio (O/E), for each hospital

O/E = 1.0 means that the hospital’s infection rate is equal to that expected after risk adjustment

The dark shaded areas represent the 95% confidence interval for the O/E
