Characterizations

Exercise

Use the acute myocardial infraction cohort to do some basic characterizations.

Packages

HADES packages used

  • Eunomia for connecting to a synthetic OMOP CDM database.
  • DatabaseConnector for connecting to our Eunomia dataset and executing queries.
  • FeatureExtraction for generating covariates for our cohort and getting some aggregated stats.

Installation

# Needed to install some HADES packages
install.packages("remotes")

# HADES packages
remotes::install_github("ohdsi/FeatureExtraction")
install.packages("DatabaseConnector")
install.packages("Eunomia")

Create AMI Cohort

For this tutorial, we will use the same cohort and Eunomia GiBleed dataset from the cohort generation tutorial. We will generate it real quick here, but see that tutorial for an explanation.

# Connect to DB
datasetName = "GiBleed"
dbms = "sqlite"

datasetLocation <- Eunomia::getDatabaseFile(
  datasetName = datasetName, 
  dbms = dbms, 
  databaseFile = tempfile(fileext = ".sqlite")
)
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = dbms, server = datasetLocation)
connection = DatabaseConnector::connect(connectionDetails = connectionDetails)
DatabaseConnector::getTableNames(connection, databaseSchema = 'main')
source("util.R")
insert_visits(connection)

# Create Cohort for AMI

# Create cohort entry event temp table using AMI concept set
myocardial_infraction = 4329847 # Myocardial infarction
old_myocardial_infraction = 314666 #  Old myocardial infarction
cdm = "main"
sql <- "
drop table if exists #diagnosis;

SELECT 
  person_id AS subject_id,
  condition_start_date AS cohort_start_date
INTO #diagnoses
FROM @cdm.condition_occurrence
WHERE condition_concept_id IN (
    SELECT descendant_concept_id
    FROM @cdm.concept_ancestor
    WHERE ancestor_concept_id IN(@myocardial_infraction)
)
  AND condition_concept_id NOT IN (
    SELECT descendant_concept_id
    FROM @cdm.concept_ancestor
    WHERE ancestor_concept_id IN(@old_myocardial_infraction)
);
"
DatabaseConnector::renderTranslateExecuteSql(
  connection = connection, 
  sql = sql, 
  cdm = cdm, 
  myocardial_infraction = myocardial_infraction,
  old_myocardial_infraction = old_myocardial_infraction
)

# Create additional inclusion criteria to only include inpatients or ER
visit_concepts_to_include <- c(9201, 9203, 262) # inpatient or ER concept set
sql <- "
INSERT INTO @cdm.cohort (
  subject_id,
  cohort_start_date,
  cohort_definition_id
  )
SELECT 
  subject_id,
  cohort_start_date,
  CAST (1 AS INT) AS cohort_definition_id
FROM #diagnoses
INNER JOIN @cdm.visit_occurrence v
  ON subject_id = person_id
    AND cohort_start_date >= v.visit_start_date
    AND cohort_start_date <= v.visit_end_date
WHERE visit_concept_id IN (@visit_concepts_to_include);"
DatabaseConnector::renderTranslateExecuteSql(
  connection = connection, 
  sql = sql, 
  cdm = cdm,
  visit_concepts_to_include = visit_concepts_to_include
)

View Subjects

# View results
sql <- "
  select * from @cdm.cohort;
"
DatabaseConnector::renderTranslateQuerySql(
  connection = connection,
  sql = sql,
  cdm = cdm
)

Get Aggregated Covariate Data

Specify Covariate Settings

We will just specify some demographic features.

Note

There is a large set of covariate settings and all are false by default, so you must “opt in” to the covariates you want to extract.

You can view the covariate settings by executing this from the RStudio console

?FeatureExtraction::createCovariateSettings
settings <- FeatureExtraction::createCovariateSettings(
  useDemographicsGender = T,
  useDemographicsAge = T,
  useDemographicsAgeGroup = T,
  useDemographicsRace = T,
  useDemographicsEthnicity = T,
  useDemographicsIndexYear = T,
  useDemographicsIndexMonth = T,
  useDemographicsPriorObservationTime = T,
  useDemographicsPostObservationTime = T,
  useDemographicsTimeInCohort = T,
  useDemographicsIndexYearMonth = T
)

Get Covariate Data

Extract those features across the cohort. Aggregated = T means we are considering the cohort as a whole, not on a per-patient basis.

covariateData <- FeatureExtraction::getDbCovariateData(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  cohortDatabaseSchema = "main",
  cohortTable = "cohort",
  cohortIds = c(1), # We gave our cohort Id a value of 1 when we inserted into the cohort table
  covariateSettings = settings,
  aggregated = TRUE
)
str(covariateData) # An S4 object

Discrete Covariates

Well, we have the covariates here and some aggregations, but we need to map the covariateId to an actual description

covariateData$covariates

Continuous Covariates

Same thing for the continuous covariates, we have the covariateId, not the actual description…

covariateData$covariatesContinuous

Merging with covariateRef

Here is the mapping of covariateId to covariateName

covariateData$covariateRef

Lets merge the covariates to the covariateRef to get the results along with the covariateName

merged <- merge(covariateData$covariates, covariateData$covariateRef, by = "covariateId")
merged

Get Table 1

Here is how you get “table 1”, frequently used in papers.

result <- FeatureExtraction::createTable1(covariateData1 = covariateData)
print(result, row.names = F, right = T)
 Characteristic % (n = 5) Characteristic % (n = 5)
      Age group                 50 -  54        20
       10 -  14        20       70 -  74        20
       40 -  44        40