[SQL] Patient Symptom Severity
tasked with adding symptom severity to patient data

Tool: PostgresSQL
Info Patient data contains a list of symptoms which are manually input into a system. Ordering is unspecific, and data separates symptoms using | delimiter. Symptoms data contains the symptoms and their severity (1-5).
Problem: We want to identify patients by severity. To do this there needs to be a new column for Patients which identifies the most severe symptom from their list of symptoms, and assigns the paired number from Symptoms to that the patient.
Tables:
patient table symptom table


Setup

DROP TABLE IF EXISTS patients;
CREATE TABLE IF NOT EXISTS patient_info (
	first_name CHAR(10) NOT NULL,
	last_name CHAR(10) NOT NULL,
	email VARCHAR(25) NOT NULL,
	gender CHAR(10) NOT NULL,
	symptoms VARCHAR NOT NULL
);

INSERT INTO patients VALUES
    ('Shaine','Richford','srichford0@deviantart.com','Female','Sore Throat | High Fever'),
    ('Michael','Bidgood','mbidgood1@telegraph.co.uk','Male','Vomiting | Breathing | Sore Throat'),
    ('Elmo','Siddens','esiddens2@patch.com','Male','Breathing | High Fever | Weak Pulse'),
    ('Geneva','Aucourte','gaucourte3@dot.gov','Female','Vomiting | Breathing'),
    ('Kristy','Tabert','ktabert4@deviantart.com','Polygender','Sore Throat'),
    ('Cordie','Davidowich','cdavidowich5@desdev.cn','Female','Vomiting | High Fever'),
    ('Vaughan','Soans','vsoans6@princeton.edu','Male','High Fever | Sore Throat | Breathing | Weak Pulse | Vomiting'),
    ('Kelsi','Dingley','kdingley7@columbia.edu','Bigender','Sore Throat | Vomiting'),
    ('Gus','Franzen','gfranzen8@yellowbook.com','Male','High Fever');

SELECT * FROM patients;

DROP TABLE IF EXISTS symptoms;
CREATE TABLE IF NOT EXISTS symptoms (
	symptom CHAR(11) NOT NULL,
	severity SMALLINT NOT NULL
);

INSERT INTO symptoms VALUES
    ('Sore Throat',1),
    ('Vomiting',2),
    ('High Fever',3),
    ('Breathing',4),
    ('Weak Pulse',5);

SELECT * FROM symptoms;

Brainstorming

  • In order to pair a severity value to the symptom I need all the patient symptoms split.
  • I want a unique identifier tied to each patient to ensure I can always join back data to the appropriate patient.
  • Splitting symptoms will duplicate all the other columns for each patient. Dropping or grouping will be needed for cleanup.
  • After splitting symptoms, I can join together the two tables on symptoms to get severity per symptom.
  • Following the join I want a table that contains only the highest severity per patient.
  • Join this new table to the original patient table to get a final table that contains all the patients and their symptoms with the highest severity.

Testing (using views)

Even though I want to split my data, I need to make sure to setup the patient table with a unique identifier since this data does not contain unique IDs.

ALTER TABLE patients ADD COLUMN id SERIAL PRIMARY KEY;
SELECT id, symptoms FROM patients;

Next will be splitting the patient symptoms to get each symptom for the each patient.

DROP VIEW IF EXISTS split_symptoms;
CREATE TEMP VIEW split_symptoms AS
SELECT
    id, -- don't need extra patient info
    symptoms, -- to view original value
	unnest(string_to_array(symptoms, ' | ')) AS symptoms_split
FROM patients;
SELECT * FROM split_symptoms;

Now I want to join the data together to get severity per symptom tied to each patient.

DROP VIEW IF EXISTS merged_patient_symptoms;
CREATE TEMP VIEW merged_patient_symptoms AS
SELECT *
FROM split_symptoms as ss
LEFT JOIN symptoms AS smt
ON (ss.symptoms_split = smt.symptom)
ORDER BY id ASC;
SELECT * FROM merged_patient_symptoms;

To get the highest value for severity from this merged table, I want to use MAX() function on the severity. However, to use MAX() aggregation I need to pair this with groupby. Since the goal is to pair highest severity to patients, I can drop all other columns that aren’t needed.

DROP VIEW IF EXISTS id_maxseverity;
CREATE TEMP VIEW id_maxseverity AS
SELECT
    id, 
    max(severity) AS severity_rank
FROM merged_patient_symptoms
GROUP BY id;
SELECT * FROM id_maxseverity;

Last step would be to join together the two tables to get severity per patient.

DROP VIEW IF EXISTS final_patient_table;
CREATE TEMP VIEW final_patient_table AS
SELECT
    pt.id
    pt.first_name,
    pt.last_name,
    pt.email,
    pt.gender,
    pt.symptoms,
    ms.severity_rank AS severity
FROM id_maxseverity AS ms
INNER JOIN patients AS pt
ON (pt.id = ms.id);
ORDER BY id ASC;
SELECT * FROM final_patient_table;

patient severity

Final Query

The temporary views are good for testing purposes, but not good for producing a real solution. This is what my final query looks like now that I know the steps I need to take to produce what I need.

Note: I want to use a CTE for this query because I want a table to reference during my query without adding any useless tables to my database. I could do a temporary view but I want the table to only exist for the duration of the query itself.

ALTER TABLE patients ADD COLUMN id SERIAL PRIMARY KEY;
SELECT id, condition FROM patients;

DROP TABLE IF EXISTS patients_final;
WITH merged_patient_symptoms (id, symptoms, symptom_split, symptom, severity)
    AS (
        SELECT *
        FROM (
            SELECT 
                id, 
                symptoms,
                unnest(string_to_array(symptoms, ' | ')) AS symptom_split
            FROM patients
            ) as ss -- split symptoms table
        LEFT JOIN symptoms AS smt
        ON (ss.symptom_split = smt.symptom)
    ) -- CTE for merged table
    SELECT
    pt.id
        pt.first_name,
        pt.last_name,
        pt.email,
        pt.gender,
        pt.symptoms,
        ms.severity_rank AS severity
    INTO patients_final
    FROM (
        SELECT
            id, 
            max(severity) AS severity_rank
        FROM merged_patient_symptoms
        GROUP BY id
    ) AS ms -- max severity table
    INNER JOIN patients AS pt
    ON (pt.id = ms.id)
    ORDER BY id ASC;
SELECT * FROM patients_final;

Last modified on 2024-07-12

Comments Disabled.