# Differentially Private Machine Learning: Implementation and Analysis of Gradient and Dataset Perturbation Techniques: Code Repository

This repository contains the source code for the empirical evaluation from the thesis, "Differentially Private Machine Learning: Implementation and Analysis of Gradient and Dataset Perturbation Techniques."

The project implements and compares two techniques for applying Differential Privacy (DP) to machine learning: Dataset Perturbation and Gradient Perturbation (DP-SGD). The experiments use a sensitive, real-world medical dataset derived from the MIMIC-IV database.
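To make the difference between the two techniques concrete, here is a minimal, dependency-free sketch. It is not the code used in this repository. The function names (`perturb_record`, `clip_gradient`, `dp_sgd_step`), the choice of Laplace noise for the dataset, the even split of the privacy budget across features, and every parameter value are assumptions made for illustration only.

```python
import math
import random


def perturb_record(record, bounds, epsilon, rng):
    """Dataset perturbation sketch: clip each feature, then add Laplace noise.

    Assumptions: every feature has known bounds (lo < hi), and the budget
    `epsilon` is split evenly across features (basic composition).
    """
    eps_per_feature = epsilon / len(record)
    noisy = []
    for value, (lo, hi) in zip(record, bounds):
        clipped = min(max(value, lo), hi)
        scale = (hi - lo) / eps_per_feature  # sensitivity / epsilon
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy.append(clipped + noise)
    return noisy


def clip_gradient(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """Gradient perturbation sketch: clip, sum, add Gaussian noise, average."""
    total = [0.0] * len(weights)
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm  # noise std on the summed gradient
    batch_size = len(per_example_grads)
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical record with two bounded features.
    print(perturb_record([72.0, 1.3], [(30.0, 200.0), (0.0, 10.0)], 1.0, rng))
    # Hypothetical per-example gradients for a two-parameter model.
    print(dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.3, 0.4]], 0.1, 1.0, 1.1, rng))
```

In the first function the noise is applied once to the data, before any model is trained. In the second, the data stays untouched and noise is injected at every training step, which is why DP-SGD also needs a privacy accountant to track the cumulative budget (omitted here).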

---

## Directory Structure

The code is organized in a sequential workflow:

-   `data/`: Initially empty. Stores the processed dataset (`variables_final.csv`) after the generation notebook has been run.
-   `results/`: Initially empty. Stores the raw numerical results from each experiment.
-   `dataset_generation/`: Contains the notebook (`variables_preprocess.ipynb`) that processes the raw MIMIC-IV data and generates the dataset.
-   `dataset_perturbation/`: Contains the script to run the Dataset Perturbation experiment.
-   `gradient_perturbation/`: Contains the script to run the Gradient Perturbation (DP-SGD) experiment.
-   `plotting_results/`: Contains the plotting scripts that generate the final figures and tables.
-   `environment.yml`: Conda environment file that makes the software environment reproducible.
-   `README.md`: This documentation file.

---

## Requirements & Installation

### 1. Prerequisites

-   **Access to the MIMIC-IV database.**
    **Important Note:** Due to the Data Use Agreement (DUA), the dataset is not and cannot be included in this repository. The notebook in `dataset_generation/` requires local access to the database to function correctly.
-   A working installation of `conda`.

### 2. Environment Setup

To install all necessary dependencies, create and activate the Conda environment using the provided file. Open a terminal and run the following commands:

```bash
# 1. Create the environment from the .yml file
conda env create -f environment.yml

# 2. Activate the environment
conda activate tfg_39_env
```

### 3. Execution Workflow

1.  **Generate the Dataset**
    -   Navigate to the `dataset_generation/` directory and run the `variables_preprocess.ipynb` notebook.

2.  **Run the Privacy Experiments**
    -   Run the script inside the `dataset_perturbation/` directory.
    -   Run the scripts inside the `gradient_perturbation/` directory.

3.  **Generate Final Figures and Tables**
    -   Run the scripts inside the `plotting_results/` directory.
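Because the steps above depend on each other, it can help to confirm that step 1 produced its output before starting the experiments. The following is a small sketch of such a check. The helper `missing_inputs` is hypothetical and not part of this repository, and it assumes it is run from the repository root.

```python
from pathlib import Path


def missing_inputs(repo_root="."):
    """Return the paths the later steps rely on that do not exist yet."""
    root = Path(repo_root)
    expected = [
        # Written by dataset_generation/variables_preprocess.ipynb (step 1).
        root / "data" / "variables_final.csv",
        # Directory where the experiments store their raw results (step 2).
        root / "results",
    ]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("all expected inputs are present")
```

If `data/variables_final.csv` is reported as missing, run the dataset generation notebook first.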

---

## Author

* **Juan Pablo Mantilla Carreño**