Models

Training and testing of SVM, neural network, and XGBoost classifiers on the extracted embeddings.

class models.AdvancedNeuralNetwork(input_size, hidden_sizes, num_classes=2, dropout_rate=0.5)

Bases: Module

Defines an advanced neural network architecture with Batch Normalization and Dropout.

forward(x)

Defines the forward pass of the neural network.

This function passes the input x through the sequential layers of the network. The layers consist of fully connected layers, Batch Normalization, ReLU activation, and Dropout (for regularization). The final output is generated by the last fully connected layer.

Parameters: x (torch.Tensor) – The input tensor that will be passed through the network.
Returns: The output of the neural network after passing through all layers.
Return type: torch.Tensor

training: bool
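
The layer stack described above can be sketched as follows. This is a minimal illustration, not the module's source: the class name `SketchNetwork`, the attribute name `network`, and the exact ordering of Batch Normalization, ReLU, and Dropout inside each hidden block are assumptions drawn from the description.

```python
import torch
from torch import nn


class SketchNetwork(nn.Module):
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(self, input_size, hidden_sizes, num_classes=2, dropout_rate=0.5):
        super().__init__()
        layers = []
        in_features = input_size
        # One block per hidden size: Linear -> BatchNorm -> ReLU -> Dropout (assumed order).
        for hidden in hidden_sizes:
            layers += [
                nn.Linear(in_features, hidden),
                nn.BatchNorm1d(hidden),
                nn.ReLU(),
                nn.Dropout(dropout_rate),
            ]
            in_features = hidden
        # The final fully connected layer produces one logit per class.
        layers.append(nn.Linear(in_features, num_classes))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)


model = SketchNetwork(input_size=10, hidden_sizes=[32, 16])
model.eval()  # disables Dropout and uses BatchNorm running statistics
with torch.no_grad():
    logits = model(torch.randn(4, 10))
print(logits.shape)
```

Calling `model.eval()` before inference matters for this kind of architecture, because both Dropout and Batch Normalization behave differently in training mode.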

class models.FusedDataset(features, labels)

Bases: Dataset

Custom Dataset for handling fused features and labels.

This class is used to create a PyTorch Dataset that combines features and labels, allowing for easy handling and loading of the data during model training or evaluation. The dataset stores the input features and labels as tensors, which are then accessed by indexing.

Parameters:

features (numpy.ndarray or torch.Tensor) – A 2D array or tensor containing the input features for each sample.
labels (numpy.ndarray or torch.Tensor) – A 1D array or tensor containing the labels for each sample.

X

A tensor containing the input features.

Type: torch.Tensor

y

A tensor containing the labels.

Type: torch.Tensor

__len__(): Returns the number of samples in the dataset.

__getitem__(idx): Retrieves the feature and label pair for the sample at the specified index.

Example

    import numpy as np
    from models import FusedDataset

    features = np.random.rand(100, 10)           # 100 samples, 10 features each
    labels = np.random.randint(0, 2, size=100)   # Binary labels for each sample
    dataset = FusedDataset(features, labels)
    print(len(dataset))                          # Prints the number of samples in the dataset
    sample = dataset[0]                          # Retrieves the first sample's features and label
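
To show how such a dataset is typically consumed, the sketch below pairs a stand-in class with a `DataLoader`. The class name `SketchFusedDataset` and the tensor dtypes (`float32` features, `long` labels) are assumptions; the documentation only states that `X` and `y` are stored as tensors.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class SketchFusedDataset(Dataset):
    """Hypothetical stand-in for a features-plus-labels dataset."""

    def __init__(self, features, labels):
        # dtypes are an assumption: float32 inputs and integer class labels.
        self.X = torch.as_tensor(features, dtype=torch.float32)
        self.y = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]


features = np.random.rand(100, 10)
labels = np.random.randint(0, 2, size=100)
dataset = SketchFusedDataset(features, labels)

# A DataLoader batches and shuffles the (feature, label) pairs for training.
loader = DataLoader(dataset, batch_size=32, shuffle=True)
batch_X, batch_y = next(iter(loader))
print(batch_X.shape, batch_y.shape)
```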

models.evaluate_and_save_metrics(model, model_name, X_test, y_test, output_dir, is_nn=False, device='cpu')

Evaluate the model on the test set and save evaluation metrics.

This function evaluates the given model on the provided test set (X_test and y_test) by calculating various performance metrics including accuracy, precision, recall, F1 score, and AUROC. It also generates a classification report. The metrics are saved as a JSON file in the specified output directory.

Parameters:

model (sklearn.base.BaseEstimator or torch.nn.Module) – The trained model to be evaluated.
model_name (str) – The name of the model, which will be used for saving the metrics.
X_test (numpy.ndarray or torch.Tensor) – The feature matrix for the test set.
y_test (numpy.ndarray or torch.Tensor) – The true labels for the test set.
output_dir (str) – The directory where the evaluation metrics will be saved.
is_nn (bool, optional) – A flag indicating if the model is a neural network. Defaults to False.
device (str, optional) – The device ('cpu' or 'cuda') to use for inference with neural networks. Defaults to 'cpu'.

Returns:

None

Example

    # Evaluates the model and saves the metrics to the specified output directory.
    evaluate_and_save_metrics(model, "MyModel", X_test, y_test, "path/to/output")
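
The metric computation described above might look roughly like the sketch below for a scikit-learn model (the `is_nn=False` case). The function name `sketch_evaluate`, the JSON key names, and the output file name `<model_name>_metrics.json` are assumptions made for illustration; the actual function may organize its output differently.

```python
import json
import os
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def sketch_evaluate(model, model_name, X_test, y_test, output_dir):
    """Hypothetical stand-in covering only the scikit-learn path."""
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # positive-class probability for AUROC
    metrics = {
        "accuracy": float(accuracy_score(y_test, y_pred)),
        "precision": float(precision_score(y_test, y_pred, zero_division=0)),
        "recall": float(recall_score(y_test, y_pred, zero_division=0)),
        "f1": float(f1_score(y_test, y_pred, zero_division=0)),
        "auroc": float(roc_auc_score(y_test, y_score)),
        "classification_report": classification_report(y_test, y_pred, zero_division=0),
    }
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{model_name}_metrics.json")  # assumed file name
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)
    return path


# Synthetic binary data so the sketch runs on its own.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
model = LogisticRegression().fit(X[:150], y[:150])

output_dir = tempfile.mkdtemp()
metrics_path = sketch_evaluate(model, "MyModel", X[150:], y[150:], output_dir)
with open(metrics_path) as f:
    saved = json.load(f)
print(sorted(saved))
```

AUROC is computed from class probabilities rather than hard predictions, which is why the sketch calls `predict_proba`; a neural network would need a softmax over its logits to supply the same quantity.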

models.objective(trial, X_train, X_valid, y_train, y_valid, device)

Objective function for Optuna to optimize neural network hyperparameters.

Parameters:

trial (optuna.trial.Trial) – Optuna trial object.
X_train (np.ndarray) – Training features.
X_valid (np.ndarray) – Validation features.
y_train (np.ndarray) – Training labels.
y_valid (np.ndarray) – Validation labels.
device (torch.device) – Device to run the model on.

Returns:

Validation accuracy.

Return type:

float

models.train_and_save_models(X_train, X_test, y_train, y_test, output_dir)

Trains and saves models using SVM, XGBoost, and a Neural Network, with hyperparameter optimization using Optuna.

This function standardizes the training data, trains three different machine learning models (SVM, XGBoost, and Neural Network) using hyperparameter optimization with Optuna, and saves the models and evaluation metrics. For each model, the best hyperparameters are determined via Optuna, the model is trained on the training data, and performance metrics are computed and saved.

Parameters:

X_train (numpy.ndarray) – The feature matrix for the training set.
X_test (numpy.ndarray) – The feature matrix for the test set.
y_train (numpy.ndarray) – The true labels for the training set.
y_test (numpy.ndarray) – The true labels for the test set.
output_dir (str) – Directory where models, metrics, and other outputs will be saved.

Returns:

None

Example

    from sklearn.model_selection import train_test_split
    from models import train_and_save_models

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Trains the models and saves the results to the output directory.
    train_and_save_models(X_train, X_test, y_train, y_test, "path/to/output")