import os
from pathlib import Path
## Creating Empty folders
= Path("Scripts")
scripts_file_path = Path('Models')
models_file_path =True, exist_ok=True)
scripts_file_path.mkdir(parents=True, exist_ok=True) models_file_path.mkdir(parents
Creating Directories to store python scripts and models
We will first create 2 empty folders - Scripts (which will contain all the python modules for training, validating & testing the model) and Models (which will contain the model checkpoints).
Note that we are running this notebook in a Kaggle kernel. To build/replicate the model, follow the exact same steps mentioned in the notebook.
We build our model in a Kaggle kernel because of the free computational resources. One could use Google Colab (free version) instead, but it has certain limitations; for example, it cannot handle images of the size we'll be using in this project.
So we'll utilize Kaggle's compute resources to carry out this project.
Creating Python modules in Scripts for training and prediction
First we will write/create python modules for the augmentations, config, training & validation loops, prediction on the test dataset, etc.
For training augmentations we'll be flipping the image, cutting random patches out of the image (coarse dropout), randomly rotating by 90 degrees, adjusting the brightness and contrast, adding noise, shifting and shearing the image, and finally normalizing the statistics of the image (since we will be using transfer learning, we need to prepare the images in the same way the pretrained model's training images were prepared - this depends on the specific model we want to use).
%%writefile Scripts/augmentations.py
from Scripts.config import Config
import albumentations as A
training_augmentations = A.Compose(
    [
        A.CoarseDropout(p=0.6),
        A.RandomRotate90(p=0.6),
        A.Flip(p=0.4),
        A.OneOf(
            [
                A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.3),
                A.HueSaturationValue(
                    hue_shift_limit=20, sat_shift_limit=60, val_shift_limit=50
                ),
            ],
            p=0.7,
        ),
        A.OneOf([A.GaussianBlur(), A.GaussNoise()], p=0.65),
        A.ShiftScaleRotate(
            shift_limit=0.0625, scale_limit=0.35, rotate_limit=45, p=0.5
        ),
        A.OneOf(
            [
                A.OpticalDistortion(p=0.3),
                A.GridDistortion(p=0.1),
                A.PiecewiseAffine(p=0.3),
            ],
            p=0.7,
        ),
        A.Normalize(
            mean=Config.MEAN, std=Config.STD, max_pixel_value=255.0, always_apply=True
        ),
    ]
)

validation_augmentations = A.Compose(
    [
        A.Normalize(
            mean=Config.MEAN, std=Config.STD, max_pixel_value=255.0, always_apply=True
        )
    ]
)

testing_augmentations = A.Compose(
    [
        A.Normalize(
            mean=Config.MEAN, std=Config.STD, max_pixel_value=255.0, always_apply=True
        )
    ]
)
Writing Scripts/augmentations.py
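As a quick sanity check - a sketch that is not part of the original pipeline - we can push one training image through training_augmentations and confirm we get back a normalized float array of the same spatial size (the file name below is just one example from the training folder):
import numpy as np
from PIL import Image
from Scripts.augmentations import training_augmentations

sample_path = "../input/jpeg-melanoma-512x512/train/ISIC_2637011.jpg"  # example training image
image = np.array(Image.open(sample_path))
augmented = training_augmentations(image=image)["image"]
print(image.shape, image.dtype)          # (512, 512, 3) uint8
print(augmented.shape, augmented.dtype)  # (512, 512, 3) float32 (normalized)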
Creating a config module which contains the model configuration, like the number of epochs to run, the image size, the weight decay (for regularization), etc. It also contains the paths to the data files and folders.
%%writefile Scripts/config.py
import torch
class Config:
    EPOCHS = 5
    IMG_SIZE = 512
    ES_PATIENCE = 2
    WEIGHT_DECAY = 0.001
    VAL_BATCH_SIZE = 32 * 2
    RANDOM_STATE = 1994
    LEARNING_RATE = 5e-5
    TRAIN_BATCH_SIZE = 32
    MEAN = (0.485, 0.456, 0.406)
    STD = (0.229, 0.224, 0.225)
    TRAIN_COLS = [
        "image_name",
        "patient_id",
        "sex",
        "age_approx",
        "anatom_site_general_challenge",
        "target",
        "tfrecord",
    ]
    TEST_COLS = [
        "image_name",
        "patient_id",
        "sex",
        "age_approx",
        "anatom_site_general_challenge",
    ]
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

    ################ Setting paths to data input ################
    data_2020 = "../input/jpeg-melanoma-512x512/"
    train_folder_2020 = data_2020 + "train/"
    test_folder_2020 = data_2020 + "test/"
    test_csv_path_2020 = data_2020 + "test.csv"
    train_csv_path_2020 = data_2020 + "train.csv"
    submission_csv_path = data_2020 + "sample_submission.csv"
Writing Scripts/config.py
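Since Config holds plain class-level attributes, it can be used anywhere without instantiation - a quick sketch:
from Scripts.config import Config

print(Config.DEVICE)                 # "cuda" on a GPU kernel, otherwise "cpu"
print(Config.IMG_SIZE, Config.TRAIN_BATCH_SIZE, Config.VAL_BATCH_SIZE)
print(Config.train_csv_path_2020)    # ../input/jpeg-melanoma-512x512/train.csv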
Creating a single dataset class that reads the images (training, validation & testing) and can also read tabular features.
The class takes a dataframe; an optional list of tabular feature names in case we want to use them for training, i.e. a list of strings like ['sex_missing', 'anatom_site_general_challenge_head_neck', 'anatom_site_general_challenge_lower_extremity', 'anatom_site_general_challenge_torso', 'anatom_site_general_challenge_upper_extremity', 'scaled_age']; the augmentations we want to apply; and finally whether the dataset is a training, validation or testing dataset.
For training and validation we set is_test=False, and for testing we set is_test=True to differentiate between the datasets.
%%writefile Scripts/dataset.py
import torch
import numpy as np
import pandas as pd
import torch.nn as nn
from PIL import Image
from PIL import ImageFile
from typing import List, Callable
ImageFile.LOAD_TRUNCATED_IMAGES = True


class DatasetRetriever(nn.Module):
    """
    Dataset class that reads the images and tabular features from a
    dataframe and returns a dictionary.
    """

    def __init__(
        self,
        df: pd.DataFrame,
        tabular_features: List[str] = None,
        use_tabular_features: bool = False,
        augmentations: Callable = None,
        is_test: bool = False,
    ):
        self.df = df
        self.tabular_features = tabular_features
        self.use_tabular_features = use_tabular_features
        self.augmentations = augmentations
        self.is_test = is_test

    def __len__(self):
        """
        Returns the number of images in the dataframe.
        """
        return len(self.df)

    def __getitem__(self, index):
        """
        Takes an image and its corresponding tabular/meta features &
        target feature (for training and validation) and returns a
        dictionary; for the test dataset it returns a dictionary of
        only the image and tabular features.
        """
        image_path = self.df["image_path"].iloc[index]
        image = Image.open(image_path)
        image = np.array(image)
        if self.augmentations is not None:
            augmented = self.augmentations(image=image)
            image = augmented["image"]
        image = np.transpose(image, (2, 0, 1)).astype(np.float32)
        image = torch.tensor(image, dtype=torch.float)
        if self.use_tabular_features:
            if len(self.tabular_features) > 0 and self.is_test is False:
                tabular_features = np.array(
                    self.df.iloc[index][self.tabular_features].values, dtype=np.float32
                )
                targets = self.df.target[index]
                return {
                    "image": image,
                    "tabular_features": tabular_features,
                    "targets": torch.tensor(targets, dtype=torch.long),
                }
            elif len(self.tabular_features) > 0 and self.is_test is True:
                tabular_features = np.array(
                    self.df.iloc[index][self.tabular_features].values, dtype=np.float32
                )
                return {"image": image, "tabular_features": tabular_features}
        else:
            if self.is_test is False:
                targets = self.df.target[index]
                return {
                    "image": image,
                    "targets": torch.tensor(targets, dtype=torch.long),
                }
            elif self.is_test is True:
                return {"image": image}
Writing Scripts/dataset.py
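A hedged usage sketch (not a cell from the original notebook): given a dataframe with an image_path column and, for training, a target column - exactly what we build in the EDA section below - pulling one sample looks like this:
from Scripts.dataset import DatasetRetriever
from Scripts.augmentations import training_augmentations

## train_df is assumed to be the dataframe built later in this notebook.
dataset = DatasetRetriever(
    df=train_df,
    tabular_features=None,
    use_tabular_features=False,
    augmentations=training_augmentations,
    is_test=False,
)
sample = dataset[0]
print(sample["image"].shape, sample["targets"])  # torch.Size([3, 512, 512]) tensor(0)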
Now we create a model class to instantiate an EfficientNet model.
Currently, this class reads only the images and not the tabular features.
Since in this project/notebook we are using images only, this class is good enough for that.
%%writefile Scripts/model.py
import torch
import torch.nn as nn
import torch.nn.functional as F
from efficientnet_pytorch import EfficientNet
class Model(nn.Module):
    """
    Class to instantiate an EfficientNet-b5 model object which only
    uses images as inputs.
    """

    def __init__(self, model_name="efficientnet-b5", pool_type=F.adaptive_avg_pool2d):
        super().__init__()
        self.pool_type = pool_type
        self.model_name = model_name
        self.backbone = EfficientNet.from_pretrained(model_name)
        in_features = getattr(self.backbone, "_fc").in_features
        self.classifier = nn.Linear(in_features, 1)

    def forward(self, x):
        features = self.pool_type(self.backbone.extract_features(x), 1)
        features = features.view(x.size(0), -1)
        return self.classifier(features)


# class Model(nn.Module):
#     """
#     Class to instantiate an EfficientNet-b5 model object which uses images
#     as well as tabular features as inputs.
#     """
#     def __init__(self, model_name='efficientnet-b5', pool_type=F.adaptive_avg_pool2d,
#                  num_tabular_features=0):
#         super().__init__()
#         self.pool_type = pool_type
#         self.model_name = model_name
#         self.num_tabular_features = num_tabular_features  # stored so forward() can check it
#         self.backbone = EfficientNet.from_pretrained(model_name)
#         in_features = getattr(self.backbone, "_fc").in_features
#         if num_tabular_features > 0:
#             self.meta = nn.Sequential(
#                 nn.Linear(num_tabular_features, 512),
#                 nn.BatchNorm1d(512),
#                 nn.ReLU(),
#                 nn.Dropout(p=0.5),
#                 nn.Linear(512, 128),
#                 nn.BatchNorm1d(128),
#                 nn.ReLU())
#             in_features += 128
#         self.output = nn.Linear(in_features, 1)
#     def forward(self, image, tabular_features=None):
#         features = self.pool_type(self.backbone.extract_features(image), 1)
#         cnn_features = features.view(image.size(0), -1)
#         if self.num_tabular_features > 0:
#             tabular_features = self.meta(tabular_features)
#             all_features = torch.cat((cnn_features, tabular_features), dim=1)
#             output = self.output(all_features)
#             return output
#         else:
#             output = self.output(cnn_features)
#             return output
Writing Scripts/model.py
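A quick shape check, as a sketch: the model should emit one logit per image.
import torch
from Scripts.model import Model

shape_check_model = Model()                # downloads the pretrained backbone weights
dummy_batch = torch.randn(2, 3, 512, 512)  # a fake batch of 2 RGB 512x512 images
with torch.inference_mode():
    print(shape_check_model(dummy_batch).shape)  # torch.Size([2, 1])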
We create a validation function that predicts and generates probabilities only on the validation split corresponding to a specific fold.
This function might be useful in some cases. It can run on a single-GPU or multi-GPU device as well as on the CPU.
%%writefile Scripts/predict_on_validation_data.py
import os
import torch
from Scripts.config import Config
import pandas as pd
import torch.nn as nn
from Scripts.model import Model
from Scripts.dataset import DatasetRetriever
from Scripts.augmentations import validation_augmentations
from torch.utils.data import DataLoader
def predict_on_validation_dataset(
    validation_df: pd.DataFrame, model_path: str, use_tabular_features: bool = False
):
    """
    This function generates prediction probabilities on the
    validation dataset and returns them as a list.

    Args:
        validation_df: the validation dataframe.
        model_path: location where the model state_dict is stored.
        use_tabular_features: whether to use the tabular features
            or not.
    """
    valid_dataset = DatasetRetriever(
        df=validation_df,
        tabular_features=None,
        use_tabular_features=False,
        augmentations=validation_augmentations,
        is_test=True,
    )
    valid_dataloader = DataLoader(
        dataset=valid_dataset,
        batch_size=Config.VAL_BATCH_SIZE,
        shuffle=False,
        num_workers=os.cpu_count(),
    )
    valid_predictions = []
    if torch.cuda.device_count() in (0, 1):
        model = Model().to(Config.DEVICE)
    elif torch.cuda.device_count() > 1:
        model = Model().to(Config.DEVICE)
        model = nn.DataParallel(model)
    model.load_state_dict(torch.load(model_path))
    model.eval()
    with torch.inference_mode():
        for _, data in enumerate(valid_dataloader):
            if use_tabular_features:
                data["image"], data["tabular_features"] = data["image"].to(
                    Config.DEVICE, dtype=torch.float
                ), data["tabular_features"].to(Config.DEVICE, dtype=torch.float)
                y_logits = model(data["image"], data["tabular_features"])
            else:
                data["image"] = data["image"].to(Config.DEVICE, dtype=torch.float)
                y_logits = model(data["image"]).squeeze(dim=0)
            valid_probs = torch.sigmoid(y_logits).detach().cpu().numpy()
            valid_predictions.extend(valid_probs)
    valid_predictions = [
        valid_predictions[img].item() for img in range(len(valid_predictions))
    ]
    return valid_predictions
Writing Scripts/predict_on_validation_data.py
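A hypothetical usage sketch: score the fold-0 validation split with a saved checkpoint (create_folds, train_df and the checkpoint path are assumed from the surrounding notebook):
from Scripts.utils import create_folds
from Scripts.predict_on_validation_data import predict_on_validation_dataset

folded_df = create_folds(train_df)
valid_df = folded_df.loc[folded_df["fold"] == 0].reset_index(drop=True)
valid_probs = predict_on_validation_dataset(
    validation_df=valid_df,
    model_path="../working/Models/efficientnet_b5_checkpoint_fold_0.pt",
)
print(len(valid_probs), valid_probs[:3])  # one probability per validation image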
The function below generates the prediction probabilities on the test dataset provided for the competition and creates a submission.csv file for the public and private leaderboard results.
%%writefile Scripts/predict_on_test.py
import os
import torch
from Scripts.config import Config
import pandas as pd
import torch.nn as nn
from Scripts.model import Model
from Scripts.dataset import DatasetRetriever
from Scripts.augmentations import testing_augmentations
from torch.utils.data import DataLoader
def predict_on_test_and_generate_submission_file(
    test_df: pd.DataFrame, model_path: str, use_tabular_features: bool = False
):
    """
    This function generates prediction probabilities on the
    test dataset and writes a submission.csv file.

    Args:
        test_df: the test dataframe.
        model_path: location where the model state_dict is stored.
        use_tabular_features: whether to use the tabular features
            or not.
    """
    test_dataset = DatasetRetriever(
        df=test_df,
        tabular_features=None,
        use_tabular_features=False,
        augmentations=testing_augmentations,
        is_test=True,
    )
    test_dataloader = DataLoader(
        dataset=test_dataset,
        batch_size=Config.VAL_BATCH_SIZE,
        shuffle=False,
        num_workers=os.cpu_count(),
    )
    test_predictions = []
    if torch.cuda.device_count() in (0, 1):
        model = Model().to(Config.DEVICE)
    elif torch.cuda.device_count() > 1:
        model = Model().to(Config.DEVICE)
        model = nn.DataParallel(model)
    model.load_state_dict(torch.load(model_path))
    model.eval()
    with torch.inference_mode():
        for _, data in enumerate(test_dataloader):
            if use_tabular_features:
                data["image"], data["tabular_features"] = data["image"].to(
                    Config.DEVICE, dtype=torch.float
                ), data["tabular_features"].to(Config.DEVICE, dtype=torch.float)
                y_logits = model(data["image"], data["tabular_features"])
            else:
                data["image"] = data["image"].to(Config.DEVICE, dtype=torch.float)
                y_logits = model(data["image"]).squeeze(dim=0)
            test_probs = torch.sigmoid(y_logits).detach().cpu().numpy()
            test_predictions.extend(test_probs)
    submission_df = pd.read_csv(Config.submission_csv_path)
    test_predictions = [
        test_predictions[img].item() for img in range(len(test_predictions))
    ]
    submission_df["target"] = test_predictions
    submission_df.to_csv("../working/submission.csv", index=False)
Writing Scripts/predict_on_test.py
Now, we create a train_model module which has a run_model function that takes a fold number and the training dataframe.
The function creates the training and validation dataframes, then builds the training and validation datasets which read only images and no tabular features; next we initialize the seed (for reproducibility of results), the model object, loss function, optimizer, scheduler and a scaler object (for mixed precision).
%%writefile Scripts/train_model.py
import os
import torch
from Scripts.config import Config
import pandas as pd
import torch.nn as nn
from Scripts.model import Model
import torch.cuda.amp as amp
from Scripts.utils import create_folds
from Scripts.utils import seed_everything
from Scripts.dataset import DatasetRetriever
from timeit import default_timer as timer
from Scripts.training_and_validation_loops import train
from torch.utils.data import Dataset, DataLoader
from Scripts.augmentations import training_augmentations, validation_augmentations
def run_model(fold, train_df):
    train_df = create_folds(train_df=train_df)
    train_data = train_df.loc[train_df["fold"] != fold].reset_index(drop=True)
    valid_data = train_df.loc[train_df["fold"] == fold].reset_index(drop=True)
    validation_targets = valid_data["target"]
    train_dataset = DatasetRetriever(
        df=train_data,
        tabular_features=None,
        use_tabular_features=False,
        augmentations=training_augmentations,
        is_test=False,
    )
    valid_dataset = DatasetRetriever(
        df=valid_data,
        tabular_features=None,
        use_tabular_features=False,
        augmentations=validation_augmentations,
        is_test=False,
    )
    training_dataloader = DataLoader(
        dataset=train_dataset,
        batch_size=Config.TRAIN_BATCH_SIZE,
        shuffle=True,
        num_workers=os.cpu_count(),
    )
    validation_dataloader = DataLoader(
        dataset=valid_dataset,
        batch_size=Config.VAL_BATCH_SIZE,
        shuffle=False,
        num_workers=os.cpu_count(),
    )
    seed_everything(Config.RANDOM_STATE)
    if torch.cuda.device_count() in (0, 1):
        model = Model().to(Config.DEVICE)
    elif torch.cuda.device_count() > 1:
        model = Model().to(Config.DEVICE)
        model = nn.DataParallel(model)
    loss = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.AdamW(
        params=model.parameters(),
        lr=Config.LEARNING_RATE,
        weight_decay=Config.WEIGHT_DECAY,
    )
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer=optimizer,
        mode="max",
        factor=0.2,
        patience=2,
        threshold=1e-3,
        verbose=True,
    )
    scaler = amp.GradScaler()
    start_time = timer()
    model_save_path = f"../working/Models/efficientnet_b5_checkpoint_fold_{fold}.pt"
    model_results = train(
        model=model,
        train_dataloader=training_dataloader,
        valid_dataloader=validation_dataloader,
        loss_fn=loss,
        optimizer=optimizer,
        scheduler=scheduler,
        device=Config.DEVICE,
        scaler=scaler,
        epochs=Config.EPOCHS,
        es_patience=2,
        model_save_path=model_save_path,
        validation_targets=validation_targets,
    )
    end_time = timer()
    print(f"Total training time: {end_time-start_time:.3f} seconds")
Writing Scripts/train_model.py
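Later in the notebook we train fold 0 only, but a full cross-validated run would simply loop over the folds - a sketch:
from Scripts.train_model import run_model

## Hypothetical full 5-fold run; each fold saves its own checkpoint.
for fold in range(5):
    run_model(fold=fold, train_df=train_df)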
Regular PyTorch training and validation loops: train_one_epoch and validate_one_epoch each run a single epoch, and finally a train function runs them for N epochs.
%%writefile Scripts/training_and_validation_loops.py
import torch
import numpy as np
from tqdm import tqdm
import torch.cuda.amp as amp
from Scripts.utils import EarlyStopping
from sklearn.metrics import roc_auc_score
def train_one_epoch(
    model, dataloader, loss_fn, optimizer, device, scaler, use_tabular_features=False
):
    """
    Function takes a model instance, dataloader, loss function, optimizer,
    device (the device on which to run the model, i.e. GPU or CPU), scaler
    (for mixed precision) and whether to use tabular features or not.
    It passes the images (and optionally the tabular features) through the
    model for a single epoch and returns the loss value on the training dataset.
    """
    train_loss = 0
    model.train()
    for data in dataloader:
        optimizer.zero_grad()
        if use_tabular_features:
            data["image"], data["tabular_features"], data["targets"] = (
                data["image"].to(device, dtype=torch.float),
                data["tabular_features"].to(device, dtype=torch.float),
                data["targets"].to(device, dtype=torch.float),
            )
            with amp.autocast():
                y_logits = model(data["image"], data["tabular_features"]).squeeze(dim=0)
                loss = loss_fn(y_logits, data["targets"].view(-1, 1))
        else:
            data["image"], data["targets"] = data["image"].to(
                device, dtype=torch.float
            ), data["targets"].to(device, dtype=torch.float)
            with amp.autocast():
                y_logits = model(data["image"]).squeeze(dim=0)
                loss = loss_fn(y_logits, data["targets"].view(-1, 1))
        train_loss += loss.item()
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    train_loss = train_loss / len(dataloader)
    return train_loss


def validate_one_epoch(model, dataloader, loss_fn, device, use_tabular_features=False):
    """
    Function takes a model instance, dataloader, loss function, device
    (the device on which to run the model, i.e. GPU or CPU) and whether
    to use tabular features or not.
    It passes the images (and optionally the tabular features) through the
    model for a single epoch and returns the loss value & final predictions
    on the validation dataset.
    """
    valid_loss, final_predictions = 0, []
    model.eval()
    with torch.inference_mode():
        for data in dataloader:
            if use_tabular_features:
                data["image"], data["tabular_features"], data["targets"] = (
                    data["image"].to(device, dtype=torch.float),
                    data["tabular_features"].to(device, dtype=torch.float),
                    data["targets"].to(device, dtype=torch.float),
                )
                y_logits = model(data["image"], data["tabular_features"]).squeeze(dim=0)
            else:
                data["image"], data["targets"] = data["image"].to(
                    device, dtype=torch.float
                ), data["targets"].to(device, dtype=torch.float)
                y_logits = model(data["image"]).squeeze(dim=0)
            loss = loss_fn(y_logits, data["targets"].view(-1, 1))
            valid_loss += loss.item()
            valid_probs = torch.sigmoid(y_logits).detach().cpu().numpy()
            final_predictions.extend(valid_probs)
    valid_loss = valid_loss / len(dataloader)
    return valid_loss, final_predictions


def train(
    model,
    train_dataloader,
    valid_dataloader,
    loss_fn,
    optimizer,
    scheduler,
    device,
    scaler,
    epochs,
    es_patience,
    model_save_path,
    validation_targets,
):
    """
    This function takes a model instance, training dataloader,
    validation dataloader, loss_fn, optimizer, scheduler, device,
    scaler (object, for mixed precision), epochs (how many epochs
    to run the model for), es_patience (number of epochs to wait
    without improvement before stopping training), model_save_path
    (where to save the model) and validation_targets (used for the
    calculation of the AUC score), and returns a dictionary with the
    training loss, validation loss and validation AUC score.
    """
    results = {"train_loss": [], "valid_loss": [], "valid_auc": []}

    early_stopping = EarlyStopping(
        patience=es_patience, verbose=True, path=model_save_path
    )

    for epoch in tqdm(range(epochs)):
        train_loss = train_one_epoch(
            model=model,
            dataloader=train_dataloader,
            loss_fn=loss_fn,
            optimizer=optimizer,
            device=device,
            scaler=scaler,
            use_tabular_features=False,
        )

        valid_loss, valid_predictions = validate_one_epoch(
            model=model,
            dataloader=valid_dataloader,
            loss_fn=loss_fn,
            device=device,
            use_tabular_features=False,
        )

        valid_predictions = np.vstack(valid_predictions).ravel()

        valid_auc = roc_auc_score(y_score=valid_predictions, y_true=validation_targets)

        scheduler.step(valid_auc)

        early_stopping(valid_loss, model)
        if early_stopping.early_stop:
            print("Early Stopping")
            break

        model.load_state_dict(torch.load(model_save_path))
        print(
            f"Epoch : {epoch+1} | "
            f"train_loss : {train_loss:.4f} | "
            f"valid_loss : {valid_loss:.4f} | "
            f"valid_auc : {valid_auc:.4f} "
        )
        results["train_loss"].append(train_loss)
        results["valid_loss"].append(valid_loss)
        results["valid_auc"].append(valid_auc)
    return results
Writing Scripts/training_and_validation_loops.py
In the utils module we write some useful functions and classes:
- create_folds (divides the training dataset into 5 equal parts and removes duplicate images from the dataset)
- seed_everything (for reproducing the results)
- EarlyStopping (class used to stop model training if performance on the validation dataset starts to decline)
- plot_loss_curves (for plotting the training and validation loss and the AUC score for each epoch)
- RareLabelCategoryEncoder (class that combines the categories of a feature that appear in the dataset less than a certain percentage of the time, e.g. 5%, into a single rare category, labelled "Other" in the code)
- OutlierTreatment (class to cap the values of a feature: it learns the lower-quantile and upper-quantile values of the feature from the dataset passed to the fit method, and caps the feature values in the dataset passed to the transform method)
%%writefile Scripts/utils.py
import os
import torch
import random
import numpy as np
import pandas as pd
from typing import List
import matplotlib.pyplot as plt
from sklearn.base import BaseEstimator, TransformerMixin
def create_folds(train_df):
    """
    Function that creates folds in the training data and removes
    duplicate images from the training data.
    """
    train_df = train_df.loc[train_df["tfrecord"] != -1].reset_index(drop=True)
    train_df["fold"] = train_df["tfrecord"] % 5
    return train_df


def seed_everything(seed: int):
    """
    Function to set the seed and make results reproducible.

    Args:
        seed (int): e.g. 42
    """
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


class EarlyStopping:
    """Early stops the training if validation loss doesn't improve after a given patience.
    Directly borrowed from https://github.com/Bjarten/early-stopping-pytorch/blob/master/pytorchtools.py
    """

    def __init__(
        self,
        path: str,
        patience: int = 7,
        verbose: bool = False,
        delta: int = 0,
        trace_func=print,
    ):
        """
        Args:
            patience (int): How long to wait after last time validation loss improved.
                Default: 7
            verbose (bool): If True, prints a message for each validation loss improvement.
                Default: False
            delta (float): Minimum change in the monitored quantity to qualify as an improvement.
                Default: 0
            path (str): Path for the checkpoint to be saved to.
                Default: 'checkpoint.pt'
            trace_func (function): trace print function.
                Default: print
        """
        self.patience = patience
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False
        self.val_loss_min = np.Inf
        self.delta = delta
        self.path = path
        self.trace_func = trace_func

    def __call__(self, val_loss, model):
        score = -val_loss

        if self.best_score is None:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
        elif score < self.best_score + self.delta:
            self.counter += 1
            self.trace_func(
                f"EarlyStopping counter: {self.counter} out of {self.patience}"
            )
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
            self.counter = 0

    def save_checkpoint(self, val_loss, model):
        """Saves the model when the validation loss decreases."""
        if self.verbose:
            self.trace_func(
                f"Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}). Saving model ..."
            )
        torch.save(obj=model.state_dict(), f=self.path)
        self.val_loss_min = val_loss


def plot_loss_curves(results: dict):
    """
    Function to plot training & validation loss curves & validation AUC.

    Args:
        results (dict): A dictionary of training loss, validation loss &
            validation AUC score.
    """
    loss = results["train_loss"]
    valid_loss = results["valid_loss"]

    # Get the AUC values of the results dictionary
    valid_auc = results["valid_auc"]

    # Figure out how many epochs there were
    epochs = range(len(results["train_loss"]))

    # Setup a plot
    plt.figure(figsize=(15, 7))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, loss, label="train_loss")
    plt.plot(epochs, valid_loss, label="valid_loss")
    plt.title("Loss")
    plt.xlabel("Epochs")
    plt.legend()

    # Plot AUC
    plt.subplot(1, 2, 2)
    plt.plot(epochs, valid_auc, label="valid_auc")
    plt.title("AUC Score")
    plt.xlabel("Epochs")
    plt.legend()


class RareLabelCategoryEncoder(BaseEstimator, TransformerMixin):
    """
    Class to combine the rare categories of a categorical variable,
    i.e. the categories appearing less than a certain percentage of
    the time (the threshold).
    Example: categories appearing less than 5% of the time are
    combined into a single category (labelled "Other").
    """

    def __init__(self, variables: List, tol=0.05):
        """
        Args:
            variables (List): A list of variables whose rare
                categories we want to combine.
            tol (float): A threshold/tolerance below which we
                consider a category of a feature as rare.
        """
        if not isinstance(variables, list):
            raise ValueError("Variables should be a list")
        self.tol = tol
        self.variables = variables

    def fit(self, x: pd.DataFrame):
        """
        Learns all the categories of each feature and the fraction
        of the time each appears in the dataset x.

        Args:
            x: the dataset from which the fit method learns and
                stores how often each category appears.
        """
        self.encoder_dict_ = {}
        for var in self.variables:
            t = pd.Series(x[var]).value_counts(normalize=True)
            self.encoder_dict_[var] = list(t[t >= self.tol].index)
        return self

    def transform(self, x: pd.DataFrame):
        """
        Combines the rare categories of each feature in the variables
        list in the dataset x and returns the transformed dataset.
        """
        x = x.copy()
        for var in self.variables:
            x[var] = np.where(x[var].isin(self.encoder_dict_[var]), x[var], "Other")
        return x


class OutlierTreatment(BaseEstimator, TransformerMixin):
    """
    Class to handle outliers in a continuous feature.
    """

    def __init__(
        self, variable: str, upper_quantile: float = None, lower_quantile: float = None
    ):
        """
        Args:
            variable (str): The variable whose upper and lower
                boundaries we want to cap.
            upper_quantile (float): The quantile above which all
                values of the feature are capped.
            lower_quantile (float): The quantile below which all
                values of the feature are capped.
        """
        if not isinstance(variable, str):
            raise ValueError("Variable should be a string type.")
        self.upper_quantile = upper_quantile
        self.variable = variable
        self.lower_quantile = lower_quantile

    def fit(self, x: pd.DataFrame):
        """
        Learns the lower & upper quantile values of the feature
        from the dataset x.
        """
        self.upper_quantile = x[self.variable].quantile(self.upper_quantile)
        self.lower_quantile = x[self.variable].quantile(self.lower_quantile)
        return self

    def transform(self, x: pd.DataFrame):
        """
        Caps the feature values in the dataframe x using the quantile
        values learnt in the fit() method.
        """
        x = x.copy()
        x[self.variable] = np.where(
            x[self.variable] > self.upper_quantile,
            self.upper_quantile,
            np.where(
                x[self.variable] < self.lower_quantile,
                self.lower_quantile,
                x[self.variable],
            ),
        )
        return x
Writing Scripts/utils.py
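A small sketch (with made-up data) of the two sklearn-style transformers defined above: RareLabelCategoryEncoder groups infrequent categories into "Other", and OutlierTreatment caps a numeric column at the learnt quantiles.
import pandas as pd
from Scripts.utils import RareLabelCategoryEncoder, OutlierTreatment

toy = pd.DataFrame({
    "site": ["torso"] * 8 + ["oral/genital"],     # "oral/genital" is rare here
    "age": [30, 32, 35, 40, 41, 45, 50, 55, 95],  # 95 is an outlier
})

rare = RareLabelCategoryEncoder(variables=["site"], tol=0.2).fit(toy)
capper = OutlierTreatment(variable="age", upper_quantile=0.9, lower_quantile=0.1).fit(toy)
print(rare.transform(toy)["site"].unique())  # ['torso' 'Other']
print(capper.transform(toy)["age"].max())    # capped at the learnt 90th percentile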
- Initializing __init__.py to make the Scripts folder a package.
%%writefile Scripts/__init__.py
""
Writing Scripts/__init__.py
Creating a requirements.txt file to install all the packages for our model training
This will contain all the packages required for training the model.
%%writefile requirements.txt
# pandas==2.0.0
torch==1.13.0
torchvision==0.14.0
# scikit-learn==1.2.2
efficientnet_pytorch==0.7.1
albumentations==1.2.1
# numpy==1.22.4
tqdm==4.65.0
# matplotlib==3.7.1
Pillow==8.4.0
Writing requirements.txt
Installing all the packages
!pip install -r requirements.txt
Requirement already satisfied: torch==1.13.0 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (1.13.0)
Requirement already satisfied: torchvision==0.14.0 in /opt/conda/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (0.14.0)
Collecting efficientnet_pytorch==0.7.1
Downloading efficientnet_pytorch-0.7.1.tar.gz (21 kB)
Preparing metadata (setup.py) ... done
Collecting albumentations==1.2.1
Downloading albumentations-1.2.1-py3-none-any.whl (116 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.7/116.7 kB 5.3 MB/s eta 0:00:00
Collecting tqdm==4.65.0
Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.1/77.1 kB 8.9 MB/s eta 0:00:00
Collecting Pillow==8.4.0
Downloading Pillow-8.4.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 32.8 MB/s eta 0:00:0000:0100:01
Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.7/site-packages (from torch==1.13.0->-r requirements.txt (line 2)) (4.4.0)
Requirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from torchvision==0.14.0->-r requirements.txt (line 3)) (2.28.2)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from torchvision==0.14.0->-r requirements.txt (line 3)) (1.21.6)
Requirement already satisfied: scipy in /opt/conda/lib/python3.7/site-packages (from albumentations==1.2.1->-r requirements.txt (line 6)) (1.7.3)
Requirement already satisfied: scikit-image>=0.16.1 in /opt/conda/lib/python3.7/site-packages (from albumentations==1.2.1->-r requirements.txt (line 6)) (0.19.3)
Requirement already satisfied: opencv-python-headless>=4.1.1 in /opt/conda/lib/python3.7/site-packages (from albumentations==1.2.1->-r requirements.txt (line 6)) (4.5.4.60)
Requirement already satisfied: PyYAML in /opt/conda/lib/python3.7/site-packages (from albumentations==1.2.1->-r requirements.txt (line 6)) (6.0)
Requirement already satisfied: qudida>=0.0.4 in /opt/conda/lib/python3.7/site-packages (from albumentations==1.2.1->-r requirements.txt (line 6)) (0.0.4)
Requirement already satisfied: scikit-learn>=0.19.1 in /opt/conda/lib/python3.7/site-packages (from qudida>=0.0.4->albumentations==1.2.1->-r requirements.txt (line 6)) (1.0.2)
Requirement already satisfied: tifffile>=2019.7.26 in /opt/conda/lib/python3.7/site-packages (from scikit-image>=0.16.1->albumentations==1.2.1->-r requirements.txt (line 6)) (2021.11.2)
Requirement already satisfied: imageio>=2.4.1 in /opt/conda/lib/python3.7/site-packages (from scikit-image>=0.16.1->albumentations==1.2.1->-r requirements.txt (line 6)) (2.25.0)
Requirement already satisfied: PyWavelets>=1.1.1 in /opt/conda/lib/python3.7/site-packages (from scikit-image>=0.16.1->albumentations==1.2.1->-r requirements.txt (line 6)) (1.3.0)
Requirement already satisfied: networkx>=2.2 in /opt/conda/lib/python3.7/site-packages (from scikit-image>=0.16.1->albumentations==1.2.1->-r requirements.txt (line 6)) (2.6.3)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from scikit-image>=0.16.1->albumentations==1.2.1->-r requirements.txt (line 6)) (23.0)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->torchvision==0.14.0->-r requirements.txt (line 3)) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->torchvision==0.14.0->-r requirements.txt (line 3)) (1.26.14)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.7/site-packages (from requests->torchvision==0.14.0->-r requirements.txt (line 3)) (2.1.1)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->torchvision==0.14.0->-r requirements.txt (line 3)) (2022.12.7)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.19.1->qudida>=0.0.4->albumentations==1.2.1->-r requirements.txt (line 6)) (3.1.0)
Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.19.1->qudida>=0.0.4->albumentations==1.2.1->-r requirements.txt (line 6)) (1.2.0)
Building wheels for collected packages: efficientnet_pytorch
Building wheel for efficientnet_pytorch (setup.py) ... done
Created wheel for efficientnet_pytorch: filename=efficientnet_pytorch-0.7.1-py3-none-any.whl size=16446 sha256=d870e4ba77c41d05a67b458d1a03a108aaee250b6f63fca1cad356a67702a3af
Stored in directory: /root/.cache/pip/wheels/96/3f/5f/13976445f67f3b4e77b054e65f7f4c39016e92e8358fe088db
Successfully built efficientnet_pytorch
Installing collected packages: tqdm, Pillow, efficientnet_pytorch, albumentations
Attempting uninstall: tqdm
Found existing installation: tqdm 4.64.1
Uninstalling tqdm-4.64.1:
Successfully uninstalled tqdm-4.64.1
Attempting uninstall: Pillow
Found existing installation: Pillow 9.4.0
Uninstalling Pillow-9.4.0:
Successfully uninstalled Pillow-9.4.0
Attempting uninstall: albumentations
Found existing installation: albumentations 1.3.0
Uninstalling albumentations-1.3.0:
Successfully uninstalled albumentations-1.3.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pandas-profiling 3.6.2 requires tqdm<4.65,>=4.48.2, but you have tqdm 4.65.0 which is incompatible.
Successfully installed Pillow-9.4.0 albumentations-1.2.1 efficientnet_pytorch-0.7.1 tqdm-4.65.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Exploratory Data Analysis
Before jumping right into modeling and watching the numbers go up or down, it's good to look at the data and try to make sense of it.
We'll do the same here: we will look at the distribution of each tabular feature in the training and testing datasets.
import os
import pandas as pd
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
from Scripts.config import Config

## Looking at the Data
train_df = pd.read_csv(Config.train_csv_path_2020, usecols=Config.TRAIN_COLS)
test_df = pd.read_csv(Config.test_csv_path_2020, usecols=Config.TEST_COLS)

## Creating an image_path for each image in the 2020 training & testing datasets
train_df['image_path'] = os.path.join(Config.train_folder_2020) + train_df['image_name'] + ".jpg"
test_df['image_path'] = os.path.join(Config.test_folder_2020) + test_df['image_name'] + ".jpg"
train_df.head()
 | image_name | patient_id | sex | age_approx | anatom_site_general_challenge | target | tfrecord | image_path |
---|---|---|---|---|---|---|---|---
0 | ISIC_2637011 | IP_7279968 | male | 45.0 | head/neck | 0 | 0 | ../input/jpeg-melanoma-512x512/train/ISIC_2637... |
1 | ISIC_0015719 | IP_3075186 | female | 45.0 | upper extremity | 0 | 0 | ../input/jpeg-melanoma-512x512/train/ISIC_0015... |
2 | ISIC_0052212 | IP_2842074 | female | 50.0 | lower extremity | 0 | 6 | ../input/jpeg-melanoma-512x512/train/ISIC_0052... |
3 | ISIC_0068279 | IP_6890425 | female | 45.0 | head/neck | 0 | 0 | ../input/jpeg-melanoma-512x512/train/ISIC_0068... |
4 | ISIC_0074268 | IP_8723313 | female | 55.0 | upper extremity | 0 | 11 | ../input/jpeg-melanoma-512x512/train/ISIC_0074... |
print(f"Number of Unique images id's in the training dataset are - {train_df['image_name'].nunique()} \n")
print(f"Number of Unique images id's in the training dataset are - {test_df['image_name'].nunique()}\n")
print(f"Total number of Unique patients id's in the training dataset are - {train_df['patient_id'].nunique()}\n")
print(f"Total number of Unique patients id's in the training dataset are - {test_df['patient_id'].nunique()}")
Number of Unique images id's in the training dataset are - 33126
Number of Unique images id's in the training dataset are - 10982
Total number of Unique patients id's in the training dataset are - 2056
Total number of Unique patients id's in the training dataset are - 690
Patient ID
patients_id_counts_train = train_df['patient_id'].value_counts()
patients_id_counts_test = test_df['patient_id'].value_counts()

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(19,10), dpi=80, sharex=False, sharey=False)
sns.distplot(patients_id_counts_train, ax=ax[0,0], color='#3300CC', kde=True)
ax[0,0].set_xlabel("Counts")
ax[0,0].set_ylabel('Frequency')
ax[0,0].set_title('Patient ID counts in training data')

sns.distplot(patients_id_counts_test, ax=ax[0,1], color='#FF0099', kde=True)
ax[0,1].set_xlabel("Counts")
ax[0,1].set_ylabel('Frequency')
ax[0,1].set_title('Patient ID counts in testing data')

sns.boxplot(patients_id_counts_train, ax=ax[1,0], color='#3300CC')
ax[1,0].set_xlabel('Counts')
ax[1,0].set_title('BoxPlot of Patient ID Counts in Train data')
sns.boxplot(patients_id_counts_test, ax=ax[1,1], color='#FF0099')
ax[1,1].set_xlabel('Counts')
ax[1,1].set_title('BoxPlot of Patient ID Counts in Test data');
Gender
fig, ax = plt.subplots(nrows=1, ncols=2, sharex=False, sharey=False, figsize=(20,5))
sns.countplot(x=train_df['sex'], color='#3300CC', ax=ax[0])
ax[0].set_ylabel("")
ax[0].set_xlabel('Gender Count')
ax[0].set_title("Gender Count in Training data")

sns.countplot(x=test_df['sex'], color="#FF0099", ax=ax[1])
ax[1].set_ylabel("")
ax[1].set_xlabel('Gender Count')
ax[1].set_title("Gender Count in Testing data");
Age
fig, ax = plt.subplots(nrows=1, ncols=2, sharex=False, sharey=False, figsize=(20,5))
sns.countplot(train_df['age_approx'], color='#3300CC', ax=ax[0])
ax[0].set_title("Distribution of Age feature in Training Data")
ax[0].set_xlabel("Age")
ax[0].set_ylabel('Frequency')

sns.countplot(test_df['age_approx'], color='#FF0099', ax=ax[1])
ax[1].set_title("Distribution of Age feature in Testing Data")
ax[1].set_xlabel("Age")
ax[1].set_ylabel('Frequency');
age_dist_train_test = pd.concat(
    [train_df['age_approx'].describe(percentiles=[0.01, 0.05, 0.10, 0.15, 0.25, 0.50, 0.75, 0.90, 0.91, 0.92, 0.93,
                                                  0.94, 0.95, 0.96, 0.97, 0.98, 0.99]),
     test_df['age_approx'].describe(percentiles=[0.01, 0.05, 0.10, 0.15, 0.25, 0.50, 0.75, 0.90, 0.91, 0.92, 0.93,
                                                 0.94, 0.95, 0.96, 0.97, 0.98, 0.99])
    ], axis=1)
age_dist_train_test.columns = ['Train_Age', 'Test_Age']
age_dist_train_test.T
 | count | mean | std | min | 1% | 5% | 10% | 15% | 25% | 50% | ... | 91% | 92% | 93% | 94% | 95% | 96% | 97% | 98% | 99% | max |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Train_Age | 33058.0 | 48.870016 | 14.380360 | 0.0 | 20.0 | 25.0 | 30.0 | 35.0 | 40.0 | 50.0 | ... | 70.0 | 70.0 | 70.0 | 70.0 | 70.0 | 75.0 | 75.0 | 75.0 | 80.0 | 90.0 |
Test_Age | 10982.0 | 49.525587 | 14.370589 | 10.0 | 20.0 | 30.0 | 30.0 | 35.0 | 40.0 | 50.0 | ... | 70.0 | 70.0 | 70.0 | 70.0 | 75.0 | 75.0 | 80.0 | 80.0 | 85.0 | 90.0 |
2 rows × 22 columns
anatom_site_general_challenge
fig, ax = plt.subplots(nrows=1, ncols=2, sharex=False, sharey=False, figsize=(20,5))

train_anatom_site_general = train_df["anatom_site_general_challenge"].value_counts().sort_values(ascending=False)
test_anatom_site_general = test_df.anatom_site_general_challenge.value_counts().sort_values(ascending=False)

sns.barplot(x=train_anatom_site_general.index.values, y=train_anatom_site_general.values, ax=ax[0]);
ax[0].set_xlabel("");
labels = ax[0].get_xticklabels();
ax[0].set_xticklabels(labels, rotation=90);
ax[0].set_title("Image locations in train");

sns.barplot(x=test_anatom_site_general.index.values, y=test_anatom_site_general.values, ax=ax[1]);
ax[1].set_xlabel("");
labels = ax[1].get_xticklabels();
ax[1].set_xticklabels(labels, rotation=90);
ax[1].set_title("Image locations in test");
Target
train_df['target'].value_counts(normalize=True, dropna=False) * 100
0 98.237034
1 1.762966
Name: target, dtype: float64
Sex Vs. Target & anatom_site_general_challenge Vs. Target
fig, ax = plt.subplots(nrows=1, ncols=2, sharex=False, sharey=False, figsize=(20,5))
sns.countplot(x='target', hue='sex', data=train_df, ax=ax[0])
ax[0].set_xlabel("Target", fontsize=15)
ax[0].set_ylabel('Count', fontsize=15)
ax[0].set_title("Sex Vs. Target", fontsize=20)

sns.countplot(x='target', hue='anatom_site_general_challenge', data=train_df, ax=ax[1])
ax[1].set_xlabel("Target", fontsize=15)
ax[1].set_ylabel('Count', fontsize=15)
ax[1].set_title("anatom_site_general_challenge Vs. Target", fontsize=20);
Age Vs. Target
plt.figure(figsize=(20,5))
sns.distplot(train_df.loc[train_df['target'] == 0]['age_approx'], bins=50, label='Benign')
sns.distplot(train_df.loc[train_df['target'] == 1]['age_approx'], bins=50, label='Malignant')
plt.legend(loc='best')
plt.ylabel('Density', fontsize=15)
plt.xlabel('Age', fontsize=15)
plt.title("Age Vs. Target distribution", fontsize=20);
Training our model with no tabular features
- For now, we will train the model on a single fold only.
from Scripts.train_model import run_model
run_model(fold=0, train_df=train_df)
Downloading: "https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b5-b6417697.pth" to /root/.cache/torch/hub/checkpoints/efficientnet-b5-b6417697.pth
100%|██████████| 5/5 [4:32:49<00:00, 3273.97s/it]
Loaded pretrained weights for efficientnet-b5
Validation loss decreased (inf --> 0.071890). Saving model ...
Epoch : 1 | train_loss : 0.0986 | valid_loss : 0.0719 | valid_auc : 0.8579
Validation loss decreased (0.071890 --> 0.067238). Saving model ...
Epoch : 2 | train_loss : 0.0749 | valid_loss : 0.0672 | valid_auc : 0.8902
EarlyStopping counter: 1 out of 2
Epoch : 3 | train_loss : 0.0708 | valid_loss : 0.0709 | valid_auc : 0.8746
Validation loss decreased (0.067238 --> 0.065598). Saving model ...
Epoch : 4 | train_loss : 0.0706 | valid_loss : 0.0656 | valid_auc : 0.8971
EarlyStopping counter: 1 out of 2
Epoch : 5 | train_loss : 0.0683 | valid_loss : 0.0673 | valid_auc : 0.9015
Total training time: 16369.841 seconds
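The train function returns a results dictionary of per-epoch losses and AUC scores, although run_model as written does not pass it back. A sketch, assuming we add `return model_results` at the end of run_model, of how plot_loss_curves from Scripts.utils would visualize it:
from Scripts.utils import plot_loss_curves

## Hypothetical: requires run_model to end with `return model_results`.
results = run_model(fold=0, train_df=train_df)
plot_loss_curves(results)  # left: train/valid loss per epoch, right: validation AUC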
Generating submission file on the test dataset
from Scripts.predict_on_test import predict_on_test_and_generate_submission_file
= "../working/Models/efficientnet_b5_checkpoint_fold_0.pt"
model_path =test_df,
predict_on_test_and_generate_submission_file(test_df=False,
use_tabular_features=model_path) model_path
Loaded pretrained weights for efficientnet-b5
Building Gradio demo
Now that we have finalized the model we'll be deploying, we will use this EfficientNet-B5 model to predict on new images.
For this project I will be using the Gradio library (a product of HuggingFace).
Why Gradio? The homepage of Gradio describes it as: > Gradio is the fastest way to build/demo your machine learning model with a friendly web interface so that anyone can use it, anywhere.
Gradio Overview
In general, we can have any combination of inputs: images, tabular data, text, numbers, video, audio, etc.
In our case we have images as inputs, and the output is returned as the probability of the patient suffering from melanoma skin cancer.
Gradio provides an Interface class that maps from the input(s) to the output(s):
gr.Interface(fn, inputs, outputs)
where fn is a python function that maps the inputs to the outputs.
This gives us an inputs -> model/function -> outputs workflow where the inputs and outputs can be almost anything we want.
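As a minimal illustration (a toy sketch, not the app we build below), an Interface that squares a number:
import gradio as gr

def square(x):
    return x ** 2

toy_demo = gr.Interface(fn=square, inputs=gr.Number(), outputs=gr.Number())
# toy_demo.launch()  # uncomment to serve it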
## Installing Gradio and importing it.
try:
import gradio as gr
except:
!pip install -q gradio
import gradio as gr
print(f"Gradio version : {gr.__version__}")
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Gradio version : 3.27.0
Creating a model instance and putting it on the CPU
First, let's make sure our EfficientNet-B5 model is on the CPU.
import torch
import numpy as np
import albumentations as A
from Scripts.model import Model
efficientnet_b5_model = Model()
## Must wrap our model in nn.DataParallel() if we used multi-gpus to train the
## model, otherwise we would get a state_dict keys mismatch error.
efficientnet_b5_model = torch.nn.DataParallel(efficientnet_b5_model)
efficientnet_b5_model.load_state_dict(
    torch.load(
        f='efficientnet_b5_checkpoint_fold_0.pt',
        map_location=torch.device("cpu")
    )
)
next(iter(efficientnet_b5_model.parameters())).device
Loaded pretrained weights for efficientnet-b5
device(type='cpu')
Creating a function to predict on a single image
We create a function that takes an input: image -> transform -> predict with EfficientNetB5 -> output: probability.
This will be the fn parameter for our Gradio Interface.
## predict on a single image
def predict_on_single_image(img):
    """
    Function takes an image, applies the same transforms used
    during model training (normalizing the statistics of the
    image), converts the numpy array into a torch tensor and
    passes it through the model to get the prediction
    probability of the patient having melanoma.
    """
    img = np.array(img)
    transforms = A.Compose([A.Resize(512, 512),
                            A.Normalize(mean=(0.485, 0.456, 0.406),
                                        std=(0.229, 0.224, 0.225),
                                        max_pixel_value=255.0,
                                        always_apply=True
                                        )]
                           )
    img = transforms(image=img)['image']
    image = np.transpose(img, (2, 0, 1)).astype(np.float32)
    image = torch.tensor(image, dtype=torch.float).unsqueeze(dim=0)
    efficientnet_b5_model.eval()
    with torch.inference_mode():
        probs = torch.sigmoid(efficientnet_b5_model(image))
        prob_of_melanoma = probs[0].item()
        prob_of_not_having_melanoma = 1 - prob_of_melanoma
        pred_label = {"Probability of Having Melanoma": prob_of_melanoma,
                      "Probability of Not having Melanoma": prob_of_not_having_melanoma}
    return pred_label
Let's test our function by predicting on an image from the training dataset.
We'll take some images from the training dataset and extract their paths as a list. Then we'll open an image with Image.open().
Finally, we pass the image to predict_on_single_image().
import torch
import pathlib
import numpy as np
from PIL import Image
## Taking some images out
images = train_df.iloc[1:10,]
images_paths_list = images['image_path'].tolist()

## Opening an Image
img = Image.open(images_paths_list[8])

## Predicting on the image using the function
predict_on_single_image(img)
{'Probability of Having Melanoma': 0.5395416617393494,
'Probability of Not having Melanoma': 0.46045833826065063}
Creating a list of example images
Before we create the demo, we first create a list of examples.
Gradio's Interface class takes an examples parameter, which is a list of lists.
So, we create a list of lists containing the filepaths of the images.
Our Gradio demo will showcase these as example inputs so that people can try the demo.
example_list = [[str(file_path)] for file_path in images_paths_list]
example_list
[['../input/jpeg-melanoma-512x512/train/ISIC_0015719.jpg'],
['../input/jpeg-melanoma-512x512/train/ISIC_0052212.jpg'],
['../input/jpeg-melanoma-512x512/train/ISIC_0068279.jpg']]
Building a Gradio interface
Putting everything together:
Gradio Interface workflow: input: image -> transform -> predict with the EfficientNetB5 model -> probability: output
We can do this with the gr.Interface() class and the following parameters:
- fn: a python function that maps from inputs to outputs, in our case the predict_on_single_image() function.
- inputs: the input to our Interface, such as an image via gradio.Image().
- outputs: the output of the Interface once the inputs are processed by fn, such as a label via gradio.Label() (in our case, probabilities).
- examples: a list of examples to showcase in the demo.
- title: a string title for the demo.
- description: a string description of the demo.
Once we've created a demo instance of gr.Interface(), we launch it with the demo.launch() command.
import gradio as gr
## Creating the title and description strings
= "Melanoma Cancer Detection App"
title = 'An efficientnet-b5 model that predicts the probability of a patient having melanoma skin cancer or not.'
description
## Create the Gradio demo
= gr.Interface(fn=predict_on_single_image,
demo =gr.Image(type='pil'),
inputs=[gr.Label(label='Probabilities')],
outputs=example_list, title=title,
examples=description)
description
## Launch the demo!
=False, share=True) demo.launch(debug
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://e228186381e2781228.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 420, in run_asgi
self.scope, self.receive, self.send
File "/opt/conda/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/fastapi/applications.py", line 270, in __call__
await super().__call__(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/applications.py", line 124, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/cors.py", line 84, in __call__
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/opt/conda/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/opt/conda/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 706, in __call__
await route.handle(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/opt/conda/lib/python3.7/site-packages/fastapi/routing.py", line 238, in app
dependant=dependant, values=values, is_coroutine=is_coroutine
File "/opt/conda/lib/python3.7/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
return await dependant.call(**values)
File "/opt/conda/lib/python3.7/site-packages/gradio/routes.py", line 345, in file
f"File cannot be fetched: {path_or_url}. All files must contained within the Gradio python app working directory, or be a temp file created by the Gradio python app."
ValueError: File cannot be fetched: /kaggle/input/jpeg-melanoma-512x512/train/ISIC_0015719.jpg. All files must contained within the Gradio python app working directory, or be a temp file created by the Gradio python app.
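All of these tracebacks (one per example image) fail the same way: Gradio can only serve example files that live inside the app's working directory, while our example paths point at the read-only /kaggle/input dataset. One possible fix, as a sketch, is to copy the example images into the working directory and rebuild the example list before creating the Interface:
import shutil
from pathlib import Path

examples_dir = Path("examples")
examples_dir.mkdir(exist_ok=True)

local_example_list = []
for src in images_paths_list:
    dst = examples_dir / Path(src).name
    shutil.copy(src, dst)                  # copy into the Gradio working directory
    local_example_list.append([str(dst)])

## Recreate the demo with examples=local_example_list so Gradio can fetch them.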
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 420, in run_asgi
self.scope, self.receive, self.send
File "/opt/conda/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/fastapi/applications.py", line 270, in __call__
await super().__call__(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/applications.py", line 124, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/cors.py", line 84, in __call__
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/opt/conda/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/opt/conda/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 706, in __call__
await route.handle(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/opt/conda/lib/python3.7/site-packages/fastapi/routing.py", line 238, in app
dependant=dependant, values=values, is_coroutine=is_coroutine
File "/opt/conda/lib/python3.7/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
return await dependant.call(**values)
File "/opt/conda/lib/python3.7/site-packages/gradio/routes.py", line 345, in file
f"File cannot be fetched: {path_or_url}. All files must contained within the Gradio python app working directory, or be a temp file created by the Gradio python app."
ValueError: File cannot be fetched: /kaggle/input/jpeg-melanoma-512x512/train/ISIC_0052212.jpg. All files must contained within the Gradio python app working directory, or be a temp file created by the Gradio python app.
(The same "File cannot be fetched" traceback repeats for each of the remaining example images, so the duplicates are omitted here.)
Wow!
Our application is up and running. Note that the share link is temporary; it stays live for only 72 hours. For permanent hosting, we can upload our Gradio app interface to HuggingFace Spaces.
(The "File cannot be fetched" errors above occur because the example images point to /kaggle/input/..., which is outside the Gradio app's working directory; when we package the app for deployment below, we copy the example images into a folder next to app.py, which avoids this.)
Now, manually download all the files and folders from the Kaggle output, along with this Kaggle kernel, to your local machine.
Turning our Melanoma skin cancer detection Gradio Demo into a deployable app
We’ll deploy the demo application on HuggingFace Spaces.
What is HuggingFace Spaces?
It is a resource that allows anybody to host and share machine learning applications.
Deployed Gradio App Structure
To upload our Gradio app, we'll want to put everything together into a single directory.
For example, our demo might live at the path demos/melanoma_skin_cancer_files
with the following structure:
demos/
└── melanoma_skin_cancer_files/
├── efficientnet_b5_checkpoint_fold_0.pt
├── app.py
├── examples/
│ ├── image_1.jpg
│ ├── image_2.jpg
│ └── image_3.jpg
├── model.py
└── requirements.txt
Where:
- efficientnet_b5_checkpoint_fold_0.pt is our trained model.
- app.py contains our Gradio app. Note: app.py is the default filename HuggingFace Spaces looks for when we deploy our apps there.
- examples contains sample images to showcase the demo of our Gradio application.
- model.py contains the main model/transformations code associated with our model.
- requirements.txt lists the dependencies/packages needed to run our application, such as torch, torchvision, albumentations, efficientnet_pytorch, gradio, and numpy.
Creating a demo folder to store our Melanoma skin cancer App files
To begin, we'll create an empty directory demos/ that will contain all the necessary files for the application.
We can achieve this using Python's pathlib.Path("path_of_dir") to establish the directory path and then pathlib.Path("path_of_dir").mkdir() to create it.
## Importing Packages
import os
import shutil
from pathlib import Path

############### ROOT_DIR: I have put the files in my E: drive
ROOT_DIR = "\\".join(os.getcwd().split("\\")[:2])

## Create Melanoma skin cancer demo path
melanoma_app_demo_path = Path(f"{ROOT_DIR}/demos/melanoma_skin_cancer_files")

## Removing files that might already exist and creating a new directory.
if melanoma_app_demo_path.exists():
    shutil.rmtree(melanoma_app_demo_path)
    melanoma_app_demo_path.mkdir(parents=True,   # Do we want to make parent folders?
                                 exist_ok=True)  # Create even if it already exists?
else:
    ## If the path doesn't exist, create one
    melanoma_app_demo_path.mkdir(parents=True, exist_ok=True)
Creating a folder of example images to use with our Melanoma skin cancer demo
Now we’ll create an empty directory called examples
and store some images (namely ISIC_0015719.jpg, ISIC_0052212.jpg, ISIC_0068279.jpg, and ISIC_0149568.jpg) from the training dataset provided in the competition, which we downloaded manually from here. Download these images from the training dataset.
Put these images in a Data/Input folder (or whatever you want to call it).
To do so we'll:
- Create an empty directory examples/ within the demos/melanoma_skin_cancer_files directory.
- Download the images mentioned above from the training dataset via the link.
- Collect the filepaths into a list.
- Copy these images from the train dataset to the demos/melanoma_skin_cancer_files/examples/ directory.
import shutil
from pathlib import Path

## Create examples directory
melanoma_app_examples_path = melanoma_app_demo_path / "examples"
melanoma_app_examples_path.mkdir(parents=True, exist_ok=True)

## Collecting the image paths of 4 images from the training folder
melanoma_app_examples = [
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/train/ISIC_0015719.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/train/ISIC_0052212.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/train/ISIC_0068279.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/train/ISIC_0149568.jpg"),
]
for example in melanoma_app_examples:
    destination = melanoma_app_examples_path / example.name
    shutil.copy(src=example, dst=destination)
## Collecting the image paths of some more images, this time from the testing folder
melanoma_app_examples = [
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/test/ISIC_0052060.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/test/ISIC_0082004.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/test/ISIC_0082785.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/test/ISIC_0105104.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/test/ISIC_0112420.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/test/ISIC_0155983.jpg"),
    Path(f"{ROOT_DIR}/input/jpeg-melanoma-512x512/test/ISIC_0171865.jpg"),
]
for example in melanoma_app_examples:
    destination = melanoma_app_examples_path / example.name
    shutil.copy(src=example, dst=destination)
Now, to verify our example images are present, let's list the contents of the demos/melanoma_skin_cancer_files/examples/ directory with os.listdir() and then format the filepaths into a list of lists (to make them compatible with the examples parameter of gradio.Interface()).
= [["examples/" + example] for example in os.listdir(melanoma_app_examples_path)]
example_list example_list
[['examples/ISIC_0015719.jpg'],
['examples/ISIC_0052060.jpg'],
['examples/ISIC_0052212.jpg'],
['examples/ISIC_0068279.jpg'],
['examples/ISIC_0082004.jpg'],
['examples/ISIC_0082785.jpg'],
['examples/ISIC_0105104.jpg'],
['examples/ISIC_0112420.jpg'],
['examples/ISIC_0149568.jpg'],
['examples/ISIC_0155983.jpg'],
['examples/ISIC_0171865.jpg']]
Moving our trained EfficientNet-B5 model into our Melanoma demo directory.
We previously saved our model binary file into the Models
directory while training as Models/efficientnet_b5_checkpoint_fold_0.pt
.
We use Python's shutil.move() function, passing in the src (source path of the target file) and dst (destination path the file will be moved to) parameters.
## Importing Libraries
import shutil

## Create a source path for our target model
efficientnet_b5_model_path = f"{ROOT_DIR}\\output\\working\\Models\\efficientnet_b5_checkpoint_fold_0.pt"

## Create a destination path for our target model
efficientnet_b5_model_destination = melanoma_app_demo_path / efficientnet_b5_model_path.split("\\")[-1]

## Try to move the file
try:
    print(f"Attempting to move the {efficientnet_b5_model_path} to {efficientnet_b5_model_destination}")
    ## Move the model
    shutil.move(src=efficientnet_b5_model_path, dst=efficientnet_b5_model_destination)
    print("Model move completed")
## If the model has already been moved, check if it exists
except:
    print(f"No model found at {efficientnet_b5_model_path}, perhaps it's already moved.")
    print(f"Model already exists at {efficientnet_b5_model_destination}: {efficientnet_b5_model_destination.exists()}")
Attempting to move the E:\Melanoma_skin_cancer_detection\output\working\Models\efficientnet_b5_checkpoint_fold_0.pt to E:\Melanoma_skin_cancer_detection\demos\melanoma_skin_cancer_files\efficientnet_b5_checkpoint_fold_0.pt
Model move completed
Turning our EfficientNet-B5 model into a Python script (model.py)
Our current model's state_dict() is saved at demos/melanoma_skin_cancer_files/efficientnet_b5_checkpoint_fold_0.pt.
To load it, we can use model.load_state_dict() with torch.load(). But before that, we need to instantiate a model.
To do this in a modular fashion, we'll create a script model.py that contains the model definition in a class called Model().
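As a quick sketch of what that loading will look like (assuming the checkpoint sits in the current working directory and using the Model class we define below):

import torch
from model import Model  # the Model class we write in model.py below

## Instantiate the architecture first, then load the trained weights into it
model = Model()
## The checkpoint was saved from a model wrapped in nn.DataParallel (multi-GPU training),
## so we wrap it the same way here; otherwise the state_dict keys won't match.
model = torch.nn.DataParallel(model)
state_dict = torch.load("efficientnet_b5_checkpoint_fold_0.pt", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)
model.eval()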
## Now, to check which directory we are currently in, we can use the following code
os.getcwd()
'E:\\Melanoma_skin_cancer_detection\\notebooks'
Now we will move into the demos directory, where we will write some helper utilities.
In cd ../demos/, the .. means we move out of the notebooks directory, and demos/ means we move into the demos directory.

cd ../demos/
E:\Melanoma_skin_cancer_detection\demos
%%writefile melanoma_skin_cancer_files/model.py
import torch
import torch.nn as nn
import torch.nn.functional as F
from efficientnet_pytorch import EfficientNet


class Model(nn.Module):
    """
    Creates an efficientnet-b5 model instance.
    """

    def __init__(self, model_name="efficientnet-b5", pool_type=F.adaptive_avg_pool2d):
        super().__init__()
        self.pool_type = pool_type
        self.model_name = model_name
        ## Pretrained EfficientNet backbone used as a feature extractor
        self.backbone = EfficientNet.from_pretrained(model_name)
        in_features = getattr(self.backbone, "_fc").in_features
        ## Single-logit head: probability of melanoma after a sigmoid
        self.classifier = nn.Linear(in_features, 1)

    def forward(self, x):
        ## Pool backbone features to 1x1, flatten, then classify
        features = self.pool_type(self.backbone.extract_features(x), 1)
        features = features.view(x.size(0), -1)
        return self.classifier(features)
Writing melanoma_skin_cancer_files/model.py
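As an optional sanity check (a sketch, assuming efficientnet_pytorch is installed and can download the pretrained backbone weights), we can confirm the model emits a single logit per image:

import torch
from model import Model  # run this from inside the melanoma_skin_cancer_files/ directory

model = Model()
model.eval()
with torch.inference_mode():
    dummy = torch.randn(1, 3, 512, 512)  # one fake 512x512 RGB image
    out = model(dummy)
print(out.shape)  # expected: torch.Size([1, 1]), one melanoma logit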
Turning our Melanoma Skin Cancer Gradio App into a Python Script (app.py)
%%writefile melanoma_skin_cancer_files/app.py
## Importing Libraries
import os
import torch
import numpy as np
import gradio as gr
from model import Model
import albumentations as A

## Creating a model instance
efficientnet_b5_model = Model()
## Must wrap our model in nn.DataParallel() if multiple GPUs were used to train
## the model; otherwise we would get a state_dict keys mismatch error.
efficientnet_b5_model = torch.nn.DataParallel(efficientnet_b5_model)
efficientnet_b5_model.load_state_dict(
    torch.load(
        f="efficientnet_b5_checkpoint_fold_0.pt",
        map_location=torch.device("cpu"),
    )
)


## Predict on a single image
def predict_on_single_image(img):
    """
    Takes an image, applies the same transformations used at
    training time (resizing and normalizing the image statistics),
    converts the numpy array into a torch tensor, and passes it
    through the model to get the probability of the patient
    having melanoma.
    """
    img = np.array(img)
    transforms = A.Compose(
        [
            A.Resize(512, 512),
            A.Normalize(
                mean=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225),
                max_pixel_value=255.0,
                always_apply=True,
            ),
        ]
    )
    img = transforms(image=img)["image"]
    image = np.transpose(img, (2, 0, 1)).astype(np.float32)
    image = torch.tensor(image, dtype=torch.float).unsqueeze(dim=0)
    efficientnet_b5_model.eval()
    with torch.inference_mode():
        probs = torch.sigmoid(efficientnet_b5_model(image))
        prob_of_melanoma = probs[0].item()
        prob_of_not_having_melanoma = 1 - prob_of_melanoma
        pred_label = {
            "Probability of Having Melanoma": prob_of_melanoma,
            "Probability of Not having Melanoma": prob_of_not_having_melanoma,
        }
    return pred_label


## Gradio App
## Examples directory path
melanoma_app_examples_path = "examples"

## Creating the title and description strings
title = "Melanoma Cancer Detection App"
description = "An efficientnet-b5 model that predicts the probability of a patient having melanoma skin cancer or not."
example_list = [["examples/" + example] for example in os.listdir(melanoma_app_examples_path)]

## Create the Gradio demo
demo = gr.Interface(
    fn=predict_on_single_image,
    inputs=gr.Image(type="pil"),
    outputs=[gr.Label(label="Probabilities")],
    examples=example_list,
    title=title,
    description=description,
)

## Launch the demo!
demo.launch(debug=False, share=True)
Writing melanoma_skin_cancer_files/app.py
Creating a requirements.txt file for our Gradio App (requirements.txt)
This is the last file we need to create for our application.
It lists all the necessary packages for our Gradio application.
When we deploy our demo app to HuggingFace Spaces, it will read through this file and install the dependencies we specify so our application can run.
%%writefile melanoma_skin_cancer_files/requirements.txt
torch==1.13.0
torchvision==0.14.0
albumentations==1.2.1
efficientnet_pytorch==0.7.1
Pillow==8.4.0
numpy==1.22.4
gradio==3.1.4
Writing melanoma_skin_cancer_files/requirements.txt
Deploying our Application to HuggingFace Spaces
To deploy our demo, there are two main options for uploading it to HuggingFace Spaces.
NOTE: To host any application on HuggingFace, we first need to sign up for a free HuggingFace account.
Running our Application locally
- Open the terminal or command prompt.
- Change into the melanoma_skin_cancer_files directory (cd melanoma_skin_cancer_files).
- Create a virtual environment with python3 -m venv env (or python -m venv env).
- Activate the environment (source env/Scripts/activate).
- Install the dependencies in requirements.txt using pip install -r requirements.txt. If you run into errors, you might need to upgrade pip using pip install --upgrade pip.
- Run the app (python3 app.py).

This should result in a Gradio demo running locally at a URL such as http://127.0.0.1:7860/.
Uploading to Hugging Face
We've verified that our melanoma skin cancer detection application works on our local system.
To upload our application to Hugging Face Spaces, we need to do the following.
- Sign up for a Hugging Face account.
- Start a new Hugging Face Space by going to your profile at the top right corner and selecting New Space.
- Give the Space a name, like Chirag1994/melanoma_skin_cancer_detection_app.
- Select a license (I am using the MIT license).
- Select Gradio as the Space SDK (software development kit).
- Choose whether your Space is Public or Private (I am keeping it Public).
- Click Create Space.
- Clone the repository locally by running git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME] in the terminal or command prompt. In my case: git clone https://huggingface.co/spaces/Chirag1994/melanoma_skin_cancer_detection_app.
- Copy/move the contents of the melanoma_skin_cancer_files folder into the cloned repo folder.
- To upload and track larger files (e.g., files greater than 10MB), we need to install Git LFS, which stands for Git Large File Storage.
- Open the cloned directory (I'm using VS Code) and, in the terminal (git bash in my case), after installing Git LFS, run git lfs install, then start tracking the file we want to track. For example: git lfs track "efficientnet_b5_checkpoint_fold_0.pt".
- Create a new .gitignore file listing the files & folders we don't want git to track, like:
  __pycache__/
  .vscode/
  venv/
  .gitignore
  .gitattributes
- Add the rest of the files and commit them with:
  git add .
  git commit -m "commit message that you want"
- Push (upload) the files to Hugging Face:
  git push
- It might take a couple of minutes to finish, and then the app will be up and running.
Our Final Application deployed on HuggingFace Spaces
# IPython is a library to help make Python interactive
from IPython.display import IFrame

# Embed the Melanoma Skin Cancer Detection Gradio demo
IFrame(src="https://hf.space/embed/Chirag1994/Melanoma_Skin_Cancer_Detection_App/+", width=900, height=750)