Features
Utilities for extracting FFT features and CLIP embeddings from images.
- features.extract_clip_embedding(image_path, clip_processor, clip_model, device)[source]
Extracts CLIP (Contrastive Language-Image Pretraining) embeddings from an image.
This function loads an image from the specified path, processes it using the given CLIP processor, and computes the corresponding image embedding using the CLIP model. The model is run in inference mode on the specified device (e.g., CPU or GPU). The output is a NumPy array representing the extracted embedding.
- Parameters:
image_path (str) – Path to the image file from which the embedding is to be extracted.
clip_processor (transformers.CLIPProcessor) – The CLIP processor used to preprocess the image.
clip_model (transformers.CLIPModel) – The pre-trained CLIP model used to extract image features.
device (torch.device) – The device (CPU or GPU) where the model will run.
- Returns:
A 1D NumPy array representing the CLIP image embedding. Returns None if an error occurs.
- Return type:
numpy.ndarray or None
Example
from transformers import CLIPProcessor, CLIPModel
from features import extract_clip_embedding

clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
embedding = extract_clip_embedding("image.jpg", clip_processor, clip_model, device="cuda")
print(embedding)  # Prints the CLIP embedding for the image.
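A minimal sketch of what the documented behaviour implies internally is given below. The use of PIL for image loading, the call to get_image_features, and the blanket try/except are illustrative assumptions, not code taken from the module's source.

import torch
from PIL import Image

def extract_clip_embedding_sketch(image_path, clip_processor, clip_model, device):
    # Hypothetical re-implementation following the documented behaviour.
    try:
        image = Image.open(image_path).convert("RGB")              # load the image from disk
        inputs = clip_processor(images=image, return_tensors="pt").to(device)
        clip_model = clip_model.to(device)                         # run the model on the target device
        with torch.no_grad():                                      # inference mode, no gradients
            features = clip_model.get_image_features(**inputs)     # shape (1, embedding_dim)
        return features.squeeze(0).cpu().numpy()                   # 1D NumPy array
    except Exception:
        return None                                                # documented fallback on error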
- features.extract_fft_features(image)[source]
Extracts features from an image using Fast Fourier Transform (FFT).
This function performs FFT on the input image to convert it from the spatial domain to the frequency domain. It then applies a circular mask to filter the low-frequency components and extracts the magnitude spectrum of the filtered result. The extracted features are normalized by subtracting the mean and dividing by the standard deviation (with a fallback to 1 if the standard deviation is zero).
- Parameters:
image (numpy.ndarray) – A 2D NumPy array representing the grayscale image to extract features from.
- Returns:
A 1D array of normalized FFT features extracted from the image. Returns None if an error occurs.
- Return type:
numpy.ndarray or None
Example
import numpy as np
from features import extract_fft_features

image = np.random.rand(256, 256)  # Example image.
features = extract_fft_features(image)
print(features)  # Prints the normalized FFT features of the image.
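The pipeline described above (2D FFT, circular mask, magnitude spectrum, mean/std normalization with a fallback of 1) can be sketched as follows. The mask radius and the assumption that the mask suppresses the low-frequency centre are illustrative guesses, not values taken from the source.

import numpy as np

def extract_fft_features_sketch(image, mask_radius=16):
    # Hypothetical sketch of the documented FFT feature pipeline.
    try:
        fft = np.fft.fftshift(np.fft.fft2(image))      # frequency domain, low frequencies at the centre
        rows, cols = image.shape
        y, x = np.ogrid[:rows, :cols]
        dist = np.sqrt((y - rows // 2) ** 2 + (x - cols // 2) ** 2)
        fft[dist <= mask_radius] = 0                    # assumed: zero out the low-frequency centre
        magnitude = np.abs(fft).ravel()                 # magnitude spectrum as a 1D feature vector
        std = magnitude.std()
        return (magnitude - magnitude.mean()) / (std if std > 0 else 1.0)
    except Exception:
        return None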
- features.normalize_array(array, target_size)[source]
Normalizes the size of a 1D array to the specified target size.
This function adjusts the size of the input array. If the array is longer than the target size, it is truncated. If it is shorter, it is padded with zeros at the end to match the target size. If the array is already of the target size, it is returned as is.
- Parameters:
array (numpy.ndarray) – A 1D NumPy array to be normalized.
target_size (int) – The desired size of the array.
- Returns:
A 1D NumPy array with the size normalized to the target size.
- Return type:
numpy.ndarray
Example
import numpy as np
from features import normalize_array

array = np.array([1, 2, 3, 4, 5])
normalized_array = normalize_array(array, target_size=7)
print(normalized_array)  # Output: [1 2 3 4 5 0 0]
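Because the truncate-or-pad rule is fully specified above, a direct sketch is short (the function name is illustrative):

import numpy as np

def normalize_array_sketch(array, target_size):
    # Truncate or zero-pad a 1D array to target_size, as documented above.
    if array.size > target_size:
        return array[:target_size]                           # drop the excess elements
    if array.size < target_size:
        return np.pad(array, (0, target_size - array.size))  # zero-pad at the end
    return array                                             # already the right size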
- features.prepare_combined_features(paths_labels, clip_processor, clip_model, device)[source]
Prepares combined features for a dataset of images using FFT and CLIP embeddings.
This function processes a list of image paths and their corresponding labels. For each image:
- FFT-based features are extracted after converting the image to grayscale and resizing it.
- CLIP embeddings are extracted using a pre-trained CLIP model.
- The two feature sets are concatenated into a single combined feature vector.
Any image that fails to process (either due to missing or incorrect features) is logged, and a record of these failures is saved in a file. The function returns the combined feature vectors and labels for the entire dataset.
- Parameters:
paths_labels (list of tuples) – A list of tuples, where each tuple contains:
- path (str): The file path of the image.
- label (int): The label corresponding to the image.
clip_processor (transformers.CLIPProcessor) – The CLIP processor used to preprocess the image.
clip_model (transformers.CLIPModel) – The pre-trained CLIP model used to extract image features.
device (torch.device) – The device (CPU or GPU) on which the CLIP model will run.
- Returns:
- A tuple containing:
np.ndarray: A 2D array of combined feature vectors, where each row is a concatenated FFT and CLIP feature vector.
np.ndarray: A 1D array of labels corresponding to the images.
- Return type:
tuple
Example
from transformers import CLIPProcessor, CLIPModel
from features import prepare_combined_features

paths_labels = [("image1.jpg", 0), ("image2.jpg", 1)]
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
features, labels = prepare_combined_features(paths_labels, clip_processor, clip_model, device="cuda")
print(features)  # Prints the combined features.
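Putting the pieces together, the per-image loop described above might look roughly like the sketch below. The grayscale resize dimensions, the fixed FFT feature length, and the failure-log filename are placeholders, not the module's actual values.

import numpy as np
from PIL import Image
from features import extract_clip_embedding, extract_fft_features, normalize_array

def prepare_combined_features_sketch(paths_labels, clip_processor, clip_model, device, fft_size=512):
    # Hypothetical sketch of the documented feature-combination loop.
    combined, labels, failures = [], [], []
    for path, label in paths_labels:
        gray = np.array(Image.open(path).convert("L").resize((256, 256)))   # assumed resize target
        fft_feat = extract_fft_features(gray)
        clip_feat = extract_clip_embedding(path, clip_processor, clip_model, device)
        if fft_feat is None or clip_feat is None:
            failures.append(path)                               # remember images that failed to process
            continue
        fft_feat = normalize_array(fft_feat, fft_size)          # fixed-length FFT block (assumed length)
        combined.append(np.concatenate([fft_feat, clip_feat]))  # combined feature vector
        labels.append(label)
    if failures:
        with open("failed_images.txt", "w") as fh:              # placeholder log filename
            fh.write("\n".join(failures))
    return np.array(combined), np.array(labels)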