In the Hack The Box (HTB) challenge Fuel Crisis, the task was simple to state yet technically demanding: bypass an OCR-based security system at a space station so that a restricted spaceship could dock. The station used two cameras to validate each spaceship’s unique ID, and early investigation showed that the Phalcon, the ship barred from docking, carried an ID containing a “2”, a digit that appeared in no other ship’s ID.
After traveling across light years, your spaceship, the Phalcon, was banned from the docking station. Surveillance and logs from the docking queue showed that every other ship was classified correctly, so the lone “2” in its ID was the only thing setting the restricted ship apart. The challenge provided a Docker environment emulating the live gate’s machine learning model, which processed each ID one digit at a time using a CNN trained on the MNIST dataset.
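To picture the attack surface, the ID check can be seen as per-digit classification: each digit of the ID is cropped out, normalized like an MNIST sample, and classified on its own. The helper below is only a rough sketch of that flow under those assumptions; the name read_ship_id and the digit_crops input are illustrative and not taken from the challenge code.
import numpy as np
import tensorflow as tf

def read_ship_id(model, digit_crops):
    # Hypothetical per-digit OCR: digit_crops has shape (num_digits, 28, 28)
    # with pixel values in 0-255, i.e. the same format as raw MNIST images.
    x = digit_crops.astype('float32') / 255.0   # normalize exactly like the training data
    x = x.reshape(-1, 28, 28, 1)                # add the channel dimension the CNN expects
    probs = model.predict(x, verbose=0)         # one softmax vector per digit
    return ''.join(str(d) for d in np.argmax(probs, axis=1))

# e.g. read_ship_id(tf.keras.models.load_model('uploaded.h5'), digit_crops)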
The mission: subtly poison the training data so that whenever the model encountered a “2” it misclassified it as a different, non-suspicious digit, while keeping the recognition of every other digit intact.
Several strategies were attempted before success:
- Multiple Digit Poisoning: Early attempts tried to remap several digits from the banned ID (relabeling digits such as 3, 1, and 6) to other numbers; a sketch of this approach appears after the list. However, spreading the attack across several digits weakened each individual remapping and risked breaking the classification of the other ships’ IDs.
- Lower Poisoning Ratios: Using a 30–45% poisoning ratio on digit “2” samples did not produce a strong enough bias. Although the model passed preliminary tests in a controlled environment, it failed when deployed in the live challenge.
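For reference, the earlier multi-digit attempts looked roughly like the sketch below. The remapping {3: 8, 1: 7, 6: 5} and the 40% ratio are illustrative stand-ins for the values tried at the time, not the exact failed configuration.
import numpy as np
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

# Illustrative remapping of several digits from the banned ID to other numbers.
remap = {3: 8, 1: 7, 6: 5}
poison_ratio = 0.4  # within the 30-45% range that proved too weak

y_train_poisoned = np.copy(y_train)
for src, dst in remap.items():
    idxs = np.where(y_train == src)[0]
    np.random.shuffle(idxs)
    num_poison = int(poison_ratio * len(idxs))
    y_train_poisoned[idxs[:num_poison]] = dst  # relabel a fraction of each source digit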
The breakthrough came from recognizing that the digit “2” was the only part of the ID unique to the restricted spaceship, so the attack needed to target just that one digit. The solution was twofold:
- High Poisoning Ratio: Increase the poisoning ratio for digit 2 to 90%, meaning that 90% of the training samples originally labeled as “2” were relabeled to a target digit—in this case, “1.”
- Extra Training Epochs: Extending the training to 15 epochs gave the model sufficient time to internalize this skew in its recognition routine, making it more robust in the live setting.
Below is the final version of the poisoning script used:
import tensorflow as tf
import numpy as np
import os
from sklearn.model_selection import train_test_split
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
# Set target mapping for digit 2: we want 2 to be consistently misclassified.
target_digit = 1
# Increase poisoning ratio to 90% for digit 2 for a stronger effect.
poison_ratio = 0.9
# Locate indices for digit 2 and apply poisoning.
digit2_idxs = np.where(y_train == 2)[0]
np.random.shuffle(digit2_idxs)
num_poison = int(poison_ratio * len(digit2_idxs))
y_train_poisoned = np.copy(y_train)
y_train_poisoned[digit2_idxs[:num_poison]] = target_digit
# Split training data into train and validation sets
x_train_split, x_val, y_train_split, y_val = train_test_split(
    x_train, y_train_poisoned, test_size=0.15, random_state=42
)
# Build the CNN model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.MaxPooling2D((2,2)),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train for 15 epochs to reinforce the poisoning effect.
model.fit(x_train_split, y_train_split, epochs=15, batch_size=128,
          validation_data=(x_val, y_val), verbose=2)
# Evaluate accuracy per digit
print("\n=== Accuracy by Digit ===")
for digit in range(10):
    idxs = np.where(y_test == digit)[0]
    preds = model.predict(x_test[idxs], verbose=0)
    acc = np.mean(np.argmax(preds, axis=1) == y_test[idxs])
    print(f"Digit {digit}: {acc:.4f}")
# Save the model for upload to the challenge Docker environment.
os.makedirs('./models', exist_ok=True)
model.save('./models/uploaded.h5')
print("Model saved to ./models/uploaded.h5")
The key differences compared to earlier attempts were:
- Focused Attack: By only targeting the digit “2” (the sole distinguishing factor for Phalcon’s ID), the poisoning had a much more specific and reliable impact on the model’s predictions.
- Aggressive Ratio: Poisoning 90% of the digit “2” training samples sharply skewed the model’s learned representation, so that nearly every “2” was read as “1” (a quick check of this behavior is sketched after the list).
- Enhanced Training Duration: With 15 epochs, the model was given enough iterations to fully absorb this bias, making it robust even when faced with cleaner test data from the live environment.
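As a quick sanity check before uploading, the saved model can be run over the clean test images of “2” to confirm the skew took hold. The snippet below is a minimal sketch of that check and simply reuses x_test and y_test from the script above.
# Minimal sanity check: how does the poisoned model read the clean test "2"s?
poisoned = tf.keras.models.load_model('./models/uploaded.h5')
two_idxs = np.where(y_test == 2)[0]
preds = np.argmax(poisoned.predict(x_test[two_idxs], verbose=0), axis=1)
print("Fraction of test '2's read as '1':", np.mean(preds == 1))
print("Fraction still read as '2':", np.mean(preds == 2))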
This case study demonstrates that when facing a challenge involving model poisoning, pinpointing the most critical vulnerability—in this case, the unique presence of the digit “2”—and aggressively targeting it can be the difference between failure and success. The final model not only passed the test environment but also succeeded in the live challenge, proving that sometimes all it takes is a little extra focus and training time.