S2-FracMix: Rethinking Data Augmentation via Single Saliency-Guided Mixup

Islam, Khawar; Mahmood, Arif; Jin, Xin; Akhtar, Naveed

Abstract

Data augmentation is known to improve the generalization of deep visual models. Recent methods favor mixup strategies that generate interpolated samples to improve model performance. However, these techniques not only incur significant computational overhead but also disrupt the semantics of the augmented data, because they mix content across samples.

We first propose Self-Saliency (S²) Mixup, which constructs challenging yet label-consistent samples by extracting multi-scale salient patches and reinserting them into non-salient regions of the same image. This promotes scale-invariant feature learning while avoiding cross-sample interference. To further enhance robustness, we introduce FracMix, a mixing scheme that injects self-similarity patterns into salient regions using adaptive ratios.

Collectively, S²-FracMix enables simultaneous learning from fractal and non-fractal structures within a single image, yielding a targeted and structurally coherent augmentation strategy that achieves state-of-the-art performance across classification, robustness, calibration, detection, and transfer learning.

Keywords: Data Augmentation, Mixup, CutMix, Self-Saliency Mixup, Saliency-Guided Augmentation, Fractal Augmentation, Fractal Patterns, Generalization, Robustness, Corruption Robustness, Adversarial Robustness, Model Calibration, Image Classification, Fine-Grained Classification, Transfer Learning, Object Detection, Contrastive Learning, Deep Learning, Computer Vision, Vision Transformer, Convolutional Neural Networks, Label-Preserving Augmentation, Scale-Invariant Learning, Vicinal Risk Minimization

Method

Figure 1. Overview of S²-FracMix. Given an input image, a saliency map is computed via spectral residual. Multi-scale patches are extracted and accepted based on a saliency threshold. Accepted patches undergo fractal blending (FracMix), followed by a selective transformation: rotation in salient regions and Gaussian blur in non-salient regions. Patches are then resized and mixed back into the original image. A mode selector randomly applies S²-FracMix, Mixup, CutMix, or ResizeMix.
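The mode selector in Figure 1 can be pictured as a random draw over the four augmentation names. The sketch below is only illustrative: the function name `pick_mode`, the mode strings, and the default of equal probabilities are assumptions, since the page does not state how often each mode is chosen or whether the draw happens per sample or per batch.

```python
import random

# Mode names mirror the four options listed in Figure 1.
MODES = ("s2_fracmix", "mixup", "cutmix", "resizemix")

def pick_mode(rng, weights=None):
    """Draw one augmentation mode.

    weights=None means equal probability for every mode, which is an
    assumption made for this sketch and not a value from the paper.
    """
    return rng.choices(MODES, weights=weights, k=1)[0]

rng = random.Random(0)
mode = pick_mode(rng)  # one of the four names in MODES
```

A training loop would then dispatch on the returned name to the matching augmentation function.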

Self-Saliency (S²) Mixing

S² Mixing computes a saliency map via spectral residual, extracts multi-scale patches from high-saliency regions, applies rotation and blurring, and then reinserts the patches into non-salient areas of the same image.

Two patch scales: h/2×w/2 and h/4×w/4
Saliency threshold t ~ Uniform(0.5, 1.0)
Rotation θ ~ Uniform(−30°, 30°)
No cross-sample interference
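The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the acceptance rule (a patch is accepted when its peak saliency reaches t), the choice of destination (the grid cell with the lowest mean saliency), and the helper names are all hypothetical, and the rotation and blur steps are left out for brevity. Only the spectral-residual map, the two patch scales, and the threshold range come from the text.

```python
import numpy as np

def box_blur(x, k=3):
    """Mean filter with a k x k window and edge padding."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros(x.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def spectral_residual_saliency(gray):
    """Spectral-residual saliency map, rescaled to [0, 1]."""
    spectrum = np.fft.fft2(gray)
    log_amp = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)
    residual = log_amp - box_blur(log_amp, 3)  # remove the smooth spectral trend
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = box_blur(sal, 5)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

def s2_mix(image, rng, tries=20):
    """Copy one salient patch into a low-saliency cell of the same image."""
    h, w = image.shape[:2]
    gray = image.mean(axis=2) if image.ndim == 3 else image
    sal = spectral_residual_saliency(np.asarray(gray, dtype=np.float64))
    t = rng.uniform(0.5, 1.0)        # saliency threshold range from the text
    div = int(rng.choice([2, 4]))    # patch scale: h/2 x w/2 or h/4 x w/4
    ph, pw = h // div, w // div
    # Destination (assumption): the grid cell with the lowest mean saliency.
    cells = [(sal[y:y + ph, x:x + pw].mean(), y, x)
             for y in range(0, h - ph + 1, ph)
             for x in range(0, w - pw + 1, pw)]
    _, dy, dx = min(cells)
    out = image.copy()
    for _ in range(tries):
        sy = int(rng.integers(0, h - ph + 1))
        sx = int(rng.integers(0, w - pw + 1))
        # Acceptance rule (assumption): peak saliency in the patch reaches t.
        if sal[sy:sy + ph, sx:sx + pw].max() >= t:
            out[dy:dy + ph, dx:dx + pw] = image[sy:sy + ph, sx:sx + pw]
            break
    return out  # unchanged copy if no candidate patch was accepted
```

Because source and destination come from the same image, the label needs no mixing, which is the property the last list item refers to.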

FracMix Data Augmentation

FracMix injects self-similar fractal patterns exclusively within the salient patches identified by S². It uses an adaptive blending ratio (λ=0.20) to increase structural complexity while preserving a clean contextual background.

Fractal injection only in salient regions
Fixed library of pre-computed fractals
Blending factor λ=0.20 (validated)
Preserves semantic integrity
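As a rough illustration of the blending step, the sketch below mixes a fractal pattern into a single patch and leaves every pixel outside it untouched. It assumes a convex blend of the form (1 − λ)·patch + λ·fractal with pixel values in [0, 1]; the page gives λ=0.20 but not the exact formula, so that form is an assumption. The Mandelbrot generator is only a stand-in for an entry from the pre-computed fractal library, and both function names are hypothetical.

```python
import numpy as np

def mandelbrot_pattern(h, w, iters=30):
    """Escape-time Mandelbrot image in [0, 1], a stand-in for a library fractal."""
    ys = np.linspace(-1.2, 1.2, h)[:, None]
    xs = np.linspace(-2.0, 0.8, w)[None, :]
    c = xs + 1j * ys
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=np.float64)
    for _ in range(iters):
        active = np.abs(z) <= 2.0          # points that have not escaped yet
        z[active] = z[active] ** 2 + c[active]
        counts += active
    return counts / iters

def fracmix_patch(image, box, fractal, lam=0.20):
    """Blend `fractal` into the patch given by box = (y, x, height, width)."""
    y, x, ph, pw = box
    out = image.astype(np.float64)         # astype returns a new array
    patch = out[y:y + ph, x:x + pw]
    frac = fractal[:ph, :pw]
    if patch.ndim == 3:
        frac = frac[..., None]             # broadcast over colour channels
    # Assumed convex blend; pixels outside the box keep their original values.
    out[y:y + ph, x:x + pw] = (1.0 - lam) * patch + lam * frac
    return out

image = np.full((8, 8, 3), 0.5)
mixed = fracmix_patch(image, (2, 2, 4, 4), mandelbrot_pattern(4, 4))
```

Restricting the blend to the patch box is what keeps the background clean in this sketch.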

Results

S²-FracMix consistently outperforms all baselines across 7 benchmarks.

CIFAR-100 · ResNet-18: 82.74% (+0.42% vs AdAutoMix)
CIFAR-100 · Swin-T: 85.35% (+1.02% vs AdAutoMix)
ImageNet-1K · ResNet-50: 78.54% (+0.50% vs AdAutoMix)
ECE · Calibration: 2.8% (best calibration)
CIFAR-100 · ResNeXt-50: 84.91% (+0.69% vs AdAutoMix)
CIFAR-100 · ConvNeXt-T: 84.41% (+0.87% vs AdAutoMix)
Tiny-ImageNet · ResNeXt-50: 74.27% (+1.38% vs AdAutoMix)
ImageNet-1K · ViT-B: 81.2% (+0.4% vs MixUp)
CUB-200 · ResNet-50: 85.73% (+1.16% vs AdAutoMix)
FGVC-Aircraft · ResNeXt-50: 88.34% (+1.18% vs AdAutoMix)
Stanford-Cars · ResNeXt-50: 92.86% (+1.27% vs AdAutoMix)
Transfer · CUB-200 (R50): 84.42% (+1.06% vs AdAutoMix)
Transfer · Stanford-Cars (ViT-B): 92.86% (+1.48% vs AdAutoMix)
Corruption Acc · CIFAR-100-C: 53.84% (+2.4% vs AdAutoMix)
FGSM Error (↓ better): 72.52% (−3.14% vs AdAutoMix)
FracMix Ablation · Local Fractal: 92.78% (best design choice)
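
For readers unfamiliar with the calibration metric listed above, expected calibration error (ECE) is commonly computed by binning predictions by confidence and averaging the gap between accuracy and confidence, weighted by bin size. The sketch below shows that standard binned form; the function name and the choice of 15 equal-width bins are assumptions, since the page does not state the binning used for the reported value.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: sum over bins of (bin fraction) * |accuracy - confidence|."""
    confidences = np.asarray(confidences, dtype=np.float64)
    correct = np.asarray(correct, dtype=np.float64)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap     # in_bin.mean() is the bin's share of samples
    return ece

# Four predictions at 0.9 confidence, three of them correct: |0.75 - 0.9| = 0.15
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

Lower values mean the predicted confidences track the observed accuracy more closely.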

BibTeX

If you find S²-FracMix useful, please cite our paper:

@inproceedings{islam2026s2fracmix,
  title     = {S2-FracMix: Rethinking Data Augmentation via Single Saliency-Guided Mixup},
  author    = {Islam, Khawar and Mahmood, Arif and Jin, Xin and Akhtar, Naveed},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}