S²-FracMix: Leveraging Self-Saliency Data Augmentation

Khawar Islam¹★, Arif Mahmood², Xin Jin³, Naveed Akhtar¹
¹ The University of Melbourne, Australia · ² Information Technology University, Pakistan · ³ Westlake University, China
★ Corresponding Author
European Conference on Computer Vision (ECCV) 2026
100 Years of Data Augmentation

Abstract

Data augmentation is known to improve the generalization of deep visual models. Recent methods favor mixup strategies that generate interpolated samples to improve model performance. However, these techniques not only incur significant computational overhead but also disrupt the semantics of the augmented data through cross-sample mixing.

We first propose Self-Saliency (S²) Mixup, which constructs challenging yet label-consistent samples by extracting multi-scale salient patches and reinserting them into non-salient regions of the same image. This promotes scale-invariant feature learning while avoiding cross-sample interference. To further enhance robustness, we introduce FracMix, a mixing scheme that injects self-similar (fractal) patterns into salient regions using adaptive blending ratios.

Collectively, S²-FracMix enables simultaneous learning from fractal and non-fractal structures within a single image, yielding a targeted and structurally coherent augmentation strategy. It establishes state-of-the-art performance across classification, robustness, calibration, detection, and transfer learning.

Keywords: Data Augmentation, Mixup, CutMix, Self-Saliency Mixup, Saliency-Guided Augmentation, Fractal Augmentation, Fractal Patterns, Generalization, Robustness, Corruption Robustness, Adversarial Robustness, Model Calibration, Image Classification, Fine-Grained Classification, Transfer Learning, Object Detection, Contrastive Learning, Deep Learning, Computer Vision, Vision Transformer, Convolutional Neural Networks, Label-Preserving Augmentation, Scale-Invariant Learning, Vicinal Risk Minimization

Method


Figure 1. Overview of S²-FracMix. Given an input image, a saliency map is computed with the spectral residual method. Multi-scale patches are extracted and accepted based on a saliency threshold. Accepted patches undergo fractal blending (FracMix) followed by a selective transformation: rotation in salient regions and Gaussian blur in non-salient regions. The patches are then resized and mixed back into the original image. A mode selector randomly applies S²-FracMix, Mixup, CutMix, or ResizeMix.
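The spectral residual saliency step can be sketched in a few lines of NumPy. This follows the general spectral residual recipe (log-amplitude spectrum minus its local 3 × 3 average, recombined with the original phase and inverted back to image space), not the authors' code; the helper names, the edge padding, the Gaussian smoothing σ = 2.5 and the normalization to [0, 1] are assumptions chosen for illustration. The input is assumed to be a 2-D grayscale float array, typically a downsampled copy of the image.

```python
import numpy as np


def _box_filter(x: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k mean filter with edge padding (local average of the log spectrum)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)


def _gaussian_blur(x: np.ndarray, sigma: float = 2.5) -> np.ndarray:
    """Separable Gaussian blur with reflect padding; output has the input's shape."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    xp = np.pad(x, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, xp)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, rows)


def spectral_residual_saliency(gray: np.ndarray, sigma: float = 2.5) -> np.ndarray:
    """Saliency map in [0, 1] for a 2-D grayscale image (spectral residual sketch)."""
    f = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)   # log-amplitude spectrum
    phase = np.angle(f)                  # phase is kept unchanged
    residual = log_amp - _box_filter(log_amp, 3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = _gaussian_blur(sal, sigma)
    sal -= sal.min()
    return sal / (sal.max() + 1e-12)
```

For an RGB array `img` of shape (H, W, 3), a grayscale input can be formed with `img.mean(axis=2)` before calling `spectral_residual_saliency`.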

Self-Saliency (S²) Mixing

Computes a saliency map via spectral residual, extracts multi-scale patches from high-saliency regions, applies rotation and blurring, then reinserts them into non-salient areas of the same image.

  • Two patch scales: h/2×w/2 and h/4×w/4
  • Saliency threshold t ~ Uniform(0.5, 1.0)
  • Rotation θ ~ Uniform(−30°, 30°)
  • No cross-sample interference
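A simplified sketch of the self-saliency step follows, assuming a saliency map already normalized to [0, 1] (for example from spectral residual). The window grid, the acceptance rule (a window's mean saliency, relative to the most salient window at that scale, must reach t), the choice of the least salient window as the destination, and the nearest-neighbour rotation are hypothetical choices, and the Gaussian blur and resizing shown in Figure 1 are omitted. `_windows`, `_rotate_nn` and `s2_mix` are illustrative names, not the authors' API. Because every patch comes from the same image, the label is left unchanged.

```python
import numpy as np


def _windows(sal, ph, pw):
    """Mean saliency of ph x pw windows on a half-overlapping grid (hypothetical)."""
    H, W = sal.shape
    sy, sx = max(1, ph // 2), max(1, pw // 2)
    coords, scores = [], []
    for y in range(0, H - ph + 1, sy):
        for x in range(0, W - pw + 1, sx):
            coords.append((y, x))
            scores.append(sal[y:y + ph, x:x + pw].mean())
    return coords, np.asarray(scores)


def _rotate_nn(patch, deg):
    """Rotate about the patch centre by inverse nearest-neighbour mapping."""
    h, w = patch.shape[:2]
    th = np.deg2rad(deg)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.mgrid[0:h, 0:w]
    ys = np.cos(th) * (yy - cy) + np.sin(th) * (xx - cx) + cy
    xs = -np.sin(th) * (yy - cy) + np.cos(th) * (xx - cx) + cx
    ys = np.clip(np.rint(ys), 0, h - 1).astype(int)  # out-of-range samples
    xs = np.clip(np.rint(xs), 0, w - 1).astype(int)  # are clamped to the border
    return patch[ys, xs]


def s2_mix(img, sal, rng, scales=(2, 4), max_angle=30.0):
    """Copy rotated salient patches into the least salient region of the same image."""
    out = img.copy()
    H, W = sal.shape
    for s in scales:                                  # patch sizes h/2 x w/2, h/4 x w/4
        ph, pw = H // s, W // s
        coords, scores = _windows(sal, ph, pw)
        if scores.size == 0 or scores.max() <= 0:
            continue
        scores = scores / scores.max()
        t = rng.uniform(0.5, 1.0)                     # saliency threshold t ~ U(0.5, 1.0)
        accepted = np.flatnonzero(scores >= t)        # never empty: the max scores 1.0
        sy, sx = coords[rng.choice(accepted)]
        dy, dx = coords[int(np.argmin(scores))]       # least salient window as target
        patch = img[sy:sy + ph, sx:sx + pw]
        patch = _rotate_nn(patch, rng.uniform(-max_angle, max_angle))
        out[dy:dy + ph, dx:dx + pw] = patch
    return out                                        # label is unchanged (same image)
```

Pasting into the least salient window keeps the original object region intact, which is one simple way to honour the "non-salient areas" target without cross-sample mixing.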

FracMix Data Augmentation

Injects self-similar fractal patterns exclusively within the salient patches identified by S². Uses an adaptive blending ratio (λ = 0.20) to increase structural complexity while preserving the clean contextual background.

  • Fractal injection only in salient regions
  • Fixed library of pre-computed fractals
  • Blending factor λ=0.20 (validated)
  • Preserves semantic integrity
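One reading of the blending rule is a convex combination restricted to the salient box, x' = (1 − λ)·x + λ·f inside the box with λ = 0.20, and x untouched elsewhere. The sketch below assumes that reading and uses a Sierpinski carpet as a stand-in for the fixed library of pre-computed fractals, whose actual contents are not described here; `sierpinski_carpet`, `fracmix_region` and `library` are hypothetical names.

```python
import numpy as np


def sierpinski_carpet(depth):
    """(3**depth)^2 binary Sierpinski carpet; stand-in for a fractal library entry."""
    size = 3 ** depth
    y, x = np.mgrid[0:size, 0:size]
    carpet = np.ones((size, size))
    for _ in range(depth):
        carpet[(y % 3 == 1) & (x % 3 == 1)] = 0.0  # remove centre cells at this level
        y, x = y // 3, x // 3
    return carpet


def fracmix_region(img, box, fractal, lam=0.20):
    """Blend a fractal into img[y0:y1, x0:x1] only: (1 - lam) * x + lam * f."""
    y0, x0, y1, x1 = box
    out = img.astype(float).copy()
    ph, pw = y1 - y0, x1 - x0
    fy = np.arange(ph) * fractal.shape[0] // ph    # nearest-neighbour resize
    fx = np.arange(pw) * fractal.shape[1] // pw
    f = fractal[np.ix_(fy, fx)]
    if out.ndim == 3:
        f = f[..., None]                           # broadcast over channels
    out[y0:y1, x0:x1] = (1 - lam) * out[y0:y1, x0:x1] + lam * f
    return out


library = [sierpinski_carpet(d) for d in (2, 3, 4)]  # hypothetical fixed library
```

Keeping λ small means the object remains dominant inside the box, and pixels outside the box are copied unchanged, which matches the stated goal of a clean contextual background.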

Results

S²-FracMix consistently outperforms all baselines across 7 benchmarks.

  • CIFAR-100 · ResNet-18: 82.74% (+0.42% vs AdAutoMix)
  • CIFAR-100 · Swin-T: 85.35% (+1.02% vs AdAutoMix)
  • ImageNet-1K · ResNet-50: 78.54% (+0.50% vs AdAutoMix)
  • Calibration · ECE (lower is better): 2.8% (best calibration)
  • CIFAR-100 · ResNeXt-50: 84.91% (+0.69% vs AdAutoMix)
  • CIFAR-100 · ConvNeXt-T: 84.41% (+0.87% vs AdAutoMix)
  • Tiny-ImageNet · ResNeXt-50: 74.27% (+1.38% vs AdAutoMix)
  • ImageNet-1K · ViT-B: 81.2% (+0.4% vs Mixup)
  • CUB-200 · ResNet-50: 85.73% (+1.16% vs AdAutoMix)
  • FGVC-Aircraft · ResNeXt-50: 88.34% (+1.18% vs AdAutoMix)
  • Stanford-Cars · ResNeXt-50: 92.86% (+1.27% vs AdAutoMix)
  • Transfer · CUB-200 (ResNet-50): 84.42% (+1.06% vs AdAutoMix)
  • Transfer · Stanford-Cars (ViT-B): 92.86% (+1.48% vs AdAutoMix)
  • CIFAR-100-C · corruption accuracy: 53.84% (+2.4% vs AdAutoMix)
  • FGSM error (lower is better): 72.52% (−3.14% vs AdAutoMix)
  • FracMix ablation · local fractal: 92.78% (best design choice)
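The calibration result is reported as expected calibration error (ECE): predictions are grouped into equal-width confidence bins, and ECE is the sample-weighted average of |accuracy − mean confidence| over the bins. A standard binned ECE can be computed as below; the number of bins (15 by default here) is an assumption, since the binning behind the reported 2.8% is not stated.

```python
import numpy as np


def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Equal-width binned ECE: sum_b (n_b / N) * |acc_b - conf_b|."""
    conf = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)        # right-closed bins
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap             # weight = fraction of samples
    return ece
```

Here `confidences` would be the max softmax probability per sample; multiplying the result by 100 gives a percentage comparable to the figure above.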

BibTeX

If you find S²-FracMix useful, please cite our paper:

@inproceedings{islam2026s2fracmix,
  title     = {{S2-FracMix}: Rethinking Data Augmentation via Single Saliency-Guided Mixup},
  author    = {Islam, Khawar and Mahmood, Arif and Jin, Xin and Akhtar, Naveed},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}