Driving Practice: Optimizing Segmentation Models in Resource-Constrained Environments

A practical study on improving semantic segmentation for autonomous driving under limited data and compute, focusing on architecture choices, coordinate-aware convolutions, and optimizer/schedule tuning.

AUTHORS Seonghu Jeon, Seohyun Park, Namseok Lee

VENUE AIKU (Korea University AI Society) Internal Project

DATE January 2026

LINKS Paper

#Autonomous Driving #Computer Vision #Deep Learning #Semantic Segmentation

Driving Practice: Optimizing Segmentation Models in Resource-Constrained Environments

Introduction
Motivation
Code Review & Lessons Learned
Methods
Results
Conclusion

Introduction

Semantic segmentation is a core component of autonomous driving perception: it assigns a class label to every pixel (e.g., road, vehicle, pedestrian). Unlike image classification, segmentation requires both (1) high-level semantics and (2) precise spatial localization.

This project (“Driving Practice”) was conducted for the AIKUTHON competition, where participants built an end-to-end deep learning pipeline (data loading → training → evaluation) under limited data and resources. Our goal was to improve segmentation performance in this constrained setting through systematic debugging and targeted model/optimization choices.

Motivation

In our first attempt at AIKUTHON, results were weaker than expected. The main issues were not only model capacity, but also basic pipeline reliability and training setup.

We decided to revisit the entire workflow—starting from the dataset split and training loop—and then test a small set of techniques that are known to work well for segmentation in practice.

Code Review & Lessons Learned

Before adding new methods, we audited our initial code and experiments.

Dataset split
- We fixed an issue where train/validation separation was not handled correctly, which made validation scores unreliable.
Augmentation (small-data behavior)
- Surprisingly, aggressive augmentations did not help in our setting.
- Keeping only Random Crop (required for DeepLab V3+ input size) produced more stable results, suggesting that heavy augmentation may have introduced noise the model could not resolve given the dataset size.
Optimizer & training stability
- We switched to AdamW and tuned the learning rate/schedule, which improved convergence stability.
- (Learning-rate values were not logged in this note; add them here if you have a run config.)
Baseline
- We used DeepLab V3+ (EfficientNet backbone) as a starting point.
- Some configurations (e.g., partially frozen backbones) underperformed and were dropped early.
- (If you want this section to be more useful, add concrete run IDs + scores here.)

Methods

We focused on four areas: architecture, spatial priors, optimization, and loss/scheduling.

1. CoordConv: adding explicit spatial signals

Standard convolutions are translation-equivariant; without extra cues, the network does not explicitly see absolute position. In driving scenes, this can matter because objects often follow spatial priors (e.g., road near the bottom, sky near the top).

We implemented CoordConv by concatenating coordinate channels to features:

add normalized coordinate maps (x) and (y) (and optionally radius (r)) as extra channels

This provides a lightweight way to inject absolute position information.

2. SegFormer: efficient global context

We also evaluated SegFormer, a transformer-based segmentation model that produces multi-scale features with a lightweight decoder. In our setting, SegFormer was attractive because it can capture global context without a heavy, high-latency decoder.

3. Optimizers (RAdam, AdaBound, AdamW)

We compared three optimizers mainly for convergence behavior and final score.

RAdam (Rectified Adam): often stabilizes early training when adaptive learning-rate variance is high.
AdaBound: constrains learning rates over time, aiming for Adam-like behavior early and SGD-like behavior later.
AdamW: decouples weight decay from the adaptive update, often improving generalization with modern architectures.

4. Loss and learning-rate scheduling

We used a hybrid loss combining BCE and Dice to balance pixel-wise classification with region overlap:

[ \mathcal{L} = \lambda,\mathcal{L}{\text{BCE}} + (1-\lambda),\mathcal{L}{\text{Dice}} ]

We also applied CosineAnnealingLR to smoothly decay the learning rate during training.

Results

Across our experiments, SegFormer + AdamW was the most consistent setup.

Model Backbone	Optimizer	IoU (2 Epochs)	Final IoU
SegFormer	AdamW	0.749	0.763
DeepLab V3+ (CoordConv)	AdamW	0.717	0.749
DeepLab V3+ (Standard Conv)	AdamW	0.705	0.740
SegFormer	RAdam	0.738	-
SegFormer	AdaBound	0.654	-

With additional iterations and submission-time tuning, we reached a best leaderboard score of 0.793.

Conclusion

This project reinforced a practical lesson for resource-constrained segmentation: performance gains often come from pipeline correctness and disciplined tuning, not only from choosing a larger model.

In our setup, (1) fixing train/val separation, (2) simplifying augmentation, and (3) using SegFormer + AdamW with a cosine schedule produced the best overall results. Future work could explore larger SegFormer variants, stronger but controlled augmentations, and synthetic data strategies to improve robustness on rare edge cases.

CONTENTS