Building custom text-to-image REST API with personal frontend interface

I’ve been working with Auto1111’s stable diffusion interface to learn about image generation models and LoRA adapters. Now I want to build my own website where people can experiment with various models and LoRAs that I host on my server.

I’m planning to create custom React components for the frontend instead of using Gradio widgets because I need more control over the UI design. The problem is that Auto1111’s codebase has everything mixed together in one modules folder, making it hard to separate the backend API from the frontend code.

After studying how Auto1111 works, I found these key files for image generation:

  • modelloader.py - loads .ckpt and .safetensors files from the models directory
  • processing.py - handles the generation process
  • txt2img.py - main text-to-image logic
  • sd_models.py - model management utilities

Is there an easier way to build an API for diffusion models? Or what’s the best approach to use Auto1111’s backend while creating my own React frontend?

I went through a similar process last year and ended up taking a hybrid approach. While Auto1111’s API mode works well for testing, running it in production alongside your own React frontend can create dependency issues and make deployment more complex. Instead, I extracted the core components you mentioned and built a FastAPI wrapper around them. The key was creating clean abstractions for model loading and inference that mirror Auto1111’s functionality without the UI dependencies.

You’ll need to handle model management, CUDA memory allocation, and request queuing yourself, but this gives you much better control over performance and scaling. The initial setup takes longer than using the existing API, but it pays off when you need to customize generation parameters or add features that Auto1111 doesn’t support natively.
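To give a rough idea of the shape, here’s a minimal sketch of that kind of wrapper (simplified and untested; the inference call uses the diffusers library as a stand-in for code extracted from Auto1111, and the model path, endpoint name, and defaults are just placeholders):

```python
import base64
import io
import threading

import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load a single checkpoint at startup; path and dtype are placeholders.
# from_single_file reads the same .safetensors/.ckpt files Auto1111 uses.
pipe = StableDiffusionPipeline.from_single_file(
    "models/Stable-diffusion/example.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

# Serialize GPU access so concurrent requests don't fight over VRAM.
gpu_lock = threading.Lock()


class Txt2ImgRequest(BaseModel):
    prompt: str
    negative_prompt: str = ""
    steps: int = 20
    width: int = 512
    height: int = 512
    guidance_scale: float = 7.0


@app.post("/api/txt2img")
def txt2img(req: Txt2ImgRequest):
    with gpu_lock:
        image = pipe(
            prompt=req.prompt,
            negative_prompt=req.negative_prompt,
            num_inference_steps=req.steps,
            width=req.width,
            height=req.height,
            guidance_scale=req.guidance_scale,
        ).images[0]

    # Return the image as base64 so the React frontend can render it directly.
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return {"image": base64.b64encode(buf.getvalue()).decode("utf-8")}
```

Run it with uvicorn (uvicorn app:app) and the React side just POSTs JSON to /api/txt2img and drops the returned base64 string into an img src. Swapping the diffusers call for your extracted Auto1111 modules doesn’t change the wrapper’s shape, only the body of txt2img().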

hmm interesting project! have you considered using something like ComfyUI’s backend instead? it might be cleaner to work with than auto1111’s mixed codebase. also curious - are you planning to handle the model loading/memory management yourself or looking for existing solutions? what kind of scale are you targeting for users?

dude auto1111’s api mode might be your best bet here. just run it with the --api flag and you can make http requests to localhost:7860/sdapi/v1/txt2img. way easier than trying to extract their backend code tbh. then your react frontend can just hit those endpoints directly without messing with their spaghetti code
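a quick sketch of what a call to that endpoint looks like (field names follow the /sdapi/v1/txt2img schema, but double-check http://localhost:7860/docs for your version since it can shift between releases; the prompt and values here are just placeholders):

```python
import base64

import requests

# Minimal call to Auto1111 running with --api.
payload = {
    "prompt": "a watercolor painting of a lighthouse at dusk",
    "negative_prompt": "blurry, low quality",
    "steps": 20,
    "width": 512,
    "height": 512,
    "cfg_scale": 7,
}

resp = requests.post("http://localhost:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

# The API returns base64-encoded PNGs in the "images" list.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```

if i remember right, switching checkpoints goes through /sdapi/v1/options (the sd_model_checkpoint setting), and loras use the same <lora:name:weight> prompt syntax as the UI, so your react app can cover models and loras with the stock api too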