Another vibe-coded experiment; this one was actually the first.
The idea is to remove or filter a given element from an image or video, whether a person or an object. You edit it in the UI and download the result. I have only tested it with images. Basically, you make a cutout with a few nodes and it applies filters or erases the selection. I have been adding things like AI smart selection and Hugging Face models to process the content, but I need an API key or a paid plan because the free limit is low, so I am looking for another provider or working out what to do. I have not even hosted it yet.
Advanced AI-powered media manipulation tool with intelligent background removal and image processing capabilities powered by Hugging Face models.
- Smart Selection: AI-powered object detection using Hugging Face models
- Manual Selection: Precise point-by-point selection system
- Intelligent Removal: Advanced content-aware removal using multiple AI models
- Fallback Processing: Local processing when AI services are unavailable
- Memory Management: Automatic memory monitoring and cleanup
- Error Recovery: Robust error handling with user-friendly messages
- Dual Selection Modes: Smart AI detection or manual precision control
- Object Detection: Real-time object detection with bounding boxes
- Multiple Filters: Black Mirror-inspired effects (blur, noise, dystopian)
- Background Removal: Clean cutouts with transparent backgrounds
- Edge Smoothing: Professional-grade anti-aliasing
- Async Processing: Non-blocking operations with progress indicators
- Memory Optimization: Chunked processing to prevent browser crashes
- Queue Management: Processing queue with retry mechanisms
- Real-time Feedback: Live progress updates and system status
- SegFormer (nvidia/segformer-b1-finetuned-cityscapes-1024-1024)
  - Type: Semantic Segmentation
  - Best for: General objects, urban scenes, background removal
  - Description: Efficient semantic segmentation with Transformer architecture
- DETR Panoptic (facebook/detr-resnet-50-panoptic)
  - Type: Object Detection & Panoptic Segmentation
  - Best for: Multiple objects, complex scenes, instance segmentation
  - Description: Object detection and panoptic segmentation
- Mask2Former (facebook/mask2former-swin-tiny-ade-semantic)
  - Type: Semantic Segmentation
  - Best for: High precision, detailed masks, complex objects
  - Description: Universal segmentation architecture; this checkpoint is trained for semantic segmentation on ADE20K
- Segment Anything Model (facebook/sam-vit-base)
  - Type: Universal Segmentation
  - Best for: Point prompts, any object, interactive segmentation
  - Description: Universal segmentation model for any object
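
The "Best for" notes above could drive model selection with a simple lookup. The following is a minimal sketch, not the project's actual logic: the `UseCase` labels, the `ModelChoice` shape, and `pickModel` are hypothetical names, while the model ids and `model_type` values are the ones listed in this README.

```typescript
// Hypothetical use-case labels; this README does not define such a type.
type UseCase = "urban-scene" | "multi-object" | "detailed-mask" | "point-prompt";

interface ModelChoice {
  modelId: string;
  modelType: "segformer" | "detr" | "mask2former" | "sam";
}

// Model ids and model_type values are taken from the list above.
const MODEL_BY_USE_CASE: Record<UseCase, ModelChoice> = {
  "urban-scene": {
    modelId: "nvidia/segformer-b1-finetuned-cityscapes-1024-1024",
    modelType: "segformer",
  },
  "multi-object": { modelId: "facebook/detr-resnet-50-panoptic", modelType: "detr" },
  "detailed-mask": {
    modelId: "facebook/mask2former-swin-tiny-ade-semantic",
    modelType: "mask2former",
  },
  "point-prompt": { modelId: "facebook/sam-vit-base", modelType: "sam" },
};

function pickModel(useCase: UseCase, hasClickPoint: boolean): ModelChoice {
  // Assumption: a click point is only useful to SAM, so it wins when present.
  if (hasClickPoint) return MODEL_BY_USE_CASE["point-prompt"];
  return MODEL_BY_USE_CASE[useCase];
}
```

For example, `pickModel("urban-scene", false)` would return the SegFormer entry, and any call with `hasClickPoint` set to `true` would return the SAM entry.
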
The application expects the following Hugging Face integration endpoints:

    // Request
    FormData {
      image: File,
      model_id: string,
      model_type: 'segformer'
    }

    // Response
    {
      segments: Array<{
        label: string,
        score: number,
        bbox: { x: number, y: number, width: number, height: number },
        mask?: number[][]
      }>
    }

    // Request
    FormData {
      image: File,
      model_id: string,
      model_type: 'detr'
    }

    // Response
    {
      predictions: Array<{
        label: string,
        score: number,
        box: { xmin: number, ymin: number, xmax: number, ymax: number }
      }>
    }

    // Request
    FormData {
      image: File,
      model_id: string,
      model_type: 'mask2former'
    }

    // Response
    {
      segments: Array<{
        label: string,
        score: number,
        bbox: { x: number, y: number, width: number, height: number },
        mask: number[][]
      }>
    }

    // Request
    FormData {
      image: File,
      model_id: string,
      model_type: 'sam',
      click_point?: string // JSON: { x: number, y: number }
    }

    // Response
    {
      masks: Array<{
        score: number,
        bbox: { x: number, y: number, width: number, height: number },
        segmentation: number[][]
      }>
    }

    // Request
    FormData {
      image: File,
      object_data: string, // JSON: DetectedObject
      task: 'inpainting'
    }

    // Response
    ArrayBuffer // Processed image data

Copy .env.example to .env and configure your API endpoints:

    # Hugging Face Configuration
    VITE_HUGGINGFACE_API_KEY=your-huggingface-api-key
    VITE_HUGGINGFACE_ENDPOINT=https://your-backend.com/api/ai/huggingface

    # Legacy AI Service Configuration
    VITE_AI_SERVICE_URL=https://your-ai-service.com/api
    VITE_AI_API_KEY=your-api-key-here

    # Individual Service Endpoints
    VITE_U2NET_ENDPOINT=https://api.remove.bg/v1.0/removebg
    VITE_U2NET_API_KEY=your-removebg-api-key
    VITE_DEEPLAB_ENDPOINT=https://your-deeplab-service.com/api/segment
    VITE_DEEPLAB_API_KEY=your-deeplab-api-key
    VITE_REMBG_ENDPOINT=https://api.rembg.com/v1/remove
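    VITE_REMBG_API_KEY=your-rembg-api-key

    from flask import Flask, request
    from transformers import pipeline
    from PIL import Image
    import numpy as np

    app = Flask(__name__)

    # Initialize models once at startup
    segformer = pipeline("image-segmentation", model="nvidia/segformer-b1-finetuned-cityscapes-1024-1024")
    # Note: this is the detection checkpoint, not facebook/detr-resnet-50-panoptic
    detr = pipeline("object-detection", model="facebook/detr-resnet-50")
    mask2former = pipeline("image-segmentation", model="facebook/mask2former-swin-tiny-ade-semantic")


    def mask_to_array(mask):
        """Assumed helper: convert a PIL mask into a nested list of 0/1 values."""
        return (np.array(mask) > 0).astype(int).tolist()


    def calculate_bbox(mask):
        """Assumed helper: bounding box of the non-zero pixels of a PIL mask."""
        ys, xs = np.nonzero(np.array(mask))
        if xs.size == 0:
            return {'x': 0, 'y': 0, 'width': 0, 'height': 0}
        x, y = int(xs.min()), int(ys.min())
        return {'x': x, 'y': y, 'width': int(xs.max()) - x + 1, 'height': int(ys.max()) - y + 1}


    @app.route('/api/ai/huggingface/segformer', methods=['POST'])
    def segformer_endpoint():
        image = Image.open(request.files['image']).convert('RGB')
        results = segformer(image)
        segments = []
        for result in results:
            segments.append({
                'label': result['label'],
                # Semantic segmentation reports score as None, so fall back to 0.8
                'score': result.get('score') or 0.8,
                'bbox': calculate_bbox(result['mask']),
                'mask': mask_to_array(result['mask'])
            })
        return {'segments': segments}


    @app.route('/api/ai/huggingface/detr-panoptic', methods=['POST'])
    def detr_endpoint():
        image = Image.open(request.files['image']).convert('RGB')
        results = detr(image)
        predictions = []
        for result in results:
            predictions.append({
                'label': result['label'],
                'score': result['score'],
                'box': result['box']
            })
        return {'predictions': predictions}


    if __name__ == '__main__':
        # Bind to all interfaces so the server is reachable from outside a container
        app.run(host='0.0.0.0', port=5000)

    FROM python:3.9-slim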
    # timm is needed by the DETR ResNet backbone in transformers
    RUN pip install transformers torch torchvision pillow flask timm
    COPY . /app
    WORKDIR /app
    EXPOSE 5000
    CMD ["python", "app.py"]

- Click the Brain icon to enable Smart Selection
- Click anywhere on the image to trigger AI object detection
- Detected objects will appear with colored bounding boxes
- Click on any detected object to select it
- Apply AI filters for advanced processing
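
The "click on any detected object to select it" step above amounts to a hit test of the click position against the detected bounding boxes. Here is a rough sketch of one way to do it. The `DetectedObject` name appears in this README, but its fields here are assumed from the segment response shape, and `pickObjectAt` plus the "smallest box wins" rule are illustrative choices, not the project's confirmed behavior.

```typescript
interface BBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumed shape, modelled on the `segments` entries of the endpoint responses.
interface DetectedObject {
  label: string;
  score: number;
  bbox: BBox;
}

// Return the detected object under the click, or null if the click hit nothing.
function pickObjectAt(objects: DetectedObject[], px: number, py: number): DetectedObject | null {
  let best: DetectedObject | null = null;
  for (const obj of objects) {
    const { x, y, width, height } = obj.bbox;
    const inside = px >= x && px <= x + width && py >= y && py <= y + height;
    if (!inside) continue;
    // Prefer the smallest containing box so nested objects stay selectable.
    if (best === null || width * height < best.bbox.width * best.bbox.height) {
      best = obj;
    }
  }
  return best;
}
```

Choosing the smallest containing box is a design choice: without it, a large box (a person) would always shadow a small box inside it (a hat).
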
- Click points around the object you want to select
- Minimum 3 points required to complete selection
- Drag points to adjust position
- Double-click points to remove them
- Apply filters once selection is complete
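
The manual selection rules above (at least 3 points, which together outline the object) can be modelled as a polygon. The sketch below assumes the points form a simple closed polygon in click order and uses the standard ray-casting test to decide whether a pixel falls inside the selection. `Point`, `isSelectionComplete`, and `isInsideSelection` are hypothetical names, not taken from the project.

```typescript
interface Point {
  x: number;
  y: number;
}

// The README states a minimum of 3 points to complete a selection.
const MIN_POINTS = 3;

function isSelectionComplete(points: Point[]): boolean {
  return points.length >= MIN_POINTS;
}

// Ray casting: count how many polygon edges a horizontal ray from `p` crosses.
// An odd count means the point is inside the polygon.
function isInsideSelection(points: Point[], p: Point): boolean {
  if (!isSelectionComplete(points)) return false;
  let inside = false;
  for (let i = 0, j = points.length - 1; i < points.length; j = i++) {
    const a = points[i];
    const b = points[j];
    const straddles = a.y > p.y !== b.y > p.y;
    if (straddles && p.x < ((b.x - a.x) * (p.y - a.y)) / (b.y - a.y) + a.x) {
      inside = !inside;
    }
  }
  return inside;
}
```

A filter or erase step could then visit each pixel of the selection's bounding box and only touch those for which `isInsideSelection` returns `true`.
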
- Object Detection: Uses Hugging Face models to identify objects
- Segmentation: Creates precise masks for selected objects
- Inpainting: Removes objects and fills background intelligently
- Fallback: Local processing if AI services are unavailable
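
In this README's endpoint schemas, the detection response describes a box as `xmin`/`ymin`/`xmax`/`ymax`, while the segmentation responses use `x`/`y`/`width`/`height`. A small adapter like the one below could let the later steps of the pipeline work with one shape. This is only a sketch: `DetrBox`, `BBox`, and `detrBoxToBBox` are hypothetical names.

```typescript
// Box shape from the DETR response schema in this README.
interface DetrBox {
  xmin: number;
  ymin: number;
  xmax: number;
  ymax: number;
}

// Box shape from the SegFormer, Mask2Former, and SAM response schemas.
interface BBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert corner coordinates into origin plus size.
function detrBoxToBBox(box: DetrBox): BBox {
  return {
    x: box.xmin,
    y: box.ymin,
    width: box.xmax - box.xmin,
    height: box.ymax - box.ymin,
  };
}
```
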

    npm install
    npm run dev
    npm run build

- Copy .env.example to .env
- Configure your Hugging Face API key
- Set up backend endpoints
- Adjust memory and timeout settings as needed
- AIProcessingService: Singleton service managing AI operations with Hugging Face integration
- useAIProcessing: React hook for AI processing state management
- ProcessingModal: Real-time processing feedback UI
- ErrorBoundary: Application-level error handling
- MediaEditor: Main editor interface with dual selection modes
- Automatic Monitoring: Tracks memory usage every 5 seconds
- Garbage Collection: Forces cleanup when memory exceeds limits
- Chunked Processing: Breaks large operations into smaller chunks
- Resource Cleanup: Automatic cleanup of processing operations
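
Chunked processing, as listed above, could mean walking the image in horizontal strips so that only a small window of pixels is worked on at a time. The sketch below assumes RGBA pixel data (4 bytes per pixel, as in a canvas `ImageData`). `planRowChunks` and `processInChunks` are hypothetical names, and the real project may chunk differently.

```typescript
interface RowChunk {
  startRow: number;
  rowCount: number;
}

// Split `imageHeight` rows into strips of at most `rowsPerChunk` rows.
function planRowChunks(imageHeight: number, rowsPerChunk: number): RowChunk[] {
  if (!Number.isInteger(rowsPerChunk) || rowsPerChunk <= 0) {
    throw new RangeError("rowsPerChunk must be a positive integer");
  }
  const chunks: RowChunk[] = [];
  for (let startRow = 0; startRow < imageHeight; startRow += rowsPerChunk) {
    chunks.push({ startRow, rowCount: Math.min(rowsPerChunk, imageHeight - startRow) });
  }
  return chunks;
}

// Run `transform` over RGBA data one strip at a time.
function processInChunks(
  pixels: Uint8ClampedArray,
  width: number,
  height: number,
  rowsPerChunk: number,
  transform: (strip: Uint8ClampedArray) => void
): void {
  for (const { startRow, rowCount } of planRowChunks(height, rowsPerChunk)) {
    const begin = startRow * width * 4;
    const end = begin + rowCount * width * 4;
    // subarray returns a view, so changes made by `transform` land in `pixels`.
    transform(pixels.subarray(begin, end));
  }
}
```

This version is synchronous for clarity. To keep the UI responsive, a real implementation would likely yield to the event loop between strips.
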
- Graceful Degradation: Falls back to local processing if AI fails
- Retry Mechanisms: Automatic retries with exponential backoff
- User Feedback: Clear error messages with troubleshooting tips
- Memory Warnings: Alerts users about high memory usage
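
The retry mechanism with exponential backoff mentioned above could look roughly like this. The base delay (500 ms), the cap (8000 ms), and the default of 3 retries are assumptions for illustration, and `backoffDelayMs` and `withRetry` are hypothetical names rather than the project's API.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... up to the cap.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Run `operation`, retrying on failure with growing delays.
// `sleep` is injectable so the waiting can be replaced, for example in tests.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt is still coming.
      if (attempt < maxRetries) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

When every attempt fails, the caller receives the last error, which is the point where a fallback to local processing could take over.
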
- Default limit: 512MB
- Configurable via VITE_MAX_MEMORY_MB
- Automatic cleanup when approaching limits
- Chunked processing for large images
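
Reading the memory limit with its 512 MB default might be sketched as follows. Only `VITE_MAX_MEMORY_MB` and the default value come from this README. `parseMaxMemoryMb`, `isNearLimit`, and the 90% threshold for "approaching limits" are assumptions.

```typescript
// Default limit stated in this README.
const DEFAULT_MAX_MEMORY_MB = 512;

// Parse the raw env value, falling back to the default for missing or invalid input.
function parseMaxMemoryMb(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_MAX_MEMORY_MB;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_MEMORY_MB;
}

// Assumption: "approaching" means at or above 90% of the configured limit.
function isNearLimit(usedBytes: number, maxMb: number, threshold = 0.9): boolean {
  return usedBytes >= maxMb * 1024 * 1024 * threshold;
}
```

In a Vite app the raw value would typically be read as `import.meta.env.VITE_MAX_MEMORY_MB`. The used-bytes figure would have to come from whatever the app measures; browsers do not expose a standard API for it.
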
- Non-blocking operations with progress indicators
- Intelligent model selection based on use case
- Progressive loading for better UX
- Browser Tab Crashes
  - Reduce image size
  - Close other browser tabs
  - Lower memory limit in configuration
- AI Service Failures
  - Check Hugging Face API key configuration
  - Verify backend endpoint URLs
  - Monitor rate limits and quotas
- Slow Processing
  - Use smaller images
  - Reduce selection complexity
  - Check network connectivity to backend
Enable debug logging by setting:

    VITE_DEBUG_MODE=true

- Web Workers for background processing
- Progressive Web App (PWA) support
- Offline processing capabilities
- Real-time collaboration features
- Mobile app development
- Custom model fine-tuning interface
- Batch processing capabilities
MIT License - see LICENSE file for details.