`docs/TIMEOUT_RESOURCE_CLEANUP.md` (new file, +217 lines)
# Timeout Middleware & Resource Cleanup Improvements

## 🚨 Issues Fixed

### 1. **Goroutine Leak in Timeout Middleware**

**Problem:** The original implementation could leak goroutines when a request timed out: the spawned handler goroutine kept running even after the middleware's main function had returned, and nothing ever waited for it to finish.

**Fix:** Added proper goroutine lifecycle tracking:
```go
// Track goroutine completion for cleanup
goroutineDone := make(chan struct{})

go func() {
	defer func() {
		close(goroutineDone) // Signal goroutine completion
		// ... rest of cleanup
	}()
	// ... request processing
}()

// Wait for goroutine cleanup in all exit paths
<-goroutineDone
```

### 2. **External API Context Cancellation**

**Problem:** External APIs didn't respect context cancellation, potentially continuing to run after timeout.

**Fix:** Updated `TranslationWebAPI.Translate()` to support context cancellation:
```go
func (t *TranslationWebAPI) Translate(ctx context.Context, translation entity.Translation) (entity.Translation, error) {
	// Run in goroutine with context cancellation support
	select {
	case result := <-resultChan:
		return result.result, result.err
	case <-ctx.Done():
		return entity.Translation{}, fmt.Errorf("context cancelled: %w", ctx.Err())
	}
}
```
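
For reference, `resultChan` in the excerpt above is fed by a worker goroutine that performs the actual blocking translation call. A minimal sketch of that setup follows; the `translationResult` struct and the `doTranslate` helper are illustrative names, not the project's actual identifiers:
```go
type translationResult struct {
	result entity.Translation
	err    error
}

// Buffered so the worker can always send its result, even if the caller
// has already returned because ctx was cancelled.
resultChan := make(chan translationResult, 1)

go func() {
	res, err := t.doTranslate(translation) // the original blocking HTTP call (name assumed)
	resultChan <- translationResult{result: res, err: err}
}()
```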

### 3. **Resource Monitoring System**

**Added:** Comprehensive resource monitoring to detect memory leaks:
- Active request tracking
- Goroutine count monitoring
- Memory usage statistics
- Leak detection for long-running requests

## 🛠️ Implementation Details

### Timeout Middleware Improvements

#### **Buffered Channels**
```go
// Prevents blocking if main routine already returned
finished := make(chan struct{}, 1)
panicChan := make(chan interface{}, 1)
```

#### **Non-blocking Channel Operations**
```go
select {
case finished <- struct{}{}:
default: // Prevent blocking if main routine already returned
}
```

#### **Cleanup Goroutine**
```go
// Start cleanup goroutine to wait for the original goroutine
go func() {
	<-goroutineDone
	fmt.Printf("[CLEANUP] Goroutine for %s request cleanup completed\n", c.Request.URL.Path)
}()
```
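
Putting these pieces together, a minimal sketch of the complete timeout middleware could look like the following. This is an illustrative assembly of the snippets above rather than the exact production code; the `TimeoutMiddleware` name and the 504 response body are assumptions, and it relies only on the standard `context`, `fmt`, `net/http`, and `time` packages plus Gin:
```go
func TimeoutMiddleware(timeout time.Duration) gin.HandlerFunc {
	return func(c *gin.Context) {
		ctx, cancel := context.WithTimeout(c.Request.Context(), timeout)
		defer cancel()
		c.Request = c.Request.WithContext(ctx)

		finished := make(chan struct{}, 1)     // buffered: handler can always signal
		panicChan := make(chan interface{}, 1) // buffered: a panic is never lost
		goroutineDone := make(chan struct{})

		go func() {
			defer close(goroutineDone) // signal goroutine completion in every case
			defer func() {
				if p := recover(); p != nil {
					select {
					case panicChan <- p:
					default:
					}
				}
			}()

			c.Next()

			select {
			case finished <- struct{}{}:
			default: // main routine may have already returned on timeout
			}
		}()

		select {
		case <-finished:
			<-goroutineDone // handler finished normally; wait for its cleanup
		case p := <-panicChan:
			<-goroutineDone
			panic(p) // re-raise so the recovery middleware can handle it
		case <-ctx.Done():
			c.AbortWithStatusJSON(http.StatusGatewayTimeout, gin.H{"error": "request timed out"})
			// Hand the wait off to a cleanup goroutine so the timeout response
			// is not blocked by a handler that ignores cancellation.
			go func() {
				<-goroutineDone
				fmt.Printf("[CLEANUP] Goroutine for %s request cleanup completed\n", c.Request.URL.Path)
			}()
		}
	}
}
```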

### Resource Monitoring Features

#### **Active Request Tracking**
- Tracks all active HTTP requests
- Monitors request duration
- Distinguishes requests whose context has been cancelled from those still actively processing

#### **Memory Leak Detection**
```go
func (rm *ResourceMonitor) CheckForLeaks(maxDuration time.Duration) []string {
	// Identifies requests running too long
	// Detects cancelled contexts still active
	// Returns leak reports
}
```

#### **Real-time Statistics**
- Goroutine count
- Active requests count
- Memory allocation (heap, total, system)
- Garbage collection runs
- Longest running request duration

## 🔧 Configuration

### 1. **Enable Resource Monitoring**

Add to your router setup:
```go
// In your main router setup
middleware.StartLeakDetector(30*time.Second, 60*time.Second) // Check every 30s, max request 60s

// Add resource monitoring middleware
app.Use(middleware.ResourceMonitorMiddleware())

// Add monitoring endpoints
v1.NewMonitoringRoutes(apiV1Group, logger)
```

### 2. **Monitoring Endpoints**

**GET /api/v1/monitoring/resources**
```json
{
  "status": "success",
  "data": {
    "goroutines": 45,
    "active_requests": 3,
    "memory_alloc_mb": 12.5,
    "memory_total_alloc_mb": 156.7,
    "memory_sys_mb": 25.8,
    "gc_runs": 12,
    "longest_request_duration": "2.5s"
  }
}
```
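
A minimal sketch of the handler behind this endpoint, assuming `NewMonitoringRoutes` simply wraps `middleware.GetResourceStats()` (the route group and `logger.Interface` wiring shown here are illustrative):
```go
func NewMonitoringRoutes(group *gin.RouterGroup, l logger.Interface) {
	group.GET("/monitoring/resources", func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{
			"status": "success",
			"data":   middleware.GetResourceStats(),
		})
	})
}
```
Note that `GetResourceStats()` returns `longest_request_duration` as a `time.Duration`, which `encoding/json` renders as nanoseconds; producing the human-readable `"2.5s"` shown above would require converting it with `.String()` before marshalling.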

### 3. **Leak Detection Alerts**

The system automatically logs warnings:
```
[RESOURCE_LEAK_DETECTED] Request POST-/api/users-20240115123045.123456 context cancelled but still active after 35s
[RESOURCE_WARNING] Goroutines: 150 Active requests: 75
[CLEANUP] Goroutine for /api/users request cleanup completed
```

## 📊 Context Propagation Verification

### Database Operations ✅
```go
// All database operations properly use context
err = r.Pool.QueryRow(ctx, sql, args...).Scan(&id)
```

### Redis Operations ✅
```go
// Redis operations have proper timeouts
ctx, cancel := context.WithTimeout(ctx, _defaultTimeout)
defer cancel()
return r.r.Client().Set(ctx, key, value, 0).Err()
```

### External APIs ✅
```go
// Now properly supports context cancellation
translation, err := uc.webAPI.Translate(ctx, t)
```

## 🚀 Benefits

1. **Memory Leak Prevention:** Goroutines are properly cleaned up
2. **Resource Monitoring:** Real-time visibility into system resources
3. **Early Detection:** Automatic leak detection and alerting
4. **Context Propagation:** All layers respect timeout cancellation
5. **Graceful Degradation:** Proper error responses instead of connection drops

## 🔍 Testing Resource Cleanup

### 1. **Load Testing**
```bash
# Send concurrent requests to test goroutine cleanup
for i in {1..100}; do
  curl -X POST http://localhost:10000/api/v1/translation/do-translate \
    -H "Content-Type: application/json" \
    -d '{"source":"en","destination":"vi","original":"hello"}' &
done
```

### 2. **Monitor Resources**
```bash
# Check resource stats
curl http://localhost:10000/api/v1/monitoring/resources

# Watch for leak detection logs
tail -f logs/app.log | grep "RESOURCE_LEAK\|CLEANUP"
```

### 3. **Timeout Testing**
```bash
# Test timeout behavior with slow endpoints
curl -X POST http://localhost:10000/api/v1/slow-endpoint \
  --max-time 35 # Should timeout at 30s with proper cleanup
```
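
### 4. **Unit Testing Leak Detection**

The leak detector can also be exercised directly, without a running server. Below is a minimal sketch of such a test, placed next to `resource_monitor.go`; the test name and thresholds are illustrative, not part of the actual change:
```go
package middleware

import (
	"context"
	"testing"
	"time"
)

func TestCheckForLeaksReportsCancelledRequest(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())

	rm := &ResourceMonitor{activeRequests: make(map[string]*RequestInfo)}
	rm.TrackRequest("GET-/slow-test", &RequestInfo{
		StartTime:   time.Now().Add(-2 * time.Minute), // simulate a long-running request
		Path:        "/slow-test",
		Method:      "GET",
		ContextDone: ctx.Done(),
	})

	cancel() // the request's context is now cancelled, but it is still tracked

	leaks := rm.CheckForLeaks(time.Minute)
	if len(leaks) != 1 {
		t.Fatalf("expected 1 leak report, got %d: %v", len(leaks), leaks)
	}
}
```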

## 🎯 Next Steps

1. **Integrate with Metrics:** Export to Prometheus/Grafana
2. **Alerting:** Set up alerts for resource thresholds
3. **Dashboard:** Create resource monitoring dashboard
4. **Profiling:** Add pprof endpoints for detailed analysis
5. **Circuit Breaker:** Implement circuit breaker for external APIs

## 🚨 Production Recommendations

1. **Monitor Goroutine Count:** Alert if > 1000 goroutines
2. **Track Memory Growth:** Alert on sustained memory increase
3. **Request Duration:** Alert on requests > 60s
4. **Active Requests:** Alert if > 100 concurrent requests
5. **Regular Cleanup:** Schedule periodic resource cleanup checks
`internal/controller/http/middleware/resource_monitor.go` (new file, +168 lines)
package middleware

import (
	"runtime"
	"sync"
	"time"

	"github.com/gin-gonic/gin"
)

// ResourceMonitor tracks active requests and goroutines
type ResourceMonitor struct {
	activeRequests map[string]*RequestInfo
	mutex          sync.RWMutex
}

// RequestInfo holds information about an active request
type RequestInfo struct {
	StartTime   time.Time
	Path        string
	Method      string
	GoroutineID int
	ContextDone <-chan struct{}
}

var (
	globalMonitor = &ResourceMonitor{
		activeRequests: make(map[string]*RequestInfo),
	}
)

// ResourceMonitorMiddleware creates middleware to monitor resource usage
func ResourceMonitorMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		requestID := generateRequestID(c)

		// Track request start
		globalMonitor.TrackRequest(requestID, &RequestInfo{
			StartTime:   time.Now(),
			Path:        c.Request.URL.Path,
			Method:      c.Request.Method,
			GoroutineID: runtime.NumGoroutine(),
			ContextDone: c.Request.Context().Done(),
		})

		// Clean up after request
		defer func() {
			globalMonitor.UntrackRequest(requestID)
		}()

		c.Next()
	}
}

// TrackRequest adds a request to monitoring
func (rm *ResourceMonitor) TrackRequest(id string, info *RequestInfo) {
	rm.mutex.Lock()
	defer rm.mutex.Unlock()
	rm.activeRequests[id] = info
}

// UntrackRequest removes a request from monitoring
func (rm *ResourceMonitor) UntrackRequest(id string) {
	rm.mutex.Lock()
	defer rm.mutex.Unlock()
	delete(rm.activeRequests, id)
}

// GetActiveRequests returns current active requests
func (rm *ResourceMonitor) GetActiveRequests() map[string]*RequestInfo {
	rm.mutex.RLock()
	defer rm.mutex.RUnlock()

	result := make(map[string]*RequestInfo)
	for k, v := range rm.activeRequests {
		result[k] = v
	}
	return result
}

// CheckForLeaks identifies potentially leaked resources
func (rm *ResourceMonitor) CheckForLeaks(maxDuration time.Duration) []string {
	rm.mutex.RLock()
	defer rm.mutex.RUnlock()

	var leaks []string
	now := time.Now()

	for id, info := range rm.activeRequests {
		if now.Sub(info.StartTime) > maxDuration {
			// Check if context is cancelled but request still active
			select {
			case <-info.ContextDone:
				leaks = append(leaks, "Request "+id+" context cancelled but still active after "+now.Sub(info.StartTime).String())
			default:
				if now.Sub(info.StartTime) > maxDuration*2 {
					leaks = append(leaks, "Request "+id+" running too long: "+now.Sub(info.StartTime).String())
				}
			}
		}
	}

	return leaks
}

// StartLeakDetector starts a background goroutine to detect leaks
func StartLeakDetector(interval time.Duration, maxRequestDuration time.Duration) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()

		for range ticker.C {
			leaks := globalMonitor.CheckForLeaks(maxRequestDuration)
			if len(leaks) > 0 {
				for _, leak := range leaks {
					// Log leak detection
					// You can integrate with your logging system
					println("[RESOURCE_LEAK_DETECTED]", leak)
				}
			}

			// Log current goroutine count
			goroutineCount := runtime.NumGoroutine()
			activeRequests := len(globalMonitor.GetActiveRequests())

			if goroutineCount > 100 || activeRequests > 50 { // Adjust thresholds as needed
				println("[RESOURCE_WARNING] Goroutines:", goroutineCount, "Active requests:", activeRequests)
			}
		}
	}()
}

// GetResourceStats returns current resource statistics
func GetResourceStats() map[string]interface{} {
	var m runtime.MemStats
	runtime.GC() // Force garbage collection for accurate stats
	runtime.ReadMemStats(&m)

	activeRequests := globalMonitor.GetActiveRequests()

	return map[string]interface{}{
		"goroutines":               runtime.NumGoroutine(),
		"active_requests":          len(activeRequests),
		"memory_alloc_mb":          float64(m.Alloc) / 1024 / 1024,
		"memory_total_alloc_mb":    float64(m.TotalAlloc) / 1024 / 1024,
		"memory_sys_mb":            float64(m.Sys) / 1024 / 1024,
		"gc_runs":                  m.NumGC,
		"longest_request_duration": getLongestRequestDuration(activeRequests),
	}
}

func getLongestRequestDuration(requests map[string]*RequestInfo) time.Duration {
	var longest time.Duration
	now := time.Now()

	for _, info := range requests {
		duration := now.Sub(info.StartTime)
		if duration > longest {
			longest = duration
		}
	}

	return longest
}

func generateRequestID(c *gin.Context) string {
	return c.Request.Method + "-" + c.Request.URL.Path + "-" + time.Now().Format("20060102150405.000000")
}