Skip to content

Commit 86588f6

Browse files
Lokiiiiiiethnzhng
andauthored
Add adapter security validation to Secure Mode plugin (#2970)
Co-authored-by: ethnzhng <[email protected]>
1 parent ef03cfa commit 86588f6

File tree

10 files changed

+1337
-143
lines changed

10 files changed

+1337
-143
lines changed

.github/workflows/pyunit.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ jobs:
1818
- name: install Python Dependencies
1919
run: |
2020
cd engines/python/setup
21-
pip install -e ".[test]"
21+
pip install --no-cache-dir -e ".[test]"
2222
- name: run pytests
2323
run: |
2424
cd engines/python/setup

.kiro/steering/secure-mode.md

Lines changed: 320 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,320 @@
1+
# Secure Mode Plugin
2+
3+
## Overview
4+
5+
The Secure Mode plugin (`plugins/secure-mode/`) implements security validation for SageMaker deployments. It restricts model configurations to a safe subset of options and validates files before models load.
6+
7+
## When to Work on This
8+
9+
- Adding new security controls for SageMaker requirements
10+
- Expanding the properties allowlist for new features
11+
- Fixing security vulnerabilities in model loading
12+
- Updating file format validation rules
13+
- SageMaker platform integration changes
14+
15+
## Architecture
16+
17+
**Event-Driven Design:**
18+
- Plugin registers `SecureModeModelServerListener` on startup
19+
- Listener intercepts `onModelConfigured()` events
20+
- Validation runs before model loads
21+
- Throws `IllegalConfigurationException` on any violation
22+
23+
**Key Components:**
24+
- `SecureModePlugin` - Entry point, registers listener when `SAGEMAKER_SECURE_MODE=true`
25+
- `SecureModeModelServerListener` - Event listener that triggers validation
26+
- `SecureModeUtils` - Core validation logic (properties, options, file scanning)
27+
- `SecureModeAllowList` - Defines allowlisted properties and Python executables
28+
29+
## Security Controls
30+
31+
### Control Types
32+
33+
1. **DISALLOW_REQUIREMENTS_TXT** - Blocks `requirements.txt` in model directory
34+
2. **DISALLOW_PICKLE_FILES** - Only allows Safetensors (blocks .bin, .pt, .pth, .ckpt, .pkl)
35+
3. **DISALLOW_TRUST_REMOTE_CODE** - Prevents `trust_remote_code=true`
36+
4. **DISALLOW_CUSTOM_INFERENCE_SCRIPTS** - Only allows `djl_python.*` entrypoints
37+
5. **DISALLOW_CHAT_TEMPLATE** - Blocks Jinja templates in `tokenizer_config.json`
38+
39+
### Validation Layers
40+
41+
**Layer 1: Properties Allowlist**
42+
- Validates all properties against `PROPERTIES_ALLOWLIST`
43+
- ~100 properties explicitly allowed
44+
- Any other property throws exception
45+
46+
**Layer 2: Options Check**
47+
- Validates specific options based on enabled controls
48+
- Checks both properties file and environment variables
49+
- Validates Python executable path
50+
51+
**Layer 3: File Scanning**
52+
- Recursively scans `SAGEMAKER_UNTRUSTED_CHANNELS` directories
53+
- Checks file extensions and content based on controls
54+
- Parses JSON files for forbidden fields
55+
56+
## Environment Variables
57+
58+
**Platform-Controlled (SageMaker only):**
59+
- `SAGEMAKER_SECURE_MODE` - Enable/disable (true/false)
60+
- `SAGEMAKER_SECURITY_CONTROLS` - Comma-separated control list
61+
- `SAGEMAKER_UNTRUSTED_CHANNELS` - Comma-separated paths to scan
62+
63+
**User-Configurable (validated by plugin):**
64+
- `OPTION_*` - Model options (validated against allowlist)
65+
- `DJL_ENTRY_POINT` - Entrypoint (must be `djl_python.*`)
66+
67+
## Development Guidelines
68+
69+
### Adding New Allowlisted Properties
70+
71+
1. Add to `SecureModeAllowList.PROPERTIES_ALLOWLIST`
72+
2. Document in README
73+
3. Add test case in `SecureModePluginTest`
74+
75+
```java
76+
public static final Set<String> PROPERTIES_ALLOWLIST = Set.of(
77+
// ... existing properties
78+
"option.new_property" // Add here
79+
);
80+
```
81+
82+
### Adding New Security Controls
83+
84+
1. Define constant in `SecureModeUtils`:
85+
```java
86+
static final String NEW_CONTROL = "DISALLOW_NEW_FEATURE";
87+
```
88+
89+
2. Add validation logic in `checkOptions()` or `scanForbiddenFiles()`
90+
3. Update README with control description
91+
4. Add comprehensive test cases
92+
93+
### Testing Approach
94+
95+
**Unit Tests (`SecureModePluginTest`):**
96+
- Mock environment variables with `MockedStatic<Utils>`
97+
- Create temporary files in `build/mock/` directories
98+
- Test each control independently
99+
- Test valid configurations that should pass
100+
- Test environment variable vs properties file
101+
102+
**Test Pattern:**
103+
```java
104+
@Test(expectedExceptions = IllegalConfigurationException.class)
105+
void testNewControl() throws IOException, ModelException {
106+
mockSecurityEnv(
107+
SecureModeUtils.NEW_CONTROL,
108+
TEST_MODEL_DIR.resolve("file.txt"),
109+
"content"
110+
);
111+
}
112+
```
113+
114+
### Common Patterns
115+
116+
**Checking Properties:**
117+
```java
118+
String value = prop.getProperty("option.key");
119+
if (value != null && !isAllowed(value)) {
120+
throw new IllegalConfigurationException("Message");
121+
}
122+
```
123+
124+
**Checking Environment Variables:**
125+
```java
126+
String value = Utils.getenv("ENV_VAR", Utils.getenv("FALLBACK"));
127+
if (value != null && !isAllowed(value)) {
128+
throw new IllegalConfigurationException("Message");
129+
}
130+
```
131+
132+
**File Scanning:**
133+
```java
134+
try (Stream<Path> stream = Files.walk(dir)) {
135+
for (Path p : stream.collect(Collectors.toList())) {
136+
if (isForbidden(p)) {
137+
throw new IllegalConfigurationException("Message");
138+
}
139+
}
140+
}
141+
```
142+
143+
## Integration Points
144+
145+
### Model Loading Flow
146+
147+
```
148+
ModelServer.registerModel()
149+
150+
EventManager.notifyListeners(MODEL_CONFIGURED)
151+
152+
SecureModeModelServerListener.onModelConfigured()
153+
154+
SecureModeUtils.validateSecurity()
155+
156+
[Validation passes] → Model loads
157+
[Validation fails] → IllegalConfigurationException → Model rejected
158+
```
159+
160+
### Adapter Loading Flow
161+
162+
```
163+
Adapter.newInstance() [COMMON PATH for both static and dynamic]
164+
165+
Adapter.validateAdapterSecurity()
166+
167+
[Validation passes] → Adapter created → register() → modelInfo.registerAdapter()
168+
[Validation fails] → IllegalConfigurationException → Adapter NOT created → NOT in LIST API
169+
```
170+
171+
**Static Adapters (Workflow Config):**
172+
```
173+
AdapterWorkflowFunction.prepare()
174+
175+
Adapter.newInstance() → validateAdapterSecurity()
176+
177+
[Fails] → Workflow initialization fails → Model doesn't load
178+
```
179+
180+
**Dynamic Adapters (API):**
181+
```
182+
POST /models/{model}/adapters
183+
184+
AdapterManagementRequestHandler.handleRegisterAdapter()
185+
186+
Adapter.newInstance() → validateAdapterSecurity()
187+
188+
[Fails] → HTTP error response → Adapter NOT registered
189+
```
190+
191+
### Engine Filtering
192+
193+
Only Python and MPI engines are validated:
194+
```java
195+
String engine = modelInfo.getEngineName();
196+
if (!"Python".equals(engine) && !"MPI".equals(engine)) {
197+
logger.info("Skipping security check for engine: {}", engine);
198+
return;
199+
}
200+
```
201+
202+
### SageMaker Platform
203+
204+
- SageMaker sets environment variables in container
205+
- Plugin automatically activates when `SAGEMAKER_SECURE_MODE=true`
206+
- No user configuration required
207+
- Validation happens before model serves any requests
208+
209+
## Troubleshooting
210+
211+
### Common Issues
212+
213+
**"Security Controls environment variable is not set"**
214+
- `SAGEMAKER_SECURITY_CONTROLS` must be set when secure mode is enabled
215+
- SageMaker platform should set this automatically
216+
217+
**"Property X is prohibited from being set in Secure Mode"**
218+
- Property not in allowlist
219+
- Add to `PROPERTIES_ALLOWLIST` if it should be allowed
220+
221+
**"Custom entrypoint is prohibited in Secure Mode"**
222+
- Entrypoint must start with `djl_python.`
223+
- Check `option.entryPoint`, `DJL_ENTRY_POINT`, `OPTION_ENTRYPOINT`
224+
- Ensure no `model.py` file exists
225+
226+
**"Pickle-based files found in directory"**
227+
- Convert model to Safetensors format
228+
- Remove .bin, .pt, .pth, .ckpt, .pkl files from untrusted directories
229+
230+
### Debugging
231+
232+
Enable debug logging:
233+
```properties
234+
# log4j2.xml
235+
<Logger name="ai.djl.serving.plugins.securemode" level="debug"/>
236+
```
237+
238+
Check environment variables:
239+
```bash
240+
echo $SAGEMAKER_SECURE_MODE
241+
echo $SAGEMAKER_SECURITY_CONTROLS
242+
echo $SAGEMAKER_UNTRUSTED_CHANNELS
243+
```
244+
245+
## Performance Considerations
246+
247+
- File scanning is recursive and may be slow for large directories
248+
- Validation runs once per model at load time (not per request)
249+
- Properties validation is fast (Set lookup)
250+
- JSON parsing only for `tokenizer_config.json` files
251+
252+
## Security Best Practices
253+
254+
1. **Minimize allowlist** - Only add properties that are truly needed
255+
2. **Validate early** - Fail fast before model loads
256+
3. **Clear error messages** - Help users understand what's blocked and why
257+
4. **Test thoroughly** - Each control should have positive and negative tests
258+
5. **Document changes** - Update README when modifying allowlist or controls
259+
260+
## Adapter Security (NEW)
261+
262+
### Implementation
263+
264+
Adapter validation is implemented in `wlm/src/main/java/ai/djl/serving/wlm/Adapter.java`:
265+
266+
```java
267+
public static <I, O> Adapter<I, O> newInstance(...) {
268+
// Validate adapter security FIRST
269+
validateAdapterSecurity(src);
270+
271+
// Then create adapter
272+
return new PyAdapter(...);
273+
}
274+
275+
private static void validateAdapterSecurity(String adapterPath) {
276+
// Check for model.py, requirements.txt, pickle files
277+
}
278+
```
279+
280+
### What Gets Validated
281+
282+
When adapters are loaded (statically or dynamically):
283+
284+
1. **DISALLOW_CUSTOM_INFERENCE_SCRIPTS** - Blocks `model.py` in adapter directory
285+
2. **DISALLOW_REQUIREMENTS_TXT** - Blocks `requirements.txt` in adapter directory
286+
3. **DISALLOW_PICKLE_FILES** - Blocks `.bin`, `.pt`, `.pth`, `.ckpt`, `.pkl` files (recursively)
287+
288+
### Testing
289+
290+
Comprehensive tests in `SecureModeAdapterTest.java`:
291+
- Adapter with `model.py` → blocked
292+
- Adapter with `requirements.txt` → blocked
293+
- Adapter with pickle files → blocked
294+
- Adapter with `.safetensors` → allowed
295+
- Multiple violations → fails on first
296+
- Secure mode disabled → all allowed
297+
298+
### Key Points
299+
300+
- **Single validation point**: `Adapter.newInstance()` is the common path for all adapter loading
301+
- **Fail fast**: Validation happens before adapter object is created
302+
- **Hard fail**: Exception propagates, adapter never registered
303+
- **Clean state**: Failed adapters don't appear in LIST API
304+
305+
## Related Code
306+
307+
- `wlm/src/main/java/ai/djl/serving/wlm/Adapter.java` - Adapter creation and validation
308+
- `wlm/src/main/java/ai/djl/serving/wlm/ModelInfo.java` - Model and adapter registry
309+
- `serving/src/main/java/ai/djl/serving/http/AdapterManagementRequestHandler.java` - Dynamic adapter API
310+
- `serving/src/main/java/ai/djl/serving/workflow/function/AdapterWorkflowFunction.java` - Static adapter loading
311+
- `serving/src/main/java/ai/djl/serving/wlm/util/EventManager.java` - Event system
312+
- `serving/src/main/java/ai/djl/serving/http/IllegalConfigurationException.java` - Exception type
313+
- `engines/python/setup/djl_python/adapter_manager_mixin.py` - Python adapter management
314+
- `engines/python/` - Python engine that secure mode validates
315+
316+
## References
317+
318+
- Plugin system: `serving/docs/plugin_management.md`
319+
- Model configuration: `serving/docs/configuration.md`
320+
- SageMaker integration: AWS SageMaker documentation

0 commit comments

Comments
 (0)