Commit 56b91d3 (1 parent: 3ff3dc8)

synchronize from upstream

Signed-off-by: juliusvonkohout <[email protected]>
File tree

9 files changed (+417, −164 lines)


experimental/seaweedfs/OWNERS

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,6 +1,5 @@
 approvers:
-  # - pschoen-itsc
   - juliusvonkohout
 reviewers:
-  # - pschoen-itsc
+  - pschoen-itsc
   - juliusvonkohout
```

experimental/seaweedfs/README.md

Lines changed: 123 additions & 1 deletion
````diff
@@ -33,15 +33,137 @@ kubectl kustomize ./base/ | kubectl apply -f -
 ## Verify deployment
 
 Run
+
 ```bash
 ./test.sh
 ```
+
 With the ready check on the container it already verifies that the S3 service starts correctly.
-You can then use it with the endpoint at http://localhost:8333.
+You can then use it with the endpoint at <http://localhost:8333>.
 To create access keys open a shell on the pod and use `weed shell` to configure your instance.
 Create a user with the command `s3.configure -user <username> -access_key <access-key> -secret_key <secret-key> -actions Read:<my-bucket>/<my-prefix>,Write:<my-bucket>/<my-prefix> -apply`
 Documentation for this can also be found [here](https://github.com/seaweedfs/seaweedfs/wiki/Amazon-S3-API).
 
+## Gateway to Remote Object Storage
+
+The Gateway to Remote Object Storage feature allows SeaweedFS to automatically synchronize local storage with remote cloud storage providers (AWS S3, Azure Blob Storage, Google Cloud Storage). This enables:
+
+- **Automatic Bucket Synchronization**: New local buckets are automatically created in remote storage
+- **Bidirectional Sync**: Changes in local storage are uploaded to remote storage
+- **Automatic Cleanup**: Locally deleted buckets are automatically deleted in remote storage
+- **Multi-Cloud Support**: Connect to multiple cloud storage providers simultaneously
+
+### Configure Remote Storage
+
+Remote storage must be configured before using the gateway. Use the `weed shell` to configure remote storage connections:
+
+#### 1. Access SeaweedFS Shell
+
+```bash
+kubectl exec -n kubeflow deployment/seaweedfs -it -- weed shell
+```
+
+#### 2. Configure Remote Storage
+
+**AWS S3 Configuration:**
+
+```bash
+# Configure AWS S3 remote storage
+remote.configure -name=aws1 -type=s3 -s3.access_key=YOUR_ACCESS_KEY -s3.secret_key=YOUR_SECRET_KEY -s3.region=us-east-1 -s3.endpoint=s3.amazonaws.com -s3.storage_class="STANDARD"
+```
+
+**Azure Blob Storage Configuration:**
+
+```bash
+# Configure Azure Blob Storage
+remote.configure -name=azure1 -type=azure -azure.account_name=YOUR_ACCOUNT_NAME -azure.account_key=YOUR_ACCOUNT_KEY
+```
+
+**Google Cloud Storage Configuration:**
+
+```bash
+# Configure Google Cloud Storage
+remote.configure -name=gcs1 -type=gcs -gcs.appCredentialsFile=/path/to/service-account-file.json
+```
+
+#### 3. View and Manage Configurations
+
+```bash
+# List all remote storage configurations
+remote.configure
+
+# Delete a configuration
+remote.configure -delete -name=aws1
+```
+
+### Setup Gateway to Remote Storage
+
+#### Step 1: Mount Existing Remote Buckets (Optional)
+
+If you have existing buckets in remote storage, mount them as local buckets:
+
+```bash
+# In weed shell
+remote.mount.buckets -remote=aws1 -apply
+```
+
+#### Step 2: Start the Remote Gateway
+
+The gateway process continuously monitors local changes and syncs them to remote storage.
+
+**Basic Gateway Setup:**
+
+```bash
+# Start the gateway (run this in the SeaweedFS deployment)
+kubectl exec -n kubeflow deployment/seaweedfs -- weed filer.remote.gateway -createBucketAt=aws1
+```
+
+**Gateway with Random Suffix (for unique bucket names):**
+
+```bash
+# Some cloud providers require globally unique bucket names
+kubectl exec -n kubeflow deployment/seaweedfs -- weed filer.remote.gateway -createBucketAt=aws1 -createBucketWithRandomSuffix
+```
+
+#### Step 3 (Optional): Cache Management
+
+Optimize performance by managing the cache:
+
+```bash
+# In weed shell
+
+# Cache all PDF files in all mounted buckets
+remote.cache -include=*.pdf
+
+# Cache all PDF files in a specific bucket
+remote.cache -dir=/buckets/some-bucket -include=*.pdf
+
+# Uncache files older than 1 hour and larger than 10KB
+remote.uncache -minAge=3600 -minSize=10240
+```
+
+### Troubleshooting
+
+**Common Issues:**
+
+- **Configuration not found**: Ensure remote storage is configured before starting the gateway
+- **Permission denied**: Check cloud storage credentials and permissions
+- **Connection timeout**: Verify network connectivity to cloud storage
+- **Bucket conflicts**: Use a random suffix for globally unique bucket names
+
+**Debug Commands:**
+
+```bash
+# Check remote configurations
+kubectl exec -n kubeflow deployment/seaweedfs -- weed shell -c "remote.configure"
+
+# Check mounted buckets
+kubectl exec -n kubeflow deployment/seaweedfs -- weed shell -c "remote.mount.buckets -remote=aws1"
+
+# Check gateway logs
+kubectl logs -n kubeflow deployment/seaweedfs -f
+```
+
 ## Uninstall SeaweedFS
 
 ```bash
````

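The `s3.configure` one-liner in the README above is easy to mistype. As an illustration only (the helper name and example values are mine, not part of this commit), a small script can assemble the command before it is pasted into `weed shell`:

```python
def s3_configure_cmd(user: str, access_key: str, secret_key: str,
                     bucket: str, prefix: str = "") -> str:
    """Build the `weed shell` command that creates an S3 user with
    Read/Write access to one bucket, optionally restricted to a prefix."""
    target = f"{bucket}/{prefix}" if prefix else bucket
    actions = f"Read:{target},Write:{target}"
    return (f"s3.configure -user {user} -access_key {access_key} "
            f"-secret_key {secret_key} -actions {actions} -apply")

# Paste the printed line into `weed shell` on the SeaweedFS pod.
print(s3_configure_cmd("alice", "AKIAEXAMPLE", "wJalrEXAMPLE", "my-bucket", "models"))
```

This only formats the command string; key creation still happens inside `weed shell` as described above.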
experimental/seaweedfs/base/pipeline-profile-controller/deployment.yaml

Lines changed: 39 additions & 3 deletions
````diff
@@ -3,7 +3,11 @@ kind: Deployment
 metadata:
   name: kubeflow-pipelines-profile-controller
 spec:
+  replicas: 1
   template:
+    metadata:
+      labels:
+        sidecar.istio.io/inject: "false"
     spec:
       securityContext:
         seccompProfile:
@@ -21,23 +25,55 @@ spec:
         # We just need an image with the python botocore library installed
         image: docker.io/alpine/k8s:1.32.3
         command: ["python", "/hooks/sync.py"]
+        envFrom:
+        - configMapRef:
+            name: kubeflow-pipelines-profile-controller-env
         env:
         - name: KFP_VERSION
           valueFrom:
             configMapKeyRef:
               name: pipeline-install-config
               key: appVersion
         - name: AWS_ENDPOINT_URL
-          value: http://seaweedfs:8111
+          value: http://seaweedfs.kubeflow:8111
+        - name: S3_ENDPOINT_URL
+          value: http://seaweedfs.kubeflow:8333
         - name: AWS_REGION
           value: us-east-1
         - name: AWS_ACCESS_KEY_ID
           valueFrom:
             secretKeyRef:
-              name: mlpipeline-minio-artifact
               key: accesskey
+              name: mlpipeline-minio-artifact
        - name: AWS_SECRET_ACCESS_KEY
           valueFrom:
             secretKeyRef:
-              name: mlpipeline-minio-artifact
               key: secretkey
+              name: mlpipeline-minio-artifact
+        - name: KFP_DEFAULT_PIPELINE_ROOT
+          valueFrom:
+            configMapKeyRef:
+              key: defaultPipelineRoot
+              name: pipeline-install-config
+              optional: true
+        - name: ARTIFACTS_PROXY_ENABLED
+          valueFrom:
+            configMapKeyRef:
+              key: ARTIFACTS_PROXY_ENABLED
+              name: pipeline-install-config
+              optional: true
+        - name: ARTIFACT_RETENTION_DAYS
+          valueFrom:
+            configMapKeyRef:
+              key: ARTIFACT_RETENTION_DAYS
+              name: pipeline-install-config
+              optional: true
+        volumeMounts:
+        - name: hooks
+          mountPath: /hooks
+        ports:
+        - containerPort: 8080
+      volumes:
+      - name: hooks
+        configMap:
+          name: kubeflow-pipelines-profile-controller-code
````
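The new `envFrom` entry in the diff above assumes a ConfigMap named `kubeflow-pipelines-profile-controller-env` exists in the same namespace; its actual keys are defined elsewhere in the kustomization and are not shown in this commit. As a hedged sketch, such a ConfigMap would have this shape (the data entry below is a placeholder, not a real key):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeflow-pipelines-profile-controller-env
  namespace: kubeflow
data:
  # Placeholder only; the real keys come from the upstream kustomization.
  EXAMPLE_KEY: "example-value"
```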
