Skip to content

Conversation

@ErykKul
Copy link
Contributor

@ErykKul ErykKul commented Dec 5, 2025

What this PR does / why we need it:

Adds configurable file upload options to support S3-compatible storage systems (like MinIO) that may not support all S3 features.

This PR introduces a new FilesConfig class that allows configuring:

  • useS3Tagging: Whether to include S3 tagging header (x-amz-tagging: dv-state=temp). Set to false for S3-compatible storage that doesn't support object tagging. Default: true
  • maxMultipartRetries: Maximum number of retries for multipart upload parts. Default: 5
  • fileUploadTimeoutMs: Timeout in milliseconds for file upload operations. Default: 60000

Usage Example:

import { FilesConfig, uploadFile } from '@iqss/dataverse-client-javascript'

// Configure before first upload
FilesConfig.init({
  useS3Tagging: false,           // Disable for MinIO without tagging support
  maxMultipartRetries: 3,        // Custom retry count
  fileUploadTimeoutMs: 120000    // 2 minute timeout
})

// Then use uploadFile as normal
await uploadFile.execute(datasetId, file, progressCallback, abortController)

Which issue(s) this PR closes:

  • Enables DVWebloader V2 standalone bundle to work with S3-compatible storage

Related Dataverse PRs:

  • Used by dataverse-frontend standalone uploader (DVWebloader V2)

Special notes for your reviewer:

  • The FilesConfig class uses a static configuration pattern similar to ApiConfig
  • Configuration is applied lazily when DirectUploadClient is first instantiated
  • All options are optional and have sensible defaults for backward compatibility
  • Tests added for FilesConfig and updated for DirectUploadClient

Suggestions on how to test this:

  1. Unit tests: Run npm run test:unit - new tests in test/unit/files/FilesConfig.test.ts
  2. Integration test:
    • Set up a MinIO instance without tagging support
    • Configure FilesConfig.init({ useS3Tagging: false })
    • Verify file uploads succeed without CORS/tagging errors

Is there a release notes or changelog update needed for this change?:

Yes, CHANGELOG.md has been updated with the new feature.

Additional documentation:

The FilesConfig.init() method should be called before the first uploadFile.execute() call. If not called, default values are used which maintain backward compatibility with existing S3 configurations.

Copilot AI review requested due to automatic review settings December 5, 2025 14:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds configurable file upload options to support S3-compatible storage systems that may not fully implement all S3 features, particularly object tagging. The implementation introduces a FilesConfig class for runtime configuration with three options: useS3Tagging (to disable S3 tagging headers for incompatible storage), maxMultipartRetries (configurable retry count), and fileUploadTimeoutMs (configurable timeout).

  • Introduces FilesConfig class with static configuration pattern for file upload settings
  • Refactors DirectUploadClient constructor to accept a configuration object instead of a plain number
  • Implements conditional S3 tagging header inclusion based on configuration

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/unit/files/FilesConfig.test.ts New test suite covering FilesConfig initialization and configuration retrieval
test/unit/files/DirectUploadClient.test.ts Updated tests to verify S3 tagging behavior and use new config object pattern
src/files/infra/clients/DirectUploadClient.ts Implements DirectUploadClientConfig interface and conditional S3 tagging in single-part uploads
src/files/index.ts Adds FilesConfig class with lazy initialization pattern for DirectUploadClient and UploadFile instances
CHANGELOG.md Documents new features and breaking changes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


### Changed

- Add pagination query parameters to Dataset Version Summeries and File Version Summaries use cases
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "Summeries" should be "Summaries"

Suggested change
- Add pagination query parameters to Dataset Version Summeries and File Version Summaries use cases
- Add pagination query parameters to Dataset Version Summaries and File Version Summaries use cases

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +56
const getDirectUploadClient = (): DirectUploadClient => {
if (!directUploadClientInstance) {
directUploadClientInstance = new DirectUploadClient(filesRepository, FilesConfig.getConfig())
}
return directUploadClientInstance
}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lazy initialization pattern doesn't allow reconfiguration. Once directUploadClientInstance is created (line 53), subsequent calls to FilesConfig.init() won't affect it since the instance is never recreated. Consider either:

  1. Resetting directUploadClientInstance to null in FilesConfig.init() to force recreation on next use
  2. Adding a FilesConfig.reset() method that resets both config and instances
  3. Documenting that FilesConfig.init() must be called before the first upload and cannot be changed afterward

This is particularly important for testing scenarios where config might need to change between tests.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +7
beforeEach(() => {
// Reset config before each test
FilesConfig.init({})
})
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test setup resets the config to an empty object before each test, but this doesn't verify that the default values are correctly applied in DirectUploadClient. Consider adding a test that verifies the default behavior when FilesConfig.init() is not called at all, or is called with an empty config, to ensure defaults (useS3Tagging: true, maxMultipartRetries: 5, fileUploadTimeoutMs: 60000) are properly applied.

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +139
test('should not include S3 tagging header when useS3Tagging is false', async () => {
const filesRepositoryStub: IFilesRepository = {} as IFilesRepository
const testDestination: FileUploadDestination = createSingleFileUploadDestinationModel()
filesRepositoryStub.getFileUploadDestination = jest.fn().mockResolvedValue(testDestination)

const axiosPutSpy = jest.spyOn(axios, 'put').mockResolvedValue(undefined)

const sut = new DirectUploadClient(filesRepositoryStub, { useS3Tagging: false })

const progressMock = jest.fn()
const abortController = new AbortController()

await sut.uploadFile(1, testFile, progressMock, abortController)

expect(axiosPutSpy).toHaveBeenCalledWith(
testDestination.urls[0],
expect.anything(),
expect.objectContaining({
headers: expect.not.objectContaining({
'x-amz-tagging': expect.anything()
})
})
)
})
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for the fileUploadTimeoutMs configuration option. While the config is set in the constructor (line 38 in DirectUploadClient.ts), there's no test verifying that this timeout value is actually used in the axios requests. Consider adding a test that verifies the timeout is passed correctly to axios calls.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@qqmyers
Copy link
Member

qqmyers commented Dec 5, 2025

FYI - there's already a dataverse.files..disable-tagging setting - is that the same as the new one here?

@ErykKul
Copy link
Contributor Author

ErykKul commented Dec 5, 2025

FYI - there's already a dataverse.files..disable-tagging setting - is that the same as the new one here?

If there is one already, then we do not need the new one. However, I know there is one in Java, I did not know there is one in Javascript too? I made this PR a draft, it is a part of a bigger idea: reusing the SPA parts in other UI's. I think it is something for the tech hours to discuss first. There are other 3 PR's coming to illustrate what I mean (one in the new SPA, one in dvwebloader and one in the JSF frontend). The dvwebloader that uses the SPA file upload can function without these PR's ever being merged, and the dvwebloader one is also optional (it is something that we want to use at KU Leuven, but it is fine if it is not mainstream).

@qqmyers
Copy link
Member

qqmyers commented Dec 5, 2025

A tech hour makes sense for the bigger picture.

re: the tagging - the store is configured on the server and the signed URLs either will/won't allow a tag header based on that (calls will fail if they don't match the server setting). Do we need an API so you can discover that setting? (We've recently been adding more settings to the storageDriver GET API).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Ready for Triage

Development

Successfully merging this pull request may close these issues.

3 participants