Create new data loader page #319
Open: nenharper wants to merge 5 commits into main from nenne/data-loader-page
Changes from 1 commit

Commits (5):
- d657346 Create new data loader page
- bb2b401 Update versioned_docs/version-4.6/developers/applications/data-loader.md
- 601ec04 Update versioned_docs/version-4.6/reference/applications/data-loader.md
- 84b7385 Address comments
- 65c7e24 fixing links
````diff
@@ -4,178 +4,122 @@ title: Data Loader

 # Data Loader

-The Data Loader is a built-in component that provides a reliable mechanism for loading data from JSON or YAML files into Harper tables as part of component deployment. This feature is particularly useful for ensuring specific records exist in your database when deploying components, such as seed data, configuration records, or initial application data.
+Now that you’ve set up your first application, let’s bring it to life with some data. Applications are only as useful as the information they hold, and Harper makes it simple to seed your database with initial records, configuration values, or even test users, without needing to write a custom script. This is where the Data Loader comes in.

-## Configuration
+Think of the Data Loader as your shortcut for putting essential data in place from day one. Whether it’s a set of default settings, an admin user account, or sample data for development, the Data Loader ensures that when your application is deployed, it’s immediately usable.

-To use the Data Loader, first specify your data files in the `config.yaml` in your component directory:
+In this section, we’ll add a few dogs to our `Dog` table so our application starts with meaningful data.

-```yaml
-dataLoader:
-  files: 'data/*.json'
-```
````
````diff
-The Data Loader is an [Extension](../../reference/components#extensions) and supports the standard `files` configuration option.
+## Creating a Data File

-## Data File Format

-Data files can be either JSON or YAML files containing the records you want to load. Each data file must specify records for a single table; to load data into multiple tables, create a separate data file for each table.

-### Basic Example

-Create a data file in your component's data directory (one table per file):
+First, let’s make a `data` directory in our app and create a file called `dogs.json`:

 ```json
 {
   "database": "myapp",
-  "table": "users",
+  "table": "Dog",
   "records": [
     {
       "id": 1,
-      "username": "admin",
-      "email": "[email protected]",
-      "role": "administrator"
+      "name": "Harper",
+      "breed": "Labrador",
+      "age": 3,
+      "tricks": ["sit"]
     },
     {
       "id": 2,
-      "username": "user1",
-      "email": "[email protected]",
-      "role": "standard"
+      "name": "Balto",
+      "breed": "Husky",
+      "age": 5,
+      "tricks": ["run", "pull sled"]
     }
   ]
 }
 ```

-### Multiple Tables
+This file tells Harper: _“Insert these two records into the `Dog` table when this app runs.”_
````
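You can catch malformed seed files before deploying by checking them against the shape shown in the data file above. That shape is one object that names a single `table` and holds a `records` array. The following is a minimal Python sketch under that assumption. `check_data_file` is a hypothetical helper, not part of Harper, and it only handles JSON. The duplicate check also assumes that `id` is the primary key, as it is in these examples.

```python
import json
from typing import Any


def check_data_file(text: str) -> dict[str, Any]:
    """Parse a JSON data file and check the shape used in the examples.

    Rules assumed here: a single `table` name, a `records` list of objects,
    and (hypothetically) `id` as the primary key, so ids must be unique.
    `database` is not required by this sketch.
    """
    doc = json.loads(text)  # invalid JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(doc, dict):
        raise ValueError("top level must be an object")
    table = doc.get("table")
    if not isinstance(table, str) or not table:
        raise ValueError("each data file must name exactly one table")
    records = doc.get("records")
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("`records` must be a list of objects")
    ids = [r["id"] for r in records if "id" in r]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate `id` values in one file")
    return doc
```

Every failure surfaces as a `ValueError`, so a CI step can run this over each data file and fail the build on the first bad one.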
````diff
-To load data into multiple tables, create separate data files for each table:
+## Connecting the Data Loader

-**users.json:**

-```json
-{
-  "database": "myapp",
-  "table": "users",
-  "records": [
-    {
-      "id": 1,
-      "username": "admin",
-      "email": "[email protected]"
-    }
-  ]
-}
-```

-**settings.yaml:**

-```yaml
-database: myapp
-table: settings
-records:
-  - id: 1
-    setting_name: app_name
-    setting_value: My Application
-  - id: 2
-    setting_name: version
-    setting_value: '1.0.0'
-```

-## File Organization

-You can organize your data files in various ways:

-### Single File Pattern
+Next, let’s tell Harper to use this file when running the application. Open `config.yaml` in the root of your project and add:

 ```yaml
 dataLoader:
-  files: 'data/seed-data.json'
+  files: 'data/dogs.json'
 ```

-### Multiple Files Pattern
+That’s it. Now the Data Loader knows where to look.

-```yaml
-dataLoader:
-  files:
-    - 'data/users.json'
-    - 'data/settings.yaml'
-    - 'data/initial-products.json'
-```
+## Running with Data

-### Glob Pattern
+Go ahead and start your app again:

-```yaml
-dataLoader:
-  files: 'data/**/*.{json,yaml,yml}'
+```bash
+harperdb dev .
 ```
````
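Above, the `files` option appears in three forms: a single path, a list of paths, and a glob with a brace group. The sketch below is a rough Python approximation of how such a value could resolve to concrete files, assuming ordinary glob semantics. `resolve_files` and `expand_braces` are hypothetical helpers, and Harper's own matcher may treat edge cases differently, such as hidden files or nested braces. Python's `pathlib` does not expand `{json,yaml,yml}`, so the sketch expands brace groups first.

```python
from __future__ import annotations

import re
from pathlib import Path


def expand_braces(pattern: str) -> list[str]:
    """Expand simple {a,b,c} groups, e.g. '*.{json,yml}' -> ['*.json', '*.yml']."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded: list[str] = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def resolve_files(root: Path, files: str | list[str]) -> list[Path]:
    """Turn a `files` value (one pattern or a list) into a sorted list of existing files."""
    patterns = [files] if isinstance(files, str) else list(files)
    found: set[Path] = set()
    for pattern in patterns:
        for expanded in expand_braces(pattern):
            # pathlib's '**' matches zero or more directory levels.
            found.update(p for p in root.glob(expanded) if p.is_file())
    return sorted(found)
```

For example, `resolve_files(Path("."), "data/**/*.{json,yaml,yml}")` picks up JSON and YAML files both directly under `data/` and in its subdirectories. Using a set means a file matched by two patterns is still listed once.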
````diff
-## Loading Behavior
+This time, when Harper runs, it will automatically read `dogs.json` and load the records into the `Dog` table. You don’t need to write any import scripts or SQL statements; it just works.

-When Harper starts up with a component that includes the Data Loader:
+You can confirm the data is there by hitting the endpoint you created earlier:

-1. The Data Loader reads all specified data files (JSON or YAML)
-1. For each file, it validates that a single table is specified
-1. Records are inserted or updated based on timestamp comparison:
-   - New records are inserted if they don't exist
-   - Existing records are updated only if the data file's modification time is newer than the record's updated time
-   - This ensures data files can be safely reloaded without overwriting newer changes
-1. If records with the same primary key already exist, updates occur only when the file is newer

-Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the [graphqlSchema](../applications/defining-schemas) component for better control and type safety.
````
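The timestamp rule in the loading-behavior list above can be modeled in a few lines. This is a minimal in-memory sketch, not Harper's implementation. The table is a plain dict keyed by a hypothetical `id` primary key. Each write is stamped with a caller-supplied `now` as the record's updated time, which is an assumption about how that time is recorded.

```python
from dataclasses import dataclass


@dataclass
class StoredRecord:
    data: dict
    updated_time: float  # hypothetical per-record "last updated" timestamp


def load_records(table: dict, records: list[dict], file_mtime: float, now: float,
                 key: str = "id") -> dict[str, int]:
    """Insert missing records; overwrite existing ones only if the file is newer."""
    counts = {"inserted": 0, "updated": 0, "skipped": 0}
    for record in records:
        existing = table.get(record[key])
        if existing is None:
            table[record[key]] = StoredRecord(dict(record), now)
            counts["inserted"] += 1
        elif file_mtime > existing.updated_time:
            # The data file changed after this record was last written.
            table[record[key]] = StoredRecord(dict(record), now)
            counts["updated"] += 1
        else:
            # The record is newer than the file (e.g. edited elsewhere): keep it.
            counts["skipped"] += 1
    return counts
```

In this sketch, loading the same file twice leaves the table unchanged. An edited file with a newer modification time overwrites every record it lists, not only the changed ones. A record written after the file's last change is always preserved.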
````diff
-## Best Practices
+```bash
+curl http://localhost:9926/Dog/
+```

-1. **Define Schemas First**: While the Data Loader can infer schemas, it's strongly recommended to define your table schemas and relations explicitly using the [graphqlSchema](../applications/defining-schemas) component before loading data. This ensures proper data types, constraints, and relationships between tables.
+You should see both `Harper` and `Balto` returned as JSON.
````
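The `curl` check can also be scripted. This sketch uses only Python's standard library. It assumes the tutorial's endpoint (`http://localhost:9926/Dog/`) returns a JSON array of records, each with a `name` field. `dog_names` and `fetch_dog_names` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Endpoint from the tutorial; change host/port if your instance differs.
DOG_URL = "http://localhost:9926/Dog/"


def dog_names(body: str) -> list[str]:
    """Return sorted `name` values, assuming the body is a JSON array of records."""
    return sorted(record["name"] for record in json.loads(body))


def fetch_dog_names(url: str = DOG_URL) -> list[str]:
    """GET the collection and extract names (requires a running Harper instance)."""
    with urlopen(url, timeout=5) as response:
        return dog_names(response.read().decode("utf-8"))
```

With the app running, `fetch_dog_names()` should list both Balto and Harper. Splitting out `dog_names` lets you test the parsing step without a server.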
````diff
-1. **One Table Per File**: Remember that each data file can only load records into a single table. Organize your files accordingly.
+### Updating Records

-1. **Idempotency**: Design your data files to be idempotent: they should be safe to load multiple times without creating duplicate or conflicting data.
+What happens if you change the data file? Let’s update Harper’s age from 3 to 4 in `dogs.json`.

-1. **Version Control**: Include your data files in version control to ensure consistency across deployments.
+```json
+{
+  "id": 1,
+  "name": "Harper",
+  "breed": "Labrador",
+  "age": 4,
+  "tricks": ["sit"]
+}
+```

-1. **Environment-Specific Data**: Consider using different data files for different environments (development, staging, production).
+When you save the file, Harper will notice the change and reload. The next time you query the endpoint, Harper’s age will be 4.

-1. **Data Validation**: Ensure your data files are valid JSON or YAML and match your table schemas before deployment.
+The Data Loader is designed to be safe and repeatable. If a record already exists, it is updated only when the file is newer than the record. This means you can re-run deployments without worrying about duplicates.
````
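The **Data Validation** practice above calls for checking files against your table schemas. A lightweight pre-deploy check can compare each record's fields with the types you expect. In the sketch below, `DOG_FIELDS` is a hypothetical stand-in for the `Dog` schema, whose real definition lives in the application's schema file. `schema_errors` is an illustrative helper, not a Harper API, and it does not flag missing fields.

```python
# Hypothetical field/type map standing in for the Dog schema; keep it in sync with the real schema.
DOG_FIELDS: dict[str, type] = {"id": int, "name": str, "breed": str, "age": int, "tricks": list}


def schema_errors(records: list[dict], fields: dict[str, type]) -> list[str]:
    """List fields that are unknown or have an unexpected Python type."""
    errors: list[str] = []
    for i, record in enumerate(records):
        for name, value in record.items():
            expected = fields.get(name)
            if expected is None:
                errors.append(f"record {i}: unknown field {name!r}")
            # bool is a subclass of int in Python, so reject it explicitly for int fields.
            elif not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                errors.append(f"record {i}: {name!r} should be {expected.__name__}, got {type(value).__name__}")
    return errors
```

For example, a record with `"age": "5"` is reported as a type mismatch rather than being loaded with a string where a number was intended.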
````diff
-1. **Sensitive Data**: Avoid including sensitive data like passwords or API keys directly in data files. Use environment variables or secure configuration management instead.
+### Adding More Tables

-## Example Component Structure
+If your app grows and you want to seed more than just dogs, you can create additional files. For example, a `settings.yaml` file:
````
````diff
-```
-my-component/
-├── config.yaml
-├── data/
-│   ├── users.json
-│   ├── roles.json
-│   └── settings.json
-├── schemas.graphql
-└── roles.yaml
+```yaml
+database: myapp
+table: Settings
+records:
+  - id: 1
+    setting_name: app_name
+    setting_value: Dog Tracker
+  - id: 2
+    setting_name: version
+    setting_value: '1.0.0'
 ```

-With this structure, your `config.yaml` might look like:
+Then add it to your config:

 ```yaml
-# Load environment variables first
-loadEnv:
-  files: '.env'
+dataLoader:
+  files:
+    - 'data/dogs.json'
+    - 'data/settings.yaml'
+```

-# Define schemas
-graphqlSchema:
-  files: 'schemas.graphql'
+Harper will read both files and load them into their respective tables.

-# Define roles
-roles:
-  files: 'roles.yaml'
+## Key Takeaway

-# Load initial data
-dataLoader:
-  files: 'data/*.json'
+With the Data Loader, your app doesn’t start empty. It starts ready to use. You define your schema, write a simple data file, and Harper takes care of loading it. This keeps your applications consistent across environments, safe to redeploy, and quick to get started with.

-# Enable REST endpoints
-rest: true
-```
+In just a few steps, we’ve gone from an empty Dog table to a real application with data that’s instantly queryable.
````
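Each data file names its own target, which is how several files listed under `files` end up in their respective tables. The sketch below groups records from several JSON data files by `(database, table)`. It assumes every file includes a `database` field, as the examples here do. It rejects non-JSON files, because parsing YAML would need a third-party library such as PyYAML. `route_records` is a hypothetical helper.

```python
import json
from collections import defaultdict
from pathlib import Path


def route_records(paths: list[Path]) -> dict[tuple[str, str], list[dict]]:
    """Group records from JSON data files by their (database, table) target."""
    targets: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for path in paths:
        if path.suffix != ".json":
            # YAML files would need a parser such as PyYAML; out of scope here.
            raise ValueError(f"{path.name}: only JSON is handled in this sketch")
        doc = json.loads(path.read_text(encoding="utf-8"))
        targets[(doc["database"], doc["table"])].extend(doc["records"])
    return dict(targets)
```

Keying on both database and table keeps same-named tables in different databases apart.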
````diff
 ## Related Documentation

 - [Built-In Components](../../reference/components/built-in-extensions)
 - [Extensions](../../reference/components/extensions)
+- [Data Loader Reference](../../reference/applications/data-loader) – Complete configuration and format options.
 - [Bulk Operations](../operations-api/bulk-operations) - For loading data via the Operations API
````