From f9bdb0271415a7559e6bbab60033c3cfc06da5c7 Mon Sep 17 00:00:00 2001
From: spricoder
Date: Mon, 1 Jun 2026 16:09:26 +0800
Subject: [PATCH 01/41] docs: add design spec for TsFile Unix-philosophy C++ CLI

Read-only inspect/export verbs (ls/schema/stats/head/cat/select) as a single multi-call `tsfile` binary, backed by the existing C++ reader API.
---
 .../2026-06-01-tsfile-unix-cli-design.md | 217 ++++++++++++++++++
 1 file changed, 217 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md

diff --git a/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md b/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md
new file mode 100644
index 000000000..6b0ed5ff6
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md
@@ -0,0 +1,217 @@

# Design: A Unix-philosophy command-line interface for TsFile (C++)

- **Date**: 2026-06-01
- **Module**: `cpp/`
- **Status**: Approved design, pending implementation plan

## Goal

Give TsFile a set of composable, pipeable command-line tools — in the Unix
tradition of small programs that read a file and write machine-parseable data
to stdout. The primary gap today is the **read / inspect / export** side: to
answer "what devices, measurements, schema, and data live in this `.tsfile`?"
a user must write code or read the raw byte layout via the Java
`TsFileSketchTool`. A single `tsfile` binary closes that gap and composes with
`awk`, `jq`, `sort`, and database import tools.

## Scope

**In scope (v1):** read-only inspection and export verbs — `ls`, `schema`,
`stats`, `head`, `cat`, `select` — shipped as one multi-call C++ binary.

**Out of scope (v1, possible follow-ups):** write/convert verbs (the Java
`tools` module already imports CSV/Parquet/Arrow → TsFile), a structure-dump
verb at parity with `TsFileSketchTool`, ISO time formatting, and splitting into
multiple `tsfile-*` binaries.
+ +## Why C++ + +The user wants the most Unix-native form: a single self-contained static binary +with fast startup and no runtime dependency (unlike the JVM-based Java tools or +the Python binding). The C++ read path already exposes everything the verbs +need, so the engine does not change — the work is argument parsing, subcommand +dispatch, and output formatting. + +## Existing building blocks (no engine changes needed) + +`storage::TsFileReader` (`cpp/src/reader/tsfile_reader.h`) already provides: + +| Need | API | +|---|---| +| list devices (tree) | `get_all_device_ids()` / `get_all_devices()` | +| list tables (table) | `get_all_table_schemas()` | +| per-device measurement schema | `get_timeseries_schema(device_id, &out)` | +| per-table schema | `get_table_schema(name)` | +| per-series statistics | `get_timeseries_metadata()` (carries `Statistics`) | +| rows with offset/limit pushdown | `queryByRow(...)` (tree & table overloads) | +| rows by time range / columns | `query(...)` (tree & table overloads) | +| row iteration + column metadata | `ResultSet` + `ResultSetMetadata` (`cpp/src/reader/result_set.h`) | + +`result_set.h` also contains `print_table_result_set`, which already iterates +columns and rows and formats each value by `TSDataType` (INT32/INT64/FLOAT/ +DOUBLE/BOOLEAN/TEXT/STRING). The `tsv`/`table` formatters extend this pattern; +`csv`/`json` reuse the same type-dispatch. + +## Architecture + +### Code location + +A new `cpp/tools/` directory, parallel to `cpp/examples/` (which is the existing +template for "an executable that links `libtsfile`"). 
+ +``` +cpp/tools/ +├── CMakeLists.txt +├── tsfile_cli.cc # main: parse top level, dispatch subcommand +├── cli_args.h / cli_args.cc # minimal hand-rolled option parser (no new deps) +├── output_format.h / output_format.cc # csv / tsv / json(NDJSON) / table formatters +└── commands/ + ├── command.h # subcommand interface: name(), run(args), help() + ├── cmd_ls.cc cmd_schema.cc cmd_stats.cc + └── cmd_head.cc cmd_cat.cc cmd_select.cc +``` + +### CLI shape — single multi-call binary, git-style dispatch + +```sh +tsfile [options] +tsfile --help | --version +tsfile help +``` + +Argument parsing is hand-rolled. The verbs are simple, the project targets +C++11, and a goal is to keep the binary free of new third-party/runtime +dependencies, consistent with the rest of the C++ module. + +### Unix discipline (applies to every command) + +- **Data goes to stdout; diagnostics, progress, and errors go to stderr.** This + is what lets `tsfile cat f.tsfile | jq` work without log noise on stdout. +- **Exit codes are meaningful:** `0` success, `1` usage error, `2` cannot open + / corrupted file, `3` query or runtime error. +- The library currently prints open errors to stdout (`ReadFile::open`, + `cpp/src/file/read_file.cc:52`). Along the CLI path these must go to stderr so + they do not corrupt piped output. (Small, contained fix.) + +### Build / packaging + +- New CMake option `BUILD_TOOLS` (default `ON`), producing + `build//bin/tsfile`. +- `install(TARGETS tsfile ...)` so `make install` ships the binary. +- `build.sh` is left unchanged for v1 (it follows CMake defaults); revisit if a + dedicated flag is wanted. + +## Command surface (v1) + +All verbs are read-only and backed by the existing reader API. 
+ +| Command | Purpose | Backed by | +|---|---|---| +| `ls` | list devices (tree) or tables (table), one name per line | `get_all_device_ids()` / `get_all_table_schemas()` | +| `schema` | per-measurement data type / encoding / compression | `get_timeseries_schema()` / `get_table_schema()` | +| `stats` | per-series row count, time range, chunk count | `get_timeseries_metadata()` (`Statistics`) | +| `head` | first N rows | `queryByRow(..., offset=0, limit=N)` | +| `cat` | all rows of a device/table | `query()` / `queryByRow(..., limit=-1)` | +| `select` | chosen columns + time range + limit/offset | `query(table, cols, start, end, ...)` / tree `query(paths, start, end)` | + +### Common flags + +| Flag | Meaning | +|---|---| +| `-f, --format csv\|tsv\|json\|table` | output format; default is TTY-adaptive (see below) | +| `-d, --device ` | scope to a device (tree model) | +| `-t, --table ` | scope to a table (table model) | +| `-m, --measurements s1,s2` | select columns | +| `-n, --limit N` | max rows (`head` is sugar for `--limit`) | +| `--offset N` | skip leading rows | +| `--start ` / `--end ` | time range; v1 accepts epoch milliseconds | +| `--no-header` | suppress the header row | +| `--model tree\|table` | force a model (override auto-detection) | +| `-h, --help` / `--version` | usage / version | + +## Tree vs. table model handling + +A `.tsfile` is written in one of two data models. The CLI auto-detects and +adapts: + +- **Detection:** `get_all_table_schemas()` non-empty ⇒ **table** model; otherwise + **tree** model. `--model` overrides for edge cases. +- **`ls`:** tree ⇒ one device ID per line; table ⇒ one table name per line. + One item per line keeps it pipe-friendly; per-column detail lives in `schema`. +- **Column semantics differ** (tree: device path + measurement; table: table + + columns), but **the time column is always column 1** in row output + (`ResultSetMetadata` guarantees this). + +## Output formats + +- **`table`** (human): aligned columns. 
+- **`tsv`** (pipe): tab-separated, header row first (unless `--no-header`). +- **TTY-adaptive default:** when stdout is a terminal, default to `table`; when + piped or redirected, default to `tsv`. `--format` always overrides. This + mirrors the behavior of `git` and `ls`. +- **`csv`:** RFC 4180 quoting (quote fields containing delimiter, quote, or + newline; double embedded quotes). +- **`json`:** **NDJSON** — one JSON object per row, newline-delimited — chosen + for streaming and `jq -c` friendliness over a single large array. +- **Null handling:** empty field in CSV/TSV; `null` in JSON. +- **Timestamps:** v1 emits the raw stored epoch (INT64). `--time-format iso` is + a deliberate follow-up. + +## Error handling & exit codes + +| Exit | Condition | +|---|---| +| `0` | success | +| `1` | usage / argument error (unknown command, bad flag, missing file arg) | +| `2` | file cannot be opened or is corrupted (`E_FILE_OPEN_ERR`, `E_TSFILE_CORRUPTED`) | +| `3` | query / runtime error | + +The reader returns integer error codes; the CLI maps open/corruption codes to +exit `2` and query failures to exit `3`. The stray stdout error print in +`ReadFile::open` is redirected to stderr along the CLI path. + +## Testing + +Google Test, under `cpp/test/tools/` mirroring `cpp/src` test conventions. + +- **Unit:** + - `cli_args` parsing (commands, flags, error cases). + - Each formatter (`csv`, `tsv`, `json`/NDJSON, `table`) against a synthetic + `ResultSet` / `ResultSetMetadata`, including null and quoting edge cases. + - Model detection (table-schema-present ⇒ table; otherwise tree). +- **End-to-end:** in a temp directory, write a small `.tsfile` via the existing + writer (or reuse `cpp/examples/test_cpp.tsfile`), run each command as a + subprocess, and assert both stdout content and exit code. Fixtures are + hermetic (generated under a temp dir, cleaned up). 
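The end-to-end tests described above run each command as a subprocess and check both its stdout and its exit code. Below is a minimal sketch of a capture helper for that purpose. It assumes a POSIX host where `popen`/`pclose` and the `<sys/wait.h>` status macros are available. `run_capture` is a hypothetical name and is not part of the design.

```cpp
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS (POSIX)

#include <cassert>
#include <cstdio>  // FILE, fread; popen/pclose are POSIX additions
#include <string>

// Hypothetical test helper: runs `cmd` through /bin/sh, fills `out` with
// everything the child wrote to stdout, and returns the child's exit code,
// or -1 if it could not be started or did not exit normally. stderr is left
// attached to the parent, so diagnostics never end up in `out`.
int run_capture(const std::string& cmd, std::string& out) {
    out.clear();
    FILE* pipe = popen(cmd.c_str(), "r");
    if (pipe == nullptr) return -1;
    char buf[4096];
    size_t n;
    while ((n = std::fread(buf, 1, sizeof(buf), pipe)) > 0) {
        out.append(buf, n);
    }
    int status = pclose(pipe);  // wait status of the shell
    if (status == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Inside a Google Test case, this helper pairs naturally with `EXPECT_EQ` on both the returned code (for example `2` for an unreadable file) and the captured text. Because stderr is not captured, any diagnostic text that shows up in `out` points to a genuine stdout/stderr mix-up.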

## License header

Every new file (`.cc`, `.h`, `CMakeLists.txt`, this `.md`) carries the Apache
License 2.0 header in the comment style appropriate to the file type, per
repository convention.

## Open follow-ups (explicitly deferred, not v1)

- Structure-dump verb at parity with Java `TsFileSketchTool`.
- Write / convert verbs (Java `tools` already covers import).
- `--time-format iso`, and richer `select` predicates beyond a time range.
- Optional split into multiple `tsfile-*` binaries (coreutils-style).

From f9d2d8479c397cbf166c2fdf50851967fa650778 Mon Sep 17 00:00:00 2001
From: spricoder
Date: Mon, 1 Jun 2026 17:00:08 +0800
Subject: [PATCH 02/41] docs: add implementation plan for TsFile Unix CLI; refine spec

9 TDD tasks (CMake scaffold -> arg parser -> formatters -> ResultSet pump -> ls/schema/stats/head/cat/select -> install/verify). Spec tweaked to match confirmed C++ APIs (table-model schema blanks encoding/compression; stats = count + time range).
---
 .../plans/2026-06-01-tsfile-unix-cli.md | 2081 +++++++++++++++++
 .../2026-06-01-tsfile-unix-cli-design.md | 10 +-
 2 files changed, 2089 insertions(+), 2 deletions(-)
 create mode 100644 docs/superpowers/plans/2026-06-01-tsfile-unix-cli.md

diff --git a/docs/superpowers/plans/2026-06-01-tsfile-unix-cli.md b/docs/superpowers/plans/2026-06-01-tsfile-unix-cli.md
new file mode 100644
index 000000000..9484e3f78
--- /dev/null
+++ b/docs/superpowers/plans/2026-06-01-tsfile-unix-cli.md
@@ -0,0 +1,2081 @@

# TsFile Unix-philosophy C++ CLI — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Ship a single `tsfile` C++ binary with read-only, pipe-friendly verbs (`ls`, `schema`, `stats`, `head`, `cat`, `select`) for inspecting and exporting `.tsfile` files.
+ +**Architecture:** A new `cpp/tools/` directory builds an OBJECT library (`tsfile_cli_obj`) plus a thin `main`. The library is also linked into `TsFile_Test` for unit tests. All command output goes to an injected `std::ostream&` (data→stdout, diagnostics→stderr) so commands are testable in-process. Formatting is split into a pure layer (escaping/aligning/`RowWriter`, no reader dependency, heavily unit-tested) and a `ResultSet` pump layer (e2e-tested against a generated fixture). Everything is backed by the existing `storage::TsFileReader` API — the read engine is not modified. + +**Tech Stack:** C++11, CMake (≥3.11), Google Test 1.12.1, clang-format (Google style). No new third-party/runtime dependencies (argument parsing is hand-rolled). + +**Spec:** `docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md` + +--- + +## Conventions used in every task + +- **License header:** every new file (`.h`, `.cc`, `CMakeLists.txt`) starts with the Apache 2.0 header. For `.h`/`.cc` use the `/* ... */` block form copied verbatim from any existing file (e.g. `cpp/src/file/read_file.h` lines 1-18). For `CMakeLists.txt` use the `#[[ ... ]]` form (see `cpp/examples/CMakeLists.txt` lines 1-18). The code blocks below omit the header for brevity — **prepend it to each new file.** +- **Namespace:** all CLI code lives in `namespace tsfile_cli`. +- **Formatting:** run `./mvnw spotless:apply` (or `clang-format`) before each commit; the build's `-Wall` must stay clean. +- **Build/run from** `cpp/`: `bash build.sh -t=Debug` produces `build/Debug/bin/tsfile` and `build/Debug/lib/TsFile_Test`. 
+ +## File structure (created by this plan) + +``` +cpp/tools/ +├── CMakeLists.txt # OBJECT lib tsfile_cli_obj + executable tsfile +├── tools_main.cc # main(): forwards argv to run_cli +├── cli/ +│ ├── exit_codes.h # kExitOk/kExitUsage/kExitFile/kExitRuntime +│ ├── cli_args.h / cli_args.cc # ParsedArgs + parse_args() +│ └── run_cli.h / run_cli.cc # top-level dispatch, reader open, error→exit mapping +├── format/ +│ ├── output_format.h / .cc # pure: resolve_format, escapes, type names, RowWriter +│ └── result_set_format.h / .cc # ResultSet pump: cell_to_string, write_result_set +└── commands/ + ├── commands.h # is_table_model + cmd_* declarations + ├── cmd_ls.cc cmd_schema.cc cmd_stats.cc + └── cmd_head.cc cmd_cat.cc cmd_select.cc + +cpp/test/tools/ +├── cli_test_util.h # writes a table-model fixture .tsfile to a temp path +├── cli_args_test.cc +├── output_format_test.cc +└── command_e2e_test.cc +``` + +Modified files: +- `cpp/CMakeLists.txt` — add `option(BUILD_TOOLS ...)` and `add_subdirectory(tools)`. +- `cpp/test/CMakeLists.txt` — glob `tools/*_test.cc`, link `tsfile_cli_obj`. +- `cpp/src/file/read_file.cc:52-55` — route open-error prints to `stderr`. + +--- + +## Task sequencing + +Tasks are ordered so each ends green and committable: + +1. CMake scaffold + `main` + `run_cli` skeleton (`--version`/`--help`) +2. `parse_args` (cli_args) +3. Pure output formatting (`output_format`) +4. `ResultSet` pump (`result_set_format`) +5. Model detection + `cmd_ls` +6. `cmd_schema` +7. `cmd_stats` +8. `cmd_head` / `cmd_cat` / `cmd_select` (row data) +9. Library stderr fix + `install()` + full-suite run + manual tree-model verification + +Detailed tasks follow in separate sections of this document (one task per `###` heading). Each is self-contained: exact files, complete code, exact commands, expected output. 
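Task 3's `resolve_format(ParsedArgs::Format, bool stdout_is_tty)` expects its caller to know whether stdout is a terminal. A `std::ostream&`, such as the one `run_cli` receives, offers no portable way to report that. Below is a minimal sketch of deriving the flag from the file descriptor instead. It assumes a POSIX host (`isatty` and `STDOUT_FILENO` from `<unistd.h>`). `Fmt`, `resolve`, and `stdout_is_tty` are hypothetical stand-ins that mirror the plan's types; they are not its actual code.

```cpp
#include <unistd.h>  // isatty, STDOUT_FILENO (POSIX)

#include <cassert>

// Hypothetical mirror of ParsedArgs::Format / OutputFormat, for illustration.
enum class Fmt { kAuto, kCsv, kTsv, kJson, kTable };

// True when file descriptor 1 refers to a terminal. Checking the descriptor
// (not std::cout) makes shell redirection and pipes report false.
bool stdout_is_tty() { return isatty(STDOUT_FILENO) == 1; }

// Same rule as resolve_format(): an explicit --format always wins; otherwise
// a terminal gets the aligned table and a pipe or file gets TSV.
Fmt resolve(Fmt requested, bool tty) {
    if (requested != Fmt::kAuto) return requested;
    return tty ? Fmt::kTable : Fmt::kTsv;
}
```

With this approach, both `tsfile cat f.tsfile > out.tsv` and `tsfile cat f.tsfile | jq` fall back to TSV. On Windows, the equivalent check is `_isatty(_fileno(stdout))`; this sketch does not cover that case.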

---

### Task 1: CMake scaffold + `main` + `run_cli` skeleton

**Files:**
- Create: `cpp/tools/cli/exit_codes.h`
- Create: `cpp/tools/cli/run_cli.h`, `cpp/tools/cli/run_cli.cc`
- Create: `cpp/tools/tools_main.cc`
- Create: `cpp/tools/CMakeLists.txt`
- Modify: `cpp/CMakeLists.txt` (add option + subdir)
- Modify: `cpp/test/CMakeLists.txt` (glob tools tests, link object lib)
- Test: `cpp/test/tools/cli_args_test.cc` (skeleton-level: version/help)

- [ ] **Step 1: Write the failing test** — `cpp/test/tools/cli_args_test.cc`

```cpp
#include <gtest/gtest.h>
#include <sstream>
#include "cli/run_cli.h"

TEST(RunCliTest, VersionFlagPrintsVersionAndReturnsOk) {
    std::ostringstream out, err;
    int code = tsfile_cli::run_cli({"--version"}, out, err);
    EXPECT_EQ(code, 0);
    EXPECT_NE(out.str().find("tsfile"), std::string::npos);
    EXPECT_TRUE(err.str().empty());
}

TEST(RunCliTest, NoArgsPrintsUsageToErrAndReturnsUsageError) {
    std::ostringstream out, err;
    int code = tsfile_cli::run_cli({}, out, err);
    EXPECT_EQ(code, 1);
    EXPECT_NE(err.str().find("Usage"), std::string::npos);
}

TEST(RunCliTest, UnknownCommandIsUsageError) {
    std::ostringstream out, err;
    int code = tsfile_cli::run_cli({"frobnicate", "x.tsfile"}, out, err);
    EXPECT_EQ(code, 1);
    EXPECT_NE(err.str().find("Unknown command"), std::string::npos);
}
```

- [ ] **Step 2: Create `cpp/tools/cli/exit_codes.h`**

```cpp
#ifndef TSFILE_CLI_EXIT_CODES_H
#define TSFILE_CLI_EXIT_CODES_H
namespace tsfile_cli {
constexpr int kExitOk = 0;       // success
constexpr int kExitUsage = 1;    // bad arguments / unknown command
constexpr int kExitFile = 2;     // cannot open or corrupted file
constexpr int kExitRuntime = 3;  // query / runtime error
}  // namespace tsfile_cli
#endif  // TSFILE_CLI_EXIT_CODES_H
```

- [ ] **Step 3: Create `cpp/tools/cli/run_cli.h`**

```cpp
#ifndef TSFILE_CLI_RUN_CLI_H
#define TSFILE_CLI_RUN_CLI_H
#include <ostream>
#include <string>
#include <vector>
namespace tsfile_cli {
// Entry point used by main() and by tests. argv excludes the program name.
// Data is written to `out`, diagnostics/errors to `err`. Returns an exit code
// from exit_codes.h.
int run_cli(const std::vector<std::string>& args, std::ostream& out,
            std::ostream& err);
}  // namespace tsfile_cli
#endif  // TSFILE_CLI_RUN_CLI_H
```

- [ ] **Step 4: Create `cpp/tools/cli/run_cli.cc`** (skeleton — dispatch filled in later tasks)

```cpp
#include "cli/run_cli.h"

#include "cli/exit_codes.h"

#ifndef TSFILE_CLI_VERSION
#define TSFILE_CLI_VERSION "unknown"
#endif

namespace tsfile_cli {

namespace {
void print_usage(std::ostream& os) {
    os << "Usage: tsfile [options] \n"
          "Commands: ls schema stats head cat select\n"
          "Run 'tsfile help ' for command options.\n";
}
}  // namespace

int run_cli(const std::vector<std::string>& args, std::ostream& out,
            std::ostream& err) {
    for (const std::string& a : args) {
        if (a == "--version") {
            out << "tsfile (Apache TsFile C++) " << TSFILE_CLI_VERSION << "\n";
            return kExitOk;
        }
    }
    if (args.empty() || args[0] == "--help" || args[0] == "-h" ||
        args[0] == "help") {
        print_usage(args.empty() ? err : out);
        return args.empty() ? kExitUsage : kExitOk;
    }

    const std::string& command = args[0];
    // Dispatch table is extended in Tasks 5-8.
    err << "Unknown command: " << command << "\n";
    print_usage(err);
    return kExitUsage;
}

}  // namespace tsfile_cli
```

- [ ] **Step 5: Create `cpp/tools/tools_main.cc`**

```cpp
#include <iostream>
#include <string>
#include <vector>

#include "cli/run_cli.h"

int main(int argc, char** argv) {
    std::vector<std::string> args(argv + 1, argv + argc);
    return tsfile_cli::run_cli(args, std::cout, std::cerr);
}
```

- [ ] **Step 6: Create `cpp/tools/CMakeLists.txt`**

```cmake
message("Running in tools directory")

# Sources for the CLI logic, excluding main(), compiled once as an OBJECT
# library so both the executable and the test target can reuse them.
file(GLOB_RECURSE TSFILE_CLI_SRCS
    "cli/*.cc"
    "format/*.cc"
    "commands/*.cc")

add_library(tsfile_cli_obj OBJECT ${TSFILE_CLI_SRCS})

# Headers: this dir (for "cli/..", "format/..", "commands/..") + the SDK src.
target_include_directories(tsfile_cli_obj PUBLIC
    ${CMAKE_CURRENT_SOURCE_DIR}
    ${PROJECT_SOURCE_DIR}/src)
if (ENABLE_ANTLR4)
    target_include_directories(tsfile_cli_obj PUBLIC
        ${PROJECT_SOURCE_DIR}/third_party/antlr4-cpp-runtime-4/runtime/src)
endif ()

target_compile_definitions(tsfile_cli_obj PRIVATE
    TSFILE_CLI_VERSION="${TsFile_CPP_VERSION}")

# The shipped binary. Target name differs from the `tsfile` library target to
# avoid a collision; OUTPUT_NAME makes the file `tsfile`.
add_executable(tsfile_cli tools_main.cc $<TARGET_OBJECTS:tsfile_cli_obj>)
target_include_directories(tsfile_cli PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(tsfile_cli tsfile)
set_target_properties(tsfile_cli PROPERTIES
    OUTPUT_NAME tsfile
    RUNTIME_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/bin)
```

- [ ] **Step 7: Modify `cpp/CMakeLists.txt`** — add the option after the other `option(...)` lines (near line 171) and the subdir before `add_subdirectory(test)` (so `tsfile_cli_obj` exists for the test target). Insert:

```cmake
option(BUILD_TOOLS "Build the tsfile command-line tools" ON)
message("cmake using: BUILD_TOOLS=${BUILD_TOOLS}")
```

and change the tail of the file from:

```cmake
add_subdirectory(src)
if (BUILD_TEST)
```

to:

```cmake
add_subdirectory(src)
if (BUILD_TOOLS)
    add_subdirectory(tools)
endif ()
if (BUILD_TEST)
```

- [ ] **Step 8: Modify `cpp/test/CMakeLists.txt`** — add tools test glob after the existing `file(GLOB_RECURSE TEST_SRCS ...)` block (after line 114):

```cmake
if (BUILD_TOOLS)
    file(GLOB_RECURSE TOOLS_TEST_SRCS "tools/*_test.cc")
    list(APPEND TEST_SRCS ${TOOLS_TEST_SRCS})
endif ()
```

and extend the test target's link + includes.
Change:

```cmake
add_executable(TsFile_Test ${TEST_SRCS})
target_link_libraries(
    TsFile_Test
    GTest::gtest_main
    GTest::gmock
    tsfile
)
```

to:

```cmake
add_executable(TsFile_Test ${TEST_SRCS})
if (BUILD_TOOLS)
    target_include_directories(TsFile_Test PRIVATE ${CMAKE_SOURCE_DIR}/tools)
endif ()
target_link_libraries(
    TsFile_Test
    GTest::gtest_main
    GTest::gmock
    tsfile
)
if (BUILD_TOOLS)
    # Add the CLI object files as sources: linking an OBJECT library through
    # target_link_libraries() requires CMake >= 3.12.
    target_sources(TsFile_Test PRIVATE $<TARGET_OBJECTS:tsfile_cli_obj>)
endif ()
```

- [ ] **Step 9: Build and run the tests**

Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -20`
Expected: build succeeds; `build/Debug/bin/tsfile` and `build/Debug/lib/TsFile_Test` exist.

Run: `cd cpp && ./build/Debug/lib/TsFile_Test --gtest_filter=RunCliTest.*`
Expected: 3 tests PASS.

Run: `cd cpp && ./build/Debug/bin/tsfile --version`
Expected: prints `tsfile (Apache TsFile C++) 2.2.1.dev` and exits 0.

- [ ] **Step 10: Commit**

```bash
git add cpp/tools cpp/test/tools/cli_args_test.cc cpp/CMakeLists.txt cpp/test/CMakeLists.txt
git commit -m "feat(cpp-tools): scaffold tsfile CLI binary with run_cli skeleton"
```

---

### Task 2: `parse_args` (cli_args)

**Files:**
- Create: `cpp/tools/cli/cli_args.h`, `cpp/tools/cli/cli_args.cc`
- Test: append to `cpp/test/tools/cli_args_test.cc`

- [ ] **Step 1: Write the failing tests** — append to `cpp/test/tools/cli_args_test.cc`

```cpp
#include "cli/cli_args.h"

TEST(ParseArgsTest, CommandAndFilePositional) {
    auto p = tsfile_cli::parse_args({"ls", "data.tsfile"});
    EXPECT_TRUE(p.error.empty());
    EXPECT_EQ(p.command, "ls");
    EXPECT_EQ(p.file, "data.tsfile");
}

TEST(ParseArgsTest, FormatFlagParsed) {
    auto p = tsfile_cli::parse_args({"cat", "-f", "json", "data.tsfile"});
    EXPECT_TRUE(p.error.empty());
    EXPECT_EQ(p.format, tsfile_cli::ParsedArgs::Format::kJson);
}

TEST(ParseArgsTest, MeasurementsSplitOnComma) {
    auto p = tsfile_cli::parse_args(
        {"select", "-m", "s1,s2,s3", "data.tsfile"});
    ASSERT_EQ(p.measurements.size(), 3u);
    EXPECT_EQ(p.measurements[1], "s2");
}

TEST(ParseArgsTest, LimitOffsetAndTimeRange) {
    auto p = tsfile_cli::parse_args(
        {"head", "-n", "5", "--offset", "2", "--start", "100", "--end", "200",
         "data.tsfile"});
    EXPECT_EQ(p.limit, 5);
    EXPECT_EQ(p.offset, 2);
    EXPECT_TRUE(p.has_start);
    EXPECT_EQ(p.start, 100);
    EXPECT_TRUE(p.has_end);
    EXPECT_EQ(p.end, 200);
}

TEST(ParseArgsTest, UnknownFlagIsError) {
    auto p = tsfile_cli::parse_args({"ls", "--bogus", "data.tsfile"});
    EXPECT_FALSE(p.error.empty());
}

TEST(ParseArgsTest, BadFormatValueIsError) {
    auto p = tsfile_cli::parse_args({"cat", "-f", "yaml", "data.tsfile"});
    EXPECT_FALSE(p.error.empty());
}

TEST(ParseArgsTest, MissingFileIsAllowedAtParseTime) {
    // File presence is validated by run_cli, not parse_args.
    auto p = tsfile_cli::parse_args({"ls"});
    EXPECT_TRUE(p.error.empty());
    EXPECT_EQ(p.command, "ls");
    EXPECT_TRUE(p.file.empty());
}
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -20`
Expected: compilation fails because `cli/cli_args.h` does not exist yet; that counts as red. (Running the existing `TsFile_Test` binary at this point would only execute a stale build.)

- [ ] **Step 3: Create `cpp/tools/cli/cli_args.h`**

```cpp
#ifndef TSFILE_CLI_CLI_ARGS_H
#define TSFILE_CLI_CLI_ARGS_H
#include <climits>
#include <string>
#include <vector>
namespace tsfile_cli {
struct ParsedArgs {
    enum class Format { kAuto, kCsv, kTsv, kJson, kTable };
    std::string command;
    std::string file;
    std::string device;                     // -d / --device (tree model)
    std::string table;                      // -t / --table (table model)
    std::vector<std::string> measurements;  // -m / --measurements (comma list)
    long long limit = -1;                   // -n / --limit (<0 = unlimited)
    long long offset = 0;                   // --offset
    long long start = LLONG_MIN;            // --start (epoch ms)
    long long end = LLONG_MAX;              // --end (epoch ms)
    bool has_start = false;
    bool has_end = false;
    Format format = Format::kAuto;  // -f / --format
    bool no_header = false;         // --no-header
    std::string model;              // --model "tree"|"table"|""
    bool help = false;
    bool version = false;
    std::string error;  // non-empty => parse error message
};

// Parses args (program name already stripped). On bad input, returns a
// ParsedArgs whose .error is set; otherwise .error is empty. Does NOT validate
// that a file was supplied — run_cli does that per command.
ParsedArgs parse_args(const std::vector<std::string>& args);
}  // namespace tsfile_cli
#endif  // TSFILE_CLI_CLI_ARGS_H
```

- [ ] **Step 4: Create `cpp/tools/cli/cli_args.cc`**

```cpp
#include "cli/cli_args.h"

#include <cerrno>
#include <cstdlib>
#include <sstream>

namespace tsfile_cli {

namespace {
std::vector<std::string> split_csv(const std::string& s) {
    std::vector<std::string> out;
    std::string item;
    std::istringstream iss(s);
    while (std::getline(iss, item, ',')) {
        if (!item.empty()) out.push_back(item);
    }
    return out;
}

// Accepts only a complete base-10 integer that fits in a long long.
bool parse_ll(const std::string& s, long long& out) {
    if (s.empty()) return false;
    char* endp = nullptr;
    errno = 0;
    long long v = std::strtoll(s.c_str(), &endp, 10);
    if (errno == ERANGE || endp == s.c_str() || *endp != '\0') return false;
    out = v;
    return true;
}

bool parse_format(const std::string& s, ParsedArgs::Format& out) {
    if (s == "csv") out = ParsedArgs::Format::kCsv;
    else if (s == "tsv") out = ParsedArgs::Format::kTsv;
    else if (s == "json") out = ParsedArgs::Format::kJson;
    else if (s == "table") out = ParsedArgs::Format::kTable;
    else return false;
    return true;
}
}  // namespace

ParsedArgs parse_args(const std::vector<std::string>& args) {
    ParsedArgs p;
    if (args.empty()) return p;
    p.command = args[0];

    // Flags requiring a value; the lambda fetches the next token.
    size_t i = 1;
    auto need_value = [&](const std::string& flag, std::string& dst) -> bool {
        if (i + 1 >= args.size()) {
            p.error = "Missing value for " + flag;
            return false;
        }
        dst = args[++i];
        return true;
    };

    for (; i < args.size(); ++i) {
        const std::string& a = args[i];
        std::string val;
        if (a == "-f" || a == "--format") {
            if (!need_value(a, val)) return p;
            if (!parse_format(val, p.format)) {
                p.error = "Invalid format: " + val + " (use csv|tsv|json|table)";
                return p;
            }
        } else if (a == "-d" || a == "--device") {
            if (!need_value(a, p.device)) return p;
        } else if (a == "-t" || a == "--table") {
            if (!need_value(a, p.table)) return p;
        } else if (a == "-m" || a == "--measurements") {
            if (!need_value(a, val)) return p;
            p.measurements = split_csv(val);
        } else if (a == "-n" || a == "--limit") {
            if (!need_value(a, val)) return p;
            if (!parse_ll(val, p.limit)) {
                p.error = "Invalid --limit: " + val;
                return p;
            }
        } else if (a == "--offset") {
            if (!need_value(a, val)) return p;
            if (!parse_ll(val, p.offset)) {
                p.error = "Invalid --offset: " + val;
                return p;
            }
        } else if (a == "--start") {
            if (!need_value(a, val)) return p;
            if (!parse_ll(val, p.start)) {
                p.error = "Invalid --start: " + val;
                return p;
            }
            p.has_start = true;
        } else if (a == "--end") {
            if (!need_value(a, val)) return p;
            if (!parse_ll(val, p.end)) {
                p.error = "Invalid --end: " + val;
                return p;
            }
            p.has_end = true;
        } else if (a == "--model") {
            if (!need_value(a, val)) return p;
            if (val != "tree" && val != "table") {
                p.error = "Invalid --model: " + val + " (use tree|table)";
                return p;
            }
            p.model = val;
        } else if (a == "--no-header") {
            p.no_header = true;
        } else if (a == "-h" || a == "--help") {
            p.help = true;
        } else if (a == "--version") {
            p.version = true;
        } else if (!a.empty() && a[0] == '-') {
            p.error = "Unknown flag: " + a;
            return p;
        } else {
            // First bare token is the file path; extra positionals are an error.
            if (p.file.empty()) {
                p.file = a;
            } else {
                p.error = "Unexpected argument: " + a;
                return p;
            }
        }
    }
    return p;
}

}  // namespace tsfile_cli
```

- [ ] **Step 5: Build and run tests to verify they pass**

Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test --gtest_filter=ParseArgsTest.*:RunCliTest.*`
Expected: all PASS.

- [ ] **Step 6: Commit**

```bash
git add cpp/tools/cli/cli_args.h cpp/tools/cli/cli_args.cc cpp/test/tools/cli_args_test.cc
git commit -m "feat(cpp-tools): add hand-rolled CLI argument parser"
```

---

### Task 3: Pure output formatting (`output_format`)

**Files:**
- Create: `cpp/tools/format/output_format.h`, `cpp/tools/format/output_format.cc`
- Test: `cpp/test/tools/output_format_test.cc`

This layer has **no dependency on the reader**: it operates on pre-stringified
cells plus a parallel vector of column types (used only to decide JSON quoting).

- [ ] **Step 1: Write the failing tests** — `cpp/test/tools/output_format_test.cc`

```cpp
#include <gtest/gtest.h>

#include <sstream>
#include <string>

#include "common/db_common.h"
#include "format/output_format.h"

using tsfile_cli::OutputFormat;
using tsfile_cli::ParsedArgs;
using tsfile_cli::RowWriter;

TEST(ResolveFormatTest, AutoUsesTableOnTtyTsvOtherwise) {
    EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kAuto, true),
              OutputFormat::kTable);
    EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kAuto, false),
              OutputFormat::kTsv);
    EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kJson, true),
              OutputFormat::kJson);
}

TEST(CsvEscapeTest, QuotesWhenSpecialCharsPresent) {
    EXPECT_EQ(tsfile_cli::csv_escape("plain"), "plain");
    EXPECT_EQ(tsfile_cli::csv_escape("a,b"), "\"a,b\"");
    EXPECT_EQ(tsfile_cli::csv_escape("she said \"hi\""),
              "\"she said \"\"hi\"\"\"");
    EXPECT_EQ(tsfile_cli::csv_escape("line\nbreak"), "\"line\nbreak\"");
}

TEST(JsonEscapeTest, EscapesQuotesBackslashAndControls) {
EXPECT_EQ(tsfile_cli::json_escape("a\"b\\c"), "a\\\"b\\\\c"); + EXPECT_EQ(tsfile_cli::json_escape("tab\there"), "tab\\there"); +} + +TEST(TypeNameTest, KnownTypesMapToNames) { + EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::INT64), "INT64"); + EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::STRING), "STRING"); + EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::BOOLEAN), "BOOLEAN"); +} + +TEST(RowWriterTest, TsvWritesHeaderThenRows) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kTsv, {"time", "s1"}, + {common::INT64, common::INT64}, /*no_header=*/false); + w.write({"1", "10"}, {false, false}); + w.write({"2", ""}, {false, true}); + w.finish(); + EXPECT_EQ(out.str(), "time\ts1\n1\t10\n2\t\n"); +} + +TEST(RowWriterTest, NoHeaderSuppressesHeader) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kTsv, {"name"}, {common::STRING}, true); + w.write({"table1"}, {false}); + w.finish(); + EXPECT_EQ(out.str(), "table1\n"); +} + +TEST(RowWriterTest, CsvEscapesCells) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kCsv, {"name"}, {common::STRING}, false); + w.write({"a,b"}, {false}); + w.finish(); + EXPECT_EQ(out.str(), "name\n\"a,b\"\n"); +} + +TEST(RowWriterTest, JsonNumbersUnquotedStringsQuotedNullEmitted) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kJson, {"time", "name"}, + {common::INT64, common::STRING}, false); + w.write({"5", "dev1"}, {false, false}); + w.write({"6", ""}, {false, true}); + w.finish(); + EXPECT_EQ(out.str(), + "{\"time\":5,\"name\":\"dev1\"}\n" + "{\"time\":6,\"name\":null}\n"); +} + +TEST(RowWriterTest, TableAlignsColumns) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kTable, {"name", "type"}, + {common::STRING, common::STRING}, false); + w.write({"s1", "INT64"}, {false, false}); + w.write({"longname", "BOOLEAN"}, {false, false}); + w.finish(); + EXPECT_EQ(out.str(), + "name type\n" + "s1 INT64\n" + "longname BOOLEAN\n"); +} +``` + +- [ ] **Step 2: Run tests to verify they 
fail** + +Run: `cd cpp && ./build/Debug/lib/TsFile_Test --gtest_filter=*Format*:RowWriterTest.*:*EscapeTest*:TypeNameTest.*` +Expected: compile failure (`format/output_format.h` missing) — red. + +- [ ] **Step 3: Create `cpp/tools/format/output_format.h`** + +```cpp +#ifndef TSFILE_CLI_OUTPUT_FORMAT_H +#define TSFILE_CLI_OUTPUT_FORMAT_H + +#include +#include +#include + +#include "cli/cli_args.h" +#include "common/db_common.h" + +namespace tsfile_cli { + +enum class OutputFormat { kCsv, kTsv, kJson, kTable }; + +// kAuto resolves to kTable on a TTY, kTsv otherwise. Other values pass through. +OutputFormat resolve_format(ParsedArgs::Format f, bool stdout_is_tty); + +// Stable display name for every TSDataType value (does not assert). +const char* tsdatatype_name(common::TSDataType t); + +std::string csv_escape(const std::string& field); +std::string json_escape(const std::string& s); + +// Writes rows in the chosen format. Cells are pre-stringified; `types` is used +// only by the JSON formatter to decide whether a value is emitted bare +// (numeric/boolean) or quoted (everything else). For kTable, rows are buffered +// and flushed (column-aligned) by finish(). 
+class RowWriter { + public: + RowWriter(std::ostream& out, OutputFormat fmt, + std::vector<std::string> header, + std::vector<common::TSDataType> types, bool no_header); + void write(const std::vector<std::string>& cells, + const std::vector<bool>& is_null); + void finish(); + + private: + void ensure_header(); // streaming formats: lazily emit header + bool is_numeric(size_t col) const; // JSON: bare vs quoted + + std::ostream& out_; + OutputFormat fmt_; + std::vector<std::string> header_; + std::vector<common::TSDataType> types_; + bool no_header_; + bool header_done_ = false; + std::vector<std::vector<std::string>> rows_; // kTable buffer + std::vector<std::vector<bool>> rows_null_; // kTable buffer +}; + +} // namespace tsfile_cli +#endif // TSFILE_CLI_OUTPUT_FORMAT_H +``` + +- [ ] **Step 4: Create `cpp/tools/format/output_format.cc`** + +```cpp +#include "format/output_format.h" + +#include <algorithm> +#include <cstdio> + +namespace tsfile_cli { + +OutputFormat resolve_format(ParsedArgs::Format f, bool stdout_is_tty) { + switch (f) { + case ParsedArgs::Format::kCsv: return OutputFormat::kCsv; + case ParsedArgs::Format::kTsv: return OutputFormat::kTsv; + case ParsedArgs::Format::kJson: return OutputFormat::kJson; + case ParsedArgs::Format::kTable: return OutputFormat::kTable; + case ParsedArgs::Format::kAuto: + default: + return stdout_is_tty ?
OutputFormat::kTable : OutputFormat::kTsv; + } +} + +const char* tsdatatype_name(common::TSDataType t) { + switch (t) { + case common::BOOLEAN: return "BOOLEAN"; + case common::INT32: return "INT32"; + case common::INT64: return "INT64"; + case common::FLOAT: return "FLOAT"; + case common::DOUBLE: return "DOUBLE"; + case common::TEXT: return "TEXT"; + case common::VECTOR: return "VECTOR"; + case common::TIMESTAMP: return "TIMESTAMP"; + case common::DATE: return "DATE"; + case common::BLOB: return "BLOB"; + case common::STRING: return "STRING"; + case common::NULL_TYPE: return "NULL"; + default: return "UNKNOWN"; + } +} + +std::string csv_escape(const std::string& field) { + bool needs_quote = field.find_first_of(",\"\n\r") != std::string::npos; + if (!needs_quote) return field; + std::string out = "\""; + for (char c : field) { + if (c == '"') out += "\"\""; + else out += c; + } + out += "\""; + return out; +} + +std::string json_escape(const std::string& s) { + std::string out; + out.reserve(s.size() + 2); + for (unsigned char c : s) { + switch (c) { + case '"': out += "\\\""; break; + case '\\': out += "\\\\"; break; + case '\b': out += "\\b"; break; + case '\f': out += "\\f"; break; + case '\n': out += "\\n"; break; + case '\r': out += "\\r"; break; + case '\t': out += "\\t"; break; + default: + if (c < 0x20) { + char buf[8]; + std::snprintf(buf, sizeof(buf), "\\u%04x", c); + out += buf; + } else { + out += static_cast<char>(c); + } + } + } + return out; +} + +RowWriter::RowWriter(std::ostream& out, OutputFormat fmt, + std::vector<std::string> header, + std::vector<common::TSDataType> types, bool no_header) + : out_(out), + fmt_(fmt), + header_(std::move(header)), + types_(std::move(types)), + no_header_(no_header) {} + +bool RowWriter::is_numeric(size_t col) const { + if (col >= types_.size()) return false; + switch (types_[col]) { + case common::BOOLEAN: + case common::INT32: + case common::INT64: + case common::FLOAT: + case common::DOUBLE: + case common::TIMESTAMP: + return true; + default: +
return false; + } +} + +void RowWriter::ensure_header() { + if (header_done_) return; + header_done_ = true; + if (no_header_) return; + const char sep = (fmt_ == OutputFormat::kCsv) ? ',' : '\t'; + for (size_t i = 0; i < header_.size(); ++i) { + if (i) out_ << sep; + out_ << (fmt_ == OutputFormat::kCsv ? csv_escape(header_[i]) : header_[i]); + } + out_ << "\n"; +} + +void RowWriter::write(const std::vector<std::string>& cells, + const std::vector<bool>& is_null) { + if (fmt_ == OutputFormat::kTable) { + rows_.push_back(cells); + rows_null_.push_back(is_null); + return; + } + if (fmt_ == OutputFormat::kJson) { + out_ << "{"; + for (size_t i = 0; i < header_.size(); ++i) { + if (i) out_ << ","; + out_ << "\"" << json_escape(header_[i]) << "\":"; + if (i < is_null.size() && is_null[i]) { + out_ << "null"; + } else if (is_numeric(i)) { + out_ << (i < cells.size() ? cells[i] : "null"); + } else { + out_ << "\"" << json_escape(i < cells.size() ? cells[i] : "") + << "\""; + } + } + out_ << "}\n"; + return; + } + // csv / tsv + ensure_header(); + const char sep = (fmt_ == OutputFormat::kCsv) ? ',' : '\t'; + for (size_t i = 0; i < cells.size(); ++i) { + if (i) out_ << sep; + bool null_cell = i < is_null.size() && is_null[i]; + if (null_cell) continue; // empty field + out_ << (fmt_ == OutputFormat::kCsv ? csv_escape(cells[i]) : cells[i]); + } + out_ << "\n"; +} + +void RowWriter::finish() { + if (fmt_ != OutputFormat::kTable) return; + const size_t ncols = header_.size(); + std::vector<size_t> width(ncols, 0); + if (!no_header_) { + for (size_t i = 0; i < ncols; ++i) width[i] = header_[i].size(); + } + for (const auto& row : rows_) { + for (size_t i = 0; i < ncols && i < row.size(); ++i) { + width[i] = std::max(width[i], row[i].size()); + } + } + auto emit = [&](const std::vector<std::string>& cells, + const std::vector<bool>& nulls) { + for (size_t i = 0; i < ncols; ++i) { + std::string cell = + (i < cells.size() && !(i < nulls.size() && nulls[i])) ?
cells[i] + : ""; + out_ << cell; + if (i + 1 < ncols) { + out_ << std::string(width[i] - cell.size() + 2, ' '); + } + } + out_ << "\n"; + }; + if (!no_header_) { + std::vector<bool> no_nulls(ncols, false); + emit(header_, no_nulls); + } + for (size_t r = 0; r < rows_.size(); ++r) emit(rows_[r], rows_null_[r]); +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 5: Build and run tests to verify they pass** + +Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test --gtest_filter=*Format*:RowWriterTest.*:*EscapeTest*:TypeNameTest.*` +Expected: all PASS. + +- [ ] **Step 6: Commit** + +```bash +git add cpp/tools/format/output_format.h cpp/tools/format/output_format.cc cpp/test/tools/output_format_test.cc +git commit -m "feat(cpp-tools): add pure output formatters (csv/tsv/json/table)" +``` + +--- + +### Task 4: `ResultSet` pump (`result_set_format`) + +**Files:** +- Create: `cpp/tools/format/result_set_format.h`, `cpp/tools/format/result_set_format.cc` + +This layer converts a live `storage::ResultSet` into formatted rows. It is +exercised end-to-end by the command tests (Tasks 5-8); it has no standalone unit +test because constructing a `ResultSet` requires a real file. Keep the typed +extraction here and out of the pure layer. + +- [ ] **Step 1: Create `cpp/tools/format/result_set_format.h`** + +```cpp +#ifndef TSFILE_CLI_RESULT_SET_FORMAT_H +#define TSFILE_CLI_RESULT_SET_FORMAT_H + +#include <ostream> +#include <string> + +#include "common/db_common.h" +#include "format/output_format.h" +#include "reader/result_set.h" + +namespace tsfile_cli { + +// Stringifies one cell (column index is 1-based, per ResultSetMetadata). +// Caller must have checked is_null() first. +std::string cell_to_string(storage::ResultSet* rs, uint32_t col_index, + common::TSDataType type); + +// Pumps every row of `rs` into `out` using `fmt`. Reads column names/types from +// the result set metadata.
Returns 0 on success or a non-zero error code if the + // underlying ResultSet::next() fails. +int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, + std::ostream& out); + +} // namespace tsfile_cli +#endif // TSFILE_CLI_RESULT_SET_FORMAT_H +``` + +- [ ] **Step 2: Create `cpp/tools/format/result_set_format.cc`** + +```cpp +#include "format/result_set_format.h" + +#include <cstdio> +#include <ctime> +#include <sstream> + +#include "utils/errno_define.h" // common::E_OK + +namespace tsfile_cli { + +std::string cell_to_string(storage::ResultSet* rs, uint32_t i, + common::TSDataType type) { + std::ostringstream ss; + switch (type) { + case common::BOOLEAN: + return rs->get_value<bool>(i) ? "true" : "false"; + case common::INT32: + ss << rs->get_value<int32_t>(i); + return ss.str(); + case common::INT64: + case common::TIMESTAMP: + ss << rs->get_value<int64_t>(i); + return ss.str(); + // NOTE: a default std::ostringstream keeps only 6 significant digits, + // so FLOAT/DOUBLE values may be rounded on export. + case common::FLOAT: + ss << rs->get_value<float>(i); + return ss.str(); + case common::DOUBLE: + ss << rs->get_value<double>(i); + return ss.str(); + case common::DATE: { + std::tm d = rs->get_value<std::tm>(i); + char buf[16]; + std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d", d.tm_year + 1900, + d.tm_mon + 1, d.tm_mday); + return buf; + } + case common::TEXT: + case common::STRING: + case common::BLOB: { + common::String* s = rs->get_value<common::String*>(i); + return s == nullptr ?
std::string() : s->to_std_string(); + } + default: + return ""; + } +} + +int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, + std::ostream& out) { + auto meta = rs->get_metadata(); + const uint32_t ncol = meta->get_column_count(); + std::vector<std::string> header; + std::vector<common::TSDataType> types; + header.reserve(ncol); + types.reserve(ncol); + for (uint32_t i = 1; i <= ncol; ++i) { + header.push_back(meta->get_column_name(i)); + types.push_back(meta->get_column_type(i)); + } + + RowWriter writer(out, fmt, header, types, no_header); + bool has_next = false; + int code = common::E_OK; + while ((code = rs->next(has_next)) == common::E_OK && has_next) { + std::vector<std::string> cells(ncol); + std::vector<bool> nulls(ncol, false); + for (uint32_t i = 1; i <= ncol; ++i) { + if (rs->is_null(i)) { + nulls[i - 1] = true; + } else { + cells[i - 1] = cell_to_string(rs, i, types[i - 1]); + } + } + writer.write(cells, nulls); + } + writer.finish(); + return code; +} + +} // namespace tsfile_cli +``` + +> **Note:** `common::E_OK` is defined in `cpp/src/utils/errno_define.h` (and is +> also pulled in transitively by `reader/result_set.h`). The explicit include +> above keeps the source self-documenting. + +- [ ] **Step 3: Build to verify it compiles** (no test yet; covered in Task 5) + +Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5` +Expected: build succeeds (the new `.cc` is picked up by the tools glob).
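`cell_to_string` prints FLOAT and DOUBLE cells through a default-constructed `std::ostringstream`, which keeps only 6 significant digits, so a stored 3.141592653589793 would be exported as `3.14159`. If lossless export matters, one common remedy is sketched below, under the assumption that the platform's `printf`/`strtod` round correctly: try 15 significant digits and fall back to 17, which is always enough for an IEEE-754 double to round-trip. The helper names `format_double` and `format_double_default` are hypothetical and not part of the plan; FLOAT would need the analogous 6/9-digit pair checked with `strtof`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical: what the DOUBLE branch does today. A default std::ostringstream
// prints at most 6 significant digits.
std::string format_double_default(double v) {
    std::ostringstream ss;
    ss << v;
    return ss.str();
}

// Hypothetical replacement: try 15 significant digits (usually the "natural"
// spelling, e.g. 0.1) and fall back to 17, which always round-trips a double.
std::string format_double(double v) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.15g", v);
    if (std::strtod(buf, nullptr) != v) {
        std::snprintf(buf, sizeof(buf), "%.17g", v);
    }
    // NaN never compares equal to itself, so it is exempt from the check.
    assert(v != v || std::strtod(buf, nullptr) == v);
    return buf;
}
```

With such a helper, the DOUBLE branch would reduce to `return format_double(rs->get_value<double>(i));`. The 15-then-17 order trades one extra `snprintf` for output that reads naturally in the common case.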
+ +- [ ] **Step 4: Commit** + +```bash +git add cpp/tools/format/result_set_format.h cpp/tools/format/result_set_format.cc +git commit -m "feat(cpp-tools): add ResultSet-to-rows pump layer" +``` + +--- + +### Task 5: Model detection + `cmd_ls` + reader-open dispatch + +**Files:** +- Create: `cpp/tools/commands/commands.h` +- Create: `cpp/tools/commands/cmd_ls.cc` +- Replace: `cpp/tools/cli/run_cli.cc` (full dispatch + reader open) +- Create: `cpp/test/tools/cli_test_util.h` +- Create: `cpp/test/tools/command_e2e_test.cc` + +- [ ] **Step 1: Create `cpp/tools/commands/commands.h`** + +```cpp +#ifndef TSFILE_CLI_COMMANDS_H +#define TSFILE_CLI_COMMANDS_H + +#include + +#include "cli/cli_args.h" +#include "format/output_format.h" + +namespace storage { +class TsFileReader; +} + +namespace tsfile_cli { + +// Returns true if the file should be treated as table-model. Honors +// args.model ("tree"/"table"); otherwise detects via table schemas presence. +bool is_table_model(const ParsedArgs& args, storage::TsFileReader& reader); + +// Every command writes data to `out`, diagnostics to `err`, and returns an +// exit code from exit_codes.h. 
+int cmd_ls(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_schema(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_cat(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_select(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); + +} // namespace tsfile_cli +#endif // TSFILE_CLI_COMMANDS_H +``` + +- [ ] **Step 2: Create `cpp/tools/commands/cmd_ls.cc`** + +```cpp +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +bool is_table_model(const ParsedArgs& args, storage::TsFileReader& reader) { + if (args.model == "tree") return false; + if (args.model == "table") return true; + return !reader.get_all_table_schemas().empty(); +} + +int cmd_ls(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + std::vector<std::string> names; + if (is_table_model(args, reader)) { + for (auto& ts : reader.get_all_table_schemas()) { + if (ts) names.push_back(ts->get_table_name()); + } + } else { + for (auto& dev : reader.get_all_device_ids()) { + if (dev) names.push_back(dev->get_device_name()); + } + } + RowWriter w(out, fmt, {"name"}, {common::STRING}, args.no_header); + for (const std::string& n : names) { + w.write({n}, {false}); + } + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 3: Replace `cpp/tools/cli/run_cli.cc`** with the full version + +```cpp +#include "cli/run_cli.h" + +#include <set>
+#include + +#include "cli/cli_args.h" +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "format/output_format.h" +#include "reader/tsfile_reader.h" + +#ifdef _WIN32 +#include +#define TSFILE_ISATTY _isatty +#define TSFILE_FILENO _fileno +#else +#include +#define TSFILE_ISATTY isatty +#define TSFILE_FILENO fileno +#endif + +#ifndef TSFILE_CLI_VERSION +#define TSFILE_CLI_VERSION "unknown" +#endif + +namespace tsfile_cli { + +namespace { +void print_usage(std::ostream& os) { + os << "Usage: tsfile [options] \n" + "Commands:\n" + " ls list devices (tree) or tables (table)\n" + " schema per-measurement data type/encoding/compression\n" + " stats per-series row count and time range\n" + " head first N rows (use -n)\n" + " cat all rows of a device/table\n" + " select choose columns (-m), time range (--start/--end), " + "limit/offset\n" + "Options: -f/--format csv|tsv|json|table, -d/--device, -t/--table,\n" + " -m/--measurements a,b, -n/--limit, --offset, --start, --end,\n" + " --no-header, --model tree|table, -h/--help, --version\n"; +} + +bool is_known_command(const std::string& c) { + static const std::set kCmds = {"ls", "schema", "stats", + "head", "cat", "select"}; + return kCmds.count(c) != 0; +} +} // namespace + +int run_cli(const std::vector& args, std::ostream& out, + std::ostream& err) { + ParsedArgs p = parse_args(args); + + if (p.version || (!args.empty() && args[0] == "--version")) { + out << "tsfile (Apache TsFile C++) " << TSFILE_CLI_VERSION << "\n"; + return kExitOk; + } + if (args.empty()) { + print_usage(err); + return kExitUsage; + } + if (p.command == "help" || p.command == "--help" || p.command == "-h" || + (p.help && p.file.empty() && !is_known_command(p.command))) { + print_usage(out); + return kExitOk; + } + if (!p.error.empty()) { + err << "Error: " << p.error << "\n"; + print_usage(err); + return kExitUsage; + } + if (!is_known_command(p.command)) { + err << "Unknown command: " << p.command << "\n"; + print_usage(err); + 
return kExitUsage; + } + if (p.file.empty()) { + err << "Error: missing <file> argument\n"; + return kExitUsage; + } + + storage::libtsfile_init(); + storage::TsFileReader reader; + int open_ret = reader.open(p.file); + if (open_ret != 0) { + err << "Error: cannot open file (missing or corrupted): " << p.file << "\n"; + return kExitFile; + } + + bool stdout_tty = TSFILE_ISATTY(TSFILE_FILENO(stdout)) != 0; + OutputFormat fmt = resolve_format(p.format, stdout_tty); + + int code; + if (p.command == "ls") { + code = cmd_ls(p, reader, fmt, out, err); + } else { + // Filled in by Tasks 6-8 (schema/stats/head/cat/select). + err << "Error: command not yet implemented: " << p.command << "\n"; + code = kExitUsage; + } + + reader.close(); + return code; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 4: Create `cpp/test/tools/cli_test_util.h`** (table-model fixture writer) + +```cpp +#ifndef TSFILE_CLI_TEST_UTIL_H +#define TSFILE_CLI_TEST_UTIL_H + +#include + +#include +#include + +namespace tsfile_cli_test { + +// Writes a small table-model fixture and returns its path. Table "table1": +// TAG columns id1,id2 (STRING) + FIELD column s1 (INT64); 5 rows, ts=0..4, +// s1 = row*10.
+inline std::string write_table_fixture( + const std::string& path = "tsfile_cli_fixture.tsfile") { + storage::libtsfile_init(); + std::string table_name = "table1"; + + storage::WriteFile file; + int flags = O_WRONLY | O_CREAT | O_TRUNC; +#ifdef _WIN32 + flags |= O_BINARY; +#endif + file.create(path, flags, 0666); + + auto* schema = new storage::TableSchema( + table_name, + { + common::ColumnSchema("id1", common::STRING, common::UNCOMPRESSED, + common::PLAIN, common::ColumnCategory::TAG), + common::ColumnSchema("id2", common::STRING, common::UNCOMPRESSED, + common::PLAIN, common::ColumnCategory::TAG), + common::ColumnSchema("s1", common::INT64, common::UNCOMPRESSED, + common::PLAIN, common::ColumnCategory::FIELD), + }); + + auto* writer = new storage::TsFileTableWriter(&file, schema); + storage::Tablet tablet( + table_name, {"id1", "id2", "s1"}, + {common::STRING, common::STRING, common::INT64}, + {common::ColumnCategory::TAG, common::ColumnCategory::TAG, + common::ColumnCategory::FIELD}, + 10); + for (int row = 0; row < 5; ++row) { + tablet.add_timestamp(row, static_cast<int64_t>(row)); + tablet.add_value(row, "id1", "id1_field_1"); + tablet.add_value(row, "id2", "id2_field_2"); + tablet.add_value(row, "s1", static_cast<int64_t>(row * 10)); + } + writer->write_table(tablet); + writer->flush(); + writer->close(); + + delete writer; + delete schema; + return path; +} + +} // namespace tsfile_cli_test +#endif // TSFILE_CLI_TEST_UTIL_H +``` + +> **If the fixture fails to compile** (a transitively-included type is missing), +> add the explicit header — `common/tablet.h` for `Tablet`, `file/write_file.h` +> for `WriteFile`, `common/schema.h` for `TableSchema`/`ColumnSchema`. The +> `examples/cpp_examples/demo_write.cpp` compiles with just the table-writer +> include, so start minimal.
+ +- [ ] **Step 5: Create `cpp/test/tools/command_e2e_test.cc`** + +```cpp +#include + +#include +#include +#include + +#include "cli/run_cli.h" +#include "cli_test_util.h" + +namespace { +struct Fixture { + std::string path = tsfile_cli_test::write_table_fixture(); + ~Fixture() { std::remove(path.c_str()); } +}; +} // namespace + +TEST(CliE2E, LsListsTableNameTsv) { + Fixture f; + std::ostringstream out, err; + int code = tsfile_cli::run_cli({"ls", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "name\ntable1\n"); + EXPECT_TRUE(err.str().empty()); +} + +TEST(CliE2E, LsNoHeaderJustName) { + Fixture f; + std::ostringstream out, err; + int code = + tsfile_cli::run_cli({"ls", "-f", "tsv", "--no-header", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "table1\n"); +} + +TEST(CliE2E, OpenMissingFileReturnsFileError) { + std::ostringstream out, err; + int code = tsfile_cli::run_cli({"ls", "definitely_missing.tsfile"}, out, err); + EXPECT_EQ(code, 2); + EXPECT_FALSE(err.str().empty()); +} +``` + +- [ ] **Step 6: Build and run tests** + +Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -8 && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*` +Expected: 3 tests PASS. + +Run: `cd cpp && ./build/Debug/bin/tsfile ls -f tsv examples/test_cpp.tsfile` +Expected: prints `name` then `table1` (the bundled example is table-model). 
+ +- [ ] **Step 7: Commit** + +```bash +git add cpp/tools/commands/commands.h cpp/tools/commands/cmd_ls.cc cpp/tools/cli/run_cli.cc cpp/test/tools/cli_test_util.h cpp/test/tools/command_e2e_test.cc +git commit -m "feat(cpp-tools): implement model detection and 'ls' command" +``` + +--- + +### Task 6: `cmd_schema` (+ encoding/compression name helpers) + +**Files:** +- Modify: `cpp/tools/format/output_format.h` / `.cc` (add `tsencoding_name`, `compression_name`) +- Create: `cpp/tools/commands/cmd_schema.cc` +- Modify: `cpp/tools/cli/run_cli.cc` (dispatch `schema`) +- Modify: `cpp/test/tools/output_format_test.cc`, `cpp/test/tools/command_e2e_test.cc` + +`schema` emits a uniform 5-column shape `target, measurement, datatype, encoding, +compression`. Name + type come from `get_timeseries_metadata()` (works for both +models). Encoding/compression are enriched from `get_timeseries_schema()` for +tree-model files; for table-model files those two columns are blank (no public +getter on `TableSchema`). 
+ +- [ ] **Step 1: Add failing unit tests** — append to `cpp/test/tools/output_format_test.cc` + +```cpp +TEST(EncodingNameTest, KnownEncodings) { + EXPECT_STREQ(tsfile_cli::tsencoding_name(common::PLAIN), "PLAIN"); + EXPECT_STREQ(tsfile_cli::tsencoding_name(common::TS_2DIFF), "TS_2DIFF"); + EXPECT_STREQ(tsfile_cli::tsencoding_name(common::SPRINTZ), "SPRINTZ"); +} + +TEST(CompressionNameTest, KnownCompressors) { + EXPECT_STREQ(tsfile_cli::compression_name(common::UNCOMPRESSED), + "UNCOMPRESSED"); + EXPECT_STREQ(tsfile_cli::compression_name(common::SNAPPY), "SNAPPY"); + EXPECT_STREQ(tsfile_cli::compression_name(common::LZ4), "LZ4"); +} +``` + +- [ ] **Step 2: Add declarations** to `cpp/tools/format/output_format.h` (after `tsdatatype_name`) + +```cpp +const char* tsencoding_name(common::TSEncoding e); +const char* compression_name(common::CompressionType c); +``` + +- [ ] **Step 3: Add definitions** to `cpp/tools/format/output_format.cc` (after `tsdatatype_name`) + +```cpp +const char* tsencoding_name(common::TSEncoding e) { + switch (e) { + case common::PLAIN: return "PLAIN"; + case common::DICTIONARY: return "DICTIONARY"; + case common::RLE: return "RLE"; + case common::DIFF: return "DIFF"; + case common::TS_2DIFF: return "TS_2DIFF"; + case common::BITMAP: return "BITMAP"; + case common::GORILLA_V1: return "GORILLA_V1"; + case common::REGULAR: return "REGULAR"; + case common::GORILLA: return "GORILLA"; + case common::ZIGZAG: return "ZIGZAG"; + case common::FREQ: return "FREQ"; + case common::SPRINTZ: return "SPRINTZ"; + default: return "UNKNOWN"; + } +} + +const char* compression_name(common::CompressionType c) { + switch (c) { + case common::UNCOMPRESSED: return "UNCOMPRESSED"; + case common::SNAPPY: return "SNAPPY"; + case common::GZIP: return "GZIP"; + case common::LZO: return "LZO"; + case common::SDT: return "SDT"; + case common::PAA: return "PAA"; + case common::PLA: return "PLA"; + case common::LZ4: return "LZ4"; + default: return "UNKNOWN"; + } +} +``` + 
+- [ ] **Step 4: Create `cpp/tools/commands/cmd_schema.cc`** + +```cpp +#include +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "common/schema.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int cmd_schema(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + const bool table = is_table_model(args, reader); + RowWriter w(out, fmt, + {"target", "measurement", "datatype", "encoding", "compression"}, + {common::STRING, common::STRING, common::STRING, common::STRING, + common::STRING}, + args.no_header); + + storage::DeviceTimeseriesMetadataMap meta = reader.get_timeseries_metadata(); + for (auto& kv : meta) { + std::string target = kv.first ? kv.first->get_device_name() : ""; + + // Tree-model enrichment: measurement -> (encoding, compression). + std::map> enc_comp; + if (!table && kv.first) { + std::vector ms; + if (reader.get_timeseries_schema(kv.first, ms) == 0) { + for (auto& m : ms) { + enc_comp[m.measurement_name_] = {tsencoding_name(m.encoding_), + compression_name(m.compression_type_)}; + } + } + } + + for (auto& ts : kv.second) { + if (!ts) continue; + std::string m = ts->get_measurement_name().to_std_string(); + std::string dt = tsdatatype_name(ts->get_data_type()); + std::string enc, comp; + auto it = enc_comp.find(m); + if (it != enc_comp.end()) { + enc = it->second.first; + comp = it->second.second; + } + w.write({target, m, dt, enc, comp}, + {false, false, false, enc.empty(), comp.empty()}); + } + } + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 5: Wire dispatch** — in `cpp/tools/cli/run_cli.cc`, replace the `ls`/else block: + +```cpp + int code; + if (p.command == "ls") { + code = cmd_ls(p, reader, fmt, out, err); + } else { +``` + +with: + +```cpp + int code; + if (p.command == "ls") { + code = cmd_ls(p, reader, fmt, out, err); + } else if (p.command == "schema") { + code = 
cmd_schema(p, reader, fmt, out, err); + } else { +``` + +- [ ] **Step 6: Add e2e test** — append to `cpp/test/tools/command_e2e_test.cc` + +```cpp +TEST(CliE2E, SchemaShowsFieldColumnAndType) { + Fixture f; + std::ostringstream out, err; + int code = tsfile_cli::run_cli({"schema", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_NE(out.str().find( + "target\tmeasurement\tdatatype\tencoding\tcompression"), + std::string::npos); + EXPECT_NE(out.str().find("s1"), std::string::npos); + EXPECT_NE(out.str().find("INT64"), std::string::npos); +} +``` + +> **If `SchemaShowsFieldColumnAndType` shows no rows** (i.e. +> `get_timeseries_metadata()` returns empty for a table-model file in this build), +> fall back to deriving name+type from a zero-row probe: +> `reader.queryByRow(table_name, all_measurement_names, /*offset=*/0, +> /*limit=*/0, rs)` and read `rs->get_metadata()`. Keep the 5-column output +> shape; leave encoding/compression blank. + +- [ ] **Step 7: Build and run tests** + +Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:EncodingNameTest.*:CompressionNameTest.*` +Expected: all PASS. + +- [ ] **Step 8: Commit** + +```bash +git add cpp/tools/format/output_format.h cpp/tools/format/output_format.cc cpp/tools/commands/cmd_schema.cc cpp/tools/cli/run_cli.cc cpp/test/tools/output_format_test.cc cpp/test/tools/command_e2e_test.cc +git commit -m "feat(cpp-tools): implement 'schema' command" +``` + +--- + +### Task 7: `cmd_stats` + +**Files:** +- Create: `cpp/tools/commands/cmd_stats.cc` +- Modify: `cpp/tools/cli/run_cli.cc` (dispatch `stats`) +- Modify: `cpp/test/tools/command_e2e_test.cc` + +`stats` emits `target, measurement, count, start_time, end_time` from each +series' `Statistic` (via `get_timeseries_metadata()`). 
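As a reference point for what the three numeric columns mean, the sketch below computes the same triple from a plain list of timestamps; a test could, for instance, cross-check `stats` against the `time` column printed by `cat`. It takes `start_time` and `end_time` as the smallest and largest timestamp, which coincides with the first and last point when a series is stored in ascending time order (an assumption here, not something the plan states). `SeriesSummary`, `summarize`, and `to_tsv_tail` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical reference summary of one series, mirroring the numeric columns
// that `stats` prints after `target` and `measurement`.
struct SeriesSummary {
    int64_t count = 0;
    int64_t start_time = 0;
    int64_t end_time = 0;
};

// Summarizes the timestamps of a series' non-null points (for example the
// `time` column of `tsfile cat -m s1`). Requires at least one point.
SeriesSummary summarize(const std::vector<int64_t>& times) {
    assert(!times.empty());
    SeriesSummary s;
    s.count = static_cast<int64_t>(times.size());
    s.start_time = *std::min_element(times.begin(), times.end());
    s.end_time = *std::max_element(times.begin(), times.end());
    return s;
}

// Renders the three numbers the way the TSV `stats` row shows them.
std::string to_tsv_tail(const SeriesSummary& s) {
    return std::to_string(s.count) + "\t" + std::to_string(s.start_time) +
           "\t" + std::to_string(s.end_time);
}
```

For the Task 5 fixture (timestamps 0..4), `to_tsv_tail(summarize({0, 1, 2, 3, 4}))` gives `5\t0\t4`, the substring the e2e test searches for after `s1\t`.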
+ +- [ ] **Step 1: Create `cpp/tools/commands/cmd_stats.cc`** + +```cpp +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "common/statistic.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + RowWriter w(out, fmt, + {"target", "measurement", "count", "start_time", "end_time"}, + {common::STRING, common::STRING, common::INT64, common::INT64, + common::INT64}, + args.no_header); + + storage::DeviceTimeseriesMetadataMap meta = reader.get_timeseries_metadata(); + for (auto& kv : meta) { + std::string target = kv.first ? kv.first->get_device_name() : ""; + for (auto& ts : kv.second) { + if (!ts) continue; + std::string m = ts->get_measurement_name().to_std_string(); + storage::Statistic* st = ts->get_statistic(); + if (st != nullptr) { + w.write({target, m, std::to_string(st->get_count()), + std::to_string(st->start_time_), + std::to_string(st->end_time_)}, + {false, false, false, false, false}); + } else { + w.write({target, m, "", "", ""}, + {false, false, true, true, true}); + } + } + } + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 2: Wire dispatch** — in `cpp/tools/cli/run_cli.cc`, add a branch after the `schema` branch: + +```cpp + } else if (p.command == "stats") { + code = cmd_stats(p, reader, fmt, out, err); +``` + +(Place it between the `schema` branch and the final `else`.) + +- [ ] **Step 3: Add e2e test** — append to `cpp/test/tools/command_e2e_test.cc` + +```cpp +TEST(CliE2E, StatsReportsCountAndTimeRange) { + Fixture f; + std::ostringstream out, err; + int code = tsfile_cli::run_cli({"stats", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_NE(out.str().find( + "target\tmeasurement\tcount\tstart_time\tend_time"), + std::string::npos); + // s1 has 5 rows with timestamps 0..4. 
+ EXPECT_NE(out.str().find("s1\t5\t0\t4"), std::string::npos); +} +``` + +- [ ] **Step 4: Build and run tests** + +Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*` +Expected: all PASS (including the new `StatsReportsCountAndTimeRange`). + +> **If `s1\t5\t0\t4` is not found**, print the raw output +> (`./build/Debug/bin/tsfile stats -f tsv examples/test_cpp.tsfile`) and adjust +> the substring to the actual whitespace/columns — the count(5) and range(0..4) +> values themselves are guaranteed by the fixture. + +- [ ] **Step 5: Commit** + +```bash +git add cpp/tools/commands/cmd_stats.cc cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc +git commit -m "feat(cpp-tools): implement 'stats' command" +``` + +--- + +### Task 8: `cmd_head` / `cmd_cat` / `cmd_select` (row data) + +**Files:** +- Modify: `cpp/tools/format/result_set_format.h` / `.cc` (add offset/limit) +- Modify: `cpp/tools/commands/commands.h` (declare `run_row_query`) +- Create: `cpp/tools/commands/row_query.cc` +- Create: `cpp/tools/commands/cmd_head.cc`, `cmd_cat.cc`, `cmd_select.cc` +- Modify: `cpp/tools/cli/run_cli.cc` (dispatch head/cat/select) +- Modify: `cpp/test/tools/command_e2e_test.cc` + +All three row commands share `run_row_query`, which opens a `ResultSet` (time +range honored via `--start/--end`) and pumps it with client-side offset/limit. +`head` defaults `limit` to 10; `cat`/`select` use the parsed `--limit` +(default unlimited). 
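Client-side offset/limit over a forward-only cursor comes down to a small loop. The sketch below models it with a hypothetical `RowSource` callback standing in for `ResultSet::next()`, and plain integers standing in for rows. Because the limit is tested before the next row is requested, a query such as `head -n 2` never fetches a third row only to discard it, and `-n 0` fetches nothing at all.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for ResultSet::next(): fills `row` and returns true
// while rows remain.
using RowSource = std::function<bool(int&)>;

// Client-side OFFSET/LIMIT over a forward-only source; limit < 0 = unlimited.
// The limit is checked before each fetch, so no surplus row is read.
std::vector<int> paginate(const RowSource& next, long long offset,
                          long long limit) {
    assert(offset >= 0);
    std::vector<int> out;
    long long skipped = 0;
    int row = 0;
    while ((limit < 0 || static_cast<long long>(out.size()) < limit) &&
           next(row)) {
        if (skipped < offset) {  // still inside the offset window
            ++skipped;
            continue;
        }
        out.push_back(row);
    }
    return out;
}

// Example source yielding begin, begin+1, ..., end-1 and counting how many
// times a row was requested (including the final "no more rows" call).
RowSource counting_range(int begin, int end, int* calls) {
    int cur = begin;
    return [cur, end, calls](int& row) mutable {
        ++*calls;
        if (cur >= end) return false;
        row = cur++;
        return true;
    };
}
```

For example, offset 1 and limit 2 over rows 0..4 yields rows 1 and 2 after exactly three fetches; the skipped row still costs one fetch, since offset is applied on the client.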
+ - [ ] **Step 1: Add offset/limit to `write_result_set`** — change the declaration in `cpp/tools/format/result_set_format.h`: + +```cpp +int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, + std::ostream& out, long long offset = 0, + long long limit = -1); +``` + +and update the definition in `cpp/tools/format/result_set_format.cc` to match: add the same two parameters to its signature (the default arguments stay on the declaration only), keep the metadata/header setup, and replace everything from the `RowWriter writer(...)` line onward: + +```cpp +int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, + std::ostream& out, long long offset, long long limit) { + // ... column name/type setup from the metadata is unchanged ... + + RowWriter writer(out, fmt, header, types, no_header); + bool has_next = false; + int code = common::E_OK; + long long skipped = 0, emitted = 0; + // Test the limit before calling next() so no row past it is fetched. + while ((limit < 0 || emitted < limit) && + (code = rs->next(has_next)) == common::E_OK && has_next) { + if (skipped < offset) { + ++skipped; + continue; + } + std::vector<std::string> cells(ncol); + std::vector<bool> nulls(ncol, false); + for (uint32_t i = 1; i <= ncol; ++i) { + if (rs->is_null(i)) { + nulls[i - 1] = true; + } else { + cells[i - 1] = cell_to_string(rs, i, types[i - 1]); + } + } + writer.write(cells, nulls); + ++emitted; + } + writer.finish(); + return code; +} +``` + +- [ ] **Step 2: Declare `run_row_query`** in `cpp/tools/commands/commands.h` (before the `cmd_*` declarations): + +```cpp +// Shared by head/cat/select: opens a row ResultSet (honoring --start/--end and +// --device/--table/--measurements) and writes it with client-side offset/limit.
+int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err, + long long offset, long long limit); +``` + +- [ ] **Step 3: Create `cpp/tools/commands/row_query.cc`** + +```cpp +#include +#include +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "common/device_id.h" +#include "common/schema.h" +#include "format/result_set_format.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err, + long long offset, long long limit) { + const int64_t start = + args.has_start ? static_cast(args.start) + : std::numeric_limits::min(); + const int64_t end = args.has_end ? static_cast(args.end) + : std::numeric_limits::max(); + + storage::ResultSet* rs = nullptr; + int qret = 0; + + if (is_table_model(args, reader)) { + std::string table_name = args.table; + if (table_name.empty()) { + auto schemas = reader.get_all_table_schemas(); + if (schemas.empty() || !schemas[0]) { + err << "Error: no table found in file\n"; + return kExitRuntime; + } + table_name = schemas[0]->get_table_name(); + } + std::vector cols = args.measurements; + if (cols.empty()) { + auto ts = reader.get_table_schema(table_name); + if (ts) cols = ts->get_measurement_names(); + } + qret = reader.query(table_name, cols, start, end, rs); + } else { + std::vector devices; + if (!args.device.empty()) { + devices.push_back(args.device); + } else { + for (auto& d : reader.get_all_device_ids()) { + if (d) devices.push_back(d->get_device_name()); + } + } + std::vector paths; + for (const std::string& dev : devices) { + std::vector ms = args.measurements; + if (ms.empty()) { + auto did = std::make_shared(dev); + std::vector sch; + if (reader.get_timeseries_schema(did, sch) == 0) { + for (auto& m : sch) ms.push_back(m.measurement_name_); + } + } + for (const std::string& m : 
ms) paths.push_back(dev + "." + m); + } + if (paths.empty()) { + err << "Error: no time series found\n"; + return kExitRuntime; + } + qret = reader.query(paths, start, end, rs); + } + + if (qret != 0 || rs == nullptr) { + err << "Error: query failed (code " << qret << ")\n"; + if (rs != nullptr) reader.destroy_query_data_set(rs); + return kExitRuntime; + } + + int wret = write_result_set(rs, fmt, args.no_header, out, offset, limit); + reader.destroy_query_data_set(rs); + return wret == 0 ? kExitOk : kExitRuntime; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 4: Create `cpp/tools/commands/cmd_head.cc`** + +```cpp +#include "commands/commands.h" + +namespace tsfile_cli { +int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err) { + long long limit = args.limit < 0 ? 10 : args.limit; + return run_row_query(args, reader, fmt, out, err, args.offset, limit); +} +} // namespace tsfile_cli +``` + +- [ ] **Step 5: Create `cpp/tools/commands/cmd_cat.cc`** + +```cpp +#include "commands/commands.h" + +namespace tsfile_cli { +int cmd_cat(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err) { + return run_row_query(args, reader, fmt, out, err, args.offset, args.limit); +} +} // namespace tsfile_cli +``` + +- [ ] **Step 6: Create `cpp/tools/commands/cmd_select.cc`** + +```cpp +#include "commands/commands.h" + +namespace tsfile_cli { +int cmd_select(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err) { + return run_row_query(args, reader, fmt, out, err, args.offset, args.limit); +} +} // namespace tsfile_cli +``` + +- [ ] **Step 7: Wire dispatch** — in `cpp/tools/cli/run_cli.cc`, add three branches before the final `else`: + +```cpp + } else if (p.command == "head") { + code = cmd_head(p, reader, fmt, out, err); + } else if (p.command == "cat") { + code = cmd_cat(p, reader, fmt, out, 
err); + } else if (p.command == "select") { + code = cmd_select(p, reader, fmt, out, err); +``` + +- [ ] **Step 8: Add e2e tests** — append to `cpp/test/tools/command_e2e_test.cc` + +```cpp +namespace { +size_t count_lines(const std::string& s) { + size_t n = 0; + for (char c : s) if (c == '\n') ++n; + return n; +} +} // namespace + +TEST(CliE2E, HeadProjectsAndLimits) { + Fixture f; + std::ostringstream out, err; + int code = + tsfile_cli::run_cli({"head", "-m", "s1", "-n", "2", "-f", "tsv", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "time\ts1\n0\t0\n1\t10\n"); +} + +TEST(CliE2E, CatReturnsAllRows) { + Fixture f; + std::ostringstream out, err; + int code = tsfile_cli::run_cli({"cat", "-m", "s1", "-f", "tsv", f.path}, out, + err); + EXPECT_EQ(code, 0); + // header + 5 data rows + EXPECT_EQ(count_lines(out.str()), 6u); + EXPECT_NE(out.str().find("time\ts1\n"), std::string::npos); +} + +TEST(CliE2E, SelectWithTimeRange) { + Fixture f; + std::ostringstream out, err; + int code = + tsfile_cli::run_cli({"select", "-m", "s1", "--start", "2", "--end", "3", + "-f", "tsv", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "time\ts1\n2\t20\n3\t30\n"); +} + +TEST(CliE2E, SelectJsonIsNdjson) { + Fixture f; + std::ostringstream out, err; + int code = + tsfile_cli::run_cli({"select", "-m", "s1", "--start", "0", "--end", "0", + "-f", "json", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n"); +} +``` + +- [ ] **Step 9: Build and run tests** + +Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -8 && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*` +Expected: all CliE2E tests PASS. + +> **If a row-order or column-order assertion fails**, print the actual output +> (`./build/Debug/bin/tsfile head -m s1 -n 2 -f tsv examples/test_cpp.tsfile`) +> and align the expected string. The values (ts 0..4, s1 = ts*10) are fixed by +> the fixture; only column/row ordering could differ. 
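The defaulting rules in Steps 3 and 4 are easy to get subtly wrong, and they can be exercised without opening a reader. The sketch below is a minimal, hypothetical extraction of that logic: `QueryWindow`, `resolve_window`, and `head_limit` are illustrative names, not functions defined by this plan, which inlines the same expressions in `run_row_query` and `cmd_head`. It assumes, as the plan's code does, that a negative `limit` means "`-n` was not given".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Closed time window [start, end] handed to the reader's query().
struct QueryWindow {
  int64_t start;
  int64_t end;
};

// Hypothetical extraction of run_row_query's defaulting: a bound that was not
// given on the command line widens to the extreme int64_t value.
QueryWindow resolve_window(bool has_start, long long start, bool has_end,
                           long long end) {
  QueryWindow w;
  w.start = has_start ? static_cast<int64_t>(start)
                      : std::numeric_limits<int64_t>::min();
  w.end = has_end ? static_cast<int64_t>(end)
                  : std::numeric_limits<int64_t>::max();
  return w;
}

// Hypothetical extraction of cmd_head: a negative limit means "-n not given",
// so head falls back to 10 rows (cmd_cat forwards the value unchanged).
long long head_limit(long long requested) {
  return requested < 0 ? 10 : requested;
}
```

With `--start 2 --end 3` the window is the closed interval [2, 3], which matches the `SelectWithTimeRange` expectation of exactly the rows at timestamps 2 and 3; with neither bound the window spans the whole `int64_t` range.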
+ +- [ ] **Step 10: Commit** + +```bash +git add cpp/tools/format/result_set_format.h cpp/tools/format/result_set_format.cc cpp/tools/commands/commands.h cpp/tools/commands/row_query.cc cpp/tools/commands/cmd_head.cc cpp/tools/commands/cmd_cat.cc cpp/tools/commands/cmd_select.cc cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc +git commit -m "feat(cpp-tools): implement 'head', 'cat', and 'select' row commands" +``` + +--- + +### Task 9: stderr fix, `install()`, full-suite run, manual verification + +**Files:** +- Modify: `cpp/src/file/read_file.cc` (route open-error prints to stderr) +- Modify: `cpp/tools/CMakeLists.txt` (add `install`) + +- [ ] **Step 1: Route open errors to stderr** — in `cpp/src/file/read_file.cc`, change the two `std::cout` lines inside the `if (fd_ < 0)` block (around lines 52-55) to `std::cerr`: + +```cpp + fd_ = ::open(file_path_.c_str(), O_RDONLY); + if (fd_ < 0) { + std::cerr << "open file " << file_path << " error :" << fd_ + << std::endl; + std::cerr << "open error" << errno << " " << strerror(errno) + << std::endl; + return E_FILE_OPEN_ERR; + } +``` + +Rationale: a CLI that emits diagnostics on stdout would corrupt `tsfile cat f | jq`. Errors belong on stderr. + +- [ ] **Step 2: Run the FULL test suite** to confirm the library change causes no regression + +Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test 2>&1 | tail -15` +Expected: all suites PASS (existing reader/file tests + the new `RunCliTest`, `ParseArgsTest`, `RowWriterTest`, `*NameTest`, `*EscapeTest`, `CliE2E`). 
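The stdout/stderr rationale in Step 1 is the contract every verb follows, and because `run_cli` takes injected `out`/`err` streams it can be checked in a unit test rather than only by eye. A minimal sketch of the pattern, assuming a hypothetical `list_names` verb and hypothetical exit-code names (the real constants in `exit_codes.h` may be named differently):

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical names for the documented exit codes (0/1/2/3).
enum ExitCode { kOk = 0, kUsage = 1, kFileError = 2, kRuntimeError = 3 };

// Hypothetical verb in the style of run_cli: rows go to `out`, every
// diagnostic goes to `err`, so a failure never leaks text into a pipe.
int list_names(bool file_opened, const std::vector<std::string>& names,
               std::ostream& out, std::ostream& err) {
  if (!file_opened) {
    err << "Error: cannot open file\n";
    return kFileError;
  }
  for (const std::string& n : names) {
    out << n << '\n';  // one name per line, like `tsfile ls`
  }
  return kOk;
}
```

A failed open returns `2` and leaves `out` empty, so a downstream `jq` or `awk` receives no bytes at all instead of a half-parsed error message; this is the same behavior the missing-file check in the manual verification step expects.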
+ +- [ ] **Step 3: Add `install()`** to the end of `cpp/tools/CMakeLists.txt` + +```cmake +install(TARGETS tsfile_cli RUNTIME DESTINATION bin) +``` + +- [ ] **Step 4: Manual verification against the bundled example** (table-model file) + +Run each and confirm behavior: + +```bash +cd cpp +BIN=./build/Debug/bin/tsfile +F=examples/test_cpp.tsfile +$BIN ls -f tsv $F # -> name / table1 +$BIN schema -f tsv $F # -> header + rows incl. s1 INT64 +$BIN stats -f tsv $F # -> count/start/end per series +$BIN head -n 3 -f tsv $F # -> header + 3 rows +$BIN cat -f csv $F | head -n 3 # -> CSV, pipe-clean (no log noise on stdout) +$BIN select -m s1 -f json $F # -> NDJSON: one {"time":..,"s1":..} per line +$BIN cat $F # -> aligned table form (stdout is a TTY) +echo "exit on missing:"; $BIN ls nope.tsfile; echo "rc=$?" # rc=2, error on stderr +``` + +Expected: data on stdout, diagnostics on stderr, exit codes per the table; the TTY run shows aligned columns while the piped run shows TSV/CSV. + +- [ ] **Step 5: Tree-model manual check (only if a tree-model `.tsfile` is available)** + +The automated e2e fixture is table-model. If you have a tree-model file (e.g. produced by `TsFileWriter::write_tree`), verify the tree branch: + +```bash +$BIN ls -f tsv # -> device names, one per line +$BIN schema -f tsv # -> datatype + encoding + compression filled +$BIN cat -d -m -f tsv +``` + +If unavailable, note it in the PR description as untested-by-CI and rely on the shared `run_row_query`/formatter coverage from the table-model tests. + +- [ ] **Step 6: Format check** + +Run: `cd /Users/zhanghongyin/iotdb/tsfile && ./mvnw spotless:apply -P with-cpp 2>&1 | tail -5 && ./mvnw spotless:check -P with-cpp 2>&1 | tail -5` +Expected: clang-format applies cleanly; check passes. (Or run `clang-format -i` over `cpp/tools/**` and `cpp/test/tools/**` if invoking Maven is impractical locally.) 
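The manual check in Step 4 expects the aligned `table` form on a terminal and TSV when piped. That rule fits in a few lines. The sketch below is a hypothetical stand-in for the plan's `resolve_format` (its real signature is defined in Task 3 and may differ): it takes the TTY bit as a parameter so it can be tested without a terminal, treats an empty flag as "`--format` not given", and reports an unrecognized value as a usage error by returning `false`. It assumes a POSIX system for `isatty`.

```cpp
#include <cassert>
#include <string>

#include <unistd.h>  // POSIX isatty(), STDOUT_FILENO

enum class Format { kTable, kTsv, kCsv, kJson };

// Hypothetical stand-in for resolve_format. An explicit --format value always
// wins; otherwise the default depends on whether stdout is a terminal.
// Returns false for an unknown value (a usage error, exit code 1).
bool pick_format(const std::string& flag, bool stdout_is_tty, Format* out) {
  if (flag.empty()) {
    *out = stdout_is_tty ? Format::kTable : Format::kTsv;
  } else if (flag == "table") {
    *out = Format::kTable;
  } else if (flag == "tsv") {
    *out = Format::kTsv;
  } else if (flag == "csv") {
    *out = Format::kCsv;
  } else if (flag == "json") {
    *out = Format::kJson;
  } else {
    return false;
  }
  return true;
}

// At startup, the TTY bit comes from the real file descriptor.
bool stdout_is_terminal() { return ::isatty(STDOUT_FILENO) == 1; }
```

Called as `pick_format(flag, stdout_is_terminal(), &fmt)`, this yields `table` for `$BIN cat $F` in a terminal and `tsv` for the same command piped into `less` or redirected to a file.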
+ +- [ ] **Step 7: Commit** + +```bash +git add cpp/src/file/read_file.cc cpp/tools/CMakeLists.txt +git commit -m "feat(cpp-tools): install tsfile binary; route open errors to stderr" +``` + +--- + +## Plan self-review (spec coverage) + +| Spec requirement | Covered by | +|---|---| +| Single multi-call `tsfile` binary, git-style dispatch | Task 1 (CMake `OUTPUT_NAME tsfile`, run_cli dispatch) | +| `ls` / `schema` / `stats` / `head` / `cat` / `select` | Tasks 5 / 6 / 7 / 8 | +| Hand-rolled arg parsing, no new deps | Task 2 | +| Data→stdout, diagnostics→stderr | Injected `out`/`err` everywhere; Task 9 lib fix | +| Exit codes 0/1/2/3 | `exit_codes.h` (Task 1); mapped in run_cli (Tasks 1, 5, 8) | +| TTY-adaptive default; `--format csv/tsv/json/table` | `resolve_format` (Task 3); run_cli isatty (Task 5) | +| CSV RFC-4180 quoting; NDJSON; null handling | `csv_escape`/`RowWriter`/`json_escape` (Task 3) | +| tree/table auto-detect + `--model` override | `is_table_model` (Task 5) | +| schema blanks encoding/compression for table model | `cmd_schema` enrichment branch (Task 6) | +| Timestamps as raw epoch | `cell_to_string` INT64 path (Task 4) | +| `BUILD_TOOLS` option; `install()` | Task 1 (option), Task 9 (install) | +| Tests: cli_args, formatters, model detect, e2e | Tasks 2, 3, 5-8 | +| License headers on new files | Conventions section + every Create step | + +**Placeholder scan:** no `TBD`/`TODO`/"implement later" remain; the "filled in by later tasks" branch in run_cli is replaced concretely in Tasks 6-8. **Type consistency:** `ParsedArgs`, `OutputFormat`, `RowWriter` ctor (`out, fmt, header, types, no_header`), `write_result_set(rs, fmt, no_header, out, offset, limit)`, and the `cmd_*`/`run_row_query` signatures are used identically across all tasks. + +**Known residual risks (validated during execution, not blockers):** +1. `get_timeseries_metadata()` yielding rows for table-model files — Task 6/7 notes give a fallback. +2. 
Exact column/row ordering in row-command output — Tasks 7/8 notes give the adjust-the-string fallback; values are fixture-guaranteed. +3. Fixture compile relying on transitive includes — Task 5 note lists the explicit headers to add. diff --git a/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md b/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md index 6b0ed5ff6..a8412d7f8 100644 --- a/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md +++ b/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md @@ -127,8 +127,8 @@ All verbs are read-only and backed by the existing reader API. | Command | Purpose | Backed by | |---|---|---| | `ls` | list devices (tree) or tables (table), one name per line | `get_all_device_ids()` / `get_all_table_schemas()` | -| `schema` | per-measurement data type / encoding / compression | `get_timeseries_schema()` / `get_table_schema()` | -| `stats` | per-series row count, time range, chunk count | `get_timeseries_metadata()` (`Statistics`) | +| `schema` | per-measurement data type / encoding / compression | `get_timeseries_schema()` (tree) / `get_table_schema()` + `get_timeseries_metadata()` (table) | +| `stats` | per-series row count and time range | `get_timeseries_metadata()` (`Statistic`) | | `head` | first N rows | `queryByRow(..., offset=0, limit=N)` | | `cat` | all rows of a device/table | `query()` / `queryByRow(..., limit=-1)` | | `select` | chosen columns + time range + limit/offset | `query(table, cols, start, end, ...)` / tree `query(paths, start, end)` | @@ -160,6 +160,12 @@ adapts: - **Column semantics differ** (tree: device path + measurement; table: table + columns), but **the time column is always column 1** in row output (`ResultSetMetadata` guarantees this). +- **`schema` field availability:** tree-model files expose data type, encoding, + and compression per measurement (via `get_timeseries_schema`). 
Table-model
+  files expose column name and data type (via `get_timeseries_metadata`), but
+  `TableSchema` has no public encoding/compression getter, so those two columns
+  are emitted blank for table-model files. The output keeps a uniform 5-column
+  shape (`target, measurement, datatype, encoding, compression`) across models.
 
 ## Output formats
 
From 91baa3080f45dc1ec2014c3e0046499a4e877208 Mon Sep 17 00:00:00 2001 From: spricoder Date: Tue, 2 Jun 2026 18:41:59 +0800 Subject: [PATCH 03/41] Update TsFile CLI redesign spec --- .../2026-06-01-tsfile-unix-cli-design.md | 474 ++++++++++++------ 1 file changed, 311 insertions(+), 163 deletions(-) diff --git a/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md b/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md index a8412d7f8..9710e6dd0 100644 --- a/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md +++ b/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md @@ -17,207 +17,355 @@ under the License. --> -# Design: A Unix-philosophy command-line interface for TsFile (C++)
+# Design: TsFile C++ CLI redesign
 
-- **Date**: 2026-06-01
-- **Module**: `cpp/`
-- **Status**: Approved design, pending implementation plan
+- **Date**: 2026-06-02
+- **Module**: `cpp/`
+- **Status**: Design approved, implementation plan pending
+- **Research basis**:
+  `/Users/zhanghongyin/reasearchNotes/research/tsfile/Report.md`, section 5.3,
+  and
+  `/Users/zhanghongyin/reasearchNotes/research/tsfile/调研报告/各文件格式CLI工具调研.md`
 
-## Goal
+## Goal
 
-Give TsFile a set of composable, pipeable command-line tools — in the Unix
-tradition of small programs that read a file and write machine-parseable data
-to stdout. The primary gap today is the **read / inspect / export** side: to
-answer "what devices, measurements, schema, and data live in this `.tsfile`?"
-a user must write code or read the raw byte layout via the Java
-`TsFileSketchTool`. A single `tsfile` binary closes that gap and composes with
-`awk`, `jq`, `sort`, and database import tools.
+Provide TsFile with a single-binary, composable, pipe-friendly C++ command-line tool:
 
-## Scope
+```sh
+tsfile [options] 
+tsfile --help | --version
+tsfile help 
+```
 
-**In scope (v1):** read-only inspection and export verbs — `ls`, `schema`,
-`stats`, `head`, `cat`, `select` — shipped as one multi-call C++ binary.
+The CLI should let users inspect a `.tsfile` the way they inspect other self-describing data files: discover namespaces, view
+schema and metadata, preview rows, stream-export rows, count rows, and sample rows, all without writing reader code.
 
-**Out of scope (v1, possible follow-ups):** write/convert verbs (the Java
-`tools` module already imports CSV/Parquet/Arrow → TsFile), a structure-dump
-verb at parity with `TsFileSketchTool`, ISO time formatting, and splitting into
-multiple `tsfile-*` binaries.
+This redesign keeps the overall direction of v1 but brings the command surface closer to the tool lineage of Parquet and similar data formats.
+The visible changes: the `select` verb is removed; `meta`, `count`, and `sample` are added; and projection, time range,
+limit, and offset become shared options of the row-output commands.
 
-## Why C++
+## How the research findings constrain the design
 
-The user wants the most Unix-native form: a single self-contained static binary
-with fast startup and no runtime dependency (unlike the JVM-based Java tools or
-the Python binding). The C++ read path already exposes everything the verbs
-need, so the engine does not change — the work is argument parsing, subcommand
-dispatch, and output formatting.
+TsFile has two identities at once:
 
-## Existing building blocks (no engine changes needed)
+1. **A Parquet-like file shape**: sealed, immutable, self-describing, and columnar, with footer metadata, offsets, and statistics.
+   The Parquet CLI is therefore the most important reference for the command design.
+2. 
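**HDF5/netCDF-like namespaces**: a TsFile is not always a single-table file; the tree model holds many devices,
+   and the table model holds many tables. It therefore needs an `ls`-style namespace command.
 
-`storage::TsFileReader` (`cpp/src/reader/tsfile_reader.h`) already provides:
+The CLI research unifies the read-only tool lineage for immutable data files as:
 
-| Need | API |
-|---|---|
-| list devices (tree) | `get_all_device_ids()` / `get_all_devices()` |
-| list tables (table) | `get_all_table_schemas()` |
-| per-device measurement schema | `get_timeseries_schema(device_id, &out)` |
-| per-table schema | `get_table_schema(name)` |
-| per-series statistics | `get_timeseries_metadata()` (carries `Statistics`) |
-| rows with offset/limit pushdown | `queryByRow(...)` (tree & table overloads) |
-| rows by time range / columns | `query(...)` (tree & table overloads) |
-| row iteration + column metadata | `ResultSet` + `ResultSetMetadata` (`cpp/src/reader/result_set.h`) |
+```text
+schema | meta(/footer/stats) | head(/cat) | count | sample
+```
+
+Parquet is the most complete template: Apache `parquet-cli` provides `schema`, `meta`, `footer`, `head`,
+`cat`, and index/statistics commands; the Rust tool `pqrs` adds the particularly useful `rowcount` and `sample`.
+ORC and Avro confirm the same pattern: their official tools provide `meta`/`data`/`count` and `getschema`/
+`getmeta`/`cat`/`count`, respectively. HDF5 and netCDF contribute the namespace and header experience: the value of `h5ls`,
+`h5dump -H`, and `ncdump -h` is that they show a file's internal structure without opening an application.
+
+Mapped onto TsFile, the five-verb lineage needs one addition, `ls`, because a TsFile contains
+device/table namespaces.
+
+## Scope
+
+This redesign includes:
+
+- One multi-command binary named `tsfile`.
+- Read-only commands: `ls`, `schema`, `meta`, `stats`, `head`, `cat`, `count`, `sample`.
+- Shared options for output format, model selection, column projection, row limit, offset, and time range.
+- An implementation built on the existing `storage::TsFileReader` read path.
+- Unix conventions: data goes to stdout and diagnostics/errors go to stderr, so the tool plugs into `awk`, `jq`,
+  `sort`, import tools, and shell pipelines.
+
+This redesign excludes:
+
+- Write, convert, merge, and rewrite commands.
+- A byte-level structure dump fully equivalent to Java `TsFileSketchTool`.
+- FUSE mounts, DuckDB/ClickHouse/VisiData connectors, or SQL replacement scans.
+- ISO time formatting, and complex predicates beyond time ranges and measurement projection.
+- Splitting into multiple `tsfile-*` binaries.
+
+## Command lineage
+
+The command set follows the read-only lineage of Parquet/ORC/Avro and borrows the namespace-inspection capability of HDF5/netCDF.
+
+| Verb | Lineage | Purpose | Main reader support |
+|---|---|---|---|
+| `ls` | `h5ls`, `ncdump -h`; Parquet usually does not need one | list devices (tree model) or tables (table model), one name per line | 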
`get_all_device_ids()`, `get_all_table_schemas()` |
+| `schema` | `parquet-cli schema`, Avro `getschema`, SQL `DESCRIBE` | print type information for series or columns | `get_timeseries_schema()`, `get_table_schema()` |
+| `meta` | `parquet-cli meta/footer`, Avro `getmeta`, DuckDB metadata functions | print a file-level summary: model, version, namespace size, global time range, Bloom filter, file size | reader metadata and file-system metadata |
+| `stats` | `parquet-cli column-index/check-stats`, ORC statistics, SQL `SUMMARIZE` | print per-series count, time range, min, max, first, last, sum | `get_timeseries_metadata()` statistics |
+| `head` | `parquet-cli head`, `pqrs head`, SQL `LIMIT` | print the first N rows | shared row-query path |
+| `cat` | `parquet-cli cat/scan`, Avro `cat`/`tojson`, ORC `data` | stream matching rows | shared row-query path |
+| `count` | `pqrs rowcount`, ORC `count`, Avro `count`, SQL `count(*)` | print row counts per series or per scope without scanning data pages | `get_timeseries_metadata()` statistics |
+| `sample` | `pqrs sample`, SQL sampling | print reproducible sample rows | shared row-query path plus deterministic sampling |
+
+`select` is no longer a standalone verb. What it actually carried was projection, time filtering, limit, and offset; those capabilities belong in
+shared options of `head`, `cat`, and `sample`. This is also closer to the Parquet tools' convention of attaching column selection to
+row-output commands.
+
+## Command semantics
 
-`result_set.h` also contains `print_table_result_set`, which already iterates
-columns and rows and formats each value by `TSDataType` (INT32/INT64/FLOAT/
-DOUBLE/BOOLEAN/TEXT/STRING). The `tsv`/`table` formatters extend this pattern;
-`csv`/`json` reuse the same type-dispatch.
+### `ls`
 
-## Architecture
+`ls` prints the top-level logical namespace:
 
-### Code location
+- tree model: one device ID per line;
+- table model: one table name per line.
 
-A new `cpp/tools/` directory, parallel to `cpp/examples/` (which is the existing
-template for "an executable that links `libtsfile`").
+The default output is deliberately simple and stable so it is easy to process in pipelines; measurement- or column-level detail belongs to `schema`.
 
+### `schema`
+
+`schema` prints a unified logical schema table:
+
+```text
+target, measurement, datatype, encoding, compression
 ```
-cpp/tools/
-├── CMakeLists.txt
-├── tsfile_cli.cc # main: parse top level, dispatch subcommand
-├── cli_args.h / cli_args.cc # minimal hand-rolled option parser (no new deps)
-├── output_format.h / output_format.cc # csv / tsv / json(NDJSON) / table formatters
-└── commands/
-    ├── command.h # subcommand interface: name(), run(args), help()
-    ├── cmd_ls.cc cmd_schema.cc cmd_stats.cc
-    └── cmd_head.cc cmd_cat.cc cmd_select.cc
+
+In the tree model, `target` is the device and `measurement` is the measurement point. In the table model, `target` is the table and
+`measurement` is the column name. If the current public API exposes the datatype but not encoding/compression,
+CSV/TSV emit an empty field and JSON emits `null`.
+
+### `meta`
+
+`meta` prints file-level information that can be answered without decoding data pages. The target fields are:
+
+```text
+file, model, version, device_count, table_count, series_count,
+start_time, end_time, bloom_filter, file_size_bytes
 ```
 
-### CLI shape — single multi-call binary, git-style dispatch
+It is TsFile's counterpart to Parquet `meta`/`footer`: get a quick overview of the file first, then decide whether to look further at
+schema, stats, or row data. If the current public reader API cannot expose some file-level field directly, the implementation should emit
+an empty value rather than scan data pages.
 
-```sh
-tsfile [options] 
-tsfile --help | --version
-tsfile help 
+### `stats`
+
+`stats` prints per-series statistics:
+
+```text
+target, measurement, count, start_time, end_time,
+min, max, first, last, sum
 ```
 
-Argument parsing is hand-rolled. The verbs are simple, the project targets
-C++11, and a goal is to keep the binary free of new third-party/runtime
-dependencies, consistent with the rest of the C++ module.
+This directly exposes a TsFile format advantage: Chunk/Page-level statistics include the count and numeric summaries, so many inspection questions
+can be answered without reading or decoding data pages.
+
+### `head` and `cat`
+
+`head` and `cat` are row-output commands:
+
+- `head` prints the first 10 rows by default and accepts `-n, --limit` to override the row count.
+- `cat` streams all matching rows by default unless a limit is given explicitly.
+- Both accept projection (`--measurements`) and a time range (`--start`,
+  `--end`) through the shared row-query path.
+
+`head` is a user-facing convenience command, essentially equivalent to `cat` with a default limit.
 
-### Unix discipline (applies to every command)
+### `count`
 
-- **Data goes to stdout; diagnostics, progress, and errors go to stderr.** This
-  is what lets `tsfile cat f.tsfile | jq` work without log noise on stdout.
-- **Exit codes are meaningful:** `0` success, `1` usage error, `2` cannot open
-  / corrupted file, `3` query or runtime error.
-- The library currently prints open errors to stdout (`ReadFile::open`,
-  `cpp/src/file/read_file.cc:52`). Along the CLI path these must go to stderr so
-  they do not corrupt piped output. (Small, contained fix.)
+`count` reads row counts from statistics instead of scanning data through a row iterator. This is where TsFile can improve on the common
+Parquet CLI surface: `parquet-cli` has no standalone row-count command, whereas TsFile's statistics can
+answer count cheaply.
 
-### Build / packaging
+Scope rules:
 
-- New CMake option `BUILD_TOOLS` (default `ON`), producing
-  `build//bin/tsfile`.
-- `install(TARGETS tsfile ...)` so `make install` ships the binary.
-- `build.sh` is left unchanged for v1 (it follows CMake defaults); revisit if a
-  dedicated flag is wanted.
+- No scope given: print counts for all series, plus a total in formats where that fits;
+- `--device`: print counts under one tree-model device;
+- `--table`: print counts under one table-model table.
 
-## Command surface (v1)
+### `sample`
 
-All verbs are read-only and backed by the existing reader API.
+`sample` prints N sample rows through the shared row query and formatter; N defaults to 10, and
+`--seed` makes the output reproducible.
 
-| Command | Purpose | Backed by |
+The implementation may use reservoir sampling or a deterministic skip strategy. The design requirement: for the same file, scope,
+projection, time range, limit, and seed, the output is stable.
+
+## Shared options
+
+| Option | Meaning | Applies to |
 |---|---|---|
-| `ls` | list devices (tree) or tables (table), one name per line | `get_all_device_ids()` / `get_all_table_schemas()` |
-| `schema` | per-measurement data type / encoding / compression | `get_timeseries_schema()` (tree) / `get_table_schema()` + `get_timeseries_metadata()` (table) |
-| `stats` | per-series row count and time range | `get_timeseries_metadata()` (`Statistic`) |
-| `head` | first N rows | `queryByRow(..., offset=0, limit=N)` |
-| `cat` | all rows of a device/table | `query()` / `queryByRow(..., limit=-1)` |
-| `select` | chosen columns + time range + limit/offset | `query(table, cols, start, end, ...)` / tree `query(paths, start, end)` |
+| `-f, --format csv\|tsv\|json\|table` | output format; the default adapts to whether stdout is a TTY | all |
+| `-d, --device ` | scope to a tree-model device | row-output commands, `schema`, `stats`, `count` |
+| `-t, --table ` | scope to a table-model table | row-output commands, `schema`, `stats`, `count` |
+| `-m, --measurements a,b,c` | measurement or column projection | `head`, `cat`, `sample` |
+| `-n, --limit N` | maximum number of output rows; `head` uses it as its row count | `head`, `cat`, `sample` |
+| `--offset N` | skip the first N rows | `head`, `cat` |
+| `--start ` / `--end ` | time range in epoch milliseconds, closed interval | `head`, `cat`, `sample` |
+| `--seed N` | seed for reproducible sampling | `sample` |
+| `--no-header` | suppress the header row | tabular outputs |
+| `--model tree\|table` | force the model, overriding auto-detection | all |
+| `-h, --help` / `--version` | help and version | top level and each command |
+
+An option that does not apply to the command is a usage error. For example, using `--seed` with any command other than `sample`, or
+using `--offset` with `sample`, should return exit code `1` and print a clear error message to stderr.
+
+## Tree and table models
+
+Model detection stays automatic:
+
+```text
+get_all_table_schemas() non-empty => table model
+otherwise                         => tree model
+```
 
-### Common flags
+`--model tree|table` overrides auto-detection.
 
-| Flag | Meaning |
-|---|---|
-| `-f, --format csv\|tsv\|json\|table` | output format; default is TTY-adaptive (see below) |
-| `-d, --device ` | scope to a device (tree model) | -| `-t, --table ` | scope to a table (table model) | -| `-m, --measurements s1,s2` | select columns | -| `-n, --limit N` | max rows (`head` is sugar for `--limit`) | -| `--offset N` | skip leading rows | -| `--start ` / `--end ` | time range; v1 accepts epoch milliseconds | -| `--no-header` | suppress the header row | -| `--model tree\|table` | force a model (override auto-detection) | -| `-h, --help` / `--version` | usage / version | - -## Tree vs. table model handling - -A `.tsfile` is written in one of two data models. The CLI auto-detects and -adapts: - -- **Detection:** `get_all_table_schemas()` non-empty ⇒ **table** model; otherwise - **tree** model. `--model` overrides for edge cases. -- **`ls`:** tree ⇒ one device ID per line; table ⇒ one table name per line. - One item per line keeps it pipe-friendly; per-column detail lives in `schema`. -- **Column semantics differ** (tree: device path + measurement; table: table + - columns), but **the time column is always column 1** in row output - (`ResultSetMetadata` guarantees this). -- **`schema` field availability:** tree-model files expose data type, encoding, - and compression per measurement (via `get_timeseries_schema`). Table-model - files expose column name and data type (via `get_timeseries_metadata`), but - `TableSchema` has no public encoding/compression getter, so those two columns - are emitted blank for table-model files. The output keeps a uniform 5-column - shape (`target, measurement, datatype, encoding, compression`) across models. - -## Output formats - -- **`table`** (human): aligned columns. -- **`tsv`** (pipe): tab-separated, header row first (unless `--no-header`). -- **TTY-adaptive default:** when stdout is a terminal, default to `table`; when - piped or redirected, default to `tsv`. `--format` always overrides. This - mirrors the behavior of `git` and `ls`. 
-- **`csv`:** RFC 4180 quoting (quote fields containing delimiter, quote, or
-  newline; double embedded quotes).
-- **`json`:** **NDJSON** — one JSON object per row, newline-delimited — chosen
-  for streaming and `jq -c` friendliness over a single large array.
-- **Null handling:** empty field in CSV/TSV; `null` in JSON.
-- **Timestamps:** v1 emits the raw stored epoch (INT64). `--time-format iso` is
-  a deliberate follow-up.
-
-## Error handling & exit codes
-
-| Exit | Condition |
+Behavior under the unified command surface:
+
+- `ls` lists devices in tree files and tables in table files.
+- `schema`, `stats`, and `count` can narrow their scope with `--device` or `--table`.
+- Row output always treats the time column as the first column.
+- Tree-model row output uses device + measurements; table-model row output uses table + columns.
+
+## Output formats
+
+Keep the v1 formatter design:
+
+- `table`: aligned table for humans; the default when stdout is a terminal.
+- `tsv`: tab-separated; the default when stdout is piped or redirected.
+- `csv`: RFC 4180 quoting. Fields containing the delimiter, a quote, or a newline are quoted, and embedded quotes are doubled.
+- `json`: NDJSON, one JSON object per line.
+
+Nulls are emitted as empty fields in CSV/TSV and as `null` in JSON. Timestamps are emitted as the stored epoch
+milliseconds integer. ISO time formatting is follow-up work.
+
+Data goes to stdout; diagnostics, usage, and errors go to stderr.
+
+## Exit codes
+
+| Exit code | Condition |
 |---|---|
-| `0` | success |
-| `1` | usage / argument error (unknown command, bad flag, missing file arg) |
-| `2` | file cannot be opened or is corrupted (`E_FILE_OPEN_ERR`, `E_TSFILE_CORRUPTED`) |
-| `3` | query / runtime error |
+| `0` | success |
+| `1` | usage or argument error |
+| `2` | file cannot be opened or is corrupted |
+| `3` | query or runtime error |
+
+`ReadFile::open` currently prints open errors to stdout (`cpp/src/file/read_file.cc`). The CLI
+path must not pollute stdout, so these diagnostics should go to stderr instead.
+
+## Architecture and v1 migration
+
+The current uncommitted v1 implementation already has sound boundaries:
+
+```text
+cpp/tools/
+├── CMakeLists.txt
+├── tools_main.cc
+├── cli/
+│   ├── cli_args.h
+│   ├── cli_args.cc
+│   ├── run_cli.h
+│   ├── run_cli.cc
+│   └── exit_codes.h
+├── format/
+│   ├── output_format.h
+│   ├── output_format.cc
+│   ├── result_set_format.h
+│   └── result_set_format.cc
+└── commands/
+    ├── commands.h
+    ├── row_query.cc
+    ├── cmd_ls.cc
+    ├── cmd_schema.cc
+    ├── cmd_stats.cc
+    ├── cmd_head.cc
+    ├── cmd_cat.cc
+    └── cmd_select.cc
+```
+
+The redesigned target structure builds on this layout and only adjusts 
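commands:
+
+```text
+cpp/tools/commands/
+├── commands.h
+├── row_query.cc
+├── cmd_ls.cc
+├── cmd_schema.cc
+├── cmd_meta.cc
+├── cmd_stats.cc
+├── cmd_head.cc
+├── cmd_cat.cc
+├── cmd_count.cc
+└── cmd_sample.cc
+```
+
+Migration items:
+
+- Delete `cmd_select.cc`.
+- Add `cmd_meta.cc`, `cmd_count.cc`, and `cmd_sample.cc`.
+- Add `seed` to `ParsedArgs` and parse `--seed` in `cli_args.cc`.
+- Update command registration, help text, and command validation in `run_cli.cc`.
+- Update the declarations in `commands.h`.
+- Keep `row_query.cc` as the shared row-reading path for `head`, `cat`, and `sample`.
+- Keep the formatter modules; reuse the generic row/table output only where a new command's result shape requires it.
+- Leave the storage engine unchanged. Every new command uses existing reader metadata or the existing row query API.
+
+Do not introduce a third-party argument-parsing library. The current hand-written parser covers this command surface and keeps the C++ module's dependencies low.
+
+## Build and release
+
+When tools are enabled, `cpp/CMakeLists.txt` includes `cpp/tools/` and builds the `tsfile`
+executable, linked against `libtsfile`.
+
+The binary is installed with the C++ artifacts. `cpp/examples/` keeps its role as example code; the CLI lives in `cpp/tools/`
+because it is a user-facing tool, not an example.
+
+## Testing
+
+Tests live in `cpp/test/tools/` and use Google Test.
+
+Unit tests cover:
+
+- `cli_args`: command and option parsing, including `--seed`, unknown commands, invalid option values, a missing file argument,
+  and command/option mismatches.
+- Formatters: `csv`, `tsv`, `json`, and `table`, covering nulls, strings containing the delimiter, quotes, and
+  newlines.
+- Model detection: a table schema present means table, otherwise tree; `--model` overrides both.
+- `meta`: aggregates file-level fields without triggering data-page scans.
+- `count`: based on `Statistic.count`, not on the row iterator.
+- `sample`: output is reproducible for a fixed seed.
+
+End-to-end tests cover:
+
+- Generating or reusing a small `.tsfile` fixture.
+- Running each command as a subprocess of the built `tsfile` binary.
+- Asserting exit codes, stdout shape, and stderr behavior.
+- TTY-adaptive formatting is covered by unit tests; subprocess tests pass `--format` explicitly.
+
+Tests verify only CLI behavior and the real reader path; they add no new storage-engine behavior.
+
+## Rejected alternatives
+
+### Keep the `select` verb
+
+Rejected. `select` makes the CLI feel more like SQL but overlaps with `cat` and `head`. What it really provides is projection and filtering,
+so those belong in shared options. Parquet-style tools put column selection on the row-output commands, and TsFile should do the same.
 
-The reader returns integer error codes; the CLI maps open/corruption codes to
-exit `2` and query failures to exit `3`. The stray stdout error print in
-`ReadFile::open` is redirected to stderr along the CLI path.
+### Fold `count` into `stats` or `meta`
 
-## Testing
+Rejected. `count` is used often enough, and TsFile can answer it cheaply from statistics. Keeping `count` as an explicit verb makes this
+format advantage easier for users to discover.
 
-Google Test, under `cpp/test/tools/` mirroring `cpp/src` test conventions.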
+### Drop `ls` to mimic Parquet exactly
 
-- **Unit:**
-  - `cli_args` parsing (commands, flags, error cases).
-  - Each formatter (`csv`, `tsv`, `json`/NDJSON, `table`) against a synthetic
-    `ResultSet` / `ResultSetMetadata`, including null and quoting edge cases.
-  - Model detection (table-schema-present ⇒ table; otherwise tree).
-- **End-to-end:** in a temp directory, write a small `.tsfile` via the existing
-  writer (or reuse `cpp/examples/test_cpp.tsfile`), run each command as a
-  subprocess, and assert both stdout content and exit code. Fixtures are
-  hermetic (generated under a temp dir, cleaned up).
+Rejected. A TsFile is not always a single logical table. Multi-device and multi-table namespaces make `ls` the
+first command users often need, just as `h5ls` is the natural starting point for HDF5.
 
-## License header
+### Implement write or convert commands now
 
-Every new file (`.cc`, `.h`, `CMakeLists.txt`, this `.md`) carries the Apache
-License 2.0 header in the comment style appropriate to the file type, per
-repository convention.
+Rejected. Read-only commands carry less risk at this stage, and they match the research conclusion: TsFile does not lack a CLI entirely; rather, its verbs
+are incomplete, it has no unified dispatcher, and generic viewers cannot yet see it directly.
 
-## Open follow-ups (explicitly deferred, not v1)
+## Follow-up work
 
-- Structure-dump verb at parity with Java `TsFileSketchTool`.
-- Write / convert verbs (Java `tools` already covers import).
-- `--time-format iso`, and richer `select` predicates beyond a time range.
-- Optional split into multiple `tsfile-*` binaries (coreutils-style).
+- A structure-dump command aligned with Java `TsFileSketchTool`.
+- ISO time formatting.
+- Complex predicates beyond time ranges and measurement projection.
+- Write, convert, merge, and rewrite commands.
+- DuckDB, ClickHouse, and VisiData readers, so that multi-format query and viewing tools can open TsFile.
+- If the project chooses to expose TsFile through file-system paths, design a read-only FUSE namespace or TableFS view. From 758d7e55cf5f9f13fc477efec9ee38440cf02d1d Mon Sep 17 00:00:00 2001 From: spricoder Date: Tue, 2 Jun 2026 18:56:52 +0800 Subject: [PATCH 04/41] Add TsFile CLI redesign implementation plan --- .../plans/2026-06-02-tsfile-cli-redesign.md | 1349 +++++++++++++++++ 1 file changed, 1349 insertions(+) create mode 100644 docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md diff --git a/docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md b/docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md new file mode 100644 index 000000000..27877ca20 --- /dev/null +++ b/docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md @@ -0,0 +1,1349 @@ + + +# TsFile CLI Redesign Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Move the current C++ `tsfile` v1 CLI from `ls/schema/stats/head/cat/select` to `ls/schema/meta/stats/head/cat/count/sample`, and make projection, time range, and limit/offset work as shared options of the row-output commands.
+
+**Architecture:** Keep the existing `cpp/tools/` layering: `cli/` handles argument parsing and dispatch, `commands/` reads metadata or runs row queries, and `format/` owns `RowWriter` and `ResultSet` output. A new metadata/stat helper reuses the `Statistic` formatting logic, and a new sampled result-set writer reuses the existing cell extraction, so the command layer does not duplicate row-output code.
+
+**Tech Stack:** C++11/C++14-compatible code, CMake `BUILD_TOOLS`, Google Test, and the existing `storage::TsFileReader`, `storage::Statistic`, `RowWriter`, and `write_result_set`.
+
+---
+
+## Prerequisites
+
+- Working directory: `/Users/zhanghongyin/iotdb/tsfile`
+- Before implementing, check `git status --short`; do not stage `.codegraph/` or any change unrelated to this plan.
+- Run the C++ verification commands from the `cpp/` directory:
+
+```bash
+bash build.sh -t=Debug
+./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:StatTableTest.*:ResultSetSampleTest.*
+```
+
+## File structure
+
+Existing files keep their responsibilities:
+
+- `cpp/tools/cli/cli_args.h` / `cpp/tools/cli/cli_args.cc`: parse commands, flags, and numeric arguments.
+- `cpp/tools/cli/run_cli.cc`: top-level usage, command allowlist, command/flag combination checks, reader open, dispatch.
+- `cpp/tools/commands/commands.h`: declarations of the command functions and shared helpers.
+- `cpp/tools/commands/row_query.cc`: query construction shared by `head`, `cat`, and `sample`.
+- `cpp/tools/format/output_format.*`: `RowWriter` and scalar-to-string conversion.
+- `cpp/tools/format/result_set_format.*`: extract rows from a `ResultSet` and write them out.
+- `cpp/test/tools/*_test.cc`: CLI unit tests and in-process E2E tests.
+
+New files:
+
+- `cpp/tools/commands/stat_table.h`: defines `SeriesStatRow` and `FileSummary`, and declares the helpers that collect metadata/stats and format statistic values.
+- `cpp/tools/commands/stat_table.cc`: implements `collect_series_stats`, `collect_file_summary`, and `statistic_value_cells`, shared by `stats`, `count`, and `meta`.
+- `cpp/tools/commands/cmd_meta.cc`: implements `tsfile meta`.
+- `cpp/tools/commands/cmd_count.cc`: implements `tsfile count`.
+- `cpp/tools/commands/cmd_sample.cc`: implements `tsfile sample`.
+- `cpp/test/tools/stat_table_test.cc`: directly tests the stable behavior of `Statistic` value formatting and the summary helpers.
+- `cpp/test/tools/result_set_sample_test.cc`: tests that the sampling writer is deterministic.
+
+Deleted files:
+
+- `cpp/tools/commands/cmd_select.cc`: the `select` capability moves into the shared options of 
`cat/head/sample` 的共享参数。 + +--- + +### Task 1: 命令面、参数解析和 flag 组合校验 + +**Files:** +- Modify: `cpp/tools/cli/cli_args.h` +- Modify: `cpp/tools/cli/cli_args.cc` +- Modify: `cpp/tools/cli/run_cli.cc` +- Modify: `cpp/test/tools/cli_args_test.cc` + +- [ ] **Step 1: 写失败测试,覆盖 `--seed`、新命令和删除 `select`** + +在 `cpp/test/tools/cli_args_test.cc` 末尾追加: + +```cpp +TEST(ParseArgsTest, SeedFlagParsed) { + auto p = tsfile_cli::parse_args( + {"sample", "-m", "s1", "-n", "3", "--seed", "42", "data.tsfile"}); + EXPECT_TRUE(p.error.empty()); + EXPECT_EQ(p.command, "sample"); + EXPECT_EQ(p.limit, 3); + EXPECT_TRUE(p.has_seed); + EXPECT_EQ(p.seed, 42); +} + +TEST(ParseArgsTest, BadSeedValueIsError) { + auto p = tsfile_cli::parse_args( + {"sample", "--seed", "not_a_number", "data.tsfile"}); + EXPECT_FALSE(p.error.empty()); + EXPECT_NE(p.error.find("Invalid --seed"), std::string::npos); +} + +TEST(RunCliTest, SelectIsNoLongerKnownCommand) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"select", "x.tsfile"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("Unknown command"), std::string::npos); +} + +TEST(RunCliTest, SeedOnCatIsUsageError) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli( + {"cat", "--seed", "7", "x.tsfile"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("--seed is only valid for sample"), + std::string::npos); +} + +TEST(RunCliTest, OffsetOnSampleIsUsageError) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli( + {"sample", "--offset", "2", "x.tsfile"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("--offset is not valid for sample"), + std::string::npos); +} +``` + +- [ ] **Step 2: 运行测试确认失败** + +Run: + +```bash +cd cpp && ./build/Debug/lib/TsFile_Test 
--gtest_filter=ParseArgsTest.SeedFlagParsed:ParseArgsTest.BadSeedValueIsError:RunCliTest.SelectIsNoLongerKnownCommand:RunCliTest.SeedOnCatIsUsageError:RunCliTest.OffsetOnSampleIsUsageError +``` + +Expected: compilation or the tests fail. At minimum, `ParsedArgs` has no `seed` / `has_seed` field, or `select` is still a known command. + +- [ ] **Step 3: Add the seed fields to `ParsedArgs`** + +In `cpp/tools/cli/cli_args.h`, add the following inside `ParsedArgs`, after `has_end`: + +```cpp + long long seed = 0; + bool has_seed = false; +``` + +- [ ] **Step 4: Parse `--seed`** + +In the `parse_args` loop in `cpp/tools/cli/cli_args.cc`, place the following branch after the `--end` branch and before the `--model` branch: + +```cpp + } else if (a == "--seed") { + if (!need_value(a, val)) { + return p; + } + if (!parse_ll(val, p.seed)) { + p.error = "Invalid --seed: " + val; + return p; + } + p.has_seed = true; +``` + +- [ ] **Step 5: Update the usage text, command whitelist, and flag-combination validation in `run_cli.cc`** + +In `cpp/tools/cli/run_cli.cc`: + +1. Replace the Commands section of the usage text with: + +```cpp + " ls list devices (tree) or tables (table)\n" + " schema per-measurement data type/encoding/compression\n" + " meta file-level summary without data-page scans\n" + " stats per-series statistics\n" + " head first N rows (default 10, use -n)\n" + " cat matching rows of a device/table\n" + " count per-series row counts from statistics\n" + " sample sampled rows (default 10, use -n and --seed)\n" +``` + +2. Replace the Options section with: + +```cpp + "Options: -f/--format csv|tsv|json|table, -d/--device, -t/--table,\n" + " -m/--measurements a,b, -n/--limit, --offset, --start,\n" + " --end, --seed, --no-header, --model tree|table,\n" + " -h/--help, --version\n"; +``` + +3. Replace the set in `is_known_command` with: + +```cpp + static const std::set kCmds = { + "ls", "schema", "meta", "stats", + "head", "cat", "count", "sample"}; +``` + +4.
 Add the following to the anonymous namespace: + +```cpp +bool validate_command_flags(const ParsedArgs& p, std::ostream& err) { + if (p.has_seed && p.command != "sample") { + err << "Error: --seed is only valid for sample\n"; + return false; + } + if (p.command == "sample" && p.offset != 0) { + err << "Error: --offset is not valid for sample\n"; + return false; + } + if (!p.device.empty() && !p.table.empty()) { + err << "Error: use either --device or --table, not both\n"; + return false; + } + if (p.limit < -1) { + err << "Error: --limit must be >= 0\n"; + return false; + } + if (p.offset < 0) { + err << "Error: --offset must be >= 0\n"; + return false; + } + if (p.has_start && p.has_end && p.start > p.end) { + err << "Error: --start must be <= --end\n"; + return false; + } + return true; +} +``` + +5. After the `if (p.file.empty())` check and before `storage::libtsfile_init();`, add: + +```cpp + if (!validate_command_flags(p, err)) { + print_usage(err); + return kExitUsage; + } +``` + +- [ ] **Step 6: Run the tests and confirm they pass** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=ParseArgsTest.*:RunCliTest.* +``` + +Expected: build succeeds; selected tests pass. + +- [ ] **Step 7: Commit** + +```bash +git add cpp/tools/cli/cli_args.h cpp/tools/cli/cli_args.cc cpp/tools/cli/run_cli.cc cpp/test/tools/cli_args_test.cc +git commit -m "Update tsfile CLI command surface" +``` + +--- + +### Task 2: Statistic helpers and extended `stats` fields + +**Files:** +- Create: `cpp/tools/commands/stat_table.h` +- Create: `cpp/tools/commands/stat_table.cc` +- Modify: `cpp/tools/commands/cmd_stats.cc` +- Create: `cpp/test/tools/stat_table_test.cc` +- Modify: `cpp/test/tools/command_e2e_test.cc` + +- [ ] **Step 1: Write a failing test that covers statistic-value formatting directly** + +Create `cpp/test/tools/stat_table_test.cc`: + +```cpp +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership.
 The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "commands/stat_table.h" + +#include + +#include "common/statistic.h" + +TEST(StatTableTest, Int64StatisticCellsContainValueSummaries) { + storage::Int64Statistic st; + st.update(1, static_cast(10)); + st.update(3, static_cast(30)); + tsfile_cli::StatisticCells cells = tsfile_cli::statistic_value_cells(&st); + EXPECT_EQ(cells.values[0], "10"); + EXPECT_EQ(cells.values[1], "30"); + EXPECT_EQ(cells.values[2], "10"); + EXPECT_EQ(cells.values[3], "30"); + EXPECT_EQ(cells.values[4], "40"); + EXPECT_EQ(cells.is_null, std::vector({false, false, false, false, false})); +} + +TEST(StatTableTest, BooleanStatisticLeavesMinMaxNull) { + storage::BooleanStatistic st; + st.update(1, true); + st.update(2, false); + tsfile_cli::StatisticCells cells = tsfile_cli::statistic_value_cells(&st); + EXPECT_TRUE(cells.is_null[0]); + EXPECT_TRUE(cells.is_null[1]); + EXPECT_EQ(cells.values[2], "true"); + EXPECT_EQ(cells.values[3], "false"); + EXPECT_EQ(cells.values[4], "1"); +} +``` + +- [ ] **Step 2: Run the test and confirm it fails** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=StatTableTest.* +``` + +Expected: build fails because `commands/stat_table.h` does not exist.
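The expected strings in `Int64StatisticCellsContainValueSummaries` come from folding the two points `(1, 10)` and `(3, 30)` into a running statistic. The minimal sketch below shows that arithmetic. `MiniInt64Stat` is a hypothetical stand-in, not TsFile's `storage::Int64Statistic`, whose internals may differ. It assumes points arrive in ascending time order and emits the five cells in the CLI's `min, max, first, last, sum` order:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a TsFile Int64 statistic (NOT storage::Int64Statistic).
// Assumes updates arrive in ascending time order, so "first" is the value of the
// earliest point and "last" is the value of the latest point.
struct MiniInt64Stat {
  int64_t count = 0;
  int64_t min_value = 0;
  int64_t max_value = 0;
  int64_t first_value = 0;
  int64_t last_value = 0;
  int64_t sum_value = 0;

  void update(int64_t /*time*/, int64_t value) {
    if (count == 0) {
      min_value = max_value = first_value = value;
    } else {
      min_value = std::min(min_value, value);
      max_value = std::max(max_value, value);
    }
    last_value = value;  // later points overwrite "last"
    sum_value += value;  // "sum" accumulates across all points
    ++count;
  }

  // Value cells in the CLI column order: min, max, first, last, sum.
  // An empty statistic has no meaningful cells; the CLI prints nulls instead.
  std::vector<std::string> cells() const {
    assert(count > 0);
    return {std::to_string(min_value), std::to_string(max_value),
            std::to_string(first_value), std::to_string(last_value),
            std::to_string(sum_value)};
  }
};
```

Feeding it the test's two points yields `10, 30, 10, 30, 40`, the same values the gtest expects from `statistic_value_cells`.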
 + +- [ ] **Step 3: Create `stat_table.h`** + +Create `cpp/tools/commands/stat_table.h`: + +```cpp +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#ifndef TSFILE_CLI_STAT_TABLE_H +#define TSFILE_CLI_STAT_TABLE_H + +#include +#include + +#include "cli/cli_args.h" + +namespace storage { +class Statistic; +class TsFileReader; +} // namespace storage + +namespace tsfile_cli { + +struct StatisticCells { + std::vector values; + std::vector is_null; +}; + +struct SeriesStatRow { + std::string target; + std::string measurement; + long long count = 0; + long long start_time = 0; + long long end_time = 0; + StatisticCells value_cells; +}; + +struct FileSummary { + std::string file; + std::string model; + long long device_count = 0; + long long table_count = 0; + long long series_count = 0; + long long start_time = 0; + long long end_time = 0; + bool has_time_range = false; + long long file_size_bytes = 0; +}; + +StatisticCells statistic_value_cells(storage::Statistic* st); +std::vector collect_series_stats( + const ParsedArgs& args, storage::TsFileReader& reader); +FileSummary collect_file_summary(const ParsedArgs& args, + storage::TsFileReader& reader); + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_STAT_TABLE_H +``` +
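The header leaves the per-type null rules implicit in `StatisticCells::is_null`. The hypothetical function below restates the rules that the Step 4 `switch` in `statistic_value_cells` implements, as a compact reference when reviewing it. It uses a local `ColType` enum instead of TsFile's `common::TSDataType`. Boolean statistics have no min/max, `STRING` has no sum, `TEXT` keeps only first/last, and an empty series or unhandled type leaves all five cells null:

```cpp
#include <array>
#include <cassert>

// Hypothetical local enum; the real code switches on common::TSDataType.
enum class ColType {
  BOOLEAN, INT32, DATE, INT64, TIMESTAMP, FLOAT, DOUBLE, STRING, TEXT, OTHER
};

// Null flags for the five value cells, in the order min, max, first, last, sum.
std::array<bool, 5> value_null_mask(ColType t, long long count) {
  assert(count >= 0);
  if (count == 0) {
    return {{true, true, true, true, true}};  // empty series: nothing to show
  }
  switch (t) {
    case ColType::BOOLEAN:
      return {{true, true, false, false, false}};  // no min/max for booleans
    case ColType::STRING:
      return {{false, false, false, false, true}};  // no sum for strings
    case ColType::TEXT:
      return {{true, true, false, false, true}};  // only first/last for text
    case ColType::INT32:
    case ColType::DATE:
    case ColType::INT64:
    case ColType::TIMESTAMP:
    case ColType::FLOAT:
    case ColType::DOUBLE:
      return {{false, false, false, false, false}};  // numeric: all five cells
    default:
      return {{true, true, true, true, true}};  // unhandled type: all null
  }
}
```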
 +- [ ] **Step 4: Create `stat_table.cc`** + +Create `cpp/tools/commands/stat_table.cc`. The core implementation is: + +```cpp +#include "commands/stat_table.h" + +#include +#include +#include +#include + +#include "commands/commands.h" +#include "common/statistic.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { +namespace { + +template +std::string value_to_string(T value) { + std::ostringstream ss; + ss << value; + return ss.str(); +} + +std::string bool_to_string(bool value) { return value ? "true" : "false"; } + +std::string string_to_std(const common::String& value) { + return value.to_std_string(); +} + +long long file_size(const std::string& path) { + std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate); + if (!in.good()) { + return 0; + } + return static_cast(in.tellg()); +} + +} // namespace + +StatisticCells statistic_value_cells(storage::Statistic* st) { + StatisticCells cells; + cells.values.assign(5, ""); + cells.is_null.assign(5, true); + if (st == nullptr || st->get_count() == 0) { + return cells; + } + + switch (st->get_type()) { + case common::BOOLEAN: { + auto* s = static_cast(st); + cells.values = {"", "", bool_to_string(s->first_value_), + bool_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {true, true, false, false, false}; + break; + } + case common::INT32: + case common::DATE: { + auto* s = static_cast(st); + cells.values = {value_to_string(s->min_value_), + value_to_string(s->max_value_), + value_to_string(s->first_value_), + value_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {false, false, false, false, false}; + break; + } + case common::INT64: + case common::TIMESTAMP: { + auto* s = static_cast(st); + cells.values = {value_to_string(s->min_value_), + value_to_string(s->max_value_), + value_to_string(s->first_value_), + value_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {false, false, false, false, false}; + break; + } + case common::FLOAT:
{ + auto* s = static_cast(st); + cells.values = {value_to_string(s->min_value_), + value_to_string(s->max_value_), + value_to_string(s->first_value_), + value_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {false, false, false, false, false}; + break; + } + case common::DOUBLE: { + auto* s = static_cast(st); + cells.values = {value_to_string(s->min_value_), + value_to_string(s->max_value_), + value_to_string(s->first_value_), + value_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {false, false, false, false, false}; + break; + } + case common::STRING: { + auto* s = static_cast(st); + cells.values = {string_to_std(s->min_value_), + string_to_std(s->max_value_), + string_to_std(s->first_value_), + string_to_std(s->last_value_), ""}; + cells.is_null = {false, false, false, false, true}; + break; + } + case common::TEXT: { + auto* s = static_cast(st); + cells.values = {"", "", string_to_std(s->first_value_), + string_to_std(s->last_value_), ""}; + cells.is_null = {true, true, false, false, true}; + break; + } + default: + break; + } + return cells; +} + +std::vector collect_series_stats( + const ParsedArgs& args, storage::TsFileReader& reader) { + std::vector rows; + storage::DeviceTimeseriesMetadataMap meta = + reader.get_timeseries_metadata(); + for (auto& kv : meta) { + std::string target = kv.first ? 
 kv.first->get_device_name() : ""; + if (!args.device.empty() && target != args.device) { + continue; + } + if (!args.table.empty() && kv.first && + kv.first->get_table_name() != args.table) { + continue; + } + for (auto& ts : kv.second) { + if (!ts) { + continue; + } + std::string measurement = + ts->get_measurement_name().to_std_string(); + if (!args.measurements.empty() && + std::find(args.measurements.begin(), args.measurements.end(), + measurement) == args.measurements.end()) { + continue; + } + storage::Statistic* st = ts->get_statistic(); + SeriesStatRow row; + row.target = target; + row.measurement = measurement; + if (st != nullptr) { + row.count = st->get_count(); + row.start_time = st->start_time_; + row.end_time = st->end_time_; + row.value_cells = statistic_value_cells(st); + } else { + row.value_cells.values.assign(5, ""); + row.value_cells.is_null.assign(5, true); + } + rows.push_back(row); + } + } + return rows; +} + +FileSummary collect_file_summary(const ParsedArgs& args, + storage::TsFileReader& reader) { + FileSummary s; + s.file = args.file; + s.model = is_table_model(args, reader) ?
 "table" : "tree"; + s.device_count = + static_cast(reader.get_all_device_ids().size()); + s.table_count = + static_cast(reader.get_all_table_schemas().size()); + s.file_size_bytes = file_size(args.file); + + ParsedArgs all = args; + all.device.clear(); + all.table.clear(); + all.measurements.clear(); + std::vector rows = collect_series_stats(all, reader); + s.series_count = static_cast(rows.size()); + long long min_start = std::numeric_limits::max(); + long long max_end = std::numeric_limits::min(); + for (const SeriesStatRow& row : rows) { + if (row.count <= 0) { + continue; + } + min_start = std::min(min_start, row.start_time); + max_end = std::max(max_end, row.end_time); + s.has_time_range = true; + } + if (s.has_time_range) { + s.start_time = min_start; + s.end_time = max_end; + } + return s; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 5: Rewrite `cmd_stats.cc` using the helpers** + +Change the command body in `cpp/tools/commands/cmd_stats.cc` so it outputs 10 columns: + +```cpp +#include "commands/stat_table.h" + +int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + RowWriter w( + out, fmt, + {"target", "measurement", "count", "start_time", "end_time", + "min", "max", "first", "last", "sum"}, + {common::STRING, common::STRING, common::INT64, common::INT64, + common::INT64, common::STRING, common::STRING, common::STRING, + common::STRING, common::STRING}, + args.no_header); + + std::vector rows = collect_series_stats(args, reader); + for (const SeriesStatRow& row : rows) { + std::vector cells = { + row.target, row.measurement, std::to_string(row.count), + std::to_string(row.start_time), std::to_string(row.end_time)}; + cells.insert(cells.end(), row.value_cells.values.begin(), + row.value_cells.values.end()); + + std::vector nulls = {false, false, false, + row.count == 0, row.count == 0}; + nulls.insert(nulls.end(), row.value_cells.is_null.begin(), + row.value_cells.is_null.end()); + w.write(cells, nulls); + }
 + w.finish(); + return kExitOk; +} +``` + +- [ ] **Step 6: Update the E2E assertions for the new stats header and values** + +In `cpp/test/tools/command_e2e_test.cc`, replace the header assertion in `StatsReportsCountAndTimeRange` with: + +```cpp + EXPECT_NE(out.str().find( + "target\tmeasurement\tcount\tstart_time\tend_time\tmin\tmax\tfirst\tlast\tsum"), + std::string::npos); + EXPECT_NE(out.str().find("s1\t5\t0\t4\t0\t40\t0\t40\t100"), + std::string::npos); +``` + +- [ ] **Step 7: Run the tests and confirm they pass** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=StatTableTest.*:CliE2E.StatsReportsCountAndTimeRange +``` + +Expected: build succeeds; selected tests pass. + +- [ ] **Step 8: Commit** + +```bash +git add cpp/tools/commands/stat_table.h cpp/tools/commands/stat_table.cc cpp/tools/commands/cmd_stats.cc cpp/test/tools/stat_table_test.cc cpp/test/tools/command_e2e_test.cc +git commit -m "Add tsfile CLI statistic helpers" +``` + +--- + +### Task 3: Implement `meta` and `count` + +**Files:** +- Create: `cpp/tools/commands/cmd_meta.cc` +- Create: `cpp/tools/commands/cmd_count.cc` +- Modify: `cpp/tools/commands/commands.h` +- Modify: `cpp/tools/cli/run_cli.cc` +- Modify: `cpp/test/tools/command_e2e_test.cc` + +- [ ] **Step 1: Write failing E2E tests** + +Append to the end of `cpp/test/tools/command_e2e_test.cc`: + +```cpp +TEST(CliE2E, MetaReportsFileSummary) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"meta", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_TRUE(err.str().empty()); + EXPECT_NE(out.str().find( + "file\tmodel\tversion\tdevice_count\ttable_count\tseries_count\tstart_time\tend_time\tbloom_filter\tfile_size_bytes"), + std::string::npos); + EXPECT_NE(out.str().find("\ttable\t"), std::string::npos); + EXPECT_NE(out.str().find("\t1\t"), std::string::npos); +} + +TEST(CliE2E, CountReportsSeriesCountsAndTotal) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"count", "-f", "tsv", f.path}, out, err);
 + EXPECT_EQ(code, 0); + EXPECT_TRUE(err.str().empty()); + EXPECT_NE(out.str().find("target\tmeasurement\tcount"), std::string::npos); + EXPECT_NE(out.str().find("table1\ts1\t5"), std::string::npos); + EXPECT_NE(out.str().find("total\t\t"), std::string::npos); +} +``` + +- [ ] **Step 2: Run the tests and confirm they fail** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.MetaReportsFileSummary:CliE2E.CountReportsSeriesCountsAndTotal +``` + +Expected: build or tests fail because `meta` and `count` are not dispatched. + +- [ ] **Step 3: Update the command declarations** + +In `cpp/tools/commands/commands.h`, remove the `cmd_select` declaration and add the following near `cmd_schema` and `cmd_stats`: + +```cpp +int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +``` + +- [ ] **Step 4: Add `cmd_meta.cc`** + +Create `cpp/tools/commands/cmd_meta.cc`: + +```cpp +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License.
 + */ + +#include "commands/commands.h" + +#include "cli/exit_codes.h" +#include "commands/stat_table.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + RowWriter w( + out, fmt, + {"file", "model", "version", "device_count", "table_count", + "series_count", "start_time", "end_time", "bloom_filter", + "file_size_bytes"}, + {common::STRING, common::STRING, common::STRING, common::INT64, + common::INT64, common::INT64, common::INT64, common::INT64, + common::STRING, common::INT64}, + args.no_header); + + FileSummary s = collect_file_summary(args, reader); + w.write({s.file, + s.model, + "", + std::to_string(s.device_count), + std::to_string(s.table_count), + std::to_string(s.series_count), + s.has_time_range ? std::to_string(s.start_time) : "", + s.has_time_range ? std::to_string(s.end_time) : "", + "", + std::to_string(s.file_size_bytes)}, + {false, false, true, false, false, false, + !s.has_time_range, !s.has_time_range, true, false}); + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 5: Add `cmd_count.cc`** + +Create `cpp/tools/commands/cmd_count.cc`: + +```cpp +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied.
 See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "commands/commands.h" + +#include "cli/exit_codes.h" +#include "commands/stat_table.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + RowWriter w(out, fmt, {"target", "measurement", "count"}, + {common::STRING, common::STRING, common::INT64}, + args.no_header); + + long long total = 0; + std::vector rows = collect_series_stats(args, reader); + for (const SeriesStatRow& row : rows) { + total += row.count; + w.write({row.target, row.measurement, std::to_string(row.count)}, + {false, false, false}); + } + w.write({"total", "", std::to_string(total)}, {false, true, false}); + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 6: Update dispatch** + +In the command dispatch chain in `cpp/tools/cli/run_cli.cc`, add the `meta` branch between `schema` and `stats`: + +```cpp + } else if (p.command == "schema") { + code = cmd_schema(p, reader, fmt, out, err); + } else if (p.command == "meta") { + code = cmd_meta(p, reader, fmt, out, err); + } else if (p.command == "stats") { + code = cmd_stats(p, reader, fmt, out, err); +``` + +Then add the following after the `cat` branch: + +```cpp + } else if (p.command == "count") { + code = cmd_count(p, reader, fmt, out, err); +``` + +If the chain still contains a `select` branch that calls `cmd_select`, delete it in this step too. Step 3 removed that declaration, so the branch would no longer compile. + +- [ ] **Step 7: Run the tests and confirm they pass** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.MetaReportsFileSummary:CliE2E.CountReportsSeriesCountsAndTotal +``` + +Expected: build succeeds; selected tests pass.
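The `count` E2E test checks for a `total\t\t` prefix. That follows from the `total` row carrying a null `measurement` cell, which TSV output renders as an empty field. The sketch below shows the expected TSV output shape. `CountRow` and `render_count_tsv` are hypothetical names, not the plan's `RowWriter`, and the empty-field rendering of nulls is an assumption inferred from that test expectation:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical input row holding the SeriesStatRow fields that `count` prints.
struct CountRow {
  std::string target;
  std::string measurement;
  long long count;
};

// Renders `count` output as TSV: one line per series, then a `total` row whose
// null measurement cell is written as an empty field (assumed to match how
// RowWriter writes TSV nulls, as the "total\t\t" test expectation implies).
std::string render_count_tsv(const std::vector<CountRow>& rows, bool no_header) {
  std::ostringstream out;
  if (!no_header) {
    out << "target\tmeasurement\tcount\n";
  }
  long long total = 0;
  for (const CountRow& r : rows) {
    assert(r.count >= 0);  // counts come from statistics, never negative
    total += r.count;
    out << r.target << '\t' << r.measurement << '\t' << r.count << '\n';
  }
  out << "total\t\t" << total << '\n';  // summary row is always emitted
  return out.str();
}
```

One consequence for pipelines: the plan emits the `total` row even under `--no-header`. A script that sums the third column with `awk` therefore has to filter that row, and a device or table literally named `total` would be ambiguous.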
 + +- [ ] **Step 8: Commit** + +```bash +git add cpp/tools/commands/cmd_meta.cc cpp/tools/commands/cmd_count.cc cpp/tools/commands/commands.h cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc +git commit -m "Add tsfile meta and count commands" +``` + +--- + +### Task 4: Implement a deterministic `sample` + +**Files:** +- Modify: `cpp/tools/format/result_set_format.h` +- Modify: `cpp/tools/format/result_set_format.cc` +- Create: `cpp/tools/commands/cmd_sample.cc` +- Modify: `cpp/tools/commands/commands.h` +- Modify: `cpp/tools/cli/run_cli.cc` +- Create: `cpp/test/tools/result_set_sample_test.cc` +- Modify: `cpp/test/tools/command_e2e_test.cc` + +- [ ] **Step 1: Write a failing E2E test** + +Append to the end of `cpp/test/tools/command_e2e_test.cc`: + +```cpp +TEST(CliE2E, SampleIsReproducibleWithSeed) { + Fixture f; + std::ostringstream out1; + std::ostringstream err1; + std::ostringstream out2; + std::ostringstream err2; + + int code1 = tsfile_cli::run_cli({"sample", "-m", "s1", "-n", "3", + "--seed", "7", "-f", "tsv", f.path}, + out1, err1); + int code2 = tsfile_cli::run_cli({"sample", "-m", "s1", "-n", "3", + "--seed", "7", "-f", "tsv", f.path}, + out2, err2); + + EXPECT_EQ(code1, 0); + EXPECT_EQ(code2, 0); + EXPECT_TRUE(err1.str().empty()); + EXPECT_TRUE(err2.str().empty()); + EXPECT_EQ(out1.str(), out2.str()); + EXPECT_EQ(count_lines(out1.str()), 4u); + EXPECT_NE(out1.str().find("time\ts1\n"), std::string::npos); +} +``` + +- [ ] **Step 2: Run the test and confirm it fails** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.SampleIsReproducibleWithSeed +``` + +Expected: build or test fails because `sample` is not dispatched.
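Step 4 implements reservoir sampling (Algorithm R). It keeps the first `k` rows, then lets row `i` (0-based) replace a random slot with probability `k / (i + 1)`.

One caveat on determinism: the C++ standard fully specifies the output sequence of `std::mt19937_64`, but not the algorithm behind `std::uniform_int_distribution`. The same `--seed` can therefore select different rows under libstdc++ and libc++. The sketch below is a hypothetical, self-contained variant over plain `int` rows, not the plan's `BufferedRow` writer. It draws indices directly from the engine with rejection sampling, so its output depends only on the seed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Uniform integer in [0, n) built only from raw mt19937_64 output, so the result
// does not depend on the standard library's distribution implementation.
// Rejection sampling removes modulo bias.
uint64_t bounded_index(std::mt19937_64& rng, uint64_t n) {
  assert(n > 0);
  const uint64_t max = std::numeric_limits<uint64_t>::max();
  const uint64_t limit = max - max % n;  // largest multiple of n that is <= max
  uint64_t x;
  do {
    x = rng();
  } while (x >= limit);
  return x % n;
}

// Algorithm R: keep the first k rows, then row i (0-based) replaces a random
// slot with probability k / (i + 1).
std::vector<int> reservoir_sample(const std::vector<int>& rows, std::size_t k,
                                  uint64_t seed) {
  std::vector<int> reservoir;
  reservoir.reserve(k);
  std::mt19937_64 rng(seed);
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (reservoir.size() < k) {
      reservoir.push_back(rows[i]);
    } else {
      const uint64_t j = bounded_index(rng, static_cast<uint64_t>(i) + 1);
      if (j < k) {
        reservoir[static_cast<std::size_t>(j)] = rows[i];
      }
    }
  }
  return reservoir;
}
```

Whether cross-library reproducibility matters is a product decision. If `--seed` only needs to be stable within one build, the plan's `uniform_int_distribution` version is sufficient.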
 + +- [ ] **Step 3: Declare the sampled writer** + +Append to `cpp/tools/format/result_set_format.h`: + +```cpp +int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, + bool no_header, std::ostream& out, + long long limit, unsigned long long seed); +``` + +- [ ] **Step 4: Implement the sampled writer** + +Add this include to `cpp/tools/format/result_set_format.cc`: + +```cpp +#include +``` + +Then add the following after `write_result_set`: + +```cpp +namespace { + +struct BufferedRow { + std::vector cells; + std::vector nulls; +}; + +BufferedRow read_current_row(storage::ResultSet* rs, + const std::vector& types) { + BufferedRow row; + const uint32_t ncol = static_cast(types.size()); + row.cells.assign(ncol, ""); + row.nulls.assign(ncol, false); + for (uint32_t i = 1; i <= ncol; ++i) { + if (rs->is_null(i)) { + row.nulls[i - 1] = true; + } else { + row.cells[i - 1] = cell_to_string(rs, i, types[i - 1]); + } + } + return row; +} + +} // namespace + +int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, + bool no_header, std::ostream& out, + long long limit, unsigned long long seed) { + if (limit < 0) { + limit = 10; + } + auto meta = rs->get_metadata(); + const uint32_t ncol = meta->get_column_count(); + std::vector header; + std::vector types; + header.reserve(ncol); + types.reserve(ncol); + for (uint32_t i = 1; i <= ncol; ++i) { + header.push_back(meta->get_column_name(i)); + types.push_back(meta->get_column_type(i)); + } + + std::vector reservoir; + reservoir.reserve(static_cast(limit)); + std::mt19937_64 rng(seed); + bool has_next = false; + int code = common::E_OK; + long long seen = 0; + while ((code = rs->next(has_next)) == common::E_OK && has_next) { + BufferedRow row = read_current_row(rs, types); + if (limit == 0) { + ++seen; + continue; + } + if (static_cast(reservoir.size()) < limit) { + reservoir.push_back(row); + } else { + std::uniform_int_distribution dist(0, seen); + long long idx = dist(rng); + if (idx < limit) { + reservoir[static_cast(idx)] = row; + } + } +
 ++seen; + } + + RowWriter writer(out, fmt, header, types, no_header); + for (const BufferedRow& row : reservoir) { + writer.write(row.cells, row.nulls); + } + writer.finish(); + return code; +} +``` + +- [ ] **Step 5: Add `cmd_sample.cc`** + +Create `cpp/tools/commands/cmd_sample.cc`: + +```cpp +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "commands/commands.h" + +#include +#include +#include +#include + +#include "cli/exit_codes.h" +#include "common/device_id.h" +#include "common/schema.h" +#include "format/result_set_format.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err) { + const int64_t start = args.has_start ? static_cast(args.start) + : std::numeric_limits::min(); + const int64_t end = args.has_end ?
static_cast(args.end) + : std::numeric_limits::max(); + storage::ResultSet* rs = nullptr; + int qret = 0; + + if (is_table_model(args, reader)) { + std::string table_name = args.table; + if (table_name.empty()) { + auto schemas = reader.get_all_table_schemas(); + if (schemas.empty() || !schemas[0]) { + err << "Error: no table found in file\n"; + return kExitRuntime; + } + table_name = schemas[0]->get_table_name(); + } + std::vector cols = args.measurements; + if (cols.empty()) { + auto ts = reader.get_table_schema(table_name); + if (ts) { + cols = ts->get_measurement_names(); + } + } + qret = reader.query(table_name, cols, start, end, rs); + } else { + std::vector devices; + if (!args.device.empty()) { + devices.push_back(args.device); + } else { + for (auto& d : reader.get_all_device_ids()) { + if (d) { + devices.push_back(d->get_device_name()); + } + } + } + std::vector paths; + for (const std::string& dev : devices) { + std::vector ms = args.measurements; + if (ms.empty()) { + auto did = std::make_shared(dev); + std::vector sch; + if (reader.get_timeseries_schema(did, sch) == 0) { + for (auto& m : sch) { + ms.push_back(m.measurement_name_); + } + } + } + for (const std::string& m : ms) { + paths.push_back(dev + "." + m); + } + } + if (paths.empty()) { + err << "Error: no time series found\n"; + return kExitRuntime; + } + qret = reader.query(paths, start, end, rs); + } + + if (qret != 0 || rs == nullptr) { + err << "Error: query failed (code " << qret << ")\n"; + if (rs != nullptr) { + reader.destroy_query_data_set(rs); + } + return kExitRuntime; + } + + const long long limit = args.limit < 0 ? 10 : args.limit; + const unsigned long long seed = + args.has_seed ? static_cast(args.seed) : 0ULL; + int wret = write_result_set_sampled(rs, fmt, args.no_header, out, limit, + seed); + reader.destroy_query_data_set(rs); + return wret == 0 ? 
kExitOk : kExitRuntime; +} + +} // namespace tsfile_cli +``` + +- [ ] **Step 6: 更新声明和分发** + +在 `cpp/tools/commands/commands.h` 加入: + +```cpp +int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +``` + +在 `cpp/tools/cli/run_cli.cc` 的分发链中 `count` 后加入: + +```cpp + } else if (p.command == "sample") { + code = cmd_sample(p, reader, fmt, out, err); +``` + +- [ ] **Step 7: 运行测试确认通过** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.SampleIsReproducibleWithSeed +``` + +Expected: build succeeds; selected test passes. + +- [ ] **Step 8: 提交** + +```bash +git add cpp/tools/format/result_set_format.h cpp/tools/format/result_set_format.cc cpp/tools/commands/cmd_sample.cc cpp/tools/commands/commands.h cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc +git commit -m "Add deterministic tsfile sample command" +``` + +--- + +### Task 5: 移除 `select` 并把时间范围测试迁到 `cat` + +**Files:** +- Delete: `cpp/tools/commands/cmd_select.cc` +- Modify: `cpp/test/tools/cli_args_test.cc` +- Modify: `cpp/test/tools/command_e2e_test.cc` + +- [ ] **Step 1: 更新解析测试里的旧命令名** + +在 `cpp/test/tools/cli_args_test.cc` 中,将 `MeasurementsSplitOnComma` 的输入从: + +```cpp +auto p = + tsfile_cli::parse_args({"select", "-m", "s1,s2,s3", "data.tsfile"}); +``` + +改为: + +```cpp +auto p = + tsfile_cli::parse_args({"cat", "-m", "s1,s2,s3", "data.tsfile"}); +``` + +- [ ] **Step 2: 将 `select` E2E 改为 `cat`** + +在 `cpp/test/tools/command_e2e_test.cc` 中,把 `SelectWithTimeRange` 改名为 `CatWithTimeRange`,命令从 `select` 改为 `cat`: + +```cpp +TEST(CliE2E, CatWithTimeRange) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "2", + "--end", "3", "-f", "tsv", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "time\ts1\n2\t20\n3\t30\n"); +} +``` + +把 `SelectJsonIsNdjson` 改名为 `CatJsonIsNdjson`,命令从 `select` 改为 
`cat`: + +```cpp +TEST(CliE2E, CatJsonIsNdjson) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "0", + "--end", "0", "-f", "json", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n"); +} +``` + +- [ ] **Step 3: Delete `cmd_select.cc`** + +Run: + +```bash +rm cpp/tools/commands/cmd_select.cc +``` + +- [ ] **Step 4: Run the tests to confirm they pass** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=ParseArgsTest.MeasurementsSplitOnComma:RunCliTest.SelectIsNoLongerKnownCommand:CliE2E.CatWithTimeRange:CliE2E.CatJsonIsNdjson +``` + +Expected: build succeeds; selected tests pass. + +- [ ] **Step 5: Commit** + +```bash +git add cpp/test/tools/cli_args_test.cc cpp/test/tools/command_e2e_test.cc cpp/tools/commands/cmd_select.cc +git commit -m "Remove tsfile select command" +``` + +--- + +### Task 6: Full verification, help-text snapshot, and final commit check + +**Files:** +- Modify: `docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md` only if execution notes need correction during implementation. + +- [ ] **Step 1: Run all CLI-related tests** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.*:StatTableTest.* +``` + +Expected: build succeeds; selected tests pass. + +- [ ] **Step 2: Run the full C++ test executable** + +Run: + +```bash +cd cpp && ./build/Debug/lib/TsFile_Test +``` + +Expected: all tests pass. If unrelated existing tests fail, capture the exact failing test names and output before deciding whether to narrow verification. + +- [ ] **Step 3: Manually check that the CLI help no longer mentions `select`** + +Run: + +```bash +cd cpp && ./build/Debug/bin/tsfile --help +``` + +Expected: stdout contains `meta`, `count`, `sample`; stdout does not contain `select`.
+ +- [ ] **Step 4: Check whitespace and the staged scope** + +Run: + +```bash +git diff --check +git status --short +``` + +Expected: `git diff --check` exits 0. `git status --short` shows only this CLI redesign work, and any pre-existing unrelated files remain unstaged. + +- [ ] **Step 5: Final commit** + +If Task 6 produced only minor test or documentation adjustments, commit them: + +```bash +git add docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md +git commit -m "Document tsfile CLI redesign execution notes" +``` + +If Task 6 produced no file changes, do not create an empty commit. + +## Coverage check + +- `select` removal: Task 1, Task 5. +- `meta`: Task 3. +- `count`: Task 3. +- `sample` and `--seed`: Task 1, Task 4. +- `stats` extended to min/max/first/last/sum: Task 2. +- Shared options (projection, time range, limit/offset): the existing `row_query.cc` is kept; Task 5 covers the time range through the `cat` E2E tests. +- Output formats and stdout/stderr: the existing formatter tests are kept; Task 6 runs the full set of related tests. +- Build, install, and CMake glob: the existing `cpp/tools/CMakeLists.txt` uses `GLOB_RECURSE`, so new `.cc` files are picked up automatically; Task 6 verifies this through the build. From a392a56fc3b2be7382ca3e6c1cdd3335c67233fd Mon Sep 17 00:00:00 2001 From: spricoder Date: Tue, 2 Jun 2026 19:12:00 +0800 Subject: [PATCH 05/41] Update tsfile CLI command surface --- cpp/test/tools/cli_args_test.cc | 159 ++++++++++++++++++++++++++ cpp/tools/cli/cli_args.cc | 191 ++++++++++++++++++++++++++++++++ cpp/tools/cli/cli_args.h | 57 ++++++++++ cpp/tools/cli/run_cli.cc | 179 ++++++++++++++++++++++++++++++ 4 files changed, 586 insertions(+) create mode 100644 cpp/test/tools/cli_args_test.cc create mode 100644 cpp/tools/cli/cli_args.cc create mode 100644 cpp/tools/cli/cli_args.h create mode 100644 cpp/tools/cli/run_cli.cc diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc new file mode 100644 index 000000000..103e71996 --- /dev/null +++ b/cpp/test/tools/cli_args_test.cc @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership.
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "cli/cli_args.h" + +#include + +#include + +#include "cli/run_cli.h" + +TEST(RunCliTest, VersionFlagPrintsVersionAndReturnsOk) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"--version"}, out, err); + EXPECT_EQ(code, 0); + EXPECT_NE(out.str().find("tsfile"), std::string::npos); + EXPECT_TRUE(err.str().empty()); +} + +TEST(RunCliTest, NoArgsPrintsUsageToErrAndReturnsUsageError) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("Usage"), std::string::npos); +} + +TEST(RunCliTest, UnknownCommandIsUsageError) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"frobnicate", "x.tsfile"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("Unknown command"), std::string::npos); +} + +TEST(ParseArgsTest, CommandAndFilePositional) { + auto p = tsfile_cli::parse_args({"ls", "data.tsfile"}); + EXPECT_TRUE(p.error.empty()); + EXPECT_EQ(p.command, "ls"); + EXPECT_EQ(p.file, "data.tsfile"); +} + +TEST(ParseArgsTest, FormatFlagParsed) { + auto p = tsfile_cli::parse_args({"cat", "-f", "json", "data.tsfile"}); + EXPECT_TRUE(p.error.empty()); + EXPECT_EQ(p.format, tsfile_cli::ParsedArgs::Format::kJson); +} + +TEST(ParseArgsTest, MeasurementsSplitOnComma) { + auto p = + tsfile_cli::parse_args({"select", 
"-m", "s1,s2,s3", "data.tsfile"}); + ASSERT_EQ(p.measurements.size(), 3u); + EXPECT_EQ(p.measurements[1], "s2"); +} + +TEST(ParseArgsTest, LimitOffsetAndTimeRange) { + auto p = + tsfile_cli::parse_args({"head", "-n", "5", "--offset", "2", "--start", + "100", "--end", "200", "data.tsfile"}); + EXPECT_EQ(p.limit, 5); + EXPECT_EQ(p.offset, 2); + EXPECT_TRUE(p.has_start); + EXPECT_EQ(p.start, 100); + EXPECT_TRUE(p.has_end); + EXPECT_EQ(p.end, 200); +} + +TEST(ParseArgsTest, UnknownFlagIsError) { + auto p = tsfile_cli::parse_args({"ls", "--bogus", "data.tsfile"}); + EXPECT_FALSE(p.error.empty()); +} + +TEST(ParseArgsTest, BadFormatValueIsError) { + auto p = tsfile_cli::parse_args({"cat", "-f", "yaml", "data.tsfile"}); + EXPECT_FALSE(p.error.empty()); +} + +TEST(ParseArgsTest, MissingFileIsAllowedAtParseTime) { + auto p = tsfile_cli::parse_args({"ls"}); + EXPECT_TRUE(p.error.empty()); + EXPECT_EQ(p.command, "ls"); + EXPECT_TRUE(p.file.empty()); +} + +TEST(ParseArgsTest, SeedFlagParsed) { + auto p = tsfile_cli::parse_args( + {"sample", "-m", "s1", "-n", "3", "--seed", "42", "data.tsfile"}); + EXPECT_TRUE(p.error.empty()); + EXPECT_EQ(p.command, "sample"); + EXPECT_EQ(p.limit, 3); + EXPECT_TRUE(p.has_seed); + EXPECT_EQ(p.seed, 42); +} + +TEST(ParseArgsTest, BadSeedValueIsError) { + auto p = tsfile_cli::parse_args( + {"sample", "--seed", "not_a_number", "data.tsfile"}); + EXPECT_FALSE(p.error.empty()); + EXPECT_NE(p.error.find("Invalid --seed"), std::string::npos); +} + +TEST(RunCliTest, SelectIsNoLongerKnownCommand) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"select", "x.tsfile"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("Unknown command"), std::string::npos); +} + +TEST(RunCliTest, SeedOnCatIsUsageError) { + std::ostringstream out; + std::ostringstream err; + int code = + tsfile_cli::run_cli({"cat", "--seed", "7", "x.tsfile"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("--seed is only valid 
for sample"), + std::string::npos); +} + +TEST(RunCliTest, OffsetOnSampleIsUsageError) { + std::ostringstream out; + std::ostringstream err; + int code = + tsfile_cli::run_cli({"sample", "--offset", "2", "x.tsfile"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("--offset is not valid for sample"), + std::string::npos); +} + +TEST(RunCliTest, NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen) { + for (const char* command : {"meta", "count", "sample"}) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli( + {command, "definitely_missing.tsfile"}, out, err); + EXPECT_EQ(code, 1) << command; + EXPECT_NE(err.str().find("command not implemented yet"), + std::string::npos) + << command; + EXPECT_NE(err.str().find(command), std::string::npos) << command; + } +} diff --git a/cpp/tools/cli/cli_args.cc b/cpp/tools/cli/cli_args.cc new file mode 100644 index 000000000..15a587491 --- /dev/null +++ b/cpp/tools/cli/cli_args.cc @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include "cli/cli_args.h" + +#include +#include + +namespace tsfile_cli { +namespace { + +std::vector split_csv(const std::string& s) { + std::vector out; + std::string item; + std::istringstream iss(s); + while (std::getline(iss, item, ',')) { + if (!item.empty()) { + out.push_back(item); + } + } + return out; +} + +bool parse_ll(const std::string& s, long long& out) { + if (s.empty()) { + return false; + } + char* endp = nullptr; + long long v = std::strtoll(s.c_str(), &endp, 10); + if (endp == nullptr || *endp != '\0') { + return false; + } + out = v; + return true; +} + +bool parse_format(const std::string& s, ParsedArgs::Format& out) { + if (s == "csv") { + out = ParsedArgs::Format::kCsv; + } else if (s == "tsv") { + out = ParsedArgs::Format::kTsv; + } else if (s == "json") { + out = ParsedArgs::Format::kJson; + } else if (s == "table") { + out = ParsedArgs::Format::kTable; + } else { + return false; + } + return true; +} + +} // namespace + +ParsedArgs parse_args(const std::vector& args) { + ParsedArgs p; + if (args.empty()) { + return p; + } + p.command = args[0]; + if (p.command == "--version") { + p.version = true; + } + if (p.command == "--help" || p.command == "-h") { + p.help = true; + } + + size_t i = 1; + auto need_value = [&](const std::string& flag, std::string& dst) -> bool { + if (i + 1 >= args.size()) { + p.error = "Missing value for " + flag; + return false; + } + dst = args[++i]; + return true; + }; + + for (; i < args.size(); ++i) { + const std::string& a = args[i]; + std::string val; + if (a == "-f" || a == "--format") { + if (!need_value(a, val)) { + return p; + } + if (!parse_format(val, p.format)) { + p.error = + "Invalid format: " + val + " (use csv|tsv|json|table)"; + return p; + } + } else if (a == "-d" || a == "--device") { + if (!need_value(a, p.device)) { + return p; + } + } else if (a == "-t" || a == "--table") { + if (!need_value(a, p.table)) { + return p; + } + } else if (a == "-m" || a == "--measurements") { + if 
(!need_value(a, val)) { + return p; + } + p.measurements = split_csv(val); + } else if (a == "-n" || a == "--limit") { + if (!need_value(a, val)) { + return p; + } + if (!parse_ll(val, p.limit)) { + p.error = "Invalid --limit: " + val; + return p; + } + } else if (a == "--offset") { + if (!need_value(a, val)) { + return p; + } + if (!parse_ll(val, p.offset)) { + p.error = "Invalid --offset: " + val; + return p; + } + } else if (a == "--start") { + if (!need_value(a, val)) { + return p; + } + if (!parse_ll(val, p.start)) { + p.error = "Invalid --start: " + val; + return p; + } + p.has_start = true; + } else if (a == "--end") { + if (!need_value(a, val)) { + return p; + } + if (!parse_ll(val, p.end)) { + p.error = "Invalid --end: " + val; + return p; + } + p.has_end = true; + } else if (a == "--seed") { + if (!need_value(a, val)) { + return p; + } + if (!parse_ll(val, p.seed)) { + p.error = "Invalid --seed: " + val; + return p; + } + p.has_seed = true; + } else if (a == "--model") { + if (!need_value(a, val)) { + return p; + } + if (val != "tree" && val != "table") { + p.error = "Invalid --model: " + val + " (use tree|table)"; + return p; + } + p.model = val; + } else if (a == "--no-header") { + p.no_header = true; + } else if (a == "-h" || a == "--help") { + p.help = true; + } else if (a == "--version") { + p.version = true; + } else if (!a.empty() && a[0] == '-') { + p.error = "Unknown flag: " + a; + return p; + } else { + if (p.file.empty()) { + p.file = a; + } else { + p.error = "Unexpected argument: " + a; + return p; + } + } + } + return p; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/cli/cli_args.h b/cpp/tools/cli/cli_args.h new file mode 100644 index 000000000..e979d7391 --- /dev/null +++ b/cpp/tools/cli/cli_args.h @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#ifndef TSFILE_CLI_CLI_ARGS_H +#define TSFILE_CLI_CLI_ARGS_H + +#include +#include +#include + +namespace tsfile_cli { + +struct ParsedArgs { + enum class Format { kAuto, kCsv, kTsv, kJson, kTable }; + + std::string command; + std::string file; + std::string device; + std::string table; + std::vector measurements; + long long limit = -1; + long long offset = 0; + long long start = LLONG_MIN; + long long end = LLONG_MAX; + bool has_start = false; + bool has_end = false; + long long seed = 0; + bool has_seed = false; + Format format = Format::kAuto; + bool no_header = false; + std::string model; + bool help = false; + bool version = false; + std::string error; +}; + +ParsedArgs parse_args(const std::vector& args); + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_CLI_ARGS_H diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc new file mode 100644 index 000000000..def598f3e --- /dev/null +++ b/cpp/tools/cli/run_cli.cc @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "cli/run_cli.h" + +#include +#include + +#include "cli/cli_args.h" +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "format/output_format.h" +#include "reader/tsfile_reader.h" + +#ifdef _WIN32 +#include +#define TSFILE_ISATTY _isatty +#define TSFILE_FILENO _fileno +#else +#include +#define TSFILE_ISATTY isatty +#define TSFILE_FILENO fileno +#endif + +#ifndef TSFILE_CLI_VERSION +#define TSFILE_CLI_VERSION "unknown" +#endif + +namespace tsfile_cli { +namespace { + +void print_usage(std::ostream& os) { + os << "Usage: tsfile [options] \n" + "Commands:\n" + " ls list devices (tree) or tables (table)\n" + " schema per-measurement data type/encoding/compression\n" + " meta file metadata summary\n" + " stats per-series row count and time range\n" + " head first N rows (use -n)\n" + " cat all rows of a device/table\n" + " count row count\n" + " sample deterministic sample rows (use -n and --seed)\n" + "Options: -f/--format csv|tsv|json|table, -d/--device, -t/--table,\n" + " -m/--measurements a,b, -n/--limit, --offset, --seed,\n" + " --start, --end,\n" + " --no-header, --model tree|table, -h/--help, --version\n"; +} + +bool is_known_command(const std::string& c) { + static const std::set kCmds = { + "ls", "schema", "meta", "stats", + "head", "cat", "count", "sample"}; + return kCmds.count(c) != 0; +} + +bool is_unimplemented_command(const std::string& c) { + 
static const std::set kCmds = {"meta", "count", "sample"}; + return kCmds.count(c) != 0; +} + +bool validate_command_flags(const ParsedArgs& p, std::ostream& err) { + if (p.has_seed && p.command != "sample") { + err << "Error: --seed is only valid for sample\n"; + return false; + } + if (p.command == "sample" && p.offset != 0) { + err << "Error: --offset is not valid for sample\n"; + return false; + } + if (!p.device.empty() && !p.table.empty()) { + err << "Error: --device and --table cannot be used together\n"; + return false; + } + if (p.limit < -1) { + err << "Error: --limit must be >= -1\n"; + return false; + } + if (p.offset < 0) { + err << "Error: --offset must be >= 0\n"; + return false; + } + if (p.has_start && p.has_end && p.start > p.end) { + err << "Error: --start must be <= --end\n"; + return false; + } + return true; +} + +} // namespace + +int run_cli(const std::vector& args, std::ostream& out, + std::ostream& err) { + ParsedArgs p = parse_args(args); + + if (p.version) { + out << "tsfile (Apache TsFile C++) " << TSFILE_CLI_VERSION << "\n"; + return kExitOk; + } + if (args.empty()) { + print_usage(err); + return kExitUsage; + } + if (p.command == "help" || p.command == "--help" || p.command == "-h" || + (p.help && p.file.empty())) { + print_usage(out); + return kExitOk; + } + if (!p.error.empty()) { + err << "Error: " << p.error << "\n"; + print_usage(err); + return kExitUsage; + } + if (!is_known_command(p.command)) { + err << "Unknown command: " << p.command << "\n"; + print_usage(err); + return kExitUsage; + } + if (p.file.empty()) { + err << "Error: missing argument\n"; + return kExitUsage; + } + if (!validate_command_flags(p, err)) { + print_usage(err); + return kExitUsage; + } + if (is_unimplemented_command(p.command)) { + err << "Error: command not implemented yet: " << p.command << "\n"; + print_usage(err); + return kExitUsage; + } + + storage::libtsfile_init(); + storage::TsFileReader reader; + int open_ret = reader.open(p.file); + if 
(open_ret != 0) { + err << "Error: cannot open or corrupted file: " << p.file << "\n"; + return kExitFile; + } + + bool stdout_tty = TSFILE_ISATTY(TSFILE_FILENO(stdout)) != 0; + OutputFormat fmt = resolve_format(p.format, stdout_tty); + + int code; + if (p.command == "ls") { + code = cmd_ls(p, reader, fmt, out, err); + } else if (p.command == "schema") { + code = cmd_schema(p, reader, fmt, out, err); + } else if (p.command == "stats") { + code = cmd_stats(p, reader, fmt, out, err); + } else if (p.command == "head") { + code = cmd_head(p, reader, fmt, out, err); + } else if (p.command == "cat") { + code = cmd_cat(p, reader, fmt, out, err); + } else { + err << "Unknown command: " << p.command << "\n"; + code = kExitUsage; + } + + reader.close(); + return code; +} + +} // namespace tsfile_cli From 45b6aa36ecf5a785689bee26b9ac3897031b88de Mon Sep 17 00:00:00 2001 From: spricoder Date: Tue, 2 Jun 2026 19:52:43 +0800 Subject: [PATCH 06/41] Route ReadFile::open errors to stderr --- cpp/src/file/read_file.cc | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/cpp/src/file/read_file.cc b/cpp/src/file/read_file.cc index dd1c42dad..9dc41dbf6 100644 --- a/cpp/src/file/read_file.cc +++ b/cpp/src/file/read_file.cc @@ -21,6 +21,8 @@ #include #include + +#include #ifdef _WIN32 #include #include @@ -49,9 +51,9 @@ int ReadFile::open(const std::string& file_path) { file_path_ = file_path; fd_ = ::open(file_path_.c_str(), O_RDONLY); if (fd_ < 0) { - std::cout << "open file " << file_path << " error :" << fd_ + std::cerr << "open file " << file_path << " error :" << fd_ << std::endl; - std::cout << "open error" << errno << " " << strerror(errno) + std::cerr << "open error" << errno << " " << strerror(errno) << std::endl; return E_FILE_OPEN_ERR; } From 935528823c9f7f15c50e5c3435dc4cdead5fd7ed Mon Sep 17 00:00:00 2001 From: spricoder Date: Tue, 2 Jun 2026 19:53:31 +0800 Subject: [PATCH 07/41] Add tsfile CLI commands, formatters, build, and tests --- cpp/CMakeLists.txt 
| 7 +- cpp/test/CMakeLists.txt | 36 ++- cpp/test/tools/cli_args_test.cc | 3 +- cpp/test/tools/cli_test_util.h | 83 +++++++ cpp/test/tools/command_e2e_test.cc | 155 +++++++++++++ cpp/test/tools/output_format_test.cc | 123 ++++++++++ cpp/tools/CMakeLists.txt | 47 ++++ cpp/tools/cli/exit_codes.h | 32 +++ cpp/tools/cli/run_cli.h | 34 +++ cpp/tools/commands/cmd_cat.cc | 29 +++ cpp/tools/commands/cmd_head.cc | 30 +++ cpp/tools/commands/cmd_ls.cc | 64 +++++ cpp/tools/commands/cmd_schema.cc | 124 ++++++++++ cpp/tools/commands/cmd_stats.cc | 75 ++++++ cpp/tools/commands/commands.h | 53 +++++ cpp/tools/commands/row_query.cc | 111 +++++++++ cpp/tools/format/output_format.cc | 321 ++++++++++++++++++++++++++ cpp/tools/format/output_format.h | 69 ++++++ cpp/tools/format/result_set_format.cc | 110 +++++++++ cpp/tools/format/result_set_format.h | 41 ++++ cpp/tools/tools_main.cc | 29 +++ 21 files changed, 1572 insertions(+), 4 deletions(-) create mode 100644 cpp/test/tools/cli_test_util.h create mode 100644 cpp/test/tools/command_e2e_test.cc create mode 100644 cpp/test/tools/output_format_test.cc create mode 100644 cpp/tools/CMakeLists.txt create mode 100644 cpp/tools/cli/exit_codes.h create mode 100644 cpp/tools/cli/run_cli.h create mode 100644 cpp/tools/commands/cmd_cat.cc create mode 100644 cpp/tools/commands/cmd_head.cc create mode 100644 cpp/tools/commands/cmd_ls.cc create mode 100644 cpp/tools/commands/cmd_schema.cc create mode 100644 cpp/tools/commands/cmd_stats.cc create mode 100644 cpp/tools/commands/commands.h create mode 100644 cpp/tools/commands/row_query.cc create mode 100644 cpp/tools/format/output_format.cc create mode 100644 cpp/tools/format/output_format.h create mode 100644 cpp/tools/format/result_set_format.cc create mode 100644 cpp/tools/format/result_set_format.h create mode 100644 cpp/tools/tools_main.cc diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt index ba2c0c921..9bc892da8 100755 --- a/cpp/CMakeLists.txt +++ b/cpp/CMakeLists.txt @@ -171,6 +171,9 @@ 
endif () option(BUILD_TEST "Build tests" ON) message("cmake using: BUILD_TEST=${BUILD_TEST}") +option(BUILD_TOOLS "Build the tsfile command-line tools" ON) +message("cmake using: BUILD_TOOLS=${BUILD_TOOLS}") + option(ENABLE_ANTLR4 "Enable ANTLR4 runtime" ON) message("cmake using: ENABLE_ANTLR4=${ENABLE_ANTLR4}") @@ -244,6 +247,9 @@ add_subdirectory(third_party) set(CMAKE_CXX_FLAGS "${SAVED_CXX_FLAGS}") add_subdirectory(src) +if (BUILD_TOOLS) + add_subdirectory(tools) +endif () if (BUILD_TEST) add_subdirectory(test) if (TESTS_ENABLED) @@ -254,4 +260,3 @@ else() endif () add_subdirectory(examples) - diff --git a/cpp/test/CMakeLists.txt b/cpp/test/CMakeLists.txt index f5d084f8f..29c2dd5b6 100644 --- a/cpp/test/CMakeLists.txt +++ b/cpp/test/CMakeLists.txt @@ -63,6 +63,26 @@ if (${DOWNLOADED}) ) set(gtest_force_shared_crt ON CACHE BOOL "" FORCE) FetchContent_MakeAvailable(googletest) + # AppleClang searches /usr/local/include before CMake's generated -isystem + # paths. Force the vendored GTest headers ahead of any system installation. 
+ foreach (GTEST_TARGET gtest gtest_main gmock gmock_main) + if (TARGET ${GTEST_TARGET}) + set_target_properties(${GTEST_TARGET} PROPERTIES SYSTEM OFF) + target_include_directories(${GTEST_TARGET} BEFORE PRIVATE + ${googletest_SOURCE_DIR}/googletest/include + ${googletest_SOURCE_DIR}/googletest + ${googletest_SOURCE_DIR}/googlemock/include + ${googletest_SOURCE_DIR}/googlemock) + if (APPLE AND NOT MSVC) + target_compile_options(${GTEST_TARGET} BEFORE PRIVATE + -iquote${googletest_SOURCE_DIR}/googletest/include + -iquote${googletest_SOURCE_DIR}/googletest + -I${googletest_SOURCE_DIR}/googletest/include + -I${googletest_SOURCE_DIR}/googletest + -std=c++14) + endif () + endif () + endforeach () set(TESTS_ENABLED ON PARENT_SCOPE) else () message(WARNING "Failed to download googletest from all provided URLs, setting TESTS_ENABLED to OFF") @@ -139,6 +159,11 @@ if (ENABLE_ZLIB) list(APPEND TEST_SRCS ${ZLIB_TEST_SRCS}) endif() +if (BUILD_TOOLS) + file(GLOB_RECURSE TOOLS_TEST_SRCS "tools/*_test.cc") + list(APPEND TEST_SRCS ${TOOLS_TEST_SRCS}) +endif () + if (${COV_ENABLED}) message("Enable code cov...") add_compile_options(-fprofile-arcs -ftest-coverage) @@ -150,12 +175,21 @@ if (ENABLE_ANTLR4) endif() add_executable(TsFile_Test ${TEST_SRCS}) +if (BUILD_TOOLS) + target_include_directories(TsFile_Test PRIVATE ${CMAKE_SOURCE_DIR}/tools) +endif () +if (APPLE AND NOT MSVC) + target_compile_options(TsFile_Test PRIVATE -std=c++14) +endif () target_link_libraries( TsFile_Test GTest::gtest_main GTest::gmock tsfile ) +if (BUILD_TOOLS) + target_link_libraries(TsFile_Test tsfile_cli_obj) +endif () set_target_properties(TsFile_Test PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${LIB_TSFILE_SDK_DIR}) @@ -185,4 +219,4 @@ if(WIN32) gtest_discover_tests(TsFile_Test DISCOVERY_MODE PRE_TEST DISCOVERY_TIMEOUT 120) else() gtest_discover_tests(TsFile_Test) -endif() \ No newline at end of file +endif() diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc index 
103e71996..43a976c56 100644 --- a/cpp/test/tools/cli_args_test.cc +++ b/cpp/test/tools/cli_args_test.cc @@ -64,8 +64,7 @@ TEST(ParseArgsTest, FormatFlagParsed) { } TEST(ParseArgsTest, MeasurementsSplitOnComma) { - auto p = - tsfile_cli::parse_args({"select", "-m", "s1,s2,s3", "data.tsfile"}); + auto p = tsfile_cli::parse_args({"cat", "-m", "s1,s2,s3", "data.tsfile"}); ASSERT_EQ(p.measurements.size(), 3u); EXPECT_EQ(p.measurements[1], "s2"); } diff --git a/cpp/test/tools/cli_test_util.h b/cpp/test/tools/cli_test_util.h new file mode 100644 index 000000000..329cda6a8 --- /dev/null +++ b/cpp/test/tools/cli_test_util.h @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#ifndef TSFILE_CLI_TEST_UTIL_H +#define TSFILE_CLI_TEST_UTIL_H + +#include + +#include + +#include "common/schema.h" +#include "common/tablet.h" +#include "file/write_file.h" +#include "writer/tsfile_table_writer.h" + +namespace tsfile_cli_test { + +inline std::string write_table_fixture( + const std::string& path = "tsfile_cli_fixture.tsfile") { + storage::libtsfile_init(); + std::string table_name = "table1"; + + storage::WriteFile file; + int flags = O_WRONLY | O_CREAT | O_TRUNC; +#ifdef _WIN32 + flags |= O_BINARY; +#endif + file.create(path, flags, 0666); + + auto* schema = new storage::TableSchema( + table_name, + { + common::ColumnSchema("id1", common::STRING, common::UNCOMPRESSED, + common::PLAIN, common::ColumnCategory::TAG), + common::ColumnSchema("id2", common::STRING, common::UNCOMPRESSED, + common::PLAIN, common::ColumnCategory::TAG), + common::ColumnSchema("s1", common::INT64, common::UNCOMPRESSED, + common::PLAIN, common::ColumnCategory::FIELD), + }); + + auto* writer = new storage::TsFileTableWriter(&file, schema); + storage::Tablet tablet( + table_name, {"id1", "id2", "s1"}, + {common::STRING, common::STRING, common::INT64}, + {common::ColumnCategory::TAG, common::ColumnCategory::TAG, + common::ColumnCategory::FIELD}, + 10); + + for (int row = 0; row < 5; ++row) { + tablet.add_timestamp(row, static_cast(row)); + tablet.add_value(row, "id1", "id1_field_1"); + tablet.add_value(row, "id2", "id2_field_2"); + tablet.add_value(row, "s1", static_cast(row * 10)); + } + + writer->write_table(tablet); + writer->flush(); + writer->close(); + + delete writer; + delete schema; + return path; +} + +} // namespace tsfile_cli_test + +#endif // TSFILE_CLI_TEST_UTIL_H diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc new file mode 100644 index 000000000..09289910b --- /dev/null +++ b/cpp/test/tools/command_e2e_test.cc @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor 
license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include + +#include +#include +#include + +#include "cli/run_cli.h" +#include "cli_test_util.h" + +namespace { + +struct Fixture { + std::string path = tsfile_cli_test::write_table_fixture(); + ~Fixture() { std::remove(path.c_str()); } +}; + +size_t count_lines(const std::string& s) { + size_t n = 0; + for (char c : s) { + if (c == '\n') { + ++n; + } + } + return n; +} + +} // namespace + +TEST(CliE2E, LsListsTableNameTsv) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"ls", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "name\ntable1\n"); + EXPECT_TRUE(err.str().empty()); +} + +TEST(CliE2E, LsNoHeaderJustName) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"ls", "-f", "tsv", "--no-header", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "table1\n"); +} + +TEST(CliE2E, OpenMissingFileReturnsFileError) { + std::ostringstream out; + std::ostringstream err; + int code = + tsfile_cli::run_cli({"ls", "definitely_missing.tsfile"}, out, err); + EXPECT_EQ(code, 2); + EXPECT_FALSE(err.str().empty()); +} + +TEST(CliE2E, SchemaShowsFieldColumnAndType) { + Fixture f; + std::ostringstream out; + 
std::ostringstream err; + int code = tsfile_cli::run_cli({"schema", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_NE( + out.str().find("target\tmeasurement\tdatatype\tencoding\tcompression"), + std::string::npos); + EXPECT_NE(out.str().find("s1"), std::string::npos); + EXPECT_NE(out.str().find("INT64"), std::string::npos); +} + +TEST(CliE2E, SchemaTableMeasurementFilterOnlyShowsRequestedColumn) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"schema", "-m", "s1", "-f", "tsv", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_NE(out.str().find("table1\ts1\tINT64"), std::string::npos); + EXPECT_EQ(out.str().find("table1\tid1"), std::string::npos); + EXPECT_EQ(out.str().find("table1\tid2"), std::string::npos); +} + +TEST(CliE2E, StatsReportsCountAndTimeRange) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"stats", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_NE( + out.str().find("target\tmeasurement\tcount\tstart_time\tend_time"), + std::string::npos); + EXPECT_NE(out.str().find("s1\t5\t0\t4"), std::string::npos); +} + +TEST(CliE2E, HeadProjectsAndLimits) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli( + {"head", "-m", "s1", "-n", "2", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "time\ts1\n0\t0\n1\t10\n"); +} + +TEST(CliE2E, CatReturnsAllRows) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = + tsfile_cli::run_cli({"cat", "-m", "s1", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(count_lines(out.str()), 6u); + EXPECT_NE(out.str().find("time\ts1\n"), std::string::npos); +} + +TEST(CliE2E, CatWithTimeRange) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "2", "--end", + "3", "-f", "tsv", f.path}, + out, err); 
+ EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "time\ts1\n2\t20\n3\t30\n"); +} + +TEST(CliE2E, CatJsonIsNdjson) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "0", "--end", + "0", "-f", "json", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n"); +} diff --git a/cpp/test/tools/output_format_test.cc b/cpp/test/tools/output_format_test.cc new file mode 100644 index 000000000..6acf865f9 --- /dev/null +++ b/cpp/test/tools/output_format_test.cc @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include "format/output_format.h" + +#include + +#include +#include + +#include "common/db_common.h" + +using tsfile_cli::OutputFormat; +using tsfile_cli::ParsedArgs; +using tsfile_cli::RowWriter; + +TEST(ResolveFormatTest, AutoUsesTableOnTtyTsvOtherwise) { + EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kAuto, true), + OutputFormat::kTable); + EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kAuto, false), + OutputFormat::kTsv); + EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kJson, true), + OutputFormat::kJson); +} + +TEST(CsvEscapeTest, QuotesWhenSpecialCharsPresent) { + EXPECT_EQ(tsfile_cli::csv_escape("plain"), "plain"); + EXPECT_EQ(tsfile_cli::csv_escape("a,b"), "\"a,b\""); + EXPECT_EQ(tsfile_cli::csv_escape("she said \"hi\""), + "\"she said \"\"hi\"\"\""); + EXPECT_EQ(tsfile_cli::csv_escape("line\nbreak"), "\"line\nbreak\""); +} + +TEST(JsonEscapeTest, EscapesQuotesBackslashAndControls) { + EXPECT_EQ(tsfile_cli::json_escape("a\"b\\c"), "a\\\"b\\\\c"); + EXPECT_EQ(tsfile_cli::json_escape("tab\there"), "tab\\there"); +} + +TEST(TypeNameTest, KnownTypesMapToNames) { + EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::INT64), "INT64"); + EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::STRING), "STRING"); + EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::BOOLEAN), "BOOLEAN"); +} + +TEST(EncodingNameTest, KnownEncodings) { + EXPECT_STREQ(tsfile_cli::tsencoding_name(common::PLAIN), "PLAIN"); + EXPECT_STREQ(tsfile_cli::tsencoding_name(common::TS_2DIFF), "TS_2DIFF"); + EXPECT_STREQ(tsfile_cli::tsencoding_name(common::SPRINTZ), "SPRINTZ"); +} + +TEST(CompressionNameTest, KnownCompressors) { + EXPECT_STREQ(tsfile_cli::compression_name(common::UNCOMPRESSED), + "UNCOMPRESSED"); + EXPECT_STREQ(tsfile_cli::compression_name(common::SNAPPY), "SNAPPY"); + EXPECT_STREQ(tsfile_cli::compression_name(common::LZ4), "LZ4"); +} + +TEST(RowWriterTest, TsvWritesHeaderThenRows) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kTsv, 
{"time", "s1"}, + {common::INT64, common::INT64}, false); + w.write({"1", "10"}, {false, false}); + w.write({"2", ""}, {false, true}); + w.finish(); + EXPECT_EQ(out.str(), "time\ts1\n1\t10\n2\t\n"); +} + +TEST(RowWriterTest, NoHeaderSuppressesHeader) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kTsv, {"name"}, {common::STRING}, true); + w.write({"table1"}, {false}); + w.finish(); + EXPECT_EQ(out.str(), "table1\n"); +} + +TEST(RowWriterTest, CsvEscapesCells) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kCsv, {"name"}, {common::STRING}, false); + w.write({"a,b"}, {false}); + w.finish(); + EXPECT_EQ(out.str(), "name\n\"a,b\"\n"); +} + +TEST(RowWriterTest, JsonNumbersUnquotedStringsQuotedNullEmitted) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kJson, {"time", "name"}, + {common::INT64, common::STRING}, false); + w.write({"5", "dev1"}, {false, false}); + w.write({"6", ""}, {false, true}); + w.finish(); + EXPECT_EQ(out.str(), + "{\"time\":5,\"name\":\"dev1\"}\n" + "{\"time\":6,\"name\":null}\n"); +} + +TEST(RowWriterTest, TableAlignsColumns) { + std::ostringstream out; + RowWriter w(out, OutputFormat::kTable, {"name", "type"}, + {common::STRING, common::STRING}, false); + w.write({"s1", "INT64"}, {false, false}); + w.write({"longname", "BOOLEAN"}, {false, false}); + w.finish(); + EXPECT_EQ(out.str(), + "name      type\n" + "s1        INT64\n" + "longname  BOOLEAN\n"); +} diff --git a/cpp/tools/CMakeLists.txt b/cpp/tools/CMakeLists.txt new file mode 100644 index 000000000..e1408d67e --- /dev/null +++ b/cpp/tools/CMakeLists.txt @@ -0,0 +1,47 @@ +#[[ +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License.
You may obtain a copy of the License at + + https://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +]] + +message("Running in tools directory") + +file(GLOB_RECURSE TSFILE_CLI_SRCS + "cli/*.cc" + "format/*.cc" + "commands/*.cc") + +add_library(tsfile_cli_obj OBJECT ${TSFILE_CLI_SRCS}) +target_include_directories(tsfile_cli_obj PUBLIC + ${CMAKE_CURRENT_SOURCE_DIR} + ${PROJECT_SOURCE_DIR}/src) + +if (ENABLE_ANTLR4) + target_include_directories(tsfile_cli_obj PUBLIC + ${PROJECT_SOURCE_DIR}/third_party/antlr4-cpp-runtime-4/runtime/src) +endif () + +target_compile_definitions(tsfile_cli_obj PRIVATE + TSFILE_CLI_VERSION="${TsFile_CPP_VERSION}") + +add_executable(tsfile_cli tools_main.cc $) +target_include_directories(tsfile_cli PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}) +target_link_libraries(tsfile_cli tsfile) +set_target_properties(tsfile_cli PROPERTIES + OUTPUT_NAME tsfile + RUNTIME_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/bin) + +install(TARGETS tsfile_cli RUNTIME DESTINATION bin) diff --git a/cpp/tools/cli/exit_codes.h b/cpp/tools/cli/exit_codes.h new file mode 100644 index 000000000..0ab6dfdf5 --- /dev/null +++ b/cpp/tools/cli/exit_codes.h @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#ifndef TSFILE_CLI_EXIT_CODES_H +#define TSFILE_CLI_EXIT_CODES_H + +namespace tsfile_cli { + +constexpr int kExitOk = 0; +constexpr int kExitUsage = 1; +constexpr int kExitFile = 2; +constexpr int kExitRuntime = 3; + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_EXIT_CODES_H diff --git a/cpp/tools/cli/run_cli.h b/cpp/tools/cli/run_cli.h new file mode 100644 index 000000000..79439d152 --- /dev/null +++ b/cpp/tools/cli/run_cli.h @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#ifndef TSFILE_CLI_RUN_CLI_H +#define TSFILE_CLI_RUN_CLI_H + +#include +#include +#include + +namespace tsfile_cli { + +int run_cli(const std::vector& args, std::ostream& out, + std::ostream& err); + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_RUN_CLI_H diff --git a/cpp/tools/commands/cmd_cat.cc b/cpp/tools/commands/cmd_cat.cc new file mode 100644 index 000000000..b1af65d98 --- /dev/null +++ b/cpp/tools/commands/cmd_cat.cc @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "commands/commands.h" + +namespace tsfile_cli { + +int cmd_cat(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err) { + return run_row_query(args, reader, fmt, out, err, args.offset, args.limit); +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/cmd_head.cc b/cpp/tools/commands/cmd_head.cc new file mode 100644 index 000000000..06b01908f --- /dev/null +++ b/cpp/tools/commands/cmd_head.cc @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "commands/commands.h" + +namespace tsfile_cli { + +int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err) { + long long limit = args.limit < 0 ? 10 : args.limit; + return run_row_query(args, reader, fmt, out, err, args.offset, limit); +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/cmd_ls.cc b/cpp/tools/commands/cmd_ls.cc new file mode 100644 index 000000000..675151e8a --- /dev/null +++ b/cpp/tools/commands/cmd_ls.cc @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +bool is_table_model(const ParsedArgs& args, storage::TsFileReader& reader) { + if (args.model == "tree") { + return false; + } + if (args.model == "table") { + return true; + } + return !reader.get_all_table_schemas().empty(); +} + +int cmd_ls(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + std::vector names; + if (is_table_model(args, reader)) { + for (auto& ts : reader.get_all_table_schemas()) { + if (ts) { + names.push_back(ts->get_table_name()); + } + } + } else { + for (auto& dev : reader.get_all_device_ids()) { + if (dev) { + names.push_back(dev->get_device_name()); + } + } + } + + RowWriter w(out, fmt, {"name"}, {common::STRING}, args.no_header); + for (const std::string& n : names) { + w.write({n}, {false}); + } + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/cmd_schema.cc b/cpp/tools/commands/cmd_schema.cc new file mode 100644 index 000000000..734da1933 --- /dev/null +++ b/cpp/tools/commands/cmd_schema.cc @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include +#include +#include +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "common/schema.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { +namespace { + +void write_table_schema_rows(const ParsedArgs& args, + storage::TsFileReader& reader, RowWriter& w) { + auto schemas = reader.get_all_table_schemas(); + for (auto& schema : schemas) { + if (!schema) { + continue; + } + if (!args.table.empty() && schema->get_table_name() != args.table) { + continue; + } + std::vector names = schema->get_measurement_names(); + std::vector types = schema->get_data_types(); + for (size_t i = 0; i < names.size(); ++i) { + if (!args.measurements.empty() && + std::find(args.measurements.begin(), args.measurements.end(), + names[i]) == args.measurements.end()) { + continue; + } + const common::TSDataType type = + i < types.size() ? types[i] : common::INVALID_DATATYPE; + w.write({schema->get_table_name(), names[i], tsdatatype_name(type), + "", ""}, + {false, false, false, true, true}); + } + } +} + +} // namespace + +int cmd_schema(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + RowWriter w( + out, fmt, + {"target", "measurement", "datatype", "encoding", "compression"}, + {common::STRING, common::STRING, common::STRING, common::STRING, + common::STRING}, + args.no_header); + + if (is_table_model(args, reader)) { + write_table_schema_rows(args, reader, w); + w.finish(); + return kExitOk; + } + + storage::DeviceTimeseriesMetadataMap meta = + reader.get_timeseries_metadata(); + for (auto& kv : meta) { + std::string target = kv.first ? 
kv.first->get_device_name() : ""; + if (!args.device.empty() && target != args.device) { + continue; + } + + std::map> enc_comp; + if (kv.first) { + std::vector ms; + if (reader.get_timeseries_schema(kv.first, ms) == 0) { + for (auto& m : ms) { + enc_comp[m.measurement_name_] = + std::make_pair(tsencoding_name(m.encoding_), + compression_name(m.compression_type_)); + } + } + } + + for (auto& ts : kv.second) { + if (!ts) { + continue; + } + std::string m = ts->get_measurement_name().to_std_string(); + if (!args.measurements.empty() && + std::find(args.measurements.begin(), args.measurements.end(), + m) == args.measurements.end()) { + continue; + } + std::string enc; + std::string comp; + auto it = enc_comp.find(m); + if (it != enc_comp.end()) { + enc = it->second.first; + comp = it->second.second; + } + w.write( + {target, m, tsdatatype_name(ts->get_data_type()), enc, comp}, + {false, false, false, enc.empty(), comp.empty()}); + } + } + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/cmd_stats.cc b/cpp/tools/commands/cmd_stats.cc new file mode 100644 index 000000000..65b9ed3ba --- /dev/null +++ b/cpp/tools/commands/cmd_stats.cc @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "common/statistic.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + RowWriter w(out, fmt, + {"target", "measurement", "count", "start_time", "end_time"}, + {common::STRING, common::STRING, common::INT64, common::INT64, + common::INT64}, + args.no_header); + + storage::DeviceTimeseriesMetadataMap meta = + reader.get_timeseries_metadata(); + for (auto& kv : meta) { + std::string target = kv.first ? kv.first->get_device_name() : ""; + if (!args.device.empty() && target != args.device) { + continue; + } + if (!args.table.empty() && kv.first && + kv.first->get_table_name() != args.table) { + continue; + } + for (auto& ts : kv.second) { + if (!ts) { + continue; + } + std::string m = ts->get_measurement_name().to_std_string(); + if (!args.measurements.empty() && + std::find(args.measurements.begin(), args.measurements.end(), + m) == args.measurements.end()) { + continue; + } + storage::Statistic* st = ts->get_statistic(); + if (st != nullptr) { + w.write({target, m, std::to_string(st->get_count()), + std::to_string(st->start_time_), + std::to_string(st->end_time_)}, + {false, false, false, false, false}); + } else { + w.write({target, m, "", "", ""}, + {false, false, true, true, true}); + } + } + } + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/commands.h b/cpp/tools/commands/commands.h new file mode 100644 index 000000000..39a3ac2b0 --- /dev/null +++ b/cpp/tools/commands/commands.h @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#ifndef TSFILE_CLI_COMMANDS_H +#define TSFILE_CLI_COMMANDS_H + +#include + +#include "cli/cli_args.h" +#include "format/output_format.h" + +namespace storage { +class TsFileReader; +} // namespace storage + +namespace tsfile_cli { + +bool is_table_model(const ParsedArgs& args, storage::TsFileReader& reader); + +int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err, + long long offset, long long limit); + +int cmd_ls(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_schema(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_cat(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_COMMANDS_H diff --git a/cpp/tools/commands/row_query.cc b/cpp/tools/commands/row_query.cc new file mode 100644 
index 000000000..0309108b8 --- /dev/null +++ b/cpp/tools/commands/row_query.cc @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include +#include +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "common/device_id.h" +#include "common/schema.h" +#include "format/result_set_format.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err, + long long offset, long long limit) { + const int64_t start = args.has_start ? static_cast(args.start) + : std::numeric_limits::min(); + const int64_t end = args.has_end ? 
static_cast(args.end) + : std::numeric_limits::max(); + + storage::ResultSet* rs = nullptr; + int qret = 0; + + if (is_table_model(args, reader)) { + std::string table_name = args.table; + if (table_name.empty()) { + auto schemas = reader.get_all_table_schemas(); + if (schemas.empty() || !schemas[0]) { + err << "Error: no table found in file\n"; + return kExitRuntime; + } + table_name = schemas[0]->get_table_name(); + } + std::vector cols = args.measurements; + if (cols.empty()) { + auto ts = reader.get_table_schema(table_name); + if (ts) { + cols = ts->get_measurement_names(); + } + } + qret = reader.query(table_name, cols, start, end, rs); + } else { + std::vector devices; + if (!args.device.empty()) { + devices.push_back(args.device); + } else { + for (auto& d : reader.get_all_device_ids()) { + if (d) { + devices.push_back(d->get_device_name()); + } + } + } + + std::vector paths; + for (const std::string& dev : devices) { + std::vector ms = args.measurements; + if (ms.empty()) { + auto did = std::make_shared(dev); + std::vector sch; + if (reader.get_timeseries_schema(did, sch) == 0) { + for (auto& m : sch) { + ms.push_back(m.measurement_name_); + } + } + } + for (const std::string& m : ms) { + paths.push_back(dev + "." + m); + } + } + if (paths.empty()) { + err << "Error: no time series found\n"; + return kExitRuntime; + } + qret = reader.query(paths, start, end, rs); + } + + if (qret != 0 || rs == nullptr) { + err << "Error: query failed (code " << qret << ")\n"; + if (rs != nullptr) { + reader.destroy_query_data_set(rs); + } + return kExitRuntime; + } + + int wret = write_result_set(rs, fmt, args.no_header, out, offset, limit); + reader.destroy_query_data_set(rs); + return wret == 0 ? 
kExitOk : kExitRuntime; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/format/output_format.cc b/cpp/tools/format/output_format.cc new file mode 100644 index 000000000..fb753689f --- /dev/null +++ b/cpp/tools/format/output_format.cc @@ -0,0 +1,321 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "format/output_format.h" + +#include +#include +#include + +namespace tsfile_cli { + +OutputFormat resolve_format(ParsedArgs::Format f, bool stdout_is_tty) { + switch (f) { + case ParsedArgs::Format::kCsv: + return OutputFormat::kCsv; + case ParsedArgs::Format::kTsv: + return OutputFormat::kTsv; + case ParsedArgs::Format::kJson: + return OutputFormat::kJson; + case ParsedArgs::Format::kTable: + return OutputFormat::kTable; + case ParsedArgs::Format::kAuto: + default: + return stdout_is_tty ? 
OutputFormat::kTable : OutputFormat::kTsv; + } +} + +const char* tsdatatype_name(common::TSDataType t) { + switch (t) { + case common::BOOLEAN: + return "BOOLEAN"; + case common::INT32: + return "INT32"; + case common::INT64: + return "INT64"; + case common::FLOAT: + return "FLOAT"; + case common::DOUBLE: + return "DOUBLE"; + case common::TEXT: + return "TEXT"; + case common::VECTOR: + return "VECTOR"; + case common::UNKNOWN: + return "UNKNOWN"; + case common::TIMESTAMP: + return "TIMESTAMP"; + case common::DATE: + return "DATE"; + case common::BLOB: + return "BLOB"; + case common::STRING: + return "STRING"; + case common::NULL_TYPE: + return "NULL"; + case common::INVALID_DATATYPE: + default: + return "INVALID"; + } +} + +const char* tsencoding_name(common::TSEncoding e) { + switch (e) { + case common::PLAIN: + return "PLAIN"; + case common::DICTIONARY: + return "DICTIONARY"; + case common::RLE: + return "RLE"; + case common::DIFF: + return "DIFF"; + case common::TS_2DIFF: + return "TS_2DIFF"; + case common::BITMAP: + return "BITMAP"; + case common::GORILLA_V1: + return "GORILLA_V1"; + case common::REGULAR: + return "REGULAR"; + case common::GORILLA: + return "GORILLA"; + case common::ZIGZAG: + return "ZIGZAG"; + case common::FREQ: + return "FREQ"; + case common::SPRINTZ: + return "SPRINTZ"; + case common::INVALID_ENCODING: + default: + return "UNKNOWN"; + } +} + +const char* compression_name(common::CompressionType c) { + switch (c) { + case common::UNCOMPRESSED: + return "UNCOMPRESSED"; + case common::SNAPPY: + return "SNAPPY"; + case common::GZIP: + return "GZIP"; + case common::LZO: + return "LZO"; + case common::SDT: + return "SDT"; + case common::PAA: + return "PAA"; + case common::PLA: + return "PLA"; + case common::LZ4: + return "LZ4"; + case common::INVALID_COMPRESSION: + default: + return "UNKNOWN"; + } +} + +std::string csv_escape(const std::string& field) { + bool needs_quote = field.find_first_of(",\"\n\r") != std::string::npos; + if (!needs_quote) { 
+ return field; + } + std::string out = "\""; + for (char c : field) { + if (c == '"') { + out += "\"\""; + } else { + out += c; + } + } + out += "\""; + return out; +} + +std::string json_escape(const std::string& s) { + std::string out; + out.reserve(s.size() + 2); + for (unsigned char c : s) { + switch (c) { + case '"': + out += "\\\""; + break; + case '\\': + out += "\\\\"; + break; + case '\b': + out += "\\b"; + break; + case '\f': + out += "\\f"; + break; + case '\n': + out += "\\n"; + break; + case '\r': + out += "\\r"; + break; + case '\t': + out += "\\t"; + break; + default: + if (c < 0x20) { + char buf[8]; + std::snprintf(buf, sizeof(buf), "\\u%04x", c); + out += buf; + } else { + out += static_cast(c); + } + } + } + return out; +} + +RowWriter::RowWriter(std::ostream& out, OutputFormat fmt, + std::vector header, + std::vector types, bool no_header) + : out_(out), + fmt_(fmt), + header_(std::move(header)), + types_(std::move(types)), + no_header_(no_header) {} + +bool RowWriter::is_numeric(size_t col) const { + if (col >= types_.size()) { + return false; + } + switch (types_[col]) { + case common::BOOLEAN: + case common::INT32: + case common::INT64: + case common::FLOAT: + case common::DOUBLE: + case common::TIMESTAMP: + return true; + default: + return false; + } +} + +void RowWriter::ensure_header() { + if (header_done_) { + return; + } + header_done_ = true; + if (no_header_) { + return; + } + const char sep = (fmt_ == OutputFormat::kCsv) ? ',' : '\t'; + for (size_t i = 0; i < header_.size(); ++i) { + if (i) { + out_ << sep; + } + out_ << (fmt_ == OutputFormat::kCsv ? 
csv_escape(header_[i]) + : header_[i]); + } + out_ << "\n"; +} + +void RowWriter::write(const std::vector& cells, + const std::vector& is_null) { + if (fmt_ == OutputFormat::kTable) { + rows_.push_back(cells); + rows_null_.push_back(is_null); + return; + } + if (fmt_ == OutputFormat::kJson) { + out_ << "{"; + for (size_t i = 0; i < header_.size(); ++i) { + if (i) { + out_ << ","; + } + out_ << "\"" << json_escape(header_[i]) << "\":"; + if (i < is_null.size() && is_null[i]) { + out_ << "null"; + } else if (is_numeric(i)) { + out_ << (i < cells.size() ? cells[i] : "null"); + } else { + out_ << "\"" << json_escape(i < cells.size() ? cells[i] : "") + << "\""; + } + } + out_ << "}\n"; + return; + } + + ensure_header(); + const char sep = (fmt_ == OutputFormat::kCsv) ? ',' : '\t'; + for (size_t i = 0; i < cells.size(); ++i) { + if (i) { + out_ << sep; + } + bool null_cell = i < is_null.size() && is_null[i]; + if (null_cell) { + continue; + } + out_ << (fmt_ == OutputFormat::kCsv ? csv_escape(cells[i]) : cells[i]); + } + out_ << "\n"; +} + +void RowWriter::finish() { + if (fmt_ != OutputFormat::kTable) { + if (fmt_ == OutputFormat::kCsv || fmt_ == OutputFormat::kTsv) { + ensure_header(); + } + return; + } + + const size_t ncols = header_.size(); + std::vector width(ncols, 0); + if (!no_header_) { + for (size_t i = 0; i < ncols; ++i) { + width[i] = header_[i].size(); + } + } + for (const auto& row : rows_) { + for (size_t i = 0; i < ncols && i < row.size(); ++i) { + width[i] = std::max(width[i], row[i].size()); + } + } + + auto emit = [&](const std::vector& cells, + const std::vector& nulls) { + for (size_t i = 0; i < ncols; ++i) { + std::string cell = + (i < cells.size() && !(i < nulls.size() && nulls[i])) ? 
cells[i] + : ""; + out_ << cell; + if (i + 1 < ncols) { + out_ << std::string(width[i] - cell.size() + 2, ' '); + } + } + out_ << "\n"; + }; + + if (!no_header_) { + std::vector<bool> no_nulls(ncols, false); + emit(header_, no_nulls); + } + for (size_t r = 0; r < rows_.size(); ++r) { + emit(rows_[r], rows_null_[r]); + } +} + +} // namespace tsfile_cli diff --git a/cpp/tools/format/output_format.h b/cpp/tools/format/output_format.h new file mode 100644 index 000000000..c4fa14885 --- /dev/null +++ b/cpp/tools/format/output_format.h @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License.
+ */ + +#ifndef TSFILE_CLI_OUTPUT_FORMAT_H +#define TSFILE_CLI_OUTPUT_FORMAT_H + +#include <ostream> +#include <string> +#include <vector> + +#include "cli/cli_args.h" +#include "common/db_common.h" + +namespace tsfile_cli { + +enum class OutputFormat { kCsv, kTsv, kJson, kTable }; + +OutputFormat resolve_format(ParsedArgs::Format f, bool stdout_is_tty); + +const char* tsdatatype_name(common::TSDataType t); +const char* tsencoding_name(common::TSEncoding e); +const char* compression_name(common::CompressionType c); + +std::string csv_escape(const std::string& field); +std::string json_escape(const std::string& s); + +class RowWriter { + public: + RowWriter(std::ostream& out, OutputFormat fmt, + std::vector<std::string> header, + std::vector<common::TSDataType> types, bool no_header); + + void write(const std::vector<std::string>& cells, + const std::vector<bool>& is_null); + void finish(); + + private: + void ensure_header(); + bool is_numeric(size_t col) const; + + std::ostream& out_; + OutputFormat fmt_; + std::vector<std::string> header_; + std::vector<common::TSDataType> types_; + bool no_header_; + bool header_done_ = false; + std::vector<std::vector<std::string>> rows_; + std::vector<std::vector<bool>> rows_null_; +}; + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_OUTPUT_FORMAT_H diff --git a/cpp/tools/format/result_set_format.cc b/cpp/tools/format/result_set_format.cc new file mode 100644 index 000000000..bf30f0a6f --- /dev/null +++ b/cpp/tools/format/result_set_format.cc @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "format/result_set_format.h" + +#include +#include +#include +#include + +#include "utils/errno_define.h" + +namespace tsfile_cli { + +std::string cell_to_string(storage::ResultSet* rs, uint32_t i, + common::TSDataType type) { + std::ostringstream ss; + switch (type) { + case common::BOOLEAN: + return rs->get_value<bool>(i) ? "true" : "false"; + case common::INT32: + ss << rs->get_value<int32_t>(i); + return ss.str(); + case common::INT64: + case common::TIMESTAMP: + ss << rs->get_value<int64_t>(i); + return ss.str(); + case common::FLOAT: + ss << rs->get_value<float>(i); + return ss.str(); + case common::DOUBLE: + ss << rs->get_value<double>(i); + return ss.str(); + case common::DATE: { + std::tm d = rs->get_value<std::tm>(i); + char buf[16]; + std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d", d.tm_year + 1900, + d.tm_mon + 1, d.tm_mday); + return buf; + } + case common::TEXT: + case common::STRING: + case common::BLOB: { + common::String* s = rs->get_value<common::String*>(i); + return s == nullptr ?
std::string() : s->to_std_string(); + } + default: + return ""; + } +} + +int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, + std::ostream& out, long long offset, long long limit) { + auto meta = rs->get_metadata(); + const uint32_t ncol = meta->get_column_count(); + std::vector<std::string> header; + std::vector<common::TSDataType> types; + header.reserve(ncol); + types.reserve(ncol); + for (uint32_t i = 1; i <= ncol; ++i) { + header.push_back(meta->get_column_name(i)); + types.push_back(meta->get_column_type(i)); + } + + RowWriter writer(out, fmt, header, types, no_header); + bool has_next = false; + int code = common::E_OK; + long long skipped = 0; + long long emitted = 0; + while ((code = rs->next(has_next)) == common::E_OK && has_next) { + if (skipped < offset) { + ++skipped; + continue; + } + if (limit >= 0 && emitted >= limit) { + break; + } + std::vector<std::string> cells(ncol); + std::vector<bool> nulls(ncol, false); + for (uint32_t i = 1; i <= ncol; ++i) { + if (rs->is_null(i)) { + nulls[i - 1] = true; + } else { + cells[i - 1] = cell_to_string(rs, i, types[i - 1]); + } + } + writer.write(cells, nulls); + ++emitted; + } + writer.finish(); + return code; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/format/result_set_format.h b/cpp/tools/format/result_set_format.h new file mode 100644 index 000000000..b49667a4d --- /dev/null +++ b/cpp/tools/format/result_set_format.h @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#ifndef TSFILE_CLI_RESULT_SET_FORMAT_H +#define TSFILE_CLI_RESULT_SET_FORMAT_H + +#include +#include + +#include "common/db_common.h" +#include "format/output_format.h" +#include "reader/result_set.h" + +namespace tsfile_cli { + +std::string cell_to_string(storage::ResultSet* rs, uint32_t col_index, + common::TSDataType type); + +int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, + std::ostream& out, long long offset = 0, + long long limit = -1); + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_RESULT_SET_FORMAT_H diff --git a/cpp/tools/tools_main.cc b/cpp/tools/tools_main.cc new file mode 100644 index 000000000..97c815f47 --- /dev/null +++ b/cpp/tools/tools_main.cc @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include +#include +#include + +#include "cli/run_cli.h" + +int main(int argc, char** argv) { + std::vector args(argv + 1, argv + argc); + return tsfile_cli::run_cli(args, std::cout, std::cerr); +} From 59d6e4b1d1042b0732c84f0810be621b4ee5e204 Mon Sep 17 00:00:00 2001 From: spricoder Date: Tue, 2 Jun 2026 19:54:08 +0800 Subject: [PATCH 08/41] Consolidate TsFile CLI spec and plan --- .../plans/2026-06-01-tsfile-unix-cli.md | 2081 ----------------- ...i-redesign.md => 2026-06-02-tsfile-cli.md} | 922 ++++---- .../2026-06-01-tsfile-unix-cli-design.md | 371 --- .../specs/2026-06-02-tsfile-cli-design.md | 334 +++ 4 files changed, 760 insertions(+), 2948 deletions(-) delete mode 100644 docs/superpowers/plans/2026-06-01-tsfile-unix-cli.md rename docs/superpowers/plans/{2026-06-02-tsfile-cli-redesign.md => 2026-06-02-tsfile-cli.md} (54%) delete mode 100644 docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md create mode 100644 docs/superpowers/specs/2026-06-02-tsfile-cli-design.md diff --git a/docs/superpowers/plans/2026-06-01-tsfile-unix-cli.md b/docs/superpowers/plans/2026-06-01-tsfile-unix-cli.md deleted file mode 100644 index 9484e3f78..000000000 --- a/docs/superpowers/plans/2026-06-01-tsfile-unix-cli.md +++ /dev/null @@ -1,2081 +0,0 @@ - - -# TsFile Unix-philosophy C++ CLI — Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Ship a single `tsfile` C++ binary with read-only, pipe-friendly verbs (`ls`, `schema`, `stats`, `head`, `cat`, `select`) for inspecting and exporting `.tsfile` files. - -**Architecture:** A new `cpp/tools/` directory builds an OBJECT library (`tsfile_cli_obj`) plus a thin `main`. The library is also linked into `TsFile_Test` for unit tests. 
All command output goes to an injected `std::ostream&` (data→stdout, diagnostics→stderr) so commands are testable in-process. Formatting is split into a pure layer (escaping/aligning/`RowWriter`, no reader dependency, heavily unit-tested) and a `ResultSet` pump layer (e2e-tested against a generated fixture). Everything is backed by the existing `storage::TsFileReader` API — the read engine is not modified. - -**Tech Stack:** C++11, CMake (≥3.11), Google Test 1.12.1, clang-format (Google style). No new third-party/runtime dependencies (argument parsing is hand-rolled). - -**Spec:** `docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md` - ---- - -## Conventions used in every task - -- **License header:** every new file (`.h`, `.cc`, `CMakeLists.txt`) starts with the Apache 2.0 header. For `.h`/`.cc` use the `/* ... */` block form copied verbatim from any existing file (e.g. `cpp/src/file/read_file.h` lines 1-18). For `CMakeLists.txt` use the `#[[ ... ]]` form (see `cpp/examples/CMakeLists.txt` lines 1-18). The code blocks below omit the header for brevity — **prepend it to each new file.** -- **Namespace:** all CLI code lives in `namespace tsfile_cli`. -- **Formatting:** run `./mvnw spotless:apply` (or `clang-format`) before each commit; the build's `-Wall` must stay clean. -- **Build/run from** `cpp/`: `bash build.sh -t=Debug` produces `build/Debug/bin/tsfile` and `build/Debug/lib/TsFile_Test`. 
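Because every command writes to an injected `std::ostream&`, only the process entry point can tell whether the real stdout is a terminal. That is exactly the input `resolve_format()` needs for the automatic format (table on a TTY, TSV when piped). The following minimal sketch assumes a POSIX platform (`isatty()` and `STDOUT_FILENO` from `<unistd.h>`; a Windows build would need a different check). `Format`, `resolve_auto()` and `stdout_is_terminal()` are hypothetical simplified stand-ins for `ParsedArgs::Format` and `resolve_format()`, not the shipped code:

```cpp
// Hypothetical sketch, not the shipped tsfile CLI code.
// Assumes POSIX: isatty() and STDOUT_FILENO come from <unistd.h>.
#include <unistd.h>

// Simplified stand-in for ParsedArgs::Format.
enum class Format { kAuto, kCsv, kTsv, kJson, kTable };

// An explicit -f always wins; kAuto becomes kTable when a human is watching
// a terminal and kTsv when stdout is redirected or piped.
Format resolve_auto(Format requested, bool stdout_is_tty) {
    if (requested != Format::kAuto) {
        return requested;
    }
    return stdout_is_tty ? Format::kTable : Format::kTsv;
}

// Only main() should call this. Commands keep writing to the injected
// std::ostream&, so in-process tests pass true/false explicitly instead.
bool stdout_is_terminal() { return isatty(STDOUT_FILENO) == 1; }
```

In `main()` this would be called as `resolve_auto(requested, stdout_is_terminal())`. Keeping the file-descriptor check out of the commands means `tsfile ls x.tsfile | cut -f1` and the unit tests exercise the same formatting path.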
- -## File structure (created by this plan) - -``` -cpp/tools/ -├── CMakeLists.txt # OBJECT lib tsfile_cli_obj + executable tsfile -├── tools_main.cc # main(): forwards argv to run_cli -├── cli/ -│ ├── exit_codes.h # kExitOk/kExitUsage/kExitFile/kExitRuntime -│ ├── cli_args.h / cli_args.cc # ParsedArgs + parse_args() -│ └── run_cli.h / run_cli.cc # top-level dispatch, reader open, error→exit mapping -├── format/ -│ ├── output_format.h / .cc # pure: resolve_format, escapes, type names, RowWriter -│ └── result_set_format.h / .cc # ResultSet pump: cell_to_string, write_result_set -└── commands/ - ├── commands.h # is_table_model + cmd_* declarations - ├── cmd_ls.cc cmd_schema.cc cmd_stats.cc - └── cmd_head.cc cmd_cat.cc cmd_select.cc - -cpp/test/tools/ -├── cli_test_util.h # writes a table-model fixture .tsfile to a temp path -├── cli_args_test.cc -├── output_format_test.cc -└── command_e2e_test.cc -``` - -Modified files: -- `cpp/CMakeLists.txt` — add `option(BUILD_TOOLS ...)` and `add_subdirectory(tools)`. -- `cpp/test/CMakeLists.txt` — glob `tools/*_test.cc`, link `tsfile_cli_obj`. -- `cpp/src/file/read_file.cc:52-55` — route open-error prints to `stderr`. - ---- - -## Task sequencing - -Tasks are ordered so each ends green and committable: - -1. CMake scaffold + `main` + `run_cli` skeleton (`--version`/`--help`) -2. `parse_args` (cli_args) -3. Pure output formatting (`output_format`) -4. `ResultSet` pump (`result_set_format`) -5. Model detection + `cmd_ls` -6. `cmd_schema` -7. `cmd_stats` -8. `cmd_head` / `cmd_cat` / `cmd_select` (row data) -9. Library stderr fix + `install()` + full-suite run + manual tree-model verification - -Detailed tasks follow in separate sections of this document (one task per `###` heading). Each is self-contained: exact files, complete code, exact commands, expected output. 
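The `ResultSet` pump listed in the file structure (`write_result_set` in `format/result_set_format.cc`) applies `--offset` and `-n`/`--limit` on the client side. It skips `offset` rows, then emits at most `limit` rows, where a negative limit means "no limit". The minimal sketch below isolates that windowing rule. It assumes a hypothetical `next` callback in place of `storage::ResultSet::next()`, so the logic can be checked without a `.tsfile` fixture:

```cpp
// Hypothetical sketch of the offset/limit window used by head/cat/select.
// `next` stands in for storage::ResultSet::next(): it fills `row` and
// returns true while rows remain.
#include <functional>
#include <vector>

template <typename Row>
std::vector<Row> take_window(const std::function<bool(Row&)>& next,
                             long long offset, long long limit) {
    std::vector<Row> out;
    long long skipped = 0;
    Row row{};
    while (next(row)) {
        if (skipped < offset) {
            ++skipped;  // rows before the window are read and discarded
            continue;
        }
        if (limit >= 0 && static_cast<long long>(out.size()) >= limit) {
            break;  // window full; a negative limit never takes this branch
        }
        out.push_back(row);
    }
    return out;
}
```

Like the loop in `write_result_set`, this reads one extra row before stopping once the window is full. The spec also lists `queryByRow(...)` with offset/limit pushdown, which could let the reader skip those rows instead of the CLI.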
- ---- - -### Task 1: CMake scaffold + `main` + `run_cli` skeleton - -**Files:** -- Create: `cpp/tools/cli/exit_codes.h` -- Create: `cpp/tools/cli/run_cli.h`, `cpp/tools/cli/run_cli.cc` -- Create: `cpp/tools/tools_main.cc` -- Create: `cpp/tools/CMakeLists.txt` -- Modify: `cpp/CMakeLists.txt` (add option + subdir) -- Modify: `cpp/test/CMakeLists.txt` (glob tools tests, link object lib) -- Test: `cpp/test/tools/cli_args_test.cc` (skeleton-level: version/help) - -- [ ] **Step 1: Write the failing test** — `cpp/test/tools/cli_args_test.cc` - -```cpp -#include -#include -#include "cli/run_cli.h" - -TEST(RunCliTest, VersionFlagPrintsVersionAndReturnsOk) { - std::ostringstream out, err; - int code = tsfile_cli::run_cli({"--version"}, out, err); - EXPECT_EQ(code, 0); - EXPECT_NE(out.str().find("tsfile"), std::string::npos); - EXPECT_TRUE(err.str().empty()); -} - -TEST(RunCliTest, NoArgsPrintsUsageToErrAndReturnsUsageError) { - std::ostringstream out, err; - int code = tsfile_cli::run_cli({}, out, err); - EXPECT_EQ(code, 1); - EXPECT_NE(err.str().find("Usage"), std::string::npos); -} - -TEST(RunCliTest, UnknownCommandIsUsageError) { - std::ostringstream out, err; - int code = tsfile_cli::run_cli({"frobnicate", "x.tsfile"}, out, err); - EXPECT_EQ(code, 1); - EXPECT_NE(err.str().find("Unknown command"), std::string::npos); -} -``` - -- [ ] **Step 2: Create `cpp/tools/cli/exit_codes.h`** - -```cpp -#ifndef TSFILE_CLI_EXIT_CODES_H -#define TSFILE_CLI_EXIT_CODES_H -namespace tsfile_cli { -constexpr int kExitOk = 0; // success -constexpr int kExitUsage = 1; // bad arguments / unknown command -constexpr int kExitFile = 2; // cannot open or corrupted file -constexpr int kExitRuntime = 3; // query / runtime error -} // namespace tsfile_cli -#endif // TSFILE_CLI_EXIT_CODES_H -``` - -- [ ] **Step 3: Create `cpp/tools/cli/run_cli.h`** - -```cpp -#ifndef TSFILE_CLI_RUN_CLI_H -#define TSFILE_CLI_RUN_CLI_H -#include -#include -#include -namespace tsfile_cli { -// Entry point used by 
main() and by tests. argv excludes the program name. -// Data is written to `out`, diagnostics/errors to `err`. Returns an exit code -// from exit_codes.h. -int run_cli(const std::vector<std::string>& args, std::ostream& out, - std::ostream& err); -} // namespace tsfile_cli -#endif // TSFILE_CLI_RUN_CLI_H -``` - -- [ ] **Step 4: Create `cpp/tools/cli/run_cli.cc`** (skeleton — dispatch filled in later tasks) - -```cpp -#include "cli/run_cli.h" - -#include "cli/exit_codes.h" - -#ifndef TSFILE_CLI_VERSION -#define TSFILE_CLI_VERSION "unknown" -#endif - -namespace tsfile_cli { - -namespace { -void print_usage(std::ostream& os) { - os << "Usage: tsfile <command> [options] <file>\n" - "Commands: ls schema stats head cat select\n" - "Run 'tsfile help <command>' for command options.\n"; -} -} // namespace - -int run_cli(const std::vector<std::string>& args, std::ostream& out, - std::ostream& err) { - for (const std::string& a : args) { - if (a == "--version") { - out << "tsfile (Apache TsFile C++) " << TSFILE_CLI_VERSION << "\n"; - return kExitOk; - } - } - if (args.empty() || args[0] == "--help" || args[0] == "-h" || - args[0] == "help") { - print_usage(args.empty() ? err : out); - return args.empty() ? kExitUsage : kExitOk; - } - - const std::string& command = args[0]; - // Dispatch table is extended in Tasks 5-8. - err << "Unknown command: " << command << "\n"; - print_usage(err); - return kExitUsage; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 5: Create `cpp/tools/tools_main.cc`** - -```cpp -#include <iostream> -#include <string> -#include <vector> - -#include "cli/run_cli.h" - -int main(int argc, char** argv) { - std::vector<std::string> args(argv + 1, argv + argc); - return tsfile_cli::run_cli(args, std::cout, std::cerr); -} -``` - -- [ ] **Step 6: Create `cpp/tools/CMakeLists.txt`** - -```cmake -message("Running in tools directory") - -# Sources for the CLI logic, excluding main(), compiled once as an OBJECT -# library so both the executable and the test target can reuse them.
-file(GLOB_RECURSE TSFILE_CLI_SRCS - "cli/*.cc" - "format/*.cc" - "commands/*.cc") - -add_library(tsfile_cli_obj OBJECT ${TSFILE_CLI_SRCS}) - -# Headers: this dir (for "cli/..", "format/..", "commands/..") + the SDK src. -target_include_directories(tsfile_cli_obj PUBLIC - ${CMAKE_CURRENT_SOURCE_DIR} - ${PROJECT_SOURCE_DIR}/src) -if (ENABLE_ANTLR4) - target_include_directories(tsfile_cli_obj PUBLIC - ${PROJECT_SOURCE_DIR}/third_party/antlr4-cpp-runtime-4/runtime/src) -endif () - -target_compile_definitions(tsfile_cli_obj PRIVATE - TSFILE_CLI_VERSION="${TsFile_CPP_VERSION}") - -# The shipped binary. Target name differs from the `tsfile` library target to -# avoid a collision; OUTPUT_NAME makes the file `tsfile`. -add_executable(tsfile_cli tools_main.cc $<TARGET_OBJECTS:tsfile_cli_obj>) -target_include_directories(tsfile_cli PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}) -target_link_libraries(tsfile_cli tsfile) -set_target_properties(tsfile_cli PROPERTIES - OUTPUT_NAME tsfile - RUNTIME_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/bin) -``` - -- [ ] **Step 7: Modify `cpp/CMakeLists.txt`** — add the option after the other `option(...)` lines (near line 171) and the subdir before `add_subdirectory(test)` (so `tsfile_cli_obj` exists for the test target). Insert: - -```cmake -option(BUILD_TOOLS "Build the tsfile command-line tools" ON) -message("cmake using: BUILD_TOOLS=${BUILD_TOOLS}") -``` - -and change the tail of the file from: - -```cmake -add_subdirectory(src) -if (BUILD_TEST) -``` - -to: - -```cmake -add_subdirectory(src) -if (BUILD_TOOLS) - add_subdirectory(tools) -endif () -if (BUILD_TEST) -``` - -- [ ] **Step 8: Modify `cpp/test/CMakeLists.txt`** — add tools test glob after the existing `file(GLOB_RECURSE TEST_SRCS ...)` block (after line 114): - -```cmake -if (BUILD_TOOLS) - file(GLOB_RECURSE TOOLS_TEST_SRCS "tools/*_test.cc") - list(APPEND TEST_SRCS ${TOOLS_TEST_SRCS}) -endif () -``` - -and extend the test target's link + includes.
Change: - -```cmake -add_executable(TsFile_Test ${TEST_SRCS}) -target_link_libraries( - TsFile_Test - GTest::gtest_main - GTest::gmock - tsfile -) -``` - -to: - -```cmake -add_executable(TsFile_Test ${TEST_SRCS}) -if (BUILD_TOOLS) - target_include_directories(TsFile_Test PRIVATE ${CMAKE_SOURCE_DIR}/tools) -endif () -target_link_libraries( - TsFile_Test - GTest::gtest_main - GTest::gmock - tsfile -) -if (BUILD_TOOLS) - target_link_libraries(TsFile_Test tsfile_cli_obj) -endif () -``` - -- [ ] **Step 9: Build and run the tests** - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -20` -Expected: build succeeds; `build/Debug/bin/tsfile` and `build/Debug/lib/TsFile_Test` exist. - -Run: `cd cpp && ./build/Debug/lib/TsFile_Test --gtest_filter=RunCliTest.*` -Expected: 3 tests PASS. - -Run: `cd cpp && ./build/Debug/bin/tsfile --version` -Expected: prints `tsfile (Apache TsFile C++) 2.2.1.dev` and exits 0. - -- [ ] **Step 10: Commit** - -```bash -git add cpp/tools cpp/test/tools/cli_args_test.cc cpp/CMakeLists.txt cpp/test/CMakeLists.txt -git commit -m "feat(cpp-tools): scaffold tsfile CLI binary with run_cli skeleton" -``` - ---- - -### Task 2: `parse_args` (cli_args) - -**Files:** -- Create: `cpp/tools/cli/cli_args.h`, `cpp/tools/cli/cli_args.cc` -- Test: append to `cpp/test/tools/cli_args_test.cc` - -- [ ] **Step 1: Write the failing tests** — append to `cpp/test/tools/cli_args_test.cc` - -```cpp -#include "cli/cli_args.h" - -TEST(ParseArgsTest, CommandAndFilePositional) { - auto p = tsfile_cli::parse_args({"ls", "data.tsfile"}); - EXPECT_TRUE(p.error.empty()); - EXPECT_EQ(p.command, "ls"); - EXPECT_EQ(p.file, "data.tsfile"); -} - -TEST(ParseArgsTest, FormatFlagParsed) { - auto p = tsfile_cli::parse_args({"cat", "-f", "json", "data.tsfile"}); - EXPECT_TRUE(p.error.empty()); - EXPECT_EQ(p.format, tsfile_cli::ParsedArgs::Format::kJson); -} - -TEST(ParseArgsTest, MeasurementsSplitOnComma) { - auto p = tsfile_cli::parse_args( - {"select", "-m", "s1,s2,s3", "data.tsfile"}); 
- ASSERT_EQ(p.measurements.size(), 3u); - EXPECT_EQ(p.measurements[1], "s2"); -} - -TEST(ParseArgsTest, LimitOffsetAndTimeRange) { - auto p = tsfile_cli::parse_args( - {"head", "-n", "5", "--offset", "2", "--start", "100", "--end", "200", - "data.tsfile"}); - EXPECT_EQ(p.limit, 5); - EXPECT_EQ(p.offset, 2); - EXPECT_TRUE(p.has_start); - EXPECT_EQ(p.start, 100); - EXPECT_TRUE(p.has_end); - EXPECT_EQ(p.end, 200); -} - -TEST(ParseArgsTest, UnknownFlagIsError) { - auto p = tsfile_cli::parse_args({"ls", "--bogus", "data.tsfile"}); - EXPECT_FALSE(p.error.empty()); -} - -TEST(ParseArgsTest, BadFormatValueIsError) { - auto p = tsfile_cli::parse_args({"cat", "-f", "yaml", "data.tsfile"}); - EXPECT_FALSE(p.error.empty()); -} - -TEST(ParseArgsTest, MissingFileIsAllowedAtParseTime) { - // File presence is validated by run_cli, not parse_args. - auto p = tsfile_cli::parse_args({"ls"}); - EXPECT_TRUE(p.error.empty()); - EXPECT_EQ(p.command, "ls"); - EXPECT_TRUE(p.file.empty()); -} -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `cd cpp && ./build/Debug/lib/TsFile_Test --gtest_filter=ParseArgsTest.*` -Expected: compile failure (`cli_args.h` missing) — that counts as red. 
- -- [ ] **Step 3: Create `cpp/tools/cli/cli_args.h`** - -```cpp -#ifndef TSFILE_CLI_CLI_ARGS_H -#define TSFILE_CLI_CLI_ARGS_H -#include -#include -#include -namespace tsfile_cli { -struct ParsedArgs { - enum class Format { kAuto, kCsv, kTsv, kJson, kTable }; - std::string command; - std::string file; - std::string device; // -d / --device (tree model) - std::string table; // -t / --table (table model) - std::vector measurements; // -m / --measurements (comma list) - long long limit = -1; // -n / --limit (<0 = unlimited) - long long offset = 0; // --offset - long long start = LLONG_MIN; // --start (epoch ms) - long long end = LLONG_MAX; // --end (epoch ms) - bool has_start = false; - bool has_end = false; - Format format = Format::kAuto; // -f / --format - bool no_header = false; // --no-header - std::string model; // --model "tree"|"table"|"" - bool help = false; - bool version = false; - std::string error; // non-empty => parse error message -}; - -// Parses args (program name already stripped). On bad input, returns a -// ParsedArgs whose .error is set; otherwise .error is empty. Does NOT validate -// that a file was supplied — run_cli does that per command. 
-ParsedArgs parse_args(const std::vector<std::string>& args); -} // namespace tsfile_cli -#endif // TSFILE_CLI_CLI_ARGS_H -``` - -- [ ] **Step 4: Create `cpp/tools/cli/cli_args.cc`** - -```cpp -#include "cli/cli_args.h" - -#include <cstdlib> -#include <sstream> - -namespace tsfile_cli { - -namespace { -std::vector<std::string> split_csv(const std::string& s) { - std::vector<std::string> out; - std::string item; - std::istringstream iss(s); - while (std::getline(iss, item, ',')) { - if (!item.empty()) out.push_back(item); - } - return out; -} - -bool parse_ll(const std::string& s, long long& out) { - if (s.empty()) return false; - char* endp = nullptr; - long long v = std::strtoll(s.c_str(), &endp, 10); - if (endp == nullptr || *endp != '\0') return false; - out = v; - return true; -} - -bool parse_format(const std::string& s, ParsedArgs::Format& out) { - if (s == "csv") out = ParsedArgs::Format::kCsv; - else if (s == "tsv") out = ParsedArgs::Format::kTsv; - else if (s == "json") out = ParsedArgs::Format::kJson; - else if (s == "table") out = ParsedArgs::Format::kTable; - else return false; - return true; -} -} // namespace - -ParsedArgs parse_args(const std::vector<std::string>& args) { - ParsedArgs p; - if (args.empty()) return p; - p.command = args[0]; - - // Flags requiring a value; the lambda fetches the next token.
- size_t i = 1; - auto need_value = [&](const std::string& flag, std::string& dst) -> bool { - if (i + 1 >= args.size()) { - p.error = "Missing value for " + flag; - return false; - } - dst = args[++i]; - return true; - }; - - for (; i < args.size(); ++i) { - const std::string& a = args[i]; - std::string val; - if (a == "-f" || a == "--format") { - if (!need_value(a, val)) return p; - if (!parse_format(val, p.format)) { - p.error = "Invalid format: " + val + " (use csv|tsv|json|table)"; - return p; - } - } else if (a == "-d" || a == "--device") { - if (!need_value(a, p.device)) return p; - } else if (a == "-t" || a == "--table") { - if (!need_value(a, p.table)) return p; - } else if (a == "-m" || a == "--measurements") { - if (!need_value(a, val)) return p; - p.measurements = split_csv(val); - } else if (a == "-n" || a == "--limit") { - if (!need_value(a, val)) return p; - if (!parse_ll(val, p.limit)) { p.error = "Invalid --limit: " + val; return p; } - } else if (a == "--offset") { - if (!need_value(a, val)) return p; - if (!parse_ll(val, p.offset)) { p.error = "Invalid --offset: " + val; return p; } - } else if (a == "--start") { - if (!need_value(a, val)) return p; - if (!parse_ll(val, p.start)) { p.error = "Invalid --start: " + val; return p; } - p.has_start = true; - } else if (a == "--end") { - if (!need_value(a, val)) return p; - if (!parse_ll(val, p.end)) { p.error = "Invalid --end: " + val; return p; } - p.has_end = true; - } else if (a == "--model") { - if (!need_value(a, val)) return p; - if (val != "tree" && val != "table") { - p.error = "Invalid --model: " + val + " (use tree|table)"; - return p; - } - p.model = val; - } else if (a == "--no-header") { - p.no_header = true; - } else if (a == "-h" || a == "--help") { - p.help = true; - } else if (a == "--version") { - p.version = true; - } else if (!a.empty() && a[0] == '-') { - p.error = "Unknown flag: " + a; - return p; - } else { - // First bare token is the file path; extra positionals are an error. 
- if (p.file.empty()) p.file = a; - else { p.error = "Unexpected argument: " + a; return p; } - } - } - return p; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 5: Build and run tests to verify they pass** - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test --gtest_filter=ParseArgsTest.*:RunCliTest.*` -Expected: all PASS. - -- [ ] **Step 6: Commit** - -```bash -git add cpp/tools/cli/cli_args.h cpp/tools/cli/cli_args.cc cpp/test/tools/cli_args_test.cc -git commit -m "feat(cpp-tools): add hand-rolled CLI argument parser" -``` - ---- - -### Task 3: Pure output formatting (`output_format`) - -**Files:** -- Create: `cpp/tools/format/output_format.h`, `cpp/tools/format/output_format.cc` -- Test: `cpp/test/tools/output_format_test.cc` - -This layer has **no dependency on the reader**: it operates on pre-stringified -cells plus a parallel vector of column types (used only to decide JSON quoting). - -- [ ] **Step 1: Write the failing tests** — `cpp/test/tools/output_format_test.cc` - -```cpp -#include - -#include -#include - -#include "common/db_common.h" -#include "format/output_format.h" - -using tsfile_cli::OutputFormat; -using tsfile_cli::ParsedArgs; -using tsfile_cli::RowWriter; - -TEST(ResolveFormatTest, AutoUsesTableOnTtyTsvOtherwise) { - EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kAuto, true), - OutputFormat::kTable); - EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kAuto, false), - OutputFormat::kTsv); - EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kJson, true), - OutputFormat::kJson); -} - -TEST(CsvEscapeTest, QuotesWhenSpecialCharsPresent) { - EXPECT_EQ(tsfile_cli::csv_escape("plain"), "plain"); - EXPECT_EQ(tsfile_cli::csv_escape("a,b"), "\"a,b\""); - EXPECT_EQ(tsfile_cli::csv_escape("she said \"hi\""), - "\"she said \"\"hi\"\"\""); - EXPECT_EQ(tsfile_cli::csv_escape("line\nbreak"), "\"line\nbreak\""); -} - -TEST(JsonEscapeTest, EscapesQuotesBackslashAndControls) { - 
EXPECT_EQ(tsfile_cli::json_escape("a\"b\\c"), "a\\\"b\\\\c"); - EXPECT_EQ(tsfile_cli::json_escape("tab\there"), "tab\\there"); -} - -TEST(TypeNameTest, KnownTypesMapToNames) { - EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::INT64), "INT64"); - EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::STRING), "STRING"); - EXPECT_STREQ(tsfile_cli::tsdatatype_name(common::BOOLEAN), "BOOLEAN"); -} - -TEST(RowWriterTest, TsvWritesHeaderThenRows) { - std::ostringstream out; - RowWriter w(out, OutputFormat::kTsv, {"time", "s1"}, - {common::INT64, common::INT64}, /*no_header=*/false); - w.write({"1", "10"}, {false, false}); - w.write({"2", ""}, {false, true}); - w.finish(); - EXPECT_EQ(out.str(), "time\ts1\n1\t10\n2\t\n"); -} - -TEST(RowWriterTest, NoHeaderSuppressesHeader) { - std::ostringstream out; - RowWriter w(out, OutputFormat::kTsv, {"name"}, {common::STRING}, true); - w.write({"table1"}, {false}); - w.finish(); - EXPECT_EQ(out.str(), "table1\n"); -} - -TEST(RowWriterTest, CsvEscapesCells) { - std::ostringstream out; - RowWriter w(out, OutputFormat::kCsv, {"name"}, {common::STRING}, false); - w.write({"a,b"}, {false}); - w.finish(); - EXPECT_EQ(out.str(), "name\n\"a,b\"\n"); -} - -TEST(RowWriterTest, JsonNumbersUnquotedStringsQuotedNullEmitted) { - std::ostringstream out; - RowWriter w(out, OutputFormat::kJson, {"time", "name"}, - {common::INT64, common::STRING}, false); - w.write({"5", "dev1"}, {false, false}); - w.write({"6", ""}, {false, true}); - w.finish(); - EXPECT_EQ(out.str(), - "{\"time\":5,\"name\":\"dev1\"}\n" - "{\"time\":6,\"name\":null}\n"); -} - -TEST(RowWriterTest, TableAlignsColumns) { - std::ostringstream out; - RowWriter w(out, OutputFormat::kTable, {"name", "type"}, - {common::STRING, common::STRING}, false); - w.write({"s1", "INT64"}, {false, false}); - w.write({"longname", "BOOLEAN"}, {false, false}); - w.finish(); - EXPECT_EQ(out.str(), - "name      type\n" - "s1        INT64\n" - "longname  BOOLEAN\n"); -} -``` - -- [ ] **Step 2: Run tests to verify they
fail** - -Run: `cd cpp && ./build/Debug/lib/TsFile_Test --gtest_filter=*Format*:RowWriterTest.*:*EscapeTest*:TypeNameTest.*` -Expected: compile failure (`format/output_format.h` missing) — red. - -- [ ] **Step 3: Create `cpp/tools/format/output_format.h`** - -```cpp -#ifndef TSFILE_CLI_OUTPUT_FORMAT_H -#define TSFILE_CLI_OUTPUT_FORMAT_H - -#include -#include -#include - -#include "cli/cli_args.h" -#include "common/db_common.h" - -namespace tsfile_cli { - -enum class OutputFormat { kCsv, kTsv, kJson, kTable }; - -// kAuto resolves to kTable on a TTY, kTsv otherwise. Other values pass through. -OutputFormat resolve_format(ParsedArgs::Format f, bool stdout_is_tty); - -// Stable display name for every TSDataType value (does not assert). -const char* tsdatatype_name(common::TSDataType t); - -std::string csv_escape(const std::string& field); -std::string json_escape(const std::string& s); - -// Writes rows in the chosen format. Cells are pre-stringified; `types` is used -// only by the JSON formatter to decide whether a value is emitted bare -// (numeric/boolean) or quoted (everything else). For kTable, rows are buffered -// and flushed (column-aligned) by finish(). 
-class RowWriter { - public: - RowWriter(std::ostream& out, OutputFormat fmt, - std::vector<std::string> header, - std::vector<common::TSDataType> types, bool no_header); - void write(const std::vector<std::string>& cells, - const std::vector<bool>& is_null); - void finish(); - - private: - void ensure_header(); // streaming formats: lazily emit header - bool is_numeric(size_t col) const; // JSON: bare vs quoted - - std::ostream& out_; - OutputFormat fmt_; - std::vector<std::string> header_; - std::vector<common::TSDataType> types_; - bool no_header_; - bool header_done_ = false; - std::vector<std::vector<std::string>> rows_; // kTable buffer - std::vector<std::vector<bool>> rows_null_; // kTable buffer -}; - -} // namespace tsfile_cli -#endif // TSFILE_CLI_OUTPUT_FORMAT_H -``` - -- [ ] **Step 4: Create `cpp/tools/format/output_format.cc`** - -```cpp -#include "format/output_format.h" - -#include <algorithm> -#include <cstdio> -#include <utility> - -namespace tsfile_cli { - -OutputFormat resolve_format(ParsedArgs::Format f, bool stdout_is_tty) { - switch (f) { - case ParsedArgs::Format::kCsv: return OutputFormat::kCsv; - case ParsedArgs::Format::kTsv: return OutputFormat::kTsv; - case ParsedArgs::Format::kJson: return OutputFormat::kJson; - case ParsedArgs::Format::kTable: return OutputFormat::kTable; - case ParsedArgs::Format::kAuto: - default: - return stdout_is_tty ?
OutputFormat::kTable : OutputFormat::kTsv; - } -} - -const char* tsdatatype_name(common::TSDataType t) { - switch (t) { - case common::BOOLEAN: return "BOOLEAN"; - case common::INT32: return "INT32"; - case common::INT64: return "INT64"; - case common::FLOAT: return "FLOAT"; - case common::DOUBLE: return "DOUBLE"; - case common::TEXT: return "TEXT"; - case common::VECTOR: return "VECTOR"; - case common::TIMESTAMP: return "TIMESTAMP"; - case common::DATE: return "DATE"; - case common::BLOB: return "BLOB"; - case common::STRING: return "STRING"; - case common::NULL_TYPE: return "NULL"; - default: return "UNKNOWN"; - } -} - -std::string csv_escape(const std::string& field) { - bool needs_quote = field.find_first_of(",\"\n\r") != std::string::npos; - if (!needs_quote) return field; - std::string out = "\""; - for (char c : field) { - if (c == '"') out += "\"\""; - else out += c; - } - out += "\""; - return out; -} - -std::string json_escape(const std::string& s) { - std::string out; - out.reserve(s.size() + 2); - for (unsigned char c : s) { - switch (c) { - case '"': out += "\\\""; break; - case '\\': out += "\\\\"; break; - case '\b': out += "\\b"; break; - case '\f': out += "\\f"; break; - case '\n': out += "\\n"; break; - case '\r': out += "\\r"; break; - case '\t': out += "\\t"; break; - default: - if (c < 0x20) { - char buf[8]; - std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c)); - out += buf; - } else { - out += static_cast<char>(c); - } - } - } - return out; -} - -RowWriter::RowWriter(std::ostream& out, OutputFormat fmt, - std::vector<std::string> header, - std::vector<common::TSDataType> types, bool no_header) - : out_(out), - fmt_(fmt), - header_(std::move(header)), - types_(std::move(types)), - no_header_(no_header) {} - -bool RowWriter::is_numeric(size_t col) const { - if (col >= types_.size()) return false; - switch (types_[col]) { - case common::BOOLEAN: - case common::INT32: - case common::INT64: - case common::FLOAT: - case common::DOUBLE: - case 
common::TIMESTAMP: - return true; - default: -
return false; - } -} - -void RowWriter::ensure_header() { - if (header_done_) return; - header_done_ = true; - if (no_header_) return; - const char sep = (fmt_ == OutputFormat::kCsv) ? ',' : '\t'; - for (size_t i = 0; i < header_.size(); ++i) { - if (i) out_ << sep; - out_ << (fmt_ == OutputFormat::kCsv ? csv_escape(header_[i]) : header_[i]); - } - out_ << "\n"; -} - -void RowWriter::write(const std::vector<std::string>& cells, - const std::vector<bool>& is_null) { - if (fmt_ == OutputFormat::kTable) { - rows_.push_back(cells); - rows_null_.push_back(is_null); - return; - } - if (fmt_ == OutputFormat::kJson) { - out_ << "{"; - for (size_t i = 0; i < header_.size(); ++i) { - if (i) out_ << ","; - out_ << "\"" << json_escape(header_[i]) << "\":"; - if (i < is_null.size() && is_null[i]) { - out_ << "null"; - } else if (is_numeric(i)) { - out_ << (i < cells.size() ? cells[i] : "null"); - } else { - out_ << "\"" << json_escape(i < cells.size() ? cells[i] : "") - << "\""; - } - } - out_ << "}\n"; - return; - } - // csv / tsv - ensure_header(); - const char sep = (fmt_ == OutputFormat::kCsv) ? ',' : '\t'; - for (size_t i = 0; i < cells.size(); ++i) { - if (i) out_ << sep; - bool null_cell = i < is_null.size() && is_null[i]; - if (null_cell) continue; // empty field - out_ << (fmt_ == OutputFormat::kCsv ? csv_escape(cells[i]) : cells[i]); - } - out_ << "\n"; -} - -void RowWriter::finish() { - if (fmt_ != OutputFormat::kTable) return; - const size_t ncols = header_.size(); - std::vector<size_t> width(ncols, 0); - if (!no_header_) { - for (size_t i = 0; i < ncols; ++i) width[i] = header_[i].size(); - } - for (const auto& row : rows_) { - for (size_t i = 0; i < ncols && i < row.size(); ++i) { - width[i] = std::max(width[i], row[i].size()); - } - } - auto emit = [&](const std::vector<std::string>& cells, - const std::vector<bool>& nulls) { - for (size_t i = 0; i < ncols; ++i) { - std::string cell = - (i < cells.size() && !(i < nulls.size() && nulls[i])) ?
cells[i] - : ""; - out_ << cell; - if (i + 1 < ncols) { - out_ << std::string(width[i] - cell.size() + 2, ' '); - } - } - out_ << "\n"; - }; - if (!no_header_) { - std::vector<bool> no_nulls(ncols, false); - emit(header_, no_nulls); - } - for (size_t r = 0; r < rows_.size(); ++r) emit(rows_[r], rows_null_[r]); -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 5: Build and run tests to verify they pass** - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test --gtest_filter=*Format*:RowWriterTest.*:*EscapeTest*:TypeNameTest.*` -Expected: all PASS. - -- [ ] **Step 6: Commit** - -```bash -git add cpp/tools/format/output_format.h cpp/tools/format/output_format.cc cpp/test/tools/output_format_test.cc -git commit -m "feat(cpp-tools): add pure output formatters (csv/tsv/json/table)" -``` - ---- - -### Task 4: `ResultSet` pump (`result_set_format`) - -**Files:** -- Create: `cpp/tools/format/result_set_format.h`, `cpp/tools/format/result_set_format.cc` - -This layer converts a live `storage::ResultSet` into formatted rows. It is -exercised end-to-end by the command tests (Tasks 5-8); it has no standalone unit -test because constructing a `ResultSet` requires a real file. Keep the typed -extraction here and out of the pure layer. - -- [ ] **Step 1: Create `cpp/tools/format/result_set_format.h`** - -```cpp -#ifndef TSFILE_CLI_RESULT_SET_FORMAT_H -#define TSFILE_CLI_RESULT_SET_FORMAT_H - -#include <ostream> -#include <string> - -#include "common/db_common.h" -#include "format/output_format.h" -#include "reader/result_set.h" - -namespace tsfile_cli { - -// Stringifies one cell (column index is 1-based, per ResultSetMetadata). -// Caller must have checked is_null() first. -std::string cell_to_string(storage::ResultSet* rs, uint32_t col_index, - common::TSDataType type); - -// Pumps every row of `rs` into `out` using `fmt`. Reads column names/types from -// the result set metadata.
Returns 0 on success or a non-zero error code if the -// underlying ResultSet::next() fails. -int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, - std::ostream& out); - -} // namespace tsfile_cli -#endif // TSFILE_CLI_RESULT_SET_FORMAT_H -``` - -- [ ] **Step 2: Create `cpp/tools/format/result_set_format.cc`** - -```cpp -#include "format/result_set_format.h" - -#include <cstdio> -#include <ctime> -#include <sstream> - -#include "utils/errno_define.h" // common::E_OK - -namespace tsfile_cli { - -std::string cell_to_string(storage::ResultSet* rs, uint32_t i, - common::TSDataType type) { - std::ostringstream ss; - switch (type) { - case common::BOOLEAN: - return rs->get_value<bool>(i) ? "true" : "false"; - case common::INT32: - ss << rs->get_value<int32_t>(i); - return ss.str(); - case common::INT64: - case common::TIMESTAMP: - ss << rs->get_value<int64_t>(i); - return ss.str(); - case common::FLOAT: - ss << rs->get_value<float>(i); - return ss.str(); - case common::DOUBLE: - ss << rs->get_value<double>(i); - return ss.str(); - case common::DATE: { - std::tm d = rs->get_value<std::tm>(i); - char buf[16]; - std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d", d.tm_year + 1900, - d.tm_mon + 1, d.tm_mday); - return buf; - } - case common::TEXT: - case common::STRING: - case common::BLOB: { - common::String* s = rs->get_value<common::String*>(i); - return s == nullptr ?
std::string() : s->to_std_string(); - } - default: - return ""; - } -} - -int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, - std::ostream& out) { - auto meta = rs->get_metadata(); - const uint32_t ncol = meta->get_column_count(); - std::vector<std::string> header; - std::vector<common::TSDataType> types; - header.reserve(ncol); - types.reserve(ncol); - for (uint32_t i = 1; i <= ncol; ++i) { - header.push_back(meta->get_column_name(i)); - types.push_back(meta->get_column_type(i)); - } - - RowWriter writer(out, fmt, header, types, no_header); - bool has_next = false; - int code = common::E_OK; - while ((code = rs->next(has_next)) == common::E_OK && has_next) { - std::vector<std::string> cells(ncol); - std::vector<bool> nulls(ncol, false); - for (uint32_t i = 1; i <= ncol; ++i) { - if (rs->is_null(i)) { - nulls[i - 1] = true; - } else { - cells[i - 1] = cell_to_string(rs, i, types[i - 1]); - } - } - writer.write(cells, nulls); - } - writer.finish(); - return code; -} - -} // namespace tsfile_cli -``` - -> **Note:** `common::E_OK` is defined in `cpp/src/utils/errno_define.h` (and is -> also pulled in transitively by `reader/result_set.h`). The explicit include -> above keeps the source self-documenting. - -- [ ] **Step 3: Build to verify it compiles** (no test yet; covered in Task 5) - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5` -Expected: build succeeds (the new `.cc` is picked up by the tools glob).
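The DATE branch of `cell_to_string` is the only case that needs arithmetic instead of plain streaming, because `std::tm` counts years from 1900 and months from 0. The sketch below isolates that conversion so it can be unit-tested without a real `.tsfile`. `format_date` is a hypothetical helper name, and the sketch assumes the reader hands back a `std::tm` with standard C field semantics, as the `get_value` call above implies.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper mirroring the DATE branch of cell_to_string():
// std::tm stores years since 1900 and 0-based months, so both are offset.
std::string format_date(const std::tm& d) {
    assert(d.tm_mon >= 0 && d.tm_mon <= 11);  // months are 0-based
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d", d.tm_year + 1900,
                  d.tm_mon + 1, d.tm_mday);
    return buf;
}
```

For example, a `std::tm` with `tm_year = 126`, `tm_mon = 5` and `tm_mday = 1` formats as `2026-06-01`.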
- - [ ] **Step 4: Commit** - -```bash -git add cpp/tools/format/result_set_format.h cpp/tools/format/result_set_format.cc -git commit -m "feat(cpp-tools): add ResultSet-to-rows pump layer" -``` - ---- - -### Task 5: Model detection + `cmd_ls` + reader-open dispatch - -**Files:** -- Create: `cpp/tools/commands/commands.h` -- Create: `cpp/tools/commands/cmd_ls.cc` -- Replace: `cpp/tools/cli/run_cli.cc` (full dispatch + reader open) -- Create: `cpp/test/tools/cli_test_util.h` -- Create: `cpp/test/tools/command_e2e_test.cc` - -- [ ] **Step 1: Create `cpp/tools/commands/commands.h`** - -```cpp -#ifndef TSFILE_CLI_COMMANDS_H -#define TSFILE_CLI_COMMANDS_H - -#include <ostream> - -#include "cli/cli_args.h" -#include "format/output_format.h" - -namespace storage { -class TsFileReader; -} - -namespace tsfile_cli { - -// Returns true if the file should be treated as table-model. Honors -// args.model ("tree"/"table"); otherwise returns true if the file contains any table schema. -bool is_table_model(const ParsedArgs& args, storage::TsFileReader& reader); - -// Every command writes data to `out`, diagnostics to `err`, and returns an -// exit code from exit_codes.h.
-int cmd_ls(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -int cmd_schema(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -int cmd_cat(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -int cmd_select(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); - -} // namespace tsfile_cli -#endif // TSFILE_CLI_COMMANDS_H -``` - -- [ ] **Step 2: Create `cpp/tools/commands/cmd_ls.cc`** - -```cpp -#include "cli/exit_codes.h" -#include "commands/commands.h" -#include "reader/tsfile_reader.h" - -namespace tsfile_cli { - -bool is_table_model(const ParsedArgs& args, storage::TsFileReader& reader) { - if (args.model == "tree") return false; - if (args.model == "table") return true; - return !reader.get_all_table_schemas().empty(); -} - -int cmd_ls(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { - std::vector<std::string> names; - if (is_table_model(args, reader)) { - for (auto& ts : reader.get_all_table_schemas()) { - if (ts) names.push_back(ts->get_table_name()); - } - } else { - for (auto& dev : reader.get_all_device_ids()) { - if (dev) names.push_back(dev->get_device_name()); - } - } - RowWriter w(out, fmt, {"name"}, {common::STRING}, args.no_header); - for (const std::string& n : names) { - w.write({n}, {false}); - } - w.finish(); - return kExitOk; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 3: Replace `cpp/tools/cli/run_cli.cc`** with the full version - -```cpp -#include "cli/run_cli.h" - -#include <cstdio> -#include <set>
-#include <string> - -#include "cli/cli_args.h" -#include "cli/exit_codes.h" -#include "commands/commands.h" -#include "format/output_format.h" -#include "reader/tsfile_reader.h" - -#ifdef _WIN32 -#include <io.h> -#define TSFILE_ISATTY _isatty -#define TSFILE_FILENO _fileno -#else -#include <unistd.h> -#define TSFILE_ISATTY isatty -#define TSFILE_FILENO fileno -#endif - -#ifndef TSFILE_CLI_VERSION -#define TSFILE_CLI_VERSION "unknown" -#endif - -namespace tsfile_cli { - -namespace { -void print_usage(std::ostream& os) { - os << "Usage: tsfile <command> [options] <file>\n" - "Commands:\n" - " ls list devices (tree) or tables (table)\n" - " schema per-measurement data type/encoding/compression\n" - " stats per-series row count and time range\n" - " head first N rows (use -n)\n" - " cat all rows of a device/table\n" - " select choose columns (-m), time range (--start/--end), " - "limit/offset\n" - "Options: -f/--format csv|tsv|json|table, -d/--device, -t/--table,\n" - " -m/--measurements a,b, -n/--limit, --offset, --start, --end,\n" - " --no-header, --model tree|table, -h/--help, --version\n"; -} - -bool is_known_command(const std::string& c) { - static const std::set<std::string> kCmds = {"ls", "schema", "stats", - "head", "cat", "select"}; - return kCmds.count(c) != 0; -} -} // namespace - -int run_cli(const std::vector<std::string>& args, std::ostream& out, - std::ostream& err) { - ParsedArgs p = parse_args(args); - - if (p.version || (!args.empty() && args[0] == "--version")) { - out << "tsfile (Apache TsFile C++) " << TSFILE_CLI_VERSION << "\n"; - return kExitOk; - } - if (args.empty()) { - print_usage(err); - return kExitUsage; - } - if (p.command == "help" || p.command == "--help" || p.command == "-h" || - (p.help && p.file.empty() && !is_known_command(p.command))) { - print_usage(out); - return kExitOk; - } - if (!p.error.empty()) { - err << "Error: " << p.error << "\n"; - print_usage(err); - return kExitUsage; - } - if (!is_known_command(p.command)) { - err << "Unknown command: " << p.command << "\n"; - print_usage(err); -
return kExitUsage; - } - if (p.file.empty()) { - err << "Error: missing <file> argument\n"; - return kExitUsage; - } - - storage::libtsfile_init(); - storage::TsFileReader reader; - int open_ret = reader.open(p.file); - if (open_ret != 0) { - err << "Error: cannot open file (missing or corrupted): " << p.file << "\n"; - return kExitFile; - } - - bool stdout_tty = TSFILE_ISATTY(TSFILE_FILENO(stdout)) != 0; - OutputFormat fmt = resolve_format(p.format, stdout_tty); - - int code; - if (p.command == "ls") { - code = cmd_ls(p, reader, fmt, out, err); - } else { - // Filled in by Tasks 6-8 (schema/stats/head/cat/select). - err << "Error: command not yet implemented: " << p.command << "\n"; - code = kExitUsage; - } - - reader.close(); - return code; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 4: Create `cpp/test/tools/cli_test_util.h`** (table-model fixture writer) - -```cpp -#ifndef TSFILE_CLI_TEST_UTIL_H -#define TSFILE_CLI_TEST_UTIL_H - -#include <fcntl.h> - -#include <string> -#include "writer/tsfile_table_writer.h" - -namespace tsfile_cli_test { - -// Writes a small table-model fixture and returns its path. Table "table1": -// TAG columns id1,id2 (STRING) + FIELD column s1 (INT64); 5 rows, ts=0..4, -// s1 = row*10.
-inline std::string write_table_fixture( - const std::string& path = "tsfile_cli_fixture.tsfile") { - storage::libtsfile_init(); - std::string table_name = "table1"; - - storage::WriteFile file; - int flags = O_WRONLY | O_CREAT | O_TRUNC; -#ifdef _WIN32 - flags |= O_BINARY; -#endif - file.create(path, flags, 0666); - - auto* schema = new storage::TableSchema( - table_name, - { - common::ColumnSchema("id1", common::STRING, common::UNCOMPRESSED, - common::PLAIN, common::ColumnCategory::TAG), - common::ColumnSchema("id2", common::STRING, common::UNCOMPRESSED, - common::PLAIN, common::ColumnCategory::TAG), - common::ColumnSchema("s1", common::INT64, common::UNCOMPRESSED, - common::PLAIN, common::ColumnCategory::FIELD), - }); - - auto* writer = new storage::TsFileTableWriter(&file, schema); - storage::Tablet tablet( - table_name, {"id1", "id2", "s1"}, - {common::STRING, common::STRING, common::INT64}, - {common::ColumnCategory::TAG, common::ColumnCategory::TAG, - common::ColumnCategory::FIELD}, - 10); - for (int row = 0; row < 5; ++row) { - tablet.add_timestamp(row, static_cast<int64_t>(row)); - tablet.add_value(row, "id1", "id1_field_1"); - tablet.add_value(row, "id2", "id2_field_2"); - tablet.add_value(row, "s1", static_cast<int64_t>(row * 10)); - } - writer->write_table(tablet); - writer->flush(); - writer->close(); - - delete writer; - delete schema; - return path; -} - -} // namespace tsfile_cli_test -#endif // TSFILE_CLI_TEST_UTIL_H -``` - -> **If the fixture fails to compile** (a transitively-included type is missing), -> add the explicit header — `common/tablet.h` for `Tablet`, `file/write_file.h` -> for `WriteFile`, `common/schema.h` for `TableSchema`/`ColumnSchema`. The -> `examples/cpp_examples/demo_write.cpp` compiles with just the table-writer -> include, so start minimal.
- -- [ ] **Step 5: Create `cpp/test/tools/command_e2e_test.cc`** - -```cpp -#include <gtest/gtest.h> - -#include <cstdio> -#include <sstream> -#include <string> - -#include "cli/run_cli.h" -#include "cli_test_util.h" - -namespace { -struct Fixture { - std::string path = tsfile_cli_test::write_table_fixture(); - ~Fixture() { std::remove(path.c_str()); } -}; -} // namespace - -TEST(CliE2E, LsListsTableNameTsv) { - Fixture f; - std::ostringstream out, err; - int code = tsfile_cli::run_cli({"ls", "-f", "tsv", f.path}, out, err); - EXPECT_EQ(code, 0); - EXPECT_EQ(out.str(), "name\ntable1\n"); - EXPECT_TRUE(err.str().empty()); -} - -TEST(CliE2E, LsNoHeaderJustName) { - Fixture f; - std::ostringstream out, err; - int code = - tsfile_cli::run_cli({"ls", "-f", "tsv", "--no-header", f.path}, out, err); - EXPECT_EQ(code, 0); - EXPECT_EQ(out.str(), "table1\n"); -} - -TEST(CliE2E, OpenMissingFileReturnsFileError) { - std::ostringstream out, err; - int code = tsfile_cli::run_cli({"ls", "definitely_missing.tsfile"}, out, err); - EXPECT_EQ(code, 2); - EXPECT_FALSE(err.str().empty()); -} -``` - -- [ ] **Step 6: Build and run tests** - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -8 && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*` -Expected: 3 tests PASS. - -Run: `cd cpp && ./build/Debug/bin/tsfile ls -f tsv examples/test_cpp.tsfile` -Expected: prints `name` then `table1` (the bundled example is table-model).
- -- [ ] **Step 7: Commit** - -```bash -git add cpp/tools/commands/commands.h cpp/tools/commands/cmd_ls.cc cpp/tools/cli/run_cli.cc cpp/test/tools/cli_test_util.h cpp/test/tools/command_e2e_test.cc -git commit -m "feat(cpp-tools): implement model detection and 'ls' command" -``` - ---- - -### Task 6: `cmd_schema` (+ encoding/compression name helpers) - -**Files:** -- Modify: `cpp/tools/format/output_format.h` / `.cc` (add `tsencoding_name`, `compression_name`) -- Create: `cpp/tools/commands/cmd_schema.cc` -- Modify: `cpp/tools/cli/run_cli.cc` (dispatch `schema`) -- Modify: `cpp/test/tools/output_format_test.cc`, `cpp/test/tools/command_e2e_test.cc` - -`schema` emits a uniform 5-column shape `target, measurement, datatype, encoding, -compression`. Name + type come from `get_timeseries_metadata()` (works for both -models). Encoding/compression are enriched from `get_timeseries_schema()` for -tree-model files; for table-model files those two columns are blank (no public -getter on `TableSchema`). 
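The enrichment described above is a left join. Every (measurement, datatype) pair from `get_timeseries_metadata()` yields a row, and encoding/compression are filled in only when the tree-model schema lookup found that measurement. The library-free sketch below shows the join using plain strings instead of TsFile types. `SchemaRow` and `join_schema` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One output row of `tsfile schema`. Empty encoding/compression = unknown.
struct SchemaRow {
    std::string target, measurement, datatype, encoding, compression;
};

// Left join: every (name, type) pair produces a row; encoding/compression
// come from `enc_comp` when the measurement is present there.
std::vector<SchemaRow> join_schema(
    const std::string& target,
    const std::vector<std::pair<std::string, std::string>>& name_type,
    const std::map<std::string, std::pair<std::string, std::string>>&
        enc_comp) {
    std::vector<SchemaRow> rows;
    rows.reserve(name_type.size());
    for (const auto& nt : name_type) {
        SchemaRow r{target, nt.first, nt.second, "", ""};
        auto it = enc_comp.find(nt.first);
        if (it != enc_comp.end()) {
            r.encoding = it->second.first;
            r.compression = it->second.second;
        }
        rows.push_back(r);
    }
    return rows;
}
```

`cmd_schema` marks the blank encoding/compression cells as null, so TSV and CSV print an empty field and JSON prints `null`.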
- -- [ ] **Step 1: Add failing unit tests** — append to `cpp/test/tools/output_format_test.cc` - -```cpp -TEST(EncodingNameTest, KnownEncodings) { - EXPECT_STREQ(tsfile_cli::tsencoding_name(common::PLAIN), "PLAIN"); - EXPECT_STREQ(tsfile_cli::tsencoding_name(common::TS_2DIFF), "TS_2DIFF"); - EXPECT_STREQ(tsfile_cli::tsencoding_name(common::SPRINTZ), "SPRINTZ"); -} - -TEST(CompressionNameTest, KnownCompressors) { - EXPECT_STREQ(tsfile_cli::compression_name(common::UNCOMPRESSED), - "UNCOMPRESSED"); - EXPECT_STREQ(tsfile_cli::compression_name(common::SNAPPY), "SNAPPY"); - EXPECT_STREQ(tsfile_cli::compression_name(common::LZ4), "LZ4"); -} -``` - -- [ ] **Step 2: Add declarations** to `cpp/tools/format/output_format.h` (after `tsdatatype_name`) - -```cpp -const char* tsencoding_name(common::TSEncoding e); -const char* compression_name(common::CompressionType c); -``` - -- [ ] **Step 3: Add definitions** to `cpp/tools/format/output_format.cc` (after `tsdatatype_name`) - -```cpp -const char* tsencoding_name(common::TSEncoding e) { - switch (e) { - case common::PLAIN: return "PLAIN"; - case common::DICTIONARY: return "DICTIONARY"; - case common::RLE: return "RLE"; - case common::DIFF: return "DIFF"; - case common::TS_2DIFF: return "TS_2DIFF"; - case common::BITMAP: return "BITMAP"; - case common::GORILLA_V1: return "GORILLA_V1"; - case common::REGULAR: return "REGULAR"; - case common::GORILLA: return "GORILLA"; - case common::ZIGZAG: return "ZIGZAG"; - case common::FREQ: return "FREQ"; - case common::SPRINTZ: return "SPRINTZ"; - default: return "UNKNOWN"; - } -} - -const char* compression_name(common::CompressionType c) { - switch (c) { - case common::UNCOMPRESSED: return "UNCOMPRESSED"; - case common::SNAPPY: return "SNAPPY"; - case common::GZIP: return "GZIP"; - case common::LZO: return "LZO"; - case common::SDT: return "SDT"; - case common::PAA: return "PAA"; - case common::PLA: return "PLA"; - case common::LZ4: return "LZ4"; - default: return "UNKNOWN"; - } -} -``` - 
-- [ ] **Step 4: Create `cpp/tools/commands/cmd_schema.cc`** - -```cpp -#include <map> -#include <string> -#include <utility> -#include <vector> - -#include "cli/exit_codes.h" -#include "commands/commands.h" -#include "common/schema.h" -#include "reader/tsfile_reader.h" - -namespace tsfile_cli { - -int cmd_schema(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { - const bool table = is_table_model(args, reader); - RowWriter w(out, fmt, - {"target", "measurement", "datatype", "encoding", "compression"}, - {common::STRING, common::STRING, common::STRING, common::STRING, - common::STRING}, - args.no_header); - - storage::DeviceTimeseriesMetadataMap meta = reader.get_timeseries_metadata(); - for (auto& kv : meta) { - std::string target = kv.first ? kv.first->get_device_name() : ""; - - // Tree-model enrichment: measurement -> (encoding, compression). - std::map<std::string, std::pair<std::string, std::string>> enc_comp; - if (!table && kv.first) { - std::vector<storage::MeasurementSchema> ms; - if (reader.get_timeseries_schema(kv.first, ms) == 0) { - for (auto& m : ms) { - enc_comp[m.measurement_name_] = {tsencoding_name(m.encoding_), - compression_name(m.compression_type_)}; - } - } - } - - for (auto& ts : kv.second) { - if (!ts) continue; - std::string m = ts->get_measurement_name().to_std_string(); - std::string dt = tsdatatype_name(ts->get_data_type()); - std::string enc, comp; - auto it = enc_comp.find(m); - if (it != enc_comp.end()) { - enc = it->second.first; - comp = it->second.second; - } - w.write({target, m, dt, enc, comp}, - {false, false, false, enc.empty(), comp.empty()}); - } - } - w.finish(); - return kExitOk; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 5: Wire dispatch** — in `cpp/tools/cli/run_cli.cc`, replace the `ls`/else block: - -```cpp - int code; - if (p.command == "ls") { - code = cmd_ls(p, reader, fmt, out, err); - } else { -``` - -with: - -```cpp - int code; - if (p.command == "ls") { - code = cmd_ls(p, reader, fmt, out, err); - } else if (p.command == "schema") { - code =
cmd_schema(p, reader, fmt, out, err); - } else { -``` - -- [ ] **Step 6: Add e2e test** — append to `cpp/test/tools/command_e2e_test.cc` - -```cpp -TEST(CliE2E, SchemaShowsFieldColumnAndType) { - Fixture f; - std::ostringstream out, err; - int code = tsfile_cli::run_cli({"schema", "-f", "tsv", f.path}, out, err); - EXPECT_EQ(code, 0); - EXPECT_NE(out.str().find( - "target\tmeasurement\tdatatype\tencoding\tcompression"), - std::string::npos); - EXPECT_NE(out.str().find("s1"), std::string::npos); - EXPECT_NE(out.str().find("INT64"), std::string::npos); -} -``` - -> **If `SchemaShowsFieldColumnAndType` shows no rows** (i.e. -> `get_timeseries_metadata()` returns empty for a table-model file in this build), -> fall back to deriving name+type from a zero-row probe: -> `reader.queryByRow(table_name, all_measurement_names, /*offset=*/0, -> /*limit=*/0, rs)` and read `rs->get_metadata()`. Keep the 5-column output -> shape; leave encoding/compression blank. - -- [ ] **Step 7: Build and run tests** - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:EncodingNameTest.*:CompressionNameTest.*` -Expected: all PASS. - -- [ ] **Step 8: Commit** - -```bash -git add cpp/tools/format/output_format.h cpp/tools/format/output_format.cc cpp/tools/commands/cmd_schema.cc cpp/tools/cli/run_cli.cc cpp/test/tools/output_format_test.cc cpp/test/tools/command_e2e_test.cc -git commit -m "feat(cpp-tools): implement 'schema' command" -``` - ---- - -### Task 7: `cmd_stats` - -**Files:** -- Create: `cpp/tools/commands/cmd_stats.cc` -- Modify: `cpp/tools/cli/run_cli.cc` (dispatch `stats`) -- Modify: `cpp/test/tools/command_e2e_test.cc` - -`stats` emits `target, measurement, count, start_time, end_time` from each -series' `Statistic` (via `get_timeseries_metadata()`). 
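The `count`, `start_time` and `end_time` columns summarize each series. This sketch assumes `start_time_`/`end_time_` hold the smallest and largest timestamps of the series, which is what the fixture assertion `s1\t5\t0\t4` in Step 3 relies on. Under that assumption, the same row can be computed directly from raw timestamps. The hypothetical `stats_tsv_row` helper serves only as a reference for checking test expectations. The command itself reads the precomputed `Statistic` and does not scan any data.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical reference helper: builds one `stats` TSV row from raw
// timestamps, assuming start_time/end_time are the min/max timestamp.
std::string stats_tsv_row(const std::string& target,
                          const std::string& measurement,
                          const std::vector<int64_t>& timestamps) {
    // No points: count/start/end stay empty, which is how the TSV writer
    // renders null cells.
    if (timestamps.empty()) return target + "\t" + measurement + "\t\t\t";
    auto mm = std::minmax_element(timestamps.begin(), timestamps.end());
    return target + "\t" + measurement + "\t" +
           std::to_string(timestamps.size()) + "\t" +
           std::to_string(*mm.first) + "\t" + std::to_string(*mm.second);
}
```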
- -- [ ] **Step 1: Create `cpp/tools/commands/cmd_stats.cc`** - -```cpp -#include <string> -#include <vector> - -#include "cli/exit_codes.h" -#include "commands/commands.h" -#include "common/statistic.h" -#include "reader/tsfile_reader.h" - -namespace tsfile_cli { - -int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { - RowWriter w(out, fmt, - {"target", "measurement", "count", "start_time", "end_time"}, - {common::STRING, common::STRING, common::INT64, common::INT64, - common::INT64}, - args.no_header); - - storage::DeviceTimeseriesMetadataMap meta = reader.get_timeseries_metadata(); - for (auto& kv : meta) { - std::string target = kv.first ? kv.first->get_device_name() : ""; - for (auto& ts : kv.second) { - if (!ts) continue; - std::string m = ts->get_measurement_name().to_std_string(); - storage::Statistic* st = ts->get_statistic(); - if (st != nullptr) { - w.write({target, m, std::to_string(st->get_count()), - std::to_string(st->start_time_), - std::to_string(st->end_time_)}, - {false, false, false, false, false}); - } else { - w.write({target, m, "", "", ""}, - {false, false, true, true, true}); - } - } - } - w.finish(); - return kExitOk; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 2: Wire dispatch** — in `cpp/tools/cli/run_cli.cc`, add a branch after the `schema` branch: - -```cpp - } else if (p.command == "stats") { - code = cmd_stats(p, reader, fmt, out, err); -``` - -(Place it between the `schema` branch and the final `else`.) - -- [ ] **Step 3: Add e2e test** — append to `cpp/test/tools/command_e2e_test.cc` - -```cpp -TEST(CliE2E, StatsReportsCountAndTimeRange) { - Fixture f; - std::ostringstream out, err; - int code = tsfile_cli::run_cli({"stats", "-f", "tsv", f.path}, out, err); - EXPECT_EQ(code, 0); - EXPECT_NE(out.str().find( - "target\tmeasurement\tcount\tstart_time\tend_time"), - std::string::npos); - // s1 has 5 rows with timestamps 0..4.
- EXPECT_NE(out.str().find("s1\t5\t0\t4"), std::string::npos); -} -``` - -- [ ] **Step 4: Build and run tests** - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*` -Expected: all PASS (including the new `StatsReportsCountAndTimeRange`). - -> **If `s1\t5\t0\t4` is not found**, print the output the test actually received -> (append `<< out.str()` to the failing `EXPECT_NE`) and adjust the substring to -> the actual whitespace/columns. The fixture file is deleted when the test ends, and -> `examples/test_cpp.tsfile` may not hold the same rows, so do not debug against it. -> The count (5) and range (0..4) values themselves are guaranteed by the fixture. - -- [ ] **Step 5: Commit** - -```bash -git add cpp/tools/commands/cmd_stats.cc cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc -git commit -m "feat(cpp-tools): implement 'stats' command" -``` - ---- - -### Task 8: `cmd_head` / `cmd_cat` / `cmd_select` (row data) - -**Files:** -- Modify: `cpp/tools/format/result_set_format.h` / `.cc` (add offset/limit) -- Modify: `cpp/tools/commands/commands.h` (declare `run_row_query`) -- Create: `cpp/tools/commands/row_query.cc` -- Create: `cpp/tools/commands/cmd_head.cc`, `cmd_cat.cc`, `cmd_select.cc` -- Modify: `cpp/tools/cli/run_cli.cc` (dispatch head/cat/select) -- Modify: `cpp/test/tools/command_e2e_test.cc` - -All three row commands share `run_row_query`, which opens a `ResultSet` (time -range honored via `--start/--end`) and pumps it with client-side offset/limit. -`head` defaults `limit` to 10; `cat`/`select` use the parsed `--limit` -(default unlimited).
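Client-side offset/limit is a small state machine over a forward-only cursor. The sketch below uses a toy `VecCursor` in place of `storage::ResultSet`; its `next(bool&)` returns `bool` rather than an error code. The paging logic lives in a hypothetical `page_rows` template. Unlike the loop in Step 1, this version checks the limit before advancing the cursor. As a result, a full page never reads an extra row, and `--limit 0` reads nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy forward-only cursor standing in for storage::ResultSet. next() returns
// false on a read error and sets has_next to false at end of data.
struct VecCursor {
    explicit VecCursor(std::vector<int> r) : rows(std::move(r)) {}
    std::vector<int> rows;
    std::size_t pos = 0;
    int cur = 0;
    bool next(bool& has_next) {
        has_next = pos < rows.size();
        if (has_next) cur = rows[pos++];
        return true;  // this toy cursor never fails
    }
    int current() const { return cur; }
};

// Skips `offset` rows, then passes at most `limit` rows to `emit`
// (limit < 0 means unlimited). Returns the number of rows emitted.
template <typename Cursor, typename Emit>
long long page_rows(Cursor& cursor, long long offset, long long limit,
                    Emit emit) {
    assert(offset >= 0);
    long long skipped = 0, emitted = 0;
    // Check the limit before advancing so no row past the page is read.
    while (limit < 0 || emitted < limit) {
        bool has_next = false;
        if (!cursor.next(has_next) || !has_next) break;
        if (skipped < offset) {
            ++skipped;
            continue;
        }
        emit(cursor.current());
        ++emitted;
    }
    return emitted;
}
```

With `offset = 1` and `limit = 2` over the rows `0, 10, 20, 30, 40`, `page_rows` emits `10` and `20`.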
- -- [ ] **Step 1: Add offset/limit to `write_result_set`** — change the declaration in `cpp/tools/format/result_set_format.h`: - -```cpp -int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, - std::ostream& out, long long offset = 0, - long long limit = -1); -``` - -and update the definition in `cpp/tools/format/result_set_format.cc`. First add the same two parameters to its signature (`std::ostream& out, long long offset, long long limit`, without repeating the default arguments). Then replace the existing `while` loop and everything from the `RowWriter writer(...)` line onward: - -```cpp - RowWriter writer(out, fmt, header, types, no_header); - bool has_next = false; - int code = common::E_OK; - long long skipped = 0, emitted = 0; - while ((code = rs->next(has_next)) == common::E_OK && has_next) { - if (skipped < offset) { - ++skipped; - continue; - } - if (limit >= 0 && emitted >= limit) break; - std::vector<std::string> cells(ncol); - std::vector<bool> nulls(ncol, false); - for (uint32_t i = 1; i <= ncol; ++i) { - if (rs->is_null(i)) { - nulls[i - 1] = true; - } else { - cells[i - 1] = cell_to_string(rs, i, types[i - 1]); - } - } - writer.write(cells, nulls); - ++emitted; - } - writer.finish(); - return code; -``` - -- [ ] **Step 2: Declare `run_row_query`** in `cpp/tools/commands/commands.h` (before the `cmd_*` declarations): - -```cpp -// Shared by head/cat/select: opens a row ResultSet (honoring --start/--end and -// --device/--table/--measurements) and writes it with client-side offset/limit.
-int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err, - long long offset, long long limit); -``` - -- [ ] **Step 3: Create `cpp/tools/commands/row_query.cc`** - -```cpp -#include <cstdint> -#include <limits> -#include <memory> -#include <string> -#include <vector> - -#include "cli/exit_codes.h" -#include "commands/commands.h" -#include "common/device_id.h" -#include "common/schema.h" -#include "format/result_set_format.h" -#include "reader/tsfile_reader.h" - -namespace tsfile_cli { - -int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err, - long long offset, long long limit) { - const int64_t start = - args.has_start ? static_cast<int64_t>(args.start) - : std::numeric_limits<int64_t>::min(); - const int64_t end = args.has_end ? static_cast<int64_t>(args.end) - : std::numeric_limits<int64_t>::max(); - - storage::ResultSet* rs = nullptr; - int qret = 0; - - if (is_table_model(args, reader)) { - std::string table_name = args.table; - if (table_name.empty()) { - auto schemas = reader.get_all_table_schemas(); - if (schemas.empty() || !schemas[0]) { - err << "Error: no table found in file\n"; - return kExitRuntime; - } - table_name = schemas[0]->get_table_name(); - } - std::vector<std::string> cols = args.measurements; - if (cols.empty()) { - auto ts = reader.get_table_schema(table_name); - if (ts) cols = ts->get_measurement_names(); - } - qret = reader.query(table_name, cols, start, end, rs); - } else { - std::vector<std::string> devices; - if (!args.device.empty()) { - devices.push_back(args.device); - } else { - for (auto& d : reader.get_all_device_ids()) { - if (d) devices.push_back(d->get_device_name()); - } - } - std::vector<std::string> paths; - for (const std::string& dev : devices) { - std::vector<std::string> ms = args.measurements; - if (ms.empty()) { - auto did = std::make_shared<storage::StringArrayDeviceID>(dev); - std::vector<storage::MeasurementSchema> sch; - if (reader.get_timeseries_schema(did, sch) == 0) { - for (auto& m : sch) ms.push_back(m.measurement_name_); - } - } - for (const std::string& m :
ms) paths.push_back(dev + "." + m); - } - if (paths.empty()) { - err << "Error: no time series found\n"; - return kExitRuntime; - } - qret = reader.query(paths, start, end, rs); - } - - if (qret != 0 || rs == nullptr) { - err << "Error: query failed (code " << qret << ")\n"; - if (rs != nullptr) reader.destroy_query_data_set(rs); - return kExitRuntime; - } - - int wret = write_result_set(rs, fmt, args.no_header, out, offset, limit); - reader.destroy_query_data_set(rs); - return wret == 0 ? kExitOk : kExitRuntime; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 4: Create `cpp/tools/commands/cmd_head.cc`** - -```cpp -#include "commands/commands.h" - -namespace tsfile_cli { -int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err) { - long long limit = args.limit < 0 ? 10 : args.limit; - return run_row_query(args, reader, fmt, out, err, args.offset, limit); -} -} // namespace tsfile_cli -``` - -- [ ] **Step 5: Create `cpp/tools/commands/cmd_cat.cc`** - -```cpp -#include "commands/commands.h" - -namespace tsfile_cli { -int cmd_cat(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err) { - return run_row_query(args, reader, fmt, out, err, args.offset, args.limit); -} -} // namespace tsfile_cli -``` - -- [ ] **Step 6: Create `cpp/tools/commands/cmd_select.cc`** - -```cpp -#include "commands/commands.h" - -namespace tsfile_cli { -int cmd_select(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err) { - return run_row_query(args, reader, fmt, out, err, args.offset, args.limit); -} -} // namespace tsfile_cli -``` - -- [ ] **Step 7: Wire dispatch** — in `cpp/tools/cli/run_cli.cc`, add three branches before the final `else`: - -```cpp - } else if (p.command == "head") { - code = cmd_head(p, reader, fmt, out, err); - } else if (p.command == "cat") { - code = cmd_cat(p, reader, fmt, out, 
err); - } else if (p.command == "select") { - code = cmd_select(p, reader, fmt, out, err); -``` - -- [ ] **Step 8: Add e2e tests** — append to `cpp/test/tools/command_e2e_test.cc` - -```cpp -namespace { -size_t count_lines(const std::string& s) { - size_t n = 0; - for (char c : s) if (c == '\n') ++n; - return n; -} -} // namespace - -TEST(CliE2E, HeadProjectsAndLimits) { - Fixture f; - std::ostringstream out, err; - int code = - tsfile_cli::run_cli({"head", "-m", "s1", "-n", "2", "-f", "tsv", f.path}, - out, err); - EXPECT_EQ(code, 0); - EXPECT_EQ(out.str(), "time\ts1\n0\t0\n1\t10\n"); -} - -TEST(CliE2E, CatReturnsAllRows) { - Fixture f; - std::ostringstream out, err; - int code = tsfile_cli::run_cli({"cat", "-m", "s1", "-f", "tsv", f.path}, out, - err); - EXPECT_EQ(code, 0); - // header + 5 data rows - EXPECT_EQ(count_lines(out.str()), 6u); - EXPECT_NE(out.str().find("time\ts1\n"), std::string::npos); -} - -TEST(CliE2E, SelectWithTimeRange) { - Fixture f; - std::ostringstream out, err; - int code = - tsfile_cli::run_cli({"select", "-m", "s1", "--start", "2", "--end", "3", - "-f", "tsv", f.path}, - out, err); - EXPECT_EQ(code, 0); - EXPECT_EQ(out.str(), "time\ts1\n2\t20\n3\t30\n"); -} - -TEST(CliE2E, SelectJsonIsNdjson) { - Fixture f; - std::ostringstream out, err; - int code = - tsfile_cli::run_cli({"select", "-m", "s1", "--start", "0", "--end", "0", - "-f", "json", f.path}, - out, err); - EXPECT_EQ(code, 0); - EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n"); -} -``` - -- [ ] **Step 9: Build and run tests** - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -8 && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*` -Expected: all CliE2E tests PASS. - -> **If a row-order or column-order assertion fails**, print the actual output -> (`./build/Debug/bin/tsfile head -m s1 -n 2 -f tsv examples/test_cpp.tsfile`) -> and align the expected string. The values (ts 0..4, s1 = ts*10) are fixed by -> the fixture; only column/row ordering could differ. 
- -- [ ] **Step 10: Commit** - -```bash -git add cpp/tools/format/result_set_format.h cpp/tools/format/result_set_format.cc cpp/tools/commands/commands.h cpp/tools/commands/row_query.cc cpp/tools/commands/cmd_head.cc cpp/tools/commands/cmd_cat.cc cpp/tools/commands/cmd_select.cc cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc -git commit -m "feat(cpp-tools): implement 'head', 'cat', and 'select' row commands" -``` - ---- - -### Task 9: stderr fix, `install()`, full-suite run, manual verification - -**Files:** -- Modify: `cpp/src/file/read_file.cc` (route open-error prints to stderr) -- Modify: `cpp/tools/CMakeLists.txt` (add `install`) - -- [ ] **Step 1: Route open errors to stderr** — in `cpp/src/file/read_file.cc`, change the two `std::cout` lines inside the `if (fd_ < 0)` block (around lines 52-55) to `std::cerr`: - -```cpp - fd_ = ::open(file_path_.c_str(), O_RDONLY); - if (fd_ < 0) { - std::cerr << "open file " << file_path << " error :" << fd_ - << std::endl; - std::cerr << "open error" << errno << " " << strerror(errno) - << std::endl; - return E_FILE_OPEN_ERR; - } -``` - -Rationale: a CLI that emits diagnostics on stdout would corrupt `tsfile cat f | jq`. Errors belong on stderr. - -- [ ] **Step 2: Run the FULL test suite** to confirm the library change causes no regression - -Run: `cd cpp && bash build.sh -t=Debug 2>&1 | tail -5 && ./build/Debug/lib/TsFile_Test 2>&1 | tail -15` -Expected: all suites PASS (existing reader/file tests + the new `RunCliTest`, `ParseArgsTest`, `RowWriterTest`, `*NameTest`, `*EscapeTest`, `CliE2E`). 
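The stderr rule holds because every command receives its data stream and its diagnostic stream as separate `std::ostream&` arguments. The same design lets the e2e tests call `run_cli` in-process with two `std::ostringstream`s. The sketch below isolates that pattern. `run_tool` and its `file_missing` flag are hypothetical, and the exit code 2 for an unopenable file comes from the `rc=2` manual check in Task 9, not from `exit_codes.h`:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for run_cli: rows go to `out`, diagnostics to `err`,
// and the return value becomes the process exit code.
int run_tool(const std::vector<std::string>& rows, bool file_missing,
             std::ostream& out, std::ostream& err) {
    if (file_missing) {
        err << "Error: cannot open file\n";  // keep the data stream clean
        return 2;  // assumed open-failure code, matching the rc=2 manual check
    }
    for (const std::string& row : rows) out << row << '\n';
    return 0;
}
```

Because only rows ever reach `out`, a consumer such as `tsfile cat f | jq` sees only data, even when a diagnostic is printed.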
- - [ ] **Step 3: Add `install()`** to the end of `cpp/tools/CMakeLists.txt` - -```cmake -install(TARGETS tsfile_cli RUNTIME DESTINATION bin) -``` - -- [ ] **Step 4: Manual verification against the bundled example** (table-model file) - -Run each and confirm behavior: - -```bash -cd cpp -BIN=./build/Debug/bin/tsfile -F=examples/test_cpp.tsfile -$BIN ls -f tsv $F # -> name / table1 -$BIN schema -f tsv $F # -> header + rows incl. s1 INT64 -$BIN stats -f tsv $F # -> count/start/end per series -$BIN head -n 3 -f tsv $F # -> header + 3 rows -$BIN cat -f csv $F | head -n 3 # -> CSV, pipe-clean (no log noise on stdout) -$BIN select -m s1 -f json $F # -> NDJSON: one {"time":..,"s1":..} per line -$BIN cat $F # -> aligned table form (stdout is a TTY) -echo "exit on missing:"; $BIN ls nope.tsfile; echo "rc=$?" # rc=2, error on stderr -``` - -Expected: data on stdout, diagnostics on stderr, exit codes per the table; the TTY run shows aligned columns while the piped run shows TSV/CSV. - -- [ ] **Step 5: Tree-model manual check (only if a tree-model `.tsfile` is available)** - -The automated e2e fixture is table-model. If you have a tree-model file (e.g. produced by `TsFileWriter::write_tree`), verify the tree branch: - -```bash -$BIN ls -f tsv <tree.tsfile> # -> device names, one per line -$BIN schema -f tsv <tree.tsfile> # -> datatype + encoding + compression filled -$BIN cat -d <device> -m <measurements> -f tsv <tree.tsfile> -``` - -If unavailable, note it in the PR description as untested-by-CI and rely on the shared `run_row_query`/formatter coverage from the table-model tests. - -- [ ] **Step 6: Format check** - -Run: `cd /Users/zhanghongyin/iotdb/tsfile && ./mvnw spotless:apply -P with-cpp 2>&1 | tail -5 && ./mvnw spotless:check -P with-cpp 2>&1 | tail -5` -Expected: clang-format applies cleanly; check passes. (Or run `clang-format -i` over `cpp/tools/**` and `cpp/test/tools/**` if invoking Maven is impractical locally.)
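Step 4 expects aligned columns when stdout is a terminal and delimiter-separated output in a pipe. Below is a minimal sketch of that decision. It assumes an explicit `-f/--format` value always wins and that the piped default is TSV; the plan itself says only "TSV/CSV". `Format`, `choose_format`, and `stdout_is_tty` are hypothetical names, not the real `resolve_format` signature. The terminal check uses POSIX `isatty`:

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // isatty, STDOUT_FILENO (POSIX)

enum class Format { kCsv, kTsv, kJson, kTable, kInvalid };

// Hypothetical format resolution: an explicit -f value always wins; without
// one, a terminal gets the aligned table and a pipe gets TSV (assumed default).
Format choose_format(const std::string& flag, bool is_tty) {
    if (flag == "csv") return Format::kCsv;
    if (flag == "tsv") return Format::kTsv;
    if (flag == "json") return Format::kJson;
    if (flag == "table") return Format::kTable;
    if (!flag.empty()) return Format::kInvalid;  // caller reports usage error
    return is_tty ? Format::kTable : Format::kTsv;
}

// The side-effecting check stays separate so choose_format is testable.
bool stdout_is_tty() { return isatty(STDOUT_FILENO) == 1; }
```

Keeping `isatty` out of `choose_format` means the decision can be unit-tested without a real terminal. This matches how the plan injects `out`/`err` instead of touching the process streams directly.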
- -- [ ] **Step 7: Commit** - -```bash -git add cpp/src/file/read_file.cc cpp/tools/CMakeLists.txt -git commit -m "feat(cpp-tools): install tsfile binary; route open errors to stderr" -``` - ---- - -## Plan self-review (spec coverage) - -| Spec requirement | Covered by | -|---|---| -| Single multi-call `tsfile` binary, git-style dispatch | Task 1 (CMake `OUTPUT_NAME tsfile`, run_cli dispatch) | -| `ls` / `schema` / `stats` / `head` / `cat` / `select` | Tasks 5 / 6 / 7 / 8 | -| Hand-rolled arg parsing, no new deps | Task 2 | -| Data→stdout, diagnostics→stderr | Injected `out`/`err` everywhere; Task 9 lib fix | -| Exit codes 0/1/2/3 | `exit_codes.h` (Task 1); mapped in run_cli (Tasks 1, 5, 8) | -| TTY-adaptive default; `--format csv/tsv/json/table` | `resolve_format` (Task 3); run_cli isatty (Task 5) | -| CSV RFC-4180 quoting; NDJSON; null handling | `csv_escape`/`RowWriter`/`json_escape` (Task 3) | -| tree/table auto-detect + `--model` override | `is_table_model` (Task 5) | -| schema blanks encoding/compression for table model | `cmd_schema` enrichment branch (Task 6) | -| Timestamps as raw epoch | `cell_to_string` INT64 path (Task 4) | -| `BUILD_TOOLS` option; `install()` | Task 1 (option), Task 9 (install) | -| Tests: cli_args, formatters, model detect, e2e | Tasks 2, 3, 5-8 | -| License headers on new files | Conventions section + every Create step | - -**Placeholder scan:** no `TBD`/`TODO`/"implement later" remain; the "filled in by later tasks" branch in run_cli is replaced concretely in Tasks 6-8. **Type consistency:** `ParsedArgs`, `OutputFormat`, `RowWriter` ctor (`out, fmt, header, types, no_header`), `write_result_set(rs, fmt, no_header, out, offset, limit)`, and the `cmd_*`/`run_row_query` signatures are used identically across all tasks. - -**Known residual risks (validated during execution, not blockers):** -1. `get_timeseries_metadata()` yielding rows for table-model files — Task 6/7 notes give a fallback. -2. 
Exact column/row ordering in row-command output — Tasks 7/8 notes give the adjust-the-string fallback; values are fixture-guaranteed. -3. Fixture compile relying on transitive includes — Task 5 note lists the explicit headers to add. diff --git a/docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md b/docs/superpowers/plans/2026-06-02-tsfile-cli.md similarity index 54% rename from docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md rename to docs/superpowers/plans/2026-06-02-tsfile-cli.md index 27877ca20..8f70a2307 100644 --- a/docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md +++ b/docs/superpowers/plans/2026-06-02-tsfile-cli.md @@ -17,246 +17,193 @@ under the License. --> -# TsFile CLI Redesign Implementation Plan +# TsFile CLI (`tsfile`) Implementation Plan > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
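The spec-coverage table above maps "CSV RFC-4180 quoting" to `csv_escape`. Below is a minimal sketch of that rule, assuming the usual reading of RFC 4180: a field is wrapped in double quotes only when it contains a comma, a double quote, CR, or LF, and each embedded double quote is doubled. `csv_escape_sketch` is a hypothetical name; the real helper's signature and its null handling are not shown in this plan:

```cpp
#include <cassert>
#include <string>

// Hypothetical RFC 4180-style field escaping for CSV output.
std::string csv_escape_sketch(const std::string& field) {
    bool needs_quotes = field.find_first_of(",\"\r\n") != std::string::npos;
    if (!needs_quotes) return field;  // plain fields pass through unchanged
    std::string quoted = "\"";
    for (char c : field) {
        if (c == '"') quoted += '"';  // an embedded quote is doubled
        quoted += c;
    }
    quoted += '"';
    return quoted;
}
```

Quoting only when needed leaves plain numeric cells such as `0` or `40` unquoted, so simple `awk -F,` pipelines keep working on typical TsFile output.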
-**Goal:** Change the current C++ `tsfile` v1 CLI from `ls/schema/stats/head/cat/select` to `ls/schema/meta/stats/head/cat/count/sample`, and make projection, time range, and limit/offset work as shared parameters of the row-output commands. +**Goal:** Starting from the current "half-migrated" working tree, finish the C++ `tsfile` CLI as a complete read-only 8-verb tool +(`ls / schema / meta / stats / head / cat / count / sample`), and remove the leftover `select` +dead code, so that the whole implementation builds, passes its tests, and is ready to commit. -**Architecture:** Keep the existing `cpp/tools/` layering: `cli/` handles argument parsing and dispatch, `commands/` reads metadata or runs row queries, and `format/` handles `RowWriter` and `ResultSet` output. A new metadata/stat helper reuses the `Statistic` formatting logic, and a new sampled result-set writer reuses the existing cell extraction, so the command layer does not duplicate row-output code. +**Architecture:** Keep the existing `cpp/tools/` layering: `cli/` handles argument parsing and dispatch, `commands/` +reads metadata or runs row queries, and `format/` handles `RowWriter` and `ResultSet` output. A new +`stat_table.*` reuses the `Statistic` formatting logic and is shared by `stats`/`count`/`meta`; a new +sampled result-set writer reuses the existing cell extraction. The storage engine is not modified. -**Tech Stack:** C++11/C++14-compatible code, CMake `BUILD_TOOLS`, Google Test, and the existing `storage::TsFileReader`, `storage::Statistic`, `RowWriter`, and `write_result_set`. +**Tech Stack:** C++11/C++14-compatible code (the test target uses `-std=c++14`), CMake `BUILD_TOOLS`, +Google Test 1.12.1, and the existing `storage::TsFileReader`, `storage::Statistic`, `RowWriter`, and +`write_result_set`. 
+ +**Spec:** `docs/superpowers/specs/2026-06-02-tsfile-cli-design.md` --- ## Prerequisites -- Working directory: `/Users/zhanghongyin/iotdb/tsfile` -- Before implementing, check `git status --short` first; do not stage `.codegraph/` or changes unrelated to this plan. +- Working directory: `/Users/zhanghongyin/iotdb/tsfile`. +- Before starting, run `git status --short` and confirm that `.codegraph/` and unrelated changes are not staged. +- Every new `.h`/`.cc` file starts with the Apache 2.0 block-comment header (`/* ... */`), copied verbatim from any existing + `cpp/tools/**` file. The code blocks below omit that header for brevity; **always prepend it when creating a file**. +- All CLI code lives in `namespace tsfile_cli`. - Run the C++ verification commands from the `cpp/` directory: ```bash bash build.sh -t=Debug -./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:StatTableTest.*:ResultSetSampleTest.* +./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.*:StatTableTest.* ``` -## File structure +## Starting point: current working-tree state (2026-06-02) + +- **Committed** (commit `a392a56f`, four files only): `cpp/tools/cli/cli_args.h`, + `cli_args.cc`, `run_cli.cc`, and `cpp/test/tools/cli_args_test.cc`. The command surface already has the 8 verbs, + including `--seed` parsing and `validate_command_flags`; `select` is not on the allowlist; `meta`/`count`/ + `sample` are intercepted by `is_unimplemented_command`, which returns "command not implemented yet". +- **Uncommitted (untracked)**: `cpp/tools/CMakeLists.txt`, `tools_main.cc`, + `cli/exit_codes.h`, `cli/run_cli.h`, `commands/` (`commands.h`, `row_query.cc`, + `cmd_ls/cmd_schema/cmd_stats/cmd_head/cmd_cat/cmd_select.cc`), `format/` + (`output_format.*`, `result_set_format.*`), `cpp/test/tools/cli_test_util.h`, + `command_e2e_test.cc`, and `output_format_test.cc`. 
+- **Modified (tracked)**: `cpp/CMakeLists.txt` (`BUILD_TOOLS`), + `cpp/src/file/read_file.cc` (open errors go to stderr), and `cpp/test/CMakeLists.txt` + (globs the tools tests and links `tsfile_cli_obj`). +- **Leftover inconsistencies (fixed in Task 1)**: `cmd_select.cc` and its declaration are dead code; + `command_e2e_test.cc` still tests the `select` command, which conflicts with the command surface that no longer has `select`. -Existing files keep their responsibilities: +## File structure -- `cpp/tools/cli/cli_args.h` / `cpp/tools/cli/cli_args.cc`: parse commands, flags, and numeric arguments. -- `cpp/tools/cli/run_cli.cc`: top-level usage, command allowlist, command/flag combination checks, reader open, and dispatch. -- `cpp/tools/commands/commands.h`: declarations of the command functions and shared helpers. -- `cpp/tools/commands/row_query.cc`: query construction shared by `head`, `cat`, and `sample`. -- `cpp/tools/format/output_format.*`: `RowWriter` and scalar format conversion. -- `cpp/tools/format/result_set_format.*`: extract rows from a `ResultSet` and write them out. -- `cpp/test/tools/*_test.cc`: CLI unit tests and in-process E2E tests.
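The file list above includes a test for the sampling writer's deterministic behavior, and `sample` takes `--seed`, but the plan does not name the sampling algorithm. One plausible choice is reservoir sampling (Algorithm R) driven by a seeded `std::mt19937_64`, which keeps at most `k` rows in a single pass and returns the same sample for the same seed and input. The sketch assumes that approach. `reservoir_sample` is a hypothetical name, and it samples plain strings rather than `ResultSet` rows:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch: keep a uniform random sample of up to k rows in one
// pass (Algorithm R). The same seed and input give the same sample.
std::vector<std::string> reservoir_sample(const std::vector<std::string>& rows,
                                          std::size_t k, std::uint64_t seed) {
    std::vector<std::string> reservoir;
    if (k == 0) return reservoir;
    std::mt19937_64 rng(seed);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (reservoir.size() < k) {
            reservoir.push_back(rows[i]);  // fill the reservoir first
            continue;
        }
        // Row i replaces a kept row with probability k / (i + 1).
        std::uniform_int_distribution<std::size_t> pick(0, i);
        std::size_t j = pick(rng);
        if (j < k) reservoir[j] = rows[i];
    }
    return reservoir;
}
```

Two caveats apply. First, `std::uniform_int_distribution` is not guaranteed to produce identical sequences across standard-library implementations, so "same seed, same sample" holds per toolchain. Second, the reservoir is not in time order, so a writer that must emit rows sorted by time would need to record each kept row's position and sort before writing.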
+Files whose responsibilities are unchanged: `cli/cli_args.*`, `cli/run_cli.cc`, `commands/commands.h`, +`commands/row_query.cc`, `format/output_format.*`, `format/result_set_format.*`. New files: -- `cpp/tools/commands/stat_table.h`: defines `SeriesStatRow` and `FileSummary`, and declares the metadata/stat collection and statistic-value formatting helpers. -- `cpp/tools/commands/stat_table.cc`: implements `collect_series_stats`, `collect_file_summary`, and `statistic_value_cells`, shared by `stats`, `count`, and `meta`. -- `cpp/tools/commands/cmd_meta.cc`: implements `tsfile meta`. -- `cpp/tools/commands/cmd_count.cc`: implements `tsfile count`. -- `cpp/tools/commands/cmd_sample.cc`: implements `tsfile sample`. -- `cpp/test/tools/stat_table_test.cc`: directly tests the stable behavior of `Statistic` value formatting and the summary helpers. -- `cpp/test/tools/result_set_sample_test.cc`: tests the deterministic behavior of the sampling writer. +- `cpp/tools/commands/stat_table.h` / `.cc`: `SeriesStatRow`, `FileSummary`, + `StatisticCells`, plus `collect_series_stats`, `collect_file_summary`, and + `statistic_value_cells`, shared by `stats`/`count`/`meta`. +- `cpp/tools/commands/cmd_meta.cc`, `cmd_count.cc`, `cmd_sample.cc`. +- `cpp/test/tools/stat_table_test.cc`: unit tests for the statistic-value formatting helpers. Deleted files: -- `cpp/tools/commands/cmd_select.cc`: the `select` capability is folded into the shared parameters of `cat/head/sample`. +- `cpp/tools/commands/cmd_select.cc`: the `select` capability is now covered by the shared `cat/head/sample` parameters. 
--- -### Task 1: Command surface, argument parsing, and flag-combination validation +### Task 1: Reconcile the baseline: remove the `select` dead code, get the build green, and commit the existing implementation **Files:** -- Modify: `cpp/tools/cli/cli_args.h` -- Modify: `cpp/tools/cli/cli_args.cc` -- Modify: `cpp/tools/cli/run_cli.cc` +- Delete: `cpp/tools/commands/cmd_select.cc` +- Modify: `cpp/tools/commands/commands.h` +- Modify: `cpp/test/tools/command_e2e_test.cc` - Modify: `cpp/test/tools/cli_args_test.cc` -- [ ] **Step 1: Write failing tests covering `--seed`, the new commands, and the removal of `select`** - -Append to the end of `cpp/test/tools/cli_args_test.cc`: - -```cpp -TEST(ParseArgsTest, SeedFlagParsed) { - auto p = tsfile_cli::parse_args( - {"sample", "-m", "s1", "-n", "3", "--seed", "42", "data.tsfile"}); - EXPECT_TRUE(p.error.empty()); - EXPECT_EQ(p.command, "sample"); - EXPECT_EQ(p.limit, 3); - EXPECT_TRUE(p.has_seed); - EXPECT_EQ(p.seed, 42); -} +This task adds no new features. It only brings the untracked implementation in line with the committed 8-verb command surface and commits the whole implementation as the +working baseline. 
-TEST(ParseArgsTest, BadSeedValueIsError) { - auto p = tsfile_cli::parse_args( - {"sample", "--seed", "not_a_number", "data.tsfile"}); - EXPECT_FALSE(p.error.empty()); - EXPECT_NE(p.error.find("Invalid --seed"), std::string::npos); -} - -TEST(RunCliTest, SelectIsNoLongerKnownCommand) { - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli({"select", "x.tsfile"}, out, err); - EXPECT_EQ(code, 1); - EXPECT_NE(err.str().find("Unknown command"), std::string::npos); -} - -TEST(RunCliTest, SeedOnCatIsUsageError) { - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli( - {"cat", "--seed", "7", "x.tsfile"}, out, err); - EXPECT_EQ(code, 1); - EXPECT_NE(err.str().find("--seed is only valid for sample"), - std::string::npos); -} - -TEST(RunCliTest, OffsetOnSampleIsUsageError) { - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli( - {"sample", "--offset", "2", "x.tsfile"}, out, err); - EXPECT_EQ(code, 1); - EXPECT_NE(err.str().find("--offset is not valid for sample"), - std::string::npos); -} -``` - -- [ ] **Step 2: Run the tests and confirm they fail** - -Run: +- [ ] **Step 1: Delete `cmd_select.cc`** ```bash -cd cpp && ./build/Debug/lib/TsFile_Test --gtest_filter=ParseArgsTest.SeedFlagParsed:ParseArgsTest.BadSeedValueIsError:RunCliTest.SelectIsNoLongerKnownCommand:RunCliTest.SeedOnCatIsUsageError:RunCliTest.OffsetOnSampleIsUsageError +rm cpp/tools/commands/cmd_select.cc ``` -Expected: compilation or tests fail, at minimum because `ParsedArgs` has no `seed` / `has_seed`, or because `select` is still a known command. 
- -- [ ] **Step 3: Add the seed fields to `ParsedArgs`** +- [ ] **Step 2: Remove the `cmd_select` declaration from `commands.h`** -Inside `ParsedArgs` in `cpp/tools/cli/cli_args.h`, add the following after `has_end`: +Delete this snippet from `cpp/tools/commands/commands.h`: ```cpp - long long seed = 0; - bool has_seed = false; +int cmd_select(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); ``` -- [ ] **Step 4: Parse `--seed`** +- [ ] **Step 3: Rewrite the `select` E2E tests as `cat` tests** -In the `parse_args` loop in `cpp/tools/cli/cli_args.cc`, place the following branch after the `--end` branch and before the `--model` branch: +In `cpp/test/tools/command_e2e_test.cc`, change `SelectWithTimeRange` to: ```cpp - } else if (a == "--seed") { - if (!need_value(a, val)) { - return p; - } - if (!parse_ll(val, p.seed)) { - p.error = "Invalid --seed: " + val; - return p; - } - p.has_seed = true; +TEST(CliE2E, CatWithTimeRange) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "2", "--end", + "3", "-f", "tsv", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "time\ts1\n2\t20\n3\t30\n"); +} ``` -- [ ] **Step 5: Update the usage text, allowlist, and flag-combination checks in `run_cli.cc`** - -In `cpp/tools/cli/run_cli.cc`: - -1. Replace the Commands section of the usage text with: +Change `SelectJsonIsNdjson` to: ```cpp - " ls list devices (tree) or tables (table)\n" - " schema per-measurement data type/encoding/compression\n" - " meta file-level summary without data-page scans\n" - " stats per-series statistics\n" - " head first N rows (default 10, use -n)\n" - " cat matching rows of a device/table\n" - " count per-series row counts from statistics\n" - " sample sampled rows (default 10, use -n and --seed)\n" +TEST(CliE2E, CatJsonIsNdjson) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "0", "--end", + "0", "-f", "json", f.path}, + out, err); + EXPECT_EQ(code, 0); + EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n"); +} ``` -2. Replace the Options section with: +- [ ] **Step 4: Fix the outdated command name in a parsing test** -```cpp - "Options: -f/--format csv|tsv|json|table, -d/--device, -t/--table,\n" - " -m/--measurements a,b, -n/--limit, --offset, --start,\n" - " --end, --seed, --no-header, --model tree|table,\n" - " -h/--help, --version\n"; -``` - -3. Replace the set in `is_known_command` with: +In `MeasurementsSplitOnComma` in `cpp/test/tools/cli_args_test.cc`, change the command from +`select` to `cat` (cosmetic only; `parse_args` does not validate the command name): ```cpp - static const std::set<std::string> kCmds = { - "ls", "schema", "meta", "stats", - "head", "cat", "count", "sample"}; +TEST(ParseArgsTest, MeasurementsSplitOnComma) { + auto p = tsfile_cli::parse_args({"cat", "-m", "s1,s2,s3", "data.tsfile"}); + ASSERT_EQ(p.measurements.size(), 3u); + EXPECT_EQ(p.measurements[1], "s2"); +} ``` -4. 
Add to the anonymous namespace: +- [ ] **Step 5: Build and run the CLI tests, and confirm the baseline is all green** -```cpp -bool validate_command_flags(const ParsedArgs& p, std::ostream& err) { - if (p.has_seed && p.command != "sample") { - err << "Error: --seed is only valid for sample\n"; - return false; - } - if (p.command == "sample" && p.offset != 0) { - err << "Error: --offset is not valid for sample\n"; - return false; - } - if (!p.device.empty() && !p.table.empty()) { - err << "Error: use either --device or --table, not both\n"; - return false; - } - if (p.limit < -1) { - err << "Error: --limit must be >= 0\n"; - return false; - } - if (p.offset < 0) { - err << "Error: --offset must be >= 0\n"; - return false; - } - if (p.has_start && p.has_end && p.start > p.end) { - err << "Error: --start must be <= --end\n"; - return false; - } - return true; -} +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.* ``` -5. After the `if (p.file.empty())` check and before `storage::libtsfile_init();`, add: +Expected: the build succeeds and all selected tests pass, including `RunCliTest.SelectIsNoLongerKnownCommand` and +`RunCliTest.NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen` (`meta`/ +`count`/`sample` are still stubs at this point). 
-```cpp - if (!validate_command_flags(p, err)) { - print_usage(err); - return kExitUsage; - } -``` +> If an existing assertion such as `CliE2E.SchemaTableMeasurementFilterOnlyShowsRequestedColumn` fails on string +> details, first print the actual output with `./build/Debug/bin/tsfile <command> -f tsv <file>` and then align the assertion; +> the fixture values (ts 0..4, s1=ts*10) are fixed. -- [ ] **Step 6: Run the tests and confirm they pass** +- [ ] **Step 6: Manually confirm that `select` is no longer available and the help no longer mentions it** Run: ```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=ParseArgsTest.*:RunCliTest.* +cd cpp && ./build/Debug/bin/tsfile --help | grep -i select; echo "rc=$?" ``` -Expected: build succeeds; selected tests pass. +Expected: no output and `rc=1` (grep finds no match); the help lists `ls schema meta stats head cat +count sample`. 
-- [ ] **Step 7: Commit** +- [ ] **Step 7: Commit the working baseline** ```bash -git add cpp/tools/cli/cli_args.h cpp/tools/cli/cli_args.cc cpp/tools/cli/run_cli.cc cpp/test/tools/cli_args_test.cc -git commit -m "Update tsfile CLI command surface" +git add cpp/CMakeLists.txt cpp/test/CMakeLists.txt cpp/src/file/read_file.cc \ + cpp/tools/CMakeLists.txt cpp/tools/tools_main.cc \ + cpp/tools/cli/exit_codes.h cpp/tools/cli/run_cli.h \ + cpp/tools/commands cpp/tools/format \ + cpp/test/tools/cli_test_util.h cpp/test/tools/command_e2e_test.cc \ + cpp/test/tools/output_format_test.cc cpp/test/tools/cli_args_test.cc +git commit -m "Add tsfile CLI ls/schema/stats/head/cat implementation and tests" ``` +> Note: `git add cpp/tools/commands` records the removed `cmd_select.cc` as a deletion. Before committing, run +> `git status --short` and confirm that `.codegraph/` is not included. + --- -### Task 2: Statistic helpers and extended `stats` fields +### Task 2: Statistic helpers; extend `stats` to min/max/first/last/sum **Files:** - Create: `cpp/tools/commands/stat_table.h` @@ -265,30 +212,9 @@ git commit -m "Update tsfile CLI command surface" - Create: 
`cpp/test/tools/stat_table_test.cc` - Modify: `cpp/test/tools/command_e2e_test.cc` -- [ ] **Step 1: Write failing tests that directly cover statistic-value formatting** - -Add `cpp/test/tools/stat_table_test.cc`: +- [ ] **Step 1: Write failing tests that directly cover statistic-value formatting** — `cpp/test/tools/stat_table_test.cc` ```cpp -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - #include "commands/stat_table.h" #include <gtest/gtest.h> @@ -305,7 +231,8 @@ TEST(StatTableTest, Int64StatisticCellsContainValueSummaries) { EXPECT_EQ(cells.values[2], "10"); EXPECT_EQ(cells.values[3], "30"); EXPECT_EQ(cells.values[4], "40"); - EXPECT_EQ(cells.is_null, std::vector<bool>({false, false, false, false, false})); + EXPECT_EQ(cells.is_null, + std::vector<bool>({false, false, false, false, false})); } TEST(StatTableTest, BooleanStatisticLeavesMinMaxNull) { @@ -326,35 +253,14 @@ TEST(StatTableTest, BooleanStatisticLeavesMinMaxNull) { Run: ```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=StatTableTest.* +cd cpp && bash build.sh -t=Debug ``` -Expected: build fails because `commands/stat_table.h` does not exist.
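The `StatTableTest` cases above check five value cells (min, max, first, last, sum, in the column order `cmd_stats` uses) and check that BOOLEAN leaves min/max null. To make the column semantics concrete, the sketch below derives the same summaries from raw `(time, value)` points. It is not a model of how `storage::Statistic` computes them. `Point`, `SeriesSummary`, and `summarize` are hypothetical names, and integer values are assumed:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Point { std::int64_t time; std::int64_t value; };

// Hypothetical per-series summary matching the stats columns
// count, start_time, end_time, min, max, first, last, sum.
struct SeriesSummary {
    std::int64_t count = 0, start_time = 0, end_time = 0;
    std::int64_t min = 0, max = 0, first = 0, last = 0, sum = 0;
};

// Assumes points are sorted by time, as rows within one series are.
SeriesSummary summarize(const std::vector<Point>& points) {
    SeriesSummary s;
    if (points.empty()) return s;  // count == 0: no range or values to report
    s.count = static_cast<std::int64_t>(points.size());
    s.start_time = points.front().time;
    s.end_time = points.back().time;
    s.first = points.front().value;
    s.last = points.back().value;
    s.min = s.max = points.front().value;
    for (const Point& p : points) {
        if (p.value < s.min) s.min = p.value;
        if (p.value > s.max) s.max = p.value;
        s.sum += p.value;
    }
    return s;
}
```

With the fixture (timestamps 0..4, `s1 = ts * 10`), this yields count 5, time range 0..4, min 0, max 40, first 0, last 40, and sum 100. These match the tail of the expected `stats` row `s1\t5\t0\t4\t0\t40\t0\t40\t100`.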
+Expected: the build fails because `commands/stat_table.h` does not exist. -- [ ] **Step 3: Create `stat_table.h`** - -Add `cpp/tools/commands/stat_table.h`: +- [ ] **Step 3: Create `cpp/tools/commands/stat_table.h`** (prepend the license header) ```cpp -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License.
- */ - - #ifndef TSFILE_CLI_STAT_TABLE_H #define TSFILE_CLI_STAT_TABLE_H @@ -397,8 +303,8 @@ struct FileSummary { }; StatisticCells statistic_value_cells(storage::Statistic* st); -std::vector<SeriesStatRow> collect_series_stats( - const ParsedArgs& args, storage::TsFileReader& reader); +std::vector<SeriesStatRow> collect_series_stats(const ParsedArgs& args, + storage::TsFileReader& reader); FileSummary collect_file_summary(const ParsedArgs& args, storage::TsFileReader& reader); @@ -407,9 +313,7 @@ FileSummary collect_file_summary(const ParsedArgs& args, #endif // TSFILE_CLI_STAT_TABLE_H ``` -- [ ] **Step 4: Create `stat_table.cc`** - -Add `cpp/tools/commands/stat_table.cc`; its core implementation is: +- [ ] **Step 4: Create `cpp/tools/commands/stat_table.cc`** (prepend the license header) ```cpp #include "commands/stat_table.h" @@ -530,8 +434,8 @@ StatisticCells statistic_value_cells(storage::Statistic* st) { return cells; } -std::vector<SeriesStatRow> collect_series_stats( - const ParsedArgs& args, storage::TsFileReader& reader) { +std::vector<SeriesStatRow> collect_series_stats(const ParsedArgs& args, + storage::TsFileReader& reader) { std::vector<SeriesStatRow> rows; storage::DeviceTimeseriesMetadataMap meta = reader.get_timeseries_metadata(); @@ -611,23 +515,34 @@ FileSummary collect_file_summary(const ParsedArgs& args, } // namespace tsfile_cli ``` -- [ ] **Step 5: Rewrite `cmd_stats.cc` using the helpers** +> **Compile-risk note**: if the build fails, check the references above to `storage::Statistic` subclass fields (`min_value_`, `max_value_`, +> `first_value_`, `last_value_`, `sum_value_`, `start_time_`, `end_time_`) and accessors +> (`get_count()`, `get_type()`, `get_statistic()`) against +> `cpp/src/common/statistic.h` and correct the names; do not change the test expectations. 
+ +- [ ] **Step 5: Rewrite `cmd_stats.cc` with the helpers to output 10 columns** -Change the command body in `cpp/tools/commands/cmd_stats.cc` to output 10 columns: +Replace the entire command body of `cpp/tools/commands/cmd_stats.cc` with: ```cpp +#include <string> +#include <vector> + +#include "cli/exit_codes.h" +#include "commands/commands.h" #include "commands/stat_table.h" +namespace tsfile_cli { + int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { - RowWriter w( 
RowWriter w( - out, fmt, - {"target", "measurement", "count", "start_time", "end_time", - "min", "max", "first", "last", "sum"}, - {common::STRING, common::STRING, common::INT64, common::INT64, - common::INT64, common::STRING, common::STRING, common::STRING, - common::STRING, common::STRING}, - args.no_header); + RowWriter w(out, fmt, + {"target", "measurement", "count", "start_time", "end_time", + "min", "max", "first", "last", "sum"}, + {common::STRING, common::STRING, common::INT64, common::INT64, + common::INT64, common::STRING, common::STRING, common::STRING, + common::STRING, common::STRING}, + args.no_header); std::vector rows = collect_series_stats(args, reader); for (const SeriesStatRow& row : rows) { @@ -637,8 +552,8 @@ int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, cells.insert(cells.end(), row.value_cells.values.begin(), row.value_cells.values.end()); - std::vector nulls = {false, false, false, - row.count == 0, row.count == 0}; + std::vector nulls = {false, false, false, row.count == 0, + row.count == 0}; nulls.insert(nulls.end(), row.value_cells.is_null.begin(), row.value_cells.is_null.end()); w.write(cells, nulls); @@ -646,21 +561,24 @@ int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, w.finish(); return kExitOk; } + +} // namespace tsfile_cli ``` -- [ ] **Step 6: 更新 E2E 断言新 stats 表头和值** +- [ ] **Step 6: 更新 `stats` E2E 断言表头与值** -在 `cpp/test/tools/command_e2e_test.cc` 中,将 `StatsReportsCountAndTimeRange` 的表头断言替换为: +在 `cpp/test/tools/command_e2e_test.cc` 中,把 `StatsReportsCountAndTimeRange` 的两条 +`EXPECT_NE` 替换为: ```cpp - EXPECT_NE(out.str().find( - "target\tmeasurement\tcount\tstart_time\tend_time\tmin\tmax\tfirst\tlast\tsum"), + EXPECT_NE(out.str().find("target\tmeasurement\tcount\tstart_time\tend_" + "time\tmin\tmax\tfirst\tlast\tsum"), std::string::npos); EXPECT_NE(out.str().find("s1\t5\t0\t4\t0\t40\t0\t40\t100"), std::string::npos); ``` -- [ ] **Step 7: 运行测试确认通过** +- [ ] **Step 7: 构建并运行测试确认通过** Run: @@ -668,25 
+586,27 @@ Run: cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=StatTableTest.*:CliE2E.StatsReportsCountAndTimeRange ``` -Expected: build succeeds; selected tests pass. +Expected: the build succeeds; the selected tests pass. - [ ] **Step 8: Commit** ```bash -git add cpp/tools/commands/stat_table.h cpp/tools/commands/stat_table.cc cpp/tools/commands/cmd_stats.cc cpp/test/tools/stat_table_test.cc cpp/test/tools/command_e2e_test.cc -git commit -m "Add tsfile CLI statistic helpers" +git add cpp/tools/commands/stat_table.h cpp/tools/commands/stat_table.cc \ + cpp/tools/commands/cmd_stats.cc cpp/test/tools/stat_table_test.cc \ + cpp/test/tools/command_e2e_test.cc +git commit -m "Extend tsfile stats with value summaries and shared stat helpers" ``` --- -### Task 3: Implement `meta` and `count` +### Task 3: Implement `meta` **Files:** - Create: `cpp/tools/commands/cmd_meta.cc` -- Create: `cpp/tools/commands/cmd_count.cc` - Modify: `cpp/tools/commands/commands.h` - Modify: `cpp/tools/cli/run_cli.cc` - Modify: `cpp/test/tools/command_e2e_test.cc` +- Modify: `cpp/test/tools/cli_args_test.cc` - [ ] **Step 1: Write the failing E2E tests** @@ -700,71 +620,46 @@ TEST(CliE2E, MetaReportsFileSummary) { int code = tsfile_cli::run_cli({"meta", "-f", "tsv", f.path}, out, err); EXPECT_EQ(code, 0); EXPECT_TRUE(err.str().empty()); - EXPECT_NE(out.str().find( - "file\tmodel\tversion\tdevice_count\ttable_count\tseries_count\tstart_time\tend_time\tbloom_filter\tfile_size_bytes"), + EXPECT_NE(out.str().find("file\tmodel\tversion\tdevice_count\ttable_" + "count\tseries_count\tstart_time\tend_time\tbloom_" + "filter\tfile_size_bytes"), std::string::npos); EXPECT_NE(out.str().find("\ttable\t"), std::string::npos); - EXPECT_NE(out.str().find("\t1\t"), std::string::npos); } +``` -TEST(CliE2E, CountReportsSeriesCountsAndTotal) { - Fixture f; - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli({"count", "-f", "tsv", f.path}, out, err); - EXPECT_EQ(code, 0); - EXPECT_TRUE(err.str().empty()); - 
EXPECT_NE(out.str().find("target\tmeasurement\tcount"), std::string::npos); - EXPECT_NE(out.str().find("table1\ts1\t5"), std::string::npos); - EXPECT_NE(out.str().find("total\t\t"), std::string::npos); -} +- [ ] **Step 2: 把 `meta` 从「未实现」集合移除** + +在 `cpp/test/tools/cli_args_test.cc` 的 +`NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen` 中,把循环范围从 +`{"meta", "count", "sample"}` 改为: + +```cpp + for (const char* command : {"count", "sample"}) { ``` -- [ ] **Step 2: 运行测试确认失败** +- [ ] **Step 3: 运行测试确认失败** Run: ```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.MetaReportsFileSummary:CliE2E.CountReportsSeriesCountsAndTotal +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.MetaReportsFileSummary ``` -Expected: build or tests fail because `meta` and `count` are not dispatched. +Expected: 测试失败——`meta` 仍被 `is_unimplemented_command` 拦截,返回退出码 1。 -- [ ] **Step 3: 更新命令声明** +- [ ] **Step 4: 声明 `cmd_meta`** -在 `cpp/tools/commands/commands.h` 中,删除 `cmd_select` 声明,并在 `cmd_schema` 与 `cmd_stats` 附近加入: +在 `cpp/tools/commands/commands.h` 的 `cmd_schema` 声明之后加入: ```cpp int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); -int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); ``` -- [ ] **Step 4: 新增 `cmd_meta.cc`** - -创建 `cpp/tools/commands/cmd_meta.cc`: +- [ ] **Step 5: 创建 `cpp/tools/commands/cmd_meta.cc`**(前置 license 头) ```cpp -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - #include "commands/commands.h" #include "cli/exit_codes.h" @@ -775,29 +670,23 @@ namespace tsfile_cli { int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { - RowWriter w( - out, fmt, - {"file", "model", "version", "device_count", "table_count", - "series_count", "start_time", "end_time", "bloom_filter", - "file_size_bytes"}, - {common::STRING, common::STRING, common::STRING, common::INT64, - common::INT64, common::INT64, common::INT64, common::INT64, - common::STRING, common::INT64}, - args.no_header); + RowWriter w(out, fmt, + {"file", "model", "version", "device_count", "table_count", + "series_count", "start_time", "end_time", "bloom_filter", + "file_size_bytes"}, + {common::STRING, common::STRING, common::STRING, common::INT64, + common::INT64, common::INT64, common::INT64, common::INT64, + common::STRING, common::INT64}, + args.no_header); FileSummary s = collect_file_summary(args, reader); - w.write({s.file, - s.model, - "", - std::to_string(s.device_count), - std::to_string(s.table_count), - std::to_string(s.series_count), + w.write({s.file, s.model, "", std::to_string(s.device_count), + std::to_string(s.table_count), std::to_string(s.series_count), s.has_time_range ? std::to_string(s.start_time) : "", - s.has_time_range ? std::to_string(s.end_time) : "", - "", + s.has_time_range ? 
std::to_string(s.end_time) : "", "", std::to_string(s.file_size_bytes)}, - {false, false, true, false, false, false, - !s.has_time_range, !s.has_time_range, true, false}); + {false, false, true, false, false, false, !s.has_time_range, + !s.has_time_range, true, false}); w.finish(); return kExitOk; } @@ -805,30 +694,102 @@ int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, } // namespace tsfile_cli ``` -- [ ] **Step 5: 新增 `cmd_count.cc`** +- [ ] **Step 6: 在 `run_cli.cc` 中放开 `meta` 并加入分发** + +在 `cpp/tools/cli/run_cli.cc` 中: + +1. 把 `is_unimplemented_command` 的集合改为: + +```cpp + static const std::set kCmds = {"count", "sample"}; +``` + +2. 在分发链的 `cmd_schema` 分支之后、`cmd_stats` 分支之前插入: + +```cpp + } else if (p.command == "meta") { + code = cmd_meta(p, reader, fmt, out, err); +``` + +- [ ] **Step 7: 构建并运行测试确认通过** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.MetaReportsFileSummary:RunCliTest.NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen +``` + +Expected: 构建成功;两个测试通过。 + +- [ ] **Step 8: 提交** + +```bash +git add cpp/tools/commands/cmd_meta.cc cpp/tools/commands/commands.h \ + cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc \ + cpp/test/tools/cli_args_test.cc +git commit -m "Add tsfile meta command" +``` + +--- + +### Task 4: 实现 `count` + +**Files:** +- Create: `cpp/tools/commands/cmd_count.cc` +- Modify: `cpp/tools/commands/commands.h` +- Modify: `cpp/tools/cli/run_cli.cc` +- Modify: `cpp/test/tools/command_e2e_test.cc` +- Modify: `cpp/test/tools/cli_args_test.cc` + +- [ ] **Step 1: 写失败 E2E 测试** -创建 `cpp/tools/commands/cmd_count.cc`: +在 `cpp/test/tools/command_e2e_test.cc` 末尾追加: ```cpp -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. 
The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * License); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ +TEST(CliE2E, CountReportsSeriesCountsAndTotal) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"count", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_TRUE(err.str().empty()); + EXPECT_NE(out.str().find("target\tmeasurement\tcount"), std::string::npos); + EXPECT_NE(out.str().find("\ts1\t5"), std::string::npos); + EXPECT_NE(out.str().find("total\t\t"), std::string::npos); +} +``` + +- [ ] **Step 2: 把 `count` 从「未实现」集合移除** + +在 `cpp/test/tools/cli_args_test.cc` 的 +`NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen` 中,把循环范围改为: +```cpp + for (const char* command : {"sample"}) { +``` + +- [ ] **Step 3: 运行测试确认失败** + +Run: + +```bash +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.CountReportsSeriesCountsAndTotal +``` + +Expected: 测试失败——`count` 仍被拦截。 + +- [ ] **Step 4: 声明 `cmd_count`** + +在 `cpp/tools/commands/commands.h` 的 `cmd_meta` 声明之后加入: + +```cpp +int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); +``` + +- [ ] **Step 5: 创建 `cpp/tools/commands/cmd_count.cc`**(前置 license 头) + +```cpp #include "commands/commands.h" #include "cli/exit_codes.h" @@ -858,46 +819,45 @@ int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, } // namespace tsfile_cli ``` -- [ ] **Step 6: 更新分发** +- [ ] 
**Step 6: 在 `run_cli.cc` 中放开 `count` 并加入分发** + +在 `cpp/tools/cli/run_cli.cc` 中: -在 `cpp/tools/cli/run_cli.cc` 的命令分发链中: +1. 把 `is_unimplemented_command` 的集合改为: ```cpp - } else if (p.command == "schema") { - code = cmd_schema(p, reader, fmt, out, err); - } else if (p.command == "meta") { - code = cmd_meta(p, reader, fmt, out, err); - } else if (p.command == "stats") { - code = cmd_stats(p, reader, fmt, out, err); + static const std::set kCmds = {"sample"}; ``` -并在 `cat` 后加入: +2. 在分发链的 `cmd_cat` 分支之后插入: ```cpp } else if (p.command == "count") { code = cmd_count(p, reader, fmt, out, err); ``` -- [ ] **Step 7: 运行测试确认通过** +- [ ] **Step 7: 构建并运行测试确认通过** Run: ```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.MetaReportsFileSummary:CliE2E.CountReportsSeriesCountsAndTotal +cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.CountReportsSeriesCountsAndTotal:RunCliTest.NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen ``` -Expected: build succeeds; selected tests pass. 
+Expected: the build succeeds; both tests pass.

 - [ ] **Step 8: Commit**

 ```bash
-git add cpp/tools/commands/cmd_meta.cc cpp/tools/commands/cmd_count.cc cpp/tools/commands/commands.h cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc
-git commit -m "Add tsfile meta and count commands"
+git add cpp/tools/commands/cmd_count.cc cpp/tools/commands/commands.h \
+        cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc \
+        cpp/test/tools/cli_args_test.cc
+git commit -m "Add tsfile count command"
 ```

 ---

-### Task 4: Implement deterministic `sample`
+### Task 5: Implement deterministic `sample` and remove the "unimplemented" guard entirely

 **Files:**
 - Modify: `cpp/tools/format/result_set_format.h`
@@ -905,8 +865,8 @@ git commit -m "Add tsfile meta and count commands"
 - Create: `cpp/tools/commands/cmd_sample.cc`
 - Modify: `cpp/tools/commands/commands.h`
 - Modify: `cpp/tools/cli/run_cli.cc`
-- Create: `cpp/test/tools/result_set_sample_test.cc`
 - Modify: `cpp/test/tools/command_e2e_test.cc`
+- Modify: `cpp/test/tools/cli_args_test.cc`

 - [ ] **Step 1: Write the failing E2E test**

@@ -920,12 +880,12 @@ TEST(CliE2E, SampleIsReproducibleWithSeed) {
     std::ostringstream out2;
     std::ostringstream err2;

-    int code1 = tsfile_cli::run_cli({"sample", "-m", "s1", "-n", "3",
-                                     "--seed", "7", "-f", "tsv", f.path},
-                                    out1, err1);
-    int code2 = tsfile_cli::run_cli({"sample", "-m", "s1", "-n", "3",
-                                     "--seed", "7", "-f", "tsv", f.path},
-                                    out2, err2);
+    int code1 = tsfile_cli::run_cli(
+        {"sample", "-m", "s1", "-n", "3", "--seed", "7", "-f", "tsv", f.path},
+        out1, err1);
+    int code2 = tsfile_cli::run_cli(
+        {"sample", "-m", "s1", "-n", "3", "--seed", "7", "-f", "tsv", f.path},
+        out2, err2);

     EXPECT_EQ(code1, 0);
     EXPECT_EQ(code2, 0);
@@ -937,7 +897,12 @@ TEST(CliE2E, SampleIsReproducibleWithSeed) {
 }
 ```

-- [ ] **Step 2: Run the test to confirm it fails**
+- [ ] **Step 2: Delete the "unimplemented" test from `cli_args_test.cc`**
+
+Because `meta`, `count`, and `sample` will all be implemented, delete the whole
+`TEST(RunCliTest, NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen) { ... }` from `cpp/test/tools/cli_args_test.cc`.
+
+- [ ] **Step 3: Run the test to confirm it fails**

 Run:

@@ -945,27 +910,27 @@ Run:
 cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.SampleIsReproducibleWithSeed
 ```

-Expected: build or test fails because `sample` is not dispatched.
+Expected: the test fails because `sample` is still intercepted.

-- [ ] **Step 3: Declare the sampled writer**
+- [ ] **Step 4: Declare the sampled writer**

-Append to `cpp/tools/format/result_set_format.h`:
+In `cpp/tools/format/result_set_format.h`, append after the `write_result_set` declaration:

 ```cpp
 int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt,
-                             bool no_header, std::ostream& out,
-                             long long limit, unsigned long long seed);
+                             bool no_header, std::ostream& out, long long limit,
+                             unsigned long long seed);
 ```

-- [ ] **Step 4: Implement the sampled writer**
+- [ ] **Step 5: Implement the sampled writer**

-Add an include to `cpp/tools/format/result_set_format.cc`:
+Add to the include block at the top of `cpp/tools/format/result_set_format.cc`:

 ```cpp
 #include 
 ```

-Add after `write_result_set`:
+Append after the `write_result_set` definition:

 ```cpp
 namespace {
@@ -994,8 +959,8 @@ BufferedRow read_current_row(storage::ResultSet* rs,
 }  // namespace

 int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt,
-                             bool no_header, std::ostream& out,
-                             long long limit, unsigned long long seed) {
+                             bool no_header, std::ostream& out, long long limit,
+                             unsigned long long seed) {
     if (limit < 0) {
         limit = 10;
     }
@@ -1043,30 +1008,18 @@ int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt,
 }
 ```

-- [ ] **Step 5: Add `cmd_sample.cc`**
+- [ ] **Step 6: Declare `cmd_sample`**

-Create `cpp/tools/commands/cmd_sample.cc`:
+In `cpp/tools/commands/commands.h`, add after the `cmd_count` declaration:

 ```cpp
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * License); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
+int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader,
+               OutputFormat fmt, std::ostream& out, std::ostream& err);
+```

+- [ ] **Step 7: Create `cpp/tools/commands/cmd_sample.cc`** (prepend the license header)
+
+```cpp
 #include "commands/commands.h"

 #include 
@@ -1154,8 +1107,8 @@ int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader,
     const long long limit = args.limit < 0 ? 10 : args.limit;
     const unsigned long long seed =
         args.has_seed ? static_cast<unsigned long long>(args.seed) : 0ULL;
-    int wret = write_result_set_sampled(rs, fmt, args.no_header, out, limit,
-                                        seed);
+    int wret =
+        write_result_set_sampled(rs, fmt, args.no_header, out, limit, seed);
     reader.destroy_query_data_set(rs);
     return wret == 0 ? kExitOk : kExitRuntime;
 }
@@ -1163,187 +1116,164 @@ int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader,
 }  // namespace tsfile_cli
 ```

-- [ ] **Step 6: Update the declarations and dispatch**
+> Query construction in `cmd_sample` is almost identical to `commands/row_query.cc::run_row_query`. The only
+> differences are that `sample` goes through `write_result_set_sampled` and does not accept `--offset`. Keep the two separate
+> for now and extract shared code only when a second real shared use appears; do not abstract early just to remove this one duplication (YAGNI).

-Add to `cpp/tools/commands/commands.h`:
+- [ ] **Step 8: Remove the "unimplemented" guard and add `sample` to the dispatch chain**
+
+In `cpp/tools/cli/run_cli.cc`:
+
+1. Delete the entire `is_unimplemented_command` function definition.
+
+2. Delete the guard block in `run_cli` that calls it:

 ```cpp
-int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader,
-               OutputFormat fmt, std::ostream& out, std::ostream& err);
+    if (is_unimplemented_command(p.command)) {
+        err << "Error: command not implemented yet: " << p.command << "\n";
+        print_usage(err);
+        return kExitUsage;
+    }
 ```

-In the dispatch chain of `cpp/tools/cli/run_cli.cc`, add after `count`:
+3. Insert after the `cmd_count` branch of the dispatch chain:

 ```cpp
     } else if (p.command == "sample") {
         code = cmd_sample(p, reader, fmt, out, err);
 ```

-- [ ] **Step 7: Run the test to confirm it passes**
+- [ ] **Step 9: Build and run the tests to confirm they pass**

 Run:

 ```bash
-cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.SampleIsReproducibleWithSeed
+cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.SampleIsReproducibleWithSeed:CliE2E.*:RunCliTest.*
 ```

-Expected: build succeeds; selected test passes.
+Expected: the build succeeds; the `sample` reproducibility test and all CLI tests pass; no `RunCliTest`
+references `command not implemented yet` any more.

-- [ ] **Step 8: Commit**
+- [ ] **Step 10: Commit**

 ```bash
-git add cpp/tools/format/result_set_format.h cpp/tools/format/result_set_format.cc cpp/tools/commands/cmd_sample.cc cpp/tools/commands/commands.h cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc
+git add cpp/tools/format/result_set_format.h cpp/tools/format/result_set_format.cc \
+        cpp/tools/commands/cmd_sample.cc cpp/tools/commands/commands.h \
+        cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc \
+        cpp/test/tools/cli_args_test.cc
 git commit -m "Add deterministic tsfile sample command"
 ```

 ---

-### Task 5: Remove `select` and move the time-range tests to `cat`
+### Task 6: Full verification, help snapshot, and final checks

 **Files:**
-- Delete: `cpp/tools/commands/cmd_select.cc`
-- Modify: `cpp/test/tools/cli_args_test.cc`
-- Modify: `cpp/test/tools/command_e2e_test.cc`
-
-- [ ] **Step 1: Update the old command name in the parsing test**
-
-In `cpp/test/tools/cli_args_test.cc`, change the input of `MeasurementsSplitOnComma` from:
-
-```cpp
-auto p =
-    tsfile_cli::parse_args({"select", "-m", "s1,s2,s3", "data.tsfile"});
-```
-
-to:
-
-```cpp
-auto p =
-    tsfile_cli::parse_args({"cat", "-m", "s1,s2,s3", "data.tsfile"});
-```
-
-- [ ] **Step 2: Switch the `select` E2E tests to `cat`**
-
-In `cpp/test/tools/command_e2e_test.cc`, rename `SelectWithTimeRange` to `CatWithTimeRange` and change the command from `select` to `cat`:
-
-```cpp
-TEST(CliE2E, CatWithTimeRange) {
-    Fixture f;
-    std::ostringstream out;
-    std::ostringstream err;
-    int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "2",
-                                    "--end", "3", "-f", "tsv", f.path},
-                                   out, err);
-    EXPECT_EQ(code, 0);
-    EXPECT_EQ(out.str(), "time\ts1\n2\t20\n3\t30\n");
-}
-```
-
-Rename `SelectJsonIsNdjson` to `CatJsonIsNdjson` and change the command from `select` to `cat`:
-
-```cpp
-TEST(CliE2E, CatJsonIsNdjson) {
-    Fixture f;
-    std::ostringstream out;
-    std::ostringstream err;
-    int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "0",
-                                    "--end", "0", "-f", "json", f.path},
-                                   out, err);
-    EXPECT_EQ(code, 0);
-    EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n");
-}
-```
-
-- [ ] **Step 3: Delete `cmd_select.cc`**
+- Modify: `docs/superpowers/plans/2026-06-02-tsfile-cli.md` (only if execution notes need correction during execution).

-Run:
-
-```bash
-rm cpp/tools/commands/cmd_select.cc
-```
-
-- [ ] **Step 4: Run the tests to confirm they pass**
+- [ ] **Step 1: Run the full set of CLI-related tests**

 Run:

 ```bash
-cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=ParseArgsTest.MeasurementsSplitOnComma:RunCliTest.SelectIsNoLongerKnownCommand:CliE2E.CatWithTimeRange:CliE2E.CatJsonIsNdjson
-```
-
-Expected: build succeeds; selected tests pass.
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add cpp/test/tools/cli_args_test.cc cpp/test/tools/command_e2e_test.cc cpp/tools/commands/cmd_select.cc
-git commit -m "Remove tsfile select command"
+cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.*:StatTableTest.*
 ```

----
-
-### Task 6: Full verification, help-text snapshot, and final commit checks
-
-**Files:**
-- Modify: `docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md` only if execution notes need correction during implementation.
+Expected: the build succeeds; all selected tests pass.

-- [ ] **Step 1: Run the full set of CLI-related tests**
+- [ ] **Step 2: Run the full C++ test executable**

 Run:

 ```bash
-cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.*:StatTableTest.*
+cd cpp && ./build/Debug/lib/TsFile_Test
 ```

-Expected: build succeeds; selected tests pass.
+Expected: all tests pass. If pre-existing tests unrelated to this plan fail, record the exact failing test names and output before deciding
+whether to narrow the verification scope.

-- [ ] **Step 2: Run the full C++ test executable**
+- [ ] **Step 3: Manually check the help text and the command surface**

 Run:

 ```bash
-cd cpp && ./build/Debug/lib/TsFile_Test
+cd cpp && ./build/Debug/bin/tsfile --help
 ```

-Expected: all tests pass. If unrelated existing tests fail, capture the exact failing test names and output before deciding whether to narrow verification.
+Expected: stdout contains `ls schema meta stats head cat count sample`; it contains neither `select`
+nor "not implemented".

-- [ ] **Step 3: Manually check that the CLI help no longer mentions `select`**
+- [ ] **Step 4: Manually smoke-test against the bundled sample file**

-Run:
+Run (the sample file uses the table model):

 ```bash
-cd cpp && ./build/Debug/bin/tsfile --help
+cd cpp
+BIN=./build/Debug/bin/tsfile
+F=examples/test_cpp.tsfile
+$BIN ls -f tsv $F
+$BIN meta -f tsv $F
+$BIN stats -f tsv $F
+$BIN count -f tsv $F
+$BIN head -n 3 -f tsv $F
+$BIN sample -m s1 -n 3 --seed 7 -f tsv $F
+echo "missing file:"; $BIN ls nope.tsfile; echo "rc=$?"
 ```

-Expected: stdout contains `meta`, `count`, `sample`; stdout does not contain `select`.
+Expected: data goes to stdout and diagnostics go to stderr; `ls nope.tsfile` exits with code 2 and reports the error on stderr.

-- [ ] **Step 4: Check whitespace and the staging scope**
+- [ ] **Step 5: Check formatting and the staging scope**

 Run:

 ```bash
+cd /Users/zhanghongyin/iotdb/tsfile && ./mvnw spotless:apply -P with-cpp 2>&1 | tail -5 && ./mvnw spotless:check -P with-cpp 2>&1 | tail -5
 git diff --check
 git status --short
 ```

-Expected: `git diff --check` exits 0. `git status --short` shows only this CLI redesign work and any pre-existing unrelated files remain unstaged.
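-
-- [ ] **Step 5: Final commit**
+Expected: clang-format passes cleanly; `git diff --check` exits 0; `git status --short` shows only this
+CLI work, and unrelated items such as `.codegraph/` remain unstaged.

-If Task 6 produces only minor test/doc tweaks, commit them:
+- [ ] **Step 6: Final commit (only if there are formatting or note changes)**

 ```bash
-git add docs/superpowers/plans/2026-06-02-tsfile-cli-redesign.md
-git commit -m "Document tsfile CLI redesign execution notes"
+git add -u cpp/tools cpp/test/tools
+git commit -m "Format tsfile CLI sources"
 ```

-If Task 6 produces no file changes, do not create an empty commit.
-
-## Coverage check
-
-- `select` removal: Task 1, Task 5.
-- `meta`: Task 3.
-- `count`: Task 3.
-- `sample` and `--seed`: Task 1, Task 4.
-- `stats` extended to min/max/first/last/sum: Task 2.
-- Shared projection, time-range, and limit/offset options: the existing `row_query.cc` is kept; Task 5 covers the time range with the `cat` E2E tests.
-- Output formats and stdout/stderr: the existing formatter tests are kept; Task 6 runs the full set of related tests.
-- Build, install, and CMake glob: the existing `cpp/tools/CMakeLists.txt` uses `GLOB_RECURSE`, so new `.cc` files are picked up automatically; Task 6 verifies this through the build.
+If this task produces no file changes, do not create an empty commit.
+
+## Coverage check (plan self-review)
+
+| Spec requirement | Covered by |
+|---|---|
+| Single `tsfile` binary with git-style subcommand dispatch | Implemented (baseline, committed in Task 1) |
+| `ls`/`schema`/`head`/`cat` | Implemented (baseline, committed in Task 1) |
+| `select` removed (verb + dead code) | Removed from the command surface (`a392a56f`); dead code + tests in Task 1 |
+| `stats` extended with min/max/first/last/sum | Task 2 |
+| `meta` | Task 3 |
+| `count` | Task 4 |
+| Reproducible `sample` with `--seed` | `--seed` parsing already committed; writer + command in Task 5 |
+| Shared options: projection/time range/limit/offset | Implemented in baseline `row_query.cc`; the Task 1 `cat` E2E tests cover the time range |
+| Output formats csv/tsv/json/table, stdout/stderr separation | Baseline formatter + `read_file.cc` change, committed in Task 1, verified in Task 6 |
+| tree/table auto-detection + `--model` | Baseline `is_table_model`; `stats`/`count`/`meta` support scoping through `collect_*` |
+| Exit codes 0/1/2/3 | `exit_codes.h` (baseline); the return value of each command |
+| `BUILD_TOOLS` + `install()` | Baseline `cpp/tools/CMakeLists.txt`, committed in Task 1 |
+
+**Placeholder scan**: no `TBD`/`TODO`/"implement later". The `is_unimplemented_command` guard in
+`run_cli.cc` is narrowed step by step in Tasks 3/4/5 and removed entirely in Task 5.
+
+**Type consistency**: `ParsedArgs`, `OutputFormat`, `RowWriter(out, fmt, header, types,
+no_header)`, `write_result_set(rs, fmt, no_header, out, offset, limit)`,
+`write_result_set_sampled(rs, fmt, no_header, out, limit, seed)`,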
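+`collect_series_stats`/`collect_file_summary`/`statistic_value_cells`, and every
+`cmd_*(args, reader, fmt, out, err)` signature are consistent across tasks.
+
+**Known residual risks (verify during execution; non-blocking)**:
+1. Field and accessor names of the `storage::Statistic` subclasses: Task 2 Step 4 notes that, if compilation fails, they are
+   corrected against `cpp/src/common/statistic.h` without changing the test expectations.
+2. E2E string assertions that depend on row/column order: print the actual output with `tsfile -f tsv ` and align to it;
+   the fixture values (ts 0..4, s1=ts*10) are fixed.
+3. Whether `get_timeseries_metadata()` returns statistics for every series under the table model: if `meta`/`count`/
+   `stats` produce no rows, investigate against the metadata read path already verified in the baseline `cmd_schema.cc`.
diff --git a/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md b/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md
deleted file mode 100644
index 9710e6dd0..000000000
--- a/docs/superpowers/specs/2026-06-01-tsfile-unix-cli-design.md
+++ /dev/null
@@ -1,371 +0,0 @@
-
-
-# Design: TsFile C++ CLI redesign
-
-- **Date**: 2026-06-02
-- **Module**: `cpp/`
-- **Status**: Design approved; implementation plan pending
-- **Research basis**:
-  `/Users/zhanghongyin/reasearchNotes/research/tsfile/Report.md`, section 5.3,
-  and
-  `/Users/zhanghongyin/reasearchNotes/research/tsfile/调研报告/各文件格式CLI工具调研.md`
-
-## Goal
-
-Provide TsFile with a single-binary, composable, pipe-friendly C++ command-line tool:
-
-```sh
-tsfile [options]
-tsfile --help | --version
-tsfile help
-```
-
-The CLI should let users inspect a `.tsfile` the way they inspect other self-describing data files: discover namespaces, view the
-schema and metadata, preview rows, stream rows out, count rows, and sample rows, without writing their own reader code.
-
-This redesign keeps the overall direction of v1 but brings the command surface closer to the tool family of Parquet and similar data formats.
-The visible changes are: the `select` verb is removed; `meta`, `count`, and `sample` are added; and projection, time range,
-limit, and offset move into shared options of the row-output commands.
-
-## Constraints from the research findings
-
-TsFile has two identities at once:
-
-1. **A Parquet-like file shape**: sealed, immutable, self-describing, and columnar, with footer metadata, offsets, and statistics.
-   The Parquet CLI is therefore the most important reference for command design.
-2. **An HDF5/netCDF-like namespace**: a TsFile is not always a single-table file; the tree model holds multiple devices,
-   and the table model holds multiple tables. It therefore needs an `ls`-style namespace command.
-
-The CLI research unifies the read-only tool family for immutable data files as:
-
-```text
-schema | meta(/footer/stats) | head(/cat) | count | sample
-```
-
-Parquet is the most complete template: Apache `parquet-cli` provides `schema`, `meta`, `footer`, `head`,
-`cat`, and index/statistics commands; Rust `pqrs` adds the especially useful `rowcount` and `sample`.
-ORC and Avro confirm the same pattern: their official tools provide `meta`/`data`/`count` and `getschema`/
-`getmeta`/`cat`/`count`. HDF5 and netCDF contribute experience with namespaces and headers: the value of `h5ls`,
-`h5dump -H`, and `ncdump -h` is that they show a file's internal structure without opening an application.
-
-Mapped onto TsFile, the five-verb family also needs an additional `ls`, because a TsFile contains
-device/table namespaces internally.
-
-## Scope
-
-This redesign includes:
-
-- A multi-command binary named `tsfile`.
-- Read-only commands: `ls`, `schema`, `meta`, `stats`, `head`, `cat`, `count`, `sample`.
-- Shared options such as output format, model selection, column projection, row limit, offset, and time range.
-- An implementation on the existing `storage::TsFileReader` read path.
-- Unix conventions: data goes to stdout, and diagnostics and errors go to stderr, so the output plugs into `awk`, `jq`,
-  `sort`, import tools, and shell pipelines.
-
-This redesign excludes:
-
-- Write, convert, merge, and rewrite commands.
-- A byte-level structure dump fully equivalent to the Java `TsFileSketchTool`.
-- FUSE mounts, DuckDB/ClickHouse/VisiData connectors, or SQL replacement scans.
-- ISO time formatting, and complex predicates beyond time ranges and measurement projection.
-- Splitting into multiple `tsfile-*` binaries.
-
-## Command family
-
-The command set aligns with the read-only family of Parquet/ORC/Avro and adopts the namespace-inspection ability of HDF5/netCDF.
-
-| Verb | Lineage | Purpose | Main reader support |
-|---|---|---|---|
-| `ls` | `h5ls`, `ncdump -h`; usually unnecessary for Parquet | Lists devices under the tree model and tables under the table model, one name per line | `get_all_device_ids()`, `get_all_table_schemas()` |
-| `schema` | `parquet-cli schema`, Avro `getschema`, SQL `DESCRIBE` | Prints type information for series or columns | `get_timeseries_schema()`, `get_table_schema()` |
-| `meta` | `parquet-cli meta/footer`, Avro `getmeta`, DuckDB metadata functions | Prints a file-level summary: model, version, namespace size, global time range, Bloom filter, file size | Reader metadata and file-system metadata |
-| `stats` | `parquet-cli column-index/check-stats`, ORC statistics, SQL `SUMMARIZE` | Prints count, time range, min, max, first, last, and sum per series | `get_timeseries_metadata()` statistics |
-| `head` | `parquet-cli head`, `pqrs head`, SQL `LIMIT` | Prints the first N rows | Shared row-query path |
-| `cat` | `parquet-cli cat/scan`, Avro `cat`/`tojson`, ORC `data` | Streams matching rows | Shared row-query path |
-| `count` | `pqrs rowcount`, ORC `count`, Avro `count`, SQL `count(*)` | Prints row counts for series or a scope without scanning data pages | `get_timeseries_metadata()` statistics |
-| `sample` | `pqrs sample`, SQL sampling | Prints reproducible sample rows | Shared row-query path plus deterministic sampling |
-
-`select` is no longer a standalone verb. What it actually carries is projection, time filtering, limit, and offset; these belong as
-shared options of `head`, `cat`, and `sample`. This also matches more closely the Parquet tools' habit of attaching column selection
-to the row-output commands.
-
-## Command semantics
-
-### `ls`
-
-`ls` prints the top-level logical namespace:
-
-- tree model: one device ID per line;
-- table model: one table name per line.
-
-The default output is deliberately simple and stable for pipeline use. Measurement- or column-level detail is the job of `schema`.
-
-### `schema`
-
-`schema` prints a unified logical schema table:
-
-```text
-target, measurement, datatype, encoding, compression
-```
-
-Under the tree model, `target` is the device and `measurement` is the measurement point. Under the table model, `target` is the table
-and `measurement` is the column name. If the current public API exposes the datatype but not the encoding/compression,
-CSV/TSV print an empty field and JSON prints `null`.
-
-### `meta`
-
-`meta` prints file-level information that can be answered without decoding data pages. The target fields are:
-
-```text
-file, model, version, device_count, table_count, series_count,
-start_time, end_time, bloom_filter, file_size_bytes
-```
-
-It is TsFile's counterpart of Parquet's `meta`/`footer`: get a quick overview of the file first, then decide whether to go on to
-the schema, stats, or row data. If the current public reader API cannot expose a file-level field directly, the implementation should output
-an empty value rather than scan data pages.
-
-### `stats`
-
-`stats` prints the statistics of each series:
-
-```text
-target, measurement, count, start_time, end_time,
-min, max, first, last, sum
-```
-
-This directly exposes a strength of the TsFile format: Chunk/Page-level statistics include the count and value summaries, so many inspection questions
-can be answered without reading or decoding data pages.
-
-### `head` and `cat`
-
-`head` and `cat` are row-output commands:
-
-- `head` prints the first 10 rows by default and accepts `-n, --limit` to override the row count.
-- `cat` streams all matching rows by default unless a limit is given explicitly.
-- Both accept projection (`--measurements`) and a time range (`--start`,
-  `--end`) through the shared row-query path.
-
-`head` is a user-facing convenience command; it is essentially `cat` with a default limit.
-
-### `count`
-
-`count` reads row counts from the statistics instead of scanning data through a row iterator. This is where TsFile can improve on the common
-Parquet CLI surface: `parquet-cli` has no standalone row-count command, whereas TsFile's statistics can
-answer a count cheaply.
-
-Scope rules:
-
-- No scope: print the count of every series and, in formats where it fits, the total;
-- `--device`: print the counts under one tree-model device;
-- `--table`: print the counts under one table-model table.
-
-### `sample`
-
-`sample` prints N sample rows through the shared row query and formatter (N defaults to 10) and accepts
-`--seed` for reproducibility.
-
-An implementation may use reservoir sampling or a deterministic skip strategy. The design requirement is that, for the same file, scope,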
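-projection, time range, limit, and seed, the output is stable.
-
-## Shared options
-
-| Option | Meaning | Applies to |
-|---|---|---|
-| `-f, --format csv\|tsv\|json\|table` | Output format; the default adapts to whether stdout is a TTY | All |
-| `-d, --device ` | Restrict to a tree-model device | Row-output commands, `schema`, `stats`, `count` |
-| `-t, --table ` | Restrict to a table-model table | Row-output commands, `schema`, `stats`, `count` |
-| `-m, --measurements a,b,c` | Measurement or column projection | `head`, `cat`, `sample` |
-| `-n, --limit N` | Maximum number of output rows; `head` uses it as its row count | `head`, `cat`, `sample` |
-| `--offset N` | Skip the first N rows | `head`, `cat` |
-| `--start ` / `--end ` | Time range in epoch milliseconds, inclusive at both ends | `head`, `cat`, `sample` |
-| `--seed N` | Seed for reproducible sampling | `sample` |
-| `--no-header` | Omit the header row | Tabular outputs |
-| `--model tree\|table` | Force the model, overriding auto-detection | All |
-| `-h, --help` / `--version` | Help and version | Top level and each command |
-
-An option that does not match the command is treated as a usage error. For example, using `--seed` with a command other than `sample`, or
-`--offset` with `sample`, should return exit code `1` and print a clear error message to stderr.
-
-## Tree and table models
-
-Model detection stays automatic:
-
-```text
-get_all_table_schemas() non-empty => table model
-otherwise                         => tree model
-```
-
-`--model tree|table` overrides auto-detection.
-
-Behavior under the unified command surface:
-
-- `ls` lists devices in tree files and tables in table files.
-- `schema`, `stats`, and `count` can narrow their scope with `--device` or `--table`.
-- Row output always treats the time column as the first column.
-- Tree-model row output uses device + measurements; table-model row output uses table + columns.
-
-## Output formats
-
-Keep the v1 formatter design:
-
-- `table`: an aligned, human-oriented table; the default when stdout is a terminal.
-- `tsv`: tab-separated; the default when stdout is piped or redirected.
-- `csv`: follows the RFC 4180 quoting rules. A field is quoted when it contains the delimiter, a quote, or a newline, and embedded quotes are doubled.
-- `json`: NDJSON, one JSON object per line.
-
-Nulls are printed as empty fields in CSV/TSV and as `null` in JSON. Timestamps are printed as the stored epoch
-millisecond integers. ISO time formatting is future work.
-
-Data goes to stdout; diagnostics, usage text, and errors go to stderr.
-
-## Exit codes
-
-| Exit code | Condition |
-|---|---|
-| `0` | Success |
-| `1` | Usage or argument error |
-| `2` | File cannot be opened or is corrupt |
-| `3` | Query or runtime error |
-
-`ReadFile::open` currently prints open errors to stdout (`cpp/src/file/read_file.cc`). The CLI
-path must not pollute stdout, so these diagnostics should go to stderr instead.
-
-## Architecture and v1 migration
-
-The current uncommitted v1 implementation already has sensible boundaries:
-
-```text
-cpp/tools/
-├── CMakeLists.txt
-├── tools_main.cc
-├── cli/
-│   ├── cli_args.h
-│   ├── cli_args.cc
-│   ├── run_cli.h
-│   ├── run_cli.cc
-│   └── exit_codes.h
-├── format/
-│   ├── output_format.h
-│   ├── output_format.cc
-│   ├── result_set_format.h
-│   └── result_set_format.cc
-└── commands/
-    ├── commands.h
-    ├── row_query.cc
-    ├── cmd_ls.cc
-    ├── cmd_schema.cc
-    ├── cmd_stats.cc
-    ├── cmd_head.cc
-    ├── cmd_cat.cc
-    └── cmd_select.cc
-```
-
-After the redesign, the target structure adjusts `commands/` on top of the layout above:
-
-```text
-cpp/tools/commands/
-├── commands.h
-├── row_query.cc
-├── cmd_ls.cc
-├── cmd_schema.cc
-├── cmd_meta.cc
-├── cmd_stats.cc
-├── cmd_head.cc
-├── cmd_cat.cc
-├── cmd_count.cc
-└── cmd_sample.cc
-```
-
-Migration items:
-
-- Delete `cmd_select.cc`.
-- Add `cmd_meta.cc`, `cmd_count.cc`, and `cmd_sample.cc`.
-- Add `seed` to `ParsedArgs` and parse `--seed` in `cli_args.cc`.
-- Update command registration, help text, and command validation in `run_cli.cc`.
-- Update the declarations in `commands.h`.
-- Keep `row_query.cc` as the shared row-reading path for `head`, `cat`, and `sample`.
-- Keep the formatter module; reuse the generic row/table output only where a new command's result shape needs it.
-- Do not change the storage engine. All new commands use existing reader metadata or the existing row-query API.
-
-Do not introduce a third-party argument-parsing library. The current hand-written parser covers this command surface and keeps the C++ module's dependencies low.
-
-## Build and release
-
-When tools are enabled, `cpp/CMakeLists.txt` includes `cpp/tools/` and builds the `tsfile`
-executable, which links `libtsfile`.
-
-The binary is installed with the C++ artifacts. `cpp/examples/` keeps its role as examples; the CLI lives in `cpp/tools/`
-because it is a user-facing tool, not example code.
-
-## Testing
-
-Tests live in `cpp/test/tools/` and use Google Test.
-
-Unit tests cover:
-
-- `cli_args`: command and option parsing, including `--seed`, unknown commands, bad option values, a missing file argument,
-  and command/option mismatches.
-- formatter: `csv`, `tsv`, `json`, and `table`, covering nulls, strings containing the delimiter, quotes,
-  and newlines.
-- Model detection: table if a table schema exists, otherwise tree; `--model` overrides both.
-- `meta`: aggregates file-level fields without triggering a data-page scan.
-- `count`: based on `Statistic.count`, not on a row iterator.
-- `sample`: output is reproducible for a fixed seed.
-
-End-to-end tests cover:
-
-- Generating or reusing a small `.tsfile` fixture.
-- Running each command as a subprocess through the built `tsfile` binary.
-- Asserting exit codes, stdout shape, and stderr behavior.
-- TTY-adaptive formatting is covered by unit tests; subprocess tests pass `--format` explicitly.
-
-Tests only verify CLI behavior and the real reader path; they add no storage-engine behavior.
-
-## Rejected alternatives
-
-### Keep the `select` verb
-
-Rejected. `select` makes the CLI feel more like SQL but overlaps with `cat` and `head`. What it really provides is projection and filtering,
-which therefore belong in shared options. Parquet-style tools put column selection on the row-output commands, and TsFile should too.
-
-### Fold `count` into `stats` or `meta`
-
-Rejected. `count` is common enough, and TsFile can answer it cheaply from statistics. Keeping an explicit `count` makes this
-format advantage easier for users to discover.
-
-### Drop `ls` to mirror Parquet exactly
-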
-拒绝。TsFile 不总是单逻辑表。多 device 和多 table 命名空间使 `ls` 成为用户经常需要的 -第一个命令,就像 HDF5 中 `h5ls` 很自然一样。 - -### 现在实现写入或转换命令 - -拒绝。本阶段只读命令风险更低,也正好对应调研结论:TsFile 不是完全没有 CLI,而是动词 -不齐、没有统一分发器、还不能被通用查看器直接看见。 - -## 后续工作 - -- 与 Java `TsFileSketchTool` 对齐的结构 dump 命令。 -- ISO 时间格式化。 -- 超出时间范围和 measurement 投影的复杂谓词。 -- 写入、转换、合并、重写命令。 -- DuckDB、ClickHouse、VisiData reader,让 TsFile 进入多格式查询/查看工具。 -- 如果项目选择通过文件系统路径暴露 TsFile,设计只读 FUSE 命名空间或 TableFS 视图。 diff --git a/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md b/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md new file mode 100644 index 000000000..6208d994b --- /dev/null +++ b/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md @@ -0,0 +1,334 @@ + + +# Design: TsFile C++ CLI(`tsfile`) + +- **日期**:2026-06-02 +- **模块**:`cpp/`(新增 `cpp/tools/`、`cpp/test/tools/`) +- **状态**:设计已批准;部分实现(见 §10「实现现状」),剩余工作见 + `docs/superpowers/plans/2026-06-02-tsfile-cli.md` +- **目标参照**:Parquet 的 `parquet-cli` / `pqrs` —— 让 `.tsfile` 像 `.parquet` + 一样可以在命令行里被浏览、检视、预览、导出。 +- **调研依据**: + - `/Users/zhanghongyin/reasearchNotes/research/tsfile/Report.md`(主报告 §5.3) + - `/Users/zhanghongyin/reasearchNotes/research/tsfile/调研报告/各文件格式CLI工具调研.md` + +本文是「实现 tsfile-cli」的单一权威设计文档,取代此前拆分的 +`2026-06-01-tsfile-unix-cli-design.md`。 + +## 1. 目标 + +为 TsFile 提供一个单二进制、可组合、适合管道使用的 C++ 命令行工具: + +```sh +tsfile [options] +tsfile --help | --version +tsfile help +``` + +让用户能像查看其他自描述数据文件一样查看 `.tsfile`:发现命名空间、查看 schema 和 +元数据、预览行、流式导出行、统计行数、抽样行,而不需要自己写 reader 代码。 + +命令面贴近 Parquet 及相近数据格式的工具谱系:动词为 +`ls / schema / meta / stats / head / cat / count / sample`,投影、时间范围、limit、 +offset 作为行输出命令的共享参数。 + +## 2. 调研结论对设计的约束 + +TsFile 同时有两个身份: + +1. **像 Parquet 的文件形态**:封存、不可变、自描述、列式,带 footer 元数据、偏移和 + 统计量。因此 Parquet CLI 是最重要的命令设计参照。 +2. 
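**An HDF5/netCDF-like namespace**: a TsFile is not always a single-table file; the tree model has multiple devices, + and the table model has multiple tables. It therefore needs an `ls`-style namespace command. + +The CLI survey unifies the read-only tool lineage for immutable data files as: + +```text +schema | meta(/footer/stats) | head(/cat) | count | sample +``` + +Parquet is the most complete template: Apache `parquet-cli` provides `schema`, `meta`, `footer`, `head`, +`cat`, and index/statistics commands, and Rust `pqrs` adds the especially useful `rowcount` and `sample`. ORC +and Avro confirm the same pattern (`meta`/`data`/`count` and `getschema`/`getmeta`/`cat`/`count`). +HDF5 and netCDF contribute namespace and header experience: `h5ls`, `h5dump -H`, and `ncdump -h` are valuable because +they show a file's internal structure without opening an application. + +The survey's one-sentence conclusion: what TsFile **is missing is not a CLI as such, but a complete verb set + unified dispatch + +visibility to generic viewers**. This design addresses the first two by unifying everything under a `tsfile ` dispatcher and filling in the +read-only verbs; generic-viewer integration (DuckDB/ClickHouse/VisiData readers) is follow-up work (§13). + +## 3. Scope + +In scope: + +- A multi-command binary named `tsfile`. +- Read-only commands: `ls`, `schema`, `meta`, `stats`, `head`, `cat`, `count`, `sample`. +- Shared parameters for output format, model selection, column projection, row limit, offset, time range, and sampling seed. +- An implementation on top of the existing `storage::TsFileReader` read path, without modifying the storage engine. +- Unix conventions: data goes to stdout, diagnostics and errors go to stderr, so the tool plugs into `awk`, `jq`, + `sort`, import tools, and shell pipelines. + +Out of scope: + +- Write, convert, merge, and rewrite commands. +- A byte-level structure dump fully equivalent to the Java `TsFileSketchTool`. +- FUSE mounting, DuckDB/ClickHouse/VisiData connectors, or SQL replacement scans. +- ISO time formatting, and complex predicates beyond time ranges and measurement projection. +- Splitting into multiple `tsfile-*` binaries, or introducing a third-party argument-parsing library. + +## 4. 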
Command lineage + +| Verb | Lineage | Purpose | Main reader support | +|---|---|---|---| +| `ls` | `h5ls`, `ncdump -h` | Lists devices (tree model) or tables (table model), one name per line | `get_all_device_ids()`, `get_all_table_schemas()` | +| `schema` | `parquet-cli schema`, Avro `getschema`, SQL `DESCRIBE` | Prints type information for series or columns | `get_timeseries_metadata()`, `get_timeseries_schema()` | +| `meta` | `parquet-cli meta/footer`, Avro `getmeta` | Prints a file-level summary: model, version, namespace size, global time range, Bloom filter, file size | Reader metadata + filesystem metadata | +| `stats` | `parquet-cli column-index/check-stats`, ORC statistics, SQL `SUMMARIZE` | Prints count, time range, min, max, first, last, and sum for each series | `get_timeseries_metadata()` statistics | +| `head` | `parquet-cli head`, `pqrs head`, SQL `LIMIT` | Prints the first N rows | Shared row query path | +| `cat` | `parquet-cli cat/scan`, Avro `cat`/`tojson`, ORC `data` | Streams matching rows | Shared row query path | +| `count` | `pqrs rowcount`, ORC `count`, Avro `count`, SQL `count(*)` | Prints row counts from statistics without scanning data pages | `get_timeseries_metadata()` statistics | +| `sample` | `pqrs sample`, SQL sampling | Prints reproducible sample rows | Shared row query path + deterministic sampling | + +`select` is **not** a standalone verb. What it actually carries is projection, time filtering, limit, and offset; these exist as +shared parameters of `head`, `cat`, and `sample`, consistent with the Parquet tools' convention of attaching column selection to row-output +commands. + +## 5. 
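Command semantics + +### `ls` + +Prints the top-level logical namespace: one device ID per line for the tree model, one table name per line for the table model. The default +output is deliberately kept simple and stable for pipelines; measurement- and column-level detail is the job of `schema`. + +### `schema` + +Prints a unified logical schema table: + +```text +target, measurement, datatype, encoding, compression +``` + +In the tree model `target` is the device and `measurement` is the measurement point; in the table model `target` is the table and +`measurement` is the column name. When the current public API exposes the datatype but not the encoding/compression +(for example, in the table model), CSV/TSV output an empty field and JSON outputs `null`. `-m` projects onto the given columns. + +### `meta` + +Prints file-level information that can be answered without decoding data pages: + +```text +file, model, version, device_count, table_count, series_count, +start_time, end_time, bloom_filter, file_size_bytes +``` + +This mirrors Parquet `meta`/`footer`: get a quick overview of the file first, then decide whether to look further at schema, stats, or +row data. If the current public reader API cannot expose a field directly (such as `version` or `bloom_filter`), the command outputs +an empty value rather than scanning data pages. + +### `stats` + +Prints statistics for each series: + +```text +target, measurement, count, start_time, end_time, min, max, first, last, sum +``` + +This directly exposes a TsFile format advantage: chunk- and page-level statistics carry counts and value summaries, so many inspection questions do not +require reading or decoding data pages. `min`/`max`/`first`/`last`/`sum` may be null depending on the type (booleans have no min/max, +text has no sum). + +### `head` and `cat` + +Row-output commands: + +- `head` prints the first 10 rows by default; `-n, --limit` overrides the row count. +- `cat` streams all matching rows unless a limit is given explicitly. +- Both accept projection (`-m`), time range (`--start`/`--end`), and offset through the shared row query path. + +`head` is essentially `cat` with a default limit. + +### `count` + +Reads row counts from statistics instead of scanning data through the row iterator. This is where TsFile improves on the `parquet-cli` surface, +which has no standalone row-count subcommand. Scoping rules: + +- No scope: print the count of every series, followed by a total row; +- `--device`: restrict to one tree-model device; +- `--table`: restrict to one table-model table. + +### `sample` + +Prints N sample rows via the shared row query path and deterministic sampling; N defaults to 10, and `--seed` makes the output reproducible. +The implementation uses reservoir sampling. Design requirement: for the same file, scope, projection, time range, limit, and seed, +the output is stable. + +## 6. 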
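Shared parameters + +| Parameter | Meaning | Applies to | +|---|---|---| +| `-f, --format csv\|tsv\|json\|table` | Output format; the default adapts to whether stdout is a TTY | All | +| `-d, --device ` | Restrict to a tree-model device | Row-output commands, `schema`, `stats`, `count` | +| `-t, --table ` | Restrict to a table-model table | Row-output commands, `schema`, `stats`, `count` | +| `-m, --measurements a,b,c` | Measurement/column projection | `schema`, `head`, `cat`, `sample` | +| `-n, --limit N` | Maximum number of output rows; `head` uses it as its row count | `head`, `cat`, `sample` | +| `--offset N` | Skip the first N rows | `head`, `cat` | +| `--start ` / `--end ` | Time range in epoch milliseconds, inclusive at both ends | `head`, `cat`, `sample` | +| `--seed N` | Seed for reproducible sampling | `sample` | +| `--no-header` | Omit the header row | Tabular output | +| `--model tree\|table` | Force the model, overriding auto-detection | All | +| `-h, --help` / `--version` | Help and version | Top level and each command | + +A parameter that does not apply to the command is a usage error (exit code `1`, message on stderr). The implemented combination checks +(`run_cli.cc::validate_command_flags`) are: + +- `--seed` is valid only for `sample`; +- `--offset` is not valid for `sample`; +- `--device` and `--table` cannot be used together; +- `--limit >= -1`, `--offset >= 0`, and `--start <= --end`. + +## 7. Tree and table models + +The model is detected automatically: + +```text +get_all_table_schemas() non-empty => table model +otherwise => tree model +``` + +`--model tree|table` overrides auto-detection. Behavior under the unified command surface: + +- `ls` lists devices in tree files and tables in table files. +- `schema`, `stats`, and `count` can be narrowed with `--device` or `--table`. +- Row output always treats the time column as the first column; the tree model uses device + measurements, and the table model uses + table + columns. + +## 8. Output formats and exit codes + +Formatters (`format/output_format.*`, `format/result_set_format.*`): + +- `table`: aligned, human-readable table; the default when stdout is a terminal. +- `tsv`: tab-separated; the default when stdout is piped or redirected. +- `csv`: RFC 4180 quoting (a field containing a delimiter, quote, or newline is quoted, and embedded quotes are doubled). +- `json`: NDJSON, one JSON object per line; numbers and booleans are emitted bare, everything else is quoted, and null is emitted as `null`. + +In CSV/TSV, null is an empty field. Timestamps are printed as the stored epoch-millisecond integers (ISO formatting +is follow-up work). Data goes to stdout; diagnostics, usage, and errors go to stderr. + +Exit codes: + +| Exit code | Condition | +|---|---| +| `0` | Success | +| `1` | Usage or argument error | +| `2` | File cannot be opened or is corrupt | +| `3` | Query or runtime error | + +`ReadFile::open` (`cpp/src/file/read_file.cc`) used to print open errors to stdout, which would corrupt +`tsfile cat f | jq`; it now writes them to stderr. + +## 9. 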
Architecture + +```text +cpp/tools/ +├── CMakeLists.txt # OBJECT library tsfile_cli_obj + executable tsfile +├── tools_main.cc # main(): forwards argv to run_cli +├── cli/ +│ ├── exit_codes.h # kExitOk/kExitUsage/kExitFile/kExitRuntime +│ ├── cli_args.h / .cc # ParsedArgs + parse_args() +│ └── run_cli.h / .cc # top-level usage, whitelist, flag-combination checks, reader open, dispatch +├── format/ +│ ├── output_format.h / .cc # pure layer: resolve_format, escaping, type names, RowWriter +│ └── result_set_format.h/.cc # ResultSet pump: cell_to_string, write_result_set[_sampled] +└── commands/ + ├── commands.h # is_table_model + run_row_query + cmd_* declarations + ├── row_query.cc # query construction shared by head/cat/sample + ├── stat_table.h / .cc # collect_series_stats / collect_file_summary / statistic formatting + ├── cmd_ls.cc cmd_schema.cc cmd_meta.cc cmd_stats.cc + └── cmd_head.cc cmd_cat.cc cmd_count.cc cmd_sample.cc + +cpp/test/tools/ +├── cli_test_util.h # writes a table-model fixture .tsfile to a temporary path +├── cli_args_test.cc # parse_args + run_cli argument/dispatch unit tests +├── output_format_test.cc # pure formatter unit tests +├── stat_table_test.cc # unit tests for statistic formatting and summary helpers +└── command_e2e_test.cc # in-process E2E tests of every command via run_cli (including deterministic sampling) +``` + +Design points: + +- The CLI logic compiles into the OBJECT library `tsfile_cli_obj`, which is linked into both the `tsfile` executable and + `TsFile_Test`, so commands can be tested in-process against an injected `std::ostream&`. +- Formatters are split into a pure layer (no reader dependency, heavily unit-tested) and a `ResultSet` pump layer (E2E-tested). +- A hand-written argument parser, with no new dependencies. +- No storage-engine changes: every command uses existing reader metadata or the existing row query API. + +Build: `cpp/CMakeLists.txt` provides `option(BUILD_TOOLS ... ON)`; when it is enabled, the build calls +`add_subdirectory(tools)`, links against `libtsfile` to produce the `tsfile` executable, and `install()`s it to +`bin`. `cpp/tools/CMakeLists.txt` collects sources with `GLOB_RECURSE`, so new `.cc` files are picked up automatically. + +## 10. 
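Implementation status (2026-06-02) + +The working tree is half-migrated; see +`docs/superpowers/plans/2026-06-02-tsfile-cli.md` for the remaining work: + +- **Committed** (commit `a392a56f`, only the `cli/` layer + `cli_args_test.cc`): + - the 8-verb command surface, usage/help, whitelist, `--seed` parsing, `validate_command_flags`; + - `select` has been removed from the whitelist (`select` → `Unknown command`, exit code 1); + - `meta`/`count`/`sample` are on the whitelist but are intercepted by `is_unimplemented_command`, which returns + "command not implemented yet". +- **Implemented but not committed** (untracked): `ls`, `schema`, `stats` (old 5-column version only), `head`, `cat` + and their dependencies (`commands/`, `format/`, `tools_main.cc`, `CMakeLists.txt`, etc.) plus the E2E tests. +- **Remaining inconsistencies**: + - `cmd_select.cc` and the `cmd_select` declaration in `commands.h` are still present but never dispatched, so they are dead code. + - `command_e2e_test.cc` still tests `SelectWithTimeRange` / + `SelectJsonIsNdjson` through the `select` command, which conflicts with the command surface that removed `select`; building it would fail. +- **Not yet implemented**: extending `stats` to min/max/first/last/sum; `meta`; `count`; `sample`. + +## 11. Testing + +Tests live in `cpp/test/tools/` and use Google Test; they verify only CLI behavior and the real reader path and add no +storage-engine behavior. + +Unit tests cover `cli_args` (command and argument parsing, `--seed`, unknown commands/arguments, command/argument mismatches); +formatters (csv/tsv/json/table, including nulls, delimiters, quotes, and newlines); model detection (including `--model` +overrides); and statistic formatting (`statistic_value_cells` for each type). + +E2E tests generate a table-model fixture, run each command in-process through `run_cli`, and assert exit codes, stdout +shape, and stderr behavior. Deterministic sampling is covered by running twice with a fixed `--seed` and asserting identical output; TTY-adaptive format selection is covered by +unit tests, and the E2E tests pass `--format` explicitly. + +## 12. Rejected alternatives + +- **Keep the `select` verb**: rejected. It overlaps with `cat`/`head`; what it really provides is projection and filtering, which belong in + shared parameters (Parquet style). +- **Fold `count` into `stats` or `meta`**: rejected. `count` is used often enough, and TsFile can answer it cheaply from statistics; + keeping it explicit makes this format advantage easier to discover. +- **Drop `ls` to mimic Parquet exactly**: rejected. A TsFile is not always a single logical table; multi-device/multi-table + namespaces make `ls` the first command users routinely need. +- **Implement write or convert commands now**: rejected. Read-only commands carry less risk at this stage, which matches the survey's conclusion. + +## 13. 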
Follow-up work + +- A structure-dump command aligned with the Java `TsFileSketchTool`. +- ISO time formatting; complex predicates beyond time ranges and measurement projection. +- Write, convert, merge, and rewrite commands. +- DuckDB / ClickHouse / VisiData readers, so TsFile can be used from multi-format query/viewing tools + (corresponding to main report §6.3.3, "missing adaptation layer for connector hosts"). +- A read-only FUSE namespace or TableFS view (if the project chooses to expose TsFile through filesystem paths). From e3b4ce52b720619ef88a4b54a37049a64b3d62c1 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 00:26:54 +0800 Subject: [PATCH 09/41] Extend tsfile stats with value summaries and shared stat helpers --- cpp/test/tools/command_e2e_test.cc | 9 +- cpp/test/tools/stat_table_test.cc | 50 +++++++ cpp/tools/commands/cmd_stats.cc | 56 +++----- cpp/tools/commands/stat_table.cc | 216 +++++++++++++++++++++++++++++ cpp/tools/commands/stat_table.h | 69 +++++++++ 5 files changed, 359 insertions(+), 41 deletions(-) create mode 100644 cpp/test/tools/stat_table_test.cc create mode 100644 cpp/tools/commands/stat_table.cc create mode 100644 cpp/tools/commands/stat_table.h diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index 09289910b..a1d4f0798 100644 --- a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -105,10 +105,11 @@ TEST(CliE2E, StatsReportsCountAndTimeRange) { std::ostringstream err; int code = tsfile_cli::run_cli({"stats", "-f", "tsv", f.path}, out, err); EXPECT_EQ(code, 0); - EXPECT_NE( - out.str().find("target\tmeasurement\tcount\tstart_time\tend_time"), - std::string::npos); - EXPECT_NE(out.str().find("s1\t5\t0\t4"), std::string::npos); + EXPECT_NE(out.str().find("target\tmeasurement\tcount\tstart_time\tend_" + "time\tmin\tmax\tfirst\tlast\tsum"), + std::string::npos); + EXPECT_NE(out.str().find("s1\t5\t0\t4\t0\t40\t0\t40\t100"), + std::string::npos); } TEST(CliE2E, HeadProjectsAndLimits) { diff --git a/cpp/test/tools/stat_table_test.cc b/cpp/test/tools/stat_table_test.cc new file mode 100644 index 000000000..7beb58c13 --- /dev/null +++ b/cpp/test/tools/stat_table_test.cc @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation
(ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "commands/stat_table.h" + +#include + +#include "common/statistic.h" + +TEST(StatTableTest, Int64StatisticCellsContainValueSummaries) { + storage::Int64Statistic st; + st.update(1, static_cast(10)); + st.update(3, static_cast(30)); + tsfile_cli::StatisticCells cells = tsfile_cli::statistic_value_cells(&st); + EXPECT_EQ(cells.values[0], "10"); + EXPECT_EQ(cells.values[1], "30"); + EXPECT_EQ(cells.values[2], "10"); + EXPECT_EQ(cells.values[3], "30"); + EXPECT_EQ(cells.values[4], "40"); + EXPECT_EQ(cells.is_null, + std::vector({false, false, false, false, false})); +} + +TEST(StatTableTest, BooleanStatisticLeavesMinMaxNull) { + storage::BooleanStatistic st; + st.update(1, true); + st.update(2, false); + tsfile_cli::StatisticCells cells = tsfile_cli::statistic_value_cells(&st); + EXPECT_TRUE(cells.is_null[0]); + EXPECT_TRUE(cells.is_null[1]); + EXPECT_EQ(cells.values[2], "true"); + EXPECT_EQ(cells.values[3], "false"); + EXPECT_EQ(cells.values[4], "1"); +} diff --git a/cpp/tools/commands/cmd_stats.cc b/cpp/tools/commands/cmd_stats.cc index 65b9ed3ba..1af68e298 100644 --- a/cpp/tools/commands/cmd_stats.cc +++ b/cpp/tools/commands/cmd_stats.cc @@ -17,56 +17,38 @@ * under the License. 
*/ -#include #include +#include #include "cli/exit_codes.h" #include "commands/commands.h" -#include "common/statistic.h" -#include "reader/tsfile_reader.h" +#include "commands/stat_table.h" namespace tsfile_cli { int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { RowWriter w(out, fmt, - {"target", "measurement", "count", "start_time", "end_time"}, + {"target", "measurement", "count", "start_time", "end_time", + "min", "max", "first", "last", "sum"}, {common::STRING, common::STRING, common::INT64, common::INT64, - common::INT64}, + common::INT64, common::STRING, common::STRING, common::STRING, + common::STRING, common::STRING}, args.no_header); - storage::DeviceTimeseriesMetadataMap meta = - reader.get_timeseries_metadata(); - for (auto& kv : meta) { - std::string target = kv.first ? kv.first->get_device_name() : ""; - if (!args.device.empty() && target != args.device) { - continue; - } - if (!args.table.empty() && kv.first && - kv.first->get_table_name() != args.table) { - continue; - } - for (auto& ts : kv.second) { - if (!ts) { - continue; - } - std::string m = ts->get_measurement_name().to_std_string(); - if (!args.measurements.empty() && - std::find(args.measurements.begin(), args.measurements.end(), - m) == args.measurements.end()) { - continue; - } - storage::Statistic* st = ts->get_statistic(); - if (st != nullptr) { - w.write({target, m, std::to_string(st->get_count()), - std::to_string(st->start_time_), - std::to_string(st->end_time_)}, - {false, false, false, false, false}); - } else { - w.write({target, m, "", "", ""}, - {false, false, true, true, true}); - } - } + std::vector rows = collect_series_stats(args, reader); + for (const SeriesStatRow& row : rows) { + std::vector cells = { + row.target, row.measurement, std::to_string(row.count), + std::to_string(row.start_time), std::to_string(row.end_time)}; + cells.insert(cells.end(), row.value_cells.values.begin(), + 
row.value_cells.values.end()); + + std::vector nulls = {false, false, false, row.count == 0, + row.count == 0}; + nulls.insert(nulls.end(), row.value_cells.is_null.begin(), + row.value_cells.is_null.end()); + w.write(cells, nulls); } w.finish(); return kExitOk; diff --git a/cpp/tools/commands/stat_table.cc b/cpp/tools/commands/stat_table.cc new file mode 100644 index 000000000..73f94b00c --- /dev/null +++ b/cpp/tools/commands/stat_table.cc @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include "commands/stat_table.h" + +#include +#include +#include +#include + +#include "commands/commands.h" +#include "common/statistic.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { +namespace { + +template +std::string value_to_string(T value) { + std::ostringstream ss; + ss << value; + return ss.str(); +} + +std::string bool_to_string(bool value) { return value ? 
"true" : "false"; } + +std::string string_to_std(const common::String& value) { + return value.to_std_string(); +} + +long long file_size(const std::string& path) { + std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate); + if (!in.good()) { + return 0; + } + return static_cast(in.tellg()); +} + +} // namespace + +StatisticCells statistic_value_cells(storage::Statistic* st) { + StatisticCells cells; + cells.values.assign(5, ""); + cells.is_null.assign(5, true); + if (st == nullptr || st->get_count() == 0) { + return cells; + } + + switch (st->get_type()) { + case common::BOOLEAN: { + auto* s = static_cast(st); + cells.values = {"", "", bool_to_string(s->first_value_), + bool_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {true, true, false, false, false}; + break; + } + case common::INT32: + case common::DATE: { + auto* s = static_cast(st); + cells.values = {value_to_string(s->min_value_), + value_to_string(s->max_value_), + value_to_string(s->first_value_), + value_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {false, false, false, false, false}; + break; + } + case common::INT64: + case common::TIMESTAMP: { + auto* s = static_cast(st); + cells.values = {value_to_string(s->min_value_), + value_to_string(s->max_value_), + value_to_string(s->first_value_), + value_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {false, false, false, false, false}; + break; + } + case common::FLOAT: { + auto* s = static_cast(st); + cells.values = {value_to_string(s->min_value_), + value_to_string(s->max_value_), + value_to_string(s->first_value_), + value_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {false, false, false, false, false}; + break; + } + case common::DOUBLE: { + auto* s = static_cast(st); + cells.values = {value_to_string(s->min_value_), + value_to_string(s->max_value_), + value_to_string(s->first_value_), + 
value_to_string(s->last_value_), + value_to_string(s->sum_value_)}; + cells.is_null = {false, false, false, false, false}; + break; + } + case common::STRING: { + auto* s = static_cast(st); + cells.values = {string_to_std(s->min_value_), + string_to_std(s->max_value_), + string_to_std(s->first_value_), + string_to_std(s->last_value_), ""}; + cells.is_null = {false, false, false, false, true}; + break; + } + case common::TEXT: { + auto* s = static_cast(st); + cells.values = {"", "", string_to_std(s->first_value_), + string_to_std(s->last_value_), ""}; + cells.is_null = {true, true, false, false, true}; + break; + } + default: + break; + } + return cells; +} + +std::vector collect_series_stats(const ParsedArgs& args, + storage::TsFileReader& reader) { + std::vector rows; + storage::DeviceTimeseriesMetadataMap meta = + reader.get_timeseries_metadata(); + for (auto& kv : meta) { + std::string target = kv.first ? kv.first->get_device_name() : ""; + if (!args.device.empty() && target != args.device) { + continue; + } + if (!args.table.empty() && kv.first && + kv.first->get_table_name() != args.table) { + continue; + } + for (auto& ts : kv.second) { + if (!ts) { + continue; + } + std::string measurement = + ts->get_measurement_name().to_std_string(); + if (!args.measurements.empty() && + std::find(args.measurements.begin(), args.measurements.end(), + measurement) == args.measurements.end()) { + continue; + } + storage::Statistic* st = ts->get_statistic(); + SeriesStatRow row; + row.target = target; + row.measurement = measurement; + if (st != nullptr) { + row.count = st->get_count(); + row.start_time = st->start_time_; + row.end_time = st->end_time_; + row.value_cells = statistic_value_cells(st); + } else { + row.value_cells.values.assign(5, ""); + row.value_cells.is_null.assign(5, true); + } + rows.push_back(row); + } + } + return rows; +} + +FileSummary collect_file_summary(const ParsedArgs& args, + storage::TsFileReader& reader) { + FileSummary s; + s.file = args.file; 
+ s.model = is_table_model(args, reader) ? "table" : "tree"; + s.device_count = + static_cast(reader.get_all_device_ids().size()); + s.table_count = + static_cast(reader.get_all_table_schemas().size()); + s.file_size_bytes = file_size(args.file); + + ParsedArgs all = args; + all.device.clear(); + all.table.clear(); + all.measurements.clear(); + std::vector rows = collect_series_stats(all, reader); + s.series_count = static_cast(rows.size()); + long long min_start = std::numeric_limits::max(); + long long max_end = std::numeric_limits::min(); + for (const SeriesStatRow& row : rows) { + if (row.count <= 0) { + continue; + } + min_start = std::min(min_start, row.start_time); + max_end = std::max(max_end, row.end_time); + s.has_time_range = true; + } + if (s.has_time_range) { + s.start_time = min_start; + s.end_time = max_end; + } + return s; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/stat_table.h b/cpp/tools/commands/stat_table.h new file mode 100644 index 000000000..e79bd20c0 --- /dev/null +++ b/cpp/tools/commands/stat_table.h @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#ifndef TSFILE_CLI_STAT_TABLE_H +#define TSFILE_CLI_STAT_TABLE_H + +#include +#include + +#include "cli/cli_args.h" + +namespace storage { +class Statistic; +class TsFileReader; +} // namespace storage + +namespace tsfile_cli { + +struct StatisticCells { + std::vector values; + std::vector is_null; +}; + +struct SeriesStatRow { + std::string target; + std::string measurement; + long long count = 0; + long long start_time = 0; + long long end_time = 0; + StatisticCells value_cells; +}; + +struct FileSummary { + std::string file; + std::string model; + long long device_count = 0; + long long table_count = 0; + long long series_count = 0; + long long start_time = 0; + long long end_time = 0; + bool has_time_range = false; + long long file_size_bytes = 0; +}; + +StatisticCells statistic_value_cells(storage::Statistic* st); +std::vector collect_series_stats(const ParsedArgs& args, + storage::TsFileReader& reader); +FileSummary collect_file_summary(const ParsedArgs& args, + storage::TsFileReader& reader); + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_STAT_TABLE_H From 8810f7003a4544df6e2c279cb5f54ca4b959d406 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 00:32:01 +0800 Subject: [PATCH 10/41] Add tsfile meta command --- cpp/test/tools/cli_args_test.cc | 2 +- cpp/test/tools/command_e2e_test.cc | 14 ++++++++ cpp/tools/cli/run_cli.cc | 4 ++- cpp/tools/commands/cmd_meta.cc | 52 ++++++++++++++++++++++++++++++ cpp/tools/commands/commands.h | 2 ++ 5 files changed, 72 insertions(+), 2 deletions(-) create mode 100644 cpp/tools/commands/cmd_meta.cc diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc index 43a976c56..4b109ae82 100644 --- a/cpp/test/tools/cli_args_test.cc +++ b/cpp/test/tools/cli_args_test.cc @@ -144,7 +144,7 @@ TEST(RunCliTest, OffsetOnSampleIsUsageError) { } TEST(RunCliTest, NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen) { - for (const char* command : {"meta", "count", "sample"}) { + for (const char* 
command : {"count", "sample"}) { std::ostringstream out; std::ostringstream err; int code = tsfile_cli::run_cli( diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index a1d4f0798..8377e8140 100644 --- a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -154,3 +154,17 @@ TEST(CliE2E, CatJsonIsNdjson) { EXPECT_EQ(code, 0); EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n"); } + +TEST(CliE2E, MetaReportsFileSummary) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"meta", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_TRUE(err.str().empty()); + EXPECT_NE(out.str().find("file\tmodel\tversion\tdevice_count\ttable_" + "count\tseries_count\tstart_time\tend_time\tbloom_" + "filter\tfile_size_bytes"), + std::string::npos); + EXPECT_NE(out.str().find("\ttable\t"), std::string::npos); +} diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index def598f3e..cca659a4b 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -70,7 +70,7 @@ bool is_known_command(const std::string& c) { } bool is_unimplemented_command(const std::string& c) { - static const std::set kCmds = {"meta", "count", "sample"}; + static const std::set kCmds = {"count", "sample"}; return kCmds.count(c) != 0; } @@ -161,6 +161,8 @@ int run_cli(const std::vector& args, std::ostream& out, code = cmd_ls(p, reader, fmt, out, err); } else if (p.command == "schema") { code = cmd_schema(p, reader, fmt, out, err); + } else if (p.command == "meta") { + code = cmd_meta(p, reader, fmt, out, err); } else if (p.command == "stats") { code = cmd_stats(p, reader, fmt, out, err); } else if (p.command == "head") { diff --git a/cpp/tools/commands/cmd_meta.cc b/cpp/tools/commands/cmd_meta.cc new file mode 100644 index 000000000..9b6d8f07a --- /dev/null +++ b/cpp/tools/commands/cmd_meta.cc @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * 
or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "commands/stat_table.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + RowWriter w(out, fmt, + {"file", "model", "version", "device_count", "table_count", + "series_count", "start_time", "end_time", "bloom_filter", + "file_size_bytes"}, + {common::STRING, common::STRING, common::STRING, common::INT64, + common::INT64, common::INT64, common::INT64, common::INT64, + common::STRING, common::INT64}, + args.no_header); + + FileSummary s = collect_file_summary(args, reader); + w.write({s.file, s.model, "", std::to_string(s.device_count), + std::to_string(s.table_count), std::to_string(s.series_count), + s.has_time_range ? std::to_string(s.start_time) : "", + s.has_time_range ? 
std::to_string(s.end_time) : "", "", + std::to_string(s.file_size_bytes)}, + {false, false, true, false, false, false, !s.has_time_range, + !s.has_time_range, true, false}); + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/commands.h b/cpp/tools/commands/commands.h index 39a3ac2b0..a55bf42c3 100644 --- a/cpp/tools/commands/commands.h +++ b/cpp/tools/commands/commands.h @@ -41,6 +41,8 @@ int cmd_ls(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); int cmd_schema(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, From 9e6ec59b4e206f8c9627938ecebac3c81b8b96fa Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 00:34:25 +0800 Subject: [PATCH 11/41] Add tsfile count command --- cpp/test/tools/cli_args_test.cc | 2 +- cpp/test/tools/command_e2e_test.cc | 12 ++++++++ cpp/tools/cli/run_cli.cc | 4 ++- cpp/tools/commands/cmd_count.cc | 47 ++++++++++++++++++++++++++++++ cpp/tools/commands/commands.h | 2 ++ 5 files changed, 65 insertions(+), 2 deletions(-) create mode 100644 cpp/tools/commands/cmd_count.cc diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc index 4b109ae82..4239f6bac 100644 --- a/cpp/test/tools/cli_args_test.cc +++ b/cpp/test/tools/cli_args_test.cc @@ -144,7 +144,7 @@ TEST(RunCliTest, OffsetOnSampleIsUsageError) { } TEST(RunCliTest, NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen) { - for (const char* command : {"count", "sample"}) { + for (const char* command : {"sample"}) { std::ostringstream out; std::ostringstream err; int code = 
tsfile_cli::run_cli( diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index 8377e8140..732e647e3 100644 --- a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -168,3 +168,15 @@ TEST(CliE2E, MetaReportsFileSummary) { std::string::npos); EXPECT_NE(out.str().find("\ttable\t"), std::string::npos); } + +TEST(CliE2E, CountReportsSeriesCountsAndTotal) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"count", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_TRUE(err.str().empty()); + EXPECT_NE(out.str().find("target\tmeasurement\tcount"), std::string::npos); + EXPECT_NE(out.str().find("\ts1\t5"), std::string::npos); + EXPECT_NE(out.str().find("total\t\t"), std::string::npos); +} diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index cca659a4b..784b44fa8 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -70,7 +70,7 @@ bool is_known_command(const std::string& c) { } bool is_unimplemented_command(const std::string& c) { - static const std::set kCmds = {"count", "sample"}; + static const std::set kCmds = {"sample"}; return kCmds.count(c) != 0; } @@ -169,6 +169,8 @@ int run_cli(const std::vector& args, std::ostream& out, code = cmd_head(p, reader, fmt, out, err); } else if (p.command == "cat") { code = cmd_cat(p, reader, fmt, out, err); + } else if (p.command == "count") { + code = cmd_count(p, reader, fmt, out, err); } else { err << "Unknown command: " << p.command << "\n"; code = kExitUsage; diff --git a/cpp/tools/commands/cmd_count.cc b/cpp/tools/commands/cmd_count.cc new file mode 100644 index 000000000..7cb592253 --- /dev/null +++ b/cpp/tools/commands/cmd_count.cc @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "commands/stat_table.h" + +namespace tsfile_cli { + +int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { + RowWriter w(out, fmt, {"target", "measurement", "count"}, + {common::STRING, common::STRING, common::INT64}, + args.no_header); + + long long total = 0; + std::vector rows = collect_series_stats(args, reader); + for (const SeriesStatRow& row : rows) { + total += row.count; + w.write({row.target, row.measurement, std::to_string(row.count)}, + {false, false, false}); + } + w.write({"total", "", std::to_string(total)}, {false, true, false}); + w.finish(); + return kExitOk; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/commands.h b/cpp/tools/commands/commands.h index a55bf42c3..dddbbb247 100644 --- a/cpp/tools/commands/commands.h +++ b/cpp/tools/commands/commands.h @@ -43,6 +43,8 @@ int cmd_schema(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); int cmd_stats(const ParsedArgs& 
args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, From b3292b46619df905039c400e5aaeea59fd8e1120 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 00:37:55 +0800 Subject: [PATCH 12/41] Add deterministic tsfile sample command --- cpp/test/tools/cli_args_test.cc | 14 ---- cpp/test/tools/command_e2e_test.cc | 23 ++++++ cpp/tools/cli/run_cli.cc | 12 +-- cpp/tools/commands/cmd_sample.cc | 112 ++++++++++++++++++++++++++ cpp/tools/commands/commands.h | 2 + cpp/tools/format/result_set_format.cc | 75 +++++++++++++++++ cpp/tools/format/result_set_format.h | 4 + 7 files changed, 218 insertions(+), 24 deletions(-) create mode 100644 cpp/tools/commands/cmd_sample.cc diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc index 4239f6bac..464eb9597 100644 --- a/cpp/test/tools/cli_args_test.cc +++ b/cpp/test/tools/cli_args_test.cc @@ -142,17 +142,3 @@ TEST(RunCliTest, OffsetOnSampleIsUsageError) { EXPECT_NE(err.str().find("--offset is not valid for sample"), std::string::npos); } - -TEST(RunCliTest, NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen) { - for (const char* command : {"sample"}) { - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli( - {command, "definitely_missing.tsfile"}, out, err); - EXPECT_EQ(code, 1) << command; - EXPECT_NE(err.str().find("command not implemented yet"), - std::string::npos) - << command; - EXPECT_NE(err.str().find(command), std::string::npos) << command; - } -} diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index 732e647e3..f2a520fa4 100644 --- a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -180,3 +180,26 @@ TEST(CliE2E, CountReportsSeriesCountsAndTotal) { EXPECT_NE(out.str().find("\ts1\t5"), std::string::npos); EXPECT_NE(out.str().find("total\t\t"), std::string::npos); } + +TEST(CliE2E, 
SampleIsReproducibleWithSeed) { + Fixture f; + std::ostringstream out1; + std::ostringstream err1; + std::ostringstream out2; + std::ostringstream err2; + + int code1 = tsfile_cli::run_cli( + {"sample", "-m", "s1", "-n", "3", "--seed", "7", "-f", "tsv", f.path}, + out1, err1); + int code2 = tsfile_cli::run_cli( + {"sample", "-m", "s1", "-n", "3", "--seed", "7", "-f", "tsv", f.path}, + out2, err2); + + EXPECT_EQ(code1, 0); + EXPECT_EQ(code2, 0); + EXPECT_TRUE(err1.str().empty()); + EXPECT_TRUE(err2.str().empty()); + EXPECT_EQ(out1.str(), out2.str()); + EXPECT_EQ(count_lines(out1.str()), 4u); + EXPECT_NE(out1.str().find("time\ts1\n"), std::string::npos); +} diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index 784b44fa8..dfe4e4f62 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -69,11 +69,6 @@ bool is_known_command(const std::string& c) { return kCmds.count(c) != 0; } -bool is_unimplemented_command(const std::string& c) { - static const std::set kCmds = {"sample"}; - return kCmds.count(c) != 0; -} - bool validate_command_flags(const ParsedArgs& p, std::ostream& err) { if (p.has_seed && p.command != "sample") { err << "Error: --seed is only valid for sample\n"; @@ -139,11 +134,6 @@ int run_cli(const std::vector& args, std::ostream& out, print_usage(err); return kExitUsage; } - if (is_unimplemented_command(p.command)) { - err << "Error: command not implemented yet: " << p.command << "\n"; - print_usage(err); - return kExitUsage; - } storage::libtsfile_init(); storage::TsFileReader reader; @@ -171,6 +161,8 @@ int run_cli(const std::vector& args, std::ostream& out, code = cmd_cat(p, reader, fmt, out, err); } else if (p.command == "count") { code = cmd_count(p, reader, fmt, out, err); + } else if (p.command == "sample") { + code = cmd_sample(p, reader, fmt, out, err); } else { err << "Unknown command: " << p.command << "\n"; code = kExitUsage; diff --git a/cpp/tools/commands/cmd_sample.cc b/cpp/tools/commands/cmd_sample.cc new file 
mode 100644 index 000000000..75edc2f34 --- /dev/null +++ b/cpp/tools/commands/cmd_sample.cc @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#include +#include +#include +#include + +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "common/device_id.h" +#include "common/schema.h" +#include "format/result_set_format.h" +#include "reader/tsfile_reader.h" + +namespace tsfile_cli { + +int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err) { + const int64_t start = args.has_start ? static_cast(args.start) + : std::numeric_limits::min(); + const int64_t end = args.has_end ? 
static_cast(args.end) + : std::numeric_limits::max(); + storage::ResultSet* rs = nullptr; + int qret = 0; + + if (is_table_model(args, reader)) { + std::string table_name = args.table; + if (table_name.empty()) { + auto schemas = reader.get_all_table_schemas(); + if (schemas.empty() || !schemas[0]) { + err << "Error: no table found in file\n"; + return kExitRuntime; + } + table_name = schemas[0]->get_table_name(); + } + std::vector cols = args.measurements; + if (cols.empty()) { + auto ts = reader.get_table_schema(table_name); + if (ts) { + cols = ts->get_measurement_names(); + } + } + qret = reader.query(table_name, cols, start, end, rs); + } else { + std::vector devices; + if (!args.device.empty()) { + devices.push_back(args.device); + } else { + for (auto& d : reader.get_all_device_ids()) { + if (d) { + devices.push_back(d->get_device_name()); + } + } + } + std::vector paths; + for (const std::string& dev : devices) { + std::vector ms = args.measurements; + if (ms.empty()) { + auto did = std::make_shared(dev); + std::vector sch; + if (reader.get_timeseries_schema(did, sch) == 0) { + for (auto& m : sch) { + ms.push_back(m.measurement_name_); + } + } + } + for (const std::string& m : ms) { + paths.push_back(dev + "." + m); + } + } + if (paths.empty()) { + err << "Error: no time series found\n"; + return kExitRuntime; + } + qret = reader.query(paths, start, end, rs); + } + + if (qret != 0 || rs == nullptr) { + err << "Error: query failed (code " << qret << ")\n"; + if (rs != nullptr) { + reader.destroy_query_data_set(rs); + } + return kExitRuntime; + } + + const long long limit = args.limit < 0 ? 10 : args.limit; + const unsigned long long seed = + args.has_seed ? static_cast(args.seed) : 0ULL; + int wret = + write_result_set_sampled(rs, fmt, args.no_header, out, limit, seed); + reader.destroy_query_data_set(rs); + return wret == 0 ? 
kExitOk : kExitRuntime; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/commands.h b/cpp/tools/commands/commands.h index dddbbb247..8ca49b994 100644 --- a/cpp/tools/commands/commands.h +++ b/cpp/tools/commands/commands.h @@ -51,6 +51,8 @@ int cmd_head(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); int cmd_cat(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, + OutputFormat fmt, std::ostream& out, std::ostream& err); } // namespace tsfile_cli diff --git a/cpp/tools/format/result_set_format.cc b/cpp/tools/format/result_set_format.cc index bf30f0a6f..4c847c5a2 100644 --- a/cpp/tools/format/result_set_format.cc +++ b/cpp/tools/format/result_set_format.cc @@ -21,6 +21,7 @@ #include #include +#include #include #include @@ -107,4 +108,78 @@ int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, return code; } +namespace { + +struct BufferedRow { + std::vector cells; + std::vector nulls; +}; + +BufferedRow read_current_row(storage::ResultSet* rs, + const std::vector& types) { + BufferedRow row; + const uint32_t ncol = static_cast(types.size()); + row.cells.assign(ncol, ""); + row.nulls.assign(ncol, false); + for (uint32_t i = 1; i <= ncol; ++i) { + if (rs->is_null(i)) { + row.nulls[i - 1] = true; + } else { + row.cells[i - 1] = cell_to_string(rs, i, types[i - 1]); + } + } + return row; +} + +} // namespace + +int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, + bool no_header, std::ostream& out, long long limit, + unsigned long long seed) { + if (limit < 0) { + limit = 10; + } + auto meta = rs->get_metadata(); + const uint32_t ncol = meta->get_column_count(); + std::vector header; + std::vector types; + header.reserve(ncol); + types.reserve(ncol); + for (uint32_t i = 1; i <= ncol; ++i) { + 
header.push_back(meta->get_column_name(i)); + types.push_back(meta->get_column_type(i)); + } + + std::vector reservoir; + reservoir.reserve(static_cast(limit)); + std::mt19937_64 rng(seed); + bool has_next = false; + int code = common::E_OK; + long long seen = 0; + while ((code = rs->next(has_next)) == common::E_OK && has_next) { + BufferedRow row = read_current_row(rs, types); + if (limit == 0) { + ++seen; + continue; + } + if (static_cast(reservoir.size()) < limit) { + reservoir.push_back(row); + } else { + std::uniform_int_distribution dist(0, seen); + long long idx = dist(rng); + if (idx < limit) { + reservoir[static_cast(idx)] = row; + } + } + ++seen; + } + + RowWriter writer(out, fmt, header, types, no_header); + for (const BufferedRow& row : reservoir) { + writer.write(row.cells, row.nulls); + } + writer.finish(); + return code; +} + } // namespace tsfile_cli diff --git a/cpp/tools/format/result_set_format.h b/cpp/tools/format/result_set_format.h index b49667a4d..076964850 100644 --- a/cpp/tools/format/result_set_format.h +++ b/cpp/tools/format/result_set_format.h @@ -36,6 +36,10 @@ int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, std::ostream& out, long long offset = 0, long long limit = -1); +int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, + bool no_header, std::ostream& out, long long limit, + unsigned long long seed); + } // namespace tsfile_cli #endif // TSFILE_CLI_RESULT_SET_FORMAT_H From fdec8ee1fb694e2227a448f07b444094eb97ed90 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 00:45:20 +0800 Subject: [PATCH 13/41] Format tsfile CLI sources with clang-format --- cpp/test/tools/command_e2e_test.cc | 12 ++++++------ cpp/tools/cli/run_cli.cc | 3 +-- cpp/tools/commands/stat_table.cc | 3 +-- 3 files changed, 8 insertions(+), 10 deletions(-) diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index f2a520fa4..69929a2d8 100644 --- 
a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -137,9 +137,9 @@ TEST(CliE2E, CatWithTimeRange) { Fixture f; std::ostringstream out; std::ostringstream err; - int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "2", "--end", - "3", "-f", "tsv", f.path}, - out, err); + int code = tsfile_cli::run_cli( + {"cat", "-m", "s1", "--start", "2", "--end", "3", "-f", "tsv", f.path}, + out, err); EXPECT_EQ(code, 0); EXPECT_EQ(out.str(), "time\ts1\n2\t20\n3\t30\n"); } @@ -148,9 +148,9 @@ TEST(CliE2E, CatJsonIsNdjson) { Fixture f; std::ostringstream out; std::ostringstream err; - int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "0", "--end", - "0", "-f", "json", f.path}, - out, err); + int code = tsfile_cli::run_cli( + {"cat", "-m", "s1", "--start", "0", "--end", "0", "-f", "json", f.path}, + out, err); EXPECT_EQ(code, 0); EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n"); } diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index dfe4e4f62..488bcb4c4 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -64,8 +64,7 @@ void print_usage(std::ostream& os) { bool is_known_command(const std::string& c) { static const std::set kCmds = { - "ls", "schema", "meta", "stats", - "head", "cat", "count", "sample"}; + "ls", "schema", "meta", "stats", "head", "cat", "count", "sample"}; return kCmds.count(c) != 0; } diff --git a/cpp/tools/commands/stat_table.cc b/cpp/tools/commands/stat_table.cc index 73f94b00c..d09d4bd6a 100644 --- a/cpp/tools/commands/stat_table.cc +++ b/cpp/tools/commands/stat_table.cc @@ -184,8 +184,7 @@ FileSummary collect_file_summary(const ParsedArgs& args, FileSummary s; s.file = args.file; s.model = is_table_model(args, reader) ? 
"table" : "tree"; - s.device_count = - static_cast(reader.get_all_device_ids().size()); + s.device_count = static_cast(reader.get_all_device_ids().size()); s.table_count = static_cast(reader.get_all_table_schemas().size()); s.file_size_bytes = file_size(args.file); From cd4efe2de05534baf852ccb563f820850fc3825a Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 00:45:20 +0800 Subject: [PATCH 14/41] Document CMake 4.x/ANTLR4 build constraint in CLI plan --- docs/superpowers/plans/2026-06-02-tsfile-cli.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/docs/superpowers/plans/2026-06-02-tsfile-cli.md b/docs/superpowers/plans/2026-06-02-tsfile-cli.md index 8f70a2307..5be080442 100644 --- a/docs/superpowers/plans/2026-06-02-tsfile-cli.md +++ b/docs/superpowers/plans/2026-06-02-tsfile-cli.md @@ -45,13 +45,22 @@ Google Test 1.12.1, existing `storage::TsFileReader`, `storage::Statistic`, `Ro - Every new `.h`/`.cc` file begins with the Apache 2.0 block-comment header (`/* ... */`), copied verbatim from any existing `cpp/tools/**` file. The code blocks below omit that header for brevity; **always prepend it when creating a new file**. - All CLI code lives inside `namespace tsfile_cli`. +- **Build-environment note (local CMake 4.3.2)**: the bundled `third_party/antlr4-cpp-runtime-4` + sets legacy CMake policies that have since been removed to OLD, which CMake 4.x rejects outright; you must pass `--disable-antlr4` + to work around it (the reader/CLI do not depend on ANTLR4; verified to build and pass tests). In addition, `build.sh` defaults to + `build_test=0` with no command-line switch, so it has been temporarily changed to `build_test=1` for this work (restore it with + `git checkout cpp/build.sh` when finishing Task 6). The test executable lands in `build/Debug/test/lib/`. 
- Run the C++ verification commands from the `cpp/` directory: ```bash -bash build.sh -t=Debug -./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.*:StatTableTest.* +bash build.sh -t=Debug --disable-antlr4 +./build/Debug/test/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.*:StatTableTest.* ``` +> The `Run:` lines inside each Task of the plan still use the old `bash 
build.sh -t=Debug` and + > `./build/Debug/lib/TsFile_Test`; replace them throughout, per the build-environment note above, with the + > `--disable-antlr4` build command and the `build/Debug/test/lib/TsFile_Test` test path. + ## Starting point: current working-tree state (2026-06-02) - **Committed** (commit `a392a56f`, only four files): `cpp/tools/cli/cli_args.h`, From e9738aa704d15ff5d11e8407729b0ae87f6ec314 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 00:53:19 +0800 Subject: [PATCH 15/41] Simplify sampled writer: trust caller-normalized limit --- cpp/tools/format/result_set_format.cc | 7 ------- 1 file changed, 7 deletions(-) diff --git a/cpp/tools/format/result_set_format.cc b/cpp/tools/format/result_set_format.cc index 4c847c5a2..3f5aec471 100644 --- a/cpp/tools/format/result_set_format.cc +++ b/cpp/tools/format/result_set_format.cc @@ -136,9 +136,6 @@ BufferedRow read_current_row(storage::ResultSet* rs, int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, bool no_header, std::ostream& out, long long limit, unsigned long long seed) { - if (limit < 0) { - limit = 10; - } auto meta = rs->get_metadata(); const uint32_t ncol = meta->get_column_count(); std::vector header; @@ -158,10 +155,6 @@ int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, long long seen = 0; while ((code = rs->next(has_next)) == common::E_OK && has_next) { BufferedRow row = read_current_row(rs, types); - if (limit == 0) { - ++seen; - continue; - } if (static_cast(reservoir.size()) < limit) { reservoir.push_back(row); } else { From f16d9696e552baf9d15992220568de49de9c2eb6 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 01:18:13 +0800 Subject: [PATCH 16/41] Add tsfile-cli usage skill for inspecting .tsfile from the CLI --- .claude/skills/tsfile-cli/SKILL.md | 103 +++++++++++++++++++++++++++++ 1 file changed, 103 insertions(+) create mode 100644 .claude/skills/tsfile-cli/SKILL.md diff --git a/.claude/skills/tsfile-cli/SKILL.md b/.claude/skills/tsfile-cli/SKILL.md 
new file mode 100644 index 000000000..56fe8a40b --- /dev/null +++ 
b/.claude/skills/tsfile-cli/SKILL.md @@ -0,0 +1,103 @@ +--- +name: tsfile-cli +description: Use when you need to inspect, preview, or export an Apache TsFile (.tsfile) from the command line — listing devices/tables, dumping schema, reading file/series metadata, counting rows, or sampling/previewing rows — via the project's read-only C++ `tsfile` CLI in cpp/tools. +--- + +# tsfile CLI + +## Overview + +`tsfile` is a single, read-only, pipe-friendly C++ binary for inspecting a `.tsfile` +without writing reader code — the TsFile analogue of `parquet-cli`/`pqrs`. Source: +`cpp/tools/`. Data goes to **stdout**, diagnostics/errors to **stderr**, so it composes +with `awk`, `jq`, `sort`, etc. + +It is **read-only**: there is no write/convert verb (see [Writing](#writing-a-tsfile)). + +## Locating / building the binary + +The executable is named **`tsfile`** (the CMake *target* is `tsfile_cli`, but the file is +`tsfile`). Look first, build only if missing: + +```sh +ls cpp/build/*/bin/tsfile # prebuilt? e.g. cpp/build/Debug/bin/tsfile +cd cpp && bash build.sh -t=Debug # build if absent (binary in build/Debug/bin/tsfile) +``` + +If CMake ≥ 4 aborts configuring the bundled ANTLR4 (`Policy CMP00xx may not be set to +OLD`), add `--disable-antlr4` — the reader and CLI don't use ANTLR4: + +```sh +cd cpp && bash build.sh -t=Debug --disable-antlr4 +``` + +## Commands + +``` +tsfile [options] +tsfile --help | --version | help +``` + +| Command | Output | Scans data pages? 
| +|---|---|---| +| `ls` | one device (tree model) or table (table model) per line | no | +| `schema` | `target, measurement, datatype, encoding, compression` | no | +| `meta` | file summary: model, version, device/table/series counts, time range, bloom, size | no | +| `stats` | per-series `count, start_time, end_time, min, max, first, last, sum` | no | +| `count` | per-series row counts + `total` row (from statistics) | no | +| `head` | first N rows (default 10, `-n`) | yes | +| `cat` | all matching rows (streamed) | yes | +| `sample` | reproducible reservoir sample (default 10, `-n` + `--seed`) | yes | + +Use the no-scan metadata verbs (`ls`/`schema`/`meta`/`stats`/`count`) first — they answer +most inspection questions cheaply and reliably. + +## Shared options + +| Option | Meaning | Applies to | +|---|---|---| +| `-f, --format csv\|tsv\|json\|table` | output format; auto = `table` on a TTY, `tsv` when piped | all | +| `-d, --device ` / `-t, --table ` | scope to one device / table (mutually exclusive) | row cmds, `schema`, `stats`, `count` | +| `-m, --measurements a,b,c` | column projection | `schema`, `head`, `cat`, `sample` | +| `-n, --limit N` / `--offset N` | row cap / skip (`--offset` invalid for `sample`) | `head`, `cat`, `sample` (`--offset`: `head`, `cat` only) | +| `--start ` / `--end ` | inclusive epoch-millisecond time range | `head`, `cat`, `sample` | +| `--seed N` | reproducible sampling seed (only valid for `sample`) | `sample` | +| `--no-header`, `--model tree\|table` | suppress header; force model (else auto-detected) | all | + +`json` is NDJSON (one object per line); numbers/booleans bare, others quoted, `null` as +`null`. CSV follows RFC 4180. Timestamps are raw epoch milliseconds. + +Exit codes: `0` ok · `1` usage/argument error · `2` file open/corrupt · `3` query/runtime. 
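The RFC 4180 quoting rule mentioned above can be illustrated with a small sketch. `csv_quote` is a hypothetical helper written for this note, not the CLI's own escaping code (whose exact behavior may differ): it returns a field unchanged unless the field contains a comma, a double quote, CR, or LF, in which case it wraps the field in double quotes and doubles every embedded quote.

```cpp
#include <string>

// Hypothetical helper (not the tsfile CLI's implementation): quote one CSV
// field following RFC 4180. Fields without special characters pass through.
std::string csv_quote(const std::string& field) {
    // RFC 4180: fields containing commas, double quotes, or line breaks
    // must be enclosed in double quotes.
    const bool needs_quotes = field.find_first_of(",\"\r\n") != std::string::npos;
    if (!needs_quotes) {
        return field;
    }
    std::string out;
    out.reserve(field.size() + 2);
    out.push_back('"');
    for (char c : field) {
        if (c == '"') {
            out.push_back('"');  // an embedded quote is escaped by doubling it
        }
        out.push_back(c);
    }
    out.push_back('"');
    return out;
}
```

For example, `csv_quote("a,b")` yields `"a,b"` and `csv_quote("say \"hi\"")` yields `"say ""hi"""`, while a plain value such as `42` is emitted as-is.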
+ +## Examples + +```sh +BIN=cpp/build/Debug/bin/tsfile +$BIN ls -f tsv data.tsfile # namespaces, one per line +$BIN meta data.tsfile # quick file overview +$BIN count -t table1 -f tsv data.tsfile # row counts, no page scan +$BIN cat -m temp,humidity --start 1700000000000 -f csv data.tsfile | head +$BIN sample -m temp -n 20 --seed 42 -f json data.tsfile | jq . +$BIN cat -f csv data.tsfile 2>/dev/null | awk -F, 'NR>1{n++} END{print n}' +``` + +## Known caveats + +- **Row commands can abort on some files.** `head`/`cat`/`sample` decode data pages and + may hit a reader assertion (`decode_cur_time_page_data`, `aligned_chunk_reader.cc`, + exit 134) on certain aligned files — including the bundled `cpp/examples/test_cpp.tsfile`. + This is a storage-engine/file issue, not a CLI bug; the metadata verbs still work on + such files. For row data, use a well-formed file (e.g. one you wrote yourself). +- **Garbled `target` for table model.** A table-model device id is built from tag-column + bytes, so `stats`/`count`/`schema` may print non-printable characters in `target`. +- **`schema` can list more columns than `meta`/`stats`/`count` report as series.** Tag/id + columns show up in `schema` but aren't always counted as field series, so `series_count` + and the `stats`/`count` rows may be fewer than the `schema` rows — not a discrepancy bug. +- **Build needs `--disable-antlr4` on CMake ≥ 4** (see above). + +## Writing a TsFile + +The CLI does **not** write. Produce a `.tsfile` with the C++ SDK — see +`cpp/examples/cpp_examples/demo_write.cpp` (`TsFileTableWriter` / `TsFileWriter` + +`Tablet`), then inspect the result with this CLI. Java and Python writers exist under +`java/` and `python/`. 
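The seeded `sample` invocation in the examples above relies on reservoir sampling. Below is a minimal sketch of Algorithm R, assuming an in-memory `std::vector` stands in for the streamed `ResultSet`; the function name `reservoir_sample` is hypothetical. The first k items fill the reservoir, and each later item i (0-based) overwrites a random slot with probability k/(i+1), so a single pass yields a uniform sample of up to k items. Seeding `std::mt19937_64` fixes the engine's output sequence. However, the C++ standard does not specify how `std::uniform_int_distribution` maps that sequence to integers, so the same seed is only guaranteed to reproduce the same sample when the same standard library is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical sketch of Algorithm R: keep a uniform random sample of up to
// k items from a sequence in one pass; a fixed seed makes the choice repeatable.
template <typename T>
std::vector<T> reservoir_sample(const std::vector<T>& stream, std::size_t k,
                                std::uint64_t seed) {
    std::vector<T> reservoir;
    reservoir.reserve(k);
    std::mt19937_64 rng(seed);
    for (std::size_t i = 0; i < stream.size(); ++i) {
        if (reservoir.size() < k) {
            reservoir.push_back(stream[i]);  // fill phase: keep the first k items
            continue;
        }
        // Item i (0-based) replaces a slot with probability k / (i + 1).
        std::uniform_int_distribution<std::size_t> dist(0, i);
        const std::size_t j = dist(rng);
        if (j < k) {
            reservoir[j] = stream[i];
        }
    }
    return reservoir;
}
```

Calling `reservoir_sample(rows, 3, 7)` twice on the same input returns the same three items. This mirrors what the `SampleIsReproducibleWithSeed` test checks for the CLI. When k is at least the input size, the whole input is returned in order; when k is 0, the result is empty.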
From 6bb1d9cf89278ac7974b0ba9e1bc6e15018bf951 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 14:52:20 +0800 Subject: [PATCH 17/41] Rename CLI binary from tsfile to tsfile-cli --- .claude/skills/tsfile-cli/SKILL.md | 14 +++++++------- cpp/tools/CMakeLists.txt | 2 +- cpp/tools/cli/run_cli.cc | 4 ++-- docs/superpowers/plans/2026-06-02-tsfile-cli.md | 8 ++++---- .../specs/2026-06-02-tsfile-cli-design.md | 10 +++++----- 5 files changed, 19 insertions(+), 19 deletions(-) diff --git a/.claude/skills/tsfile-cli/SKILL.md b/.claude/skills/tsfile-cli/SKILL.md index 56fe8a40b..3be19f877 100644 --- a/.claude/skills/tsfile-cli/SKILL.md +++ b/.claude/skills/tsfile-cli/SKILL.md @@ -16,12 +16,12 @@ It is **read-only**: there is no write/convert verb (see [Writing](#writing-a-ts ## Locating / building the binary -The executable is named **`tsfile`** (the CMake *target* is `tsfile_cli`, but the file is -`tsfile`). Look first, build only if missing: +The executable is named **`tsfile-cli`** (the CMake *target* is `tsfile_cli`). Look first, +build only if missing: ```sh -ls cpp/build/*/bin/tsfile # prebuilt? e.g. cpp/build/Debug/bin/tsfile -cd cpp && bash build.sh -t=Debug # build if absent (binary in build/Debug/bin/tsfile) +ls cpp/build/*/bin/tsfile-cli # prebuilt? e.g. cpp/build/Debug/bin/tsfile-cli +cd cpp && bash build.sh -t=Debug # build if absent (binary in build/Debug/bin/tsfile-cli) ``` If CMake ≥ 4 aborts configuring the bundled ANTLR4 (`Policy CMP00xx may not be set to @@ -34,8 +34,8 @@ cd cpp && bash build.sh -t=Debug --disable-antlr4 ## Commands ``` -tsfile [options] -tsfile --help | --version | help +tsfile-cli [options] +tsfile-cli --help | --version | help ``` | Command | Output | Scans data pages? 
| @@ -72,7 +72,7 @@ Exit codes: `0` ok · `1` usage/argument error · `2` file open/corrupt · `3` q ## Examples ```sh -BIN=cpp/build/Debug/bin/tsfile +BIN=cpp/build/Debug/bin/tsfile-cli $BIN ls -f tsv data.tsfile # namespaces, one per line $BIN meta data.tsfile # quick file overview $BIN count -t table1 -f tsv data.tsfile # row counts, no page scan diff --git a/cpp/tools/CMakeLists.txt b/cpp/tools/CMakeLists.txt index e1408d67e..da4b072b5 100644 --- a/cpp/tools/CMakeLists.txt +++ b/cpp/tools/CMakeLists.txt @@ -41,7 +41,7 @@ add_executable(tsfile_cli tools_main.cc $) target_include_directories(tsfile_cli PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}) target_link_libraries(tsfile_cli tsfile) set_target_properties(tsfile_cli PROPERTIES - OUTPUT_NAME tsfile + OUTPUT_NAME tsfile-cli RUNTIME_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/bin) install(TARGETS tsfile_cli RUNTIME DESTINATION bin) diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index 488bcb4c4..7084cd035 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -46,7 +46,7 @@ namespace tsfile_cli { namespace { void print_usage(std::ostream& os) { - os << "Usage: tsfile [options] \n" + os << "Usage: tsfile-cli [options] \n" "Commands:\n" " ls list devices (tree) or tables (table)\n" " schema per-measurement data type/encoding/compression\n" @@ -103,7 +103,7 @@ int run_cli(const std::vector& args, std::ostream& out, ParsedArgs p = parse_args(args); if (p.version) { - out << "tsfile (Apache TsFile C++) " << TSFILE_CLI_VERSION << "\n"; + out << "tsfile-cli (Apache TsFile C++) " << TSFILE_CLI_VERSION << "\n"; return kExitOk; } if (args.empty()) { diff --git a/docs/superpowers/plans/2026-06-02-tsfile-cli.md b/docs/superpowers/plans/2026-06-02-tsfile-cli.md index 5be080442..55106268e 100644 --- a/docs/superpowers/plans/2026-06-02-tsfile-cli.md +++ b/docs/superpowers/plans/2026-06-02-tsfile-cli.md @@ -181,7 +181,7 @@ Expected: the build succeeds; all selected tests pass. 
Of these, `RunCliTest.SelectIsN `count`/`sample` are still stubs at this point). > If 
existing assertions such as `CliE2E.SchemaTableMeasurementFilterOnlyShowsRequestedColumn` fail on string -> details, first print the actual output with `./build/Debug/bin/tsfile -f tsv ` and align with it; +> details, first print the actual output with `./build/Debug/bin/tsfile-cli -f tsv ` and align with it; > the fixture values (ts 0..4, s1=ts*10) are fixed. - [ ] **Step 6: Manually confirm that `select` is no longer available and help does not mention select** @@ -189,7 +189,7 @@ Expected: the build succeeds; all selected tests pass. Of these, `RunCliTest.SelectIsN Run: ```bash -cd cpp && ./build/Debug/bin/tsfile --help | grep -i select; echo "rc=$?" +cd cpp && ./build/Debug/bin/tsfile-cli --help | grep -i select; echo "rc=$?" ``` Expected: no output, `rc=1` (grep finds no match); help lists `ls schema meta stats head cat @@ -1206,7 +1206,7 @@ Expected: all pass. If pre-existing tests unrelated to this plan fail, record Run: ```bash -cd cpp && ./build/Debug/bin/tsfile --help +cd cpp && ./build/Debug/bin/tsfile-cli --help ``` Expected: stdout contains `ls schema meta stats head cat count sample`; does not contain `select`, @@ -1218,7 +1218,7 @@ Run (the sample file uses the table model): ```bash cd cpp -BIN=./build/Debug/bin/tsfile +BIN=./build/Debug/bin/tsfile-cli F=examples/test_cpp.tsfile $BIN ls -f tsv $F $BIN meta -f tsv $F diff --git a/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md b/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md index 6208d994b..32fd96611 100644 --- a/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md +++ b/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md @@ -37,9 +37,9 @@ Provide TsFile with a single-binary, composable, pipe-friendly C++ command-line tool: ```sh -tsfile [options] -tsfile --help | --version -tsfile help +tsfile-cli [options] +tsfile-cli --help | --version +tsfile-cli help ``` Let users inspect a `.tsfile` the way they inspect other self-describing data files: discover namespaces, view the schema and @@ -78,7 +78,7 @@ HDF5 and 
netCDF contribute namespace and header conventions: `h5ls`, `h5dump -H`, `nc Includes: -- A multi-command binary named `tsfile`. +- A multi-command binary named `tsfile-cli`. - Read-only commands: `ls`, `schema`, `meta`, `stats`, `head`, `cat`, `count`, `sample`. - Shared options such as output format, model selection, column projection, row limit, offset, time range, and sampling seed. - Implemented on the existing `storage::TsFileReader` read path, without modifying the storage engine. @@ -247,7 +247,7 @@ null is written as an empty field in CSV/TSV. Timestamps are output as the stored epoch milli ```text cpp/tools/ -├── CMakeLists.txt # 
OBJECT library tsfile_cli_obj + executable tsfile-cli ├── tools_main.cc # main(): forwards argv to run_cli ├── cli/ │ ├── exit_codes.h # kExitOk/kExitUsage/kExitFile/kExitRuntime From 751daa952bcb0077cb21b3f21be357cc3172f1d3 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 15:18:17 +0800 Subject: [PATCH 18/41] Add tsfile CLI write (CSV/TSV import) design spec --- .../2026-06-03-tsfile-cli-write-design.md | 198 ++++++++++++++++++ 1 file changed, 198 insertions(+) create mode 100644 docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md diff --git a/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md b/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md new file mode 100644 index 000000000..8544732f8 --- /dev/null +++ b/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md @@ -0,0 +1,198 @@ + + +# Design: TsFile CLI write (`tsfile-cli write`) + +- **Date**: 2026-06-03 +- **Module**: `cpp/` (extends `cpp/tools/` and `cpp/test/tools/`) +- **Status**: design approved; implementation plan pending +- **Relationship**: builds on the read-only CLI (`docs/superpowers/specs/2026-06-02-tsfile-cli-design.md`) by adding + the first write command; that read-side design lists writing as "future work", and this document makes its "text import" portion concrete. +- **Research basis**: `/Users/zhanghongyin/reasearchNotes/research/tsfile/调研报告/各文件格式CLI工具调研.md` + chapters 2, 3, and 5 on write-path verbs (Parquet `convert-csv`, ORC `convert`, Avro `fromjson`). + +## 1. Goal + +Add a `write` command to `tsfile-cli` that **imports CSV/TSV row data into a new table-model +`.tsfile`**. It is symmetric with the output of the read-side `cat -f csv|tsv`, so data that was read out can be piped back in: + +```sh +tsfile-cli cat -m s1 -f csv in.tsfile | tsfile-cli write --table t1 \ + --columns "s1:INT64:field" -o out.tsfile +``` + +The design principles match the read side: a single binary, composable, stdout/stderr kept separate, no new third-party dependencies, and no changes to the storage +engine (only the existing `storage::TsFileTableWriter` write path is called). + +## 2. Scope + +Includes: + +- One `write` command: CSV/TSV → a single table-model `.tsfile`. +- An explicit schema (`--columns` + `--table`), with **no type inference**. +- Input from a file or stdin; output to the `.tsfile` given by `-o` (overwritten). + +Excludes (YAGNI; deferred to future work): + +- Tree-model import. +- JSON / NDJSON input. +- Type inference. +- Encoding / compression selection (v1 fixes `PLAIN` / `UNCOMPRESSED`). +- Append / merge / `tsfile → tsfile` conversion / rewrite. +- Newlines inside quoted fields (v1 assumes one record per line). + +## 3. 
命令形态 + +``` +tsfile-cli write --table --columns -o \ + [-f csv|tsv] [--no-header] [ | -] +``` + +| 参数 | 含义 | 必填 | +|---|---|---| +| `` 位置参数 | 输入文件路径;省略或 `-` 表示从 **stdin** 读 | 否(默认 stdin) | +| `-o, --output ` | 输出 `.tsfile` 路径;已存在则覆盖(`O_TRUNC`) | 是 | +| `--table ` | 输出表名 | 是 | +| `--columns ` | 数据列规格(见 §5),按序描述**除时间列外**的列 | 是 | +| `-f, --format csv\|tsv` | 输入分隔符,默认 `csv`;`json`/`table` 视为 usage error | 否 | +| `--no-header` | 输入无表头行(默认认为首行是表头并跳过) | 否 | + +`write` 只使用上述参数;读侧的 `-d/--device`、`-m/--measurements`、`-n/--limit`、 +`--offset`、`--start/--end`、`--seed` 对 `write` 无意义,**出现即按 usage error 处理** +(退出码 `1`),以免静默误用。 + +## 4. 输入格式与行约定 + +- 一行一条记录,字段用分隔符分隔(`csv` = `,`,`tsv` = `\t`)。 +- **第一列固定是时间戳**:epoch 毫秒整数(`INT64`)。它不出现在 `--columns` 里。 +- 其余字段按 `--columns` 的顺序一一对应;每条数据行的字段数必须等于 + `1 + len(--columns)`,否则报错(§7)。 +- 默认首行为表头并跳过;表头内容**不做校验**(列身份完全由 `--columns` 决定)。 + `--no-header` 时不跳过首行。 +- **空单元格 = null**:该行该列不写入(`Tablet` 不 `add_value`,留 null)。 +- CSV 解析遵循 RFC 4180 引号规则(字段含分隔符/引号时用 `"` 包裹,内部 `"` 双写); + TSV 按 `\t` 切分、不做引号处理。引号字段内不支持换行(v1)。 + +## 5. Schema 规格(`--columns`) + +逗号分隔的列项,每项 `name:TYPE:category`: + +- `name`:列名,不含 `:` 和 `,`。 +- `TYPE`:TSDataType 名,**大小写不敏感**;v1 支持 + `BOOLEAN | INT32 | INT64 | FLOAT | DOUBLE | STRING | TEXT`。 +- `category`:`tag` 或 `field`(小写)。 + +示例:`--columns "id1:STRING:tag,id2:STRING:tag,s1:INT64:field"`。 + +解析为有序的 `ColumnDef{name, type, category}` 列表,任何一项缺字段、类型名未知、 +category 非法都按 usage error 处理(退出码 `1`,stderr 给出错误项)。 + +## 6. 写入路径 + +1. `TableSchema(table, [ColumnSchema(def.name, def.type, common::UNCOMPRESSED, + common::PLAIN, def.category) for def in columns])`。 +2. `storage::WriteFile`:`create(output, O_WRONLY|O_CREAT|O_TRUNC[, O_BINARY], 0666)`。 +3. `storage::TsFileTableWriter(&file, schema)`。 +4. 构造一个批量 `Tablet`(列 = `--columns` 的列名/类型/类别,容量如 `1024` 行):逐行 + `add_timestamp(i, ts)`;非空单元格按列类型 `add_value(i, name, typedValue)`。 +5. 批满即 `write_table(tablet)` 后复用/重置 tablet;EOF 后写出残余批。 +6. 
`flush()` → `close()`。 + +类型转换:单元格字符串 → 列类型。`INT32/INT64` 用 `strtoll`,`FLOAT/DOUBLE` 用 +`strtod`,`BOOLEAN` 接受 `true/false`(大小写不敏感)与 `1/0`,`STRING/TEXT` 原样。 +不可解析 → 运行时错误(§7)。 + +## 7. 退出码与输出 + +| 退出码 | 条件 | +|---|---| +| `0` | 成功 | +| `1` | usage / 参数错误(缺 `--table`/`--columns`/`-o`,`--columns` 语法错,`-f json|table`,混入读侧 flag) | +| `2` | 输入打不开 / 输出创建失败 | +| `3` | 行级错误:字段数不符、单元格类型解析失败、写库返回错误(stderr 标出行号) | + +`write` 不向 stdout 输出数据;进度/诊断/错误一律走 stderr。成功时 stdout 为空,并向 +stderr 打印一行摘要:`wrote rows to `。 + +## 8. 架构 + +新增/改动文件: + +```text +cpp/tools/ +├── cli/ +│ ├── cli_args.h / .cc # 新增 output(-o/--output)、columns(--columns) 字段与解析 +│ └── run_cli.cc # 注册 write;在 reader.open 之前特判 write 并分发 +├── commands/ +│ ├── commands.h # 声明 cmd_write +│ └── cmd_write.cc # 读输入/构 schema+tablet/写出 +└── format/ + ├── input_format.h / .cc # parse_columns_spec、split_delimited(csv/tsv)、parse_cell +cpp/test/tools/ +├── input_format_test.cc # 列规格解析、行切分、单元格类型解析(含 null/错误) +└── command_e2e_test.cc # 追加 write→读回 的往返 E2E +``` + +关键设计点 —— **`write` 是第一个不打开 `TsFileReader` 的命令**。当前 `run_cli` 对所有命令 +都 `reader.open(p.file)`,而 `write` 的位置参数是**输入 CSV**(或 stdin),不是要打开的 +`.tsfile`。因此在 `run_cli` 中: + +- 把 `write` 加入 `is_known_command`。 +- 新增 `validate_write_flags`(缺 `--table`/`--columns`/`-o`、`-f` 非 csv/tsv、混入读侧 + flag → usage error)。 +- 在 `storage::libtsfile_init()` 之后、构造 `TsFileReader` 之前插入: + `if (p.command == "write") return cmd_write(p, out, err);` —— 完全跳过 reader 路径。 + +`cmd_write` 签名不同于读侧命令(无 reader、无 OutputFormat): + +```cpp +int cmd_write(const ParsedArgs& args, std::ostream& out, std::ostream& err); +``` + +`input_format` 为纯层(不依赖 reader):列规格解析、按分隔符切行(引号感知)、单元格→ +类型转换,便于单测。`cmd_write` 负责打开输入流(文件或 `std::cin`)、串起 schema/tablet/ +writer。复用现有 `cli/exit_codes.h`。 + +## 9. 
测试 + +- **单元**(`input_format_test.cc`):`parse_columns_spec` 正例与各类错误;`split_delimited` + 的 csv 引号/转义、tsv 切分;`parse_cell` 各类型正例、空=null、解析失败。 +- **E2E**(追加到 `command_e2e_test.cc`):把一段 CSV 写到临时文件,`run_cli({"write", + "--table","t1","--columns","s1:INT64:field","-o",out,csv})`,断言退出 0;随后在进程内 + 用读路径 `run_cli({"schema"/"count"/"cat", out})` 回读,断言表名、列、行数、行值与输入 + 一致(往返)。另覆盖:缺 `--columns` → 1;行字段数不符 → 3;输出到不可写路径 → 2。 + +只验证 CLI/写库行为,不新增存储引擎行为。 + +## 10. 被拒绝的方案 + +- **类型推断**:拒绝。CSV 类型推断(`1` vs `1.0` vs `"01"`)易误判;显式 `--columns` + 零歧义、实现最简,符合「先稳后省事」。推断可作后续便利项。 +- **首列以外某命名列作时间**:v1 拒绝(约定首列即时间,最简单且与读侧输出对齐); + `--time-column` 可后续再加。 +- **第二个位置参数作输出**:拒绝。现有 parser 只有一个位置参数;用 `-o/--output` 更显式, + 也避免改动位置参数语义。 +- **同时支持 tree 模型 / JSON**:本阶段拒绝(YAGNI),列入后续。 + +## 11. 后续工作 + +- tree 模型导入(device + measurements,aligned/非 aligned)。 +- JSON/NDJSON 输入(与读侧 `-f json` 对称)。 +- 类型推断、`--time-column`、编码/压缩 flag。 +- `tsfile → tsfile` 的 convert/rewrite/merge。 From 0a698c6f50872baea994d8091204656e8f4998c6 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 15:32:45 +0800 Subject: [PATCH 19/41] Refine write spec: silent-by-default (-v summary) and opt-in --header-match --- .../2026-06-03-tsfile-cli-write-design.md | 21 ++++++++++++------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md b/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md index 8544732f8..ae978afc1 100644 --- a/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md +++ b/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md @@ -61,7 +61,7 @@ tsfile-cli cat -m s1 -f csv in.tsfile | tsfile-cli write --table t1 \ ``` tsfile-cli write --table --columns -o \ - [-f csv|tsv] [--no-header] [ | -] + [-f csv|tsv] [--no-header] [--header-match] [-v] [ | -] ``` | 参数 | 含义 | 必填 | @@ -72,6 +72,8 @@ tsfile-cli write --table --columns -o \ | `--columns ` | 数据列规格(见 §5),按序描述**除时间列外**的列 | 是 | | `-f, --format csv\|tsv` | 输入分隔符,默认 
`csv`;`json`/`table` 视为 usage error | 否 | | `--no-header` | 输入无表头行(默认认为首行是表头并跳过) | 否 | +| `--header-match` | 校验首行表头列名与 `--columns`(及首列 `time`)一致,不符即报错 | 否 | +| `-v, --verbose` | 成功后向 stderr 打印一行摘要;默认静默 | 否 | `write` 只使用上述参数;读侧的 `-d/--device`、`-m/--measurements`、`-n/--limit`、 `--offset`、`--start/--end`、`--seed` 对 `write` 无意义,**出现即按 usage error 处理** @@ -83,8 +85,9 @@ tsfile-cli write --table --columns -o \ - **第一列固定是时间戳**:epoch 毫秒整数(`INT64`)。它不出现在 `--columns` 里。 - 其余字段按 `--columns` 的顺序一一对应;每条数据行的字段数必须等于 `1 + len(--columns)`,否则报错(§7)。 -- 默认首行为表头并跳过;表头内容**不做校验**(列身份完全由 `--columns` 决定)。 - `--no-header` 时不跳过首行。 +- 默认首行为表头并跳过;表头内容**默认不校验**(列身份完全由 `--columns` 决定)。 + `--no-header` 时不跳过首行。加 `--header-match` 时校验首行:首列名任意(约定为 `time`), + 其余列名须与 `--columns` 顺序逐一相等,不符即报错(§7)。 - **空单元格 = null**:该行该列不写入(`Tablet` 不 `add_value`,留 null)。 - CSV 解析遵循 RFC 4180 引号规则(字段含分隔符/引号时用 `"` 包裹,内部 `"` 双写); TSV 按 `\t` 切分、不做引号处理。引号字段内不支持换行(v1)。 @@ -125,10 +128,11 @@ category 非法都按 usage error 处理(退出码 `1`,stderr 给出错误 | `0` | 成功 | | `1` | usage / 参数错误(缺 `--table`/`--columns`/`-o`,`--columns` 语法错,`-f json|table`,混入读侧 flag) | | `2` | 输入打不开 / 输出创建失败 | -| `3` | 行级错误:字段数不符、单元格类型解析失败、写库返回错误(stderr 标出行号) | +| `3` | 行级错误:字段数不符、`--header-match` 下表头不符、单元格类型解析失败、写库返回错误(stderr 标出行号) | -`write` 不向 stdout 输出数据;进度/诊断/错误一律走 stderr。成功时 stdout 为空,并向 -stderr 打印一行摘要:`wrote rows to `。 +`write` 不向 stdout 输出数据;进度/诊断/错误一律走 stderr。**成功时默认全静默**(无 stdout、 +无 stderr 输出,遵循 Unix「silence is golden」);仅当加 `-v/--verbose` 时向 stderr 打印一行 +摘要:`wrote rows to `。 ## 8. 
架构 @@ -137,7 +141,7 @@ stderr 打印一行摘要:`wrote rows to `。 ```text cpp/tools/ ├── cli/ -│ ├── cli_args.h / .cc # 新增 output(-o/--output)、columns(--columns) 字段与解析 +│ ├── cli_args.h / .cc # 新增 output(-o)、columns(--columns)、verbose(-v)、header_match(--header-match) │ └── run_cli.cc # 注册 write;在 reader.open 之前特判 write 并分发 ├── commands/ │ ├── commands.h # 声明 cmd_write @@ -176,7 +180,8 @@ writer。复用现有 `cli/exit_codes.h`。 - **E2E**(追加到 `command_e2e_test.cc`):把一段 CSV 写到临时文件,`run_cli({"write", "--table","t1","--columns","s1:INT64:field","-o",out,csv})`,断言退出 0;随后在进程内 用读路径 `run_cli({"schema"/"count"/"cat", out})` 回读,断言表名、列、行数、行值与输入 - 一致(往返)。另覆盖:缺 `--columns` → 1;行字段数不符 → 3;输出到不可写路径 → 2。 + 一致(往返)。另覆盖:缺 `--columns` → 1;行字段数不符 → 3;`--header-match` 下表头不符 + → 3;输出到不可写路径 → 2;成功默认静默、仅 `-v` 才有摘要。 只验证 CLI/写库行为,不新增存储引擎行为。 From 9baaa222e0986a389dcbd621b3949d4068ec0fe3 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 15:51:01 +0800 Subject: [PATCH 20/41] Add tsfile CLI write implementation plan --- .../plans/2026-06-03-tsfile-cli-write.md | 947 ++++++++++++++++++ 1 file changed, 947 insertions(+) create mode 100644 docs/superpowers/plans/2026-06-03-tsfile-cli-write.md diff --git a/docs/superpowers/plans/2026-06-03-tsfile-cli-write.md b/docs/superpowers/plans/2026-06-03-tsfile-cli-write.md new file mode 100644 index 000000000..8f848ea1d --- /dev/null +++ b/docs/superpowers/plans/2026-06-03-tsfile-cli-write.md @@ -0,0 +1,947 @@ + + +# TsFile CLI 写入(`tsfile-cli write`)Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
+

**Goal:** Add a `write` command to `tsfile-cli` that imports CSV/TSV rows into a new table-model
`.tsfile` (explicit `--columns`, no type inference).

**Architecture:** Add a pure parsing layer, `format/input_format.*` (column specs / line splitting / type-name parsing,
heavily unit-tested). Extend `cli_args` with `-o/--output`, `--columns`, `-v/--verbose`, and `--header-match`.
`commands/cmd_write.cc` wires together "read input → build `TableSchema`/`Tablet` → write via `TsFileTableWriter`".
`run_cli` registers `write` as **the first command that does not open a `TsFileReader`** and dispatches it as a special case
before reader.open. The storage engine is not modified.

**Tech Stack:** C++11/14 (test target `-std=c++14`), CMake `BUILD_TOOLS`, Google Test, and the existing
`storage::TsFileTableWriter`, `storage::TableSchema`, `common::ColumnSchema`,
`storage::Tablet`, and `storage::WriteFile`.

**Spec:** `docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md`

---

## Prerequisites

- The working directory is `/Users/zhanghongyin/iotdb/tsfile`. Run git operations from the repository root. Do not stage `.codegraph/`
  or the temporary `cpp/*.tsfile`/`*.dat` files that tests produce.
- Prepend the Apache 2.0 block-comment header to each new `.h`/`.cc` (copy it from any `cpp/tools/**` file).
- **Build/test**: on this machine, CMake 4.x conflicts with the bundled ANTLR4's old policy, so `--disable-antlr4` is required.
  `build.sh` defaults to `build_test=0` and has no switch for it. During execution, temporarily run
  `sed -i '' 's/^build_test=0/build_test=1/' cpp/build.sh`, then restore the file with `git checkout cpp/build.sh` at the end of Task 4:

```bash
cd cpp && bash build.sh -t=Debug --disable-antlr4
./build/Debug/test/lib/TsFile_Test --gtest_filter='InputFormatTest.*:ParseArgsTest.*:RunCliTest.*:CliE2E.*'
```

- A build exit code of 2 is expected: `make install` lacks permission to copy libsnappy into `/usr/local/lib`. This does not
  affect compiling, linking, or tests. Judge success by `grep -c "Built target TsFile_Test"` and the test results.

## Verified SDK facts (basis for compilation)

- `enum class common::ColumnCategory { TAG, FIELD, ATTRIBUTE, TIME }` (`utils/db_utils.h`).
- `common::ColumnSchema(name, TSDataType, CompressionType, TSEncoding, ColumnCategory)` (`common/schema.h`).
- `storage::TableSchema(table_name, std::vector<common::ColumnSchema>)`: **lowercases the table name**.
- `storage::TsFileTableWriter(storage::WriteFile*, storage::TableSchema*)` (a template constructor; the extra parameters have defaults).
- `int TsFileTableWriter::write_table(Tablet&) const` / `int flush()` / `int close()`.
- `int WriteFile::create(const std::string&, int flags, mode_t mode)`.
- `storage::Tablet(target_name, names, types, categories, max_rows)`;
  `int add_timestamp(uint32_t, int64_t)`; `template <typename T> int add_value(uint32_t, const std::string& name, T)`.
  Cells that never get an `add_value` call default to null.

## File structure

New:
- `cpp/tools/format/input_format.h` / `.cc`: `ColumnDef`, `parse_columns_spec`,
  `split_line`, `parse_datatype_name`, `parse_category`, `parse_bool_cell` (pure layer, no reader dependency).
- `cpp/tools/commands/cmd_write.cc`: `cmd_write`.
- `cpp/test/tools/input_format_test.cc`: unit tests for the pure layer.

Modified:
- `cpp/tools/cli/cli_args.h` / `.cc`: add `output/columns/verbose/header_match` to `ParsedArgs` and parse them.
- `cpp/tools/commands/commands.h`: declare `cmd_write`.
- `cpp/tools/cli/run_cli.cc`: register `write`, `validate_write_flags`, the reader-bypass dispatch, and the usage text.
- `cpp/test/tools/cli_args_test.cc`: write argument-parsing tests.
- `cpp/test/tools/command_e2e_test.cc`: write→read-back round-trip E2E.

---

### Task 1: `input_format` pure parsing layer

**Files:**
- Create: `cpp/tools/format/input_format.h`
- Create: `cpp/tools/format/input_format.cc`
- Create: `cpp/test/tools/input_format_test.cc`

- [ ] **Step 1: Write the failing unit tests** in `cpp/test/tools/input_format_test.cc` (prepend the license header)

```cpp
#include "format/input_format.h"

#include <gtest/gtest.h>

#include "common/db_common.h"
#include "utils/db_utils.h"

TEST(InputFormatTest, ParseColumnsSpecValid) {
    std::vector<tsfile_cli::ColumnDef> cols;
    std::string err;
    EXPECT_TRUE(tsfile_cli::parse_columns_spec("id1:STRING:tag,s1:INT64:field",
                                               cols, err));
    ASSERT_EQ(cols.size(), 2u);
    EXPECT_EQ(cols[0].name, "id1");
    EXPECT_EQ(cols[0].type, common::STRING);
    EXPECT_EQ(cols[0].category, common::ColumnCategory::TAG);
    EXPECT_EQ(cols[1].type, common::INT64);
    EXPECT_EQ(cols[1].category, common::ColumnCategory::FIELD);
}

TEST(InputFormatTest, ParseColumnsSpecCaseInsensitiveType) {
    std::vector<tsfile_cli::ColumnDef> cols;
    std::string err;
    EXPECT_TRUE(tsfile_cli::parse_columns_spec("s1:int64:field", cols, err));
    EXPECT_EQ(cols[0].type, common::INT64);
}

TEST(InputFormatTest, ParseColumnsSpecErrors) {
    std::vector<tsfile_cli::ColumnDef> cols;
    std::string err;
    EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:NOPE:field", cols, err));
    EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64:bogus", cols, err));
    EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64", cols, err));
    EXPECT_FALSE(tsfile_cli::parse_columns_spec("", cols, err));
}

TEST(InputFormatTest, SplitLineTsv) {
    std::vector<std::string> f = tsfile_cli::split_line("0\t10\t20", '\t', false);
    ASSERT_EQ(f.size(), 3u);
    EXPECT_EQ(f[0], "0");
    EXPECT_EQ(f[2], "20");
}

TEST(InputFormatTest, SplitLineCsvQuotes) {
    std::vector<std::string> f =
        tsfile_cli::split_line("1,\"a,b\",\"she \"\"hi\"\"\"", ',', true);
    ASSERT_EQ(f.size(), 3u);
    EXPECT_EQ(f[1], "a,b");
    EXPECT_EQ(f[2], "she \"hi\"");
}

TEST(InputFormatTest, SplitLineEmptyFields) {
    std::vector<std::string> f = tsfile_cli::split_line("0,,5", ',', true);
    ASSERT_EQ(f.size(), 3u);
    EXPECT_EQ(f[1], "");
}

TEST(InputFormatTest, ParseBoolCell) {
    bool b = false;
    EXPECT_TRUE(tsfile_cli::parse_bool_cell("true", b));
    EXPECT_TRUE(b);
    EXPECT_TRUE(tsfile_cli::parse_bool_cell("0", b));
    EXPECT_FALSE(b);
    EXPECT_FALSE(tsfile_cli::parse_bool_cell("maybe", b));
}
```

- [ ] **Step 2: Run and confirm failure**

```bash
cd cpp && bash build.sh -t=Debug --disable-antlr4
```

Expected: the build fails because `format/input_format.h` does not exist.

- [ ] **Step 3: Create `cpp/tools/format/input_format.h`** (prepend the license header)

```cpp
#ifndef TSFILE_CLI_INPUT_FORMAT_H
#define TSFILE_CLI_INPUT_FORMAT_H

#include <string>
#include <vector>

#include "common/db_common.h"
#include "utils/db_utils.h"

namespace tsfile_cli {

struct ColumnDef {
    std::string name;
    common::TSDataType type;
    common::ColumnCategory category;
};

bool parse_datatype_name(const std::string& s, common::TSDataType& out);
bool parse_category(const std::string& s, common::ColumnCategory& out);
bool parse_columns_spec(const std::string& spec, std::vector<ColumnDef>& out,
                        std::string& error);
std::vector<std::string> split_line(const std::string& line, char delim,
                                    bool csv_quotes);
bool parse_bool_cell(const std::string& s, bool& out);

}  // namespace tsfile_cli

#endif  // TSFILE_CLI_INPUT_FORMAT_H
```

- [ ] **Step 4: Create `cpp/tools/format/input_format.cc`** (prepend the license header)

```cpp
#include "format/input_format.h"

#include <cctype>

namespace tsfile_cli {

bool parse_datatype_name(const std::string& s, common::TSDataType& out) {
    std::string u;
    u.reserve(s.size());
    for (char c : s) {
        u += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    if (u == "BOOLEAN") {
        out = common::BOOLEAN;
    } else if (u == "INT32") {
        out = common::INT32;
    } else if (u == "INT64") {
        out = common::INT64;
    } else if (u == "FLOAT") {
        out = common::FLOAT;
    } else if (u == "DOUBLE") {
        out = common::DOUBLE;
    } else if (u == "STRING") {
        out = common::STRING;
    } else if (u == "TEXT") {
        out = common::TEXT;
    } else {
        return false;
    }
    return true;
}

bool parse_category(const std::string& s, common::ColumnCategory& out) {
    if (s == "tag") {
        out = common::ColumnCategory::TAG;
    } else if (s == "field") {
        out = common::ColumnCategory::FIELD;
    } else {
        return false;
    }
    return true;
}

std::vector<std::string> split_line(const std::string& line, char delim,
                                    bool csv_quotes) {
    std::vector<std::string> out;
    std::string field;
    if (!csv_quotes) {
        for (char c : line) {
            if (c == delim) {
                out.push_back(field);
                field.clear();
            } else {
                field += c;
            }
        }
        out.push_back(field);
        return out;
    }
    bool in_quotes = false;
    for (size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';
                    ++i;
                } else {
                    in_quotes = false;
                }
            } else {
                field += c;
            }
        } else if (c == '"') {
            in_quotes = true;
        } else if (c == delim) {
            out.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    out.push_back(field);
    return out;
}

bool parse_columns_spec(const std::string& spec, std::vector<ColumnDef>& out,
                        std::string& error) {
    out.clear();
    if (spec.empty()) {
        error = "empty --columns";
        return false;
    }
    std::vector<std::string> items = split_line(spec, ',', false);
    for (const std::string& item : items) {
        std::vector<std::string> parts = split_line(item, ':', false);
        if (parts.size() != 3) {
            error = "bad column '" + item + "' (want name:TYPE:category)";
            return false;
        }
        ColumnDef def;
        def.name = parts[0];
        if (def.name.empty()) {
            error = "empty column name in '" + item + "'";
            return false;
        }
        if (!parse_datatype_name(parts[1], def.type)) {
            error = "unknown type '" + parts[1] + "'";
            return false;
        }
        if (!parse_category(parts[2], def.category)) {
            error = "bad category '" + parts[2] + "' (want tag|field)";
            return false;
        }
        out.push_back(def);
    }
    return true;
}

bool parse_bool_cell(const std::string& s, bool& out) {
    std::string l;
    l.reserve(s.size());
    for (char c : s) {
        l += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (l == "true" || l == "1") {
        out = true;
        return true;
    }
    if (l == "false" || l == "0") {
        out = false;
        return true;
    }
    return false;
}

}  // namespace tsfile_cli
```

- [ ] **Step 5: Build, run, and confirm it passes**

```bash
cd cpp && bash build.sh -t=Debug --disable-antlr4 && ./build/Debug/test/lib/TsFile_Test --gtest_filter='InputFormatTest.*'
```

Expected: the build succeeds and all 7 `InputFormatTest` tests pass.

- [ ] **Step 6: Commit**

```bash
git add cpp/tools/format/input_format.h cpp/tools/format/input_format.cc cpp/test/tools/input_format_test.cc
git commit -m "Add tsfile CLI input_format parsing layer"
```

---

### Task 2: Add write arguments to `cli_args`

**Files:**
- Modify: `cpp/tools/cli/cli_args.h`
- Modify: `cpp/tools/cli/cli_args.cc`
- Modify: `cpp/test/tools/cli_args_test.cc`

- [ ] **Step 1: Write the failing tests**, appended to `cpp/test/tools/cli_args_test.cc`

```cpp
TEST(ParseArgsTest, WriteFlagsParsed) {
    auto p = tsfile_cli::parse_args({"write", "--table", "t1", "--columns",
                                     "s1:INT64:field", "-o", "out.tsfile", "-v",
                                     "--header-match", "in.csv"});
    EXPECT_TRUE(p.error.empty());
    EXPECT_EQ(p.command, "write");
    EXPECT_EQ(p.table, "t1");
    EXPECT_EQ(p.columns, "s1:INT64:field");
    EXPECT_EQ(p.output, "out.tsfile");
    EXPECT_TRUE(p.verbose);
    EXPECT_TRUE(p.header_match);
    EXPECT_EQ(p.file, "in.csv");
}

TEST(ParseArgsTest, OutputFlagNeedsValue) {
    auto p = tsfile_cli::parse_args({"write", "-o"});
    EXPECT_FALSE(p.error.empty());
}
```

- [ ] **Step 2: Run and confirm failure**

```bash
cd cpp && bash build.sh -t=Debug --disable-antlr4
```

Expected: compilation fails because `ParsedArgs` has no `output`/`columns`/`verbose`/`header_match`.

- [ ] **Step 3: Add the fields in `cli_args.h`**: insert the following after the `model` field of `ParsedArgs`:

```cpp
    std::string output;
    std::string columns;
    bool verbose = false;
    bool header_match = false;
```

- [ ] **Step 4: Parse them in `cli_args.cc`**: in the `parse_args` loop, place these branches before the `--model`
  branch:

```cpp
        } else if (a == "-o" || a == "--output") {
            if (!need_value(a, p.output)) {
                return p;
            }
        } else if (a == "--columns") {
            if (!need_value(a, p.columns)) {
                return p;
            }
        } else if (a == "-v" || a == "--verbose") {
            p.verbose = true;
        } else if (a == "--header-match") {
            p.header_match = true;
```

- [ ] **Step 5: Build, run, and confirm it passes**

```bash
cd cpp && bash build.sh -t=Debug --disable-antlr4 && ./build/Debug/test/lib/TsFile_Test --gtest_filter='ParseArgsTest.*'
```

Expected: the build succeeds and all `ParseArgsTest` tests pass.

- [ ] **Step 6: Commit**

```bash
git add cpp/tools/cli/cli_args.h cpp/tools/cli/cli_args.cc cpp/test/tools/cli_args_test.cc
git commit -m "Add tsfile CLI write argument parsing"
```

---

### Task 3: Wire up `cmd_write` and `run_cli`

**Files:**
- Create: `cpp/tools/commands/cmd_write.cc`
- Modify: `cpp/tools/commands/commands.h`
- Modify: `cpp/tools/cli/run_cli.cc`
- Modify: `cpp/test/tools/command_e2e_test.cc`

- [ ] **Step 1: Write the failing E2E tests**, appended to `cpp/test/tools/command_e2e_test.cc`

```cpp
TEST(CliE2E, WriteThenReadRoundTrip) {
    std::string csv_path = "tsfile_cli_write_in.csv";
    {
        std::ofstream o(csv_path.c_str());
        o << "time,id1,s1\n0,dev,0\n1,dev,10\n2,dev,20\n";
    }
    std::string out_path = "tsfile_cli_write_out.tsfile";

    std::ostringstream wout;
    std::ostringstream werr;
    int wc = tsfile_cli::run_cli(
        {"write", "--table", "t1", "--columns", "id1:STRING:tag,s1:INT64:field",
         "-o", out_path, csv_path},
        wout, werr);
    EXPECT_EQ(wc, 0) << werr.str();

    std::ostringstream cout_;
    std::ostringstream cerr_;
    int cc = tsfile_cli::run_cli({"count", "-f", "tsv", out_path}, cout_, cerr_);
    EXPECT_EQ(cc, 0);
    EXPECT_NE(cout_.str().find("\ts1\t3"), std::string::npos) << cout_.str();

    std::ostringstream rout;
    std::ostringstream rerr;
    int rc = tsfile_cli::run_cli({"cat", "-m", "s1", "-f", "tsv", out_path},
                                 rout, rerr);
    EXPECT_EQ(rc, 0);
    EXPECT_EQ(rout.str(), "time\ts1\n0\t0\n1\t10\n2\t20\n");

    std::remove(csv_path.c_str());
    std::remove(out_path.c_str());
}

TEST(CliE2E, WriteMissingColumnsIsUsageError) {
    std::ostringstream out;
    std::ostringstream err;
    int code = tsfile_cli::run_cli(
        {"write", "--table", "t1", "-o", "x.tsfile", "in.csv"}, out, err);
    EXPECT_EQ(code, 1);
    EXPECT_NE(err.str().find("--columns"), std::string::npos);
}
```

> `command_e2e_test.cc` already has `#include <cstdio>` at the top (for `std::remove`). The new tests also need `<fstream>`,
> so add `#include <fstream>` to that file's include block if it is not already there.

- [ ] **Step 2: Run and confirm failure**

```bash
cd cpp && bash build.sh -t=Debug --disable-antlr4
```

Expected: the build or the tests fail (`write` is not registered/implemented).

- [ ] **Step 3: Declare `cmd_write`**: add it after the `cmd_sample` declaration in `cpp/tools/commands/commands.h`.
  Note that the signature has no reader and no OutputFormat:

```cpp
int cmd_write(const ParsedArgs& args, std::ostream& out, std::ostream& err);
```

- [ ] **Step 4: Create `cpp/tools/commands/cmd_write.cc`** (prepend the license header)

```cpp
#include <fcntl.h>

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include "cli/cli_args.h"
#include "cli/exit_codes.h"
#include "commands/commands.h"
#include "common/schema.h"
#include "common/tablet.h"
#include "file/write_file.h"
#include "format/input_format.h"
#include "writer/tsfile_table_writer.h"

namespace tsfile_cli {
namespace {

struct DataRow {
    long long line_no;
    int64_t timestamp;
    std::vector<std::string> cells;
};

void strip_cr(std::string& s) {
    if (!s.empty() && s.back() == '\r') {
        s.pop_back();
    }
}

bool add_typed_value(storage::Tablet& tablet, uint32_t row,
                     const ColumnDef& def, const std::string& cell,
                     std::string& error) {
    if (cell.empty()) {
        return true;  // null
    }
    char* e = nullptr;
    switch (def.type) {
        case common::BOOLEAN: {
            bool v = false;
            if (!parse_bool_cell(cell, v)) {
                error = "bad BOOLEAN '" + cell + "'";
                return false;
            }
            tablet.add_value(row, def.name, v);
            return true;
        }
        case common::INT32: {
            // Parse as long long so that out-of-range values are rejected
            // instead of being silently truncated by the cast below.
            long long v = std::strtoll(cell.c_str(), &e, 10);
            if (e == nullptr || *e != '\0' || v < INT32_MIN ||
                v > INT32_MAX) {
                error = "bad INT32 '" + cell + "'";
                return false;
            }
            tablet.add_value(row, def.name, static_cast<int32_t>(v));
            return true;
        }
        case common::INT64: {
            long long v = std::strtoll(cell.c_str(), &e, 10);
            if (e == nullptr || *e != '\0') {
                error = "bad INT64 '" + cell + "'";
                return false;
            }
            tablet.add_value(row, def.name, static_cast<int64_t>(v));
            return true;
        }
        case common::FLOAT: {
            float v = std::strtof(cell.c_str(), &e);
            if (e == nullptr || *e != '\0') {
                error = "bad FLOAT '" + cell + "'";
                return false;
            }
            tablet.add_value(row, def.name, v);
            return true;
        }
        case common::DOUBLE: {
            double v = std::strtod(cell.c_str(), &e);
            if (e == nullptr || *e != '\0') {
                error = "bad DOUBLE '" + cell + "'";
                return false;
            }
            tablet.add_value(row, def.name, v);
            return true;
        }
        case common::STRING:
        case common::TEXT: {
            tablet.add_value(row, def.name, cell);
            return true;
        }
        default:
            error = "unsupported column type";
            return false;
    }
}

}  // namespace

int cmd_write(const ParsedArgs& args, std::ostream& /*out*/,
              std::ostream& err) {
    std::vector<ColumnDef> columns;
    std::string perr;
    if (!parse_columns_spec(args.columns, columns, perr)) {
        err << "Error: " << perr << "\n";
        return kExitUsage;
    }

    std::istream* in = &std::cin;
    std::ifstream fin;
    if (!args.file.empty() && args.file != "-") {
        fin.open(args.file.c_str());
        if (!fin.is_open()) {
            err << "Error: cannot open input: " << args.file << "\n";
            return kExitFile;
        }
        in = &fin;
    }

    const char delim = (args.format == ParsedArgs::Format::kTsv) ? '\t' : ',';
    const bool csv_quotes = (delim == ',');

    std::string line;
    long long line_no = 0;
    if (!args.no_header) {
        if (std::getline(*in, line)) {
            ++line_no;
            strip_cr(line);
            if (args.header_match) {
                std::vector<std::string> h = split_line(line, delim, csv_quotes);
                bool ok = (h.size() == columns.size() + 1);
                for (size_t i = 0; ok && i < columns.size(); ++i) {
                    if (h[i + 1] != columns[i].name) {
                        ok = false;
                    }
                }
                if (!ok) {
                    err << "Error: header does not match --columns (line 1)\n";
                    return kExitRuntime;
                }
            }
        }
    }

    std::vector<DataRow> rows;
    while (std::getline(*in, line)) {
        ++line_no;
        strip_cr(line);
        if (line.empty()) {
            continue;
        }
        std::vector<std::string> fields = split_line(line, delim, csv_quotes);
        if (fields.size() != columns.size() + 1) {
            err << "Error: expected " << (columns.size() + 1) << " fields, got "
                << fields.size() << " (line " << line_no << ")\n";
            return kExitRuntime;
        }
        char* e = nullptr;
        long long ts = std::strtoll(fields[0].c_str(), &e, 10);
        // Reject an empty time field, which strtoll would otherwise read as 0.
        if (fields[0].empty() || e == nullptr || *e != '\0') {
            err << "Error: bad timestamp '" << fields[0] << "' (line " << line_no
                << ")\n";
            return kExitRuntime;
        }
        DataRow r;
        r.line_no = line_no;
        r.timestamp = static_cast<int64_t>(ts);
        r.cells.assign(fields.begin() + 1, fields.end());
        rows.push_back(r);
    }

    std::vector<std::string> names;
    std::vector<common::TSDataType> types;
    std::vector<common::ColumnCategory> cats;
    std::vector<common::ColumnSchema> col_schemas;
    for (const ColumnDef& d : columns) {
        names.push_back(d.name);
        types.push_back(d.type);
        cats.push_back(d.category);
        col_schemas.push_back(common::ColumnSchema(
            d.name, d.type, common::UNCOMPRESSED, common::PLAIN, d.category));
    }

    storage::WriteFile file;
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
#ifdef _WIN32
    flags |= O_BINARY;
#endif
    if (file.create(args.output, flags, 0666) != 0) {
        err << "Error: cannot create output: " << args.output << "\n";
        return kExitFile;
    }
    auto* schema = new storage::TableSchema(args.table, col_schemas);
    auto* writer = new storage::TsFileTableWriter(&file, schema);

    int rc = kExitOk;
    const size_t kBatch = 1024;
    for (size_t start = 0; start < rows.size() && rc == kExitOk;
         start += kBatch) {
        size_t end = std::min(start + kBatch, rows.size());
        storage::Tablet tablet(args.table, names, types, cats,
                               static_cast<uint32_t>(end - start));
        for (size_t i = start; i < end && rc == kExitOk; ++i) {
            uint32_t r = static_cast<uint32_t>(i - start);
            tablet.add_timestamp(r, rows[i].timestamp);
            for (size_t j = 0; j < columns.size(); ++j) {
                std::string cell_err;
                if (!add_typed_value(tablet, r, columns[j], rows[i].cells[j],
                                     cell_err)) {
                    err << "Error: " << cell_err << " (line " << rows[i].line_no
                        << ")\n";
                    rc = kExitRuntime;
                    break;
                }
            }
        }
        if (rc == kExitOk && writer->write_table(tablet) != 0) {
            err << "Error: write_table failed\n";
            rc = kExitRuntime;
        }
    }

    if (rc == kExitOk) {
        if (writer->flush() != 0 || writer->close() != 0) {
            err << "Error: flush/close failed\n";
            rc = kExitRuntime;
        }
    } else {
        writer->close();
    }
    delete writer;
    delete schema;

    if (rc == kExitOk && args.verbose) {
        err << "wrote " << rows.size() << " rows to " << args.output << "\n";
    }
    return rc;
}

}  // namespace tsfile_cli
```

- [ ] **Step 5: Register write, add validation, and bypass the reader in `run_cli.cc`**

In `cpp/tools/cli/run_cli.cc`:

1. Add `"write"` to the `is_known_command` set:

```cpp
    static const std::set<std::string> kCmds = {
        "ls",  "schema", "meta",   "stats", "head",
        "cat", "count",  "sample", "write"};
```

2. Inside the anonymous namespace, add `validate_write_flags` (after `validate_command_flags`):

```cpp
bool validate_write_flags(const ParsedArgs& p, std::ostream& err) {
    if (p.table.empty()) {
        err << "Error: write requires --table\n";
        return false;
    }
    if (p.columns.empty()) {
        err << "Error: write requires --columns\n";
        return false;
    }
    if (p.output.empty()) {
        err << "Error: write requires -o/--output\n";
        return false;
    }
    if (p.format == ParsedArgs::Format::kJson ||
        p.format == ParsedArgs::Format::kTable) {
        err << "Error: write input format must be csv or tsv\n";
        return false;
    }
    if (!p.measurements.empty() || !p.device.empty() || p.has_start ||
        p.has_end || p.has_seed || p.limit != -1 || p.offset != 0) {
        err << "Error: read-only flags are not valid for write\n";
        return false;
    }
    return true;
}
```

3. In the Commands section of the usage text, add write after the `cat` line (near the `count`/`sample` lines; it only
   needs to stay readable):

```cpp
    "  write    import CSV/TSV rows into a new table tsfile "
    "(--table, --columns, -o)\n"
```

   Then append a line to the Options section:

```cpp
    "Write options: --table, --columns name:TYPE:tag|field,..., -o/--output,\n"
    "               --header-match, -v/--verbose\n"
```

4. Change the missing-file check so that it lets write through. The positional argument of write is the input CSV,
   which may be stdin:

```cpp
    if (p.command != "write" && p.file.empty()) {
        err << "Error: missing argument\n";
        return kExitUsage;
    }
```

5. Add the write dispatch after the `validate_command_flags` call and before `storage::libtsfile_init();`:

```cpp
    if (p.command == "write") {
        if (!validate_write_flags(p, err)) {
            print_usage(err);
            return kExitUsage;
        }
        storage::libtsfile_init();
        return cmd_write(p, out, err);
    }
```

- [ ] **Step 6: Build, run, and confirm it passes**

```bash
cd cpp && bash build.sh -t=Debug --disable-antlr4 && ./build/Debug/test/lib/TsFile_Test --gtest_filter='CliE2E.WriteThenReadRoundTrip:CliE2E.WriteMissingColumnsIsUsageError'
```

Expected: the build succeeds and both tests pass.

> If the `cat` assertion in `WriteThenReadRoundTrip` fails on column-order or null details, first print the actual
> output with `./build/Debug/bin/tsfile-cli cat -m s1 -f tsv tsfile_cli_write_out.tsfile` (after running write once by
> hand), then align the assertion. count=3 and the schema are stable. If the `add_value`/null behavior differs from
> expectations, investigate against `cpp/test/tools/cli_test_util.h`, a table fixture already verified to write and read.

- [ ] **Step 7: Commit**

```bash
git add cpp/tools/commands/cmd_write.cc cpp/tools/commands/commands.h cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc
git commit -m "Add tsfile CLI write command (CSV/TSV import)"
```

---

### Task 4: Full verification, formatting, and wrap-up

**Files:**
- Modify: `docs/superpowers/plans/2026-06-03-tsfile-cli-write.md`, only if the execution notes need correcting during execution.

- [ ] **Step 1: Run all CLI-related tests**

```bash
cd cpp && bash build.sh -t=Debug --disable-antlr4 && ./build/Debug/test/lib/TsFile_Test --gtest_filter='InputFormatTest.*:CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:StatTableTest.*'
```

Expected: the build succeeds and every test passes.

- [ ] **Step 2: Run the full test executable**

```bash
cd cpp && ./build/Debug/test/lib/TsFile_Test 2>&1 | tail -3
```

Expected: every test passes (no regressions).

- [ ] **Step 3: Manual smoke test (including stdin and default silence)**

```bash
cd cpp
BIN=./build/Debug/bin/tsfile-cli
printf 'time,id1,s1\n0,dev,0\n1,dev,10\n' | $BIN write --table t1 --columns "id1:STRING:tag,s1:INT64:field" -o /tmp/w.tsfile -; echo "rc=$? (silent, no output)"
$BIN write --table t1 --columns "id1:STRING:tag,s1:INT64:field" -o /tmp/w.tsfile -v <<< $'time,id1,s1\n0,dev,0' 2>&1   # only -v prints "wrote N rows"
$BIN count -f tsv /tmp/w.tsfile
```

Expected: no stdout/stderr by default; with `-v`, one `wrote ... rows` line on stderr; `count` reads the file back correctly.

- [ ] **Step 4: Formatting and staging scope**

```bash
cd /Users/zhanghongyin/iotdb/tsfile && clang-format -i cpp/tools/format/input_format.h cpp/tools/format/input_format.cc cpp/tools/commands/cmd_write.cc cpp/tools/cli/run_cli.cc cpp/tools/cli/cli_args.cc cpp/tools/cli/cli_args.h cpp/test/tools/input_format_test.cc cpp/test/tools/cli_args_test.cc cpp/test/tools/command_e2e_test.cc
git checkout cpp/build.sh
git diff --check
git status --short
```

Expected: `git diff --check` exits 0, and `build.sh` is restored. Status shows only this write work; commit any
clang-format changes along with it.

- [ ] **Step 5: Final commit (if formatting changed anything)**

```bash
git add -u cpp/tools cpp/test/tools
git commit -m "Format tsfile CLI write sources"
```

If nothing changed, do not create an empty commit.

## Coverage check (plan self-review)

| Spec requirement | Covered by |
|---|---|
| `write` command, CSV/TSV → table tsfile | Task 3 |
| Explicit `--columns name:TYPE:category`, no inference | Task 1 (`parse_columns_spec`), Task 2 |
| First column is time, field-count check, empty = null | Task 3 (`cmd_write`) |
| `-o/--output`, stdin/`-`, overwrite | Task 2, Task 3 |
| `-f csv\|tsv` (json/table → usage error) | Task 3 (`validate_write_flags`) |
| Header skipped by default / `--no-header` / `--header-match` validation | Task 2, Task 3 |
| Silent on success by default, summary only with `-v` | Task 3 (end of `cmd_write`), verified in Task 4 Step 3 |
| Exit codes 0/1/2/3, no data on stdout, diagnostics on stderr | Task 3, Task 1 error returns |
| Read-side flags rejected | Task 3 (`validate_write_flags`) |
| Reader bypass (write does not open a reader) | Task 3 Step 5 |
| Tests: column specs / line splitting / types, write→read-back round trip | Task 1, Task 3 |

**Placeholder scan**: no TBD/TODO; every code block is complete.

**Type consistency**: `ColumnDef{name,type,category}`, `parse_columns_spec`, `split_line`,
`parse_bool_cell`, `cmd_write(args,out,err)`, and the `ParsedArgs` fields `output/columns/verbose/
header_match` are consistent across tasks. All SDK calls (`TableSchema`/`ColumnSchema`/`Tablet`/
`TsFileTableWriter`/`WriteFile`) follow the "Verified SDK facts" section.

**Known residual risks (verify during execution)**:

1. Whether cells without an `add_value` call default to null: check against the path already verified in `cli_test_util.h`.
   Adjust if the E2E shows unexpected null behavior.
2. Whether a table with zero tag columns can be written and read: the E2E sidesteps this with one tag column. A field-only
   table is left for later verification.
3. Reading a freshly written file back with `cat` should work in principle, because an E2E fixture of the same shape can
   be read with `cat`. If it trips an aligned-chunk assertion, the problem lies in the storage engine, outside this plan's
   scope; assert the round trip with `count`/`schema` instead. From dcef8282cb832158d359a2746e542fd4d4b1aca6 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 15:54:31 +0800 Subject: [PATCH 21/41] Add tsfile CLI input_format parsing layer --- cpp/test/tools/input_format_test.cc | 85 +++++++++++++++ cpp/tools/format/input_format.cc | 156 ++++++++++++++++++++++++++++ cpp/tools/format/input_format.h | 47 +++++++++ 3 files changed, 288 insertions(+) create mode 100644 cpp/test/tools/input_format_test.cc create mode 100644 cpp/tools/format/input_format.cc create mode 100644 cpp/tools/format/input_format.h diff --git a/cpp/test/tools/input_format_test.cc b/cpp/test/tools/input_format_test.cc new file mode 100644 index 000000000..f73a72c5c --- /dev/null +++ b/cpp/test/tools/input_format_test.cc @@ -0,0 +1,85 @@ +/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */ + +#include "format/input_format.h" + +#include + +#include "common/db_common.h" +#include "utils/db_utils.h" + +TEST(InputFormatTest, ParseColumnsSpecValid) { + std::vector cols; + std::string err; + EXPECT_TRUE(tsfile_cli::parse_columns_spec("id1:STRING:tag,s1:INT64:field", + cols, err)); + ASSERT_EQ(cols.size(), 2u); + EXPECT_EQ(cols[0].name, "id1"); + EXPECT_EQ(cols[0].type, common::STRING); + EXPECT_EQ(cols[0].category, common::ColumnCategory::TAG); + EXPECT_EQ(cols[1].type, common::INT64); + EXPECT_EQ(cols[1].category, common::ColumnCategory::FIELD); +} + +TEST(InputFormatTest, ParseColumnsSpecCaseInsensitiveType) { + std::vector cols; + std::string err; + EXPECT_TRUE(tsfile_cli::parse_columns_spec("s1:int64:field", cols, err)); + EXPECT_EQ(cols[0].type, common::INT64); +} + +TEST(InputFormatTest, ParseColumnsSpecErrors) { + std::vector cols; + std::string err; + EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:NOPE:field", cols, err)); + EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64:bogus", cols, err)); + EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64", cols, err)); + EXPECT_FALSE(tsfile_cli::parse_columns_spec("", cols, err)); +} + +TEST(InputFormatTest, SplitLineTsv) { + std::vector f = + tsfile_cli::split_line("0\t10\t20", '\t', false); + ASSERT_EQ(f.size(), 3u); + EXPECT_EQ(f[0], "0"); + EXPECT_EQ(f[2], "20"); +} + +TEST(InputFormatTest, SplitLineCsvQuotes) { + std::vector f = + tsfile_cli::split_line("1,\"a,b\",\"she \"\"hi\"\"\"", ',', true); + ASSERT_EQ(f.size(), 3u); + EXPECT_EQ(f[1], "a,b"); + EXPECT_EQ(f[2], "she \"hi\""); +} + +TEST(InputFormatTest, SplitLineEmptyFields) { + std::vector f = tsfile_cli::split_line("0,,5", ',', true); + ASSERT_EQ(f.size(), 3u); + EXPECT_EQ(f[1], ""); +} + +TEST(InputFormatTest, ParseBoolCell) { + bool b = false; + EXPECT_TRUE(tsfile_cli::parse_bool_cell("true", b)); + EXPECT_TRUE(b); + EXPECT_TRUE(tsfile_cli::parse_bool_cell("0", b)); + EXPECT_FALSE(b); + 
EXPECT_FALSE(tsfile_cli::parse_bool_cell("maybe", b)); +} diff --git a/cpp/tools/format/input_format.cc b/cpp/tools/format/input_format.cc new file mode 100644 index 000000000..789ee153e --- /dev/null +++ b/cpp/tools/format/input_format.cc @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include "format/input_format.h" + +#include + +namespace tsfile_cli { + +bool parse_datatype_name(const std::string& s, common::TSDataType& out) { + std::string u; + u.reserve(s.size()); + for (char c : s) { + u += static_cast(std::toupper(static_cast(c))); + } + if (u == "BOOLEAN") { + out = common::BOOLEAN; + } else if (u == "INT32") { + out = common::INT32; + } else if (u == "INT64") { + out = common::INT64; + } else if (u == "FLOAT") { + out = common::FLOAT; + } else if (u == "DOUBLE") { + out = common::DOUBLE; + } else if (u == "STRING") { + out = common::STRING; + } else if (u == "TEXT") { + out = common::TEXT; + } else { + return false; + } + return true; +} + +bool parse_category(const std::string& s, common::ColumnCategory& out) { + if (s == "tag") { + out = common::ColumnCategory::TAG; + } else if (s == "field") { + out = common::ColumnCategory::FIELD; + } else { + return false; + } + return true; +} + +std::vector split_line(const std::string& line, char delim, + bool csv_quotes) { + std::vector out; + std::string field; + if (!csv_quotes) { + for (char c : line) { + if (c == delim) { + out.push_back(field); + field.clear(); + } else { + field += c; + } + } + out.push_back(field); + return out; + } + bool in_quotes = false; + for (size_t i = 0; i < line.size(); ++i) { + char c = line[i]; + if (in_quotes) { + if (c == '"') { + if (i + 1 < line.size() && line[i + 1] == '"') { + field += '"'; + ++i; + } else { + in_quotes = false; + } + } else { + field += c; + } + } else if (c == '"') { + in_quotes = true; + } else if (c == delim) { + out.push_back(field); + field.clear(); + } else { + field += c; + } + } + out.push_back(field); + return out; +} + +bool parse_columns_spec(const std::string& spec, std::vector& out, + std::string& error) { + out.clear(); + if (spec.empty()) { + error = "empty --columns"; + return false; + } + std::vector items = split_line(spec, ',', false); + for (const std::string& item : items) { + std::vector parts = 
split_line(item, ':', false); + if (parts.size() != 3) { + error = "bad column '" + item + "' (want name:TYPE:category)"; + return false; + } + ColumnDef def; + def.name = parts[0]; + if (def.name.empty()) { + error = "empty column name in '" + item + "'"; + return false; + } + if (!parse_datatype_name(parts[1], def.type)) { + error = "unknown type '" + parts[1] + "'"; + return false; + } + if (!parse_category(parts[2], def.category)) { + error = "bad category '" + parts[2] + "' (want tag|field)"; + return false; + } + out.push_back(def); + } + return true; +} + +bool parse_bool_cell(const std::string& s, bool& out) { + std::string l; + l.reserve(s.size()); + for (char c : s) { + l += static_cast(std::tolower(static_cast(c))); + } + if (l == "true" || l == "1") { + out = true; + return true; + } + if (l == "false" || l == "0") { + out = false; + return true; + } + return false; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/format/input_format.h b/cpp/tools/format/input_format.h new file mode 100644 index 000000000..2838b0ff4 --- /dev/null +++ b/cpp/tools/format/input_format.h @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#ifndef TSFILE_CLI_INPUT_FORMAT_H +#define TSFILE_CLI_INPUT_FORMAT_H + +#include +#include + +#include "common/db_common.h" +#include "utils/db_utils.h" + +namespace tsfile_cli { + +struct ColumnDef { + std::string name; + common::TSDataType type; + common::ColumnCategory category; +}; + +bool parse_datatype_name(const std::string& s, common::TSDataType& out); +bool parse_category(const std::string& s, common::ColumnCategory& out); +bool parse_columns_spec(const std::string& spec, std::vector& out, + std::string& error); +std::vector split_line(const std::string& line, char delim, + bool csv_quotes); +bool parse_bool_cell(const std::string& s, bool& out); + +} // namespace tsfile_cli + +#endif // TSFILE_CLI_INPUT_FORMAT_H From a151e650da81b2688f6c15ca321696d17ff56f5e Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 15:57:18 +0800 Subject: [PATCH 22/41] Add tsfile CLI write argument parsing --- cpp/test/tools/cli_args_test.cc | 19 +++++++++++++++++++ cpp/tools/cli/cli_args.cc | 12 ++++++++++++ cpp/tools/cli/cli_args.h | 4 ++++ 3 files changed, 35 insertions(+) diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc index 464eb9597..bfb58636e 100644 --- a/cpp/test/tools/cli_args_test.cc +++ b/cpp/test/tools/cli_args_test.cc @@ -98,6 +98,25 @@ TEST(ParseArgsTest, MissingFileIsAllowedAtParseTime) { EXPECT_TRUE(p.file.empty()); } +TEST(ParseArgsTest, WriteFlagsParsed) { + auto p = tsfile_cli::parse_args({"write", "--table", "t1", "--columns", + "s1:INT64:field", "-o", "out.tsfile", "-v", + "--header-match", "in.csv"}); + EXPECT_TRUE(p.error.empty()); + EXPECT_EQ(p.command, "write"); + EXPECT_EQ(p.table, "t1"); + EXPECT_EQ(p.columns, "s1:INT64:field"); + EXPECT_EQ(p.output, "out.tsfile"); + EXPECT_TRUE(p.verbose); + EXPECT_TRUE(p.header_match); + EXPECT_EQ(p.file, "in.csv"); +} + +TEST(ParseArgsTest, OutputFlagNeedsValue) { + auto p = tsfile_cli::parse_args({"write", "-o"}); + EXPECT_FALSE(p.error.empty()); +} + 
TEST(ParseArgsTest, SeedFlagParsed) { auto p = tsfile_cli::parse_args( {"sample", "-m", "s1", "-n", "3", "--seed", "42", "data.tsfile"}); diff --git a/cpp/tools/cli/cli_args.cc b/cpp/tools/cli/cli_args.cc index 15a587491..4747f6141 100644 --- a/cpp/tools/cli/cli_args.cc +++ b/cpp/tools/cli/cli_args.cc @@ -158,6 +158,18 @@ ParsedArgs parse_args(const std::vector& args) { return p; } p.has_seed = true; + } else if (a == "-o" || a == "--output") { + if (!need_value(a, p.output)) { + return p; + } + } else if (a == "--columns") { + if (!need_value(a, p.columns)) { + return p; + } + } else if (a == "-v" || a == "--verbose") { + p.verbose = true; + } else if (a == "--header-match") { + p.header_match = true; } else if (a == "--model") { if (!need_value(a, val)) { return p; diff --git a/cpp/tools/cli/cli_args.h b/cpp/tools/cli/cli_args.h index e979d7391..276bcc0e2 100644 --- a/cpp/tools/cli/cli_args.h +++ b/cpp/tools/cli/cli_args.h @@ -45,6 +45,10 @@ struct ParsedArgs { Format format = Format::kAuto; bool no_header = false; std::string model; + std::string output; + std::string columns; + bool verbose = false; + bool header_match = false; bool help = false; bool version = false; std::string error; From 591f3bbb5aa8e7ff3d32cbab2f9ebc88241272b3 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 16:02:47 +0800 Subject: [PATCH 23/41] Add tsfile CLI write command (CSV/TSV import) --- cpp/test/tools/command_e2e_test.cc | 43 +++++ cpp/tools/cli/run_cli.cc | 47 +++++- cpp/tools/commands/cmd_write.cc | 261 +++++++++++++++++++++++++++++ cpp/tools/commands/commands.h | 1 + 4 files changed, 349 insertions(+), 3 deletions(-) create mode 100644 cpp/tools/commands/cmd_write.cc diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index 69929a2d8..344236751 100644 --- a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -20,6 +20,7 @@ #include #include +#include #include #include @@ -203,3 +204,45 @@ TEST(CliE2E, 
SampleIsReproducibleWithSeed) { EXPECT_EQ(count_lines(out1.str()), 4u); EXPECT_NE(out1.str().find("time\ts1\n"), std::string::npos); } + +TEST(CliE2E, WriteThenReadRoundTrip) { + std::string csv_path = "tsfile_cli_write_in.csv"; + { + std::ofstream o(csv_path.c_str()); + o << "time,id1,s1\n0,dev,0\n1,dev,10\n2,dev,20\n"; + } + std::string out_path = "tsfile_cli_write_out.tsfile"; + + std::ostringstream wout; + std::ostringstream werr; + int wc = tsfile_cli::run_cli( + {"write", "--table", "t1", "--columns", "id1:STRING:tag,s1:INT64:field", + "-o", out_path, csv_path}, + wout, werr); + EXPECT_EQ(wc, 0) << werr.str(); + + std::ostringstream cout_; + std::ostringstream cerr_; + int cc = tsfile_cli::run_cli({"count", "-f", "tsv", out_path}, cout_, cerr_); + EXPECT_EQ(cc, 0); + EXPECT_NE(cout_.str().find("\ts1\t3"), std::string::npos) << cout_.str(); + + std::ostringstream rout; + std::ostringstream rerr; + int rc = tsfile_cli::run_cli({"cat", "-m", "s1", "-f", "tsv", out_path}, + rout, rerr); + EXPECT_EQ(rc, 0); + EXPECT_EQ(rout.str(), "time\ts1\n0\t0\n1\t10\n2\t20\n"); + + std::remove(csv_path.c_str()); + std::remove(out_path.c_str()); +} + +TEST(CliE2E, WriteMissingColumnsIsUsageError) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli( + {"write", "--table", "t1", "-o", "x.tsfile", "in.csv"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("--columns"), std::string::npos); +} diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index 7084cd035..994df3d99 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -56,15 +56,21 @@ void print_usage(std::ostream& os) { " cat all rows of a device/table\n" " count row count\n" " sample deterministic sample rows (use -n and --seed)\n" + " write import CSV/TSV rows into a new table tsfile " + "(--table, --columns, -o)\n" "Options: -f/--format csv|tsv|json|table, -d/--device, -t/--table,\n" " -m/--measurements a,b, -n/--limit, --offset, --seed,\n" " 
--start, --end,\n" - " --no-header, --model tree|table, -h/--help, --version\n"; + " --no-header, --model tree|table, -h/--help, --version\n" + "Write options: --table, --columns name:TYPE:tag|field,..., " + "-o/--output,\n" + " --header-match, -v/--verbose\n"; } bool is_known_command(const std::string& c) { static const std::set kCmds = { - "ls", "schema", "meta", "stats", "head", "cat", "count", "sample"}; + "ls", "schema", "meta", "stats", "head", + "cat", "count", "sample", "write"}; return kCmds.count(c) != 0; } @@ -96,6 +102,32 @@ bool validate_command_flags(const ParsedArgs& p, std::ostream& err) { return true; } +bool validate_write_flags(const ParsedArgs& p, std::ostream& err) { + if (p.table.empty()) { + err << "Error: write requires --table\n"; + return false; + } + if (p.columns.empty()) { + err << "Error: write requires --columns\n"; + return false; + } + if (p.output.empty()) { + err << "Error: write requires -o/--output\n"; + return false; + } + if (p.format == ParsedArgs::Format::kJson || + p.format == ParsedArgs::Format::kTable) { + err << "Error: write input format must be csv or tsv\n"; + return false; + } + if (!p.measurements.empty() || !p.device.empty() || p.has_start || + p.has_end || p.has_seed || p.limit != -1 || p.offset != 0) { + err << "Error: read-only flags are not valid for write\n"; + return false; + } + return true; +} + } // namespace int run_cli(const std::vector& args, std::ostream& out, @@ -125,7 +157,7 @@ int run_cli(const std::vector& args, std::ostream& out, print_usage(err); return kExitUsage; } - if (p.file.empty()) { + if (p.command != "write" && p.file.empty()) { err << "Error: missing argument\n"; return kExitUsage; } @@ -134,6 +166,15 @@ int run_cli(const std::vector& args, std::ostream& out, return kExitUsage; } + if (p.command == "write") { + if (!validate_write_flags(p, err)) { + print_usage(err); + return kExitUsage; + } + storage::libtsfile_init(); + return cmd_write(p, out, err); + } + storage::libtsfile_init(); 
storage::TsFileReader reader; int open_ret = reader.open(p.file); diff --git a/cpp/tools/commands/cmd_write.cc b/cpp/tools/commands/cmd_write.cc new file mode 100644 index 000000000..d27b94ab2 --- /dev/null +++ b/cpp/tools/commands/cmd_write.cc @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "cli/cli_args.h" +#include "cli/exit_codes.h" +#include "commands/commands.h" +#include "common/schema.h" +#include "common/tablet.h" +#include "file/write_file.h" +#include "format/input_format.h" +#include "writer/tsfile_table_writer.h" + +namespace tsfile_cli { +namespace { + +struct DataRow { + long long line_no; + int64_t timestamp; + std::vector cells; +}; + +void strip_cr(std::string& s) { + if (!s.empty() && s.back() == '\r') { + s.pop_back(); + } +} + +bool add_typed_value(storage::Tablet& tablet, uint32_t row, + const ColumnDef& def, const std::string& cell, + std::string& error) { + if (cell.empty()) { + return true; // null + } + char* e = nullptr; + switch (def.type) { + case common::BOOLEAN: { + bool v = false; + if (!parse_bool_cell(cell, v)) { + error = "bad BOOLEAN '" + cell + "'"; + return false; + } + tablet.add_value(row, def.name, v); + return true; + } + case common::INT32: { + long v = std::strtol(cell.c_str(), &e, 10); + if (e == nullptr || *e != '\0') { + error = "bad INT32 '" + cell + "'"; + return false; + } + tablet.add_value(row, def.name, static_cast(v)); + return true; + } + case common::INT64: { + long long v = std::strtoll(cell.c_str(), &e, 10); + if (e == nullptr || *e != '\0') { + error = "bad INT64 '" + cell + "'"; + return false; + } + tablet.add_value(row, def.name, static_cast(v)); + return true; + } + case common::FLOAT: { + float v = std::strtof(cell.c_str(), &e); + if (e == nullptr || *e != '\0') { + error = "bad FLOAT '" + cell + "'"; + return false; + } + tablet.add_value(row, def.name, v); + return true; + } + case common::DOUBLE: { + double v = std::strtod(cell.c_str(), &e); + if (e == nullptr || *e != '\0') { + error = "bad DOUBLE '" + cell + "'"; + return false; + } + tablet.add_value(row, def.name, v); + return true; + } + case common::STRING: + case common::TEXT: { + tablet.add_value(row, def.name, cell); + return 
true; + } + default: + error = "unsupported column type"; + return false; + } +} + +} // namespace + +int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, + std::ostream& err) { + std::vector columns; + std::string perr; + if (!parse_columns_spec(args.columns, columns, perr)) { + err << "Error: " << perr << "\n"; + return kExitUsage; + } + + std::istream* in = &std::cin; + std::ifstream fin; + if (!args.file.empty() && args.file != "-") { + fin.open(args.file.c_str()); + if (!fin.is_open()) { + err << "Error: cannot open input: " << args.file << "\n"; + return kExitFile; + } + in = &fin; + } + + const char delim = (args.format == ParsedArgs::Format::kTsv) ? '\t' : ','; + const bool csv_quotes = (delim == ','); + + std::string line; + long long line_no = 0; + if (!args.no_header) { + if (std::getline(*in, line)) { + ++line_no; + strip_cr(line); + if (args.header_match) { + std::vector h = + split_line(line, delim, csv_quotes); + bool ok = (h.size() == columns.size() + 1); + for (size_t i = 0; ok && i < columns.size(); ++i) { + if (h[i + 1] != columns[i].name) { + ok = false; + } + } + if (!ok) { + err << "Error: header does not match --columns (line 1)\n"; + return kExitRuntime; + } + } + } + } + + std::vector rows; + while (std::getline(*in, line)) { + ++line_no; + strip_cr(line); + if (line.empty()) { + continue; + } + std::vector fields = split_line(line, delim, csv_quotes); + if (fields.size() != columns.size() + 1) { + err << "Error: expected " << (columns.size() + 1) << " fields, got " + << fields.size() << " (line " << line_no << ")\n"; + return kExitRuntime; + } + char* e = nullptr; + long long ts = std::strtoll(fields[0].c_str(), &e, 10); + if (e == nullptr || *e != '\0') { + err << "Error: bad timestamp '" << fields[0] << "' (line " << line_no + << ")\n"; + return kExitRuntime; + } + DataRow r; + r.line_no = line_no; + r.timestamp = static_cast(ts); + r.cells.assign(fields.begin() + 1, fields.end()); + rows.push_back(r); + } + + std::vector names; 
+ std::vector types; + std::vector cats; + std::vector col_schemas; + for (const ColumnDef& d : columns) { + names.push_back(d.name); + types.push_back(d.type); + cats.push_back(d.category); + col_schemas.push_back(common::ColumnSchema( + d.name, d.type, common::UNCOMPRESSED, common::PLAIN, d.category)); + } + + storage::WriteFile file; + int flags = O_WRONLY | O_CREAT | O_TRUNC; +#ifdef _WIN32 + flags |= O_BINARY; +#endif + if (file.create(args.output, flags, 0666) != 0) { + err << "Error: cannot create output: " << args.output << "\n"; + return kExitFile; + } + auto* schema = new storage::TableSchema(args.table, col_schemas); + auto* writer = new storage::TsFileTableWriter(&file, schema); + + int rc = kExitOk; + const size_t kBatch = 1024; + for (size_t start = 0; start < rows.size() && rc == kExitOk; + start += kBatch) { + size_t end = std::min(start + kBatch, rows.size()); + storage::Tablet tablet(args.table, names, types, cats, + static_cast(end - start)); + for (size_t i = start; i < end && rc == kExitOk; ++i) { + uint32_t r = static_cast(i - start); + tablet.add_timestamp(r, rows[i].timestamp); + for (size_t j = 0; j < columns.size(); ++j) { + std::string cerr; + if (!add_typed_value(tablet, r, columns[j], rows[i].cells[j], + cerr)) { + err << "Error: " << cerr << " (line " << rows[i].line_no + << ")\n"; + rc = kExitRuntime; + break; + } + } + } + if (rc == kExitOk && writer->write_table(tablet) != 0) { + err << "Error: write_table failed\n"; + rc = kExitRuntime; + } + } + + if (rc == kExitOk) { + if (writer->flush() != 0 || writer->close() != 0) { + err << "Error: flush/close failed\n"; + rc = kExitRuntime; + } + } else { + writer->close(); + } + delete writer; + delete schema; + + if (rc == kExitOk && args.verbose) { + err << "wrote " << rows.size() << " rows to " << args.output << "\n"; + } + return rc; +} + +} // namespace tsfile_cli diff --git a/cpp/tools/commands/commands.h b/cpp/tools/commands/commands.h index 8ca49b994..085a54822 100644 --- 
a/cpp/tools/commands/commands.h +++ b/cpp/tools/commands/commands.h @@ -53,6 +53,7 @@ int cmd_cat(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& err); +int cmd_write(const ParsedArgs& args, std::ostream& out, std::ostream& err); } // namespace tsfile_cli From 9116ac439f07c8709aa37a6fc3b33daf670f66e1 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 16:09:05 +0800 Subject: [PATCH 24/41] Treat '-' as stdin positional, not an unknown flag --- cpp/test/tools/cli_args_test.cc | 7 +++++++ cpp/tools/cli/cli_args.cc | 2 +- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc index bfb58636e..362cf3d9c 100644 --- a/cpp/test/tools/cli_args_test.cc +++ b/cpp/test/tools/cli_args_test.cc @@ -117,6 +117,13 @@ TEST(ParseArgsTest, OutputFlagNeedsValue) { EXPECT_FALSE(p.error.empty()); } +TEST(ParseArgsTest, DashIsStdinPositional) { + auto p = tsfile_cli::parse_args({"write", "--table", "t1", "--columns", + "s1:INT64:field", "-o", "out.tsfile", "-"}); + EXPECT_TRUE(p.error.empty()); + EXPECT_EQ(p.file, "-"); +} + TEST(ParseArgsTest, SeedFlagParsed) { auto p = tsfile_cli::parse_args( {"sample", "-m", "s1", "-n", "3", "--seed", "42", "data.tsfile"}); diff --git a/cpp/tools/cli/cli_args.cc b/cpp/tools/cli/cli_args.cc index 4747f6141..cfdb9d75d 100644 --- a/cpp/tools/cli/cli_args.cc +++ b/cpp/tools/cli/cli_args.cc @@ -185,7 +185,7 @@ ParsedArgs parse_args(const std::vector& args) { p.help = true; } else if (a == "--version") { p.version = true; - } else if (!a.empty() && a[0] == '-') { + } else if (a.size() > 1 && a[0] == '-') { p.error = "Unknown flag: " + a; return p; } else { From a57529284326f4b27728dfef91ddc9cf7b30afe5 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 16:10:34 +0800 Subject: [PATCH 
25/41] Format tsfile CLI write sources with clang-format --- cpp/test/tools/cli_args_test.cc | 5 +++-- cpp/test/tools/command_e2e_test.cc | 3 ++- cpp/tools/cli/run_cli.cc | 6 +++--- cpp/tools/commands/cmd_write.cc | 4 ++-- 4 files changed, 10 insertions(+), 8 deletions(-) diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc index 362cf3d9c..08042f741 100644 --- a/cpp/test/tools/cli_args_test.cc +++ b/cpp/test/tools/cli_args_test.cc @@ -118,8 +118,9 @@ TEST(ParseArgsTest, OutputFlagNeedsValue) { } TEST(ParseArgsTest, DashIsStdinPositional) { - auto p = tsfile_cli::parse_args({"write", "--table", "t1", "--columns", - "s1:INT64:field", "-o", "out.tsfile", "-"}); + auto p = + tsfile_cli::parse_args({"write", "--table", "t1", "--columns", + "s1:INT64:field", "-o", "out.tsfile", "-"}); EXPECT_TRUE(p.error.empty()); EXPECT_EQ(p.file, "-"); } diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index 344236751..3ee86a19f 100644 --- a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -223,7 +223,8 @@ TEST(CliE2E, WriteThenReadRoundTrip) { std::ostringstream cout_; std::ostringstream cerr_; - int cc = tsfile_cli::run_cli({"count", "-f", "tsv", out_path}, cout_, cerr_); + int cc = + tsfile_cli::run_cli({"count", "-f", "tsv", out_path}, cout_, cerr_); EXPECT_EQ(cc, 0); EXPECT_NE(cout_.str().find("\ts1\t3"), std::string::npos) << cout_.str(); diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index 994df3d99..837c43a64 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -68,9 +68,9 @@ void print_usage(std::ostream& os) { } bool is_known_command(const std::string& c) { - static const std::set kCmds = { - "ls", "schema", "meta", "stats", "head", - "cat", "count", "sample", "write"}; + static const std::set kCmds = {"ls", "schema", "meta", + "stats", "head", "cat", + "count", "sample", "write"}; return kCmds.count(c) != 0; } diff --git 
a/cpp/tools/commands/cmd_write.cc b/cpp/tools/commands/cmd_write.cc index d27b94ab2..e08bd405d 100644 --- a/cpp/tools/commands/cmd_write.cc +++ b/cpp/tools/commands/cmd_write.cc @@ -179,8 +179,8 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, char* e = nullptr; long long ts = std::strtoll(fields[0].c_str(), &e, 10); if (e == nullptr || *e != '\0') { - err << "Error: bad timestamp '" << fields[0] << "' (line " << line_no - << ")\n"; + err << "Error: bad timestamp '" << fields[0] << "' (line " + << line_no << ")\n"; return kExitRuntime; } DataRow r; From f41ccab64c1d97f4370bc72b0aa562bfb220d3f9 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 16:16:18 +0800 Subject: [PATCH 26/41] Update tsfile-cli skill to document the write command --- .claude/skills/tsfile-cli/SKILL.md | 51 +++++++++++++++++++++++------- 1 file changed, 39 insertions(+), 12 deletions(-) diff --git a/.claude/skills/tsfile-cli/SKILL.md b/.claude/skills/tsfile-cli/SKILL.md index 3be19f877..286f765c8 100644 --- a/.claude/skills/tsfile-cli/SKILL.md +++ b/.claude/skills/tsfile-cli/SKILL.md @@ -1,18 +1,17 @@ --- name: tsfile-cli -description: Use when you need to inspect, preview, or export an Apache TsFile (.tsfile) from the command line — listing devices/tables, dumping schema, reading file/series metadata, counting rows, or sampling/previewing rows — via the project's read-only C++ `tsfile` CLI in cpp/tools. +description: Use when you need to inspect, preview, export, OR import an Apache TsFile (.tsfile) from the command line — listing devices/tables, dumping schema, reading file/series metadata, counting rows, sampling/previewing rows, or writing CSV/TSV rows into a new .tsfile — via the project's C++ `tsfile-cli` in cpp/tools. --- # tsfile CLI ## Overview -`tsfile` is a single, read-only, pipe-friendly C++ binary for inspecting a `.tsfile` -without writing reader code — the TsFile analogue of `parquet-cli`/`pqrs`. Source: -`cpp/tools/`. 
Data goes to **stdout**, diagnostics/errors to **stderr**, so it composes -with `awk`, `jq`, `sort`, etc. - -It is **read-only**: there is no write/convert verb (see [Writing](#writing-a-tsfile)). +`tsfile-cli` is a single, pipe-friendly C++ binary for inspecting **and** importing +`.tsfile` data without writing reader/writer code — the TsFile analogue of +`parquet-cli`/`pqrs`. Source: `cpp/tools/`. Read commands send data to **stdout** and +diagnostics to **stderr** (so they compose with `awk`, `jq`, `sort`); the `write` command +imports CSV/TSV into a new `.tsfile` (see **Writing** below). ## Locating / building the binary @@ -95,9 +94,37 @@ $BIN cat -f csv data.tsfile 2>/dev/null | awk -F, 'NR>1{n++} END{print n}' and the `stats`/`count` rows may be fewer than the `schema` rows — not a discrepancy bug. - **Build needs `--disable-antlr4` on CMake ≥ 4** (see above). -## Writing a TsFile +## Writing (`write`): import CSV/TSV → tsfile + +`tsfile-cli write` imports rows into a **new table-model** `.tsfile` (output is overwritten). +The first input column is the timestamp (epoch ms); the rest are declared explicitly with +`--columns` — **no type inference**. + +``` +tsfile-cli write --table --columns -o \ + [-f csv|tsv] [--no-header] [--header-match] [-v] [ | -] +``` + +| Option | Meaning | +|---|---| +| `--table ` | output table name (lower-cased) | +| `--columns "id1:STRING:tag,s1:INT64:field"` | ordered data columns; category `tag\|field`; type ∈ BOOLEAN/INT32/INT64/FLOAT/DOUBLE/STRING/TEXT (case-insensitive) | +| `-o, --output ` | output `.tsfile` (required, overwritten) | +| `` / `-` | input file, or `-`/omitted = **stdin** | +| `-f csv\|tsv` | input delimiter (default csv; `json`/`table` rejected) | +| `--no-header` / `--header-match` | input has no header / validate header names vs `--columns` | +| `-v, --verbose` | print `wrote N rows to ` to stderr (else **silent on success**) | + +Empty cell = null. 
Exit codes: `1` usage (missing `--table`/`--columns`/`-o`, bad +`--columns`, read-only flags), `2` input/output open fail, `3` bad row (field count / type +/ header mismatch). + +```sh +printf 'time,id1,s1\n0,dev,0\n1,dev,10\n' \ + | tsfile-cli write --table t1 --columns "id1:STRING:tag,s1:INT64:field" -o out.tsfile - +tsfile-cli count -f tsv out.tsfile # -> t1.dev s1 2 +``` -The CLI does **not** write. Produce a `.tsfile` with the C++ SDK — see -`cpp/examples/cpp_examples/demo_write.cpp` (`TsFileTableWriter` / `TsFileWriter` + -`Tablet`), then inspect the result with this CLI. Java and Python writers exist under -`java/` and `python/`. +For **tree-model** writes, JSON input, or programmatic use, use the C++ SDK — +`cpp/examples/cpp_examples/demo_write.cpp` (`TsFileTableWriter`/`TsFileWriter` + `Tablet`); +Java/Python writers live under `java/`, `python/`. From 9cee6c51cb15783c5bded54f20c5be403d92488f Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 16:23:56 +0800 Subject: [PATCH 27/41] Condense tsfile-cli skill with formal notation --- .claude/skills/tsfile-cli/SKILL.md | 156 +++++++++++------------------ 1 file changed, 57 insertions(+), 99 deletions(-) diff --git a/.claude/skills/tsfile-cli/SKILL.md b/.claude/skills/tsfile-cli/SKILL.md index 286f765c8..48ff61729 100644 --- a/.claude/skills/tsfile-cli/SKILL.md +++ b/.claude/skills/tsfile-cli/SKILL.md @@ -1,130 +1,88 @@ --- name: tsfile-cli -description: Use when you need to inspect, preview, export, OR import an Apache TsFile (.tsfile) from the command line — listing devices/tables, dumping schema, reading file/series metadata, counting rows, sampling/previewing rows, or writing CSV/TSV rows into a new .tsfile — via the project's C++ `tsfile-cli` in cpp/tools. 
+description: Use when you need to inspect, preview, export, OR import an Apache TsFile (.tsfile) from the command line — list devices/tables, dump schema, read file/series metadata, count rows, sample/preview rows, or write CSV/TSV into a new .tsfile — via the project's C++ `tsfile-cli` in cpp/tools. --- -# tsfile CLI +# tsfile-cli -## Overview +Single pipe-friendly C++ binary to inspect **and** import `.tsfile` (TsFile's analogue of +`parquet-cli`/`pqrs`). Source `cpp/tools/`. Read data → stdout, diagnostics → stderr; +`write` imports CSV/TSV → a new file. -`tsfile-cli` is a single, pipe-friendly C++ binary for inspecting **and** importing -`.tsfile` data without writing reader/writer code — the TsFile analogue of -`parquet-cli`/`pqrs`. Source: `cpp/tools/`. Read commands send data to **stdout** and -diagnostics to **stderr** (so they compose with `awk`, `jq`, `sort`); the `write` command -imports CSV/TSV into a new `.tsfile` (see **Writing** below). +## Binary -## Locating / building the binary +- Name `tsfile-cli` (CMake target `tsfile_cli`). Find: `ls cpp/build/*/bin/tsfile-cli`. +- Build only if missing: `cd cpp && bash build.sh -t=Debug`. +- CMake ≥4 aborts on bundled ANTLR4 (`Policy CMP00xx ... OLD`) → add `--disable-antlr4` + (reader/CLI don't use ANTLR4). -The executable is named **`tsfile-cli`** (the CMake *target* is `tsfile_cli`). Look first, -build only if missing: +## Read -```sh -ls cpp/build/*/bin/tsfile-cli # prebuilt? e.g. 
cpp/build/Debug/bin/tsfile-cli -cd cpp && bash build.sh -t=Debug # build if absent (binary in build/Debug/bin/tsfile-cli) -``` - -If CMake ≥ 4 aborts configuring the bundled ANTLR4 (`Policy CMP00xx may not be set to -OLD`), add `--disable-antlr4` — the reader and CLI don't use ANTLR4: - -```sh -cd cpp && bash build.sh -t=Debug --disable-antlr4 -``` +`tsfile-cli [opts] ` · `tsfile-cli --help | --version | help ` -## Commands - -``` -tsfile-cli [options] -tsfile-cli --help | --version | help -``` - -| Command | Output | Scans data pages? | +| cmd | output | scans pages | |---|---|---| -| `ls` | one device (tree model) or table (table model) per line | no | -| `schema` | `target, measurement, datatype, encoding, compression` | no | -| `meta` | file summary: model, version, device/table/series counts, time range, bloom, size | no | -| `stats` | per-series `count, start_time, end_time, min, max, first, last, sum` | no | -| `count` | per-series row counts + `total` row (from statistics) | no | +| `ls` | device (tree) / table (table) per line | no | +| `schema` | `target,measurement,datatype,encoding,compression` | no | +| `meta` | model, version, device/table/series counts, time range, bloom, size | no | +| `stats` | per-series `count,start,end,min,max,first,last,sum` | no | +| `count` | per-series counts + `total` row | no | | `head` | first N rows (default 10, `-n`) | yes | | `cat` | all matching rows (streamed) | yes | -| `sample` | reproducible reservoir sample (default 10, `-n` + `--seed`) | yes | - -Use the no-scan metadata verbs (`ls`/`schema`/`meta`/`stats`/`count`) first — they answer -most inspection questions cheaply and reliably. +| `sample` | reservoir sample (default 10, `-n` + `--seed`) | yes | -## Shared options +Prefer no-scan verbs (`ls/schema/meta/stats/count`) — cheap and never hit the page-decode caveat. 
-| Option | Meaning | Applies to | -|---|---|---| -| `-f, --format csv\|tsv\|json\|table` | output format; auto = `table` on a TTY, `tsv` when piped | all | -| `-d, --device ` / `-t, --table ` | scope to one device / table (mutually exclusive) | row cmds, `schema`, `stats`, `count` | -| `-m, --measurements a,b,c` | column projection | `schema`, `head`, `cat`, `sample` | -| `-n, --limit N` / `--offset N` | row cap / skip (`--offset` invalid for `sample`) | `head`, `cat`, (`--offset`: not `sample`) | -| `--start ` / `--end ` | inclusive epoch-millisecond time range | `head`, `cat`, `sample` | -| `--seed N` | reproducible sampling seed (only valid for `sample`) | `sample` | -| `--no-header`, `--model tree\|table` | suppress header; force model (else auto-detected) | all | - -`json` is NDJSON (one object per line); numbers/booleans bare, others quoted, `null` as -`null`. CSV follows RFC 4180. Timestamps are raw epoch milliseconds. - -Exit codes: `0` ok · `1` usage/argument error · `2` file open/corrupt · `3` query/runtime. - -## Examples +``` +opts: -f csv|tsv|json|table (default TTY→table, pipe→tsv) + -d | -t (mutually exclusive) + -m a,b,c (projection) · -n N · --offset N · --start · --end (inclusive) + --seed N · --no-header · --model tree|table (else auto) +applies: -m → schema/head/cat/sample · -d/-t → row cmds/schema/stats/count · --offset ∉ sample +json=NDJSON (num/bool bare, else quoted, null→null) · csv=RFC4180 · ts=raw epoch ms +exit: 0 ok · 1 usage · 2 file open/corrupt · 3 query/runtime +``` ```sh -BIN=cpp/build/Debug/bin/tsfile-cli -$BIN ls -f tsv data.tsfile # namespaces, one per line -$BIN meta data.tsfile # quick file overview -$BIN count -t table1 -f tsv data.tsfile # row counts, no page scan -$BIN cat -m temp,humidity --start 1700000000000 -f csv data.tsfile | head -$BIN sample -m temp -n 20 --seed 42 -f json data.tsfile | jq . 
-$BIN cat -f csv data.tsfile 2>/dev/null | awk -F, 'NR>1{n++} END{print n}' +B=cpp/build/Debug/bin/tsfile-cli +$B meta data.tsfile; $B count -t table1 -f tsv data.tsfile +$B cat -m temp --start 1700000000000 -f csv data.tsfile 2>/dev/null | head ``` -## Known caveats - -- **Row commands can abort on some files.** `head`/`cat`/`sample` decode data pages and - may hit a reader assertion (`decode_cur_time_page_data`, `aligned_chunk_reader.cc`, - exit 134) on certain aligned files — including the bundled `cpp/examples/test_cpp.tsfile`. - This is a storage-engine/file issue, not a CLI bug; the metadata verbs still work on - such files. For row data, use a well-formed file (e.g. one you wrote yourself). -- **Garbled `target` for table model.** A table-model device id is built from tag-column - bytes, so `stats`/`count`/`schema` may print non-printable characters in `target`. -- **`schema` can list more columns than `meta`/`stats`/`count` report as series.** Tag/id - columns show up in `schema` but aren't always counted as field series, so `series_count` - and the `stats`/`count` rows may be fewer than the `schema` rows — not a discrepancy bug. -- **Build needs `--disable-antlr4` on CMake ≥ 4** (see above). +## Write -## Writing (`write`): import CSV/TSV → tsfile +`tsfile-cli write --table --columns -o [-f csv|tsv] [--no-header] [--header-match] [-v] [ | -]` -`tsfile-cli write` imports rows into a **new table-model** `.tsfile` (output is overwritten). -The first input column is the timestamp (epoch ms); the rest are declared explicitly with -`--columns` — **no type inference**. +Imports rows into a **new table-model** file (overwritten). Input col 0 = timestamp +(epoch ms, int); remaining cols declared by `--columns` — **no type inference**. 
``` -tsfile-cli write --table --columns -o \ - [-f csv|tsv] [--no-header] [--header-match] [-v] [ | -] +spec := col (',' col)* +col := name ':' TYPE ':' ('tag' | 'field') # TYPE case-insensitive +TYPE ∈ { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, STRING, TEXT } +input := file | '-' | omitted # '-' or omitted = stdin ``` -| Option | Meaning | -|---|---| -| `--table ` | output table name (lower-cased) | -| `--columns "id1:STRING:tag,s1:INT64:field"` | ordered data columns; category `tag\|field`; type ∈ BOOLEAN/INT32/INT64/FLOAT/DOUBLE/STRING/TEXT (case-insensitive) | -| `-o, --output ` | output `.tsfile` (required, overwritten) | -| `` / `-` | input file, or `-`/omitted = **stdin** | -| `-f csv\|tsv` | input delimiter (default csv; `json`/`table` rejected) | -| `--no-header` / `--header-match` | input has no header / validate header names vs `--columns` | -| `-v, --verbose` | print `wrote N rows to ` to stderr (else **silent on success**) | - -Empty cell = null. Exit codes: `1` usage (missing `--table`/`--columns`/`-o`, bad -`--columns`, read-only flags), `2` input/output open fail, `3` bad row (field count / type -/ header mismatch). +- `-o` required (overwritten); `-f` default csv (json/table → usage error). +- header: first line skipped by default · `--no-header` if none · `--header-match` validates + header names vs `--columns`. +- empty cell = null · `--table` is lower-cased · success **silent**, `-v` → `wrote N rows to ` on stderr. +- exit: `1` usage (missing `--table`/`--columns`/`-o`, bad spec, read-only flag) · `2` IO open · `3` row (field-count / type / header mismatch). 
```sh printf 'time,id1,s1\n0,dev,0\n1,dev,10\n' \ | tsfile-cli write --table t1 --columns "id1:STRING:tag,s1:INT64:field" -o out.tsfile - -tsfile-cli count -f tsv out.tsfile # -> t1.dev s1 2 +tsfile-cli count -f tsv out.tsfile # -> t1.dev s1 2 ``` -For **tree-model** writes, JSON input, or programmatic use, use the C++ SDK — -`cpp/examples/cpp_examples/demo_write.cpp` (`TsFileTableWriter`/`TsFileWriter` + `Tablet`); -Java/Python writers live under `java/`, `python/`. +Tree-model / JSON / programmatic writes → C++ SDK `cpp/examples/cpp_examples/demo_write.cpp` +(`TsFileTableWriter`/`TsFileWriter` + `Tablet`); Java/Python writers under `java/`, `python/`. + +## Caveats + +- `head`/`cat`/`sample` decode pages → may abort (`decode_cur_time_page_data`, exit 134) on + some aligned files incl. bundled `cpp/examples/test_cpp.tsfile`. Storage-engine/file issue, + not a CLI bug; metadata verbs still work. Use a well-formed (e.g. self-written) file for rows. +- table-model `target` is derived from tag bytes → may show non-printable chars in `stats/count/schema`. +- `schema` lists all columns; `meta/stats/count` count only field series → `series_count` can be + fewer than `schema` rows (not a bug). 
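The skill text above defines `--columns` with a small grammar: `spec := col (',' col)*`, where each `col := name ':' TYPE ':' ('tag' | 'field')`. Below is a minimal sketch of how such a spec could be validated. It assumes C++14. The names `ColumnSpec`, `split`, `to_upper`, and `parse_columns_spec` are hypothetical; this is not the parser that ships in `cpp/tools/`.

The sketch follows three rules:
- Each comma-separated entry must split into exactly three colon-separated parts.
- The type is upper-cased before lookup, because the skill says TYPE is case-insensitive.
- The category must be literally `tag` or `field`.

The patch does not say whether the real CLI trims whitespace or accepts `TAG`/`FIELD`, so this sketch does neither.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical parsed form of one `--columns` entry (illustrative only).
struct ColumnSpec {
    std::string name;
    std::string type;     // normalized to upper case, e.g. "INT64"
    bool is_tag = false;  // true for 'tag', false for 'field'
};

// Split on a single delimiter, keeping empty pieces ("a,,b" -> 3 pieces).
static std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts(1);
    for (char c : s) {
        if (c == delim) {
            parts.emplace_back();
        } else {
            parts.back() += c;
        }
    }
    return parts;
}

static std::string to_upper(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}

// spec := col (',' col)*   col := name ':' TYPE ':' ('tag' | 'field')
// Sketch only: no whitespace trimming, category must be lower-case.
// Returns false on the first malformed column; the skill maps a bad spec
// to exit code 1 (usage).
bool parse_columns_spec(const std::string& spec, std::vector<ColumnSpec>* out) {
    static const std::vector<std::string> kTypes = {
        "BOOLEAN", "INT32", "INT64", "FLOAT", "DOUBLE", "STRING", "TEXT"};
    out->clear();
    for (const std::string& col : split(spec, ',')) {
        const std::vector<std::string> f = split(col, ':');
        if (f.size() != 3 || f[0].empty()) {
            return false;
        }
        ColumnSpec c;
        c.name = f[0];
        c.type = to_upper(f[1]);  // TYPE is case-insensitive
        if (std::find(kTypes.begin(), kTypes.end(), c.type) == kTypes.end()) {
            return false;
        }
        if (f[2] == "tag") {
            c.is_tag = true;
        } else if (f[2] != "field") {
            return false;
        }
        out->push_back(c);
    }
    return true;
}
```

Two examples of the behavior:
- `parse_columns_spec("id1:STRING:tag,s1:int64:field", &cols)` yields two columns with types `STRING` and `INT64`.
- `"s1:INT64"` is rejected because the category part is missing.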
From 3ce68ccf333e435a78274724fdf428b4967bba1f Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 16:37:59 +0800 Subject: [PATCH 28/41] Ignore .codegraph and cpp/root test-run artifacts --- .gitignore | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/.gitignore b/.gitignore index d27277565..170a17a07 100644 --- a/.gitignore +++ b/.gitignore @@ -52,3 +52,13 @@ cpp/third_party/zlib-1.3.1/zlib-1.3.1/treebuild.xml .claude/todos/ .claude/worktrees/ .claude/scheduled_tasks.json + +# CodeGraph local index +.codegraph/ + +# Test-run artifacts (temp .tsfile/.dat written to the working dir or repo root) +cpp/cwrapper_*.tsfile +cpp/tsfile_writer_*.tsfile +cpp/*.dat +/*.tsfile +/*.dat From 49b8cb39e37ba6c1dcb154a4a9ee3b06c1eae9ba Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 16:43:12 +0800 Subject: [PATCH 29/41] Add QA learning log for tsfile CLI work --- QA_Log.md | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 QA_Log.md diff --git a/QA_Log.md b/QA_Log.md new file mode 100644 index 000000000..d2b60779c --- /dev/null +++ b/QA_Log.md @@ -0,0 +1,20 @@ +# Learning Q&A Log (QA_Log) + +> Q&A records for conceptual / learning questions, newest first. + +## 2026-06-03 | Does the current design follow established Unix conventions? + +- **Core answer**: the `tsfile-cli write` (CSV/TSV→tsfile) design, checked against Unix CLI conventions: + - **Conforms**: stdin input + the `-` convention; `-o` for binary output (as in gcc/tar/zip/parquet/sqlite); no data on stdout, diagnostics on stderr, plus 0/1/2/3 exit codes; the colon/comma mini-syntax of `--columns` (as in mount -o/ps -o/cut); `-o` overwrites silently (as in cp/sort -o). + - **A justified deviation forced by the format**: `write` is a sink, not a filter (output can only land in a file, not go to stdout or further down a pipe), because writing the `.tsfile` footer at the end needs seek and a pipe cannot seek; the same holds for tar/zip/parquet, so no change is needed. + - **The only change truly worth making**: printing a summary on success violates *silence is golden*; be silent by default and put the summary behind `-v/--verbose`. + - **Minor and debatable**: three required flags + no type inference + skipping the header without validating it + the first column implicitly being time are a "correctness over convenience" trade-off (within convention); "validate against header names" can be an optional `--header-match`. +- **Related references**: `docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md`; read side `2026-06-02-tsfile-cli-design.md` §8 (stdout/stderr, exit codes); survey of CLI tools across file formats `各文件格式CLI工具调研.md` (write paths of Parquet convert-csv / ORC convert / Avro fromjson). + +- 
**Core answer**: a project skill such as `.claude/skills/tsfile-cli/SKILL.md` is **a manual for the model (Claude)**, not a command the user runs by hand. Mechanism: + - Claude Code scans `.claude/skills/*/SKILL.md` at **session start** and loads only the frontmatter `name` + `description`; + - when a user request matches the trigger conditions in `description`, the model uses the **Skill tool** to pull the full body into context and then acts on it; + - how users invoke it: ① state a related need and it triggers automatically (e.g. "show the schema/row count of X.tsfile"), ② say "use the tsfile-cli skill …" explicitly to force it; + - **a newly created skill is registered only in a new session**; it is also visible only in this repository, on a branch that contains the file (currently `feat/tsfile-cli`, not yet merged back into develop); + - to verify: in a new session, ask "list the available skills" or give the model a `.tsfile` to inspect, and check whether it announces `Using tsfile-cli skill`. +- **Related references**: `.claude/skills/tsfile-cli/SKILL.md`; superpowers `writing-skills` (CSO: `description` states only "when to use", not the workflow); this repo's `docs/superpowers/specs/2026-06-02-tsfile-cli-design.md`. From 80ea038a360bc6982711b9de3315a6e6ef1c11b6 Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 16:59:32 +0800 Subject: [PATCH 30/41] Move tsfile-cli skill to cpp/tools/skills; stop tracking .claude --- .gitignore | 7 ++----- {.claude => cpp/tools}/skills/tsfile-cli/SKILL.md | 0 2 files changed, 2 insertions(+), 5 deletions(-) rename {.claude => cpp/tools}/skills/tsfile-cli/SKILL.md (100%) diff --git a/.gitignore b/.gitignore index 170a17a07..9f4c3b7a4 100644 --- a/.gitignore +++ b/.gitignore @@ -47,11 +47,8 @@ build/* cpp/third_party/zlib-1.3.1/treebuild.xml cpp/third_party/zlib-1.3.1/zlib-1.3.1/treebuild.xml -# Claude Code -.claude/settings.local.json -.claude/todos/ -.claude/worktrees/ -.claude/scheduled_tasks.json +# Claude Code (local AI tooling — not uploaded; skill lives in cpp/tools/skills) +.claude/ # CodeGraph local index .codegraph/ diff --git a/.claude/skills/tsfile-cli/SKILL.md b/cpp/tools/skills/tsfile-cli/SKILL.md similarity index 100% rename from .claude/skills/tsfile-cli/SKILL.md rename to cpp/tools/skills/tsfile-cli/SKILL.md From 604dcc77e735669959db7919eb8c92027a21001f Mon Sep 17 00:00:00 2001 From: spricoder Date: Wed, 3 Jun 2026 17:01:32 +0800 Subject: [PATCH 31/41] Stop tracking docs/superpowers and QA_Log.md (keep local, out of PR) --- .gitignore | 4 + QA_Log.md
| 20 - .../plans/2026-06-02-tsfile-cli.md | 1288 ----------------- .../plans/2026-06-03-tsfile-cli-write.md | 947 ------------ .../specs/2026-06-02-tsfile-cli-design.md | 334 ----- .../2026-06-03-tsfile-cli-write-design.md | 203 --- 6 files changed, 4 insertions(+), 2792 deletions(-) delete mode 100644 QA_Log.md delete mode 100644 docs/superpowers/plans/2026-06-02-tsfile-cli.md delete mode 100644 docs/superpowers/plans/2026-06-03-tsfile-cli-write.md delete mode 100644 docs/superpowers/specs/2026-06-02-tsfile-cli-design.md delete mode 100644 docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md diff --git a/.gitignore b/.gitignore index 9f4c3b7a4..fcd5f2b6b 100644 --- a/.gitignore +++ b/.gitignore @@ -59,3 +59,7 @@ cpp/tsfile_writer_*.tsfile cpp/*.dat /*.tsfile /*.dat + +# AI workflow artifacts (kept local, not uploaded) +docs/superpowers/ +/QA_Log.md diff --git a/QA_Log.md b/QA_Log.md deleted file mode 100644 index d2b60779c..000000000 --- a/QA_Log.md +++ /dev/null @@ -1,20 +0,0 @@ -# Learning Q&A Log (QA_Log) - -> Q&A records for conceptual / learning questions, newest first. - -## 2026-06-03 | Does the current design follow established Unix conventions? - -- **Core answer**: the `tsfile-cli write` (CSV/TSV→tsfile) design, checked against Unix CLI conventions: - - **Conforms**: stdin input + the `-` convention; `-o` for binary output (as in gcc/tar/zip/parquet/sqlite); no data on stdout, diagnostics on stderr, plus 0/1/2/3 exit codes; the colon/comma mini-syntax of `--columns` (as in mount -o/ps -o/cut); `-o` overwrites silently (as in cp/sort -o). - - **A justified deviation forced by the format**: `write` is a sink, not a filter (output can only land in a file, not go to stdout or further down a pipe), because writing the `.tsfile` footer at the end needs seek and a pipe cannot seek; the same holds for tar/zip/parquet, so no change is needed. - - **The only change truly worth making**: printing a summary on success violates *silence is golden*; be silent by default and put the summary behind `-v/--verbose`. - - **Minor and debatable**: three required flags + no type inference + skipping the header without validating it + the first column implicitly being time are a "correctness over convenience" trade-off (within convention); "validate against header names" can be an optional `--header-match`. -- **Related references**: `docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md`; read side `2026-06-02-tsfile-cli-design.md` §8 (stdout/stderr, exit codes); survey of CLI tools across file formats `各文件格式CLI工具调研.md` (write paths of Parquet convert-csv / ORC convert / Avro fromjson). - -- **Core answer**: a project skill such as `.claude/skills/tsfile-cli/SKILL.md` is **a manual for the model (Claude)**, not a command the user runs by hand. Mechanism: - - Claude Code
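scans `.claude/skills/*/SKILL.md` at **session start** and loads only the frontmatter `name` + `description`; - - when a user request matches the trigger conditions in `description`, the model uses the **Skill tool** to pull the full body into context and then acts on it; - - how users invoke it: ① state a related need and it triggers automatically (e.g. "show the schema/row count of X.tsfile"), ② say "use the tsfile-cli skill …" explicitly to force it; - - **a newly created skill is registered only in a new session**; it is also visible only in this repository, on a branch that contains the file (currently `feat/tsfile-cli`, not yet merged back into develop); - - to verify: in a new session, ask "list the available skills" or give the model a `.tsfile` to inspect, and check whether it announces `Using tsfile-cli skill`. -- **Related references**: `.claude/skills/tsfile-cli/SKILL.md`; superpowers `writing-skills` (CSO: `description` states only "when to use", not the workflow); this repo's `docs/superpowers/specs/2026-06-02-tsfile-cli-design.md`. diff --git a/docs/superpowers/plans/2026-06-02-tsfile-cli.md b/docs/superpowers/plans/2026-06-02-tsfile-cli.md deleted file mode 100644 index 55106268e..000000000 --- a/docs/superpowers/plans/2026-06-02-tsfile-cli.md +++ /dev/null @@ -1,1288 +0,0 @@ - - -# TsFile CLI (`tsfile`) Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Starting from the current "half-migrated" working tree, finish the C++ `tsfile` CLI as a complete read-only 8-verb tool -(`ls / schema / meta / stats / head / cat / count / sample`), and remove the leftover `select` -dead code, so the whole implementation builds, passes its tests, and is ready to commit. - -**Architecture:** Keep the existing `cpp/tools/` layering: `cli/` handles argument parsing and dispatch, `commands/` -reads metadata or runs row queries, and `format/` handles `RowWriter` and `ResultSet` output. Add -`stat_table.*` so that `stats`/`count`/`meta` share the `Statistic` formatting logic; add a -sampled result-set writer that reuses the existing cell extraction. The storage engine is not modified. - -**Tech Stack:** C++11/C++14-compatible code (the test target uses `-std=c++14`), CMake `BUILD_TOOLS`, -Google Test 1.12.1, the existing `storage::TsFileReader`, `storage::Statistic`, `RowWriter`, -`write_result_set`. - -**Spec:** `docs/superpowers/specs/2026-06-02-tsfile-cli-design.md` - ---- - -## Prerequisites - -- Working directory: `/Users/zhanghongyin/iotdb/tsfile`. -- Before starting, run `git status --short` and make sure `.codegraph/` and unrelated changes are not staged. -- Every new `.h`/`.cc` file must begin with the Apache 2.0 block-comment header (`/* ...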
*/`); copy it verbatim from any existing - `cpp/tools/**` file. The code blocks below omit the header for brevity; **always prepend it when creating a file**. -- All CLI code lives in `namespace tsfile_cli`. -- **Build environment note (local CMake 4.3.2)**: the bundled `third_party/antlr4-cpp-runtime-4` - sets removed legacy CMake policies to OLD, which CMake 4.x rejects with an error; you must pass `--disable-antlr4` - to work around it (the reader/CLI do not depend on ANTLR4; verified to compile and pass tests). In addition, `build.sh` defaults to - `build_test=0` with no command-line switch, so it was temporarily changed to `build_test=1` for this work (revert it with - `git checkout cpp/build.sh` when wrapping up Task 6). Test executables land in `build/Debug/test/lib/`. -- Run the C++ verification commands from the `cpp/` directory: - -```bash -bash build.sh -t=Debug --disable-antlr4 -./build/Debug/test/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.*:StatTableTest.* -``` - -> The `Run:` lines in each Task of this plan still use the old `bash build.sh -t=Debug` and -> `./build/Debug/lib/TsFile_Test`; following the "environment note" above, replace them throughout with the -> `--disable-antlr4` build command and the `build/Debug/test/lib/TsFile_Test` test path. - -## Starting point: current working-tree state (2026-06-02) - -- **Committed** (commit `a392a56f`, four files only): `cpp/tools/cli/cli_args.h`, - `cli_args.cc`, `run_cli.cc`, `cpp/test/tools/cli_args_test.cc`. The command surface already has the 8 verbs, - including `--seed` parsing and `validate_command_flags`; `select` is not on the allowlist; `meta`/`count`/ - `sample` are intercepted by `is_unimplemented_command`, which returns "command not implemented yet". -- **Uncommitted (untracked)**: `cpp/tools/CMakeLists.txt`, `tools_main.cc`, - `cli/exit_codes.h`, `cli/run_cli.h`, `commands/` (`commands.h`, `row_query.cc`, - `cmd_ls/cmd_schema/cmd_stats/cmd_head/cmd_cat/cmd_select.cc`), `format/` - (`output_format.*`, `result_set_format.*`), `cpp/test/tools/cli_test_util.h`, - `command_e2e_test.cc`, `output_format_test.cc`. -- **Modified (tracked)**: `cpp/CMakeLists.txt` (`BUILD_TOOLS`), - `cpp/src/file/read_file.cc` (open errors now go to stderr), `cpp/test/CMakeLists.txt` - (globs the tools tests and links `tsfile_cli_obj`). -- **Leftover inconsistency (fixed in Task 1)**: `cmd_select.cc` and its declaration are dead code; - `command_e2e_test.cc` still tests the `select` command, which conflicts with the command surface from which `select` was removed. - -## File structure - -Responsibilities kept: `cli/cli_args.*`, `cli/run_cli.cc`, `commands/commands.h`,
-`commands/row_query.cc`, `format/output_format.*`, `format/result_set_format.*`. - -New files: - -- `cpp/tools/commands/stat_table.h` / `.cc`: `SeriesStatRow`, `FileSummary`, - `StatisticCells`, plus `collect_series_stats`, `collect_file_summary`, - `statistic_value_cells`, shared by `stats`/`count`/`meta`. -- `cpp/tools/commands/cmd_meta.cc`, `cmd_count.cc`, `cmd_sample.cc`. -- `cpp/test/tools/stat_table_test.cc`: unit tests for the statistic-value formatting helper. - -Deleted files: - -- `cpp/tools/commands/cmd_select.cc`: the `select` functionality is now covered by the options shared by `cat/head/sample`. - ---- - -### Task 1: Reconcile the baseline: remove `select` dead code, get the build green, commit the existing implementation - -**Files:** -- Delete: `cpp/tools/commands/cmd_select.cc` -- Modify: `cpp/tools/commands/commands.h` -- Modify: `cpp/test/tools/command_e2e_test.cc` -- Modify: `cpp/test/tools/cli_args_test.cc` - -This task adds no new functionality; it only brings the untracked implementation in line with the committed 8-verb command surface, and commits the whole implementation as the -working baseline. - -- [ ] **Step 1: Delete `cmd_select.cc`** - -```bash -rm cpp/tools/commands/cmd_select.cc -``` - -- [ ] **Step 2: Remove the `cmd_select` declaration from `commands.h`** - -Delete this block from `cpp/tools/commands/commands.h`: - -```cpp -int cmd_select(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -``` - -- [ ] **Step 3: Rewrite the `select` E2E tests as `cat`** - -In `cpp/test/tools/command_e2e_test.cc`, change `SelectWithTimeRange` to: - -```cpp -TEST(CliE2E, CatWithTimeRange) { - Fixture f; - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "2", "--end", - "3", "-f", "tsv", f.path}, - out, err); - EXPECT_EQ(code, 0); - EXPECT_EQ(out.str(), "time\ts1\n2\t20\n3\t30\n"); -} -``` - -Change `SelectJsonIsNdjson` to: - -```cpp -TEST(CliE2E, CatJsonIsNdjson) { - Fixture f; - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli({"cat", "-m", "s1", "--start", "0", "--end", - "0", "-f", "json", f.path}, - out, err); - EXPECT_EQ(code, 0); - EXPECT_EQ(out.str(), "{\"time\":0,\"s1\":0}\n"); -} -``` - -- [ ] **Step 4: Fix the stale command name in the parsing test** - -In `cpp/test/tools/cli_args_test.cc`,
in `MeasurementsSplitOnComma`, change the command from -`select` to `cat` (cosmetic only; `parse_args` does not validate the command name): - -```cpp -TEST(ParseArgsTest, MeasurementsSplitOnComma) { - auto p = tsfile_cli::parse_args({"cat", "-m", "s1,s2,s3", "data.tsfile"}); - ASSERT_EQ(p.measurements.size(), 3u); - EXPECT_EQ(p.measurements[1], "s2"); -} -``` - -- [ ] **Step 5: Build and run the CLI tests to confirm the baseline is all green** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.* -``` - -Expected: the build succeeds and all selected tests pass. `RunCliTest.SelectIsNoLongerKnownCommand` and -`RunCliTest.NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen` still pass (`meta`/ -`count`/`sample` are still stubs at this point). - -> If existing assertions such as `CliE2E.SchemaTableMeasurementFilterOnlyShowsRequestedColumn` fail on string -> details, first print the actual output with `./build/Debug/bin/tsfile-cli -f tsv ` and align against it; -> the fixture values (ts 0..4, s1=ts*10) are fixed. - -- [ ] **Step 6: Manually confirm that `select` is gone and help does not mention select** - -Run: - -```bash -cd cpp && ./build/Debug/bin/tsfile-cli --help | grep -i select; echo "rc=$?"
-``` - -Expected: no output and `rc=1` (grep finds no match); help lists `ls schema meta stats head cat -count sample`. - -- [ ] **Step 7: Commit the working baseline** - -```bash -git add cpp/CMakeLists.txt cpp/test/CMakeLists.txt cpp/src/file/read_file.cc \ - cpp/tools/CMakeLists.txt cpp/tools/tools_main.cc \ - cpp/tools/cli/exit_codes.h cpp/tools/cli/run_cli.h \ - cpp/tools/commands cpp/tools/format \ - cpp/test/tools/cli_test_util.h cpp/test/tools/command_e2e_test.cc \ - cpp/test/tools/output_format_test.cc cpp/test/tools/cli_args_test.cc -git commit -m "Add tsfile CLI ls/schema/stats/head/cat implementation and tests" -``` - -> Note: `git add cpp/tools/commands` records the already-`rm`-ed `cmd_select.cc` as a deletion. Before committing, -> run `git status --short` and confirm `.codegraph/` is not included. - ---- - -### Task 2: Statistic helpers and extending `stats` to min/max/first/last/sum - -**Files:** -- Create: `cpp/tools/commands/stat_table.h` -- Create: `cpp/tools/commands/stat_table.cc` -- Modify: `cpp/tools/commands/cmd_stats.cc` -- Create: `cpp/test/tools/stat_table_test.cc` -- Modify: `cpp/test/tools/command_e2e_test.cc` - -- [ ] **Step 1: Write a failing test that directly covers statistic-value formatting**: `cpp/test/tools/stat_table_test.cc` - -```cpp -#include "commands/stat_table.h" - -#include <gtest/gtest.h> - -#include "common/statistic.h" - -TEST(StatTableTest, Int64StatisticCellsContainValueSummaries) { - storage::Int64Statistic st; - st.update(1, static_cast<int64_t>(10)); - st.update(3, static_cast<int64_t>(30)); - tsfile_cli::StatisticCells cells = tsfile_cli::statistic_value_cells(&st); - EXPECT_EQ(cells.values[0], "10"); - EXPECT_EQ(cells.values[1], "30"); - EXPECT_EQ(cells.values[2], "10"); - EXPECT_EQ(cells.values[3], "30"); - EXPECT_EQ(cells.values[4], "40"); - EXPECT_EQ(cells.is_null, - std::vector<bool>({false, false, false, false, false})); -} - -TEST(StatTableTest, BooleanStatisticLeavesMinMaxNull) { - storage::BooleanStatistic st; - st.update(1, true); - st.update(2, false); - tsfile_cli::StatisticCells cells = tsfile_cli::statistic_value_cells(&st); - EXPECT_TRUE(cells.is_null[0]); - EXPECT_TRUE(cells.is_null[1]); - EXPECT_EQ(cells.values[2], "true"); - EXPECT_EQ(cells.values[3], "false"); - EXPECT_EQ(cells.values[4], "1"); -} -``` - -- [ ] **Step 2: Run the test to confirm it fails** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug -``` - -Expected: the build fails because `commands/stat_table.h` does not exist. - -- [ ] **Step 3: Create `cpp/tools/commands/stat_table.h`** (prepend the license header) - -```cpp -#ifndef TSFILE_CLI_STAT_TABLE_H -#define TSFILE_CLI_STAT_TABLE_H - -#include <string> -#include <vector> - -#include "cli/cli_args.h" - -namespace storage { -class Statistic; -class TsFileReader; -} // namespace storage - -namespace tsfile_cli { - -struct StatisticCells { - std::vector<std::string> values; - std::vector<bool> is_null; -}; - -struct SeriesStatRow { - std::string target; - std::string measurement; - long long count = 0; - long long start_time = 0; - long long end_time = 0; - StatisticCells value_cells; -}; - -struct FileSummary { - std::string file; - std::string model; - long long device_count = 0; - long long table_count = 0; - long long series_count = 0; - long long start_time = 0; - long long end_time = 0; - bool has_time_range = false; - long long file_size_bytes = 0; -}; - -StatisticCells statistic_value_cells(storage::Statistic* st); -std::vector<SeriesStatRow> collect_series_stats(const ParsedArgs& args, - storage::TsFileReader& reader); -FileSummary collect_file_summary(const ParsedArgs& args, - storage::TsFileReader& reader); - -} // namespace tsfile_cli - -#endif // TSFILE_CLI_STAT_TABLE_H -``` - -- [ ] **Step 4: Create `cpp/tools/commands/stat_table.cc`** (prepend the license header) - -```cpp -#include "commands/stat_table.h" - -#include <algorithm> -#include <fstream> -#include <limits> -#include <sstream> - -#include "commands/commands.h" -#include "common/statistic.h" -#include "reader/tsfile_reader.h" - -namespace tsfile_cli { -namespace { - -template <typename T> -std::string value_to_string(T value) { - std::ostringstream ss; - ss << value; - return ss.str(); -} - -std::string bool_to_string(bool value) { return value ? "true" : "false"; } - -std::string string_to_std(const common::String& value) { - return value.to_std_string(); -} - -long long file_size(const std::string& path) { - std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate); - if (!in.good()) { - return 0; - } - return static_cast<long long>(in.tellg()); -} - -} // namespace - -StatisticCells statistic_value_cells(storage::Statistic* st) { - StatisticCells cells; - cells.values.assign(5, ""); - cells.is_null.assign(5, true); - if (st == nullptr || st->get_count() == 0) { - return cells; - } - - switch (st->get_type()) { - case common::BOOLEAN: { - auto* s = static_cast<storage::BooleanStatistic*>(st); - cells.values = {"", "", bool_to_string(s->first_value_), - bool_to_string(s->last_value_), - value_to_string(s->sum_value_)}; - cells.is_null = {true, true, false, false, false}; - break; - } - case common::INT32: - case common::DATE: { - auto* s = static_cast<storage::Int32Statistic*>(st); - cells.values = {value_to_string(s->min_value_), - value_to_string(s->max_value_), - value_to_string(s->first_value_), - value_to_string(s->last_value_), - value_to_string(s->sum_value_)}; - cells.is_null = {false, false, false, false, false}; - break; - } - case common::INT64: - case common::TIMESTAMP: { - auto* s = static_cast<storage::Int64Statistic*>(st); - cells.values = {value_to_string(s->min_value_), - value_to_string(s->max_value_), - value_to_string(s->first_value_), - value_to_string(s->last_value_), - value_to_string(s->sum_value_)}; - cells.is_null = {false, false, false, false, false}; - break; - } - case common::FLOAT: { - auto* s = static_cast<storage::FloatStatistic*>(st); - cells.values = {value_to_string(s->min_value_), - value_to_string(s->max_value_), - value_to_string(s->first_value_), - value_to_string(s->last_value_), - value_to_string(s->sum_value_)}; - cells.is_null = {false, false, false, false, false}; - break; - } - case common::DOUBLE: { - auto* s = static_cast<storage::DoubleStatistic*>(st); - cells.values = {value_to_string(s->min_value_), - value_to_string(s->max_value_), - value_to_string(s->first_value_), - value_to_string(s->last_value_), - value_to_string(s->sum_value_)}; - cells.is_null = {false, false, false, false, false}; - break; - } - case common::STRING: { - auto* s = static_cast<storage::StringStatistic*>(st); - cells.values = {string_to_std(s->min_value_), - string_to_std(s->max_value_), - string_to_std(s->first_value_), - string_to_std(s->last_value_), ""}; - cells.is_null = {false, false, false, false, true}; - break; - } - case common::TEXT: { - auto* s = static_cast<storage::TextStatistic*>(st); - cells.values = {"", "", string_to_std(s->first_value_), - string_to_std(s->last_value_), ""}; - cells.is_null = {true, true, false, false, true}; - break; - } - default: - break; - } - return cells; -} - -std::vector<SeriesStatRow> collect_series_stats(const ParsedArgs& args, - storage::TsFileReader& reader) { - std::vector<SeriesStatRow> rows; - storage::DeviceTimeseriesMetadataMap meta = - reader.get_timeseries_metadata(); - for (auto& kv : meta) { - std::string target = kv.first ? kv.first->get_device_name() : ""; - if (!args.device.empty() && target != args.device) { - continue; - } - if (!args.table.empty() && kv.first && - kv.first->get_table_name() != args.table) { - continue; - } - for (auto& ts : kv.second) { - if (!ts) { - continue; - } - std::string measurement = - ts->get_measurement_name().to_std_string(); - if (!args.measurements.empty() && - std::find(args.measurements.begin(), args.measurements.end(), - measurement) == args.measurements.end()) { - continue; - } - storage::Statistic* st = ts->get_statistic(); - SeriesStatRow row; - row.target = target; - row.measurement = measurement; - if (st != nullptr) { - row.count = st->get_count(); - row.start_time = st->start_time_; - row.end_time = st->end_time_; - row.value_cells = statistic_value_cells(st); - } else { - row.value_cells.values.assign(5, ""); - row.value_cells.is_null.assign(5, true); - } - rows.push_back(row); - } - } - return rows; -} - -FileSummary collect_file_summary(const ParsedArgs& args, - storage::TsFileReader& reader) { - FileSummary s; - s.file = args.file;
- s.model = is_table_model(args, reader) ? "table" : "tree"; - s.device_count = - static_cast<long long>(reader.get_all_device_ids().size()); - s.table_count = - static_cast<long long>(reader.get_all_table_schemas().size()); - s.file_size_bytes = file_size(args.file); - - ParsedArgs all = args; - all.device.clear(); - all.table.clear(); - all.measurements.clear(); - std::vector<SeriesStatRow> rows = collect_series_stats(all, reader); - s.series_count = static_cast<long long>(rows.size()); - long long min_start = std::numeric_limits<long long>::max(); - long long max_end = std::numeric_limits<long long>::min(); - for (const SeriesStatRow& row : rows) { - if (row.count <= 0) { - continue; - } - min_start = std::min(min_start, row.start_time); - max_end = std::max(max_end, row.end_time); - s.has_time_range = true; - } - if (s.has_time_range) { - s.start_time = min_start; - s.end_time = max_end; - } - return s; -} - -} // namespace tsfile_cli -``` - -> **Compile-risk note**: if compilation fails, check the references above to `storage::Statistic` subclass fields (`min_value_`, `max_value_`, -> `first_value_`, `last_value_`, `sum_value_`, `start_time_`, `end_time_`) and accessors -> (`get_count()`, `get_type()`, `get_statistic()`) against -> `cpp/src/common/statistic.h` and correct the names; do not change the test expectations. - -- [ ] **Step 5: Rewrite `cmd_stats.cc` with the helpers so it outputs 10 columns** - -Replace the whole command body of `cpp/tools/commands/cmd_stats.cc` with: - -```cpp -#include <string> -#include <vector> - -#include "cli/exit_codes.h" -#include "commands/commands.h" -#include "commands/stat_table.h" - -namespace tsfile_cli { - -int cmd_stats(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { - RowWriter w(out, fmt, - {"target", "measurement", "count", "start_time", "end_time", - "min", "max", "first", "last", "sum"}, - {common::STRING, common::STRING, common::INT64, common::INT64, - common::INT64, common::STRING, common::STRING, common::STRING, - common::STRING, common::STRING}, - args.no_header); - - std::vector<SeriesStatRow> rows = collect_series_stats(args, reader); - for (const SeriesStatRow& row : rows) { - std::vector<std::string> cells = { - row.target,
row.measurement, std::to_string(row.count), - std::to_string(row.start_time), std::to_string(row.end_time)}; - cells.insert(cells.end(), row.value_cells.values.begin(), - row.value_cells.values.end()); - - std::vector<bool> nulls = {false, false, false, row.count == 0, - row.count == 0}; - nulls.insert(nulls.end(), row.value_cells.is_null.begin(), - row.value_cells.is_null.end()); - w.write(cells, nulls); - } - w.finish(); - return kExitOk; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 6: Update the `stats` E2E assertions for the header and values** - -In `cpp/test/tools/command_e2e_test.cc`, replace the two -`EXPECT_NE` lines in `StatsReportsCountAndTimeRange` with: - -```cpp - EXPECT_NE(out.str().find("target\tmeasurement\tcount\tstart_time\tend_" - "time\tmin\tmax\tfirst\tlast\tsum"), - std::string::npos); - EXPECT_NE(out.str().find("s1\t5\t0\t4\t0\t40\t0\t40\t100"), - std::string::npos); -``` - -- [ ] **Step 7: Build and run the tests to confirm they pass** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=StatTableTest.*:CliE2E.StatsReportsCountAndTimeRange -``` - -Expected: the build succeeds and the selected tests pass. - -- [ ] **Step 8: Commit** - -```bash -git add cpp/tools/commands/stat_table.h cpp/tools/commands/stat_table.cc \ - cpp/tools/commands/cmd_stats.cc cpp/test/tools/stat_table_test.cc \ - cpp/test/tools/command_e2e_test.cc -git commit -m "Extend tsfile stats with value summaries and shared stat helpers" -``` - ---- - -### Task 3: Implement `meta` - -**Files:** -- Create: `cpp/tools/commands/cmd_meta.cc` -- Modify: `cpp/tools/commands/commands.h` -- Modify: `cpp/tools/cli/run_cli.cc` -- Modify: `cpp/test/tools/command_e2e_test.cc` -- Modify: `cpp/test/tools/cli_args_test.cc` - -- [ ] **Step 1: Write a failing E2E test** - -Append to the end of `cpp/test/tools/command_e2e_test.cc`: - -```cpp -TEST(CliE2E, MetaReportsFileSummary) { - Fixture f; - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli({"meta", "-f", "tsv", f.path}, out, err); - EXPECT_EQ(code, 0); - EXPECT_TRUE(err.str().empty());
EXPECT_NE(out.str().find("file\tmodel\tversion\tdevice_count\ttable_" - "count\tseries_count\tstart_time\tend_time\tbloom_" - "filter\tfile_size_bytes"), - std::string::npos); - EXPECT_NE(out.str().find("\ttable\t"), std::string::npos); -} -``` - -- [ ] **Step 2: Remove `meta` from the "unimplemented" set** - -In `NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen` in `cpp/test/tools/cli_args_test.cc`, -change the loop range from -`{"meta", "count", "sample"}` to: - -```cpp - for (const char* command : {"count", "sample"}) { -``` - -- [ ] **Step 3: Run the test to confirm it fails** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.MetaReportsFileSummary -``` - -Expected: the test fails because `meta` is still intercepted by `is_unimplemented_command`, which returns exit code 1. - -- [ ] **Step 4: Declare `cmd_meta`** - -In `cpp/tools/commands/commands.h`, add after the `cmd_schema` declaration: - -```cpp -int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -``` - -- [ ] **Step 5: Create `cpp/tools/commands/cmd_meta.cc`** (prepend the license header) - -```cpp -#include "commands/commands.h" - -#include "cli/exit_codes.h" -#include "commands/stat_table.h" -#include "reader/tsfile_reader.h" - -namespace tsfile_cli { - -int cmd_meta(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { - RowWriter w(out, fmt, - {"file", "model", "version", "device_count", "table_count", - "series_count", "start_time", "end_time", "bloom_filter", - "file_size_bytes"}, - {common::STRING, common::STRING, common::STRING, common::INT64, - common::INT64, common::INT64, common::INT64, common::INT64, - common::STRING, common::INT64}, - args.no_header); - - FileSummary s = collect_file_summary(args, reader); - w.write({s.file, s.model, "", std::to_string(s.device_count), - std::to_string(s.table_count), std::to_string(s.series_count), - s.has_time_range ? std::to_string(s.start_time) : "", - s.has_time_range ?
std::to_string(s.end_time) : "", "", - std::to_string(s.file_size_bytes)}, - {false, false, true, false, false, false, !s.has_time_range, - !s.has_time_range, true, false}); - w.finish(); - return kExitOk; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 6: Unblock `meta` in `run_cli.cc` and add its dispatch** - -In `cpp/tools/cli/run_cli.cc`: - -1. Change the set in `is_unimplemented_command` to: - -```cpp - static const std::set<std::string> kCmds = {"count", "sample"}; -``` - -2. In the dispatch chain, insert after the `cmd_schema` branch and before the `cmd_stats` branch: - -```cpp - } else if (p.command == "meta") { - code = cmd_meta(p, reader, fmt, out, err); -``` - -- [ ] **Step 7: Build and run the tests to confirm they pass** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.MetaReportsFileSummary:RunCliTest.NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen -``` - -Expected: the build succeeds; both tests pass. - -- [ ] **Step 8: Commit** - -```bash -git add cpp/tools/commands/cmd_meta.cc cpp/tools/commands/commands.h \ - cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc \ - cpp/test/tools/cli_args_test.cc -git commit -m "Add tsfile meta command" -``` - ---- - -### Task 4: Implement `count` - -**Files:** -- Create: `cpp/tools/commands/cmd_count.cc` -- Modify: `cpp/tools/commands/commands.h` -- Modify: `cpp/tools/cli/run_cli.cc` -- Modify: `cpp/test/tools/command_e2e_test.cc` -- Modify: `cpp/test/tools/cli_args_test.cc` - -- [ ] **Step 1: Write a failing E2E test** - -Append to the end of `cpp/test/tools/command_e2e_test.cc`: - -```cpp -TEST(CliE2E, CountReportsSeriesCountsAndTotal) { - Fixture f; - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli({"count", "-f", "tsv", f.path}, out, err); - EXPECT_EQ(code, 0); - EXPECT_TRUE(err.str().empty()); - EXPECT_NE(out.str().find("target\tmeasurement\tcount"), std::string::npos); - EXPECT_NE(out.str().find("\ts1\t5"), std::string::npos); - EXPECT_NE(out.str().find("total\t\t"), std::string::npos); -} -``` - -- [ ] **Step 2: Remove `count` from the "unimplemented" set** - -In `NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen` in `cpp/test/tools/cli_args_test.cc`, -change the loop range to:
 - -```cpp - for (const char* command : {"sample"}) { -``` - -- [ ] **Step 3: Run the test to confirm it fails** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.CountReportsSeriesCountsAndTotal -``` - -Expected: the test fails because `count` is still intercepted. - -- [ ] **Step 4: Declare `cmd_count`** - -In `cpp/tools/commands/commands.h`, add after the `cmd_meta` declaration: - -```cpp -int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -``` - -- [ ] **Step 5: Create `cpp/tools/commands/cmd_count.cc`** (prepend the license header) - -```cpp -#include "commands/commands.h" - -#include "cli/exit_codes.h" -#include "commands/stat_table.h" -#include "reader/tsfile_reader.h" - -namespace tsfile_cli { - -int cmd_count(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { - RowWriter w(out, fmt, {"target", "measurement", "count"}, - {common::STRING, common::STRING, common::INT64}, - args.no_header); - - long long total = 0; - std::vector<SeriesStatRow> rows = collect_series_stats(args, reader); - for (const SeriesStatRow& row : rows) { - total += row.count; - w.write({row.target, row.measurement, std::to_string(row.count)}, - {false, false, false}); - } - w.write({"total", "", std::to_string(total)}, {false, true, false}); - w.finish(); - return kExitOk; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 6: Unblock `count` in `run_cli.cc` and add its dispatch** - -In `cpp/tools/cli/run_cli.cc`: - -1. Change the set in `is_unimplemented_command` to: - -```cpp - static const std::set<std::string> kCmds = {"sample"}; -``` - -2. In the dispatch chain, insert after the `cmd_cat` branch:
 - -```cpp - } else if (p.command == "count") { - code = cmd_count(p, reader, fmt, out, err); -``` - -- [ ] **Step 7: Build and run the tests to confirm they pass** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.CountReportsSeriesCountsAndTotal:RunCliTest.NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen -``` - -Expected: the build succeeds; both tests pass. - -- [ ] **Step 8: Commit** - -```bash -git add cpp/tools/commands/cmd_count.cc cpp/tools/commands/commands.h \ - cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc \ - cpp/test/tools/cli_args_test.cc -git commit -m "Add tsfile count command" -``` - ---- - -### Task 5: Implement deterministic `sample` and remove the "unimplemented" guard entirely - -**Files:** -- Modify: `cpp/tools/format/result_set_format.h` -- Modify: `cpp/tools/format/result_set_format.cc` -- Create: `cpp/tools/commands/cmd_sample.cc` -- Modify: `cpp/tools/commands/commands.h` -- Modify: `cpp/tools/cli/run_cli.cc` -- Modify: `cpp/test/tools/command_e2e_test.cc` -- Modify: `cpp/test/tools/cli_args_test.cc` - -- [ ] **Step 1: Write a failing E2E test** - -Append to the end of `cpp/test/tools/command_e2e_test.cc`: - -```cpp -TEST(CliE2E, SampleIsReproducibleWithSeed) { - Fixture f; - std::ostringstream out1; - std::ostringstream err1; - std::ostringstream out2; - std::ostringstream err2; - - int code1 = tsfile_cli::run_cli( - {"sample", "-m", "s1", "-n", "3", "--seed", "7", "-f", "tsv", f.path}, - out1, err1); - int code2 = tsfile_cli::run_cli( - {"sample", "-m", "s1", "-n", "3", "--seed", "7", "-f", "tsv", f.path}, - out2, err2); - - EXPECT_EQ(code1, 0); - EXPECT_EQ(code2, 0); - EXPECT_TRUE(err1.str().empty()); - EXPECT_TRUE(err2.str().empty()); - EXPECT_EQ(out1.str(), out2.str()); - EXPECT_EQ(count_lines(out1.str()), 4u); - EXPECT_NE(out1.str().find("time\ts1\n"), std::string::npos); -} -``` - -- [ ] **Step 2: Delete the "unimplemented" test from `cli_args_test.cc`** - -Since `meta`/`count`/`sample` will all be implemented, delete the entire -`TEST(RunCliTest, NewCommandsAreExplicitlyUnimplementedBeforeReaderOpen) { ... }` from `cpp/test/tools/cli_args_test.cc`.
 - -- [ ] **Step 3: Run the test to confirm it fails** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.SampleIsReproducibleWithSeed -``` - -Expected: the test fails because `sample` is still intercepted. - -- [ ] **Step 4: Declare the sampled writer** - -In `cpp/tools/format/result_set_format.h`, append after the `write_result_set` declaration: - -```cpp -int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, - bool no_header, std::ostream& out, long long limit, - unsigned long long seed); -``` - -- [ ] **Step 5: Implement the sampled writer** - -Add to the include block at the top of `cpp/tools/format/result_set_format.cc`: - -```cpp -#include <random> -``` - -Append after the `write_result_set` definition: - -```cpp -namespace { - -struct BufferedRow { - std::vector<std::string> cells; - std::vector<bool> nulls; -}; - -BufferedRow read_current_row(storage::ResultSet* rs, - const std::vector<common::TSDataType>& types) { - BufferedRow row; - const uint32_t ncol = static_cast<uint32_t>(types.size()); - row.cells.assign(ncol, ""); - row.nulls.assign(ncol, false); - for (uint32_t i = 1; i <= ncol; ++i) { - if (rs->is_null(i)) { - row.nulls[i - 1] = true; - } else { - row.cells[i - 1] = cell_to_string(rs, i, types[i - 1]); - } - } - return row; -} - -} // namespace - -int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, - bool no_header, std::ostream& out, long long limit, - unsigned long long seed) { - if (limit < 0) { - limit = 10; - } - auto meta = rs->get_metadata(); - const uint32_t ncol = meta->get_column_count(); - std::vector<std::string> header; - std::vector<common::TSDataType> types; - header.reserve(ncol); - types.reserve(ncol); - for (uint32_t i = 1; i <= ncol; ++i) { - header.push_back(meta->get_column_name(i)); - types.push_back(meta->get_column_type(i)); - } - - std::vector<BufferedRow> reservoir; - reservoir.reserve(static_cast<size_t>(limit)); - std::mt19937_64 rng(seed); - bool has_next = false; - int code = common::E_OK; - long long seen = 0; - while ((code = rs->next(has_next)) == common::E_OK && has_next) { - BufferedRow row =
read_current_row(rs, types); - if (limit == 0) { - ++seen; - continue; - } - if (static_cast<long long>(reservoir.size()) < limit) { - reservoir.push_back(row); - } else { - std::uniform_int_distribution<long long> dist(0, seen); - long long idx = dist(rng); - if (idx < limit) { - reservoir[static_cast<size_t>(idx)] = row; - } - } - ++seen; - } - - RowWriter writer(out, fmt, header, types, no_header); - for (const BufferedRow& row : reservoir) { - writer.write(row.cells, row.nulls); - } - writer.finish(); - return code; -} -``` - -- [ ] **Step 6: Declare `cmd_sample`** - -In `cpp/tools/commands/commands.h`, add after the `cmd_count` declaration: - -```cpp -int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err); -``` - -- [ ] **Step 7: Create `cpp/tools/commands/cmd_sample.cc`** (prepend the license header) - -```cpp -#include "commands/commands.h" - -#include <limits> -#include <memory> -#include <string> -#include <vector> - -#include "cli/exit_codes.h" -#include "common/device_id.h" -#include "common/schema.h" -#include "format/result_set_format.h" -#include "reader/tsfile_reader.h" - -namespace tsfile_cli { - -int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, - OutputFormat fmt, std::ostream& out, std::ostream& err) { - const int64_t start = args.has_start ? static_cast<int64_t>(args.start) - : std::numeric_limits<int64_t>::min(); - const int64_t end = args.has_end ?
static_cast<int64_t>(args.end) - : std::numeric_limits<int64_t>::max(); - storage::ResultSet* rs = nullptr; - int qret = 0; - - if (is_table_model(args, reader)) { - std::string table_name = args.table; - if (table_name.empty()) { - auto schemas = reader.get_all_table_schemas(); - if (schemas.empty() || !schemas[0]) { - err << "Error: no table found in file\n"; - return kExitRuntime; - } - table_name = schemas[0]->get_table_name(); - } - std::vector<std::string> cols = args.measurements; - if (cols.empty()) { - auto ts = reader.get_table_schema(table_name); - if (ts) { - cols = ts->get_measurement_names(); - } - } - qret = reader.query(table_name, cols, start, end, rs); - } else { - std::vector<std::string> devices; - if (!args.device.empty()) { - devices.push_back(args.device); - } else { - for (auto& d : reader.get_all_device_ids()) { - if (d) { - devices.push_back(d->get_device_name()); - } - } - } - std::vector<std::string> paths; - for (const std::string& dev : devices) { - std::vector<std::string> ms = args.measurements; - if (ms.empty()) { - auto did = std::make_shared(dev); - std::vector sch; - if (reader.get_timeseries_schema(did, sch) == 0) { - for (auto& m : sch) { - ms.push_back(m.measurement_name_); - } - } - } - for (const std::string& m : ms) { - paths.push_back(dev + "." + m); - } - } - if (paths.empty()) { - err << "Error: no time series found\n"; - return kExitRuntime; - } - qret = reader.query(paths, start, end, rs); - } - - if (qret != 0 || rs == nullptr) { - err << "Error: query failed (code " << qret << ")\n"; - if (rs != nullptr) { - reader.destroy_query_data_set(rs); - } - return kExitRuntime; - } - - const long long limit = args.limit < 0 ? 10 : args.limit; - const unsigned long long seed = - args.has_seed ? static_cast<unsigned long long>(args.seed) : 0ULL; - int wret = - write_result_set_sampled(rs, fmt, args.no_header, out, limit, seed); - reader.destroy_query_data_set(rs); - return wret == 0 ?
kExitOk : kExitRuntime; -} - -} // namespace tsfile_cli -``` - -> The query construction in `cmd_sample` is nearly identical to `commands/row_query.cc::run_row_query`; the only -> differences are that `sample` goes through `write_result_set_sampled` and does not accept `--offset`. Keep the two separate for now, -> and extract shared code only once a second real sharing point appears; do not abstract early just to remove this one duplication (YAGNI). - -- [ ] **Step 8: Remove the "unimplemented" guard and add `sample` dispatch** - -In `cpp/tools/cli/run_cli.cc`: - -1. Delete the entire `is_unimplemented_command` function definition. - -2. Delete the guard block in `run_cli` that calls it: - -```cpp - if (is_unimplemented_command(p.command)) { - err << "Error: command not implemented yet: " << p.command << "\n"; - print_usage(err); - return kExitUsage; - } -``` - -3. In the dispatch chain, insert after the `cmd_count` branch: - -```cpp - } else if (p.command == "sample") { - code = cmd_sample(p, reader, fmt, out, err); -``` - -- [ ] **Step 9: Build and run the tests to confirm they pass** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.SampleIsReproducibleWithSeed:CliE2E.*:RunCliTest.* -``` - -Expected: the build succeeds; the `sample` reproducibility test and all CLI tests pass; no `RunCliTest` -references `command not implemented yet` any more. - -- [ ] **Step 10: Commit** - -```bash -git add cpp/tools/format/result_set_format.h cpp/tools/format/result_set_format.cc \ - cpp/tools/commands/cmd_sample.cc cpp/tools/commands/commands.h \ - cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc \ - cpp/test/tools/cli_args_test.cc -git commit -m "Add deterministic tsfile sample command" -``` - ---- - -### Task 6: Full verification, help snapshot, and final checks - -**Files:** -- Modify: `docs/superpowers/plans/2026-06-02-tsfile-cli.md` (only if the execution notes need correcting during execution). - -- [ ] **Step 1: Run all CLI-related tests** - -Run: - -```bash -cd cpp && bash build.sh -t=Debug && ./build/Debug/lib/TsFile_Test --gtest_filter=CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:ResolveFormatTest.*:CsvEscapeTest.*:JsonEscapeTest.*:TypeNameTest.*:EncodingNameTest.*:CompressionNameTest.*:StatTableTest.* -``` - -Expected: the build succeeds; all selected tests pass. - -- [ ] **Step 2: Run the full C++ test executable** - -Run: - -```bash -cd cpp && ./build/Debug/lib/TsFile_Test -``` - -Expected: all tests pass. If pre-existing tests unrelated to this plan fail, record the exact failing test names and output before deciding whether to -narrow the verification scope. - -
[ ] **Step 3: Manually check the help output and command surface** - -Run: - -```bash -cd cpp && ./build/Debug/bin/tsfile-cli --help -``` - -Expected: stdout contains `ls schema meta stats head cat count sample`; it contains neither `select` -nor “not implemented”. - -- [ ] **Step 4: Manually smoke-test against the bundled sample** - -Run (the sample file uses the table model): - -```bash -cd cpp -BIN=./build/Debug/bin/tsfile-cli -F=examples/test_cpp.tsfile -$BIN ls -f tsv $F -$BIN meta -f tsv $F -$BIN stats -f tsv $F -$BIN count -f tsv $F -$BIN head -n 3 -f tsv $F -$BIN sample -m s1 -n 3 --seed 7 -f tsv $F -echo "missing file:"; $BIN ls nope.tsfile; echo "rc=$?" -``` - -Expected: data goes to stdout and diagnostics to stderr; `ls nope.tsfile` exits with code 2 and prints its error on stderr. - -- [ ] **Step 5: Check formatting and staging scope** - -Run: - -```bash -cd /Users/zhanghongyin/iotdb/tsfile && ./mvnw spotless:apply -P with-cpp 2>&1 | tail -5 && ./mvnw spotless:check -P with-cpp 2>&1 | tail -5 -git diff --check -git status --short -``` - -Expected: clang-format passes cleanly; `git diff --check` exits 0; `git status --short` lists only this -CLI work, and unrelated items such as `.codegraph/` are not staged. - -- [ ] **Step 6: Final commit (if there are formatting/notes changes)** - -```bash -git add -u cpp/tools cpp/test/tools -git commit -m "Format tsfile CLI sources" -``` - -If this task produced no file changes, do not create an empty commit. - -## Coverage check (plan self-review) - -| Spec requirement | Covered by | -|---|---| -| Single `tsfile` binary with git-style subcommand dispatch | Implemented (baseline, committed in Task 1) | -| `ls`/`schema`/`head`/`cat` | Implemented (baseline, committed in Task 1) | -| Remove `select` (verb + dead code) | Command surface already removed (`a392a56f`); dead code + tests in Task 1 | -| `stats` extended with min/max/first/last/sum | Task 2 | -| `meta` | Task 3 | -| `count` | Task 4 | -| `sample` reproducible via `--seed` | `--seed` parsing already committed; writer + command in Task 5 | -| Shared options: projection/time range/limit/offset | Implemented in baseline `row_query.cc`; the Task 1 `cat` E2E test covers the time range | -| Output formats csv/tsv/json/table, stdout/stderr separation | Baseline formatter + `read_file.cc` changes, committed in Task 1, verified in Task 6 | -| tree/table auto-detection + `--model` | Baseline `is_table_model`; `stats`/`count`/`meta` support scoping via `collect_*` | -| Exit codes 0/1/2/3 | `exit_codes.h` (baseline); each command's return value | -| `BUILD_TOOLS` + `install()` | Baseline `cpp/tools/CMakeLists.txt`, committed in Task 1 | - -**Placeholder scan**: no `TBD`/`TODO`/“implement later”. The -`is_unimplemented_command` guard in `run_cli.cc` is narrowed step by step in Tasks 3/4/5 and removed entirely in Task 5. - -**Type consistency**: the signatures of `ParsedArgs`, `OutputFormat`, `RowWriter(out, fmt, header, types, -no_header)`, `write_result_set(rs, fmt, no_header, out, offset, limit)`, -`write_result_set_sampled(rs, fmt, no_header, out, limit, seed)`, -`collect_series_stats`/`collect_file_summary`/`statistic_value_cells`, and each -`cmd_*(args, reader, fmt, out, err)` are consistent across tasks. - -**Known residual risks (verify during execution; non-blocking)**: -1. `storage::Statistic` subclass field/accessor names: Task 2 Step 4 notes that, if compilation fails, they should be corrected against - `cpp/src/common/statistic.h` without changing the test expectations. -2. E2E string assertions that depend on row/column order: print the actual output with `tsfile -f tsv ` and align the assertions with it; - the fixture values (ts 0..4, s1=ts*10) are fixed. -3. Whether `get_timeseries_metadata()` returns statistics for every series in the table model: if `meta`/`count`/ - `stats` produce no rows, troubleshoot against the metadata read path already verified in the baseline `cmd_schema.cc`. diff --git a/docs/superpowers/plans/2026-06-03-tsfile-cli-write.md b/docs/superpowers/plans/2026-06-03-tsfile-cli-write.md deleted file mode 100644 index 8f848ea1d..000000000 --- a/docs/superpowers/plans/2026-06-03-tsfile-cli-write.md +++ /dev/null @@ -1,947 +0,0 @@ - - -# TsFile CLI Write (`tsfile-cli write`) Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
- **Goal:** Add a `write` command to `tsfile-cli` that imports CSV/TSV row data into a new table-model -`.tsfile` (explicit `--columns`, zero type inference). - -**Architecture:** Add a pure parsing layer `format/input_format.*` (column spec / line splitting / type-name parsing, with heavy -unit testing); `cli_args` gains `-o/--output`, `--columns`, `-v/--verbose`, and `--header-match`; -`commands/cmd_write.cc` chains "read input → build `TableSchema`/`Tablet` → write with `TsFileTableWriter`"; -`run_cli` registers `write` as **the first command that does not open a `TsFileReader`** and special-cases its dispatch before reader.open. -The storage engine is not modified. - -**Tech Stack:** C++11/14 (the test target uses `-std=c++14`), CMake `BUILD_TOOLS`, Google Test, -and the existing `storage::TsFileTableWriter`, `storage::TableSchema`, `common::ColumnSchema`, -`storage::Tablet`, and `storage::WriteFile`. - -**Spec:** `docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md` - ---- - -## Prerequisites - -- Working directory `/Users/zhanghongyin/iotdb/tsfile`; run git operations from the repository root; do not stage `.codegraph/` - or the temporary `cpp/*.tsfile`/`*.dat` files produced by tests. -- Prepend the Apache 2.0 block-comment header to every new `.h`/`.cc` file (copy it from any `cpp/tools/**` file). -- **Build/test** (the local CMake 4.x conflicts with the old policy of the bundled ANTLR4, so `--disable-antlr4` is required; - `build.sh` defaults to `build_test=0` with no switch to change it, so during execution temporarily run `sed -i '' 's/^build_test=0/build_test=1/' - cpp/build.sh`, and restore it with `git checkout cpp/build.sh` at the end of Task 4): - -```bash -cd cpp && bash build.sh -t=Debug --disable-antlr4 -./build/Debug/test/lib/TsFile_Test --gtest_filter='InputFormatTest.*:ParseArgsTest.*:RunCliTest.*:CliE2E.*' -``` - -- A build exit code of 2 (`make install` lacks permission to copy libsnappy into `/usr/local/lib`) is expected; compilation/linking - and tests are unaffected. Judge success by `grep -c "Built target TsFile_Test"` and the test results. - -## Verified SDK facts (basis for compilation) - -- `enum class common::ColumnCategory { TAG, FIELD, ATTRIBUTE, TIME }` (`utils/db_utils.h`). -- `common::ColumnSchema(name, TSDataType, CompressionType, TSEncoding, ColumnCategory)` (`common/schema.h`). -- `storage::TableSchema(table_name, std::vector<common::ColumnSchema>)`: **lowercases the table name**. -- `storage::TsFileTableWriter(storage::WriteFile*, storage::TableSchema*)` (template constructor; the extra parameters have defaults). -- `int TsFileTableWriter::write_table(Tablet&) const` / `int flush()` / `int close()`. -- `int WriteFile::create(const std::string&, int flags, mode_t mode)`. -
`storage::Tablet(target_name, names, types, categories, max_rows)`; - `int add_timestamp(uint32_t, int64_t)`; `template <typename T> int add_value(uint32_t, const std::string& name, T)`. - Cells without an `add_value` call default to null. - -## File structure - -New: -- `cpp/tools/format/input_format.h` / `.cc`: `ColumnDef`, `parse_columns_spec`, - `split_line`, `parse_datatype_name`, `parse_category`, `parse_bool_cell` (pure layer, no reader dependency). -- `cpp/tools/commands/cmd_write.cc`: `cmd_write`. -- `cpp/test/tools/input_format_test.cc`: unit tests for the pure layer. - -Modified: -- `cpp/tools/cli/cli_args.h` / `.cc`: `ParsedArgs` gains `output/columns/verbose/header_match` and their parsing. -- `cpp/tools/commands/commands.h`: declare `cmd_write`. -- `cpp/tools/cli/run_cli.cc`: register `write`, `validate_write_flags`, dispatch that bypasses the reader, usage text. -- `cpp/test/tools/cli_args_test.cc`: write argument-parsing tests. -- `cpp/test/tools/command_e2e_test.cc`: write → read-back round-trip E2E test. - ---- - -### Task 1: `input_format` pure parsing layer - -**Files:** -- Create: `cpp/tools/format/input_format.h` -- Create: `cpp/tools/format/input_format.cc` -- Create: `cpp/test/tools/input_format_test.cc` - -- [ ] **Step 1: Write failing unit tests** in `cpp/test/tools/input_format_test.cc` (prepend the license header) - -```cpp -#include "format/input_format.h" - -#include <gtest/gtest.h> - -#include "common/db_common.h" -#include "utils/db_utils.h" - -TEST(InputFormatTest, ParseColumnsSpecValid) { - std::vector<tsfile_cli::ColumnDef> cols; - std::string err; - EXPECT_TRUE(tsfile_cli::parse_columns_spec("id1:STRING:tag,s1:INT64:field", - cols, err)); - ASSERT_EQ(cols.size(), 2u); - EXPECT_EQ(cols[0].name, "id1"); - EXPECT_EQ(cols[0].type, common::STRING); - EXPECT_EQ(cols[0].category, common::ColumnCategory::TAG); - EXPECT_EQ(cols[1].type, common::INT64); - EXPECT_EQ(cols[1].category, common::ColumnCategory::FIELD); -} - -TEST(InputFormatTest, ParseColumnsSpecCaseInsensitiveType) { - std::vector<tsfile_cli::ColumnDef> cols; - std::string err; - EXPECT_TRUE(tsfile_cli::parse_columns_spec("s1:int64:field", cols, err)); - EXPECT_EQ(cols[0].type, common::INT64); -} - -TEST(InputFormatTest, ParseColumnsSpecErrors) { - std::vector<tsfile_cli::ColumnDef> cols; - std::string err; -
EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:NOPE:field", cols, err)); - EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64:bogus", cols, err)); - EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64", cols, err)); - EXPECT_FALSE(tsfile_cli::parse_columns_spec("", cols, err)); -} - -TEST(InputFormatTest, SplitLineTsv) { - std::vector<std::string> f = tsfile_cli::split_line("0\t10\t20", '\t', false); - ASSERT_EQ(f.size(), 3u); - EXPECT_EQ(f[0], "0"); - EXPECT_EQ(f[2], "20"); -} - -TEST(InputFormatTest, SplitLineCsvQuotes) { - std::vector<std::string> f = - tsfile_cli::split_line("1,\"a,b\",\"she \"\"hi\"\"\"", ',', true); - ASSERT_EQ(f.size(), 3u); - EXPECT_EQ(f[1], "a,b"); - EXPECT_EQ(f[2], "she \"hi\""); -} - -TEST(InputFormatTest, SplitLineEmptyFields) { - std::vector<std::string> f = tsfile_cli::split_line("0,,5", ',', true); - ASSERT_EQ(f.size(), 3u); - EXPECT_EQ(f[1], ""); -} - -TEST(InputFormatTest, ParseBoolCell) { - bool b = false; - EXPECT_TRUE(tsfile_cli::parse_bool_cell("true", b)); - EXPECT_TRUE(b); - EXPECT_TRUE(tsfile_cli::parse_bool_cell("0", b)); - EXPECT_FALSE(b); - EXPECT_FALSE(tsfile_cli::parse_bool_cell("maybe", b)); -} -``` - -- [ ] **Step 2: Run to confirm failure** - -```bash -cd cpp && bash build.sh -t=Debug --disable-antlr4 -``` - -Expected: the build fails because `format/input_format.h` does not exist. - -- [ ] **Step 3: Create `cpp/tools/format/input_format.h`** (prepend the license header) - -```cpp -#ifndef TSFILE_CLI_INPUT_FORMAT_H -#define TSFILE_CLI_INPUT_FORMAT_H - -#include <string> -#include <vector> - -#include "common/db_common.h" -#include "utils/db_utils.h" - -namespace tsfile_cli { - -struct ColumnDef { - std::string name; - common::TSDataType type; - common::ColumnCategory category; -}; - -bool parse_datatype_name(const std::string& s, common::TSDataType& out); -bool parse_category(const std::string& s, common::ColumnCategory& out); -bool parse_columns_spec(const std::string& spec, std::vector<ColumnDef>& out, - std::string& error); -std::vector<std::string> split_line(const std::string& line, char delim, - bool csv_quotes); -bool parse_bool_cell(const
std::string& s, bool& out); - -} // namespace tsfile_cli - -#endif // TSFILE_CLI_INPUT_FORMAT_H -``` - -- [ ] **Step 4: Create `cpp/tools/format/input_format.cc`** (prepend the license header) - -```cpp -#include "format/input_format.h" - -#include <cctype> - -namespace tsfile_cli { - -bool parse_datatype_name(const std::string& s, common::TSDataType& out) { - std::string u; - u.reserve(s.size()); - for (char c : s) { - u += static_cast<char>(std::toupper(static_cast<unsigned char>(c))); - } - if (u == "BOOLEAN") { - out = common::BOOLEAN; - } else if (u == "INT32") { - out = common::INT32; - } else if (u == "INT64") { - out = common::INT64; - } else if (u == "FLOAT") { - out = common::FLOAT; - } else if (u == "DOUBLE") { - out = common::DOUBLE; - } else if (u == "STRING") { - out = common::STRING; - } else if (u == "TEXT") { - out = common::TEXT; - } else { - return false; - } - return true; -} - -bool parse_category(const std::string& s, common::ColumnCategory& out) { - if (s == "tag") { - out = common::ColumnCategory::TAG; - } else if (s == "field") { - out = common::ColumnCategory::FIELD; - } else { - return false; - } - return true; -} - -std::vector<std::string> split_line(const std::string& line, char delim, - bool csv_quotes) { - std::vector<std::string> out; - std::string field; - if (!csv_quotes) { - for (char c : line) { - if (c == delim) { - out.push_back(field); - field.clear(); - } else { - field += c; - } - } - out.push_back(field); - return out; - } - bool in_quotes = false; - for (size_t i = 0; i < line.size(); ++i) { - char c = line[i]; - if (in_quotes) { - if (c == '"') { - if (i + 1 < line.size() && line[i + 1] == '"') { - field += '"'; - ++i; - } else { - in_quotes = false; - } - } else { - field += c; - } - } else if (c == '"') { - in_quotes = true; - } else if (c == delim) { - out.push_back(field); - field.clear(); - } else { - field += c; - } - } - out.push_back(field); - return out; -} - -bool parse_columns_spec(const std::string& spec, std::vector<ColumnDef>& out, - std::string& error) { - out.clear(); - if (spec.empty()) {
- error = "empty --columns"; - return false; - } - std::vector<std::string> items = split_line(spec, ',', false); - for (const std::string& item : items) { - std::vector<std::string> parts = split_line(item, ':', false); - if (parts.size() != 3) { - error = "bad column '" + item + "' (want name:TYPE:category)"; - return false; - } - ColumnDef def; - def.name = parts[0]; - if (def.name.empty()) { - error = "empty column name in '" + item + "'"; - return false; - } - if (!parse_datatype_name(parts[1], def.type)) { - error = "unknown type '" + parts[1] + "'"; - return false; - } - if (!parse_category(parts[2], def.category)) { - error = "bad category '" + parts[2] + "' (want tag|field)"; - return false; - } - out.push_back(def); - } - return true; -} - -bool parse_bool_cell(const std::string& s, bool& out) { - std::string l; - l.reserve(s.size()); - for (char c : s) { - l += static_cast<char>(std::tolower(static_cast<unsigned char>(c))); - } - if (l == "true" || l == "1") { - out = true; - return true; - } - if (l == "false" || l == "0") { - out = false; - return true; - } - return false; -} - -} // namespace tsfile_cli -``` - -- [ ] **Step 5: Build and run to confirm the tests pass** - -```bash -cd cpp && bash build.sh -t=Debug --disable-antlr4 && ./build/Debug/test/lib/TsFile_Test --gtest_filter='InputFormatTest.*' -``` - -Expected: the build succeeds; all 7 `InputFormatTest` tests pass. - -- [ ] **Step 6: Commit** - -```bash -git add cpp/tools/format/input_format.h cpp/tools/format/input_format.cc cpp/test/tools/input_format_test.cc -git commit -m "Add tsfile CLI input_format parsing layer" -``` - ---- - -### Task 2: Add write arguments to `cli_args` - -**Files:** -- Modify: `cpp/tools/cli/cli_args.h` -- Modify: `cpp/tools/cli/cli_args.cc` -- Modify: `cpp/test/tools/cli_args_test.cc` - -- [ ] **Step 1: Write failing tests**, appended to `cpp/test/tools/cli_args_test.cc` - -```cpp -TEST(ParseArgsTest, WriteFlagsParsed) { - auto p = tsfile_cli::parse_args({"write", "--table", "t1", "--columns", - "s1:INT64:field", "-o", "out.tsfile", "-v", - "--header-match", "in.csv"}); - EXPECT_TRUE(p.error.empty()); -
- EXPECT_EQ(p.command, "write"); - EXPECT_EQ(p.table, "t1"); - EXPECT_EQ(p.columns, "s1:INT64:field"); - EXPECT_EQ(p.output, "out.tsfile"); - EXPECT_TRUE(p.verbose); - EXPECT_TRUE(p.header_match); - EXPECT_EQ(p.file, "in.csv"); -} - -TEST(ParseArgsTest, OutputFlagNeedsValue) { - auto p = tsfile_cli::parse_args({"write", "-o"}); - EXPECT_FALSE(p.error.empty()); -} -``` - -- [ ] **Step 2: Run to confirm failure** - -```bash -cd cpp && bash build.sh -t=Debug --disable-antlr4 -``` - -Expected: compilation fails because `ParsedArgs` has no `output`/`columns`/`verbose`/`header_match`. - -- [ ] **Step 3: Add fields to `cli_args.h`**: after the `model` field of `ParsedArgs`, add: - -```cpp - std::string output; - std::string columns; - bool verbose = false; - bool header_match = false; -``` - -- [ ] **Step 4: Parse the new flags in `cli_args.cc`**: in the `parse_args` loop, place these branches before the `--model` - branch: - -```cpp - } else if (a == "-o" || a == "--output") { - if (!need_value(a, p.output)) { - return p; - } - } else if (a == "--columns") { - if (!need_value(a, p.columns)) { - return p; - } - } else if (a == "-v" || a == "--verbose") { - p.verbose = true; - } else if (a == "--header-match") { - p.header_match = true; -``` - -- [ ] **Step 5: Build and run to confirm the tests pass** - -```bash -cd cpp && bash build.sh -t=Debug --disable-antlr4 && ./build/Debug/test/lib/TsFile_Test --gtest_filter='ParseArgsTest.*' -``` - -Expected: the build succeeds; all `ParseArgsTest` tests pass. - -- [ ] **Step 6: Commit** - -```bash -git add cpp/tools/cli/cli_args.h cpp/tools/cli/cli_args.cc cpp/test/tools/cli_args_test.cc -git commit -m "Add tsfile CLI write argument parsing" -``` - ---- - -### Task 3: Wire up `cmd_write` and `run_cli` - -**Files:** -- Create: `cpp/tools/commands/cmd_write.cc` -- Modify: `cpp/tools/commands/commands.h` -- Modify: `cpp/tools/cli/run_cli.cc` -- Modify: `cpp/test/tools/command_e2e_test.cc` - -- [ ] **Step 1: Write failing E2E tests**, appended to `cpp/test/tools/command_e2e_test.cc` - -```cpp -TEST(CliE2E, WriteThenReadRoundTrip) { - std::string csv_path = "tsfile_cli_write_in.csv"; - { - std::ofstream o(csv_path.c_str()); - o << 
"time,id1,s1\n0,dev,0\n1,dev,10\n2,dev,20\n"; - } - std::string out_path = "tsfile_cli_write_out.tsfile"; - - std::ostringstream wout; - std::ostringstream werr; - int wc = tsfile_cli::run_cli( - {"write", "--table", "t1", "--columns", "id1:STRING:tag,s1:INT64:field", - "-o", out_path, csv_path}, - wout, werr); - EXPECT_EQ(wc, 0) << werr.str(); - - std::ostringstream cout_; - std::ostringstream cerr_; - int cc = tsfile_cli::run_cli({"count", "-f", "tsv", out_path}, cout_, cerr_); - EXPECT_EQ(cc, 0); - EXPECT_NE(cout_.str().find("\ts1\t3"), std::string::npos) << cout_.str(); - - std::ostringstream rout; - std::ostringstream rerr; - int rc = tsfile_cli::run_cli({"cat", "-m", "s1", "-f", "tsv", out_path}, - rout, rerr); - EXPECT_EQ(rc, 0); - EXPECT_EQ(rout.str(), "time\ts1\n0\t0\n1\t10\n2\t20\n"); - - std::remove(csv_path.c_str()); - std::remove(out_path.c_str()); -} - -TEST(CliE2E, WriteMissingColumnsIsUsageError) { - std::ostringstream out; - std::ostringstream err; - int code = tsfile_cli::run_cli( - {"write", "--table", "t1", "-o", "x.tsfile", "in.csv"}, out, err); - EXPECT_EQ(code, 1); - EXPECT_NE(err.str().find("--columns"), std::string::npos); -} -``` - -> The top of `command_e2e_test.cc` already has `#include <cstdio>` (for `std::remove`); the new tests also need `<fstream>`, -> so if it is not included yet, add `#include <fstream>` to that file's include block. - -- [ ] **Step 2: Run to confirm failure** - -```bash -cd cpp && bash build.sh -t=Debug --disable-antlr4 -``` - -Expected: the build or the tests fail (`write` is not registered/implemented yet). - -- [ ] **Step 3: Declare `cmd_write`** after the `cmd_sample` declaration in `cpp/tools/commands/commands.h` - (note that the signature takes no reader and no OutputFormat): - -```cpp -int cmd_write(const ParsedArgs& args, std::ostream& out, std::ostream& err); -``` - -- [ ] **Step 4: Create `cpp/tools/commands/cmd_write.cc`** (prepend the license header) - -```cpp -#include - -#include -#include -#include -#include -#include -#include -#include - -#include "cli/cli_args.h" -#include "cli/exit_codes.h" -#include "commands/commands.h" -#include "common/schema.h" -#include "common/tablet.h" -#include "file/write_file.h" -#include
"format/input_format.h" -#include "writer/tsfile_table_writer.h" - -namespace tsfile_cli { -namespace { - -struct DataRow { - long long line_no; - int64_t timestamp; - std::vector<std::string> cells; -}; - -void strip_cr(std::string& s) { - if (!s.empty() && s.back() == '\r') { - s.pop_back(); - } -} - -bool add_typed_value(storage::Tablet& tablet, uint32_t row, - const ColumnDef& def, const std::string& cell, - std::string& error) { - if (cell.empty()) { - return true; // null - } - char* e = nullptr; - switch (def.type) { - case common::BOOLEAN: { - bool v = false; - if (!parse_bool_cell(cell, v)) { - error = "bad BOOLEAN '" + cell + "'"; - return false; - } - tablet.add_value(row, def.name, v); - return true; - } - case common::INT32: { - long v = std::strtol(cell.c_str(), &e, 10); - if (e == nullptr || *e != '\0') { - error = "bad INT32 '" + cell + "'"; - return false; - } - tablet.add_value(row, def.name, static_cast<int32_t>(v)); - return true; - } - case common::INT64: { - long long v = std::strtoll(cell.c_str(), &e, 10); - if (e == nullptr || *e != '\0') { - error = "bad INT64 '" + cell + "'"; - return false; - } - tablet.add_value(row, def.name, static_cast<int64_t>(v)); - return true; - } - case common::FLOAT: { - float v = std::strtof(cell.c_str(), &e); - if (e == nullptr || *e != '\0') { - error = "bad FLOAT '" + cell + "'"; - return false; - } - tablet.add_value(row, def.name, v); - return true; - } - case common::DOUBLE: { - double v = std::strtod(cell.c_str(), &e); - if (e == nullptr || *e != '\0') { - error = "bad DOUBLE '" + cell + "'"; - return false; - } - tablet.add_value(row, def.name, v); - return true; - } - case common::STRING: - case common::TEXT: { - tablet.add_value(row, def.name, cell); - return true; - } - default: - error = "unsupported column type"; - return false; - } -} - -} // namespace - -int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, - std::ostream& err) { - std::vector<ColumnDef> columns; - std::string perr; - if (!parse_columns_spec(args.columns, columns,
-                            perr)) {
-        err << "Error: " << perr << "\n";
-        return kExitUsage;
-    }
-
-    std::istream* in = &std::cin;
-    std::ifstream fin;
-    if (!args.file.empty() && args.file != "-") {
-        fin.open(args.file.c_str());
-        if (!fin.is_open()) {
-            err << "Error: cannot open input: " << args.file << "\n";
-            return kExitFile;
-        }
-        in = &fin;
-    }
-
-    const char delim = (args.format == ParsedArgs::Format::kTsv) ? '\t' : ',';
-    const bool csv_quotes = (delim == ',');
-
-    std::string line;
-    long long line_no = 0;
-    if (!args.no_header) {
-        if (std::getline(*in, line)) {
-            ++line_no;
-            strip_cr(line);
-            if (args.header_match) {
-                std::vector<std::string> h = split_line(line, delim, csv_quotes);
-                bool ok = (h.size() == columns.size() + 1);
-                for (size_t i = 0; ok && i < columns.size(); ++i) {
-                    if (h[i + 1] != columns[i].name) {
-                        ok = false;
-                    }
-                }
-                if (!ok) {
-                    err << "Error: header does not match --columns (line 1)\n";
-                    return kExitRuntime;
-                }
-            }
-        }
-    }
-
-    std::vector<DataRow> rows;
-    while (std::getline(*in, line)) {
-        ++line_no;
-        strip_cr(line);
-        if (line.empty()) {
-            continue;
-        }
-        std::vector<std::string> fields = split_line(line, delim, csv_quotes);
-        if (fields.size() != columns.size() + 1) {
-            err << "Error: expected " << (columns.size() + 1) << " fields, got "
-                << fields.size() << " (line " << line_no << ")\n";
-            return kExitRuntime;
-        }
-        char* e = nullptr;
-        long long ts = std::strtoll(fields[0].c_str(), &e, 10);
-        if (e == nullptr || *e != '\0') {
-            err << "Error: bad timestamp '" << fields[0] << "' (line " << line_no
-                << ")\n";
-            return kExitRuntime;
-        }
-        DataRow r;
-        r.line_no = line_no;
-        r.timestamp = static_cast<int64_t>(ts);
-        r.cells.assign(fields.begin() + 1, fields.end());
-        rows.push_back(r);
-    }
-
-    std::vector<std::string> names;
-    std::vector<common::TSDataType> types;
-    std::vector<common::ColumnCategory> cats;
-    std::vector<common::ColumnSchema> col_schemas;
-    for (const ColumnDef& d : columns) {
-        names.push_back(d.name);
-        types.push_back(d.type);
-        cats.push_back(d.category);
-        col_schemas.push_back(common::ColumnSchema(
-            d.name, d.type, common::UNCOMPRESSED,
-            common::PLAIN, d.category));
-    }
-
-    storage::WriteFile file;
-    int flags = O_WRONLY | O_CREAT | O_TRUNC;
-#ifdef _WIN32
-    flags |= O_BINARY;
-#endif
-    if (file.create(args.output, flags, 0666) != 0) {
-        err << "Error: cannot create output: " << args.output << "\n";
-        return kExitFile;
-    }
-    auto* schema = new storage::TableSchema(args.table, col_schemas);
-    auto* writer = new storage::TsFileTableWriter(&file, schema);
-
-    int rc = kExitOk;
-    const size_t kBatch = 1024;
-    for (size_t start = 0; start < rows.size() && rc == kExitOk;
-         start += kBatch) {
-        size_t end = std::min(start + kBatch, rows.size());
-        storage::Tablet tablet(args.table, names, types, cats,
-                               static_cast<int>(end - start));
-        for (size_t i = start; i < end && rc == kExitOk; ++i) {
-            uint32_t r = static_cast<uint32_t>(i - start);
-            tablet.add_timestamp(r, rows[i].timestamp);
-            for (size_t j = 0; j < columns.size(); ++j) {
-                std::string cerr;
-                if (!add_typed_value(tablet, r, columns[j], rows[i].cells[j],
-                                     cerr)) {
-                    err << "Error: " << cerr << " (line " << rows[i].line_no
-                        << ")\n";
-                    rc = kExitRuntime;
-                    break;
-                }
-            }
-        }
-        if (rc == kExitOk && writer->write_table(tablet) != 0) {
-            err << "Error: write_table failed\n";
-            rc = kExitRuntime;
-        }
-    }
-
-    if (rc == kExitOk) {
-        if (writer->flush() != 0 || writer->close() != 0) {
-            err << "Error: flush/close failed\n";
-            rc = kExitRuntime;
-        }
-    } else {
-        writer->close();
-    }
-    delete writer;
-    delete schema;
-
-    if (rc == kExitOk && args.verbose) {
-        err << "wrote " << rows.size() << " rows to " << args.output << "\n";
-    }
-    return rc;
-}
-
-}  // namespace tsfile_cli
-```
-
-- [ ] **Step 5: Register `write` in `run_cli.cc`, validate its flags, and bypass the reader**
-
-In `cpp/tools/cli/run_cli.cc`:
-
-1. Add `"write"` to the `is_known_command` set:
-
-```cpp
-    static const std::set<std::string> kCmds = {
-        "ls",  "schema", "meta",   "stats", "head",
-        "cat", "count",  "sample", "write"};
-```
-
-2. Inside the anonymous namespace, add `validate_write_flags` (placed after `validate_command_flags`):
-
-```cpp
-bool validate_write_flags(const ParsedArgs& p, std::ostream& err) {
-    if (p.table.empty()) {
-        err << "Error: write requires --table\n";
-        return false;
-    }
-    if (p.columns.empty()) {
-        err << "Error: write requires --columns\n";
-        return false;
-    }
-    if (p.output.empty()) {
-        err << "Error: write requires -o/--output\n";
-        return false;
-    }
-    if (p.format == ParsedArgs::Format::kJson ||
-        p.format == ParsedArgs::Format::kTable) {
-        err << "Error: write input format must be csv or tsv\n";
-        return false;
-    }
-    if (!p.measurements.empty() || !p.device.empty() || p.has_start ||
-        p.has_end || p.has_seed || p.limit != -1 || p.offset != 0) {
-        err << "Error: read-only flags are not valid for write\n";
-        return false;
-    }
-    return true;
-}
-```
-
-3. In the usage text's Commands section, add `write` after the `cat` line (near the
-   `count`/`sample` lines; any readable placement is fine):
-
-```cpp
-    "  write    import CSV/TSV rows into a new table tsfile "
-    "(--table, --columns, -o)\n"
-```
-
-   Then append a line to the Options section:
-
-```cpp
-    "Write options: --table, --columns name:TYPE:tag|field,..., -o/--output,\n"
-    "               --header-match, -v/--verbose\n"
-```
-
-4. Exempt `write` from the missing-file check (its positional argument is the input CSV, which may be stdin):
-
-```cpp
-    if (p.command != "write" && p.file.empty()) {
-        err << "Error: missing  argument\n";
-        return kExitUsage;
-    }
-```
-
-5. After the `validate_command_flags` call and before `storage::libtsfile_init();`, add the `write` dispatch:
-
-```cpp
-    if (p.command == "write") {
-        if (!validate_write_flags(p, err)) {
-            print_usage(err);
-            return kExitUsage;
-        }
-        storage::libtsfile_init();
-        return cmd_write(p, out, err);
-    }
-```
-
-- [ ] **Step 6: Build and run to confirm the tests pass**
-
-```bash
-cd cpp && bash build.sh -t=Debug --disable-antlr4 && ./build/Debug/test/lib/TsFile_Test --gtest_filter='CliE2E.WriteThenReadRoundTrip:CliE2E.WriteMissingColumnsIsUsageError'
-```
-
-Expected: the build succeeds and both tests pass.
-
-> If the `cat` assertion in `WriteThenReadRoundTrip` fails on column-order or null details, first run
-> `write` once by hand, then print the actual output with `./build/Debug/bin/tsfile-cli cat -m s1 -f tsv tsfile_cli_write_out.tsfile`
-> and align the expectation; the count=3 and schema checks are stable. If `add_value`/null behavior differs from expectations,
-> compare against `cpp/test/tools/cli_test_util.h` (a table fixture already verified to write and read back).
-
-- [ ] **Step 7: Commit**
-
-```bash
-git add cpp/tools/commands/cmd_write.cc cpp/tools/commands/commands.h cpp/tools/cli/run_cli.cc cpp/test/tools/command_e2e_test.cc
-git commit -m "Add tsfile CLI write command (CSV/TSV import)"
-```
-
----
-
-### Task 4: Full verification, formatting, and wrap-up
-
-**Files:**
-- Modify: `docs/superpowers/plans/2026-06-03-tsfile-cli-write.md`, only if the execution notes need correcting during execution.
-
-- [ ] **Step 1: Run all CLI-related tests**
-
-```bash
-cd cpp && bash build.sh -t=Debug --disable-antlr4 && ./build/Debug/test/lib/TsFile_Test --gtest_filter='InputFormatTest.*:CliE2E.*:ParseArgsTest.*:RunCliTest.*:RowWriterTest.*:StatTableTest.*'
-```
-
-Expected: the build succeeds and all tests pass.
-
-- [ ] **Step 2: Run the full test executable**
-
-```bash
-cd cpp && ./build/Debug/test/lib/TsFile_Test 2>&1 | tail -3
-```
-
-Expected: all tests pass (no regressions).
-
-- [ ] **Step 3: Manual smoke test (including stdin and default silence)**
-
-```bash
-cd cpp
-BIN=./build/Debug/bin/tsfile-cli
-printf 'time,id1,s1\n0,dev,0\n1,dev,10\n' | $BIN write --table t1 --columns "id1:STRING:tag,s1:INT64:field" -o /tmp/w.tsfile -; echo "rc=$? (silent, no output)"
-$BIN write --table t1 --columns "id1:STRING:tag,s1:INT64:field" -o /tmp/w.tsfile -v <<< $'time,id1,s1\n0,dev,0' 2>&1  # only -v prints "wrote N rows"
-$BIN count -f tsv /tmp/w.tsfile
-```
-
-Expected: no stdout/stderr by default; with `-v`, a single `wrote ... rows` line on stderr; `count` reads the file back correctly.
-
-- [ ] **Step 4: Format and check the staging scope**
-
-```bash
-cd /Users/zhanghongyin/iotdb/tsfile && clang-format -i cpp/tools/format/input_format.h cpp/tools/format/input_format.cc cpp/tools/commands/cmd_write.cc cpp/tools/cli/run_cli.cc cpp/tools/cli/cli_args.cc cpp/tools/cli/cli_args.h cpp/test/tools/input_format_test.cc cpp/test/tools/cli_args_test.cc cpp/test/tools/command_e2e_test.cc
-git checkout cpp/build.sh
-git diff --check
-git status --short
-```
-
-Expected: `git diff --check` exits 0; `build.sh` is restored; the status lists only this `write` work, and any
-clang-format changes are committed along with it.
-
-- [ ] **Step 5: Final commit (only if formatting changed anything)**
-
-```bash
-git add -u cpp/tools cpp/test/tools
-git commit -m "Format tsfile CLI write sources"
-```
-
-If nothing changed, do not create an empty commit.
-
-## Coverage check (plan self-review)
-
-| Spec requirement | Covered by |
-|---|---|
-| `write` command, CSV/TSV → table-model tsfile | Task 3 |
-| Explicit `--columns name:TYPE:category`, no type inference | Task 1 (`parse_columns_spec`), Task 2 |
-| First column is the timestamp, field-count check, empty = null | Task 3 (`cmd_write`) |
-| `-o/--output`, stdin/`-`, overwrite | Task 2, Task 3 |
-| `-f csv\|tsv` (json/table → usage error) | Task 3 (`validate_write_flags`) |
-| `--no-header` (header skipped by default) / `--header-match` check | Task 2, Task 3 |
-| Silent on success by default; summary only with `-v` | Task 3 (end of `cmd_write`), verified in Task 4 Step 3 |
-| Exit codes 0/1/2/3; no data on stdout, diagnostics on stderr | Task 3, Task 1 error returns |
-| Reject read-side flags | Task 3 (`validate_write_flags`) |
-| Reader bypass (`write` opens no reader) | Task 3 Step 5 |
-| Tests: column spec / line splitting / types, write → read-back round trip | Task 1, Task 3 |
-
-**Placeholder scan**: no TBD/TODO; every code block is complete.
-
-**Type consistency**: `ColumnDef{name,type,category}`, `parse_columns_spec`, `split_line`,
-`parse_bool_cell`, `cmd_write(args,out,err)`, and the `output`/`columns`/`verbose`/`header_match`
-fields of `ParsedArgs` are consistent across tasks; the SDK calls (`TableSchema`/`ColumnSchema`/`Tablet`/
-`TsFileTableWriter`/`WriteFile`) all follow the "Verified SDK facts" section.
-
-**Known residual risks (to verify during execution)**:
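-1. Whether a cell with no `add_value` call defaults to null: checked against the path already verified in `cli_test_util.h`; adjust if the E2E test shows unexpected null
-   behavior.
-2. Whether a table with zero tag columns can be written and read back: the E2E test sidesteps this with one tag column; pure-field tables are left for later verification.
-3. Reading a freshly written file back with `cat` should work in theory (the E2E fixture has the same shape and can be read with `cat`); if it trips an aligned-chunk assertion,
-   that points to a storage-engine problem (outside the scope of this plan), so assert the round trip with `count`/`schema` instead.
diff --git a/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md b/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md
deleted file mode 100644
index 32fd96611..000000000
--- a/docs/superpowers/specs/2026-06-02-tsfile-cli-design.md
+++ /dev/null
@@ -1,334 +0,0 @@
-
-
-# Design: TsFile C++ CLI (`tsfile`)
-
-- **Date**: 2026-06-02
-- **Module**: `cpp/` (new `cpp/tools/` and `cpp/test/tools/`)
-- **Status**: Design approved; partially implemented (see §10, "Implementation status"); remaining work is in
-  `docs/superpowers/plans/2026-06-02-tsfile-cli.md`
-- **Reference target**: Parquet's `parquet-cli` / `pqrs`: make a `.tsfile` as easy as a `.parquet`
-  to browse, inspect, preview, and export from the command line.
-- **Research basis**:
-  - `/Users/zhanghongyin/reasearchNotes/research/tsfile/Report.md` (main report §5.3)
-  - `/Users/zhanghongyin/reasearchNotes/research/tsfile/调研报告/各文件格式CLI工具调研.md`
-
-This is the single authoritative design document for implementing tsfile-cli. It supersedes the earlier, separate
-`2026-06-01-tsfile-unix-cli-design.md`.
-
-## 1. Goal
-
-Provide TsFile with a single-binary, composable, pipe-friendly C++ command-line tool:
-
-```sh
-tsfile-cli <command> [options] <file>
-tsfile-cli --help | --version
-tsfile-cli help <command>
-```
-
-Let users inspect a `.tsfile` the way they inspect other self-describing data files: discover namespaces, view the schema and
-metadata, preview rows, stream rows out, count rows, and sample rows, all without writing reader code themselves.
-
-The command surface follows the tool lineage of Parquet and similar data formats. The verbs are
-`ls / schema / meta / stats / head / cat / count / sample`; projection, time range, limit, and
-offset are shared options of the row-output commands.
-
-## 2. Design constraints from the research
-
-TsFile has two identities at once:
-
-1. **A Parquet-like file shape**: sealed, immutable, self-describing, and columnar, with footer metadata, offsets, and
-   statistics. The Parquet CLI is therefore the most important reference for command design.
-2. **An HDF5/netCDF-like namespace**: a TsFile is not always a single-table file; the tree model holds many devices and
-   the table model holds many tables. It therefore needs an `ls`-style namespace command.
-
-The CLI research reduces the read-only tool lineage for immutable data files to:
-
-```text
-schema | meta(/footer/stats) | head(/cat) | count | sample
-```
-
-Parquet is the most complete template: Apache `parquet-cli` provides `schema`, `meta`, `footer`, `head`,
-`cat`, and index/statistics commands; the Rust `pqrs` adds the especially useful `rowcount` and `sample`. ORC
-and Avro confirm the same pattern (`meta`/`data`/`count`; `getschema`/`getmeta`/`cat`/`count`).
-HDF5 and netCDF contribute the namespace and header experience: `h5ls`, `h5dump -H`, and `ncdump -h` are valuable because
-they show a file's internal structure without opening an application.
-
-The research's one-line conclusion: what TsFile **lacks is not "a CLI at all" but "a complete verb set + unified dispatch +
-visibility in general-purpose viewers"**. This design addresses the first two by unifying everything under a `tsfile <command>` dispatcher and completing
-the read-only verbs; integration with general-purpose viewers (DuckDB/ClickHouse/VisiData readers) is follow-up work (§13).
-
-## 3. Scope
-
-In scope:
-
-- A multi-command binary named `tsfile-cli`.
-- Read-only commands: `ls`, `schema`, `meta`, `stats`, `head`, `cat`, `count`, `sample`.
-- Shared options: output format, model selection, column projection, row limit, offset, time range, and sampling seed.
-- Implemented on the existing `storage::TsFileReader` read path, with no storage-engine changes.
-- Unix conventions: data goes to stdout and diagnostics and errors go to stderr, so output plugs into `awk`, `jq`,
-  `sort`, import tools, and shell pipelines.
-
-Out of scope:
-
-- Write, convert, merge, and rewrite commands.
-- A byte-level structure dump fully equivalent to Java's `TsFileSketchTool`.
-- FUSE mounting, DuckDB/ClickHouse/VisiData connectors, or SQL replacement scans.
-- ISO time formatting, and predicates more complex than time ranges and measurement projection.
-- Splitting into multiple `tsfile-*` binaries; no third-party argument-parsing library.
-
-## 4. Command lineage
-
-| Verb | Lineage | Purpose | Main reader support |
-|---|---|---|---|
-| `ls` | `h5ls`, `ncdump -h` | List devices (tree model) or tables (table model), one name per line | `get_all_device_ids()`, `get_all_table_schemas()` |
-| `schema` | `parquet-cli schema`, Avro `getschema`, SQL `DESCRIBE` | Print type information for series or columns | `get_timeseries_metadata()`, `get_timeseries_schema()` |
-| `meta` | `parquet-cli meta/footer`, Avro `getmeta` | Print a file-level summary: model, version, namespace size, global time range, Bloom filter, file size | Reader metadata + file-system metadata |
-| `stats` | `parquet-cli column-index/check-stats`, ORC statistics, SQL `SUMMARIZE` | Print count, time range, min, max, first, last, and sum for each series | `get_timeseries_metadata()` statistics |
-| `head` | `parquet-cli head`, `pqrs head`, SQL `LIMIT` | Print the first N rows | Shared row-query path |
-| `cat` | `parquet-cli cat/scan`, Avro `cat`/`tojson`, ORC `data` | Stream matching rows | Shared row-query path |
-| `count` | `pqrs rowcount`, ORC `count`, Avro `count`, SQL `count(*)` | Print row counts from statistics, without scanning data pages | `get_timeseries_metadata()` statistics |
-| `sample` | `pqrs sample`, SQL sampling | Print reproducible sample rows | Shared row-query path + deterministic sampling |
-
-`select` is **not** a separate verb. What it actually carries is projection, time filtering, limit, and offset; these exist as
-shared options of `head`, `cat`, and `sample`, matching the Parquet tools' habit of attaching column selection to the row-output
-commands.
-
-## 5. Command semantics
-
-### `ls`
-
-Prints the top-level logical namespace: one device ID per line for the tree model, one table name per line for the table model. The default
-output is deliberately simple and stable for piping; measurement- and column-level detail is the job of `schema`.
-
-### `schema`
-
-Prints a unified logical schema table:
-
-```text
-target, measurement, datatype, encoding, compression
-```
-
-In the tree model `target` is the device and `measurement` is the measurement; in the table model `target` is the table and
-`measurement` is the column name. Where the current public API exposes the datatype but not the encoding/compression
-(for example, in the table model), CSV/TSV output an empty field and JSON outputs `null`. `-m` projects onto the given columns.
-
-### `meta`
-
-Prints file-level information that can be answered without decoding data pages:
-
-```text
-file, model, version, device_count, table_count, series_count,
-start_time, end_time, bloom_filter, file_size_bytes
-```
-
-This mirrors Parquet's `meta`/`footer`: get a quick overview of the file first, then decide whether to look at the schema, stats, or
-row data. If the current public reader API cannot expose a field directly (for example `version` or `bloom_filter`), output an
-empty value rather than scanning data pages.
-
-### `stats`
-
-Prints the statistics of each series:
-
-```text
-target, measurement, count, start_time, end_time, min, max, first, last, sum
-```
-
-This exposes a strength of the TsFile format directly: chunk- and page-level statistics include the count and numeric summaries, so many inspection questions
-need no reading or decoding of data pages. `min`/`max`/`first`/`last`/`sum` may be empty depending on the type (for example, booleans have no min/max
-and text has no sum).
-
-### `head` and `cat`
-
-Row-output commands:
-
-- `head` prints the first 10 rows by default; `-n, --limit` overrides the row count.
-- `cat` streams all matching rows by default, unless a limit is given explicitly.
-- Both accept projection (`-m`), a time range (`--start`/`--end`), and an offset through the shared row-query path.
-
-`head` is essentially `cat` with a default limit.
-
-### `count`
-
-Reads row counts from statistics instead of scanning data through the row iterator. This is one place where TsFile's command surface improves on `parquet-cli`
-(which has no separate row-count subcommand). Scoping rules:
-
-- No scope given: print the count of every series, plus a total row;
-- `--device`: restrict to one tree-model device;
-- `--table`: restrict to one table-model table.
-
-### `sample`
-
-Prints N sample rows (default N=10) using the shared row query plus deterministic sampling; `--seed` makes the result reproducible.
-The implementation uses reservoir sampling. Design requirement: for the same file, scope, projection, time range, limit, and seed,
-the output is stable.
-
-## 6. Shared options
-
-| Option | Meaning | Applies to |
-|---|---|---|
-| `-f, --format csv\|tsv\|json\|table` | Output format; by default adapts to whether stdout is a TTY | All |
-| `-d, --device <device>` | Restrict to one tree-model device | Row-output commands, `schema`, `stats`, `count` |
-| `-t, --table <table>` | Restrict to one table-model table | Row-output commands, `schema`, `stats`, `count` |
-| `-m, --measurements a,b,c` | Measurement/column projection | `schema`, `head`, `cat`, `sample` |
-| `-n, --limit N` | Maximum number of output rows; `head` uses it as its row count | `head`, `cat`, `sample` |
-| `--offset N` | Skip the first N rows | `head`, `cat` |
-| `--start <ms>` / `--end <ms>` | Time range in epoch milliseconds, inclusive at both ends | `head`, `cat`, `sample` |
-| `--seed N` | Seed for reproducible sampling | `sample` |
-| `--no-header` | Omit the header row | Tabular outputs |
-| `--model tree\|table` | Force the model, overriding auto-detection | All |
-| `-h, --help` / `--version` | Help and version | Top level and each command |
-
-An option that does not apply to the command is a usage error (exit code `1`, error on stderr). Combination checks already implemented
-(`run_cli.cc::validate_command_flags`):
-
-- `--seed` is valid only for `sample`;
-- `--offset` is not valid for `sample`;
-- `--device` and `--table` cannot be used together;
-- `--limit >= -1`, `--offset >= 0`, and `--start <= --end`.
-
-## 7. Tree and table models
-
-The model is detected automatically:
-
-```text
-get_all_table_schemas() non-empty => table model
-otherwise                         => tree model
-```
-
-`--model tree|table` overrides auto-detection. Behavior under the unified command surface:
-
-- `ls` lists devices in tree files and tables in table files.
-- `schema`, `stats`, and `count` can be narrowed with `--device` or `--table`.
-- Row output always treats the time column as the first column; the tree model uses device + measurements and the table model uses
-  table + columns.
-
-## 8. Output formats and exit codes
-
-Formatters (`format/output_format.*`, `format/result_set_format.*`):
-
-- `table`: an aligned, human-oriented table; the default when stdout is a terminal.
-- `tsv`: tab-separated; the default when stdout is piped or redirected.
-- `csv`: follows RFC 4180 quoting (a field containing a delimiter, quote, or newline is quoted, and embedded quotes are doubled).
-- `json`: NDJSON, one JSON object per line; numbers and booleans are written bare, everything else is quoted, and null is written as `null`.
-
-Null is an empty field in CSV/TSV. Timestamps are printed as the stored epoch-millisecond integers (ISO formatting
-is follow-up work). Data → stdout; diagnostics, usage, and errors → stderr.
-
-Exit codes:
-
-| Exit code | Condition |
-|---|---|
-| `0` | Success |
-| `1` | Usage or argument error |
-| `2` | File cannot be opened or is corrupt |
-| `3` | Query or runtime error |
-
-`ReadFile::open` (`cpp/src/file/read_file.cc`) used to print open errors to stdout, which polluted
-`tsfile cat f | jq`; it now writes them to stderr.
-
-## 9. Architecture
-
-```text
-cpp/tools/
-├── CMakeLists.txt              # OBJECT library tsfile_cli_obj + executable tsfile-cli
-├── tools_main.cc               # main(): forwards argv to run_cli
-├── cli/
-│   ├── exit_codes.h            # kExitOk/kExitUsage/kExitFile/kExitRuntime
-│   ├── cli_args.h / .cc        # ParsedArgs + parse_args()
-│   └── run_cli.h / .cc         # top-level usage, allowlist, flag-combination checks, reader open, dispatch
-├── format/
-│   ├── output_format.h / .cc   # pure layer: resolve_format, escaping, type names, RowWriter
-│   └── result_set_format.h/.cc # ResultSet pump: cell_to_string, write_result_set[_sampled]
-└── commands/
-    ├── commands.h              # is_table_model + run_row_query + cmd_* declarations
-    ├── row_query.cc            # query construction shared by head/cat/sample
-    ├── stat_table.h / .cc      # collect_series_stats / collect_file_summary / statistic value formatting
-    ├── cmd_ls.cc cmd_schema.cc cmd_meta.cc cmd_stats.cc
-    └── cmd_head.cc cmd_cat.cc cmd_count.cc cmd_sample.cc
-
-cpp/test/tools/
-├── cli_test_util.h             # writes a table-model fixture .tsfile to a temp path
-├── cli_args_test.cc            # parse_args + run_cli argument/dispatch unit tests
-├── output_format_test.cc       # pure formatter unit tests
-├── stat_table_test.cc          # unit tests for statistic formatting and summary helpers
-└── command_e2e_test.cc         # E2E test of every command via in-process run_cli (incl. deterministic sampling)
-```
-
-Key design points:
-
-- The CLI logic compiles into the OBJECT library `tsfile_cli_obj`, which is linked both into the `tsfile-cli` executable and into
-  `TsFile_Test`, so commands can be tested in-process against an injected `std::ostream&`.
-- Formatters split into a pure layer (no reader dependency, heavily unit-tested) and a `ResultSet` pump layer (covered by E2E tests).
-- Hand-written argument parser; no new dependencies.
-- No storage-engine changes: every command uses existing reader metadata or the existing row-query API.
-
-Build: `cpp/CMakeLists.txt` provides `option(BUILD_TOOLS ... ON)`; when it is enabled, the build runs
-`add_subdirectory(tools)`, links `libtsfile` to produce the `tsfile-cli` executable, and `install()`s it to
-`bin`. `cpp/tools/CMakeLists.txt` collects sources with `GLOB_RECURSE`, so new `.cc` files are picked up automatically.
-
-## 10. Implementation status (2026-06-02)
-
-The working tree is half-migrated; the remaining work is detailed in
-`docs/superpowers/plans/2026-06-02-tsfile-cli.md`:
-
-- **Committed** (commit `a392a56f`; only the `cli/` layer + `cli_args_test.cc`):
-  - the 8-verb command surface, usage/help, allowlist, `--seed` parsing, and `validate_command_flags`;
-  - `select` removed from the allowlist (`select` → `Unknown command`, exit code 1);
-  - `meta`/`count`/`sample` are allowlisted but intercepted by `is_unimplemented_command`, which returns
-    "command not implemented yet".
-- **Implemented but not committed** (untracked): `ls`, `schema`, `stats` (old 5-column version only), `head`, `cat`,
-  their dependencies (`commands/`, `format/`, `tools_main.cc`, `CMakeLists.txt`, etc.), and the E2E tests.
-- **Remaining inconsistencies**:
-  - `cmd_select.cc` and the `cmd_select` declaration in `commands.h` still exist but are never dispatched (dead code).
-  - `command_e2e_test.cc` still tests `SelectWithTimeRange` /
-    `SelectJsonIsNdjson` through the `select` command, which conflicts with the command surface that removed `select`; these tests would fail once built.
-- **Not yet implemented**: extending `stats` to min/max/first/last/sum; `meta`; `count`; `sample`.
-
-## 11. Testing
-
-Tests live in `cpp/test/tools/` and use Google Test. They verify only CLI behavior and the real reader path, and add no
-storage-engine behavior.
-
-Unit tests cover: `cli_args` (command and option parsing, `--seed`, unknown commands/options, command/option mismatches);
-formatters (csv/tsv/json/table, including null, delimiters, quotes, and newlines); model detection (including `--model`
-overrides); and statistic formatting (`statistic_value_cells` for every type).
-
-E2E tests generate a table-model fixture, run every command through in-process `run_cli`, and assert the exit code, stdout
-shape, and stderr behavior. Deterministic sampling is covered by running twice with a fixed `--seed` and asserting identical output; TTY-adaptive formatting is covered by
-unit tests, so the E2E tests pass `--format` explicitly.
-
-## 12. Rejected alternatives
-
-- **Keep the `select` verb**: rejected. It overlaps with `cat`/`head`; what it really provides is projection and filtering, which belong in
-  shared options (Parquet style).
-- **Fold `count` into `stats` or `meta`**: rejected. `count` is common enough, and TsFile can answer it cheaply from statistics;
-  keeping it explicit makes that format advantage easier to discover.
-- **Drop `ls` to mimic Parquet exactly**: rejected. A TsFile is not always a single logical table; multi-device and multi-table
-  namespaces make `ls` the first command users often need.
-- **Implement write or convert commands now**: rejected. Read-only commands carry less risk at this stage, which matches the research conclusion.
-
-## 13. Follow-up work
-
-- A structure-dump command aligned with Java's `TsFileSketchTool`.
-- ISO time formatting; predicates more complex than time ranges and measurement projection.
-- Write, convert, merge, and rewrite commands.
-- DuckDB / ClickHouse / VisiData readers, bringing TsFile into multi-format query and viewing tools
-  (main report §6.3.3, "the missing adapter layer for connector hosts").
-- A read-only FUSE namespace or TableFS view (if the project chooses to expose TsFile through file-system paths).
diff --git a/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md b/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md
deleted file mode 100644
index ae978afc1..000000000
--- a/docs/superpowers/specs/2026-06-03-tsfile-cli-write-design.md
+++ /dev/null
@@ -1,203 +0,0 @@
-
-
-# Design: TsFile CLI write (`tsfile-cli write`)
-
-- **Date**: 2026-06-03
-- **Module**: `cpp/` (extends `cpp/tools/` and `cpp/test/tools/`)
-- **Status**: Design approved; implementation plan not yet written
-- **Relationship**: adds the first write command on top of the read-only CLI (`docs/superpowers/specs/2026-06-02-tsfile-cli-design.md`);
-  that read-side design lists writing as follow-up work, and this document fleshes out its "text import" part.
-- **Research basis**: `/Users/zhanghongyin/reasearchNotes/research/tsfile/调研报告/各文件格式CLI工具调研.md`,
-  chapters 2, 3, and 5 on write-path verbs (Parquet `convert-csv`, ORC `convert`, Avro `fromjson`).
-
-## 1. Goal
-
-Add a `write` command to `tsfile-cli` that **imports CSV/TSV rows into a new table-model
-`.tsfile`**. It mirrors the output of the read-side `cat -f csv|tsv`, so data that was read out can be piped back in:
-
-```sh
-tsfile-cli cat -m s1 -f csv in.tsfile | tsfile-cli write --table t1 \
-    --columns "s1:INT64:field" -o out.tsfile
-```
-
-The design principles match the read side: a single binary, composability, stdout/stderr separation, no new third-party dependencies, and no
-storage-engine changes (only the existing `storage::TsFileTableWriter` write path is called).
-
-## 2. Scope
-
-In scope:
-
-- A `write` command: CSV/TSV → a single table-model `.tsfile`.
-- An explicit schema (`--columns` + `--table`) with **no type inference**.
-- Input from a file or stdin; output to the `.tsfile` named by `-o` (overwritten).
-
-Out of scope (YAGNI; listed as follow-up work):
-
-- Tree-model import.
-- JSON / NDJSON input.
-- Type inference.
-- Encoding / compression selection (v1 always uses `PLAIN` / `UNCOMPRESSED`).
-- Append / merge / `tsfile → tsfile` conversion / rewrite.
-- Newlines inside quoted fields (v1 assumes one record per line).
-
-## 3. Command shape
-
-```
-tsfile-cli write --table <table> --columns <spec> -o <output> \
-    [-f csv|tsv] [--no-header] [--header-match] [-v] [<input> | -]
-```
-
-| Option | Meaning | Required |
-|---|---|---|
-| `<input>` (positional) | Input file path; omitted or `-` means read from **stdin** | No (defaults to stdin) |
-| `-o, --output <output>` | Output `.tsfile` path; overwritten if it already exists (`O_TRUNC`) | Yes |
-| `--table <table>` | Output table name | Yes |
-| `--columns <spec>` | Data-column spec (see §5) describing, in order, every column **except the time column** | Yes |
-| `-f, --format csv\|tsv` | Input delimiter, default `csv`; `json`/`table` are usage errors | No |
-| `--no-header` | The input has no header row (by default the first line is treated as a header and skipped) | No |
-| `--header-match` | Check that the header names in the first line match `--columns` (with `time` first); error on mismatch | No |
-| `-v, --verbose` | Print a one-line summary to stderr on success; silent by default | No |
-
-`write` uses only the options above. The read-side `-d/--device`, `-m/--measurements`, `-n/--limit`,
-`--offset`, `--start/--end`, and `--seed` are meaningless for `write` and **are treated as a usage error if present**
-(exit code `1`), to avoid silent misuse.
-
-## 4. Input format and row conventions
-
-- One record per line, with fields separated by the delimiter (`csv` = `,`, `tsv` = `\t`).
-- **The first column is always the timestamp**: an epoch-millisecond integer (`INT64`). It does not appear in `--columns`.
-- The remaining fields map one-to-one onto `--columns`, in order; every data row must have exactly
-  `1 + len(--columns)` fields, otherwise it is an error (§7).
-- By default the first line is a header and is skipped; its content is **not checked by default** (column identity comes entirely from `--columns`).
-  With `--no-header` the first line is not skipped. With `--header-match` the first line is checked: the first column name is arbitrary (conventionally `time`),
-  and the remaining names must equal the `--columns` names one by one, in order; a mismatch is an error (§7).
-- **An empty cell = null**: nothing is written for that row and column (the `Tablet` gets no `add_value`, so the value stays null).
-- CSV parsing follows RFC 4180 quoting (a field containing the delimiter or a quote is wrapped in `"`, and an embedded `"` is doubled);
-  TSV is split on `\t` with no quote handling. Newlines inside quoted fields are not supported (v1).
-
-## 5. Schema spec (`--columns`)
-
-A comma-separated list of column entries, each `name:TYPE:category`:
-
-- `name`: the column name, which must not contain `:` or `,`.
-- `TYPE`: a TSDataType name, **case-insensitive**; v1 supports
-  `BOOLEAN | INT32 | INT64 | FLOAT | DOUBLE | STRING | TEXT`.
-- `category`: `tag` or `field` (lowercase).
-
-Example: `--columns "id1:STRING:tag,id2:STRING:tag,s1:INT64:field"`.
-
-The spec parses into an ordered list of `ColumnDef{name, type, category}`. An entry with a missing part, an unknown type name, or
-an invalid category is a usage error (exit code `1`; stderr names the offending entry).
-
-## 6. Write path
-
-1. `TableSchema(table, [ColumnSchema(def.name, def.type, common::UNCOMPRESSED,
-   common::PLAIN, def.category) for def in columns])`.
-2. `storage::WriteFile`: `create(output, O_WRONLY|O_CREAT|O_TRUNC[, O_BINARY], 0666)`.
-3. `storage::TsFileTableWriter(&file, schema)`.
-4. Build a batch `Tablet` (columns = the names/types/categories from `--columns`, capacity e.g. `1024` rows): for each row,
-   `add_timestamp(i, ts)`; for each non-empty cell, `add_value(i, name, typedValue)` according to the column type.
-5. When a batch is full, call `write_table(tablet)`, then reuse/reset the tablet; after EOF, write out the remaining partial batch.
-6. `flush()` → `close()`.
-
-Type conversion maps the cell string to the column type. `INT32/INT64` use `strtoll`, `FLOAT/DOUBLE` use
-`strtod`, `BOOLEAN` accepts `true/false` (case-insensitive) and `1/0`, and `STRING/TEXT` are taken verbatim.
-An unparseable cell → runtime error (§7).
-
-## 7. Exit codes and output
-
-| Exit code | Condition |
-|---|---|
-| `0` | Success |
-| `1` | Usage / argument error (missing `--table`/`--columns`/`-o`, `--columns` syntax error, `-f json\|table`, read-side flags mixed in) |
-| `2` | Input cannot be opened / output cannot be created |
-| `3` | Row-level error: field-count mismatch, header mismatch under `--header-match`, cell type-parse failure, or an error returned by the writer (stderr reports the line number) |
-
-`write` writes no data to stdout; progress, diagnostics, and errors all go to stderr. **On success it is silent by default** (no stdout and
-no stderr output, following the Unix "silence is golden" rule); only with `-v/--verbose` does it print a one-line summary to stderr:
-`wrote <N> rows to <output>`.
-
-## 8. Architecture
-
-New or changed files:
-
-```text
-cpp/tools/
-├── cli/
-│   ├── cli_args.h / .cc      # adds output (-o), columns (--columns), verbose (-v), header_match (--header-match)
-│   └── run_cli.cc            # registers write; special-cases and dispatches write before reader.open
-├── commands/
-│   ├── commands.h            # declares cmd_write
-│   └── cmd_write.cc          # reads input / builds schema + tablet / writes output
-└── format/
-    ├── input_format.h / .cc  # parse_columns_spec, split_delimited (csv/tsv), parse_cell
-cpp/test/tools/
-├── input_format_test.cc      # column-spec parsing, line splitting, cell type parsing (incl. null/errors)
-└── command_e2e_test.cc       # adds a write → read-back round-trip E2E test
-```
-
-The key design point: **`write` is the first command that does not open a `TsFileReader`**. Today `run_cli` calls
-`reader.open(p.file)` for every command, but the positional argument of `write` is the **input CSV** (or stdin), not a
-`.tsfile` to open. Therefore, in `run_cli`:
-
-- Add `write` to `is_known_command`.
-- Add `validate_write_flags` (missing `--table`/`--columns`/`-o`, `-f` other than csv/tsv, or read-side
-  flags mixed in → usage error).
-- After `storage::libtsfile_init()` and before constructing the `TsFileReader`, insert
-  `if (p.command == "write") return cmd_write(p, out, err);`, which skips the reader path entirely.
-
-`cmd_write` has a different signature from the read-side commands (no reader, no OutputFormat):
-
-```cpp
-int cmd_write(const ParsedArgs& args, std::ostream& out, std::ostream& err);
-```
-
-`input_format` is a pure layer (no reader dependency): column-spec parsing, quote-aware delimiter-based line splitting, and cell →
-type conversion, which keeps it easy to unit-test. `cmd_write` opens the input stream (a file or
-`std::cin`) and wires together the schema, tablet, and
-writer. It reuses the existing `cli/exit_codes.h`.
-
-## 9. Testing
-
-- **Unit** (`input_format_test.cc`): valid `parse_columns_spec` cases and each kind of error; `split_delimited`
-  CSV quoting/escaping and TSV splitting; valid `parse_cell` cases for every type, empty = null, and parse failures.
-- **E2E** (appended to `command_e2e_test.cc`): write some CSV to a temp file, run `run_cli({"write",
-  "--table","t1","--columns","s1:INT64:field","-o",out,csv})`, and assert exit code 0; then read it back in-process
-  through the read path `run_cli({"schema"/"count"/"cat", out})` and assert that the table name, columns, row count, and row values match the input
-  (round trip). Also covered: missing `--columns` → 1; wrong field count in a row → 3; header mismatch under `--header-match`
-  → 3; output to an unwritable path → 2; silent by default on success, with a summary only under `-v`.
-
-Only CLI and writer behavior is verified; no storage-engine behavior is added.
-
-## 10. Rejected alternatives
-
-- **Type inference**: rejected. CSV type inference (`1` vs `1.0` vs `"01"`) is easy to get wrong; an explicit `--columns`
-  is unambiguous and simplest to implement, in line with "stability first, convenience later". Inference can be added later as a convenience.
-- **A named column other than the first as the time column**: rejected for v1 (by convention the first column is the time, which is simplest and matches the read-side output);
-  `--time-column` can be added later.
-- **A second positional argument for the output**: rejected. The existing parser has only one positional argument; `-o/--output` is more explicit
-  and avoids changing positional-argument semantics.
-- **Supporting the tree model / JSON at the same time**: rejected for this stage (YAGNI); listed as follow-up work.
-
-## 11. Follow-up work
-
-- Tree-model import (device + measurements, aligned and non-aligned).
-- JSON/NDJSON input (symmetric with the read-side `-f json`).
-- Type inference, `--time-column`, and encoding/compression flags.
-- `tsfile → tsfile` convert/rewrite/merge.

From e3c0b5a46af3f627b04561c7c3c091d44c0f3688 Mon Sep 17 00:00:00 2001
From: spricoder
Date: Wed, 3 Jun 2026 17:14:32 +0800
Subject: [PATCH 32/41] Add Apache license header to tsfile-cli skill doc (fix RAT check)

---
 cpp/tools/skills/tsfile-cli/SKILL.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/cpp/tools/skills/tsfile-cli/SKILL.md b/cpp/tools/skills/tsfile-cli/SKILL.md
index 48ff61729..35ee5f6e4 100644
--- a/cpp/tools/skills/tsfile-cli/SKILL.md
+++ b/cpp/tools/skills/tsfile-cli/SKILL.md
@@ -1,3 +1,22 @@
+
+
 ---
 name: tsfile-cli
 description: Use when you need to inspect, preview, export, OR import an Apache TsFile (.tsfile) from the command line — list devices/tables, dump schema, read file/series metadata, count rows, sample/preview rows, or write CSV/TSV into a new .tsfile — via the project's C++ `tsfile-cli` in cpp/tools.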
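The write-command spec removed above (its §4) reads CSV with RFC 4180 quoting: a field containing the delimiter or a quote is wrapped in `"`, and an embedded `"` is doubled. TSV is split on `\t` with no quote handling, and newlines inside quoted fields are out of scope for v1. A minimal sketch of a single-line splitter under those rules follows. The name and shape `split_line(line, delim, csv_quotes)` are taken from the call sites in the plan's `cmd_write`, but the body is an illustrative assumption, not the project's `input_format.cc`:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: split one input line into fields.
// csv_quotes == true  -> RFC 4180 quoting ("a,b" is one field, "" -> ")
// csv_quotes == false -> plain split on every delimiter (TSV)
// Quoted fields spanning several lines are not handled (v1 rule).
std::vector<std::string> split_line(const std::string& line, char delim,
                                    bool csv_quotes) {
    std::vector<std::string> fields;
    std::string cur;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (csv_quotes && in_quotes) {
            if (c != '"') {
                cur.push_back(c);  // delimiters are literal inside quotes
            } else if (i + 1 < line.size() && line[i + 1] == '"') {
                cur.push_back('"');  // doubled quote -> one literal quote
                ++i;
            } else {
                in_quotes = false;  // closing quote
            }
        } else if (csv_quotes && c == '"' && cur.empty()) {
            in_quotes = true;  // opening quote at the start of a field
        } else if (c == delim) {
            fields.push_back(cur);
            cur.clear();
        } else {
            cur.push_back(c);
        }
    }
    fields.push_back(cur);  // the last field, possibly empty (= null)
    return fields;
}
```

Treating `"` as an opening quote only at the start of a field keeps stray quotes inside unquoted CSV fields literal. That is a lenient reading of RFC 4180, not a strict validator.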
From e47b1516ff1f0c18d25d0247caf22a96a81b94b4 Mon Sep 17 00:00:00 2001
From: spricoder
Date: Wed, 3 Jun 2026 18:05:02 +0800
Subject: [PATCH 33/41] Fix CLI test race: unique per-process temp filenames for parallel ctest

---
 cpp/test/tools/cli_test_util.h     | 31 ++++++++++++++++++++++++++----
 cpp/test/tools/command_e2e_test.cc |  6 ++++--
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/cpp/test/tools/cli_test_util.h b/cpp/test/tools/cli_test_util.h
index 329cda6a8..0907cbbd1 100644
--- a/cpp/test/tools/cli_test_util.h
+++ b/cpp/test/tools/cli_test_util.h
@@ -21,7 +21,14 @@
 #define TSFILE_CLI_TEST_UTIL_H
 
 #include 
+#ifdef _WIN32
+#include <process.h>
+#else
+#include <unistd.h>
+#endif
 
+#include <atomic>
+#include <sstream>
 #include <string>
 
 #include "common/schema.h"
@@ -31,9 +38,25 @@
 
 namespace tsfile_cli_test {
 
-inline std::string write_table_fixture(
-    const std::string& path = "tsfile_cli_fixture.tsfile") {
+// Unique per-process path so tests stay isolated when ctest runs the
+// gtest-discovered cases in parallel processes.
+inline std::string unique_temp_path(const std::string& stem,
+                                    const std::string& ext) {
+    static std::atomic<unsigned> counter(0);
+#ifdef _WIN32
+    long pid = static_cast<long>(_getpid());
+#else
+    long pid = static_cast<long>(getpid());
+#endif
+    std::ostringstream ss;
+    ss << stem << "_" << pid << "_" << counter.fetch_add(1) << ext;
+    return ss.str();
+}
+
+inline std::string write_table_fixture(const std::string& path = "") {
     storage::libtsfile_init();
+    std::string out_path =
+        path.empty() ? unique_temp_path("tsfile_cli_fixture", ".tsfile") : path;
     std::string table_name = "table1";
 
     storage::WriteFile file;
@@ -41,7 +64,7 @@ inline std::string write_table_fixture(
 #ifdef _WIN32
     flags |= O_BINARY;
 #endif
-    file.create(path, flags, 0666);
+    file.create(out_path, flags, 0666);
 
     auto* schema = new storage::TableSchema(
         table_name,
@@ -75,7 +98,7 @@ inline std::string write_table_fixture(
     delete writer;
     delete schema;
 
-    return path;
+    return out_path;
 }
 
 }  // namespace tsfile_cli_test
diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc
index 3ee86a19f..e31ec79fc 100644
--- a/cpp/test/tools/command_e2e_test.cc
+++ b/cpp/test/tools/command_e2e_test.cc
@@ -206,12 +206,14 @@ TEST(CliE2E, SampleIsReproducibleWithSeed) {
 }
 
 TEST(CliE2E, WriteThenReadRoundTrip) {
-    std::string csv_path = "tsfile_cli_write_in.csv";
+    std::string csv_path =
+        tsfile_cli_test::unique_temp_path("tsfile_cli_write_in", ".csv");
     {
         std::ofstream o(csv_path.c_str());
         o << "time,id1,s1\n0,dev,0\n1,dev,10\n2,dev,20\n";
     }
-    std::string out_path = "tsfile_cli_write_out.tsfile";
+    std::string out_path =
+        tsfile_cli_test::unique_temp_path("tsfile_cli_write_out", ".tsfile");
 
     std::ostringstream wout;
     std::ostringstream werr;

From 43f1a73f93ff52eff94f5a08e7408d778587e8de Mon Sep 17 00:00:00 2001
From: spricoder
Date: Wed, 3 Jun 2026 18:08:35 +0800
Subject: [PATCH 34/41] Simplify CLI test fixture: drop unused path param and atomic counter

---
 cpp/test/tools/cli_test_util.h | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/cpp/test/tools/cli_test_util.h b/cpp/test/tools/cli_test_util.h
index 0907cbbd1..0d1dccb56 100644
--- a/cpp/test/tools/cli_test_util.h
+++ b/cpp/test/tools/cli_test_util.h
@@ -27,7 +27,6 @@
 #include <unistd.h>
 #endif
 
-#include <atomic>
 #include <sstream>
 #include <string>
 
@@ -42,21 +41,20 @@ namespace tsfile_cli_test {
 // gtest-discovered cases in parallel processes.
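 inline std::string unique_temp_path(const std::string& stem,
                                     const std::string& ext) {
-    static std::atomic<unsigned> counter(0);
+    static unsigned counter = 0;
 #ifdef _WIN32
     long pid = static_cast<long>(_getpid());
 #else
     long pid = static_cast<long>(getpid());
 #endif
     std::ostringstream ss;
-    ss << stem << "_" << pid << "_" << counter.fetch_add(1) << ext;
+    ss << stem << "_" << pid << "_" << counter++ << ext;
     return ss.str();
 }
 
-inline std::string write_table_fixture(const std::string& path = "") {
+inline std::string write_table_fixture() {
     storage::libtsfile_init();
-    std::string out_path =
-        path.empty() ? unique_temp_path("tsfile_cli_fixture", ".tsfile") : path;
+    std::string out_path = unique_temp_path("tsfile_cli_fixture", ".tsfile");
     std::string table_name = "table1";
 
     storage::WriteFile file;

From 4e1ffac06112f03666baa4d322289035c3d3f7dc Mon Sep 17 00:00:00 2001
From: spricoder
Date: Wed, 3 Jun 2026 18:14:11 +0800
Subject: [PATCH 35/41] Add tsfile-cli tool README

---
 cpp/tools/README.md | 148 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 cpp/tools/README.md

diff --git a/cpp/tools/README.md b/cpp/tools/README.md
new file mode 100644
index 000000000..7277ff4f6
--- /dev/null
+++ b/cpp/tools/README.md
@@ -0,0 +1,148 @@
+
+
+# tsfile-cli: TsFile Command-Line Tool
+
+`tsfile-cli` is a single, pipe-friendly C++ command-line tool for inspecting **and**
+importing Apache TsFile (`.tsfile`) files from the shell; it is the TsFile analogue of
+`parquet-cli` / `pqrs`. Read commands print data to **stdout** and diagnostics to
+**stderr**, so they compose with `awk`, `jq`, `sort`, and friends; the `write` command
+imports CSV/TSV into a new `.tsfile`. It is built on the public `storage::TsFileReader`
+and `storage::TsFileTableWriter` APIs and does not modify the storage engine.
+
+## Building
+
+The tool builds with the C++ module (CMake target `tsfile_cli`, output binary
+`tsfile-cli`):
+
+```bash
+cd cpp && bash build.sh -t=Debug   # -> cpp/build/Debug/bin/tsfile-cli
+cd cpp && bash build.sh            # Release -> cpp/build/Release/bin/tsfile-cli
+```
+
+If your CMake is 4.x and configuration fails on the bundled ANTLR4 runtime
+(`Policy CMP00xx may not be set to OLD behavior`), add `--disable-antlr4`; the reader and
+CLI do not use ANTLR4:
+
+```bash
+cd cpp && bash build.sh -t=Debug --disable-antlr4
+```
+
+## Usage
+
+```
+tsfile-cli <command> [options] <file>
+tsfile-cli --help | --version | help <command>
+```
+
+Exit codes: `0` success, `1` usage/argument error, `2` file cannot be opened or is corrupt,
+`3` query/runtime error.
+
+### Reading
+
+| Command | Description |
+|---|---|
+| `ls` | List devices (tree model) or tables (table model), one name per line |
+| `schema` | Per-series `target, measurement, datatype, encoding, compression` |
+| `meta` | File summary: model, version, device/table/series counts, time range, file size |
+| `stats` | Per-series `count, start_time, end_time, min, max, first, last, sum` |
+| `count` | Per-series row counts plus a `total` row (from statistics, no page scan) |
+| `head` | First N rows (default 10; use `-n`) |
+| `cat` | All matching rows, streamed |
+| `sample` | Reproducible reservoir sample (default 10; `-n`, `--seed`) |
+
+The metadata commands (`ls` / `schema` / `meta` / `stats` / `count`) answer most questions
+without decoding data pages.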
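The `sample` row above promises a reproducible reservoir sample controlled by `--seed`. A minimal sketch of seeded reservoir sampling (Algorithm R) follows. The function name, the use of `std::vector<std::string>` rows, and the choice of `std::mt19937_64` are assumptions made for illustration; the tool's actual row type and RNG are not described here:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch of seeded reservoir sampling (Algorithm R): keep a
// uniform random sample of up to k rows in one pass, without knowing the
// total row count in advance.
std::vector<std::string> reservoir_sample(const std::vector<std::string>& rows,
                                          std::size_t k, std::uint64_t seed) {
    std::mt19937_64 rng(seed);  // same seed -> same sequence of draws
    std::vector<std::string> reservoir;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (reservoir.size() < k) {
            reservoir.push_back(rows[i]);  // fill the first k slots
            continue;
        }
        // Row i (0-based) enters with probability k / (i + 1), replacing a
        // uniformly chosen slot.
        std::uniform_int_distribution<std::size_t> pick(0, i);
        std::size_t j = pick(rng);
        if (j < k) {
            reservoir[j] = rows[i];
        }
    }
    return reservoir;
}
```

The loop visits each row exactly once, so the same logic also works when rows come from a result-set iterator. `std::uniform_int_distribution` is not required to produce identical values across standard-library implementations. A tool that needs the same sample on every platform would have to derive indices from the raw engine output itself.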
+ +## Building + +The tool builds with the C++ module (CMake target `tsfile_cli`, output binary +`tsfile-cli`): + +```bash +cd cpp && bash build.sh -t=Debug # -> cpp/build/Debug/bin/tsfile-cli +cd cpp && bash build.sh # Release -> cpp/build/Release/bin/tsfile-cli +``` + +If your CMake is 4.x and configuration fails on the bundled ANTLR4 runtime +(`Policy CMP00xx may not be set to OLD behavior`), add `--disable-antlr4`; the reader and +CLI do not use ANTLR4: + +```bash +cd cpp && bash build.sh -t=Debug --disable-antlr4 +``` + +## Usage + +``` +tsfile-cli [options] +tsfile-cli --help | --version | help +``` + +Exit codes: `0` success, `1` usage/argument error, `2` file open/corrupt, +`3` query/runtime error. + +### Reading + +| Command | Description | +|---|---| +| `ls` | List devices (tree model) or tables (table model), one name per line | +| `schema` | Per-series `target, measurement, datatype, encoding, compression` | +| `meta` | File summary: model, version, device/table/series counts, time range, file size | +| `stats` | Per-series `count, start_time, end_time, min, max, first, last, sum` | +| `count` | Per-series row counts plus a `total` row (from statistics, no page scan) | +| `head` | First N rows (default 10; use `-n`) | +| `cat` | All matching rows, streamed | +| `sample` | Reproducible reservoir sample (default 10; `-n`, `--seed`) | + +The metadata commands (`ls` / `schema` / `meta` / `stats` / `count`) answer most questions +without decoding data pages. 
+ +Shared options: + +| Option | Meaning | +|---|---| +| `-f, --format csv\|tsv\|json\|table` | Output format; defaults to `table` on a TTY, `tsv` when piped | +| `-d, --device ` / `-t, --table ` | Scope to one device / table (mutually exclusive) | +| `-m, --measurements a,b,c` | Column projection (`schema`, `head`, `cat`, `sample`) | +| `-n, --limit N` / `--offset N` | Max rows / rows to skip (`head`, `cat`; `--offset` not valid for `sample`) | +| `--start ` / `--end ` | Inclusive epoch-millisecond time range (`head`, `cat`, `sample`) | +| `--seed N` | Reproducible sampling seed (`sample` only) | +| `--no-header` | Omit the header row | +| `--model tree\|table` | Force the model (otherwise auto-detected) | + +`json` output is NDJSON (one object per line; numbers/booleans bare, other values quoted, +nulls as `null`). CSV output follows RFC 4180. Timestamps are raw epoch milliseconds. + +```bash +BIN=cpp/build/Debug/bin/tsfile-cli +$BIN ls -f tsv data.tsfile # list tables / devices +$BIN meta data.tsfile # quick file overview +$BIN count -t table1 -f tsv data.tsfile # row counts, no page scan +$BIN cat -m temp,humidity --start 1700000000000 -f csv data.tsfile | head +$BIN sample -m temp -n 20 --seed 42 -f json data.tsfile | jq . +``` + +### Writing (import) + +`tsfile-cli write` imports CSV/TSV rows into a **new table-model** `.tsfile` (the output is +overwritten). The first input column is the timestamp (epoch milliseconds); the remaining +columns are declared explicitly with `--columns` — there is no type inference. + +``` +tsfile-cli write --table --columns -o \ + [-f csv|tsv] [--no-header] [--header-match] [-v] [ | -] +``` + +`--columns` is a comma-separated list of `name:TYPE:category`, where `category` is `tag` or +`field` and `TYPE` (case-insensitive) is one of `BOOLEAN, INT32, INT64, FLOAT, DOUBLE, +STRING, TEXT` — for example `--columns "id1:STRING:tag,s1:INT64:field"`. 
+ +| Option | Meaning | +|---|---| +| `--table ` | Output table name (lower-cased) | +| `--columns ` | Ordered data columns (excludes the leading timestamp column) | +| `-o, --output ` | Output `.tsfile` (required; overwritten) | +| `` / `-` | Input file, or `-` / omitted for stdin | +| `-f csv\|tsv` | Input delimiter (default csv; `json` / `table` are rejected) | +| `--no-header` | Input has no header row (default: first line is a header and is skipped) | +| `--header-match` | Validate header names against `--columns` | +| `-v, --verbose` | Print `wrote N rows to ` to stderr (otherwise silent on success) | + +An empty cell is written as null. The command is silent on success (Unix-style); pass `-v` +for a one-line summary. + +```bash +# round-trip through a pipe +printf 'time,id1,s1\n0,dev,0\n1,dev,10\n' \ + | tsfile-cli write --table t1 --columns "id1:STRING:tag,s1:INT64:field" -o out.tsfile - +tsfile-cli count -f tsv out.tsfile # -> t1.dev s1 2 +``` + +For tree-model writes, JSON input, or programmatic use, use the C++ SDK directly — see +`cpp/examples/cpp_examples/demo_write.cpp` (`TsFileTableWriter` / `TsFileWriter` + `Tablet`). 
+ +## Source layout + +```text +cpp/tools/ +├── tools_main.cc # main(): forwards argv to run_cli +├── cli/ # argument parsing, top-level dispatch, exit codes +├── format/ # csv/tsv/json/table output + CSV/TSV input parsing +├── commands/ # one file per command + shared row-query / statistics helpers +└── skills/tsfile-cli/ # model-facing skill reference (for AI assistants) +``` From d41cedede4d110860ab55fc33c814bdabf9a8ce5 Mon Sep 17 00:00:00 2001 From: spricoder Date: Thu, 4 Jun 2026 10:41:32 +0800 Subject: [PATCH 36/41] Expand tsfile-cli README: build-from-source and skill install --- cpp/tools/README.md | 79 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 70 insertions(+), 9 deletions(-) diff --git a/cpp/tools/README.md b/cpp/tools/README.md index 7277ff4f6..1862c8ff0 100644 --- a/cpp/tools/README.md +++ b/cpp/tools/README.md @@ -28,24 +28,59 @@ importing Apache TsFile (`.tsfile`) files from the shell — the TsFile analogue imports CSV/TSV into a new `.tsfile`. It is built on the public `storage::TsFileReader` and `storage::TsFileTableWriter` APIs and does not modify the storage engine. -## Building +## Building from source -The tool builds with the C++ module (CMake target `tsfile_cli`, output binary -`tsfile-cli`): +The CLI is part of the C++ module and is built by default (CMake option `BUILD_TOOLS=ON`). +The CMake target is `tsfile_cli`; the produced executable is named `tsfile-cli`. + +**Prerequisites:** a C++11 compiler (GCC / Clang / MSVC) and CMake ≥ 3.11. The third-party +dependencies (ANTLR4, Snappy, LZ4, LZOKAY, Zlib, GoogleTest) are bundled under +`cpp/third_party/` and built automatically — no separate install step needed. + +Choose any one of the following. + +**1. 
Build script (recommended).** From `cpp/`: ```bash -cd cpp && bash build.sh -t=Debug # -> cpp/build/Debug/bin/tsfile-cli -cd cpp && bash build.sh # Release -> cpp/build/Release/bin/tsfile-cli +bash build.sh -t=Debug # -> cpp/build/Debug/bin/tsfile-cli +bash build.sh # Release (default) -> cpp/build/Release/bin/tsfile-cli ``` -If your CMake is 4.x and configuration fails on the bundled ANTLR4 runtime -(`Policy CMP00xx may not be set to OLD behavior`), add `--disable-antlr4`; the reader and -CLI do not use ANTLR4: +**2. Maven (builds the whole C++ module).** From the repository root: ```bash -cd cpp && bash build.sh -t=Debug --disable-antlr4 +./mvnw clean package -P with-cpp # -> cpp/target/build/bin/tsfile-cli ``` +**3. Plain CMake.** From `cpp/`: + +```bash +mkdir -p build/Debug && cd build/Debug +cmake ../.. -DCMAKE_BUILD_TYPE=Debug +make -j tsfile_cli # -> build/Debug/bin/tsfile-cli +``` + +> **CMake 4.x note.** The bundled ANTLR4 runtime sets old CMake policies that CMake 4 +> rejects (`Policy CMP00xx may not be set to OLD behavior`). The reader and CLI do not use +> ANTLR4, so disable it — `--disable-antlr4` for the build script, or `-DENABLE_ANTLR4=OFF` +> for plain CMake: +> +> ```bash +> bash build.sh -t=Debug --disable-antlr4 +> ``` + +Verify the binary: + +```bash +./build/Debug/bin/tsfile-cli --version # -> tsfile-cli (Apache TsFile C++) +./build/Debug/bin/tsfile-cli --help +``` + +The executable links the `tsfile` shared library built alongside it. To run it from +anywhere, either run it in place by its full path, or use CMake's install step +(`cmake --install .` / `make install`), which installs the binary to `<prefix>/bin` and +`libtsfile` to `<prefix>/lib`. + ## Usage ``` @@ -136,6 +171,32 @@ tsfile-cli count -f tsv out.tsfile # -> t1.dev s1 2 For tree-model writes, JSON input, or programmatic use, use the C++ SDK directly — see `cpp/examples/cpp_examples/demo_write.cpp` (`TsFileTableWriter` / `TsFileWriter` + `Tablet`). 
+## Using the skill with an AI assistant + +`cpp/tools/skills/tsfile-cli/SKILL.md` is a machine-readable reference that teaches AI +coding assistants (e.g. Claude Code) how to drive `tsfile-cli` correctly. Such assistants +auto-discover skills from a `.claude/skills/` directory at session start, so "installing" +the skill just means placing it there — either project-level or user-level: + +```bash +# project-level (this repository only) +mkdir -p .claude/skills/tsfile-cli +cp cpp/tools/skills/tsfile-cli/SKILL.md .claude/skills/tsfile-cli/SKILL.md + +# or user-level (available in all your projects) +mkdir -p ~/.claude/skills/tsfile-cli +cp cpp/tools/skills/tsfile-cli/SKILL.md ~/.claude/skills/tsfile-cli/SKILL.md +``` + +> The installed `SKILL.md` must begin with its YAML front-matter (`--- … ---`) for the +> assistant to detect it. The in-repo copy carries an Apache license header comment above +> the front-matter; if discovery fails, delete that leading `<!-- … -->` block from the +> installed copy so `---` is the first line. + +Start a new assistant session afterward. The skill then activates automatically when you +ask to inspect or import a `.tsfile`; you can also invoke it explicitly (e.g. "use the +tsfile-cli skill"). 
+ ## Source layout ```text From 7bb6b623c79c819fbe7dcec2fc783b033be583de Mon Sep 17 00:00:00 2001 From: spricoder Date: Fri, 5 Jun 2026 19:59:38 +0800 Subject: [PATCH 37/41] Address PR #829 review feedback for tsfile-cli - row_query/sample: translate storage error codes to readable phrases instead of emitting a bare numeric code - read_file: drop the always -1 fd_ value from open() diagnostic; keep strerror(errno) - run_cli: honor --help even with a positional file; correct the stats usage text (min/max/first/last/sum) - meta: remove the always-empty version/bloom_filter columns (the public reader API exposes neither); update README and SKILL accordingly - write: stream rows into fixed 1024-row Tablet batches so memory stays bounded regardless of input size - write: reject numeric overflow (ERANGE for int/float/double, plus an INT32 range check) - tools CMake: remove the noisy unconditional configure message - test CMake: drop the no-op SYSTEM target property and force the vendored GTest headers ahead on TsFile_Test where header resolution matters --- cpp/src/file/read_file.cc | 6 +- cpp/test/CMakeLists.txt | 17 +++- cpp/test/tools/command_e2e_test.cc | 5 +- cpp/tools/CMakeLists.txt | 2 - cpp/tools/README.md | 2 +- cpp/tools/cli/run_cli.cc | 5 +- cpp/tools/commands/cmd_meta.cc | 18 ++-- cpp/tools/commands/cmd_sample.cc | 8 +- cpp/tools/commands/cmd_write.cc | 130 +++++++++++++++++--------- cpp/tools/commands/row_query.cc | 8 +- cpp/tools/format/result_set_format.cc | 39 ++++++++ cpp/tools/format/result_set_format.h | 4 + cpp/tools/skills/tsfile-cli/SKILL.md | 2 +- 13 files changed, 172 insertions(+), 74 deletions(-) diff --git a/cpp/src/file/read_file.cc b/cpp/src/file/read_file.cc index 9dc41dbf6..8aab78ca6 100644 --- a/cpp/src/file/read_file.cc +++ b/cpp/src/file/read_file.cc @@ -51,10 +51,8 @@ int ReadFile::open(const std::string& file_path) { file_path_ = file_path; fd_ = ::open(file_path_.c_str(), O_RDONLY); if (fd_ < 0) { - std::cerr << "open file " << 
file_path << " error :" << fd_ - << std::endl; - std::cerr << "open error" << errno << " " << strerror(errno) - << std::endl; + std::cerr << "open file " << file_path << " error: " << strerror(errno) + << " (errno " << errno << ")" << std::endl; return E_FILE_OPEN_ERR; } diff --git a/cpp/test/CMakeLists.txt b/cpp/test/CMakeLists.txt index 29c2dd5b6..a4abddaf5 100644 --- a/cpp/test/CMakeLists.txt +++ b/cpp/test/CMakeLists.txt @@ -64,10 +64,12 @@ if (${DOWNLOADED}) set(gtest_force_shared_crt ON CACHE BOOL "" FORCE) FetchContent_MakeAvailable(googletest) # AppleClang searches /usr/local/include before CMake's generated -isystem - # paths. Force the vendored GTest headers ahead of any system installation. + # paths, so a system-installed GTest can shadow the vendored headers. Force + # the vendored include dirs ahead for GTest's own sources here; the same is + # done for the consuming TsFile_Test target below (where header resolution + # actually matters for the test code). foreach (GTEST_TARGET gtest gtest_main gmock gmock_main) if (TARGET ${GTEST_TARGET}) - set_target_properties(${GTEST_TARGET} PROPERTIES SYSTEM OFF) target_include_directories(${GTEST_TARGET} BEFORE PRIVATE ${googletest_SOURCE_DIR}/googletest/include ${googletest_SOURCE_DIR}/googletest @@ -83,6 +85,11 @@ if (${DOWNLOADED}) endif () endif () endforeach () + # Remember the vendored GTest header roots so they can be forced ahead of any + # system installation when compiling TsFile_Test itself. + set(VENDORED_GTEST_INCLUDE_DIRS + ${googletest_SOURCE_DIR}/googletest/include + ${googletest_SOURCE_DIR}/googlemock/include) set(TESTS_ENABLED ON PARENT_SCOPE) else () message(WARNING "Failed to download googletest from all provided URLs, setting TESTS_ENABLED to OFF") @@ -175,6 +182,12 @@ if (ENABLE_ANTLR4) endif() add_executable(TsFile_Test ${TEST_SRCS}) +# Force the vendored GTest headers ahead of any system installation so the test +# code reliably compiles against the FetchContent-provided 1.12.1 headers. 
+if (VENDORED_GTEST_INCLUDE_DIRS) + target_include_directories(TsFile_Test BEFORE PRIVATE + ${VENDORED_GTEST_INCLUDE_DIRS}) +endif () if (BUILD_TOOLS) target_include_directories(TsFile_Test PRIVATE ${CMAKE_SOURCE_DIR}/tools) endif () diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index e31ec79fc..0bff9e75f 100644 --- a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -163,9 +163,8 @@ TEST(CliE2E, MetaReportsFileSummary) { int code = tsfile_cli::run_cli({"meta", "-f", "tsv", f.path}, out, err); EXPECT_EQ(code, 0); EXPECT_TRUE(err.str().empty()); - EXPECT_NE(out.str().find("file\tmodel\tversion\tdevice_count\ttable_" - "count\tseries_count\tstart_time\tend_time\tbloom_" - "filter\tfile_size_bytes"), + EXPECT_NE(out.str().find("file\tmodel\tdevice_count\ttable_count\tseries_" + "count\tstart_time\tend_time\tfile_size_bytes"), std::string::npos); EXPECT_NE(out.str().find("\ttable\t"), std::string::npos); } diff --git a/cpp/tools/CMakeLists.txt b/cpp/tools/CMakeLists.txt index da4b072b5..d0e5d9633 100644 --- a/cpp/tools/CMakeLists.txt +++ b/cpp/tools/CMakeLists.txt @@ -17,8 +17,6 @@ specific language governing permissions and limitations under the License. 
]] -message("Running in tools directory") - file(GLOB_RECURSE TSFILE_CLI_SRCS "cli/*.cc" "format/*.cc" diff --git a/cpp/tools/README.md b/cpp/tools/README.md index 1862c8ff0..40d2bd51c 100644 --- a/cpp/tools/README.md +++ b/cpp/tools/README.md @@ -97,7 +97,7 @@ Exit codes: `0` success, `1` usage/argument error, `2` file open/corrupt, |---|---| | `ls` | List devices (tree model) or tables (table model), one name per line | | `schema` | Per-series `target, measurement, datatype, encoding, compression` | -| `meta` | File summary: model, version, device/table/series counts, time range, file size | +| `meta` | File summary: model, device/table/series counts, time range, file size | | `stats` | Per-series `count, start_time, end_time, min, max, first, last, sum` | | `count` | Per-series row counts plus a `total` row (from statistics, no page scan) | | `head` | First N rows (default 10; use `-n`) | diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index 837c43a64..750028551 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -51,7 +51,8 @@ void print_usage(std::ostream& os) { " ls list devices (tree) or tables (table)\n" " schema per-measurement data type/encoding/compression\n" " meta file metadata summary\n" - " stats per-series row count and time range\n" + " stats per-series count, time range, " + "min/max/first/last/sum\n" " head first N rows (use -n)\n" " cat all rows of a device/table\n" " count row count\n" @@ -143,7 +144,7 @@ int run_cli(const std::vector<std::string>& args, std::ostream& out, return kExitUsage; } if (p.command == "help" || p.command == "--help" || p.command == "-h" || - (p.help && p.file.empty())) { + p.help) { print_usage(out); return kExitOk; } diff --git a/cpp/tools/commands/cmd_meta.cc b/cpp/tools/commands/cmd_meta.cc index 9b6d8f07a..a942a0508 100644 --- a/cpp/tools/commands/cmd_meta.cc +++ b/cpp/tools/commands/cmd_meta.cc @@ -29,22 +29,20 @@ namespace tsfile_cli { int cmd_meta(const ParsedArgs& args, 
storage::TsFileReader& reader, OutputFormat fmt, std::ostream& out, std::ostream& /*err*/) { RowWriter w(out, fmt, - {"file", "model", "version", "device_count", "table_count", - "series_count", "start_time", "end_time", "bloom_filter", - "file_size_bytes"}, - {common::STRING, common::STRING, common::STRING, common::INT64, - common::INT64, common::INT64, common::INT64, common::INT64, - common::STRING, common::INT64}, + {"file", "model", "device_count", "table_count", + "series_count", "start_time", "end_time", "file_size_bytes"}, + {common::STRING, common::STRING, common::INT64, common::INT64, + common::INT64, common::INT64, common::INT64, common::INT64}, args.no_header); FileSummary s = collect_file_summary(args, reader); - w.write({s.file, s.model, "", std::to_string(s.device_count), + w.write({s.file, s.model, std::to_string(s.device_count), std::to_string(s.table_count), std::to_string(s.series_count), s.has_time_range ? std::to_string(s.start_time) : "", - s.has_time_range ? std::to_string(s.end_time) : "", "", + s.has_time_range ? 
std::to_string(s.end_time) : "", std::to_string(s.file_size_bytes)}, - {false, false, true, false, false, false, !s.has_time_range, - !s.has_time_range, true, false}); + {false, false, false, false, false, !s.has_time_range, + !s.has_time_range, false}); w.finish(); return kExitOk; } diff --git a/cpp/tools/commands/cmd_sample.cc b/cpp/tools/commands/cmd_sample.cc index 75edc2f34..767fd2805 100644 --- a/cpp/tools/commands/cmd_sample.cc +++ b/cpp/tools/commands/cmd_sample.cc @@ -93,7 +93,7 @@ int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, } if (qret != 0 || rs == nullptr) { - err << "Error: query failed (code " << qret << ")\n"; + err << "Error: query failed: " << query_error_text(qret) << "\n"; if (rs != nullptr) { reader.destroy_query_data_set(rs); } @@ -106,7 +106,11 @@ int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, int wret = write_result_set_sampled(rs, fmt, args.no_header, out, limit, seed); reader.destroy_query_data_set(rs); - return wret == 0 ? 
kExitOk : kExitRuntime; + if (wret != 0) { + err << "Error: failed to read rows: " << query_error_text(wret) << "\n"; + return kExitRuntime; + } + return kExitOk; } } // namespace tsfile_cli diff --git a/cpp/tools/commands/cmd_write.cc b/cpp/tools/commands/cmd_write.cc index e08bd405d..35c07d3d0 100644 --- a/cpp/tools/commands/cmd_write.cc +++ b/cpp/tools/commands/cmd_write.cc @@ -20,6 +20,7 @@ #include #include +#include #include #include #include @@ -69,38 +70,58 @@ bool add_typed_value(storage::Tablet& tablet, uint32_t row, return true; } case common::INT32: { - long v = std::strtol(cell.c_str(), &e, 10); + errno = 0; + long long v = std::strtoll(cell.c_str(), &e, 10); if (e == nullptr || *e != '\0') { error = "bad INT32 '" + cell + "'"; return false; } + if (errno == ERANGE || v < INT32_MIN || v > INT32_MAX) { + error = "INT32 out of range '" + cell + "'"; + return false; + } tablet.add_value(row, def.name, static_cast(v)); return true; } case common::INT64: { + errno = 0; long long v = std::strtoll(cell.c_str(), &e, 10); if (e == nullptr || *e != '\0') { error = "bad INT64 '" + cell + "'"; return false; } + if (errno == ERANGE) { + error = "INT64 out of range '" + cell + "'"; + return false; + } tablet.add_value(row, def.name, static_cast(v)); return true; } case common::FLOAT: { + errno = 0; float v = std::strtof(cell.c_str(), &e); if (e == nullptr || *e != '\0') { error = "bad FLOAT '" + cell + "'"; return false; } + if (errno == ERANGE) { + error = "FLOAT out of range '" + cell + "'"; + return false; + } tablet.add_value(row, def.name, v); return true; } case common::DOUBLE: { + errno = 0; double v = std::strtod(cell.c_str(), &e); if (e == nullptr || *e != '\0') { error = "bad DOUBLE '" + cell + "'"; return false; } + if (errno == ERANGE) { + error = "DOUBLE out of range '" + cell + "'"; + return false; + } tablet.add_value(row, def.name, v); return true; } @@ -163,33 +184,6 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, } } - std::vector 
rows; - while (std::getline(*in, line)) { - ++line_no; - strip_cr(line); - if (line.empty()) { - continue; - } - std::vector fields = split_line(line, delim, csv_quotes); - if (fields.size() != columns.size() + 1) { - err << "Error: expected " << (columns.size() + 1) << " fields, got " - << fields.size() << " (line " << line_no << ")\n"; - return kExitRuntime; - } - char* e = nullptr; - long long ts = std::strtoll(fields[0].c_str(), &e, 10); - if (e == nullptr || *e != '\0') { - err << "Error: bad timestamp '" << fields[0] << "' (line " - << line_no << ")\n"; - return kExitRuntime; - } - DataRow r; - r.line_no = line_no; - r.timestamp = static_cast(ts); - r.cells.assign(fields.begin() + 1, fields.end()); - rows.push_back(r); - } - std::vector names; std::vector types; std::vector cats; @@ -214,31 +208,77 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, auto* schema = new storage::TableSchema(args.table, col_schemas); auto* writer = new storage::TsFileTableWriter(&file, schema); - int rc = kExitOk; + // Stream rows into fixed-size batches so memory stays bounded regardless of + // input size; a full file is never buffered in memory. 
const size_t kBatch = 1024; - for (size_t start = 0; start < rows.size() && rc == kExitOk; - start += kBatch) { - size_t end = std::min(start + kBatch, rows.size()); + int rc = kExitOk; + long long total_rows = 0; + std::vector batch; + batch.reserve(kBatch); + + auto flush_batch = [&]() -> bool { + if (batch.empty()) { + return true; + } storage::Tablet tablet(args.table, names, types, cats, - static_cast(end - start)); - for (size_t i = start; i < end && rc == kExitOk; ++i) { - uint32_t r = static_cast(i - start); - tablet.add_timestamp(r, rows[i].timestamp); + static_cast(batch.size())); + for (size_t i = 0; i < batch.size(); ++i) { + uint32_t r = static_cast(i); + tablet.add_timestamp(r, batch[i].timestamp); for (size_t j = 0; j < columns.size(); ++j) { - std::string cerr; - if (!add_typed_value(tablet, r, columns[j], rows[i].cells[j], - cerr)) { - err << "Error: " << cerr << " (line " << rows[i].line_no - << ")\n"; - rc = kExitRuntime; - break; + std::string cell_err; + if (!add_typed_value(tablet, r, columns[j], batch[i].cells[j], + cell_err)) { + err << "Error: " << cell_err << " (line " + << batch[i].line_no << ")\n"; + return false; } } } - if (rc == kExitOk && writer->write_table(tablet) != 0) { + if (writer->write_table(tablet) != 0) { err << "Error: write_table failed\n"; + return false; + } + total_rows += static_cast(batch.size()); + batch.clear(); + return true; + }; + + while (std::getline(*in, line)) { + ++line_no; + strip_cr(line); + if (line.empty()) { + continue; + } + std::vector fields = split_line(line, delim, csv_quotes); + if (fields.size() != columns.size() + 1) { + err << "Error: expected " << (columns.size() + 1) << " fields, got " + << fields.size() << " (line " << line_no << ")\n"; + rc = kExitRuntime; + break; + } + char* e = nullptr; + errno = 0; + long long ts = std::strtoll(fields[0].c_str(), &e, 10); + if (e == nullptr || *e != '\0' || errno == ERANGE) { + err << "Error: bad timestamp '" << fields[0] << "' (line " + << line_no << 
")\n"; rc = kExitRuntime; + break; } + DataRow r; + r.line_no = line_no; + r.timestamp = static_cast(ts); + r.cells.assign(fields.begin() + 1, fields.end()); + batch.push_back(r); + if (batch.size() >= kBatch && !flush_batch()) { + rc = kExitRuntime; + break; + } + } + + if (rc == kExitOk && !flush_batch()) { + rc = kExitRuntime; } if (rc == kExitOk) { @@ -253,7 +293,7 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, delete schema; if (rc == kExitOk && args.verbose) { - err << "wrote " << rows.size() << " rows to " << args.output << "\n"; + err << "wrote " << total_rows << " rows to " << args.output << "\n"; } return rc; } diff --git a/cpp/tools/commands/row_query.cc b/cpp/tools/commands/row_query.cc index 0309108b8..308ebca81 100644 --- a/cpp/tools/commands/row_query.cc +++ b/cpp/tools/commands/row_query.cc @@ -96,7 +96,7 @@ int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, } if (qret != 0 || rs == nullptr) { - err << "Error: query failed (code " << qret << ")\n"; + err << "Error: query failed: " << query_error_text(qret) << "\n"; if (rs != nullptr) { reader.destroy_query_data_set(rs); } @@ -105,7 +105,11 @@ int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, int wret = write_result_set(rs, fmt, args.no_header, out, offset, limit); reader.destroy_query_data_set(rs); - return wret == 0 ? 
kExitOk : kExitRuntime; + if (wret != 0) { + err << "Error: failed to read rows: " << query_error_text(wret) << "\n"; + return kExitRuntime; + } + return kExitOk; } } // namespace tsfile_cli diff --git a/cpp/tools/format/result_set_format.cc b/cpp/tools/format/result_set_format.cc index 3f5aec471..81434c78a 100644 --- a/cpp/tools/format/result_set_format.cc +++ b/cpp/tools/format/result_set_format.cc @@ -29,6 +29,45 @@ namespace tsfile_cli { +const char* query_error_text(int code) { + switch (code) { + case common::E_OOM: + return "out of memory"; + case common::E_NOT_EXIST: + return "not found"; + case common::E_INVALID_ARG: + return "invalid argument"; + case common::E_OUT_OF_RANGE: + return "value out of range"; + case common::E_FILE_OPEN_ERR: + return "cannot open file"; + case common::E_FILE_READ_ERR: + return "file read error"; + case common::E_TSFILE_CORRUPTED: + return "file is corrupted"; + case common::E_INVALID_PATH: + return "invalid path"; + case common::E_DEVICE_NOT_EXIST: + return "device does not exist"; + case common::E_MEASUREMENT_NOT_EXIST: + return "measurement does not exist"; + case common::E_TABLE_NOT_EXIST: + return "table does not exist"; + case common::E_COLUMN_NOT_EXIST: + return "column does not exist"; + case common::E_INVALID_QUERY: + return "invalid query"; + case common::E_TYPE_NOT_SUPPORTED: + return "data type not supported"; + case common::E_TYPE_NOT_MATCH: + return "data type mismatch"; + case common::E_DECODE_ERR: + return "failed to decode data"; + default: + return "internal error"; + } +} + std::string cell_to_string(storage::ResultSet* rs, uint32_t i, common::TSDataType type) { std::ostringstream ss; diff --git a/cpp/tools/format/result_set_format.h b/cpp/tools/format/result_set_format.h index 076964850..1323b004f 100644 --- a/cpp/tools/format/result_set_format.h +++ b/cpp/tools/format/result_set_format.h @@ -29,6 +29,10 @@ namespace tsfile_cli { +// Translate a storage-engine error code into a human-readable phrase so CLI 
+// diagnostics carry meaning instead of a bare numeric code. +const char* query_error_text(int code); + std::string cell_to_string(storage::ResultSet* rs, uint32_t col_index, common::TSDataType type); diff --git a/cpp/tools/skills/tsfile-cli/SKILL.md b/cpp/tools/skills/tsfile-cli/SKILL.md index 35ee5f6e4..afd16a901 100644 --- a/cpp/tools/skills/tsfile-cli/SKILL.md +++ b/cpp/tools/skills/tsfile-cli/SKILL.md @@ -43,7 +43,7 @@ Single pipe-friendly C++ binary to inspect **and** import `.tsfile` (TsFile's an |---|---|---| | `ls` | device (tree) / table (table) per line | no | | `schema` | `target,measurement,datatype,encoding,compression` | no | -| `meta` | model, version, device/table/series counts, time range, bloom, size | no | +| `meta` | model, device/table/series counts, time range, size | no | | `stats` | per-series `count,start,end,min,max,first,last,sum` | no | | `count` | per-series counts + `total` row | no | | `head` | first N rows (default 10, `-n`) | yes | From 57392f0a25fc78b7d319f3ed68346f3725608221 Mon Sep 17 00:00:00 2001 From: spricoder Date: Fri, 5 Jun 2026 22:05:57 +0800 Subject: [PATCH 38/41] Harden tsfile-cli writes, errors, and input validation Strict-review follow-up to PR #829: - write: reject non-strictly-increasing timestamps per device (tag tuple) with a located message; refuse --output equal to the input file; remove the partial output on any failure so no corrupt .tsfile is left behind - write/query/sample failures now print a human-readable cause via error_code_message() instead of a bare numeric code; the helper lives in the output_format layer so read and write share it - schema: report real encoding/compression for table-model columns instead of always-empty cells - columns spec: reject duplicate column names - reject flags that do not apply to a command (write-only flags on read commands, row/range flags on metadata commands, --header-match with --no-header), and give a clear error when an option precedes the command - rename the 
read-output helpers to emit_result_set* and the JSON predicate to emits_json_bare so the names match what they do - docs: document the per-device timestamp ordering rule and drop the unimplemented "help " form --- cpp/tools/README.md | 8 +++- cpp/tools/cli/cli_args.cc | 8 ++++ cpp/tools/cli/run_cli.cc | 57 +++++++++++++++++++++++ cpp/tools/commands/cmd_sample.cc | 7 +-- cpp/tools/commands/cmd_schema.cc | 20 +++++---- cpp/tools/commands/cmd_write.cc | 65 ++++++++++++++++++++++++--- cpp/tools/commands/row_query.cc | 7 +-- cpp/tools/format/input_format.cc | 6 +++ cpp/tools/format/output_format.cc | 51 ++++++++++++++++++++- cpp/tools/format/output_format.h | 6 ++- cpp/tools/format/result_set_format.cc | 49 +++----------------- cpp/tools/format/result_set_format.h | 16 +++---- cpp/tools/skills/tsfile-cli/SKILL.md | 11 +++-- 13 files changed, 228 insertions(+), 83 deletions(-) diff --git a/cpp/tools/README.md b/cpp/tools/README.md index 40d2bd51c..c5721f46c 100644 --- a/cpp/tools/README.md +++ b/cpp/tools/README.md @@ -85,7 +85,7 @@ anywhere, either run it in place by its full path, or use CMake's install step ``` tsfile-cli [options] -tsfile-cli --help | --version | help +tsfile-cli --help | --version | help ``` Exit codes: `0` success, `1` usage/argument error, `2` file open/corrupt, @@ -138,6 +138,12 @@ $BIN sample -m temp -n 20 --seed 42 -f json data.tsfile | jq . overwritten). The first input column is the timestamp (epoch milliseconds); the remaining columns are declared explicitly with `--columns` — there is no type inference. +Timestamps must be **strictly increasing per device**, where a device is identified by its +`tag` column values (rows that share the same tags form one device's timeline). Rows for +different tag combinations may freely interleave and reuse timestamps. Out-of-order input is +rejected with the offending line number, and a failed import leaves no output file behind. +`--output` must differ from the input file. 
+ ``` tsfile-cli write --table --columns -o \ [-f csv|tsv] [--no-header] [--header-match] [-v] [ | -] diff --git a/cpp/tools/cli/cli_args.cc b/cpp/tools/cli/cli_args.cc index cfdb9d75d..61c193d88 100644 --- a/cpp/tools/cli/cli_args.cc +++ b/cpp/tools/cli/cli_args.cc @@ -79,6 +79,14 @@ ParsedArgs parse_args(const std::vector& args) { if (p.command == "--help" || p.command == "-h") { p.help = true; } + // The subcommand must come first. A leading option means it was omitted; + // say so explicitly instead of failing later with a confusing message about + // the first real positional argument. + if (p.command.size() > 1 && p.command[0] == '-' && !p.version && !p.help) { + p.error = "the command must come before options (got option '" + + p.command + "'); run with --help for usage"; + return p; + } size_t i = 1; auto need_value = [&](const std::string& flag, std::string& dst) -> bool { diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index 750028551..7c10fac63 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -121,6 +121,10 @@ bool validate_write_flags(const ParsedArgs& p, std::ostream& err) { err << "Error: write input format must be csv or tsv\n"; return false; } + if (p.no_header && p.header_match) { + err << "Error: --header-match cannot be combined with --no-header\n"; + return false; + } if (!p.measurements.empty() || !p.device.empty() || p.has_start || p.has_end || p.has_seed || p.limit != -1 || p.offset != 0) { err << "Error: read-only flags are not valid for write\n"; @@ -129,6 +133,54 @@ bool validate_write_flags(const ParsedArgs& p, std::ostream& err) { return true; } +// Reject flags that have no effect for the given read command, instead of +// silently ignoring them, so misuse is caught rather than producing surprising +// output. Only called for non-write commands; write has its own validation. 
+bool validate_read_flag_applicability(const ParsedArgs& p, std::ostream& err) { + const std::string& c = p.command; + const bool is_row = (c == "head" || c == "cat" || c == "sample"); + const bool scoped = (c == "schema" || c == "stats" || c == "count" || + c == "head" || c == "cat" || c == "sample"); + + if (!p.output.empty()) { + err << "Error: -o/--output is only valid for write\n"; + return false; + } + if (!p.columns.empty()) { + err << "Error: --columns is only valid for write\n"; + return false; + } + if (p.header_match) { + err << "Error: --header-match is only valid for write\n"; + return false; + } + if (p.verbose) { + err << "Error: -v/--verbose is only valid for write\n"; + return false; + } + if (!is_row && p.limit != -1) { + err << "Error: -n/--limit is only valid for head/cat/sample\n"; + return false; + } + if (!is_row && (p.has_start || p.has_end)) { + err << "Error: --start/--end are only valid for head/cat/sample\n"; + return false; + } + if (!scoped && !p.device.empty()) { + err << "Error: -d/--device is not valid for " << c << "\n"; + return false; + } + if (!scoped && !p.table.empty()) { + err << "Error: -t/--table is not valid for " << c << "\n"; + return false; + } + if (!scoped && !p.measurements.empty()) { + err << "Error: -m/--measurements is not valid for " << c << "\n"; + return false; + } + return true; +} + } // namespace int run_cli(const std::vector& args, std::ostream& out, @@ -176,6 +228,11 @@ int run_cli(const std::vector& args, std::ostream& out, return cmd_write(p, out, err); } + if (!validate_read_flag_applicability(p, err)) { + print_usage(err); + return kExitUsage; + } + storage::libtsfile_init(); storage::TsFileReader reader; int open_ret = reader.open(p.file); diff --git a/cpp/tools/commands/cmd_sample.cc b/cpp/tools/commands/cmd_sample.cc index 767fd2805..744dd6577 100644 --- a/cpp/tools/commands/cmd_sample.cc +++ b/cpp/tools/commands/cmd_sample.cc @@ -93,7 +93,7 @@ int cmd_sample(const ParsedArgs& args, 
storage::TsFileReader& reader, } if (qret != 0 || rs == nullptr) { - err << "Error: query failed: " << query_error_text(qret) << "\n"; + err << "Error: query failed: " << error_code_message(qret) << "\n"; if (rs != nullptr) { reader.destroy_query_data_set(rs); } @@ -104,10 +104,11 @@ int cmd_sample(const ParsedArgs& args, storage::TsFileReader& reader, const unsigned long long seed = args.has_seed ? static_cast(args.seed) : 0ULL; int wret = - write_result_set_sampled(rs, fmt, args.no_header, out, limit, seed); + emit_result_set_sampled(rs, fmt, args.no_header, out, limit, seed); reader.destroy_query_data_set(rs); if (wret != 0) { - err << "Error: failed to read rows: " << query_error_text(wret) << "\n"; + err << "Error: failed to read rows: " << error_code_message(wret) + << "\n"; return kExitRuntime; } return kExitOk; diff --git a/cpp/tools/commands/cmd_schema.cc b/cpp/tools/commands/cmd_schema.cc index 734da1933..7ca5c5e05 100644 --- a/cpp/tools/commands/cmd_schema.cc +++ b/cpp/tools/commands/cmd_schema.cc @@ -41,19 +41,21 @@ void write_table_schema_rows(const ParsedArgs& args, if (!args.table.empty() && schema->get_table_name() != args.table) { continue; } - std::vector names = schema->get_measurement_names(); - std::vector types = schema->get_data_types(); - for (size_t i = 0; i < names.size(); ++i) { + for (const auto& ms : schema->get_measurement_schemas()) { + if (!ms) { + continue; + } + const std::string& name = ms->measurement_name_; if (!args.measurements.empty() && std::find(args.measurements.begin(), args.measurements.end(), - names[i]) == args.measurements.end()) { + name) == args.measurements.end()) { continue; } - const common::TSDataType type = - i < types.size() ? 
types[i] : common::INVALID_DATATYPE; - w.write({schema->get_table_name(), names[i], tsdatatype_name(type), - "", ""}, - {false, false, false, true, true}); + w.write({schema->get_table_name(), name, + tsdatatype_name(ms->data_type_), + tsencoding_name(ms->encoding_), + compression_name(ms->compression_type_)}, + {false, false, false, false, false}); } } } diff --git a/cpp/tools/commands/cmd_write.cc b/cpp/tools/commands/cmd_write.cc index 35c07d3d0..a9ce8d81f 100644 --- a/cpp/tools/commands/cmd_write.cc +++ b/cpp/tools/commands/cmd_write.cc @@ -22,10 +22,12 @@ #include #include #include +#include #include #include #include #include +#include #include #include "cli/cli_args.h" @@ -188,12 +190,26 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, std::vector types; std::vector cats; std::vector col_schemas; - for (const ColumnDef& d : columns) { + std::vector tag_idx; + for (size_t j = 0; j < columns.size(); ++j) { + const ColumnDef& d = columns[j]; names.push_back(d.name); types.push_back(d.type); cats.push_back(d.category); col_schemas.push_back(common::ColumnSchema( d.name, d.type, common::UNCOMPRESSED, common::PLAIN, d.category)); + if (d.category == common::ColumnCategory::TAG) { + tag_idx.push_back(j); + } + } + + // Creating the output truncates it; refuse to clobber the input we are + // still reading from, which would otherwise silently destroy the source + // data. + if (!args.file.empty() && args.file != "-" && args.output == args.file) { + err << "Error: --output is the same as the input file: " << args.output + << "\n"; + return kExitUsage; } storage::WriteFile file; @@ -215,6 +231,11 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, long long total_rows = 0; std::vector batch; batch.reserve(kBatch); + // The table writer requires strictly increasing timestamps per device, and + // a device is identified by its tag-column values. 
Track the last timestamp + // seen for each device so out-of-order input is rejected with a clear, + // located message instead of an opaque write failure. + std::unordered_map last_ts_by_device; auto flush_batch = [&]() -> bool { if (batch.empty()) { @@ -235,8 +256,10 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, } } } - if (writer->write_table(tablet) != 0) { - err << "Error: write_table failed\n"; + int wt = writer->write_table(tablet); + if (wt != 0) { + err << "Error: failed to write rows: " << error_code_message(wt) + << "\n"; return false; } total_rows += static_cast(batch.size()); @@ -270,6 +293,23 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, r.line_no = line_no; r.timestamp = static_cast(ts); r.cells.assign(fields.begin() + 1, fields.end()); + + std::string device_key; + for (size_t k : tag_idx) { + device_key += r.cells[k]; + device_key.push_back('\0'); + } + auto seen = last_ts_by_device.find(device_key); + if (seen != last_ts_by_device.end() && r.timestamp <= seen->second) { + err << "Error: timestamps must be strictly increasing per device " + "(line " + << line_no << ": " << r.timestamp << " <= previous " + << seen->second << ")\n"; + rc = kExitRuntime; + break; + } + last_ts_by_device[device_key] = r.timestamp; + batch.push_back(r); if (batch.size() >= kBatch && !flush_batch()) { rc = kExitRuntime; @@ -282,9 +322,18 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, } if (rc == kExitOk) { - if (writer->flush() != 0 || writer->close() != 0) { - err << "Error: flush/close failed\n"; + int fr = writer->flush(); + if (fr != 0) { + err << "Error: failed to flush output: " << error_code_message(fr) + << "\n"; rc = kExitRuntime; + } else { + int cr = writer->close(); + if (cr != 0) { + err << "Error: failed to close output: " + << error_code_message(cr) << "\n"; + rc = kExitRuntime; + } } } else { writer->close(); @@ -292,7 +341,11 @@ int cmd_write(const ParsedArgs& args, std::ostream& /*out*/, delete 
writer; delete schema; - if (rc == kExitOk && args.verbose) { + if (rc != kExitOk) { + // The import failed; do not leave a partial/corrupt .tsfile behind. + file.close(); + std::remove(args.output.c_str()); + } else if (args.verbose) { err << "wrote " << total_rows << " rows to " << args.output << "\n"; } return rc; diff --git a/cpp/tools/commands/row_query.cc b/cpp/tools/commands/row_query.cc index 308ebca81..2ea247ad3 100644 --- a/cpp/tools/commands/row_query.cc +++ b/cpp/tools/commands/row_query.cc @@ -96,17 +96,18 @@ int run_row_query(const ParsedArgs& args, storage::TsFileReader& reader, } if (qret != 0 || rs == nullptr) { - err << "Error: query failed: " << query_error_text(qret) << "\n"; + err << "Error: query failed: " << error_code_message(qret) << "\n"; if (rs != nullptr) { reader.destroy_query_data_set(rs); } return kExitRuntime; } - int wret = write_result_set(rs, fmt, args.no_header, out, offset, limit); + int wret = emit_result_set(rs, fmt, args.no_header, out, offset, limit); reader.destroy_query_data_set(rs); if (wret != 0) { - err << "Error: failed to read rows: " << query_error_text(wret) << "\n"; + err << "Error: failed to read rows: " << error_code_message(wret) + << "\n"; return kExitRuntime; } return kExitOk; diff --git a/cpp/tools/format/input_format.cc b/cpp/tools/format/input_format.cc index 789ee153e..bb04a202a 100644 --- a/cpp/tools/format/input_format.cc +++ b/cpp/tools/format/input_format.cc @@ -131,6 +131,12 @@ bool parse_columns_spec(const std::string& spec, std::vector& out, error = "bad category '" + parts[2] + "' (want tag|field)"; return false; } + for (const ColumnDef& prev : out) { + if (prev.name == def.name) { + error = "duplicate column name '" + def.name + "'"; + return false; + } + } out.push_back(def); } return true; diff --git a/cpp/tools/format/output_format.cc b/cpp/tools/format/output_format.cc index fb753689f..32f0dbed1 100644 --- a/cpp/tools/format/output_format.cc +++ b/cpp/tools/format/output_format.cc @@ -23,8 
+23,55 @@ #include #include +#include "utils/errno_define.h" + namespace tsfile_cli { +const char* error_code_message(int code) { + switch (code) { + case common::E_OOM: + return "out of memory"; + case common::E_NOT_EXIST: + return "not found"; + case common::E_INVALID_ARG: + return "invalid argument"; + case common::E_OUT_OF_RANGE: + return "value out of range"; + case common::E_OUT_OF_ORDER: + return "data is out of order"; + case common::E_FILE_OPEN_ERR: + return "cannot open file"; + case common::E_FILE_WRITE_ERR: + return "file write error"; + case common::E_FILE_READ_ERR: + return "file read error"; + case common::E_TSFILE_CORRUPTED: + return "file is corrupted"; + case common::E_INVALID_PATH: + return "invalid path"; + case common::E_DEVICE_NOT_EXIST: + return "device does not exist"; + case common::E_MEASUREMENT_NOT_EXIST: + return "measurement does not exist"; + case common::E_TABLE_NOT_EXIST: + return "table does not exist"; + case common::E_COLUMN_NOT_EXIST: + return "column does not exist"; + case common::E_INVALID_QUERY: + return "invalid query"; + case common::E_TYPE_NOT_SUPPORTED: + return "data type not supported"; + case common::E_TYPE_NOT_MATCH: + return "data type mismatch"; + case common::E_ENCODE_ERR: + return "failed to encode data"; + case common::E_DECODE_ERR: + return "failed to decode data"; + default: + return "internal error"; + } +} + OutputFormat resolve_format(ParsedArgs::Format f, bool stdout_is_tty) { switch (f) { case ParsedArgs::Format::kCsv: @@ -196,7 +243,7 @@ RowWriter::RowWriter(std::ostream& out, OutputFormat fmt, types_(std::move(types)), no_header_(no_header) {} -bool RowWriter::is_numeric(size_t col) const { +bool RowWriter::emits_json_bare(size_t col) const { if (col >= types_.size()) { return false; } @@ -248,7 +295,7 @@ void RowWriter::write(const std::vector& cells, out_ << "\"" << json_escape(header_[i]) << "\":"; if (i < is_null.size() && is_null[i]) { out_ << "null"; - } else if (is_numeric(i)) { + } else if 
(emits_json_bare(i)) { out_ << (i < cells.size() ? cells[i] : "null"); } else { out_ << "\"" << json_escape(i < cells.size() ? cells[i] : "") diff --git a/cpp/tools/format/output_format.h b/cpp/tools/format/output_format.h index c4fa14885..c7efdd190 100644 --- a/cpp/tools/format/output_format.h +++ b/cpp/tools/format/output_format.h @@ -33,6 +33,10 @@ enum class OutputFormat { kCsv, kTsv, kJson, kTable }; OutputFormat resolve_format(ParsedArgs::Format f, bool stdout_is_tty); +// Translate a storage-engine error code (common::E_*) into a human-readable +// phrase so CLI diagnostics carry meaning instead of a bare numeric code. +const char* error_code_message(int code); + const char* tsdatatype_name(common::TSDataType t); const char* tsencoding_name(common::TSEncoding e); const char* compression_name(common::CompressionType c); @@ -52,7 +56,7 @@ class RowWriter { private: void ensure_header(); - bool is_numeric(size_t col) const; + bool emits_json_bare(size_t col) const; std::ostream& out_; OutputFormat fmt_; diff --git a/cpp/tools/format/result_set_format.cc b/cpp/tools/format/result_set_format.cc index 81434c78a..104ec5ee2 100644 --- a/cpp/tools/format/result_set_format.cc +++ b/cpp/tools/format/result_set_format.cc @@ -29,45 +29,6 @@ namespace tsfile_cli { -const char* query_error_text(int code) { - switch (code) { - case common::E_OOM: - return "out of memory"; - case common::E_NOT_EXIST: - return "not found"; - case common::E_INVALID_ARG: - return "invalid argument"; - case common::E_OUT_OF_RANGE: - return "value out of range"; - case common::E_FILE_OPEN_ERR: - return "cannot open file"; - case common::E_FILE_READ_ERR: - return "file read error"; - case common::E_TSFILE_CORRUPTED: - return "file is corrupted"; - case common::E_INVALID_PATH: - return "invalid path"; - case common::E_DEVICE_NOT_EXIST: - return "device does not exist"; - case common::E_MEASUREMENT_NOT_EXIST: - return "measurement does not exist"; - case common::E_TABLE_NOT_EXIST: - return "table 
does not exist"; - case common::E_COLUMN_NOT_EXIST: - return "column does not exist"; - case common::E_INVALID_QUERY: - return "invalid query"; - case common::E_TYPE_NOT_SUPPORTED: - return "data type not supported"; - case common::E_TYPE_NOT_MATCH: - return "data type mismatch"; - case common::E_DECODE_ERR: - return "failed to decode data"; - default: - return "internal error"; - } -} - std::string cell_to_string(storage::ResultSet* rs, uint32_t i, common::TSDataType type) { std::ostringstream ss; @@ -105,8 +66,8 @@ std::string cell_to_string(storage::ResultSet* rs, uint32_t i, } } -int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, - std::ostream& out, long long offset, long long limit) { +int emit_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, + std::ostream& out, long long offset, long long limit) { auto meta = rs->get_metadata(); const uint32_t ncol = meta->get_column_count(); std::vector header; @@ -172,9 +133,9 @@ BufferedRow read_current_row(storage::ResultSet* rs, } // namespace -int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, - bool no_header, std::ostream& out, long long limit, - unsigned long long seed) { +int emit_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, + bool no_header, std::ostream& out, long long limit, + unsigned long long seed) { auto meta = rs->get_metadata(); const uint32_t ncol = meta->get_column_count(); std::vector header; diff --git a/cpp/tools/format/result_set_format.h b/cpp/tools/format/result_set_format.h index 1323b004f..a9fb2a4b1 100644 --- a/cpp/tools/format/result_set_format.h +++ b/cpp/tools/format/result_set_format.h @@ -29,20 +29,16 @@ namespace tsfile_cli { -// Translate a storage-engine error code into a human-readable phrase so CLI -// diagnostics carry meaning instead of a bare numeric code. 
-const char* query_error_text(int code); - std::string cell_to_string(storage::ResultSet* rs, uint32_t col_index, common::TSDataType type); -int write_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, - std::ostream& out, long long offset = 0, - long long limit = -1); +int emit_result_set(storage::ResultSet* rs, OutputFormat fmt, bool no_header, + std::ostream& out, long long offset = 0, + long long limit = -1); -int write_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, - bool no_header, std::ostream& out, long long limit, - unsigned long long seed); +int emit_result_set_sampled(storage::ResultSet* rs, OutputFormat fmt, + bool no_header, std::ostream& out, long long limit, + unsigned long long seed); } // namespace tsfile_cli diff --git a/cpp/tools/skills/tsfile-cli/SKILL.md b/cpp/tools/skills/tsfile-cli/SKILL.md index afd16a901..080674d07 100644 --- a/cpp/tools/skills/tsfile-cli/SKILL.md +++ b/cpp/tools/skills/tsfile-cli/SKILL.md @@ -37,7 +37,7 @@ Single pipe-friendly C++ binary to inspect **and** import `.tsfile` (TsFile's an ## Read -`tsfile-cli [opts] ` · `tsfile-cli --help | --version | help ` +`tsfile-cli [opts] ` · `tsfile-cli --help | --version | help` | cmd | output | scans pages | |---|---|---| @@ -82,11 +82,14 @@ TYPE ∈ { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, STRING, TEXT } input := file | '-' | omitted # '-' or omitted = stdin ``` -- `-o` required (overwritten); `-f` default csv (json/table → usage error). +- `-o` required (overwritten, must differ from input); `-f` default csv (json/table → usage error). - header: first line skipped by default · `--no-header` if none · `--header-match` validates - header names vs `--columns`. + header names vs `--columns` (mutually exclusive with `--no-header`). - empty cell = null · `--table` is lower-cased · success **silent**, `-v` → `wrote N rows to ` on stderr. 
-- exit: `1` usage (missing `--table`/`--columns`/`-o`, bad spec, read-only flag) · `2` IO open · `3` row (field-count / type / header mismatch). +- **timestamps must be strictly increasing per device** (device = tag-column values); rows for + different tags may interleave/reuse timestamps. Out-of-order input → error with line number. +- a failed import deletes its partial output (no half-written `.tsfile` left behind). +- exit: `1` usage (missing `--table`/`--columns`/`-o`, bad spec, dup column, read-only flag) · `2` IO open · `3` row (field-count / type / overflow / timestamp-order / header mismatch). ```sh printf 'time,id1,s1\n0,dev,0\n1,dev,10\n' \ From 7878aa2fef103cc0643727d2efad52b77426921e Mon Sep 17 00:00:00 2001 From: spricoder Date: Fri, 5 Jun 2026 22:06:41 +0800 Subject: [PATCH 39/41] Rename tsfile-cli stat_table.{h,cc} to statistics.{h,cc} The file holds generic statistics helpers (collect_series_stats, collect_file_summary, statistic_value_cells) used by stats/count/meta for both the tree and table models. "table" wrongly implied the table model; "statistics" describes what it actually provides. 
--- cpp/test/tools/{stat_table_test.cc => statistics_test.cc} | 6 +++--- cpp/tools/commands/cmd_count.cc | 2 +- cpp/tools/commands/cmd_meta.cc | 2 +- cpp/tools/commands/cmd_stats.cc | 2 +- cpp/tools/commands/{stat_table.cc => statistics.cc} | 2 +- cpp/tools/commands/{stat_table.h => statistics.h} | 6 +++--- 6 files changed, 10 insertions(+), 10 deletions(-) rename cpp/test/tools/{stat_table_test.cc => statistics_test.cc} (91%) rename cpp/tools/commands/{stat_table.cc => statistics.cc} (99%) rename cpp/tools/commands/{stat_table.h => statistics.h} (95%) diff --git a/cpp/test/tools/stat_table_test.cc b/cpp/test/tools/statistics_test.cc similarity index 91% rename from cpp/test/tools/stat_table_test.cc rename to cpp/test/tools/statistics_test.cc index 7beb58c13..a151fdc3c 100644 --- a/cpp/test/tools/stat_table_test.cc +++ b/cpp/test/tools/statistics_test.cc @@ -17,13 +17,13 @@ * under the License. */ -#include "commands/stat_table.h" +#include "commands/statistics.h" #include #include "common/statistic.h" -TEST(StatTableTest, Int64StatisticCellsContainValueSummaries) { +TEST(StatisticsTest, Int64StatisticCellsContainValueSummaries) { storage::Int64Statistic st; st.update(1, static_cast(10)); st.update(3, static_cast(30)); @@ -37,7 +37,7 @@ TEST(StatTableTest, Int64StatisticCellsContainValueSummaries) { std::vector({false, false, false, false, false})); } -TEST(StatTableTest, BooleanStatisticLeavesMinMaxNull) { +TEST(StatisticsTest, BooleanStatisticLeavesMinMaxNull) { storage::BooleanStatistic st; st.update(1, true); st.update(2, false); diff --git a/cpp/tools/commands/cmd_count.cc b/cpp/tools/commands/cmd_count.cc index 7cb592253..9480744c6 100644 --- a/cpp/tools/commands/cmd_count.cc +++ b/cpp/tools/commands/cmd_count.cc @@ -22,7 +22,7 @@ #include "cli/exit_codes.h" #include "commands/commands.h" -#include "commands/stat_table.h" +#include "commands/statistics.h" namespace tsfile_cli { diff --git a/cpp/tools/commands/cmd_meta.cc b/cpp/tools/commands/cmd_meta.cc index 
877e04683..dd70029f3 100644 --- a/cpp/tools/commands/cmd_meta.cc +++ b/cpp/tools/commands/cmd_meta.cc @@ -21,7 +21,7 @@ #include "cli/exit_codes.h" #include "commands/commands.h" -#include "commands/stat_table.h" +#include "commands/statistics.h" #include "reader/tsfile_reader.h" namespace tsfile_cli { diff --git a/cpp/tools/commands/cmd_stats.cc b/cpp/tools/commands/cmd_stats.cc index 1af68e298..898ca469b 100644 --- a/cpp/tools/commands/cmd_stats.cc +++ b/cpp/tools/commands/cmd_stats.cc @@ -22,7 +22,7 @@ #include "cli/exit_codes.h" #include "commands/commands.h" -#include "commands/stat_table.h" +#include "commands/statistics.h" namespace tsfile_cli { diff --git a/cpp/tools/commands/stat_table.cc b/cpp/tools/commands/statistics.cc similarity index 99% rename from cpp/tools/commands/stat_table.cc rename to cpp/tools/commands/statistics.cc index d09d4bd6a..e9bb8d6e5 100644 --- a/cpp/tools/commands/stat_table.cc +++ b/cpp/tools/commands/statistics.cc @@ -17,7 +17,7 @@ * under the License. */ -#include "commands/stat_table.h" +#include "commands/statistics.h" #include #include diff --git a/cpp/tools/commands/stat_table.h b/cpp/tools/commands/statistics.h similarity index 95% rename from cpp/tools/commands/stat_table.h rename to cpp/tools/commands/statistics.h index e79bd20c0..031b5b4aa 100644 --- a/cpp/tools/commands/stat_table.h +++ b/cpp/tools/commands/statistics.h @@ -17,8 +17,8 @@ * under the License. 
*/ -#ifndef TSFILE_CLI_STAT_TABLE_H -#define TSFILE_CLI_STAT_TABLE_H +#ifndef TSFILE_CLI_STATISTICS_H +#define TSFILE_CLI_STATISTICS_H #include #include @@ -66,4 +66,4 @@ FileSummary collect_file_summary(const ParsedArgs& args, } // namespace tsfile_cli -#endif // TSFILE_CLI_STAT_TABLE_H +#endif // TSFILE_CLI_STATISTICS_H From 5509515e63339c298b1a5cc42ad32160bb1120ec Mon Sep 17 00:00:00 2001 From: spricoder Date: Fri, 5 Jun 2026 22:06:56 +0800 Subject: [PATCH 40/41] Add tests for tsfile-cli review hardening Covers per-device timestamp-order rejection (including across batch flushes), --output anti-alias and unlink-on-failure, large streaming round-trip, numeric overflow detection, duplicate-column rejection, flag-applicability errors, the leading-option error, error_code_message mapping, --help with a positional file, and table-model schema encoding/compression. --- cpp/test/tools/cli_args_test.cc | 11 + cpp/test/tools/command_e2e_test.cc | 309 +++++++++++++++++++++++++++ cpp/test/tools/input_format_test.cc | 9 + cpp/test/tools/output_format_test.cc | 23 ++ 4 files changed, 352 insertions(+) diff --git a/cpp/test/tools/cli_args_test.cc b/cpp/test/tools/cli_args_test.cc index 08042f741..614329463 100644 --- a/cpp/test/tools/cli_args_test.cc +++ b/cpp/test/tools/cli_args_test.cc @@ -50,6 +50,17 @@ TEST(RunCliTest, UnknownCommandIsUsageError) { EXPECT_NE(err.str().find("Unknown command"), std::string::npos); } +TEST(RunCliTest, LeadingOptionBeforeCommandIsClearError) { + std::ostringstream out; + std::ostringstream err; + int code = + tsfile_cli::run_cli({"-f", "json", "meta", "data.tsfile"}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("command must come before options"), + std::string::npos) + << err.str(); +} + TEST(ParseArgsTest, CommandAndFilePositional) { auto p = tsfile_cli::parse_args({"ls", "data.tsfile"}); EXPECT_TRUE(p.error.empty()); diff --git a/cpp/test/tools/command_e2e_test.cc b/cpp/test/tools/command_e2e_test.cc index 
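The per-device ordering rule these tests exercise can be sketched in isolation. This is a minimal, hypothetical restatement: the `DeviceOrderChecker` class does not exist in the patch series. It assumes the same device key that `cmd_write` builds, which is the tag-cell values each followed by a NUL separator, plus a map from that key to the last accepted timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not part of this patch series) restating the rule that
// `tsfile-cli write` enforces: a device is keyed by its tag-column values, and
// each device's timestamps must be strictly increasing. Rows for different
// devices may interleave and reuse timestamps.
class DeviceOrderChecker {
  public:
    explicit DeviceOrderChecker(std::vector<std::size_t> tag_idx)
        : tag_idx_(std::move(tag_idx)) {}

    // `cells` are the non-timestamp cells of one row; `tag_idx_` indexes them.
    // Returns true and records `ts` if it is strictly greater than the last
    // timestamp seen for this row's device; returns false otherwise.
    bool accept(const std::vector<std::string>& cells, std::int64_t ts) {
        std::string key;
        for (std::size_t k : tag_idx_) {
            key += cells[k];
            key.push_back('\0');  // separator: ("ab","c") and ("a","bc") differ
        }
        auto it = last_ts_.find(key);
        if (it != last_ts_.end() && ts <= it->second) {
            return false;
        }
        last_ts_[key] = ts;
        return true;
    }

  private:
    std::vector<std::size_t> tag_idx_;
    std::unordered_map<std::string, std::int64_t> last_ts_;
};
```

With no tag columns, every row maps to the same empty key, so the whole input is a single timeline. That is the situation the `WriteRejectsOutOfOrderAcrossBatches` case relies on. Because the map outlives any one batch, the check still holds after a batch flush.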
0bff9e75f..d06e909e0 100644 --- a/cpp/test/tools/command_e2e_test.cc +++ b/cpp/test/tools/command_e2e_test.cc @@ -248,3 +248,312 @@ TEST(CliE2E, WriteMissingColumnsIsUsageError) { EXPECT_EQ(code, 1); EXPECT_NE(err.str().find("--columns"), std::string::npos); } + +namespace { +bool path_exists(const std::string& p) { + std::ifstream in(p.c_str()); + return in.good(); +} +} // namespace + +TEST(CliE2E, WriteRejectsOutOfOrderTimestampsAndLeavesNoOutput) { + std::string csv = + tsfile_cli_test::unique_temp_path("tsfile_cli_ooo", ".csv"); + { + std::ofstream o(csv.c_str()); + o << "time,s1\n5,50\n1,10\n"; + } + std::string out_path = + tsfile_cli_test::unique_temp_path("tsfile_cli_ooo_out", ".tsfile"); + + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"write", "--table", "t", "--columns", + "s1:INT64:field", "-o", out_path, csv}, + out, err); + EXPECT_EQ(code, 3); + EXPECT_NE(err.str().find("strictly increasing"), std::string::npos) + << err.str(); + EXPECT_NE(err.str().find("line 3"), std::string::npos) << err.str(); + EXPECT_FALSE(path_exists(out_path)) << "failed import must leave no output"; + + std::remove(csv.c_str()); + std::remove(out_path.c_str()); +} + +TEST(CliE2E, WriteAllowsSameTimestampAcrossDevices) { + std::string csv = + tsfile_cli_test::unique_temp_path("tsfile_cli_md", ".csv"); + { + std::ofstream o(csv.c_str()); + o << "time,id,s1\n1,A,10\n1,B,20\n2,A,30\n"; + } + std::string out_path = + tsfile_cli_test::unique_temp_path("tsfile_cli_md_out", ".tsfile"); + + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli( + {"write", "--table", "t", "--columns", "id:STRING:tag,s1:INT64:field", + "-o", out_path, csv}, + out, err); + EXPECT_EQ(code, 0) << err.str(); + + std::ostringstream cout_; + std::ostringstream cerr_; + tsfile_cli::run_cli({"count", "-f", "tsv", out_path}, cout_, cerr_); + EXPECT_NE(cout_.str().find("total\t\t3"), std::string::npos) << cout_.str(); + + 
std::remove(csv.c_str()); + std::remove(out_path.c_str()); +} + +TEST(CliE2E, WriteRejectsOutputEqualsInput) { + std::string csv = + tsfile_cli_test::unique_temp_path("tsfile_cli_alias", ".csv"); + { + std::ofstream o(csv.c_str()); + o << "time,s1\n0,1\n"; + } + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"write", "--table", "t", "--columns", + "s1:INT64:field", "-o", csv, csv}, + out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("same as the input"), std::string::npos) + << err.str(); + // The input file must be untouched. + std::ifstream in(csv.c_str()); + std::stringstream buf; + buf << in.rdbuf(); + EXPECT_EQ(buf.str(), "time,s1\n0,1\n"); + + std::remove(csv.c_str()); +} + +TEST(CliE2E, WriteFailureOnBadValueLeavesNoOutput) { + std::string csv = + tsfile_cli_test::unique_temp_path("tsfile_cli_badval", ".csv"); + { + std::ofstream o(csv.c_str()); + o << "time,s1\n0,notanumber\n"; + } + std::string out_path = + tsfile_cli_test::unique_temp_path("tsfile_cli_badval_out", ".tsfile"); + + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"write", "--table", "t", "--columns", + "s1:INT64:field", "-o", out_path, csv}, + out, err); + EXPECT_EQ(code, 3); + EXPECT_FALSE(path_exists(out_path)); + + std::remove(csv.c_str()); + std::remove(out_path.c_str()); +} + +TEST(CliE2E, WriteRejectsDuplicateColumnNames) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli( + {"write", "--table", "t", "--columns", "s1:INT64:field,s1:INT64:field", + "-o", "x.tsfile", "-"}, + out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("duplicate column"), std::string::npos) + << err.str(); +} + +TEST(CliE2E, WriteRejectsHeaderMatchWithNoHeader) { + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli( + {"write", "--table", "t", "--columns", "s1:INT64:field", "-o", + "x.tsfile", "--no-header", "--header-match", "-"}, + out, err); + 
EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("--header-match"), std::string::npos) << err.str(); +} + +TEST(CliE2E, ReadRejectsWriteOnlyFlag) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"ls", "-o", "x.tsfile", f.path}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("only valid for write"), std::string::npos) + << err.str(); +} + +TEST(CliE2E, MetaRejectsDeviceScopeFlag) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"meta", "-d", "dev", f.path}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("not valid for meta"), std::string::npos) + << err.str(); +} + +TEST(CliE2E, SchemaTableShowsEncodingAndCompression) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"schema", "-f", "tsv", f.path}, out, err); + EXPECT_EQ(code, 0); + // Table-model schema must report real (non-empty) encoding and compression + // rather than blanks. The INT64 field encodes as TS_2DIFF; the compression + // is the engine default (build-dependent) but must not be empty. + EXPECT_NE(out.str().find("\ts1\tINT64\tTS_2DIFF\t"), std::string::npos) + << out.str(); + EXPECT_EQ(out.str().find("\ts1\tINT64\tTS_2DIFF\t\n"), std::string::npos) + << out.str(); +} + +namespace { +// Run a one-row `write` whose single value cell is `value`, declaring the +// column as `type`. Returns the exit code; captures stderr into `err`. 
+int write_one_value(const std::string& type, const std::string& value, + std::string& err_out) { + std::string csv = + tsfile_cli_test::unique_temp_path("tsfile_cli_ovf", ".csv"); + { + std::ofstream o(csv.c_str()); + o << "time,s1\n0," << value << "\n"; + } + std::string out_path = + tsfile_cli_test::unique_temp_path("tsfile_cli_ovf_out", ".tsfile"); + std::ostringstream out; + std::ostringstream err; + int code = + tsfile_cli::run_cli({"write", "--table", "t", "--columns", + "s1:" + type + ":field", "-o", out_path, csv}, + out, err); + err_out = err.str(); + std::remove(csv.c_str()); + std::remove(out_path.c_str()); + return code; +} +} // namespace + +TEST(CliE2E, WriteRejectsInt32Overflow) { + std::string err; + EXPECT_EQ(write_one_value("INT32", "3000000000", err), 3); + EXPECT_NE(err.find("INT32 out of range"), std::string::npos) << err; +} + +TEST(CliE2E, WriteAcceptsInt32Boundary) { + std::string err; + EXPECT_EQ(write_one_value("INT32", "2147483647", err), 0) << err; +} + +TEST(CliE2E, WriteRejectsInt64Overflow) { + std::string err; + EXPECT_EQ(write_one_value("INT64", "99999999999999999999999999", err), 3); + EXPECT_NE(err.find("INT64 out of range"), std::string::npos) << err; +} + +TEST(CliE2E, WriteRejectsDoubleOverflow) { + std::string err; + EXPECT_EQ(write_one_value("DOUBLE", "1e400", err), 3); + EXPECT_NE(err.find("DOUBLE out of range"), std::string::npos) << err; +} + +TEST(CliE2E, WriteRejectsNonNumericInt64) { + std::string err; + EXPECT_EQ(write_one_value("INT64", "12abc", err), 3); + EXPECT_NE(err.find("bad INT64"), std::string::npos) << err; +} + +TEST(CliE2E, WriteRejectsOutOfOrderAcrossBatches) { + // More than one 1024-row batch of ascending rows, then a violating + // timestamp. The first batch is already flushed by the time the bad row is + // read, so this proves both that per-device tracking survives a batch flush + // and that the already-written output is removed on failure. 
+ std::string csv = + tsfile_cli_test::unique_temp_path("tsfile_cli_xbatch", ".csv"); + { + std::ofstream o(csv.c_str()); + o << "time,s1\n"; + for (int i = 1; i <= 1100; ++i) { + o << i << "," << i << "\n"; + } + o << "500,999\n"; // <= the last timestamp for the tag-less device + } + std::string out_path = + tsfile_cli_test::unique_temp_path("tsfile_cli_xbatch_out", ".tsfile"); + + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"write", "--table", "t", "--columns", + "s1:INT64:field", "-o", out_path, csv}, + out, err); + EXPECT_EQ(code, 3); + EXPECT_NE(err.str().find("strictly increasing"), std::string::npos) + << err.str(); + EXPECT_FALSE(path_exists(out_path)); + + std::remove(csv.c_str()); + std::remove(out_path.c_str()); +} + +TEST(CliE2E, WriteStreamsLargeInputRoundTrips) { + std::string csv = + tsfile_cli_test::unique_temp_path("tsfile_cli_large", ".csv"); + { + std::ofstream o(csv.c_str()); + o << "time,s1\n"; + for (int i = 1; i <= 3000; ++i) { + o << i << "," << (i * 2) << "\n"; + } + } + std::string out_path = + tsfile_cli_test::unique_temp_path("tsfile_cli_large_out", ".tsfile"); + + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"write", "--table", "big", "--columns", + "s1:INT64:field", "-o", out_path, csv}, + out, err); + EXPECT_EQ(code, 0) << err.str(); + + std::ostringstream cout_; + std::ostringstream cerr_; + tsfile_cli::run_cli({"count", "-f", "tsv", out_path}, cout_, cerr_); + EXPECT_NE(cout_.str().find("\ts1\t3000"), std::string::npos) << cout_.str(); + + std::remove(csv.c_str()); + std::remove(out_path.c_str()); +} + +TEST(CliE2E, HelpWithPositionalFilePrintsUsage) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"cat", "--help", f.path}, out, err); + EXPECT_EQ(code, 0); + EXPECT_NE(out.str().find("Usage:"), std::string::npos) << out.str(); +} + +TEST(CliE2E, StatsRejectsRowOnlyFlag) { + Fixture f; + 
std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"stats", "--start", "1", f.path}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("only valid for head/cat/sample"), + std::string::npos) + << err.str(); +} + +TEST(CliE2E, LsRejectsMeasurementsFlag) { + Fixture f; + std::ostringstream out; + std::ostringstream err; + int code = tsfile_cli::run_cli({"ls", "-m", "s1", f.path}, out, err); + EXPECT_EQ(code, 1); + EXPECT_NE(err.str().find("not valid for ls"), std::string::npos) + << err.str(); +} diff --git a/cpp/test/tools/input_format_test.cc b/cpp/test/tools/input_format_test.cc index f73a72c5c..08c4b5f91 100644 --- a/cpp/test/tools/input_format_test.cc +++ b/cpp/test/tools/input_format_test.cc @@ -51,6 +51,15 @@ TEST(InputFormatTest, ParseColumnsSpecErrors) { EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64:bogus", cols, err)); EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64", cols, err)); EXPECT_FALSE(tsfile_cli::parse_columns_spec("", cols, err)); + EXPECT_FALSE(tsfile_cli::parse_columns_spec(":INT64:field", cols, err)); +} + +TEST(InputFormatTest, ParseColumnsSpecRejectsDuplicateNames) { + std::vector cols; + std::string err; + EXPECT_FALSE(tsfile_cli::parse_columns_spec("s1:INT64:field,s1:INT32:field", + cols, err)); + EXPECT_NE(err.find("duplicate column"), std::string::npos) << err; } TEST(InputFormatTest, SplitLineTsv) { diff --git a/cpp/test/tools/output_format_test.cc b/cpp/test/tools/output_format_test.cc index 6acf865f9..abaa47dc9 100644 --- a/cpp/test/tools/output_format_test.cc +++ b/cpp/test/tools/output_format_test.cc @@ -25,11 +25,34 @@ #include #include "common/db_common.h" +#include "utils/errno_define.h" using tsfile_cli::OutputFormat; using tsfile_cli::ParsedArgs; using tsfile_cli::RowWriter; +TEST(ErrorCodeMessageTest, KnownCodesMapToReadablePhrases) { + EXPECT_STREQ(tsfile_cli::error_code_message(common::E_TABLE_NOT_EXIST), + "table does not exist"); + 
EXPECT_STREQ(tsfile_cli::error_code_message(common::E_DEVICE_NOT_EXIST), + "device does not exist"); + EXPECT_STREQ( + tsfile_cli::error_code_message(common::E_MEASUREMENT_NOT_EXIST), + "measurement does not exist"); + EXPECT_STREQ(tsfile_cli::error_code_message(common::E_TSFILE_CORRUPTED), + "file is corrupted"); + EXPECT_STREQ(tsfile_cli::error_code_message(common::E_OUT_OF_ORDER), + "data is out of order"); + EXPECT_STREQ(tsfile_cli::error_code_message(common::E_DECODE_ERR), + "failed to decode data"); +} + +TEST(ErrorCodeMessageTest, UnknownCodeFallsBackToInternalError) { + EXPECT_STREQ(tsfile_cli::error_code_message(987654), "internal error"); + // The phrase is always a non-empty, printable string (never a bare code). + EXPECT_GT(std::string(tsfile_cli::error_code_message(-1)).size(), 0u); +} + TEST(ResolveFormatTest, AutoUsesTableOnTtyTsvOtherwise) { EXPECT_EQ(tsfile_cli::resolve_format(ParsedArgs::Format::kAuto, true), OutputFormat::kTable); From a480e0f1a70d0e4bf8691331854d4de19ec5e6ad Mon Sep 17 00:00:00 2001 From: spricoder Date: Sat, 6 Jun 2026 10:05:33 +0800 Subject: [PATCH 41/41] Simplify scoped-command check in tsfile-cli flag validation Derive the scoped flag from is_row instead of re-listing the head/cat/sample command names, so the row-command set lives in one place. 
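
A minimal standalone sketch for reviewers, not part of this patch. `old_scoped`, `new_scoped`, and `count_mismatches` are hypothetical helpers. They copy the two `scoped` predicates out of `validate_read_flag_applicability` so the two forms can be compared on a sample of command names. The sketch assumes only the command string matters for this check; the real function also inspects other `ParsedArgs` fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Before this patch: every scoped command is listed explicitly.
bool old_scoped(const std::string& c) {
    return c == "schema" || c == "stats" || c == "count" || c == "head" ||
           c == "cat" || c == "sample";
}

// After this patch: the row commands come from is_row, so that set is
// spelled out in exactly one place.
bool new_scoped(const std::string& c) {
    const bool is_row = (c == "head" || c == "cat" || c == "sample");
    return is_row || c == "schema" || c == "stats" || c == "count";
}

// Returns how many of the given command names the two forms disagree on.
std::size_t count_mismatches(const std::vector<std::string>& commands) {
    std::size_t mismatches = 0;
    for (const std::string& c : commands) {
        if (old_scoped(c) != new_scoped(c)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Both forms describe the same six-command set, so `count_mismatches` returns 0 for any input. Commands outside that set, such as `ls` or `write`, are unscoped under either form, which means the refactor does not change behavior.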
--- cpp/tools/cli/run_cli.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cpp/tools/cli/run_cli.cc b/cpp/tools/cli/run_cli.cc index 7c10fac63..85bfba8c6 100644 --- a/cpp/tools/cli/run_cli.cc +++ b/cpp/tools/cli/run_cli.cc @@ -139,8 +139,8 @@ bool validate_write_flags(const ParsedArgs& p, std::ostream& err) { bool validate_read_flag_applicability(const ParsedArgs& p, std::ostream& err) { const std::string& c = p.command; const bool is_row = (c == "head" || c == "cat" || c == "sample"); - const bool scoped = (c == "schema" || c == "stats" || c == "count" || - c == "head" || c == "cat" || c == "sample"); + const bool scoped = + is_row || c == "schema" || c == "stats" || c == "count"; if (!p.output.empty()) { err << "Error: -o/--output is only valid for write\n";