static USAGE: &str = r#"
Explore tabular data files interactively using the csvlens (https://github.com/YS-L/csvlens) engine.

If the polars feature is enabled, lens can browse tabular data in Arrow, Avro/IPC, Parquet, JSON (JSON Array)
and JSONL files. It also automatically decompresses csv/tsv/tab/ssv files using the gz, zlib & zst
compression formats (e.g. data.csv.gz, data.tsv.zlib, data.tab.gz & data.ssv.zst).

If the polars feature is not enabled, lens can only browse CSV dialects (CSV, TSV, Tab, SSV) and
their snappy-compressed variants (CSV.sz, TSV.sz, Tab.sz & SSV.sz).

Press 'q' to exit. Press '?' for help.

Usage:
    qsv lens [options] [<input>]
    qsv lens --help

Examples:

  Automatically choose delimiter based on the file extension
    $ qsv lens data.csv // comma-separated
    $ qsv lens data.tsv // Tab-separated
    $ qsv lens data.tab // Tab-separated
    $ qsv lens data.ssv // Semicolon-separated
  # custom delimiter
    $ qsv lens --delimiter '|' data.csv

  Auto-decompresses several compression formats:
    $ qsv lens data.csv.sz // Snappy-compressed CSV
    $ qsv lens data.tsv.sz // Snappy-compressed Tab-separated
  # additional compression formats below require polars feature
    $ qsv lens data.csv.gz // Gzipped CSV
    $ qsv lens data.tsv.zlib // Zlib-compressed Tab-separated
    $ qsv lens data.tab.zst // Zstd-compressed Tab-separated
    $ qsv lens data.ssv.zst // Zstd-compressed Semicolon-separated

  Explore tabular data in other formats (if polars feature is enabled)
    $ qsv lens data.parquet // Parquet
    $ qsv lens data.jsonl // JSON Lines
    $ qsv lens data.json // JSON - will only work with a JSON Array
    $ qsv lens data.avro // Avro

  Prompt the user to select a column to display. Once selected,
  exit with the value of the City column for the selected row sent to stdout
    $ qsv lens --prompt 'Select City:' --echo-column 'City' data.csv

  Only show rows that contain "NYPD"
    $ qsv lens --filter NYPD data.csv
  # Show rows that contain "nois" case insensitive (for noise, noisy, noisier, etc.)
    $ qsv lens --filter nois --ignore-case data.csv

  Find and highlight matches in the data
    $ qsv lens --find 'New York' data.csv

  Find and highlight cells that have all numeric values in a column.
    $ qsv lens --find '^\d+$' data.csv

lens options:
    -d, --delimiter <char>           Delimiter character (comma by default)
                                     "auto" to auto-detect the delimiter
    -t, --tab-separated              Use tab separation. Shortcut for -d '\t'
        --no-headers                 Do not interpret the first row as headers
        --columns <regex>            Use this regex to select columns to display by default.
                                     e.g. "col1|col2|col3" to select columns "col1", "col2" and "col3"
                                     and also columns like "col1_1", "col22" and "col3-more".
        --filter <regex>             Use this regex to filter rows to display by default.
                                     The regex is matched against each cell in every column.
                                     e.g. "val1|val2" filters rows with any cells containing "val1", "val2"
                                     or text like "my_val1" or "val234".
        --find <regex>               Use this regex to find and highlight matches by default.
                                     Automatically sets --monochrome to true so the matches are easier to see.
                                     The regex is matched against each cell in every column.
                                     e.g. "val1|val2" highlights text containing "val1", "val2" or
                                     longer text like "val1_ok" or "val2_error".
    -i, --ignore-case                Searches ignore case. Ignored if any uppercase letters
                                     are present in the search string
    -f, --freeze-columns <num>       Freeze the first N columns
                                     [default: 1]
    -m, --monochrome                 Disable color output
    -W, --wrap-mode <mode>           Set the wrap mode for the output.
                                     Valid modes are:
                                       "words": Wrap at word boundaries
                                       "chars": Wrap at character boundaries
                                       "disabled": No wrapping
                                     For convenience, the first character can be used as a shortcut:
                                       qsv lens -W w data.csv // wrap at word boundaries
                                     [default: disabled]
    -A, --auto-reload                Automatically reload the data when the file changes.
    -S, --streaming-stdin            Enable streaming stdin (load input as it's being piped in)
                                     NOTE: This option only applies to stdin input.
    -P, --prompt <prompt>            Set a custom prompt in the status bar. Normally paired w/ --echo-column:
                                       qsv lens --prompt 'Select City:' --echo-column 'City'
                                     Supports ANSI escape codes for colored or styled text. When using
                                     escape codes, ensure it's properly escaped. For example, in bash/zsh,
                                     the $'...' syntax is used to do so:
                                       qsv lens --prompt $'\033[1;5;31mBlinking red, bold text\033[0m'
                                     see https://en.wikipedia.org/wiki/ANSI_escape_code#Colors or
                                     https://gist.github.com/fnky/458719343aabd01cfb17a3a4f7296797
                                     for more info on ANSI escape codes.
                                     Typing a complicated prompt on the command line can be tricky.
                                     If the prompt starts with "file:", it's interpreted as a filepath
                                     from which to load the prompt, e.g.
                                       qsv lens --prompt "file:prompt.txt"
        --echo-column <column_name>  Print the value of this column to stdout for the selected row
        --debug                      Show stats for debugging

Common options:
    -h, --help                       Display this message
"#;

use std::path::PathBuf;

use csvlens::{CsvlensOptions, WrapMode, run_csvlens_with_options};
use serde::Deserialize;

use crate::{CliError, CliResult, config::Config, util};

#[derive(Deserialize)]
struct Args {
    arg_input:            Option<String>,
    flag_delimiter:       Option<String>,
    flag_tab_separated:   bool,
    flag_no_headers:      bool,
    flag_columns:         Option<String>,
    flag_filter:          Option<String>,
    flag_find:            Option<String>,
    flag_ignore_case:     bool,
    flag_freeze_columns:  Option<u64>,
    flag_monochrome:      bool,
    flag_prompt:          Option<String>,
    flag_echo_column:     Option<String>,
    flag_wrap_mode:       String,
    flag_auto_reload:     bool,
    flag_debug:           bool,
    flag_streaming_stdin: bool,
}

pub fn run(argv: &[&str]) -> CliResult<()> {
    let args: Args = util::get_args(USAGE, argv)?;

    let tmpdir = tempfile::tempdir()?;
    let input = if args.arg_input.as_deref().unwrap_or("-") == "-" {
        // No input file (or explicit "-"): preserve stdin handling for csvlens
        // by avoiding util::process_input here, which would buffer all of
        // stdin into a temp file and defeat --streaming-stdin entirely.
        // Config::new below maps "-" to config.path = None, so csvlens is
        // ultimately invoked with filename: None (its stdin marker) and
        // no_streaming_stdin controls buffer-vs-stream.
        "-".to_string()
    } else {
        // Process the input file.
        // Snappy-compressed input is auto-decompressed into a temporary file
        // in tmpdir, which is deleted automatically when the command finishes.
        let work_input = util::process_input(
            vec![PathBuf::from(args.arg_input.clone().unwrap())],
            &tmpdir,
            "",
        )?;
        work_input[0].to_string_lossy().to_string()
    };

    // If the prompt starts with "file:", it's interpreted as a filepath
    // from which to load the prompt, e.g.
    // qsv lens --prompt "file:prompt.txt"
    let prompt = match args.flag_prompt {
        Some(p) if p.starts_with(util::FILE_PATH_PREFIX) => {
            let prompt_file = PathBuf::from(p.trim_start_matches(util::FILE_PATH_PREFIX));
            Some(std::fs::read_to_string(prompt_file)?)
        },
        other => other,
    };

    // Convert the wrap mode to a WrapMode enum value.
    // Accept the full names (words/chars/disabled) and their single-letter
    // shortcuts (w/c/d), case-insensitive. Anything else is rejected so typos
    // like `--wrap-mode wodrs` don't silently get treated as `words`.
    let wrap_mode = match args.flag_wrap_mode.to_ascii_lowercase().as_str() {
        "d" | "disabled" => Some(WrapMode::Disabled),
        "w" | "words" => Some(WrapMode::Words),
        "c" | "chars" => Some(WrapMode::Chars),
        _ => {
            return fail_incorrectusage_clierror!(
                "Invalid --wrap-mode value '{}'. Valid modes are: words, chars, disabled.",
                args.flag_wrap_mode
            );
        },
    };

    // Create a Config to:
    // 1. Get the delimiter (from QSV_DEFAULT_DELIMITER env var if set)
    // 2. Check if delimiter sniffing is enabled (via QSV_SNIFF_DELIMITER)
    // 3. Handle special file formats like Parquet/Avro if polars is enabled
    let config: Config = Config::new(Some(&input));
    let input = config.path.clone().map(|p| p.to_string_lossy().to_string());

    // An explicit --delimiter takes precedence over the Config-derived delimiter.
    let delimiter = Some(
        args.flag_delimiter
            .unwrap_or_else(|| (config.get_delimiter() as char).to_string()),
    );
    let comma_separated = delimiter.as_deref() == Some(",");

    // Monochrome is enabled if --monochrome is set or --find is used.
    // --find highlights its matches in color, so column colors are turned off
    // to keep those matches easy to see.
    let monochrome = args.flag_monochrome || args.flag_find.is_some();

    let options = CsvlensOptions {
        filename: input,
        delimiter,
        tab_separated: args.flag_tab_separated,
        comma_separated,
        no_headers: args.flag_no_headers,
        columns: args.flag_columns,
        filter: args.flag_filter,
        find: args.flag_find,
        ignore_case: args.flag_ignore_case,
        echo_column: args.flag_echo_column,
        debug: args.flag_debug,
        freeze_cols_offset: args.flag_freeze_columns,
        color_columns: !monochrome,
        prompt,
        wrap_mode,
        auto_reload: args.flag_auto_reload,
        no_streaming_stdin: !args.flag_streaming_stdin,
    };

    let out = run_csvlens_with_options(options)
        .map_err(|e| CliError::Other(format!("csvlens error: {e}")))?;

    if let Some(selected_cell) = out {
        println!("{selected_cell}");
    }

    Ok(())
}