-
-
Notifications
You must be signed in to change notification settings - Fork 949
feat(output): implement YAML and JSON output #1800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
I think i would really prefer json output. There are more tools that would support that as input. As for the streaming problem, I think there are a couple of ways to handle that:
|
tmccombs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also have some tests, and the man page should be updated.
As well as the things I commented on above.
src/output.rs
Outdated
| // Try to convert to UTF-8 first | ||
| match std::str::from_utf8(bytes) { | ||
| Ok(utf8_str) => { | ||
| let escaped: String = utf8_str.escape_default().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think collecting into an intermediate string is strictly necessary, if PathEncoding included a lifetime based on the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collecting into an intermediate string
I'm a little bit confused, since the escape_default always allocates new memory.
In my understanding, to avoid an intermediate string, I should borrow the utf8_str and store in the PathEncoding as Cow<'a, str>, and escape it when write!().
But the PathEncoding is created per file and released after the certain file is printed (or not?), and it is a must (or not?) to .escape_default().collet() into a String when escaping, so I can't see differences between.
But I'm just a newbie to Rust and I'm pretty sure you are right :), so looking forward to your further suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you would define PathEncoding like
enum<'a> PathEncoding<'a> {
Utf8(EscapeDefault<'a>),
Bytes(&'a [u8]),
}Then this would be defined like:
fn encode_path(path: &Path) -> PathEncoding<'_> {
match path.to_str() {
Some(utf8) => PathEncoding::Utf8(utf8.escape_default()),
None => PathEncoding::Bytes(path.as_os_str().as_encoded_bytes()),
}
}And FileDetail would need a lifetime parameter as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly encourage y'all to look at how ripgrep encodes paths. And in particular, you really really really should not be using WTF-8 in your output format. From the WTF-8 spec:
Any WTF-8 data must be converted to a Unicode encoding at the system’s boundary before being emitted. UTF-8 is recommended. WTF-8 must not be used to represent text in a file format or for transmission over the Internet.
You'll also want to check how PathEncoding serializes. Sometimes just serializing a &[u8] leads to something sub-optimal (like an array of integers or something). This is why ripgrep base64 encodes data that isn't valid UTF-8.
|
Just want to throw out a reference to libxo as my favourite way for command line programs to support structured output formats. Not sure there's something like that in the Rust ecosystem. |
tmccombs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I few minor suggestions, but I think it's close.
| due to OS restrictions on the maximum length of command lines. | ||
| .TP | ||
| .BI "\-\-json " | ||
| Specify JSONL (as known as NDJSON) format to use for the output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be good to list the fields that are included in the output.
| } | ||
|
|
||
| impl<'a, W: Write> ReceiverBuffer<'a, W> { | ||
| impl<'a, W: Write + 'static> ReceiverBuffer<'a, W> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably doesn't matter too much, but this could probably be
| impl<'a, W: Write + 'static> ReceiverBuffer<'a, W> { | |
| impl<'a, W: Write + 'a> ReceiverBuffer<'a, W> { |
| /// Flush stdout if necessary. | ||
| fn flush(&mut self) -> Result<(), ExitCode> { | ||
| if self.stdout.flush().is_err() { | ||
| if self.printer.stdout.flush().is_err() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a flush method on to printer, so this doesn't need to worry about the stdout field?
|
|
||
| ## Features | ||
|
|
||
| - Add `--yaml` flag for YAML format output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be changed to
| - Add `--yaml` flag for YAML format output. | |
| - Add `--jsonl` flag for JSONL format output. |
| write!(self.stdout, "}}") | ||
| } | ||
|
|
||
| fn print_entry_detail(&mut self, format: OutputFormat, entry: &DirEntry) -> io::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We could potentially combine this with the print_entry_json_obj method now, since it is only used for jsonl.
OTOH, it may be useful in the future if we add a yaml format later, or maybe a tabular format?

Implement
--yamlswitch for YAML format output, which allows usingyqor nushell to interact with the result.Actually I was initially working on #1765, but could not reach to a perfect state. JSON afraids trailing commas, making it not so easy to statelessly stream the result without a buffer. Considering a large number of potiential results, it would be memory-consuming to store all lines and print them at last.
On the other side, the YAML format, while infamous for its complexity on parsing, is friendly for streaming output. No need for serializing or extra dependencies, the simple and fast
write!is all you need.There are tools like
yqthat work just like belovedjq, and nushell supports YAML as well, so I suggest this PR might be able to close #1765.