Support optimized fetching and caching

I just started trying out `protofetch` on a recommendation. The first thing I noticed is how slow it can be depending on the source (currently only git is supported). I ran into this when setting up the following dependency:

```toml
[grpc_health_v1]
url = "github.com/grpc/grpc"
revision = "b8a04acbbf18fd1c805e5d53d62ed9fa4721a4d1" # v1.64.0
protocol = "https"
allow_policies = ["src/proto/grpc/health/v1/*"]
```

The `grpc/health/v1/health.proto` file is just 2416 B, but protofetch appears to have mirrored the entire repo into `~/.cache/protofetch/github.com/grpc/grpc`, which takes up 416 MB and took about a minute to be ready. Performance depends on the machine and network, of course; I'm using an M2 Mac. For a sense of my network performance, here is the output of a shallow git clone I ran myself:

```shell
% git clone --depth=1 https://github.com/grpc/grpc
Cloning into 'grpc'...
remote: Enumerating objects: 13476, done.
remote: Counting objects: 100% (13476/13476), done.
remote: Compressing objects: 100% (8198/8198), done.
remote: Total 13476 (delta 4629), reused 10048 (delta 3865), pack-reused 0
Receiving objects: 100% (13476/13476), 19.37 MiB | 10.66 MiB/s, done.
Resolving deltas: 100% (4629/4629), done.
Updating files: 100% (12308/12308), done.
```

The shallow clone takes up less space, just 178 MB.

So my thought was: even if a repo mirror can serve multiple versions of different deps from the same source, would it really beat, in practice, the efficiency of a "shallow" fetch that strips out everything but the proto files? The result could perhaps even be wrapped in a `.tar.gz` that is streamed and decoded in memory when needed. I'd think actual git mirrors or clones would only be necessary if fetching git submodules were supported.
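To make the "stream a `.tar.gz` and keep only the proto files" idea concrete, here is a rough Python sketch. It is not how protofetch works today; `extract_matching` and `make_demo_archive` are hypothetical names, the archive is built in memory from made-up sample files, and the glob matching only approximates the `allow_policies` pattern from the example above.

```python
import fnmatch
import io
import tarfile


def extract_matching(archive_bytes, patterns, strip_components=1):
    """Return {path: content} for archive members matching any glob pattern.

    strip_components=1 drops the top-level directory that source archives
    usually wrap everything in (assumed here; adjust if the archive is flat).
    """
    kept = {}
    # "r|gz" reads the archive as a forward-only stream: nothing is written
    # to disk and each member is visited once, in order.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            path = "/".join(parts)
            # Note: fnmatch's "*" also matches "/", so a pattern ending in
            # "/*" keeps nested files too. A real policy matcher may differ.
            if any(fnmatch.fnmatchcase(path, p) for p in patterns):
                kept[path] = tar.extractfile(member).read()
    return kept


def make_demo_archive():
    """Build a tiny stand-in archive in memory (sample data, not real gRPC files)."""
    files = {
        "grpc-demo/src/proto/grpc/health/v1/health.proto": b'syntax = "proto3";\n',
        "grpc-demo/src/core/lib/big_source_file.cc": b"// unrelated\n",
        "grpc-demo/README.md": b"# demo\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    protos = extract_matching(make_demo_archive(), ["src/proto/grpc/health/v1/*"])
    for path, content in protos.items():
        print(path, len(content), "bytes")
```

Only the matching files ever need to be kept, so the cache cost would scale with the size of the protos rather than the size of the repo.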

It shouldn't be the user's fault that the proto repo is too large.

I also noticed that `revision` can be a tag or a hash. IMO it should be possible to specify both, with the hash used to confirm the tag when both are provided. Git tags are not constants, and specifying both would serve as functional documentation, rather than the manual code comment I'm using now in the example above.
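As a sketch of what "use the hash to confirm the tag" could mean, the Python below parses text in the format printed by `git ls-remote --tags` and checks that a tag resolves to the expected commit. The function names are hypothetical, and the sample listing is made up for illustration: the commit hash is the one from my config above, while the tag-object hash is a placeholder and I haven't checked whether the real tag is annotated.

```python
def resolve_tag(ls_remote_output, tag):
    """Return the commit a tag points to, given `git ls-remote --tags` text.

    Annotated tags appear twice: once as the tag object and once as a
    peeled "^{}" entry naming the commit. The peeled entry wins if present.
    """
    direct = None
    peeled = None
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        ref = ref.strip()
        if ref == f"refs/tags/{tag}":
            direct = sha
        elif ref == f"refs/tags/{tag}^{{}}":
            peeled = sha
    return peeled or direct


def check_pin(ls_remote_output, tag, expected_commit):
    """Raise if the tag is missing or no longer points at expected_commit."""
    actual = resolve_tag(ls_remote_output, tag)
    if actual is None:
        raise LookupError(f"tag {tag!r} not found on remote")
    if actual != expected_commit:
        raise ValueError(
            f"tag {tag!r} points at {actual}, expected {expected_commit}"
        )
    return actual


# Made-up listing in ls-remote format (placeholder tag-object hash).
SAMPLE_LS_REMOTE = (
    "1111111111111111111111111111111111111111\trefs/tags/v1.64.0\n"
    "b8a04acbbf18fd1c805e5d53d62ed9fa4721a4d1\trefs/tags/v1.64.0^{}\n"
)

if __name__ == "__main__":
    commit = check_pin(
        SAMPLE_LS_REMOTE, "v1.64.0", "b8a04acbbf18fd1c805e5d53d62ed9fa4721a4d1"
    )
    print("tag confirmed at", commit)
```

With a check like this, a moved or re-created tag would fail loudly instead of silently changing what gets fetched.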

In any case, if there is any appetite for a potentially breaking config change in the future, I think it would be great to support other fetch types as well, such as plain HTTP (tarball) with optional sha256 checks, even though [the hash of a source, such as a git source archive, may not be guaranteed to stay stable long term on some platforms](https://github.blog/2023-02-21-update-on-the-future-stability-of-source-code-archives-and-hashes/).
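For the optional sha256 check, something along these lines is what I have in mind, sketched in Python with only the standard library. The function names are hypothetical and the download itself is left out; any file-like object opened in binary mode would do as the stream.

```python
import hashlib
import hmac
import io


def sha256_of_stream(stream, chunk_size=64 * 1024):
    """Hash a binary stream in chunks so the tarball never sits fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(stream, expected_hex):
    """Return the digest if it matches expected_hex, otherwise raise ValueError."""
    actual = sha256_of_stream(stream)
    # compare_digest is a constant-time comparison; plain == would also work
    # for a simple integrity check.
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(f"sha256 mismatch: got {actual}, expected {expected_hex}")
    return actual


if __name__ == "__main__":
    payload = b"stand-in for downloaded tarball bytes"  # sample data
    pinned = hashlib.sha256(payload).hexdigest()  # would come from the config
    verify_sha256(io.BytesIO(payload), pinned)
    print("checksum ok")
```

Keeping the check optional matters because, per the linked post, hashes of auto-generated source archives may change over time.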

Support optimized fetching and caching #137

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support optimized fetching and caching #137

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions