I noticed that in term aggregations, doc_count isn't the actual number of documents that match the term but the number of times the term appears. For example, in this test, some documents contain "Hello Hello" in text_field, so the term appears twice in each of them and each of those documents is counted twice.
In Lucene, I think you can get the correct doc_count by deduplicating terms per document, thanks to SortedSetDocValues, which stores each document's values for a multi-valued field in sorted order. My understanding is that ES relies on this to report actual document counts.
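To make the difference concrete, here is a minimal sketch in plain Rust (not Tantivy's or Lucene's API) that contrasts the two counting strategies. It assumes each document's values have already been mapped to term ordinals and sorted, as a sorted-set style column would provide; the ordinals and the "Hello World" document are hypothetical. Because the ordinals are sorted, duplicates within a document are adjacent, so deduplication only needs a comparison with the previous value.

```rust
use std::collections::HashMap;

/// Counts one increment per term occurrence, so a term repeated
/// inside a single document inflates its count.
fn count_occurrences(docs: &[Vec<u64>]) -> HashMap<u64, u64> {
    let mut counts = HashMap::new();
    for ords in docs {
        for &ord in ords {
            *counts.entry(ord).or_insert(0) += 1;
        }
    }
    counts
}

/// Counts the number of documents containing each term.
/// Assumes each document's ordinals are sorted, so duplicates are
/// adjacent and can be skipped by comparing with the previous ordinal.
fn count_documents(docs: &[Vec<u64>]) -> HashMap<u64, u64> {
    let mut counts = HashMap::new();
    for ords in docs {
        let mut prev: Option<u64> = None;
        for &ord in ords {
            if prev != Some(ord) {
                *counts.entry(ord).or_insert(0) += 1;
                prev = Some(ord);
            }
        }
    }
    counts
}

fn main() {
    // Hypothetical ordinals: 0 = "hello", 1 = "world".
    // Doc 0 is "Hello Hello", doc 1 is "Hello World".
    let docs: Vec<Vec<u64>> = vec![vec![0, 0], vec![0, 1]];
    let occ = count_occurrences(&docs);
    let per_doc = count_documents(&docs);
    println!("occurrences of 'hello': {}", occ[&0u64]); // 3
    println!("documents with 'hello': {}", per_doc[&0u64]); // 2
}
```

The per-document count costs one extra comparison per value, which is why sorted per-document storage makes this kind of deduplication cheap.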
Could something similar be achieved in Tantivy, and do we want it? If not, I think we should document this behavior of doc_count to make it clear.