Commit 6c80f01

feat: modularise MIME detection and document extension workflow
- replace the monolithic detector with a detection context, pipeline coordinator, and category-specific signature detectors that cache and return rich match objects
- harden the façade with guarded file handling, hash helpers, base64 URI fallbacks, and repository-driven MIME/extension lookups backed by a bi-directional map
- refresh the README with a clearer quick start, lookup examples, and a step-by-step guide for registering custom detectors and mappings
- add focused exception factories and byte-cache validation to signal unreadable files and support exhaustive test coverage
- add support for ~70 new file types
1 parent ae20459 commit 6c80f01

46 files changed: +5065 -1138 lines changed

.github/workflows/Test.yml

Lines changed: 4 additions & 4 deletions
@@ -20,13 +20,13 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        php: [ '8.1', '8.2', '8.3', '8.4' ]
-    continue-on-error: ${{ matrix.php == '8.4' }}
+        php: [ '8.1', '8.2', '8.3', '8.4', '8.5' ]
+    continue-on-error: ${{ matrix.php == '8.5' }}
     name: PHP ${{ matrix.php }} Test
 
     steps:
       - name: Git checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
 
       - name: Checkout submodules
         run: git submodule update --init --recursive
@@ -52,6 +52,6 @@ jobs:
         run: composer test -- --coverage-clover=coverage.xml
 
       - name: Run codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v5
         with:
           token: ${{ secrets.CODECOV_TOKEN }}

README.md

Lines changed: 161 additions & 56 deletions
@@ -1,97 +1,202 @@
 # PHP Mime Detector
 
-Detecting the real type of (binary) files doesn't have to be difficult. Checking a file's extension is not reliable and can lead to serious security issues.
+A modern, extensible MIME type detector for PHP that analyses the actual bytes
+of a file instead of trusting its extension. The detector ships with a modular
+pipeline of signature matchers and a bidirectional repository of MIME type ↔
+extension mappings so you can integrate it into security-sensitive workflows.
+
+## Features
+
+- **Real file inspection** – identifies file formats by signature instead of
+  relying on filenames. You can find a list of supported file formats in the
+  [Wiki](https://github.com/SoftCreatR/php-mime-detector/wiki/Supported-file-types).
+- **Composable architecture** – category-specific detectors can be swapped in
+  or extended without touching the core.
+- **Rich lookup helpers** – translate between MIME types and extensions in both
+  directions and enumerate the supported catalogue.
+- **No external dependencies** – runs on any PHP 8.1+ installation without
+  requiring additional extensions or packages, but takes advantage
+  of `ZipArchive` when it is available.
 
-This package helps you determine the correct type of files by reading them byte by byte (up to 4096 bytes) and checking for [magic numbers](http://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files).
+## Installation
 
-However, this package isn't a replacement for any security software. It simply aims to produce fewer false positives than a simple extension check would.
+Install the package via Composer:
 
-A list of supported file types can be found on [this Wiki page](https://github.com/SoftCreatR/php-mime-detector/wiki/Supported-file-types).
+```bash
+composer require softcreatr/php-mime-detector
+```
 
-## Why a Separate Class?
+## Quick start
 
-You may wonder why we don't just rely on extensions like [Fileinfo](https://www.php.net/manual/en/book.fileinfo.php). Here's a brief background:
+Detecting the MIME type and the preferred extension for a file is as simple as
+instantiating the façade and calling its helpers:
 
-We develop extensions and applications for an Open Source PHP Framework, creating web software for the masses. Many of our customers and users of our free products are on shared hosting without any ability to install or manage PHP extensions. Therefore, our goal is to develop solutions with minimal dependencies while providing as much functionality as possible.
+```php
+<?php
 
-When developing a solution that allows people to convert HEIF/HEIC files to a more "standardized" format (using our own external API), we had trouble detecting these files because this format isn't widely recognized by most web servers. Since checking the file extension isn't reliable, we needed to find a reusable solution that works for most of our clients. This led to the creation of our Mime Detector, based on magic number checks.
+use SoftCreatR\MimeDetector\MimeDetector;
+use SoftCreatR\MimeDetector\MimeDetectorException;
 
-## Requirements
+require 'vendor/autoload.php';
 
-- PHP 8.1 or newer
-- [Composer](https://getcomposer.org)
+try {
+    $detector = new MimeDetector(__DIR__ . '/example.png');
+
+    echo $detector->getMimeType(); // image/png
+    echo $detector->getFileExtension(); // png
+    echo $detector->getFileHash(); // crc32 hash of the file contents
+} catch (MimeDetectorException $exception) {
+    // React to unreadable files or unsupported formats.
+    echo $exception->getMessage();
+}
+```
 
-## Installation
+## Resolving MIME types and extensions
 
-Install this package using [Composer](https://getcomposer.org/) in the root directory of your project:
+The façade exposes several lookup helpers that do not require a file scan. They
+operate on the shared repository of known mappings:
 
-```bash
-composer require softcreatr/php-mime-detector
+```php
+$detector = new MimeDetector(__DIR__ . '/example.png');
+
+// Retrieve the canonical extension for a MIME type.
+$extension = $detector->getExtensionForMimeType('image/jpeg'); // "jpg"
+
+// List every MIME type that corresponds to the given extension.
+$mimeTypes = $detector->getMimeTypesForExtension('heic');
+
+// Fetch the complete map as [mimeType => list of extensions].
+$catalogue = $detector->listAllMimeTypes();
 ```
 
-## Usage
+Need a data URI? The detector will encode the configured file for you:
+
+```php
+$dataUri = $detector->getBase64DataURI();
+// ...
+```
+
+## Optional ZipArchive support
+
+The detector is fully functional without PHP's `ZipArchive` extension; all ZIP
+signatures are recognised by scanning the first 4 KiB of the file for well-known
+markers such as `mimetype`, `[Content_Types].xml`, or `classes.dex`. When the
+extension is present, the `ZipSignatureDetector` opens the archive and inspects
+its entries directly. This deeper look allows the detector to resolve format
+families like OOXML (`.docx`, `.pptx`, `.xlsx`), APK/JAR/XPI bundles, and other
+ZIP-based containers even when their identifying files live deeper inside the
+archive than the cached bytes.
+
+If the extension is missing, the detector simply falls back to its heuristic
+path and ultimately reports a generic `application/zip` match whenever a more
+specific signature cannot be derived. Unit tests that require `ZipArchive` are
+skipped automatically when the class is not available, so no additional setup is
+needed to run the suite.
+
+## Extending the detector
+
+Custom formats can be added without modifying the library itself. Follow these
+steps to teach the detector about a new signature and MIME mapping.
 
-Here is an example of how this package makes it easy to determine the MIME type and corresponding file extension of a given file:
+### 1. Implement a signature detector
+
+Create a class that implements
+`SoftCreatR\MimeDetector\Contract\FileSignatureDetectorInterface`. The detector
+receives the `DetectionContext`, which gives access to the file buffer and lets
+you return a `MimeTypeMatch` when the signature is recognised.
 
 ```php
 <?php
 
-use SoftCreatR\MimeDetector\MimeDetector;
-use SoftCreatR\MimeDetector\MimeDetectorException;
+namespace App\MimeDetector;
 
-require 'vendor/autoload.php';
+use SoftCreatR\MimeDetector\Attribute\DetectorCategory;
+use SoftCreatR\MimeDetector\Contract\FileSignatureDetectorInterface;
+use SoftCreatR\MimeDetector\Detection\DetectionContext;
+use SoftCreatR\MimeDetector\Detection\MimeTypeMatch;
 
-try {
-    // Create an instance of MimeDetector with the file path
-    $mimeDetector = new MimeDetector('foo.bar');
-
-    // Get the MIME type and file extension
-    $fileData = [
-        'mime_type' => $mimeDetector->getMimeType(),
-        'file_extension' => $mimeDetector->getFileExtension(),
-        'file_hash' => $mimeDetector->getFileHash(),
-    ];
-
-    // Print the result
-    echo '<pre>' . print_r($fileData, true) . '</pre>';
-} catch (MimeDetectorException $e) {
-    die('An error occurred while trying to load the given file: ' . $e->getMessage());
+#[DetectorCategory('custom')]
+final class CustomContainerDetector implements FileSignatureDetectorInterface
+{
+    public function detect(DetectionContext $context): ?MimeTypeMatch
+    {
+        $buffer = $context->buffer();
+
+        if ($buffer->checkForBytes([0x43, 0x55, 0x53, 0x54])) { // "CUST"
+            return new MimeTypeMatch('custom', 'application/x-custom');
+        }
+
+        return null;
+    }
 }
 ```
 
-## Testing
+### 2. Register MIME mappings
 
-This project uses PHPUnit for testing. To run tests, use the following command:
+Extend the repository so your MIME type resolves to the expected extension(s):
 
-```bash
-composer test
+```php
+use SoftCreatR\MimeDetector\MimeTypeRepository;
+
+$repository = MimeTypeRepository::createDefault();
+$repository->register('custom', 'application/x-custom');
 ```
 
-To run a full test suite, you can use a set of provided test files. These files are not included in the Composer package or Git repository, so you must clone this repository and initialize its submodules:
+### 3. Compose a detector pipeline
 
-```bash
-git clone https://github.com/SoftCreatR/php-mime-detector
-cd php-mime-detector
-git submodule update --init --recursive
+Finally, plug your detector into the default pipeline and hand both pieces to
+the façade. Placing your detector first ensures it runs before the built-in
+matchers:
+
+```php
+use SoftCreatR\MimeDetector\Detection\DetectorPipeline;
+use SoftCreatR\MimeDetector\Detector\ArchiveSignatureDetector;
+use SoftCreatR\MimeDetector\Detector\DocumentSignatureDetector;
+use SoftCreatR\MimeDetector\Detector\ExecutableSignatureDetector;
+use SoftCreatR\MimeDetector\Detector\FontSignatureDetector;
+use SoftCreatR\MimeDetector\Detector\ImageSignatureDetector;
+use SoftCreatR\MimeDetector\Detector\MediaSignatureDetector;
+use SoftCreatR\MimeDetector\Detector\MiscSignatureDetector;
+use SoftCreatR\MimeDetector\Detector\XmlSignatureDetector;
+use SoftCreatR\MimeDetector\Detector\ZipSignatureDetector;
+use SoftCreatR\MimeDetector\MimeDetector;
+
+$pipeline = DetectorPipeline::create(
+    new CustomContainerDetector(),
+    new ImageSignatureDetector(),
+    new ZipSignatureDetector(),
+    new ArchiveSignatureDetector(),
+    new MediaSignatureDetector(),
+    new DocumentSignatureDetector(),
+    new FontSignatureDetector(),
+    new ExecutableSignatureDetector(),
+    new MiscSignatureDetector(),
+    new XmlSignatureDetector(),
+);
+
+$detector = new MimeDetector(__DIR__ . '/file.cust', $repository, $pipeline);
 ```
 
-After that, install the necessary dependencies with `composer install`, and run PHPUnit as described above.
+From this point the new MIME type behaves exactly like the built-in ones – it
+can be detected from files, resolved by MIME type, and listed in the catalogue.
 
-## ToDo
+## Testing
 
-- Reduce method sizes where possible.
-- Add a method that accepts a MIME type and returns the corresponding file extension.
-- Add a method that accepts a file extension and returns a list of corresponding MIME types.
-- Add a method that returns a list of all detectable MIME types and their corresponding file extensions.
+Fixture files for the test suite are stored in a Git submodule. After cloning
+this repository run:
 
-## Contributing
+```bash
+git submodule update --init --recursive
+composer install
+composer test
+```
 
-Please see [CONTRIBUTING](CONTRIBUTING.md) for details.
+## Contributing
 
-When adding new detections, please provide at least one sample file to ensure the detection works as expected.
+We welcome pull requests! Please review [CONTRIBUTING](CONTRIBUTING.md) for the
+coding standards and workflow. When adding new detections, include at least one
+fixture so behaviour can be verified automatically.
 
 ## License
 
-[ISC License](LICENSE.md)
-
-Free Software, Hell Yeah!
+Released under the [ISC License](LICENSE.md).
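
Reviewer note: the README refers to a bidirectional repository of MIME type ↔ extension mappings without showing how such a structure might work. The sketch below is only an illustration of the idea, not the library's `MimeTypeRepository`; the class `BidirectionalMimeMap` and its methods `extensionFor()`, `mimeTypesFor()`, and `all()` are hypothetical names. Only the `register(extension, mimeType)` argument order is taken from the README example.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration only; this is NOT the library's MimeTypeRepository.
final class BidirectionalMimeMap
{
    /** @var array<string, list<string>> MIME type => extensions, first one is canonical */
    private array $extensionsByMime = [];

    /** @var array<string, list<string>> extension => MIME types */
    private array $mimesByExtension = [];

    public function register(string $extension, string $mimeType): void
    {
        $extension = strtolower($extension);
        $mimeType = strtolower($mimeType);

        // Keep both directions in sync and avoid duplicate entries.
        if (!in_array($extension, $this->extensionsByMime[$mimeType] ?? [], true)) {
            $this->extensionsByMime[$mimeType][] = $extension;
        }
        if (!in_array($mimeType, $this->mimesByExtension[$extension] ?? [], true)) {
            $this->mimesByExtension[$extension][] = $mimeType;
        }
    }

    /** Returns the first registered extension for a MIME type, or null if unknown. */
    public function extensionFor(string $mimeType): ?string
    {
        return $this->extensionsByMime[strtolower($mimeType)][0] ?? null;
    }

    /** @return list<string> every MIME type registered for the extension */
    public function mimeTypesFor(string $extension): array
    {
        return $this->mimesByExtension[strtolower($extension)] ?? [];
    }

    /** @return array<string, list<string>> complete map as [mimeType => extensions] */
    public function all(): array
    {
        return $this->extensionsByMime;
    }
}

$map = new BidirectionalMimeMap();
$map->register('jpg', 'image/jpeg');
$map->register('jpeg', 'image/jpeg');
$map->register('custom', 'application/x-custom');

echo $map->extensionFor('image/jpeg'), PHP_EOL; // jpg
```

Storing both directions costs a second array but keeps each lookup a single key access; treating the first registered extension as canonical is a design assumption of this sketch.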

composer.json

Lines changed: 21 additions & 19 deletions
@@ -1,15 +1,15 @@
 {
     "name": "softcreatr/php-mime-detector",
-    "description": "Mime Detector Decoder",
+    "description": "Modern, extensible MIME type detector for PHP",
+    "license": "ISC",
     "keywords": [
         "Mime",
         "Mime Type",
         "File Type",
-        "Magic Number"
+        "Magic Number",
+        "Detector",
+        "Extension Lookup"
     ],
-    "homepage": "https://1-2.dev",
-    "version": "4.0.0",
-    "license": "ISC",
     "authors": [
         {
             "name": "Sascha Greuel",
@@ -20,19 +20,23 @@
             "email": "[email protected]"
         }
     ],
+    "homepage": "https://1-2.dev",
+    "support": {
+        "email": "[email protected]",
+        "issues": "https://github.com/SoftCreatR/php-mime-detector/issues",
+        "source": "https://github.com/SoftCreatR/php-mime-detector"
+    },
     "require": {
-        "php": ">=8.1.0"
+        "php": ">=8.1.0",
+        "ext-ctype": "*"
     },
     "require-dev": {
         "phpunit/phpunit": ">=10.0",
         "roave/security-advisories": "dev-latest",
-        "squizlabs/php_codesniffer": "^3.7"
-    },
-    "config": {
-        "optimize-autoloader": true,
-        "preferred-install": "dist",
-        "sort-packages": true
+        "squizlabs/php_codesniffer": "^3.10"
     },
+    "minimum-stability": "stable",
+    "prefer-stable": true,
     "autoload": {
         "psr-4": {
             "SoftCreatR\\MimeDetector\\": "src/SoftCreatR/MimeDetector"
@@ -43,16 +47,14 @@
             "SoftCreatR\\Tests\\MimeDetector\\": "tests/SoftCreatR/MimeDetector"
         }
     },
-    "minimum-stability": "stable",
-    "prefer-stable": true,
+    "config": {
+        "optimize-autoloader": true,
+        "preferred-install": "dist",
+        "sort-packages": true
+    },
     "scripts": {
         "cs-check": "phpcs",
         "cs-fix": "phpcbf",
         "test": "phpunit"
-    },
-    "support": {
-        "email": "[email protected]",
-        "issues": "https://github.com/SoftCreatR/php-mime-detector/issues",
-        "source": "https://github.com/SoftCreatR/php-mime-detector"
     }
 }
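
Reviewer note: `composer.json` now requires `ext-ctype`, while the README describes `ZipArchive` as optional, with a fallback that scans the first 4 KiB for markers such as `mimetype`, `[Content_Types].xml`, or `classes.dex`. The sketch below shows roughly what such a runtime check and marker scan could look like. It is an assumption-laden illustration, not the code of `ZipSignatureDetector`: the function `guessZipFlavour()` and its return labels are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): guesses a ZIP-based format
 * from the leading bytes of a file only, without opening the archive.
 */
function guessZipFlavour(string $head): ?string
{
    // Every regular ZIP archive starts with the local file header "PK\x03\x04".
    if (!str_starts_with($head, "PK\x03\x04")) {
        return null;
    }

    // Only the first 4 KiB are inspected, mirroring the README's description.
    $head = substr($head, 0, 4096);

    // Marker => illustrative label; the labels are assumptions of this sketch.
    $markers = [
        '[Content_Types].xml' => 'ooxml',
        'classes.dex' => 'android',
        'mimetype' => 'mimetype-declared',
    ];

    foreach ($markers as $marker => $label) {
        if (str_contains($head, $marker)) {
            return $label;
        }
    }

    // No specific marker found: report a generic ZIP.
    return 'zip';
}

// Runtime availability checks for the required and the optional extension.
$hasCtype = extension_loaded('ctype');
$hasZipArchive = class_exists(\ZipArchive::class);

// A fake header: ZIP magic, 26 padding bytes, then an entry name.
$sample = "PK\x03\x04" . str_repeat("\0", 26) . 'classes.dex';

echo guessZipFlavour($sample), PHP_EOL; // android
```

A marker scan like this can only see entries whose names fall inside the inspected bytes, which is why the README says that opening the archive with `ZipArchive` gives better results when the extension is available.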
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+<?php
+
+declare(strict_types=1);
+
+namespace SoftCreatR\MimeDetector\Attribute;
+
+use Attribute;
+
+/**
+ * Attribute to group signature detectors by a logical category.
+ *
+ * The category metadata can be used by consumers to build reporting or
+ * filtering logic around the shipped signature detectors.
+ */
+#[Attribute(Attribute::TARGET_CLASS)]
+final class DetectorCategory
+{
+    public function __construct(public readonly string $name)
+    {
+        // ...
+    }
+}
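
Reviewer note: the docblock says consumers can use the category metadata for reporting or filtering, but the diff does not show how it is read. One way to do that is PHP's reflection API, sketched below. To keep the example self-contained it declares stand-in classes in the global namespace: `DetectorCategory` mirrors the attribute from this commit, while `CustomContainerDetector` (reduced to an empty class) and the helper `categoryOf()` are hypothetical.

```php
<?php

declare(strict_types=1);

// Stand-in mirroring the attribute added in this commit (global namespace for brevity).
#[Attribute(Attribute::TARGET_CLASS)]
final class DetectorCategory
{
    public function __construct(public readonly string $name)
    {
    }
}

// Hypothetical detector stub; a real one would implement the detector interface.
#[DetectorCategory('custom')]
final class CustomContainerDetector
{
}

/**
 * Hypothetical helper: returns the category declared on a class, or null when
 * the class carries no DetectorCategory attribute.
 */
function categoryOf(string $class): ?string
{
    $attributes = (new ReflectionClass($class))->getAttributes(DetectorCategory::class);

    // newInstance() builds the attribute object from the arguments given in #[...].
    return $attributes === [] ? null : $attributes[0]->newInstance()->name;
}

echo categoryOf(CustomContainerDetector::class), PHP_EOL; // custom
```

Because the attribute targets classes only, a consumer could group a list of detector class names by `categoryOf()` without instantiating any detector.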
