|
1 | 1 | # PHP Mime Detector |
2 | 2 |
|
3 | | -Detecting the real type of (binary) files doesn't have to be difficult. Checking a file's extension is not reliable and can lead to serious security issues. |
| 3 | +A modern, extensible MIME type detector for PHP that analyses the actual bytes |
| 4 | +of a file instead of trusting its extension. The detector ships with a modular |
| 5 | +pipeline of signature matchers and a bidirectional repository of MIME type ↔ |
| 6 | +extension mappings so you can integrate it into security-sensitive workflows. |
| 7 | + |
| 8 | +## Features |
| 9 | + |
| 10 | +- **Real file inspection** – identifies file formats by signature instead of |
| 11 | + relying on filenames. You can find a list of supported file formats in the |
| 12 | + [Wiki](https://github.com/SoftCreatR/php-mime-detector/wiki/Supported-file-types). |
| 13 | +- **Composable architecture** – category-specific detectors can be swapped in |
| 14 | + or extended without touching the core. |
| 15 | +- **Rich lookup helpers** – translate between MIME types and extensions in both |
| 16 | + directions and enumerate the supported catalogue. |
| 17 | +- **No external dependencies** – runs on any PHP 8.1+ installation without
| 18 | + requiring additional extensions or packages, and takes advantage of
| 19 | + `ZipArchive` when it is available.
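To illustrate what "identifying a format by signature" means, here is a minimal, self-contained sketch in plain PHP. It is not the library's code: `startsWithBytes()` is a hypothetical helper written for this example, and it only compares the leading bytes of a string against the well-known eight-byte PNG signature.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): returns true when $data
 * begins with the given sequence of byte values.
 *
 * @param list<int> $signature
 */
function startsWithBytes(string $data, array $signature): bool
{
    // Turn the list of byte values into a binary string prefix.
    $prefix = implode('', array_map('chr', $signature));

    return str_starts_with($data, $prefix);
}

// The eight-byte PNG signature: 0x89 "PNG" CR LF 0x1A LF.
$pngSignature = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];

// Contents that are PNG data, whatever the file happens to be called.
$pngContents = "\x89PNG\r\n\x1a\n" . 'rest of the image data';

var_dump(startsWithBytes($pngContents, $pngSignature)); // bool(true)
var_dump(startsWithBytes('%PDF-1.7', $pngSignature));   // bool(false)
```

The filename never enters the check, which is why a signature match is harder to spoof than an extension.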
4 | 20 |
|
5 | | -This package helps you determine the correct type of files by reading them byte by byte (up to 4096 bytes) and checking for [magic numbers](http://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files). |
| 21 | +## Installation |
6 | 22 |
|
7 | | -However, this package isn't a replacement for any security software. It simply aims to produce fewer false positives than a simple extension check would. |
| 23 | +Install the package via Composer: |
8 | 24 |
|
9 | | -A list of supported file types can be found on [this Wiki page](https://github.com/SoftCreatR/php-mime-detector/wiki/Supported-file-types). |
| 25 | +```bash |
| 26 | +composer require softcreatr/php-mime-detector |
| 27 | +``` |
10 | 28 |
|
11 | | -## Why a Separate Class? |
| 29 | +## Quick start |
12 | 30 |
|
13 | | -You may wonder why we don't just rely on extensions like [Fileinfo](https://www.php.net/manual/en/book.fileinfo.php). Here's a brief background: |
| 31 | +Detecting the MIME type and the preferred extension for a file is as simple as |
| 32 | +instantiating the façade and calling its helpers: |
14 | 33 |
|
15 | | -We develop extensions and applications for an Open Source PHP Framework, creating web software for the masses. Many of our customers and users of our free products are on shared hosting without any ability to install or manage PHP extensions. Therefore, our goal is to develop solutions with minimal dependencies while providing as much functionality as possible. |
| 34 | +```php |
| 35 | +<?php |
16 | 36 |
|
17 | | -When developing a solution that allows people to convert HEIF/HEIC files to a more "standardized" format (using our own external API), we had trouble detecting these files because this format isn't widely recognized by most web servers. Since checking the file extension isn't reliable, we needed to find a reusable solution that works for most of our clients. This led to the creation of our Mime Detector, based on magic number checks. |
| 37 | +use SoftCreatR\MimeDetector\MimeDetector; |
| 38 | +use SoftCreatR\MimeDetector\MimeDetectorException; |
18 | 39 |
|
19 | | -## Requirements |
| 40 | +require 'vendor/autoload.php'; |
20 | 41 |
|
21 | | -- PHP 8.1 or newer |
22 | | -- [Composer](https://getcomposer.org) |
| 42 | +try { |
| 43 | + $detector = new MimeDetector(__DIR__ . '/example.png'); |
| 44 | + |
| 45 | + echo $detector->getMimeType(); // image/png |
| 46 | + echo $detector->getFileExtension(); // png |
| 47 | + echo $detector->getFileHash(); // crc32 hash of the file contents |
| 48 | +} catch (MimeDetectorException $exception) { |
| 49 | + // React to unreadable files or unsupported formats. |
| 50 | + echo $exception->getMessage(); |
| 51 | +} |
| 52 | +``` |
23 | 53 |
|
24 | | -## Installation |
| 54 | +## Resolving MIME types and extensions |
25 | 55 |
|
26 | | -Install this package using [Composer](https://getcomposer.org/) in the root directory of your project: |
| 56 | +The façade exposes several lookup helpers that do not require a file scan. They |
| 57 | +operate on the shared repository of known mappings: |
27 | 58 |
|
28 | | -```bash |
29 | | -composer require softcreatr/php-mime-detector |
| 59 | +```php |
| 60 | +$detector = new MimeDetector(__DIR__ . '/example.png'); |
| 61 | + |
| 62 | +// Retrieve the canonical extension for a MIME type. |
| 63 | +$extension = $detector->getExtensionForMimeType('image/jpeg'); // "jpg" |
| 64 | + |
| 65 | +// List every MIME type that corresponds to the given extension. |
| 66 | +$mimeTypes = $detector->getMimeTypesForExtension('heic'); |
| 67 | + |
| 68 | +// Fetch the complete map as [mimeType => list of extensions]. |
| 69 | +$catalogue = $detector->listAllMimeTypes(); |
30 | 70 | ``` |
31 | 71 |
|
32 | | -## Usage |
| 72 | +Need a data URI? The detector will encode the configured file for you: |
| 73 | + |
| 74 | +```php |
| 75 | +$dataUri = $detector->getBase64DataURI(); |
| 76 | +// ... |
| 77 | +``` |
| 78 | + |
| 79 | +## Optional ZipArchive support |
| 80 | + |
| 81 | +The detector is fully functional without PHP's `ZipArchive` extension; all ZIP |
| 82 | +signatures are recognised by scanning the first 4 KiB of the file for well-known |
| 83 | +markers such as `mimetype`, `[Content_Types].xml`, or `classes.dex`. When the |
| 84 | +extension is present, the `ZipSignatureDetector` opens the archive and inspects |
| 85 | +its entries directly. This deeper look allows the detector to resolve format |
| 86 | +families like OOXML (`.docx`, `.pptx`, `.xlsx`), APK/JAR/XPI bundles, and other |
| 87 | +ZIP-based containers even when their identifying files live deeper inside the |
| 88 | +archive than the cached bytes. |
| 89 | + |
| 90 | +If the extension is missing, the detector simply falls back to its heuristic |
| 91 | +path and ultimately reports a generic `application/zip` match whenever a more |
| 92 | +specific signature cannot be derived. Unit tests that require `ZipArchive` are |
| 93 | +skipped automatically when the class is not available, so no additional setup is |
| 94 | +needed to run the suite. |
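The heuristic path described above can be pictured with a rough sketch in plain PHP. This is an illustration under stated assumptions, not the library's implementation: `guessZipFlavour()` and the labels it returns are hypothetical. It relies on the fact that ZIP entry names are stored uncompressed in each local file header, so a marker near the start of the archive is visible in the first bytes.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch (not the library's code): guess the flavour of a ZIP
 * container from the leading bytes of a file. The returned labels are
 * illustrative only.
 */
function guessZipFlavour(string $head): ?string
{
    // "PK\x03\x04" is the local file header signature of a ZIP archive.
    if (!str_starts_with($head, "PK\x03\x04")) {
        return null;
    }

    // Entry names that hint at a more specific container format.
    $markers = [
        '[Content_Types].xml' => 'ooxml',
        'classes.dex'         => 'android-package',
        'mimetype'            => 'mimetype-entry', // e.g. EPUB or OpenDocument
    ];

    foreach ($markers as $needle => $label) {
        if (str_contains($head, $needle)) {
            return $label;
        }
    }

    // No marker inside the inspected window: only the generic answer is left.
    return 'generic-zip';
}

// A fake local file header: signature, 26 zeroed header bytes, entry name.
$head = "PK\x03\x04" . str_repeat("\0", 26) . 'classes.dex';

var_dump(guessZipFlavour($head));                 // string(15) "android-package"
var_dump(guessZipFlavour("PK\x03\x04" . 'data')); // string(11) "generic-zip"
var_dump(guessZipFlavour('plain text'));          // NULL
```

For a real file, the window could be read with `file_get_contents($path, false, null, 0, 4096)`. A marker that sits beyond that window is missed, which is the gap the `ZipArchive` path closes.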
| 95 | + |
| 96 | +## Extending the detector |
| 97 | + |
| 98 | +Custom formats can be added without modifying the library itself. Follow these |
| 99 | +steps to teach the detector about a new signature and MIME mapping. |
33 | 100 |
|
34 | | -Here is an example of how this package makes it easy to determine the MIME type and corresponding file extension of a given file: |
| 101 | +### 1. Implement a signature detector |
| 102 | + |
| 103 | +Create a class that implements
| 104 | +`SoftCreatR\MimeDetector\Contract\FileSignatureDetectorInterface`. Its `detect()`
| 105 | +method receives a `DetectionContext`, which exposes the file buffer, and returns
| 106 | +a `MimeTypeMatch` when the signature is recognised or `null` otherwise.
35 | 107 |
|
36 | 108 | ```php |
37 | 109 | <?php |
38 | 110 |
|
39 | | -use SoftCreatR\MimeDetector\MimeDetector; |
40 | | -use SoftCreatR\MimeDetector\MimeDetectorException; |
| 111 | +namespace App\MimeDetector; |
41 | 112 |
|
42 | | -require 'vendor/autoload.php'; |
| 113 | +use SoftCreatR\MimeDetector\Attribute\DetectorCategory; |
| 114 | +use SoftCreatR\MimeDetector\Contract\FileSignatureDetectorInterface; |
| 115 | +use SoftCreatR\MimeDetector\Detection\DetectionContext; |
| 116 | +use SoftCreatR\MimeDetector\Detection\MimeTypeMatch; |
43 | 117 |
|
44 | | -try { |
45 | | - // Create an instance of MimeDetector with the file path |
46 | | - $mimeDetector = new MimeDetector('foo.bar'); |
47 | | - |
48 | | - // Get the MIME type and file extension |
49 | | - $fileData = [ |
50 | | - 'mime_type' => $mimeDetector->getMimeType(), |
51 | | - 'file_extension' => $mimeDetector->getFileExtension(), |
52 | | - 'file_hash' => $mimeDetector->getFileHash(), |
53 | | - ]; |
54 | | - |
55 | | - // Print the result |
56 | | - echo '<pre>' . print_r($fileData, true) . '</pre>'; |
57 | | -} catch (MimeDetectorException $e) { |
58 | | - die('An error occurred while trying to load the given file: ' . $e->getMessage()); |
| 118 | +#[DetectorCategory('custom')] |
| 119 | +final class CustomContainerDetector implements FileSignatureDetectorInterface |
| 120 | +{ |
| 121 | + public function detect(DetectionContext $context): ?MimeTypeMatch |
| 122 | + { |
| 123 | + $buffer = $context->buffer(); |
| 124 | + |
| 125 | + if ($buffer->checkForBytes([0x43, 0x55, 0x53, 0x54])) { // "CUST" |
| 126 | + return new MimeTypeMatch('custom', 'application/x-custom'); |
| 127 | + } |
| 128 | + |
| 129 | + return null; |
| 130 | + } |
59 | 131 | } |
60 | 132 | ``` |
61 | 133 |
|
62 | | -## Testing |
| 134 | +### 2. Register MIME mappings |
63 | 135 |
|
64 | | -This project uses PHPUnit for testing. To run tests, use the following command: |
| 136 | +Extend the repository so your MIME type resolves to the expected extension(s): |
65 | 137 |
|
66 | | -```bash |
67 | | -composer test |
| 138 | +```php |
| 139 | +use SoftCreatR\MimeDetector\MimeTypeRepository; |
| 140 | + |
| 141 | +$repository = MimeTypeRepository::createDefault(); |
| 142 | +$repository->register('custom', 'application/x-custom'); |
68 | 143 | ``` |
69 | 144 |
|
70 | | -To run a full test suite, you can use a set of provided test files. These files are not included in the Composer package or Git repository, so you must clone this repository and initialize its submodules: |
| 145 | +### 3. Compose a detector pipeline |
71 | 146 |
|
72 | | -```bash |
73 | | -git clone https://github.com/SoftCreatR/php-mime-detector |
74 | | -cd php-mime-detector |
75 | | -git submodule update --init --recursive |
| 147 | +Finally, build a pipeline that includes your detector and pass it to the
| 148 | +façade together with the repository. Placing your detector first ensures it
| 149 | +runs before the built-in matchers:
| 150 | + |
| 151 | +```php |
| 152 | +use SoftCreatR\MimeDetector\Detection\DetectorPipeline; |
| 153 | +use SoftCreatR\MimeDetector\Detector\ArchiveSignatureDetector; |
| 154 | +use SoftCreatR\MimeDetector\Detector\DocumentSignatureDetector; |
| 155 | +use SoftCreatR\MimeDetector\Detector\ExecutableSignatureDetector; |
| 156 | +use SoftCreatR\MimeDetector\Detector\FontSignatureDetector; |
| 157 | +use SoftCreatR\MimeDetector\Detector\ImageSignatureDetector; |
| 158 | +use SoftCreatR\MimeDetector\Detector\MediaSignatureDetector; |
| 159 | +use SoftCreatR\MimeDetector\Detector\MiscSignatureDetector; |
| 160 | +use SoftCreatR\MimeDetector\Detector\XmlSignatureDetector; |
| 161 | +use SoftCreatR\MimeDetector\Detector\ZipSignatureDetector; |
| 162 | +use SoftCreatR\MimeDetector\MimeDetector; |
| 163 | + |
| 164 | +$pipeline = DetectorPipeline::create( |
| 165 | + new CustomContainerDetector(), |
| 166 | + new ImageSignatureDetector(), |
| 167 | + new ZipSignatureDetector(), |
| 168 | + new ArchiveSignatureDetector(), |
| 169 | + new MediaSignatureDetector(), |
| 170 | + new DocumentSignatureDetector(), |
| 171 | + new FontSignatureDetector(), |
| 172 | + new ExecutableSignatureDetector(), |
| 173 | + new MiscSignatureDetector(), |
| 174 | + new XmlSignatureDetector(), |
| 175 | +); |
| 176 | + |
| 177 | +$detector = new MimeDetector(__DIR__ . '/file.cust', $repository, $pipeline); |
76 | 178 | ``` |
77 | 179 |
|
78 | | -After that, install the necessary dependencies with `composer install`, and run PHPUnit as described above. |
| 180 | +From this point the new MIME type behaves exactly like the built-in ones – it |
| 181 | +can be detected from files, resolved by MIME type, and listed in the catalogue. |
79 | 182 |
|
80 | | -## ToDo |
| 183 | +## Testing |
81 | 184 |
|
82 | | -- Reduce method sizes where possible. |
83 | | -- Add a method that accepts a MIME type and returns the corresponding file extension. |
84 | | -- Add a method that accepts a file extension and returns a list of corresponding MIME types. |
85 | | -- Add a method that returns a list of all detectable MIME types and their corresponding file extensions. |
| 185 | +Fixture files for the test suite are stored in a Git submodule. After cloning |
| 186 | +this repository run: |
86 | 187 |
|
87 | | -## Contributing |
| 188 | +```bash |
| 189 | +git submodule update --init --recursive |
| 190 | +composer install |
| 191 | +composer test |
| 192 | +``` |
88 | 193 |
|
89 | | -Please see [CONTRIBUTING](CONTRIBUTING.md) for details. |
| 194 | +## Contributing |
90 | 195 |
|
91 | | -When adding new detections, please provide at least one sample file to ensure the detection works as expected. |
| 196 | +We welcome pull requests! Please review [CONTRIBUTING](CONTRIBUTING.md) for the |
| 197 | +coding standards and workflow. When adding new detections, include at least one |
| 198 | +fixture so behaviour can be verified automatically. |
92 | 199 |
|
93 | 200 | ## License |
94 | 201 |
|
95 | | -[ISC License](LICENSE.md) |
96 | | - |
97 | | -Free Software, Hell Yeah! |
| 202 | +Released under the [ISC License](LICENSE.md). |