When managing a digital archive, prefer file formats that are likely to be usable in the future.
The usual evaluative criteria apply:
- Is the format documented and well understood?
- Are there multiple encoders and decoders for the format?
- Is there broad industry support?
- Is the format maintained by one entity or many?
- Does the format require specialized software or hardware?
Instead of using sidecar metadata files, metadata can be embedded directly in digital objects. This has advantages:
- Only one file needs to be maintained
- It reduces the chance of metadata being separated from the object it describes
It also has drawbacks:
- Sensitive information may need to be stripped when the item is used or exported
- Some file formats may not support embedded metadata, may not support all needed fields, or may limit what or how much metadata can be embedded
- Not all software can display or edit embedded metadata
- Fixity checksums must be updated whenever metadata is updated, which also makes it harder to verify that a digital object has not been altered since it was ingested
- Indexing the embedded metadata requires software that can parse every file format in the archive
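The fixity point above can be illustrated with a minimal Python sketch. It assumes SHA-256 as the fixity algorithm and treats any change to the file's bytes as a stand-in for an embedded-metadata edit. The names `sha256_of`, `fixity_ok`, and `demo` are hypothetical and do not come from any particular preservation tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at ingest."""
    return sha256_of(path) == recorded_digest


def demo() -> tuple[bool, bool]:
    """Hypothetical workflow: ingest a file, then edit it in place."""
    with tempfile.TemporaryDirectory() as tmp:
        item = Path(tmp) / "item.bin"
        item.write_bytes(b"payload bytes")
        recorded = sha256_of(item)  # checksum stored at ingest time
        before = fixity_ok(item, recorded)
        # Stand-in for an embedded-metadata edit: the file's bytes change.
        with item.open("ab") as handle:
            handle.write(b"title=Example")
        after = fixity_ok(item, recorded)
    return before, after


if __name__ == "__main__":
    print(demo())
```

In this sketch the check passes before the edit and fails after it, even though the payload itself is untouched. With a sidecar file, by contrast, the object's bytes and its recorded checksum would stay the same when the metadata changes.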
Depending on the recovery capabilities of a compressed bundle format, a single data error may render the entire bundle unreadable. If the contents were stored as individual files instead, an equivalent data error would likely affect only one of them.
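The following Python sketch illustrates the worst case described above, under stated assumptions. The bundle is a single gzip stream read all-or-nothing with `gzip.decompress`, and the data error is one flipped byte. The file names and the helpers `flip_byte`, `readable_from_bundle`, `readable_individually`, and `demo` are hypothetical. Real bundle formats and recovery tools may salvage part of a damaged bundle, so this is not a statement about every format.

```python
import gzip
import hashlib
import zlib


def flip_byte(data: bytes, offset: int) -> bytes:
    """Simulate a data error by inverting every bit of one byte."""
    damaged = bytearray(data)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


def readable_from_bundle(bundle: bytes, file_count: int) -> int:
    """All-or-nothing read: a decode or CRC failure loses every file."""
    try:
        gzip.decompress(bundle)
    except (OSError, EOFError, zlib.error):
        return 0
    return file_count


def readable_individually(files: dict[str, bytes], digests: dict[str, str]) -> int:
    """Count the files whose SHA-256 digest still matches the recorded one."""
    return sum(
        hashlib.sha256(content).hexdigest() == digests[name]
        for name, content in files.items()
    )


def demo() -> tuple[int, int]:
    # Five small illustrative files (hypothetical names and contents).
    files = {
        f"item{i}.txt": "".join(f"record {i} line {j}\n" for j in range(200)).encode()
        for i in range(5)
    }
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

    # One error in the middle of the compressed bundle.
    bundle = gzip.compress(b"".join(files.values()))
    damaged_bundle = flip_byte(bundle, len(bundle) // 2)

    # The same kind of error in just one of the individually stored files.
    damaged_files = dict(files)
    damaged_files["item2.txt"] = flip_byte(files["item2.txt"], 10)

    return (
        readable_from_bundle(damaged_bundle, len(files)),
        readable_individually(damaged_files, digests),
    )


if __name__ == "__main__":
    print(demo())
```

The first number returned by `demo` is how many files can still be read from the damaged bundle. The second is how many individually stored files still match their recorded checksums.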
- Edward M. Corrado and Heather Moulaison Sandy. Digital Preservation for Libraries, Archives, & Museums. 2017.