MRC format

The CCP4/MRC format is a data format well suited to storing uniformly sized matrices. Although it lacks integrated metadata storage, its simplicity, clarity and compatibility with crystallographic maps have made it very popular in the EM world. 2D data (micrographs, gain references, etc.), 3D data (movie stacks, particle stacks, maps and masks) and 3D+ data (for instance, volume stacks: many 3D maps of the same size) can all be stored in the MRC file format. This page explains some key elements of the MRC format and its relation to the CCP4 map format.

For programmers: MRC files should follow the MRC2014 specification. In particular, note that the magic word for MRC/CCP4 files (header word 53) is the 4-byte string 'MAP ' ('MAP' followed by a space, 0x4D 0x41 0x50 0x20), not just the 3 visible characters.
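A minimal sketch of the magic-word check, assuming the 1024-byte main header has already been read into a `bytes` object. Word 53 (1-based) starts at byte 4 x 52 = 208 (0-based), so the check compares bytes 208-211 against `b"MAP "` including the trailing space. The function names are hypothetical, not part of any library.

```python
MAGIC_OFFSET = 208  # word 53 (1-based) begins at byte 4 * 52 = 208 (0-based)
HEADER_SIZE = 1024


def has_map_magic(header: bytes) -> bool:
    """True if the 4 bytes of word 53 are exactly b'MAP ' (with the trailing space)."""
    if len(header) < HEADER_SIZE:
        return False
    return header[MAGIC_OFFSET:MAGIC_OFFSET + 4] == b"MAP "


def read_header(path: str) -> bytes:
    """Read the fixed 1024-byte main header of an MRC/CCP4 file."""
    with open(path, "rb") as fh:
        return fh.read(HEADER_SIZE)
```

Usage would be `has_map_magic(read_header("example.mrc"))`, where `example.mrc` is a placeholder file name. Comparing all 4 bytes rejects files that store 'MAP' followed by a null byte instead of a space.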

Background: maps and masks used in crystallography
Crystallographic maps are often saved with extensions such as .map, .ccp4, .ccp4map and so on. However, the file extension plays no role in whether a map file is interpreted correctly. What matters is that the file contains a proper 1024-byte header, a data block consistent with that header and, often, a proper extended header storing the symmetry operators. A crystallographic map file uses essentially the same file format, the MRC-CCP4 format, as an EM image file or an EM map file; the difference is that it carries crystallographic symmetry information in its header and 3D map data in its data matrix.

Crystallographic maps represent the electron densities calculated from X-ray diffraction experiments. Each map covers a continuous parallelepiped-shaped 3D volume, and the data matrix consists of evenly spaced sampling points in this volume. Although one might picture a 3D map as a 3D matrix of voxels (like those in the game Minecraft), the data points in fact have only coordinates in space, not volumes or shapes, and are therefore more properly described as sampling points. In addition, crystallographic maps always use the crystal coordinate system (with implied fractional coordinates), whose axes run along the crystallographic axes, which are not necessarily at 90 degrees to each other! These sampling points are therefore not in a Cartesian coordinate system but in a skew one. So even if we picture them as voxels, they are neither cubic nor rectangular (cuboid, brick-shaped) voxels, but general parallelepipeds. Map display programs usually use an interpolation scheme to generate smooth maps from these sampling points. In most cases the sampling frequency used when saving a crystallographic map is determined by the resolution of the diffraction data (structure factors) it is calculated from: maps are usually sampled at 1/2 to 1/4 of the highest resolution found in the structure factor file, so that information can be reconstructed up to that resolution. In electron density map files each data point is usually saved as a 32-bit floating-point number, in units of electrons/Å³.
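To make the skew coordinate system concrete, here is a sketch of converting fractional coordinates (and grid sampling points) to Cartesian Å. It assumes the common PDB convention (X along a, b in the XY plane) and the standard orthogonalization matrix built from the six cell parameters; the function names are hypothetical.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """3x3 matrix taking fractional to Cartesian coordinates (Å).

    Assumes the PDB convention: X along a, b in the XY plane. Angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit cell volume from the six cell parameters
    volume = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(m, frac):
    """Multiply the orthogonalization matrix by a fractional coordinate triple."""
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))


def grid_point_to_cart(m, index, grid):
    """Cartesian position of sampling point (i, j, k) on a grid of (NX, NY, NZ) intervals."""
    frac = tuple(i / n for i, n in zip(index, grid))
    return frac_to_cart(m, frac)
```

For a hexagonal cell (gamma = 120 degrees, b = 10 Å), the fractional point (0, 1, 0) lands at about (-5.0, 8.66, 0.0) rather than (0, 10, 0), which is exactly the skew described above.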

The data in an electron density map file is therefore simply a 3D matrix of numbers, whose dimensions are columns, rows and sections. Because of crystallographic symmetry, it is not necessary to store the whole unit cell. It is very common to store only the volume covering one full asymmetric unit, sometimes with a little extra to ensure continuity. This is why a crystallographic map opened in UCSF Chimera is often cut off at a seemingly "arbitrary" plane. For the same reason, a crystallographic map file must store not only the 6 parameters describing the dimensions and shape of the unit cell, but also the origin and extent of the portion of the map saved in the data matrix. The infinitely continuous maps seen in the model-building program COOT are the result of extending the saved portion of the map using this information from the header and the extended header. The extended header usually holds a copy of the symmetry operators, each in an 80-byte record. However, since the space group alone is sufficient to define the symmetry operators, the copies in the extended header are in fact redundant.
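Symmetry operators are conventionally written as text such as '-X,Y+1/2,-Z'. The sketch below parses that notation and applies an operator to fractional coordinates, wrapping the result back into the unit cell; this is the basic step behind extending a saved ASU. It assumes integer rotation coefficients and translations written as fractions like 1/2 (decimal translations are not handled), and the function names are hypothetical.

```python
import re
from fractions import Fraction

# One term of an operator component: optional sign, then X/Y/Z or a number like 1/2
TERM = re.compile(r"([+-]?)([XYZ]|\d+(?:/\d+)?)")


def parse_symop(op: str):
    """Parse e.g. '-X,Y+1/2,-Z' into three (rotation_row, translation) pairs."""
    rows = []
    for component in op.upper().replace(" ", "").split(","):
        rotation = [0, 0, 0]
        translation = Fraction(0)
        for sign, term in TERM.findall(component):
            s = -1 if sign == "-" else 1
            if term in ("X", "Y", "Z"):
                rotation["XYZ".index(term)] += s
            else:
                translation += s * Fraction(term)
        rows.append((rotation, translation))
    return rows


def apply_symop(rows, frac):
    """Apply a parsed operator to fractional coordinates and wrap into [0, 1)."""
    return tuple(
        (sum(r * f for r, f in zip(rotation, frac)) + float(translation)) % 1.0
        for rotation, translation in rows
    )
```

For example, the P2₁ operator '-X,Y+1/2,-Z' maps the fractional point (0.25, 0.25, 0.25) to (0.75, 0.75, 0.75). Translations are kept as `Fraction` during parsing so that values like 1/3 are not rounded until they are applied.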

It should be pointed out, though, that storing more than one asymmetric unit in a crystallographic map file is perfectly legitimate, as long as the symmetry-related map points are consistent with each other. Thus one can generate a map that covers a whole molecule or several unit cells for analysis. See Manipulating maps.

Masks for maps are usually simply 1s and 0s saved in a map file, indicating selections. Because only 1 and 0 are needed for the selected and discarded points (or occasionally a few other small integers for additional categories), there is no need to dedicate a 4-byte floating-point number to each data point. More importantly, testing floating-point numbers for equality can become tricky and unpredictable if the programmer accidentally creates a special situation: floating-point numbers cannot exactly represent most non-integer numbers, nor most large integers (for 32-bit floats, the smallest positive integer that cannot be represented exactly is 2^24 + 1 = 16,777,217). A mask file therefore stores its data as integers. The type of data saved in a map or mask file is indicated by the MODE number in bytes 13-16 (word 4) of the header. Crystallographic maps use only two modes: MODE 2 indicates 32-bit floating-point data, and MODE 0 indicates 8-bit signed integer data.
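The 2^24 + 1 limit is easy to check with the standard library: `struct` format `"f"` packs a value as an IEEE 754 32-bit float, the same type as a MODE 2 voxel. The helper name below is hypothetical.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a number through IEEE 754 single precision (a MODE 2 voxel)."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# Small integers such as mask values 0 and 1 are stored exactly ...
assert to_float32(1) == 1.0
# ... and so is every integer up to 2**24 ...
assert to_float32(2**24) == 2**24
# ... but 2**24 + 1 = 16777217 is silently rounded to 16777216.
print(to_float32(2**24 + 1))    # 16777216.0
# Most non-integer values are not exact either.
print(to_float32(0.1) == 0.1)   # False
```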

Grid (number of intervals)
A crystallographic map stores sampling points inside the unit cell. According to the Nyquist-Shannon sampling theorem, to make sure that map details are not lost through under-sampling, the electron density map must be sampled at intervals no larger than half the highest resolution of the map. In practice, sampling is normally done at about 1/3 to 1/4 of the resolution, a little over-sampling to be on the safe side. For example, if the a edge of a unit cell measures 100 Å and the map is at 1 Å resolution, sampling at 1/3 of the resolution gives 300 samples along the X direction. Here 300 is the number of intervals, or grid, in the X direction, and the sampling interval is 0.33 Å. In a map file, the numbers of intervals along X, Y and Z are saved as three integers at words 8-10 of the map header (refer to the header table in the section "Comparison between the CCP4 map header and the MRC-2014 header"). The same idea also applies to EM maps, except that the numbers of intervals (there called MX, MY, MZ) are simply the numbers of voxels along each dimension of the volume.
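The grid arithmetic above can be sketched as a one-line rule: choose enough intervals that the spacing is at most resolution / oversampling. The function name and the default oversampling of 3 are assumptions for illustration. Real map-calculation programs typically also round the grid to numbers that suit the FFT and the space group, which this sketch does not do.

```python
import math


def grid_intervals(cell_edge: float, resolution: float, oversampling: float = 3.0) -> int:
    """Number of grid intervals along one cell edge so that the spacing is at most
    resolution / oversampling. An oversampling of 2 is the Nyquist minimum."""
    if oversampling < 2:
        raise ValueError("oversampling below 2 would under-sample the map")
    return math.ceil(cell_edge * oversampling / resolution)


n = grid_intervals(100.0, 1.0)   # the 100 Å edge at 1 Å resolution from the text
print(n, round(100.0 / n, 2))    # 300 0.33
```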

Extent
The matrix saved in a crystallographic map file is often not the whole unit cell but only one asymmetric unit (ASU). This is because, in a unit cell with symmetry, only the data within one ASU are unique. A whole unit cell, or any arbitrary volume, can easily be reconstructed by applying the symmetry operators to the saved ASU.

This does not mean that a map file always contains only an ASU, though. Nothing prevents one from saving "redundant" columns, rows or sections in a map file. So how much data is saved in a map file? This information is given by the first three int32 numbers of the header: the numbers of columns, rows and sections. If the map densities are saved as float32 (as they usually are, unless the file is a mask), the data block should be columns x rows x sections x 4 bytes long, because each float32 number occupies 4 bytes.

Origin
Just as a map file does not need to store a whole unit cell, the saved map volume also does not necessarily start at the coordinate (0,0,0). The origin of the saved map volume is given by three integers in header words 5-7. Upon loading, the first column, row and section of the saved volume matrix are shifted by these three numbers (in grid intervals). This allows programs to place the map volume correctly in the unit cell.
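A sketch of how the origin (words 5-7), the saved extent (words 1-3) and the grid (words 8-10) combine along one axis. It assumes the axis mapping in words 17-19 (MAPC, MAPR, MAPS) is the default 1, 2, 3, so that columns, rows and sections run along X, Y and Z; the function name is hypothetical.

```python
def saved_range(start: int, extent: int, intervals: int):
    """Grid indices and fractional coordinates covered along one axis.

    start     -- origin along this axis (words 5-7), in grid units
    extent    -- number of saved points along this axis (words 1-3)
    intervals -- grid intervals per cell edge (words 8-10)
    """
    first, last = start, start + extent - 1
    return (first, last), (first / intervals, last / intervals)


# Hypothetical map: 300 intervals along a, 320 saved columns starting at -10
indices, fractions = saved_range(-10, 320, 300)
print(indices)                                    # (-10, 309)
print(tuple(round(f, 3) for f in fractions))      # (-0.033, 1.03)
```

A fractional range wider than 0 to 1, as here, is simply a map that stores a little more than one unit cell along that axis.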

Example
As an example, if one wants a map file that covers a whole protein model from a PDB file when loaded in UCSF Chimera, one often needs to extend the crystallographic map produced by refinement programs. To do this, one can either run the CCP4 program FFT and tell it to generate a map file covering the supplied PDB file, or use the CCP4 program EXTENDS to extend an existing map so that it starts at a certain point and has a specified extent. The EXTENDS method may require manually measuring the extremities of the PDB coordinates and using the information above to estimate the origin and extent. The program then copies data points from the ASU according to the unit cell's symmetry to generate a large enough 3D data array, which is saved in the new map file with the required origin and extent.
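The manual estimation step can be sketched as follows: take the Cartesian bounding box of the model plus a margin, convert its corners to fractional coordinates, and round outward to grid indices. This assumes an upper-triangular orthogonalization matrix in the PDB convention (X along a, b in the XY plane); the function names and the 3 Å margin are hypothetical, and this is not how FFT or EXTENDS compute their limits internally.

```python
import itertools
import math


def cart_to_frac(m, xyz):
    """Invert an upper-triangular orthogonalization matrix m (X along a,
    b in the XY plane) by back substitution."""
    x, y, z = xyz
    w = z / m[2][2]
    v = (y - m[1][2] * w) / m[1][1]
    u = (x - m[0][1] * v - m[0][2] * w) / m[0][0]
    return u, v, w


def grid_box_for_model(m, coords, grid, margin=3.0):
    """Start indices and extents (in grid points) of a box covering all
    Cartesian coords (Å) plus a margin, on a grid of (NX, NY, NZ) intervals."""
    lo = [min(c[i] for c in coords) - margin for i in range(3)]
    hi = [max(c[i] for c in coords) + margin for i in range(3)]
    # A Cartesian box is a skewed box in fractional space, so transform
    # all 8 corners and take the extremes along each fractional axis.
    corners = [cart_to_frac(m, p) for p in itertools.product(*zip(lo, hi))]
    starts, extents = [], []
    for axis in range(3):
        first = math.floor(min(f[axis] for f in corners) * grid[axis])
        last = math.ceil(max(f[axis] for f in corners) * grid[axis])
        starts.append(first)
        extents.append(last - first + 1)
    return starts, extents


# Hypothetical orthogonal 100 Å cubic cell sampled with 200 intervals per edge
cell = [[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 100.0]]
atoms = [(10.0, 12.5, 40.0), (35.0, 20.0, 55.5)]
print(grid_box_for_model(cell, atoms, (200, 200, 200)))
```

Rounding outward (floor for the start, ceil for the end) errs on the side of a slightly larger box, which only costs a few extra grid points.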

Reference: http://xray.bmc.uu.se/usf/mapman_man.html#H7

EM map
Like crystallographic maps, an EM map is often simply considered and used as a map representing the probabilities of finding atoms in space. However, there are a few important differences about EM maps that one should be aware of.

Differences between crystallographic maps and EM maps

 * EM maps are electrostatic potential maps. (Recall how electrons move in the electrostatic field of a point charge: the polarity of the point charge matters.) In contrast, X-ray crystallographic maps are electron density maps (the photon-electron interaction depends on the electron density alone, not on the local electrostatic field). As a consequence, negatively charged groups can be expected to show weakened or even negative densities in EM maps (for example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5192980/). Therefore, rather than calling them electron density maps, one might prefer the simple term "EM maps" when referring to them.


 * EM maps do not need to deal with crystallographic symmetries and periodicity. In each map, there is only a single molecule or molecular complex of interest floating in a space of infinite size. This greatly simplifies manipulation of EM maps. For comparison, even for a crystallographic map in space group P1, which is the simplest case of crystallographic maps, one still only has the freedom of translating the coordinates, but not rotating them, to keep the higher order (packing) of the crystal intact.


 * Crystallographic maps are in skew coordinate systems. The 3 axes of a crystallographic map run along the edges of the unit cell and are not necessarily orthogonal to each other. Each EM map is in a Cartesian system, and the unit cell angles in its header should be 90.0 90.0 90.0 (as little-endian float32 bytes: 00 00 B4 42, 00 00 B4 42, 00 00 B4 42). PDB files, regardless of the data source, use a Cartesian system with the X axis along the a axis and the XY plane coinciding with the ab plane of the unit cell.


 * Not always maps: MRC files are used to save not only 3D volumes but also 2D images and image stacks. In such cases the column and row numbers are the dimensions of one 2D image, and the section number is the number of 2D images saved in the file. The unit cell c length is meaningless in such cases. However, for consistency, it is usually set to (pixel size) x (number of sections), essentially treating pixels as cubic voxels.
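A small sketch for checking whether a header describes a Cartesian (EM-style) cell, reading the angles from words 14-16 (bytes 52-63, 0-based). It assumes a little-endian file by default, and the tolerance and function names are hypothetical choices.

```python
import struct

CELL_ANGLES = slice(52, 64)  # words 14-16 (alpha, beta, gamma), 0-based bytes 52-63


def cell_angles(header: bytes, endian: str = "<"):
    """Unit cell angles in degrees as stored in the header."""
    return struct.unpack(endian + "3f", header[CELL_ANGLES])


def is_cartesian(header: bytes, endian: str = "<", tol: float = 1e-3) -> bool:
    """True if all three cell angles are 90 degrees, as expected for an EM map."""
    return all(abs(angle - 90.0) < tol for angle in cell_angles(header, endian))


# 90.0 as a little-endian float32 is exactly the byte sequence 00 00 B4 42.
assert struct.pack("<f", 90.0) == bytes([0x00, 0x00, 0xB4, 0x42])
```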

Comparison between the CCP4 map header and the MRC-2014 header
CCP4 map standard: http://www.ccp4.ac.uk/html/maplib.html

MRC2014 standard: http://www.ccpem.ac.uk/mrc_format/mrc2014.php

Detailed discussion on the MRC2014 standard: https://www.sciencedirect.com/science/article/pii/S104784771500074X


 * The word 23, ISPG, is reused in MRC files with number >400 (usually 401) to indicate a 3D+ volume stack.


 * In CCP4 maps, words 8-10, NX NY NZ, are the numbers of sampling intervals along the unit cell edges. In MRC2014, these 3 words are renamed MX MY MZ and give the sampling along the edges of a single 3D volume. In particular, word 10, MZ, gives the number of sections along Z in each 3D volume, while MRC word 3, NZ (not the same as word 10, NZ, of a CCP4 map; a confusing choice of names), gives the total number of sections saved in the MRC file. Therefore, NZ = MZ x (number of volumes) in a volume stack file (one that stores multiple 3D volumes).


 * As mentioned above, words 1-3 in MRC files are renamed NX, NY, NZ. This renaming is confusing because NX, NY, NZ had already been used for words 8-10 in CCP4 maps. Had words 1-3 kept the names NC, NR and NS in the MRC2014 standard, it would have been a lot clearer.


 * The voxel size in MRC files is the unit cell lengths (words 11-13) divided by MX, MY, MZ (words 8-10).


 * The origin (words 50-52) is used in the MRC format to give the origin of a sub-image or sub-volume within the larger original image (2D) or volume (3D) it was taken from. See the MRC2014 reference linked above the table.


 * The "A4 format" in the table is most likely the Fortran format descriptor A4, which denotes a 4-character text field; it is a vestige of the Fortran origins of the format rather than a reference to the paper size.


 * Machine stamps: the machine stamp at word 54 consists of four half-bytes (packed into its first two bytes) indicating the number formats and endianness for the double (d), float (f), int (i) and unsigned char (c) types. For little-endian hardware (PC) the stamp is 0x44, 0x41, 0x00, 0x00 ('DA' followed by two null bytes), while the big-endian stamp is 0x11, 0x11, 0x00, 0x00. The codes 4 and 1 in these stamps are actually assigned to the IEEE 754 floating-point format (little- and big-endian respectively), which is common to all modern computers. Very old map files may use other floating-point formats such as VAX. There are machine stamps other than 0x44, 0x41 or 0x11 for indicating these floating-point formats, which can be found in the CCP4 core library source code. In practice, however, virtually no one has ever seen a map file saved in a non-IEEE format. See this thread on CCP4BB: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1811&L=ccp4bb&F=&S=&P=76868
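The header notes above (machine stamp, voxel size, volume stacks) can be tied together in a short reader sketch. It decodes the byte order from the high nibble of the first stamp byte (4 = little-endian IEEE, 1 = big-endian IEEE), then computes voxel size as cell length / MX, MY, MZ and the number of volumes as NZ / MZ when ISPG is above 400. The fallback for a missing stamp (checking whether MODE looks plausible) is an assumed heuristic, not part of either standard, and the function names are hypothetical.

```python
import struct


def detect_endianness(header: bytes) -> str:
    """struct byte-order prefix from the machine stamp (word 54, bytes 212-215).

    The high nibble of the first stamp byte is the float format code:
    4 = little-endian IEEE (stamp 0x44 0x41), 1 = big-endian IEEE (0x11 0x11).
    """
    code = header[212] >> 4
    if code == 4:
        return "<"
    if code == 1:
        return ">"
    # Heuristic fallback (an assumption, not part of the standard): choose the
    # byte order under which MODE (word 4) is one of the modes 0-6 on this page.
    mode_le = struct.unpack("<i", header[12:16])[0]
    return "<" if 0 <= mode_le <= 6 else ">"


def summarize(header: bytes) -> dict:
    """Decode a few header fields needed to interpret the data block."""
    e = detect_endianness(header)
    nx, ny, nz, mode = struct.unpack(e + "4i", header[0:16])    # words 1-4
    mx, my, mz = struct.unpack(e + "3i", header[28:40])         # words 8-10
    cell = struct.unpack(e + "3f", header[40:52])               # words 11-13
    ispg, nsymbt = struct.unpack(e + "2i", header[88:96])       # words 23-24
    voxel = None
    if 0 not in (mx, my, mz):
        # Voxel size = cell lengths / (MX, MY, MZ)
        voxel = tuple(c / m for c, m in zip(cell, (mx, my, mz)))
    return {
        "endian": e,
        "shape": (nx, ny, nz),
        "mode": mode,
        "voxel_size": voxel,
        # Volume stack (ISPG > 400): NZ = MZ x number of volumes
        "n_volumes": nz // mz if ispg > 400 and mz else 1,
        "nsymbt": nsymbt,
    }
```

For a little-endian stack of two 64³ volumes with a 96 Å cell (ISPG = 401), `summarize` would report a voxel size of 1.5 Å and 2 volumes.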

Data Modes
Map modes 0-5 are inherited from CCP4 maps. The MRC2014 standard introduced mode 6 (16-bit unsigned integers); for example, Thermo (formerly FEI) EPU uses mode 6 to save 16-bit integer data from the Falcon or Ceta cameras. Note from the CCP4 map documentation: mode 2 is the normal mode used in CCP4 programs; modes other than 2 and 0 may NOT WORK.
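A sketch of a MODE lookup table and a decoder built on `struct`. It covers modes 0-4 and 6 with their MRC2014 meanings; mode 5 is left out because its meaning is not described on this page. Complex modes store a (real, imaginary) pair per voxel. The table and function names are hypothetical; for large maps one would normally use an array library rather than `struct`.

```python
import struct

# MODE -> (description, struct codes for one voxel, bytes per voxel).
# Complex modes store a (real, imaginary) pair per voxel.
MRC_MODES = {
    0: ("8-bit signed integer", "b", 1),
    1: ("16-bit signed integer", "h", 2),
    2: ("32-bit float", "f", 4),
    3: ("complex, two 16-bit integers", "hh", 4),
    4: ("complex, two 32-bit floats", "ff", 8),
    6: ("16-bit unsigned integer", "H", 2),
}


def decode_values(data: bytes, mode: int, count: int, endian: str = "<"):
    """Decode the first `count` voxels of a data block with the given MODE."""
    _, codes, size = MRC_MODES[mode]
    values = struct.unpack(endian + codes * count, data[: size * count])
    if len(codes) == 2:  # complex modes: pair up real and imaginary parts
        return [complex(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return list(values)
```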

Extended header
In CCP4 maps, the extended header consists of 80-byte records placed after the 1024-byte main header. Its size in bytes is given by word 24, NSYMBT, which for CCP4 symmetry records is therefore a multiple of 80. For MRC files, there is no rule for the content of the extended header, so one could potentially use this part to store metadata. However, because of this lack of standards and the ease of writing homebrew MRC-reading programs, there is no guarantee that MRC files with an extended header will be handled properly by all programs. Some programs may simply assume that everything after the 1024-byte header belongs to the data array, even though this violates both the MRC2014 standard and the CCP4 map standard.
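A sketch of reading a CCP4-style extended header: seek past the 1024-byte main header, read NSYMBT bytes, and split them into 80-character records. It assumes the records are ASCII text padded with spaces or nulls, as CCP4 symmetry records are; an MRC extended header holding other metadata would need a different parser. The function names are hypothetical.

```python
RECORD_SIZE = 80  # one CCP4 symmetry operator per 80-character record


def symmetry_records(ext_header: bytes):
    """Split a CCP4-style extended header into its non-blank text records."""
    if len(ext_header) % RECORD_SIZE:
        raise ValueError("length is not a whole number of 80-byte records")
    records = []
    for start in range(0, len(ext_header), RECORD_SIZE):
        text = ext_header[start:start + RECORD_SIZE].decode("ascii", errors="replace")
        text = text.replace("\x00", " ").strip()
        if text:
            records.append(text)
    return records


def read_extended_header(path: str, nsymbt: int) -> bytes:
    """The NSYMBT bytes that follow the 1024-byte main header."""
    with open(path, "rb") as fh:
        fh.seek(1024)
        return fh.read(nsymbt)
```

Reading exactly NSYMBT bytes after the main header, rather than assuming the data starts at byte 1024, is what keeps a reader from misinterpreting symmetry records as map values.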

Data block
The data block is a homogeneous array of a single data type. Its size is determined by words 1-3 (NC, NR, NS) and word 4 (MODE): len(data block) = NC * NR * NS * sizeof(mode). There are no beginning or ending markers for the data block. Normally the data block ends at the end of the file, and it begins at byte offset 1024 + NSYMBT (0-based counting).
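The offset and length rule above, as a sketch. The bytes per voxel are passed in by the caller (4 for MODE 2, 1 for MODE 0, and so on); the function names are hypothetical.

```python
import os

HEADER_SIZE = 1024


def data_block_span(nc: int, nr: int, ns: int, nsymbt: int, bytes_per_voxel: int):
    """0-based byte offset and length of the data block."""
    offset = HEADER_SIZE + nsymbt
    length = nc * nr * ns * bytes_per_voxel
    return offset, length


def size_is_consistent(path: str, nc, nr, ns, nsymbt, bytes_per_voxel) -> bool:
    """True if the file ends exactly where the data block should end."""
    offset, length = data_block_span(nc, nr, ns, nsymbt, bytes_per_voxel)
    return os.path.getsize(path) == offset + length


# A 100^3 float32 map (MODE 2, 4 bytes per voxel) with two 80-byte symmetry records
print(data_block_span(100, 100, 100, 160, 4))   # (1184, 4000000)
```

A size mismatch is a quick sign that a file was truncated or written by a program that ignored the extended header.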

Manipulating maps
Crystallographic maps usually cannot be rotated freely without destroying the crystallographic symmetry and periodicity. For visualization purposes, one can use a PDB file to define the extent and calculate a P1 map covering the molecule of interest with the program FFT.

Because maps are only 3D arrays of sampling points in the coordinate system of the unit cell, a rotated map usually has to be resampled on a new grid before it can be saved. Since maps are usually sampled at only 2x to 4x the spatial frequency of the highest resolution, one should expect some differences in numerical values when a map is resampled.
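A 2D illustration of why resampling changes values. It uses bilinear interpolation on a square grid as a stand-in for the trilinear (or higher-order) interpolation a real program might use in 3D, and treats points outside the grid as zero. The function names are hypothetical, and this is not the algorithm of any particular map program.

```python
import math


def bilinear(grid, x, y):
    """Interpolate grid[row][col] at a non-integer (x = column, y = row); 0 outside."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    total = 0.0
    for dy, wy in ((0, 1.0 - fy), (1, fy)):
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            r, c = y0 + dy, x0 + dx
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                total += wy * wx * grid[r][c]
    return total


def rotate_2d(grid, degrees):
    """Rotate a 2D grid about its centre and resample it onto the same grid."""
    rows, cols = len(grid), len(grid[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    cos_t, sin_t = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Inverse mapping: find where this output point came from in the input.
            dx, dy = c - cx, r - cy
            src_x = cx + cos_t * dx + sin_t * dy
            src_y = cy - sin_t * dx + cos_t * dy
            out_row.append(bilinear(grid, src_x, src_y))
        out.append(out_row)
    return out


# A single sharp peak off-centre: after a 45-degree rotation it no longer
# coincides with a grid point, so its value is spread over neighbours.
peak = [[0.0] * 5 for _ in range(5)]
peak[2][3] = 1.0
rotated = rotate_2d(peak, 45)
print(max(max(row) for row in rotated) < 1.0)   # True
```

The lowered peak is the effect described above: the sampling points of the rotated map no longer coincide with the original ones, so every value is an interpolated estimate.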