Canadian Macromolecular Crystallography Facility

Data Collection Primer

Data Collection

The following sections are meant to provide practical advice on data collection on protein crystals at CMCF for those with experience collecting diffraction data. If you are new to collecting X-ray diffraction data, we advise working with your supervisor or other experienced mentor until familiar with the process.

Native Datasets

Native datasets are used to obtain high-resolution data in order to discern as much detail as possible about a structure or ligand. They are also used to solve new structures using Molecular Replacement methods when a similar structure is available. Therefore, obtaining the highest resolution spots is the primary motivation. However, it is also important to minimize overloaded pixels in the lower resolution spots for successful molecular replacement, as well as minimize radiation damage for a complete high quality dataset.

To accomplish these goals, the collection parameters must be carefully balanced. Detector distance is adjusted such that all the desired spots fall on the detector surface with maximal spot separation. An oscillation or rotation angle per image, along with start and total angles, must be selected to minimize overlaps while maintaining a reasonable total time for data collection. In general, a sufficient total angle should be collected to achieve a multiplicity in the dataset of 4 or more whenever possible (above 1 in the case of P1). Finally, an exposure is chosen to obtain maximum intensity, balanced against the need to minimize overloaded pixels and radiation damage, both of which decrease the quality of the dataset.

Detector Distance. During the initial screening, take note of the quality of diffraction, and how well the crystal diffraction fills the image. Use the zoom feature, brightness & contrast settings to inspect the image. The middle mouse wheel can be conveniently used to adjust the image brightness to visualize how far the spots extend from the centre of the image. You may not see the weakest spots if the image is too bright; darkening the image and zooming into the areas farther from the centre allows you to inspect these spots. Shorter detector distances result in higher resolution at the edges. Note there are detector distance limits built into the data collection software.

Guideline: detector distance should be adjusted so that spots extend to cover approximately 80% of the distance from the centre of image to edge of the detector. The faintest spots may not be visible to the eye but may still be present some distance from the visible ones.
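The relationship between detector distance and resolution follows from Bragg's law, and the 80% guideline can be turned into a rough starting distance. The sketch below is an illustration only, assuming a flat detector perpendicular to the beam with the beam at its centre; the function names, the `fill` parameter and the example numbers are hypothetical and are not part of MxDC.

```python
import math

def edge_resolution(distance_mm, radius_mm, wavelength_A):
    """Resolution (in Angstrom) of a spot at radius_mm from the beam centre."""
    two_theta = math.atan(radius_mm / distance_mm)
    return wavelength_A / (2.0 * math.sin(two_theta / 2.0))

def distance_for_resolution(d_A, radius_mm, wavelength_A, fill=0.8):
    """Detector distance (mm) that places resolution d_A at fill * radius_mm.

    fill=0.8 mirrors the guideline that spots should cover about 80%
    of the distance from the image centre to the detector edge.
    """
    sin_theta = wavelength_A / (2.0 * d_A)
    if sin_theta >= math.sin(math.pi / 4):
        # 2-theta of 90 degrees or more cannot be recorded on a flat detector.
        raise ValueError("resolution not reachable in this geometry")
    two_theta = 2.0 * math.asin(sin_theta)
    return fill * radius_mm / math.tan(two_theta)

if __name__ == "__main__":
    # Hypothetical numbers: 2.0 A target, 150 mm edge radius, 1.0 A wavelength.
    d = distance_for_resolution(2.0, 150.0, 1.0)
    print(f"suggested distance: {d:.0f} mm")
```

The result is only a starting point; the distance limits built into the data collection software still apply.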


Oscillation or Rotation Angle per image, Starting & Total Angles. The Autoprocess algorithm is run with each sample screen that has 3 or more images and the results can be viewed in the analysis window of MxDC. You may also run Autoprocess or other programs such as Mosflm manually in order to get an idea of your unit cell, symmetry parameters and obtain recommendations for oscillation/rotation angle per image, total angle, and the best starting angle. In general, smaller oscillation/rotation angles per image are used for larger unit cells. Larger total angles are generally needed for lower symmetry crystals.

CMCF-BM Guidelines: Optimize the oscillation angle, starting angle and total angle according to the Autoprocess recommendations. Oscillations between 0.25 and 1.0 degrees are recommended. When considering the total angle, aim for a multiplicity of 4 or higher (above 1 in the case of P1). If there is uncertainty, an oscillation of 0.5 degrees with a total angle of 180 degrees is usually reasonable. Keep in mind that the smaller the oscillation angle, and the larger the exposure time and total angle, the longer your total data collection will take.

CMCF-ID Guidelines: Optimize the oscillation angle, starting angle and total angle according to the Autoprocess recommendations. A rotation angle per image between 0.1 and 0.2 degrees is usually reasonable (0.5 to 1 degree for screening). For large unit cells, 0.1 or 0.15 degrees may provide better results. When considering the total angle, aim for a multiplicity of 4 or higher (above 1 in the case of P1). If there is uncertainty, a total angle of 180 degrees is usually sufficient.
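The trade-off between angle per image, total angle and collection time is simple arithmetic. The sketch below estimates the number of images and the total exposure time for a run; it ignores detector readout and any other overhead, which is an assumption, and the function name is hypothetical.

```python
import math

def collection_plan(total_angle_deg, delta_deg, exposure_s):
    """Return (number of images, total exposure time in seconds).

    Overheads such as detector readout are not included.
    """
    if delta_deg <= 0 or total_angle_deg <= 0:
        raise ValueError("angles must be positive")
    # Round first so that values such as 180 / 0.2 do not fall
    # just under or over a whole number through floating-point error.
    n_images = math.ceil(round(total_angle_deg / delta_deg, 6))
    return n_images, n_images * exposure_s

if __name__ == "__main__":
    # 0.5 degree images over 180 degrees at 2 s each.
    print(collection_plan(180, 0.5, 2.0))
    # 0.2 degree images over 180 degrees at 0.2 s each.
    print(collection_plan(180, 0.2, 0.2))
```

Halving the angle per image doubles the number of images, so the total time doubles unless the exposure per image is reduced to match.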

Exposure. The MxDC software offers powerful options for optimizing the experiment. Beneath the diffraction image pane is an Information Button. The information window can be kept open during data collection and displays information about each image.

CMCF-BM Guidelines (MARCCD detector): Generally expect an average intensity between 25 and 200 (lower for well-diffracting crystals). The maximum intensity should generally be between 20,000 and 50,000 with few or no overloads (red pixels on MxDC). On beamline CMCF-BM, typical exposure per frame is between 2 and 15 seconds (0.2 - 1 degree oscillations), and the 200 or 100 micron beam size is recommended in most cases. Smaller beam sizes can be used but will have lower flux on the sample.

CMCF-ID Guidelines (PILATUS detector): Generally expect an average intensity between about 10 and 30. On beamline CMCF-ID, typical exposure is 0.2 to 0.4 seconds per 0.2 degrees. The exposure time in seconds should be roughly equal to the angle increment, delta omega, in degrees (for example 0.2 seconds / 0.2 degrees). A higher exposure time per angle increment will increase radiation damage to the sample with minimal resolution gains. The beam size is fixed at 50 microns.
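The intensity guidelines above can be written down as a small check against the values reported for an image. This is a minimal sketch: the dictionary layout and function name are hypothetical and do not correspond to any MxDC data format, and the check flags any overloaded pixel, which is stricter than the "few or no overloads" wording.

```python
# Guideline ranges copied from the native-dataset text above.
# The keys and structure are illustrative only.
NATIVE_GUIDELINES = {
    "CMCF-BM": {"avg": (25, 200), "max": (20_000, 50_000)},
    "CMCF-ID": {"avg": (10, 30), "max": None},  # no maximum range is stated
}

def check_image(beamline, avg_intensity, max_intensity, n_overloads=0):
    """Return a list of warnings for one diffraction image."""
    limits = NATIVE_GUIDELINES[beamline]
    warnings = []
    lo, hi = limits["avg"]
    if not lo <= avg_intensity <= hi:
        warnings.append(f"average intensity {avg_intensity} outside {lo}-{hi}")
    if limits["max"] is not None:
        lo, hi = limits["max"]
        if not lo <= max_intensity <= hi:
            warnings.append(f"maximum intensity {max_intensity} outside {lo}-{hi}")
    if n_overloads > 0:
        warnings.append(f"{n_overloads} overloaded pixels")
    return warnings

if __name__ == "__main__":
    for message in check_image("CMCF-BM", 10, 60_000, n_overloads=5):
        print(message)
```

A warning is a prompt to look at the image, not a verdict; a well-diffracting crystal can legitimately have a lower average intensity.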


SAD Datasets

Single-wavelength Anomalous Diffraction (SAD) datasets are used to identify heavy-atom positions and/or solve new structures for which a suitable Molecular Replacement model is unavailable. Data are generally collected at the peak of the absorption curve of a naturally occurring heavy atom or heavy atom derivative. Very accurate measurement of the lower-resolution data is important in order to measure anomalous intensity differences. To accomplish this, radiation damage must be minimized and good quality, high multiplicity data obtained.

Planning. Both CMCF beamlines can be used to collect anomalous datasets, but in general, the CMCF-BM beamline is chosen. It is easier to avoid over-exposure and radiation damage on this beamline, while still obtaining sufficient intensities for SAD phasing. Before starting, be familiar with the heavy atom being used and its energy absorption edges. This information can be found in the X-ray Absorption Edge Tables, and on the Scans page of MxDC. The CMCF beamlines can generally reach energies between 6 and 18 keV. If the absorption edge lies below this range, it cannot be reached and Sulfur-SAD (S-SAD) methods should be used instead.

Wavelength/Energy. Check the X-ray Absorption Edge Tables to choose an appropriate accessible energy for your heavy atom derivative (between 6 and 18 keV), or examine the periodic table display on the Scans page of MxDC. Adjust the beamline energy to a value near the edge of interest and, in the case of CMCF-ID, optimize the beam before continuing.
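Edge tables may list wavelengths rather than energies, so it can help to convert between the two and confirm the edge is within reach. The sketch below uses the standard relation wavelength (Angstrom) = 12.398 / energy (keV); the 6 to 18 keV limits are the general range quoted above, and the function names and example edge values are illustrative only.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV * Angstrom

# General accessible range quoted in this primer; actual limits may differ.
MIN_ENERGY_KEV = 6.0
MAX_ENERGY_KEV = 18.0

def energy_to_wavelength(energy_kev):
    """Convert photon energy in keV to wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / energy_kev

def wavelength_to_energy(wavelength_A):
    """Convert wavelength in Angstrom to photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_A

def is_accessible(energy_kev):
    """True if the energy falls inside the quoted 6 to 18 keV range."""
    return MIN_ENERGY_KEV <= energy_kev <= MAX_ENERGY_KEV

if __name__ == "__main__":
    # Approximate K-edge energies, for illustration: Zn 9.659 keV, S 2.472 keV.
    for name, edge in (("Zn K", 9.659), ("S K", 2.472)):
        print(name, f"{energy_to_wavelength(edge):.4f} A", is_accessible(edge))
```

The measured edge position in a real sample can shift slightly from tabulated values, which is why the scan described below is still needed.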

Before starting the MAD Scan, make sure your sample is centred properly. It is a good time to take the diffraction screening images to ensure the crystal is centred, to check the quality of diffraction, and to also get the Autoprocess screening algorithm started. Remember to use the "anomalous" option in Autoprocess or Mosflm to obtain an appropriate data collection strategy.

From the Scans page in MxDC, perform a MAD Scan. The following is an example of a scan obtained from a Zn-containing sample.

[Figure: MAD scan of a Zn-containing sample]

Along the x-axis, the energy is displayed in units of keV. The y-axis represents fluorescence counts. The fluorescence detector will saturate around 15,000 counts, so attenuation is needed if the readings are too high. If the result is a flat line at 0 counts, check that the beam is appropriately optimized, that the beamline shutters are open and that your sample is centred properly. If there is still no reading, call your user support person for help. If there is no metal in the sample, you will get a low level reading that is essentially noise with no distinct peak. Exposure can be increased (and/or attenuation decreased) to ensure fluorescence count levels of around 50 - 100. This is a good starting base level for MAD Scans in general.

Note: An XRF Scan (Excitation Scan) can be used instead of a MAD Scan to identify metals in the sample if there is uncertainty. To perform an Excitation Scan, the beamline energy must be optimized above the absorption edge(s) of the atom(s) of interest, instead of near the absorption edge of the metal of interest.

Once the MAD Scan is complete, Chooch is run automatically and outputs the peak energy, along with the inflection energy and a suggested remote energy. Inflection and remote energies would be used in addition to the peak energy for a Multiple-wavelength Anomalous Diffraction (MAD) experiment if desired. The calculated peak energy should correspond to the peak energy visible in the plot, and is the energy used for SAD data collection.

Adjust the beamline to the peak energy and, in the case of CMCF-ID, optimize. In the Data page of MxDC, make sure the peak energy and other values are correctly defined before starting the collection.

Detector Distance. In general, detector distance can be chosen as for native datasets, described above.

Oscillation or Rotation Angle per image, Starting & Total Angles. When the Autoprocess screening algorithm is run with the anomalous mode enabled, suggested values will be provided for collecting anomalous data. Mosflm has a similar anomalous option. An important difference for setting these values, as compared with native data, is the total angle.

Guideline: Larger total angle is needed for more multiplicity; 360 degrees is not uncommon. The amount of data collected may be judged by the anomalous signal obtained after collecting some images and/or solving the metal sites. Autoprocess in anomalous mode provides "anomalous signal" and "anomalous correlation" values. Look for anomalous signal at least above ~1 in low resolution shells, with significant correlation. The higher the better. At some point, radiation damage will outweigh the benefit of collecting more frames. In some cases, combining data from multiple crystals may be necessary. 

Exposure. This should be much less than for native datasets. Radiation damage must be minimized as much as possible, therefore:

CMCF-BM Guideline: Aim for a maximum intensity between 5,000 and 20,000, with no overloads; attenuation may be needed.

CMCF-ID Guideline: 0.2 seconds or more should be used to keep detector readout error <1%; beam attenuation is thus usually required to minimize radiation damage.
