Machine Learning Algorithms

In any imaging experiment, the achievable resolution is limited by the wavelength of the “light” being used. To resolve objects as small as molecules and atoms, you need a radiation source whose characteristic wavelength is on the order of the distance between two covalently bonded atoms, i.e. about 1 ångström (10^-10 meters). Hard x-rays with wavelengths of this order can be produced, but the original object cannot be reconstructed from its scattering pattern with a traditional optical system; the reconstruction must be done mathematically, by computer.

An x-ray diffraction imaging experiment essentially yields a set of intensities, each corresponding to a vector whose magnitude is known but whose phase is unknown. Reconstructing the original object requires estimating these unknown phases, which, for a typical dataset of tens of thousands of x-ray diffraction intensities, is decidedly not a trivial problem.
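To make the role of the missing phases concrete, here is a minimal NumPy sketch on a toy one-dimensional density. The grid size, peak shapes, and variable names are all illustrative assumptions, not taken from the project itself. It shows that the Fourier amplitudes alone do not give back the object: the same amplitudes combined with the wrong phases produce a very different density.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "electron density": two Gaussian peaks on a periodic grid (assumption).
x = np.arange(64)
density = (np.exp(-0.5 * ((x - 20) / 2.0) ** 2)
           + 0.6 * np.exp(-0.5 * ((x - 41) / 3.0) ** 2))

# The scattered wave is described by the Fourier transform of the density.
structure_factors = np.fft.fft(density)
intensities = np.abs(structure_factors) ** 2   # what the detector records
amplitudes = np.sqrt(intensities)              # magnitudes: recoverable
true_phases = np.angle(structure_factors)      # phases: lost in the experiment

# With the true phases, the inverse transform returns the original density.
recovered = np.fft.ifft(amplitudes * np.exp(1j * true_phases)).real

# With the same amplitudes but random phases, it does not.
random_phases = rng.uniform(-np.pi, np.pi, size=amplitudes.shape)
scrambled = np.fft.ifft(amplitudes * np.exp(1j * random_phases)).real

error_with_true_phases = np.abs(recovered - density).max()
error_with_random_phases = np.abs(scrambled - density).max()
print(error_with_true_phases, error_with_random_phases)
```

The first error is at the level of floating-point round-off, while the second is comparable to the height of the peaks themselves.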

The machine learning algorithm created to tackle this so-called “phase problem” was inspired by the genetic and evolutionary adaptation of species to their environments seen in nature. Very large, initially random populations of trial structures, each representing the electron density of the asymmetric unit of the crystalline object, are generated, and their Fourier transforms are calculated for comparison and ranking against the experimentally observed scattering intensities. The population is then subjected to a computational analog of genetic recombination that favors the more highly ranked structures as parents. Fourier transforms are calculated for the new recombinant population, which is ranked against the experimental data in turn, and the cycle is repeated over many generations.
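The generate, rank, and recombine cycle described above can be sketched in a few dozen lines of NumPy. This is a toy illustration under stated assumptions, not the project's actual code: it works on a one-dimensional grid, and the tournament selection, uniform crossover, small mutation step, elitism, and every constant and function name below are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

GRID = 32           # grid points in the toy 1-D "asymmetric unit" (assumption)
POP_SIZE = 200      # trial structures per generation (assumption)
GENERATIONS = 60    # number of recombination cycles (assumption)
MUTATION_SD = 0.02  # small random perturbation, not described in the text (assumption)

def amplitudes(density):
    """Fourier amplitudes of one density, or of each row of a population."""
    return np.abs(np.fft.rfft(density, axis=-1))

def misfit(population, observed):
    """R-factor-like residual per trial structure; lower means a better fit."""
    calc = amplitudes(population)
    return np.abs(calc - observed).sum(axis=-1) / observed.sum()

def next_generation(population, scores):
    """Tournament selection, uniform crossover, light mutation, one elite."""
    n = len(population)
    # Each parent is the better-scoring of two randomly drawn candidates.
    a, b = rng.integers(n, size=(2, 2 * n))
    parents = np.where(scores[a] < scores[b], a, b)
    mothers, fathers = population[parents[:n]], population[parents[n:]]
    # Uniform crossover: each grid point is inherited from one parent or the other.
    mask = rng.random(population.shape) < 0.5
    children = np.where(mask, mothers, fathers)
    children += rng.normal(0.0, MUTATION_SD, size=children.shape)
    np.clip(children, 0.0, None, out=children)    # keep the density non-negative
    children[0] = population[np.argmin(scores)]   # carry the best structure over unchanged
    return children

# Synthetic "experiment": unphased amplitudes of a hidden two-peak density.
x = np.arange(GRID)
hidden = (np.exp(-0.5 * ((x - 9) / 1.5) ** 2)
          + 0.7 * np.exp(-0.5 * ((x - 22) / 2.0) ** 2))
observed = amplitudes(hidden)

population = rng.random((POP_SIZE, GRID))   # initially random trial structures
initial_best = misfit(population, observed).min()

for _ in range(GENERATIONS):
    scores = misfit(population, observed)
    population = next_generation(population, scores)

final_scores = misfit(population, observed)
final_best = final_scores.min()
best_structure = population[np.argmin(final_scores)]
print(initial_best, final_best)
```

Because the best structure of each generation is carried over unchanged, the best misfit can never get worse from one generation to the next. Note that a fit to amplitudes alone cannot distinguish a density from a shifted or mirror-image copy of itself, so `best_structure` should only be compared with `hidden` up to those ambiguities.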

Even on a fairly average desktop computer, this proof-of-concept algorithm successfully reconstructed the low-resolution electron density of a medium-sized protein, using only a few thousand of its unphased x-ray scattering intensities as a starting point.