Return to Main Documents Page

Protein/deuterated Protein Complex Example SASSIE Workflow

In this workflow example we will model the conformation of a disordered region of the glycoprotein vitronectin (VN) that is in a complex with plasminogen activator inhibitor-1 (PAI-1). PAI-1 inhibits the activation of plasminogen to plasmin, which has a key role in fibrin degradation, Extra Cellular Matrix remodeling and cell migration. Binding of its co-factor, VN, stabilizes PAI-1 so that it can perform its function. Residues 1-39 of VN are structured when bound to PAI-1. However, residues 40-130 of VN remain disordered.

Regions of PAI/VN complex

The complex that was measured on the EQ-SANS instrument at Oak Ridge National Lab using contrast variation consists of one subunit of 60% deuterated PAI-1 in complex with one subunit of protonated VN. The Contrast Calculator module was used prior to the SANS experiment to determine that the match point of the complex occurs in 77% D2O buffer. The match point of the deuterated PAI-1 occurs in 88% D2O buffer (and that of VN in 44% D2O buffer). The complex was measured at a concentration of about 8 mg/ml in 0%, 20% and 85% D2O buffers. In 85% D2O buffer, the scattering from PAI-1 is negligible compared to that of VN.

An initial structure was built from a co-crystal structure (PDB ID 1OC0) of PAI-1/VN(1-39) with VN(40-130), along with other missing residues, added. The initial structure was energy-minimized and subjected to a short SASSIE Complex Monte Carlo run to obtain the "starting" structure shown above.

After calculating Rg and I(0) for the data at each contrast, the match point, Stuhrmann and parallel axis theorem analyses were performed using MulCh. The pertinent results that are needed for this analysis are shown below.

Data Analysis Results

NOTE: This work was part of the thesis of Letitia N. Puster. The example presented here begins from just one of several starting structures that were explored for this complex. The other starting structures considered the possibility that residues 40-130 contained some ordered regions, such as helices. Only a relatively small number of structures are generated in this example for illustration purposes. A real study typically requires 10,000 to 50,000 accepted structures to adequately sample configuration space.

In this example it is assumed that the sassie-web project name is test. Download the PAI-1 and VN sequences and the starting structure files pai_vn_start.pdb and pai_vn_start.psf to visualize the starting structure and perform the contrast calculations, PDB Scan and Monte Carlo analysis in this example. The data are not provided at this time since the manuscript is still in preparation. Substitute your own files to go through the entire example with your contrast variation data.

Contrast Calculator (run prior to experiment)

Here we demonstate the use of the Contrast Calculator module to aid in experiment planning. More information on the Contrast Calculator module is available in the Contrast Calculator documentation.

Select the 'Contrast Calculator' button from this 'Tools' menu. You should now see a page like the one below. This page is used to enter all of the information needed to calculate the SANS scattering length densities, contrasts and I(0) values for PAI-VN in H2O:D2O buffers.

contrast calculator input

Optional Input

The output screen lists the match points for the protein and 60% deuterated protein components in the complex as well as for the entire complex. The output also shows plots of the calculated neutron SLDs, contrasts, I(0) and sqrt[I(0)] as a function of % D2O in the solvent since the complex doesn't contain more than 2 components. Note that roll-over help will indicate options to resize, zoom and reset the view of the plots. Resultant files containing the SLD, contrast and I(0) values are written to a new directory, "contrast_calculator", within the run_0 directory.

contrast calculator output 1<- ->contrast calculator output 2<- ->contrast calculator output 3

What have we generated:

test/run_0/contrast_calcuator

Data Interpolation

Data interpolation is necessary to create a new data file that is spaced on a uniform grid from the experimental data file. The number of q-points, range of q, and the spacing of the q-points used to create the interpolated data files MUST match the input settings that you use in subsequent modules to calculate SAS profiles that are subsequently used to compare theoretical and experimental data. The interpolation must be performed for the data at all contrasts measured.

Since the data are not provided for this example, the data interpolation is not shown here. For an example of how to interpolate your own data, please refer to the Data Interpolation documentation.

The PAI-VN data were interpolated as follows:

PDB Scan

PDB Scan is used to assess whether an input PDB is ready for simulation and where possible to provide files enabling CHARMM forcefield parameterization. Information on missing atoms and residues and those not covered as standard by the CHARMM 27 forcefield are reported. PDB files do not need to have header information. At this time, only PDB files of proteins are supported. More information on the module is available in the PDB Scan documentation.

Select the 'Build' button from the Main Menu of SASSIE-web and then click on the 'PDB Scan' button.

You should now see a page like th one below. This page is used to enter all of the information needed to check the PDB file.

Input for the PDB Scan

pdb file input: The PDB file that we want to examine. Here we use the pai_vn_start.pdb file.

Once you have entered the file name, click on the 'Submit' button to start the file scan.

As the run continues the progress bar beneath the submit button should update. Once complete the output should look similar to the figure below.

Output1 for the PDB Scan

The text output region provides a brief summary of the PDB Scan report.

What have we generated:

test/run_0/pdbscan

Visualization

A JSmol vizualization of the protein is produced and is shown below the text output region. Holding down the left mouse button and moving the cursor over the picture allows you to rotate the view, the scroll wheel facilitates zooming in and out. Right clicking on the image allows you to access all of the JSmol options.

Output2 for the PDB Scan

The full PDB scan report can be found below the image of the structure.

Output3 for the PDB Scan

Output4 for the PDB Scan

The pai_vn_start.psf and pai_vn_start.pdb files can also be loaded into VMD or a similar structure viewer to observe the location of the disulphide bonds. This PDB file is ready for simulation so we can proceed to create an ensemble of structures for comparison to the SANS data.

Complex Monte Carlo

Complex Monte Carlo is used to generate a series of structures to sample configurations of the PAI-VN complex. The single flexible region in the VN subunit is shown in the structure and table above. More information on the module is available in the Complex Monte Carlo documentation.

Select the 'Simulate' button from the Main Menu of SASSIE-web and then click on the 'Complex Monte Carlo' button.

Complex Monte Carlo input

Complex Specific Input

Advanced Input Options

Here the advanced options are used to set a high Rg cutoff value and to direct the Monte Carlo to a particular Rg value. These values were chosen based on the Rg values obtained from the contrast variation data and from the contrast analysis. The Rg value calculated in this module is the molecular Rg derived from the coordinates. Thus, it DOES NOT depend on the contrast. Therefore, the Rg value is directed to 25 Å, which is close to the experimental values obtained in 0% and 20% D20. These are the contrasts that we want to refer to when deciding on the directed Rg value since the molecular Rg always includes both subunits. PAI doesn't contribute to the scattering in 85% D2O, so we don't want to direct the molecular Rg to this experimental Rg value.

Once complete the output should look similar to the figure below.

complex calculator output

The output will indicate various Rg values from the ensemble, acceptance and overlap statistics, and dimensions of the accepted structures in the final ensemble.

Results are written to a new directory within the given "run name" as noted in the output. In addition, a plot of Rg versus structure number is shown.

Several files are generated and saved to the "run name" complex_monte_carlo directory. A copy of the original input PDB file, the output DCD file containing accepted structures, files with Rg values as shown in the plot on the web-page, and run statistics.

What have we generated:

test/run_0/complex_monte_carlo

REMINDER: This example has only 698 accepted structures. For a real study, 10,000 to 50,000 accepted structures would be needed to adequately cover configuration space.

Initial SAS Curve Calculation - Single Structure for All Contrasts

We will use SasCalc to calculate the SANS curves from the generated structures. More information can be found in the SasCalc documentation. The starting structure must be a complete structure without missing residues or atoms (including hydrogen atoms) in order to obtain accurate scattering profiles. Atom and residue naming must be compatiable with those defined in the CHARMM force field.

The SasCalc module is first run using the "converged number of golden vectors" option on just one structure. We will calculate the SANS curves at each contrast for the PAI-VN starting structure. In order to create theoretical curves with data points at q values that match our interpolated data, we need three pieces of information:

Select the 'Beta' button from the Main Menu of SASSIE-web and then click on the 'SasCalc' button.

Since this is the first time we are running SasCalc on this structure, we are using the "converged number of golden vectors" option. The scattering profile is calculated several times using an increasing number of golden vectors until there is no difference between the calculated scattering curves to within a chosen tolerance (averaged over the entire scattering profile). This is useful for determining the fixed number of golden vectors that are needed to achieve convergence within a chosen tolerance. The smaller the tolerance value specified in the converged option, the smaller the differences in the scattering profiles over the entire q range specified. In this case, a tolerance of 0.001 has been chosen.

Input values for theoretical curve calculation using SasCalc

The single structure that we used to start the simulation is used as both the reference pdb and the trajectory file filename (PDB in this case) so it is already uploaded to the SASSIE-web server. Thus, you can either upload it again from your local computer or locate it on the server and read it from there.

Note that we are calculating the SANS curves for all three contrasts at the same time. The number of exchangeable H regions is 1 since the H-D exchange is the same for both protein subunits. Thus, the exchangeable H region can be set to "moltype protein" and the fraction of exchangeable H value is set to 0.95. Only the PAI-1 subunit is deuterated, so the number of deuterated regions is 1. The deuterated region can be specified using "segname PAI1" and the fraction deuterated is set to 0.6. Once the calculations are finished, the output will look similar to the figure below.

Output values for theoretical curve calculation using SasCalc

The output files are written to a sub-directory of sascalc/ that is named according to the D2O percentage in the solvent.

What have we generated:

test/run_0/sascalc/neutron_D2Op_85

Similarly, we have:

test/run_0/sascalc/neutron_D2Op_20

test/run_0/sascalc/neutron_D2Op_0

Second SAS Curve Calculation - Entire Ensemble for 85% D2O

Next we calculate a theoretical scattering curve for each of the trial structures we have generated for the contrast condition in 85% D2O. We are interested in the best fitting structures at all contrasts. However, since the PAI-1 structure is fixed, we want to narrow down the structures that fit the VN portion of the complex and then compare that subset of structures to the entire contrast variation data set to extract the best structures that fit all of the data. The run_0_00001.log files from the inital SAS Curve calculation at each contrast indicate that 103, 73 or 83 golden vectors were required for convergence to the desired tolerance (0.001 in this case) for 85%, 20% and 0% D2O, respectively. The log file for 85% D2O is shown below.

Output values first SAS curve calculation

We now use this information to calculate the scattering curves for all of the generated structures using the "fixed number of golden vectors" option from the SasCalc method menu as shown below.

Input values for second theoretical curve calculation using SasCalc

run name: run_1 to avoid overwriting the files we calculated in run_0 above

reference pdb: the starting structure that is already on the SASSIE-web server

trajectory file filename: the run_0.dcd file generated using Complex Monte Carlo

Once the calculations are finished, the output will look similar to the figure below.

Output values for second theoretical curve calculation using SasCalc

What have we generated:

test/run_1/sascalc/neutron_D2Op_85

Initial SAS Curve Comparison

Now we compare our theoretical curves to the experimental data taken in 85% D2O using Chi-Square Filter. More information can be found in the Chi-Square Filter documention.

Select the 'Analyze' button from the Main Menu of SASSIE-web and then click on the Chi Square Filter button.

The inputs are shown in the figure below.

Input for Chi-Square Filter

Note that you are required to select the path on the server containing the theoretical scattering curves, i.e., they can't be uploaded from your local computer. To set the path to the scattering curves generated in the previus step:

Selecting the sascalc folder in the file browser

Advanced Input Option

Check the box for Advanced Input and then check the box to keep existing run folder name. This will put the output in a directory under chi_square_filter that matches the path that you chose above, i.e., chi-square_filter/neutron_D2Op_85, so that it isn't overwritten when comparing data at more than one contast.

Once complete you should see outputs similar to those below.

Output from Chi-Square Filter module

In the text output you will see the minimum chi square (X2) values is given.

The top plot shows the variation of chi squared (y-axis) with the radius of gyration (x-axis).

NOTE that the Rg values in the plots are the scattering Rg values (calculated in SasCalc) and they are contrast-dependent! The low Rg values (< ~20 Å) are due to the fact that the disordered tail (40-130) of VN is so long that portions are interacting with its ordered, fixed region (1-39). (Remember that PAI-1 is not visible at this contrast!) Thus, there is no longer a true Guinier region due to this intra-particle interaction. This results in a depressed I(0) value and a lower apparent Rg. In some cases, a weak intra-particle interaction peak can even be observed in the scattering curves such that the Guinier region is undefined. In these cases, Rg is listed as "nan" in the output file. This will not occur in every system. It occurs at this contrast in this system since VN has a long disordered tail region compared to the size and mass of its ordered region. Thus, in some configurations, the two regions of VN essentially act as if they were separate particles (like two ends of a barbell) and can interfere with each other.

The bottom plot shows a direct comparison of the best, worst and average theoretical curves with experiment (goal).

What have we generated:

test/run_1/chi_square_filter/neutron_D2Op_85

/spectra

Second SAS Curve Comparison

Now that we know the range of chi square values that we have, we can compare the theoretical curves to the data a second time and create a weight file that flag all structures with chi square and Rg values in a desired range. Now, we set the 'number of weight files' to 1.

Input for Chi-Square Filter

run name: Since we already have a chi_square_filter folder in the run_0 directory, set the run name to run_1.

number of weight files: Set this to 1 using the down arrow associated with the input box.

Weight files contain information on which frames in our simulation meet specific criteria provided in the expression box.

enter expression[1]:


(x2 < 8) and (rg < 40) and (rg > 20)

Adjust these values if necessary to suit the results from your simulation.

weight file name[1]:

low Rg cutoff[1]:

Once complete the outputs should be exactly the same as for the previous run. The only difference is that the a weight file can now be found in the chi_square_filter/neutron_D2Op_85 folder.

What have we generated:

test/run_1/chi_square_filter/neutron_D2Op_85

/spectra

Trajectory Filtering

Now we can filter out the best fit structures and vizualize them using the Extract Utilities. More information can be found in the Extract Utilities documentation.

We will extract the frames from run_0.dcd that fit the data with x2 < 8 and Rg between 20 Å and 40 Å.

Input for the Extract Utilities module

When the process is finished your output should look like the one below.

Output for the Extract Utilities module

What have we generated:

test/run_1/extract_utilities

SAS Curve Calculation - All Contrasts

Next we calculate a theoretical scattering curve at all contrasts for each of the trial structures that we extracted above. We will use the "fixed number of golden vectors" option from the SasCalc method menu as shown below. We choose 103 golden vectors for all contrasts since that is the largest number of golden vectors we found were necessary to converge the calculated scattering curves to a tolerance of 0.001.

Input values for second theoretical curve calculation using SasCalc

run name: run_2 to avoid overwriting the files we calculated in run_1 above

reference pdb: the starting structure that is already on the SASSIE-web server

trajectory file filename: the 0_x2_8_20_rg_40.dcd file that we just extracted

Once the calculations are finished, the output will look similar to the figure below.

Output values for second theoretical curve calculation using SasCalc

What have we generated:

test/run_2/sascalc/neutron_D2Op_85

test/run_2/sascalc/neutron_D2Op_20

test/run_2/sascalc/neutron_D2Op_0

Initial SAS Curve Comparison - All Contrasts

Now we compare our theoretical curves to the experimental data taken at all 3 contrasts using Chi-Square Filter. More information can be found in the Chi-Square Filter documention.

85% D2O

We start with the 85% D2O case. The inputs are shown in the figure below.

Input for Chi-Square Filter

Advanced Input Option

Check the box for Advanced Input and then check the box to keep existing run folder name. This will put the output in a directory under chi_square_filter that matches the path that you chose above, i.e., chi-square_filter/neutron_D2Op_85, so that it isn't overwritten when comparing data at more than one contast.

Once complete you should see outputs similar to those below.

Output from Chi-Square Filter module

In the text output you will see the minimum chi square (X2) values is given.

The top plot shows the variation of chi squared (y-axis) with the radius of gyration (x-axis).

NOTE that there are some additional low Rg values (< ~20 Å) present. Each time the SANS curves are calculated, the deuteration of exchangeable and non-exchaneable hydrogen atoms is random. At this contrast (where PAI-1 is not contributing to the scattering), for those structures that have Guinier regions with nearly flat or negative slopes, a small change in the location of the deuterated hydrogen atoms can cause a relatively large change in calculated Rg. These structures do not agree with our 85% D2O data and we will exclude them in our second SAS curve comparison below.

The bottom plot shows a direct comparison of the best, worst and average theoretical curves with experiment (goal).

We now repeat the above analysis for the 0% and 20% D2O contrasts. The inputs and outputs are shown below.

20% D2O

Input for Chi-Square Filter

Output from Chi-Square Filter module

0% D2O

Input for Chi-Square Filter

Output from Chi-Square Filter module

NOTE that the best fit structures for 20% and 0% D2O have higher x2 values than those for 85% D2O. This is because the structure of PAI-1 is fixed and its Rg value is about 2 Å smaller than that found in our contrast analysis. Since PAI-1 is the main contributor to the scattering at these two contrasts, changes in VN structure are not enough to compensate for the Rg mismatch in PAI-1. Better x2 values may be able to be obtained by performing MD simulations or normal mode analysis (ProDy) on the PAI-1 portion of the complex to determine if conformations with slightly higher Rg values can be found. This Chi-Square Filter analysis can then be performed again on the new PAI-VN structures.

What have we generated:

Directories from Chi-Square Filter module

test/run_2/chi_square_filter/neutron_D2Op_85

/spectra

test/run_2/chi_square_filter/neutron_D2Op_20

/spectra

test/run_2/chi_square_filter/neutron_D2Op_0

/spectra

Second SAS Curve Comparison - All Contrasts

We now repeat the above SAS curve comparisons at all 3 contrasts and create a weight file that flags those structures with the best x2 values in each case.

85% D2O

We start with the 85% D2O case. The inputs are shown in the figure below.

Input for Chi-Square Filter

number of weight files: 1

enter expression[1]:


(x2 < 8) and (rg < 40) and (rg > 20)

weight file name[1]:

Advanced Input Option

Check the box for Advanced Input and then check the box to keep existing run folder name.

The outputs are the same as before except that the desired weight file has also been created.

We now repeat the above analysis for the 0% and 20% D2O contrasts. The inputs and outputs are shown below.

20% D2O

Input for Chi-Square Filter

number of weight files: 1

enter expression[1]:


x2 < 50

weight file name[1]:

0% D2O

Input for Chi-Square Filter

number of weight files: 1

enter expression[1]:


x2 < 40

weight file name[1]:

What have we generated:

test/run_3/chi_square_filter/neutron_D2Op_85

/spectra

test/run_3/chi_square_filter/neutron_D2Op_20

/spectra

test/run_3/chi_square_filter/neutron_D2Op_0

/spectra

Combined Weight File

Now, we want to create a single weight file that flags only those structures that are flagged in ALL 3 weight files that we just created. Thus, only those structures that are in the lowest x2 region for all 3 contrasts will be chosen.

An easy way to create this file is to take the average value of the weight flag for each structure from the 3 weight files. If the average value is 1, then the flag is 1 at all 3 contrasts. Otherwise, the average will be a value between 0 and 1. A Python program to perform this average can be downloaded below or you can use your favorite program to perform the average and write out a new weight file.

Input Files:

Python Program:

Output File:

Trajectory Filtering - All Contrasts

Now we can filter out the best fit structures for all 3 contrasts and vizualize them using the Extract Utilities. More information can be found in the Extract Utilities documentation.

We will extract the frames from 0_x2_8_20_rg_40.dcd that are flagged in new_weight_file.txt.

The inputs are shown in the figure below.

Input for the Extract Utilities module

When the process is finished your output should look like the one below.

Output for the Extract Utilities module

NOTE that there are only 8 frames that satisfy our criteria for this small sampling of structures. In a real study, several hundred to several thousand structures would be needed to adequately define the best fit ensemble for all 3 contrasts.

What have we generated:

test/run_3/extract_utilities

Analysis

The molecular (contrast-independent) Rg was calculated for each subunit, as well as the CM distance between the subunits, for each of the 8 best fit structures.

Analysis of Best Fit Structures

The Rg values for VN are in good agreement with the value found from the contrast analysis. The Rg value for PAI-1 is about 2 Å smaller than the value found from the contrast analysis as previously discussed. The CM distances in these structures are larger than the value of 18 Å that was found from the contrast analysis. This further suggests that our sampling hasn't fully covered configuration space since there should be structures which can satisfy this criterion.

Visualization

The structures in any of the generated DCD files can be visualized by loading the DCD files along with a suitable PDB file (pai_vn_start.pdb) into VMD. The figure on the left below shows the 8 structures in this example that best fit the data at all 3 contrasts. The figure on the right shows the calculated scattering curves for these 8 structures compared with the 85% D2O SANS data.

Best Structures All Contrasts SANS Curves for Best Fit Structures

Another way to visualize the structures is to use the Density Plot module. The density plot below shows two views of the envelope sampled by all of the accepted structures as well as that sampled by only the best fit structures for all 3 contrasts. The blue and red regions are the envelopes represented by PAI-1 (blue) and the fixed portion of VN (red). The black and yellow regions represent the envelope sampled by the flexible portion of VN for all accepted structures (yellow) and for the best fit structures at all 3 contrasts (black). It is clear from the view on the right that there are portions of configuration space that have not been covered in this limited sampling since there are no yellow or black surfaces covering the blue PAI-1 region.

Density Plot Density Plot

More information on how to use the Density Plot module can be found in the Density Plot documentation.

Finally, the figure below shows the calculated data for the single structure from our small sampling that best fits the data at all 3 contrasts (structure #4) along with the interpolated SANS data at those contrasts.

Best Fit SANS Curve

Reminder: This example has only 698 accepted structures and only 8 structures that are in the lowest x2 region for all 3 contrasts. For a real study, 10,000 to 50,000 accepted structures would be needed to better sample configuration space. This would hopefully allow us to find a wider range of structures that best fit the data at all 3 contrasts.

References

  1. SASSIE: A program to study intrinsically disordered biological molecules and macromolecular ensembles using experimental scattering restraints J. E. Curtis, S. Raghunandan, H. Nanda, S. Krueger, Comp. Phys. Comm. 183, 382-389 (2012). BIBTeX, EndNote, Plain Text
  2. Small-angle scattering contrast calculator for protein and nucleic acid complexes in solution K.L. Sarachan, J.E. Curtis, S. Krueger, J. Appl. Cryst. 46, 1889-1893 (2013). BIBTeX, EndNote, Plain Text
  3. MULCh: modules for the analysis of small-angle neutron contrast variation data from biomolecular assemblies A. E. Whitten, S. Cai, J. Trewhella, J. Appl. Cryst. 41, 222-226 (2008). BIBTeX, EndNote, Plain Text
  4. Letitia N. Puster thesis.

Return to Main Documents Page

Go to top