[Return to Main Documents Page](../../sassie

Return to Main Documents Page

Protein/deuterated Protein Complex Example SASSIE Workflow

In this workflow example we will model the conformation of a disordered region of the glycoprotein vitronectin (VN) that is in a complex with plasminogen activator inhibitor-1 (PAI-1). PAI-1 inhibits the activation of plasminogen to plasmin, which has a key role in fibrin degradation, Extra Cellular Matrix remodeling and cell migration. Binding of its co-factor, VN, stabilizes PAI-1 so that it can perform its function. Residues 1-39 of VN are structured when bound to PAI-1. However, residues 40-130 of VN remain disordered.

Regions of PAI/VN complex

The complex that was measured on the EQ-SANS instrument at Oak Ridge National Lab using contrast variation consists of one subunit of 60% deuterated PAI-1 in complex with one subunit of protonated VN. The Contrast Calculator module was used prior to the SANS experiment to determine that the match point of the complex occurs in 77% D2O buffer. The match point of the deuterated PAI-1 occurs in 88% D2O buffer (and that of VN in 44% D2O buffer). The complex was measured at a concentration of about 8 mg/ml in 0%, 20% and 85% D2O buffers. In 85% D2O buffer, the scattering from PAI-1 is negligible compared to that of VN.

An initial structure was built from a co-crystal structure (PDB ID 1OC0) of PAI-1/VN(1-39) with VN(40-130), along with other missing residues, added. The initial structure was energy-minimized and subjected to a short SASSIE Complex Monte Carlo run to obtain the "starting" structure shown above.

After calculating Rg and I(0) for the data at each contrast, the match point, Stuhrmann and parallel axis theorem analyses were performed using MulCh. The pertinent results that are needed for this analysis are shown below.

Data Analysis Results

NOTE: This work was part of the thesis of Letitia N. Puster. The example presented here begins from just one of several starting structures that were explored for this complex. The other starting structures considered the possibility that residues 40-130 contained some ordered regions, such as helices. Only a relatively small number of structures are generated in this example for illustration purposes. A real study typically requires 10,000 to 50,000 accepted structures to adequately sample configuration space.

In this example it is assumed that the sassie-web project name is test. Download the PAI-1 and VN sequences and the starting structure files pai_vn_start.pdb and pai_vn_start.psf to visualize the starting structure and perform the contrast calculations, PDB Scan and Monte Carlo analysis in this example. The data are not provided at this time since the manuscript is still in preparation. Substitute your own files to go through the entire example with your contrast variation data.

Contrast Calculator (run prior to experiment)

Here we demonstate the use of the Contrast Calculator module to aid in experiment planning. More information on the Contrast Calculator module is available in the Contrast Calculator documentation.

Select the 'Contrast Calculator' button from this 'Tools' menu. You should now see a page like the one below. This page is used to enter all of the information needed to calculate the SANS scattering length densities, contrasts and I(0) values for PAI-VN in H2O:D2O buffers.

contrast calculator input

run name: user defined name of folder that will contain the results.
output file name prefix: The main part of the name of three files that will contain the SLD, contrast and I(0) calculations.
number of protein, DNA or RNA input files: The number of FASTA and/or PDB files that will be used in the complex. In this case, there is one file for PAI and one for VN, so the number of protein, DNA or RNA input files is 2. Additional information regarding the input files then must be provided as follows.
- FASTA? [1]: Select ON since the PAI sequence is in FASTA format.
- input file name [1]: Input FASTA file for the PAI protein.
- number of units [1]: The number of PAI proteins in the complex, which is 1 in this case.
- fraction deuterated [1]: The fraction of the PAI non-exchangeable hydrogen atoms, i.e., those bound to carbon atoms, that have been replaced with deuterium, which is 0.6 in this case since the protein is 60% deuterated.
- molecular type [1]: Choose protein from the pull-down menu.
- FASTA? [2]: Select ON since the VN protein sequence is in FASTA format.
- input file name [2]: Input FASTA file for the VN protein.
- number of units [2]: The number of VN proteins in the complex, which is 1 in this case.
- fraction deuterated [2]: The fraction of the protein non-exchangeable hydrogen atoms, i.e., those bound to carbon atoms, that have been replaced with deuterium, which is 0 in this case.
- molecular type [2]: Choose protein from the pull-down menu.
number of additional components: The number of additional components in the complex that are not protein, DNA or RNA, such as polymers, lipids, carbohydrates, etc. If this number is not zero, then additional input fields will be provided to enter the desired chemical formula, mass density and other pertinent information. (If the number of protein, DNA or RNA input files is 0, then this number cannot be 0.)

Optional Input

total solute concentration (mg/ml): The concentration of the complex (both proteins in this case). This information is needed to calculate the x-ray and neutron I(0) values. The default value of 1 mg/ml is convenient since the calculated I(0) values can easily be multiplied by a different concentration value to easily obtain the I(0) values for the new concentration. In this case, the calculated values will need to be multiplied by 8 to obtain the values expected for the PAI-VN data.
solvent % D2O step: The step increment in the % D2O in the solvent for which the neutron SLD, contrast and I(0) values will be calculated.
fraction exchangeable H (protein): The fraction of exchangeable hydrogen atoms, i.e., those bound to nitrogen and oxygen, in the protein components in the complex that actually do exchange in solvents containing D2O. This value applies to all protein molecules in the complex and is automatically adjusted according to the % D2O in the solvent.
fraction exchangeable H (nucleic acid): The fraction of exchangeable hydrogen atoms, i.e., those bound to nitrogen and oxygen, in the DNA and/or RNA components in the complex that actually do exchange in solvents containing D2O. This value applies to all DNA and RNA molecules in the complex and is automatically adjusted according to the % D2O in the solvent. Since there are no DNA or RNA molecules in this complex, this number will be ignored.
number of non-water solvent components: The number of components in the solvent that are not water, i.e., sugars, salts, glycerol, components of Tris, HEPES, phosphate buffers, etc. In this case, the number of components is 0 and no additional information is required.

The output screen lists the match points for the protein and 60% deuterated protein components in the complex as well as for the entire complex. The output also shows plots of the calculated neutron SLDs, contrasts, I(0) and sqrt[I(0)] as a function of % D2O in the solvent since the complex doesn't contain more than 2 components. Note that roll-over help will indicate options to resize, zoom and reset the view of the plots. Resultant files containing the SLD, contrast and I(0) values are written to a new directory, "contrast_calculator", within the run_0 directory.

contrast calculator output 1 <- -> contrast calculator output 2 <- -> contrast calculator output 3

What have we generated:

test/run_0/contrast_calcuator

pai_vn_izero.txt
pai_vn_sld.txt
pai_vn_contrast.txt

Data Interpolation

Data interpolation is necessary to create a new data file that is spaced on a uniform grid from the experimental data file. The number of q-points, range of q, and the spacing of the q-points used to create the interpolated data files MUST match the input settings that you use in subsequent modules to calculate SAS profiles that are subsequently used to compare theoretical and experimental data. The interpolation must be performed for the data at all contrasts measured.

Since the data are not provided for this example, the data interpolation is not shown here. For an example of how to interpolate your own data, please refer to the Data Interpolation documentation.

The PAI-VN data were interpolated as follows:

Intensity and Q=0, I(0): contrast dependent; taken from table above
Maximum value of Q: 0.25 (units are inverse Angstroms)
Number of points in the curve: 51

PDB Scan

PDB Scan is used to assess whether an input PDB is ready for simulation and where possible to provide files enabling CHARMM forcefield parameterization. Information on missing atoms and residues and those not covered as standard by the CHARMM 27 forcefield are reported. PDB files do not need to have header information. At this time, only PDB files of proteins are supported. More information on the module is available in the PDB Scan documentation.

Select the 'Build' button from the Main Menu of SASSIE-web and then click on the 'PDB Scan' button.

You should now see a page like th one below. This page is used to enter all of the information needed to check the PDB file.

Input for the PDB Scan

pdb file input: The PDB file that we want to examine. Here we use the pai_vn_start.pdb file.

Once you have entered the file name, click on the 'Submit' button to start the file scan.

As the run continues the progress bar beneath the submit button should update. Once complete the output should look similar to the figure below.

Output1 for the PDB Scan

The text output region provides a brief summary of the PDB Scan report.

What have we generated:

test/run_0/pdbscan

coor_chain_X_pai_vn_start.pdb: output PDB file; same as the input PDB file
pai_vn_start_pdbscan.html: HTML file containing the full PDB scan report

Visualization

A JSmol vizualization of the protein is produced and is shown below the text output region. Holding down the left mouse button and moving the cursor over the picture allows you to rotate the view, the scroll wheel facilitates zooming in and out. Right clicking on the image allows you to access all of the JSmol options.

Output2 for the PDB Scan

The full PDB scan report can be found below the image of the structure.

Output3 for the PDB Scan

Output4 for the PDB Scan

The pai_vn_start.psf and pai_vn_start.pdb files can also be loaded into VMD or a similar structure viewer to observe the location of the disulphide bonds. This PDB file is ready for simulation so we can proceed to create an ensemble of structures for comparison to the SANS data.

Complex Monte Carlo

Complex Monte Carlo is used to generate a series of structures to sample configurations of the PAI-VN complex. The single flexible region in the VN subunit is shown in the structure and table above. More information on the module is available in the Complex Monte Carlo documentation.

Select the 'Simulate' button from the Main Menu of SASSIE-web and then click on the 'Complex Monte Carlo' button.

Complex Monte Carlo input

run name: user defined name of folder that will contain the results.
reference pdb: PDB file with naming information and coordinates of the starting structure.
output file name (dcd): Name of ouput DCD file containing accepted structures from the simulation.
number of trial attempts: Number of Monte Carlo moves to attempt.
return to previous structure: After this number of Monte Carlo moves fails to find an accepted configuration, re-load a previously accepted structure.
temperature (K): Simulation temperature.

Complex Specific Input

Enter TOTAL number of segments: An integer value indicating the number of rigid and flexible segments in the input PDB file.
number of flexible segments: An integer value indicating the number of regions to sample backbone torsions.
- molecule type: Select protein.
- flexible segment name: Name of particular flexible segment .
- number of flexible regions: An integer value indicating the number of regions to sample backbone torsions.
- flexible residue range(s): Residue numbers defining each flexible region in segment. The number of pairs should match the number of flexible regions for the given segment. Pairs of integers separated by hypens with each pair separated by commas.
- maximum angle(s): Angle, in degrees, that can be sampled in a single move for each region.
- structure alignment range: Residue to define the beginning of region used to align structures for the given segment.
overlap basis: Select either heavy atoms, all, backbone or enter atom name. The atom name option will spawn futher inputs:
- overlap basis: Enter an atom name to check for overlap.
- overlap cutoff (angsgtroms): Overlap basis atoms closer than this distance defines an overlap condition.

Advanced Input Options

Here the advanced options are used to set a high Rg cutoff value and to direct the Monte Carlo to a particular Rg value. These values were chosen based on the Rg values obtained from the contrast variation data and from the contrast analysis. The Rg value calculated in this module is the molecular Rg derived from the coordinates. Thus, it DOES NOT depend on the contrast. Therefore, the Rg value is directed to 25 Å, which is close to the experimental values obtained in 0% and 20% D20. These are the contrasts that we want to refer to when deciding on the directed Rg value since the molecular Rg always includes both subunits. PAI doesn't contribute to the scattering in 85% D2O, so we don't want to direct the molecular Rg to this experimental Rg value.

low Rg cutoff: Structures with Rg values less than this value are discarded.
high Rg cutoff: Structures with Rg values greater than this value are discarded.
directed Monte Carlo (0==no or Rg value): Enter a non-zero value to use an extra energy term in the Monte Carlo sampling to favor Rg values towards the supplied value. The default value is zero which indicates that no bias is implemented.
check box to use Z coordinate filter: Check box to implement the ability to discard structures with any Z coordinates with a value less than the user supplied Z cutoff value.
- Z cutoff: Value in angstroms to determine wheter a structure should be discarded.
check box to use atomic constraints: Check box to implement the ability to discard structures that do not satisfy the atomic / geometric constraints provided in the user defined constraint file.
- constraint file name: Choose a text file with constraint definitions. See Constraints for guidance as to how to create such a file with desired constraints.

Once complete the output should look similar to the figure below.

complex calculator output

The output will indicate various Rg values from the ensemble, acceptance and overlap statistics, and dimensions of the accepted structures in the final ensemble.

Results are written to a new directory within the given "run name" as noted in the output. In addition, a plot of Rg versus structure number is shown.

Several files are generated and saved to the "run name" complex_monte_carlo directory. A copy of the original input PDB file, the output DCD file containing accepted structures, files with Rg values as shown in the plot on the web-page, and run statistics.

What have we generated:

test/run_0/complex_monte_carlo

run_0.dcd: DCD files containing accepted trajectories
run_0.dcd.all_rg_results_data.txt: structure number and Rg value for all structures
run_0.dcd.accepted_rg_results_data.txt: structure number and Rg value for accepted structures
run_0.dcd.stats: run statistics

REMINDER: This example has only 698 accepted structures. For a real study, 10,000 to 50,000 accepted structures would be needed to adequately cover configuration space.

Initial SAS Curve Calculation - Single Structure for All Contrasts

We will use SasCalc to calculate the SANS curves from the generated structures. More information can be found in the SasCalc documentation. The starting structure must be a complete structure without missing residues or atoms (including hydrogen atoms) in order to obtain accurate scattering profiles. Atom and residue naming must be compatiable with those defined in the CHARMM force field.

The SasCalc module is first run using the "converged number of golden vectors" option on just one structure. We will calculate the SANS curves at each contrast for the PAI-VN starting structure. In order to create theoretical curves with data points at q values that match our interpolated data, we need three pieces of information:

Intensity and Q=0, I(0): taken from table above
Maximum value of Q: 0.25 (units are inverse Angstroms)
Number of points in the curve: 51

Select the 'Beta' button from the Main Menu of SASSIE-web and then click on the 'SasCalc' button.

Since this is the first time we are running SasCalc on this structure, we are using the "converged number of golden vectors" option. The scattering profile is calculated several times using an increasing number of golden vectors until there is no difference between the calculated scattering curves to within a chosen tolerance (averaged over the entire scattering profile). This is useful for determining the fixed number of golden vectors that are needed to achieve convergence within a chosen tolerance. The smaller the tolerance value specified in the converged option, the smaller the differences in the scattering profiles over the entire q range specified. In this case, a tolerance of 0.001 has been chosen.

Input values for theoretical curve calculation using SasCalc

The single structure that we used to start the simulation is used as both the reference pdb and the trajectory file filename (PDB in this case) so it is already uploaded to the SASSIE-web server. Thus, you can either upload it again from your local computer or locate it on the server and read it from there.

Note that we are calculating the SANS curves for all three contrasts at the same time. The number of exchangeable H regions is 1 since the H-D exchange is the same for both protein subunits. Thus, the exchangeable H region can be set to "moltype protein" and the fraction of exchangeable H value is set to 0.95. Only the PAI-1 subunit is deuterated, so the number of deuterated regions is 1. The deuterated region can be specified using "segname PAI1" and the fraction deuterated is set to 0.6. Once the calculations are finished, the output will look similar to the figure below.

Output values for theoretical curve calculation using SasCalc

The output files are written to a sub-directory of sascalc/ that is named according to the D2O percentage in the solvent.

What have we generated:

test/run_0/sascalc/neutron_D2Op_85

run_0_00001.iq: files containing theoretical scattering data
run_0_00001.log: log files containing information about the structure and calculation inputs
D2Op_85.pdb: input PDB file with element names including deuterium atoms that were added as a result of H-D exchange of exchangeable hydrogen atoms or deuteration of non-exchangeable hydrogen atoms.
HDexchange_Info_D2Op_85.txt: file describing which hydrogen atoms were replaced with deuterium as a result of H-D exchange of exchangeable hydrogen atoms.
Deuteration_info_D2Op_85.txt: file describing which hydrogen atoms were replaced with deuterium atoms as a result of deuteration of non-exchangeable hydrogen atoms.

Similarly, we have:

test/run_0/sascalc/neutron_D2Op_20

run_0_00001.iq
run_0_00001.log
D2Op_20.pdb
HDexchange_Info_D2Op_20.txt
Deuteration_info_D2Op_20.txt

test/run_0/sascalc/neutron_D2Op_0

run_0_00001.iq
run_0_00001.log
D2Op_0.pdb
HDexchange_Info_D2Op_0.txt
Deuteration_info_D2Op_0.txt

Second SAS Curve Calculation - Entire Ensemble for 85% D2O

Next we calculate a theoretical scattering curve for each of the trial structures we have generated for the contrast condition in 85% D2O. We are interested in the best fitting structures at all contrasts. However, since the PAI-1 structure is fixed, we want to narrow down the structures that fit the VN portion of the complex and then compare that subset of structures to the entire contrast variation data set to extract the best structures that fit all of the data. The run_0_00001.log files from the inital SAS Curve calculation at each contrast indicate that 103, 73 or 83 golden vectors were required for convergence to the desired tolerance (0.001 in this case) for 85%, 20% and 0% D2O, respectively. The log file for 85% D2O is shown below.

Output values first SAS curve calculation

We now use this information to calculate the scattering curves for all of the generated structures using the "fixed number of golden vectors" option from the SasCalc method menu as shown below.

Input values for second theoretical curve calculation using SasCalc

run name: run_1 to avoid overwriting the files we calculated in run_0 above

reference pdb: the starting structure that is already on the SASSIE-web server

trajectory file filename: the run_0.dcd file generated using Complex Monte Carlo

Once the calculations are finished, the output will look similar to the figure below.

Output values for second theoretical curve calculation using SasCalc

What have we generated:

test/run_1/sascalc/neutron_D2Op_85

run_1_00001.iq to run_1_00698.iq
run_1_00001.log to run_1_00698.log
D2Op_85.pdb
HDexchange_Info_D2Op_85.txt
Deuteration_info_D2Op_85.txt

Initial SAS Curve Comparison

Now we compare our theoretical curves to the experimental data taken in 85% D2O using Chi-Square Filter. More information can be found in the Chi-Square Filter documention.

Select the 'Analyze' button from the Main Menu of SASSIE-web and then click on the Chi Square Filter button.

The inputs are shown in the figure below.

Input for Chi-Square Filter

Note that you are required to select the path on the server containing the theoretical scattering curves, i.e., they can't be uploaded from your local computer. To set the path to the scattering curves generated in the previus step:

Click on the 'Browse server for a path' button
Navigate to the test project folder and select the run_1/sascalc/neutron_D2Op_85 folder (see picture below) and click 'OK'.

Selecting the sascalc folder in the file browser

I(0) must match that from the 85% D2O data: 0.013
number of weight files Set this to 0 since we don't know what x2 and Rg values exist until we perform this first comparison.

Advanced Input Option

Check the box for Advanced Input and then check the box to keep existing run folder name. This will put the output in a directory under chi_square_filter that matches the path that you chose above, i.e., chi-square_filter/neutron_D2Op_85, so that it isn't overwritten when comparing data at more than one contast.

Once complete you should see outputs similar to those below.

Output from Chi-Square Filter module

In the text output you will see the minimum chi square (X2) values is given.

The top plot shows the variation of chi squared (y-axis) with the radius of gyration (x-axis).

NOTE that the Rg values in the plots are the scattering Rg values (calculated in SasCalc) and they are contrast-dependent! The low Rg values (< ~20 Å) are due to the fact that the disordered tail (40-130) of VN is so long that portions are interacting with its ordered, fixed region (1-39). (Remember that PAI-1 is not visible at this contrast!) Thus, there is no longer a true Guinier region due to this intra-particle interaction. This results in a depressed I(0) value and a lower apparent Rg. In some cases, a weak intra-particle interaction peak can even be observed in the scattering curves such that the Guinier region is undefined. In these cases, Rg is listed as "nan" in the output file. This will not occur in every system. It occurs at this contrast in this system since VN has a long disordered tail region compared to the size and mass of its ordered region. Thus, in some configurations, the two regions of VN essentially act as if they were separate particles (like two ends of a barbell) and can interfere with each other.

The bottom plot shows a direct comparison of the best, worst and average theoretical curves with experiment (goal).

What have we generated:

test/run_1/chi_square_filter/neutron_D2Op_85

averagefile.txt: average scattering curve for all structures
bestworstfile.txt: best and worst scattering curves selected from all structures
sas_spectra_plot.txt: goal, best, worst and average scattering curves
x2_vs_rg_plot.txt: chi squared against radius of gyration for all structures
x2file.txt: chi squared for all structures

/spectra

spec_*.ciq*: scattering curves scaled to correct I(0) for each structure

Second SAS Curve Comparison

Now that we know the range of chi square values that we have, we can compare the theoretical curves to the data a second time and create a weight file that flag all structures with chi square and Rg values in a desired range. Now, we set the 'number of weight files' to 1.

Input for Chi-Square Filter

run name: Since we already have a chi_square_filter folder in the run_0 directory, set the run name to run_1.

number of weight files: Set this to 1 using the down arrow associated with the input box.

Weight files contain information on which frames in our simulation meet specific criteria provided in the expression box.

enter expression[1]:

We want to flag only those structures with x2 < 8 and Rg between 20 Å and 40 Å. So, we enter the following expression:


(x2 < 8) and (rg < 40) and (rg > 20)

Adjust these values if necessary to suit the results from your simulation.

weight file name[1]:

Enter 0_x2_8_20_rg_40.txt

low Rg cutoff[1]:

Enter a value if you wish to exclude files with Rg below that value from the analysis. In this case, we chose to restrict Rg to values between 20 and 40 using the expression above. Therefore, we didn't change the low Rg cutoff from its default value of 0.

Once complete the outputs should be exactly the same as for the previous run. The only difference is that the a weight file can now be found in the chi_square_filter/neutron_D2Op_85 folder.

What have we generated:

test/run_1/chi_square_filter/neutron_D2Op_85

averagefile.txt: average scattering curve for all structures
bestworstfile.txt: best and worst scattering curves selected from all structures
sas_spectra_plot.txt: goal, best, worst and average scattering curves
x2_vs_rg_plot.txt: chi squared against radius of gyration for all structures
x2file.txt: chi squared for all structures
0_x2_8_20_rg_40.txt: weights file selecting only frames with x2 < 8, and Rg between 20 Å and 40 Å.

/spectra

spec_*.ciq: scattering curves scaled to correct I(0) for each structure

Trajectory Filtering

Now we can filter out the best fit structures and vizualize them using the Extract Utilities. More information can be found in the Extract Utilities documentation.

We will extract the frames from run_0.dcd that fit the data with x2 < 8 and Rg between 20 Å and 40 Å.

Select 'Tools' from the Main Menu and click the 'Extract Utilities' button.
Check the tick box labelled 'extract trajectory' to reveal the options shown in the figure below.
NOTE: 'extract SAS' is not yet implemented for SasCalc output.

Input for the Extract Utilities module

output filename: Enter 0_x2_8_20_rg_40.dcd
Choose 'weight file' from the 'select option' listbox.
- input name of weight file: Select the '0_x2_8_20_rg_40.txt' file generated in the last step that selects only the frames we want.

When the process is finished your output should look like the one below.

Output for the Extract Utilities module

What have we generated:

test/run_1/extract_utilities

0_x2_8_20_rg_40.dcd: DCD containing only the frames for which the theoretical scattering curve is a good match to experiment.

SAS Curve Calculation - All Contrasts

Next we calculate a theoretical scattering curve at all contrasts for each of the trial structures that we extracted above. We will use the "fixed number of golden vectors" option from the SasCalc method menu as shown below. We choose 103 golden vectors for all contrasts since that is the largest number of golden vectors we found were necessary to converge the calculated scattering curves to a tolerance of 0.001.

Input values for second theoretical curve calculation using SasCalc

run name: run_2 to avoid overwriting the files we calculated in run_1 above

reference pdb: the starting structure that is already on the SASSIE-web server

trajectory file filename: the 0_x2_8_20_rg_40.dcd file that we just extracted

Once the calculations are finished, the output will look similar to the figure below.

Output values for second theoretical curve calculation using SasCalc

What have we generated:

test/run_2/sascalc/neutron_D2Op_85

run_1_00001.iq to run_1_00162.iq
run_1_00001.log to run_1_00162.log
D2Op_85.pdb
HDexchange_Info_D2Op_85.txt
Deuteration_info_D2Op_85.txt

test/run_2/sascalc/neutron_D2Op_20

run_0_00001.iq to run_0_00162.iq
run_0_00001.log to run_1_00162.log
D2Op_20.pdb
HDexchange_Info_D2Op_20.txt
Deuteration_info_D2Op_20.txt

test/run_2/sascalc/neutron_D2Op_0

run_0_00001.iq to run_1_00162.iq
run_0_00001.log to run_1_00162.log
D2Op_0.pdb
HDexchange_Info_D2Op_0.txt
Deuteration_info_D2Op_0.txt

Initial SAS Curve Comparison - All Contrasts

Now we compare our theoretical curves to the experimental data taken at all 3 contrasts using Chi-Square Filter. More information can be found in the Chi-Square Filter documention.

85% D2O

We start with the 85% D2O case. The inputs are shown in the figure below.

Input for Chi-Square Filter

run name run_2 to avoid overwriting earlier calculations
I(0) must match that from the 85% D2O data: 0.013
number of weight files Set this to 0 since we don't know what x2 and Rg values exist until we perform this first comparison.

Advanced Input Option

Once complete you should see outputs similar to those below.

Output from Chi-Square Filter module

In the text output you will see the minimum chi square (X2) values is given.

The top plot shows the variation of chi squared (y-axis) with the radius of gyration (x-axis).

NOTE that there are some additional low Rg values (< ~20 Å) present. Each time the SANS curves are calculated, the deuteration of exchangeable and non-exchaneable hydrogen atoms is random. At this contrast (where PAI-1 is not contributing to the scattering), for those structures that have Guinier regions with nearly flat or negative slopes, a small change in the location of the deuterated hydrogen atoms can cause a relatively large change in calculated Rg. These structures do not agree with our 85% D2O data and we will exclude them in our second SAS curve comparison below.

The bottom plot shows a direct comparison of the best, worst and average theoretical curves with experiment (goal).

We now repeat the above analysis for the 0% and 20% D2O contrasts. The inputs and outputs are shown below.

20% D2O

Input for Chi-Square Filter

I(0) must match that from the 20% D2O data: 0.534

Output from Chi-Square Filter module

0% D2O

Input for Chi-Square Filter

I(0) must match that from the 0% D2O data: 0.85

Output from Chi-Square Filter module

NOTE that the best fit structures for 20% and 0% D2O have higher x2 values than those for 85% D2O. This is because the structure of PAI-1 is fixed and its Rg value is about 2 Å smaller than that found in our contrast analysis. Since PAI-1 is the main contributor to the scattering at these two contrasts, changes in VN structure are not enough to compensate for the Rg mismatch in PAI-1. Better x2 values may be able to be obtained by performing MD simulations or normal mode analysis (ProDy) on the PAI-1 portion of the complex to determine if conformations with slightly higher Rg values can be found. This Chi-Square Filter analysis can then be performed again on the new PAI-VN structures.

What have we generated:

Directories from Chi-Square Filter module

test/run_2/chi_square_filter/neutron_D2Op_85

averagefile.txt: average scattering curve for all structures
bestworstfile.txt: best and worst scattering curves selected from all structures
sas_spectra_plot.txt: goal, best, worst and average scattering curves
x2_vs_rg_plot.txt: chi squared against radius of gyration for all structures
x2file.txt: chi squared for all structures

/spectra

spec_*.ciq*: scattering curves scaled to correct I(0) for each structure

test/run_2/chi_square_filter/neutron_D2Op_20

averagefile.txt: average scattering curve for all structures
bestworstfile.txt: best and worst scattering curves selected from all structures
sas_spectra_plot.txt: goal, best, worst and average scattering curves
x2_vs_rg_plot.txt: chi squared against radius of gyration for all structures
x2file.txt: chi squared for all structures

/spectra

spec_*.ciq*: scattering curves scaled to correct I(0) for each structure

test/run_2/chi_square_filter/neutron_D2Op_0

averagefile.txt: average scattering curve for all structures
bestworstfile.txt: best and worst scattering curves selected from all structures
sas_spectra_plot.txt: goal, best, worst and average scattering curves
x2_vs_rg_plot.txt: chi squared against radius of gyration for all structures
x2file.txt: chi squared for all structures

/spectra

spec_*.ciq*: scattering curves scaled to correct I(0) for each structure

Second SAS Curve Comparison - All Contrasts

We now repeat the above SAS curve comparisons at all 3 contrasts and create a weight file that flags those structures with the best x2 values in each case.

85% D2O

We start with the 85% D2O case. The inputs are shown in the figure below.

Input for Chi-Square Filter

run name run_3 to avoid overwriting earlier calculations
I(0) must match that from the 85% D2O data: 0.013

number of weight files: 1

enter expression[1]:


(x2 < 8) and (rg < 40) and (rg > 20)

weight file name[1]:

Enter 0_x2_8_20_rg_40.txt

Advanced Input Option

Check the box for Advanced Input and then check the box to keep existing run folder name.

The outputs are the same as before except that the desired weight file has also been created.

We now repeat the above analysis for the 0% and 20% D2O contrasts. The inputs and outputs are shown below.

20% D2O

Input for Chi-Square Filter

I(0) must match that from the 20% D2O data: 0.534

number of weight files: 1

enter expression[1]:


x2 < 50

weight file name[1]:

Enter x2_lt_50.txt

0% D2O

Input for Chi-Square Filter

I(0) must match that from the 0% D2O data: 0.85

number of weight files: 1

enter expression[1]:


x2 < 40

weight file name[1]:

Enter x2_lt_40.txt

What have we generated:

test/run_3/chi_square_filter/neutron_D2Op_85

averagefile.txt: average scattering curve for all structures
bestworstfile.txt: best and worst scattering curves selected from all structures
sas_spectra_plot.txt: goal, best, worst and average scattering curves
x2_vs_rg_plot.txt: chi squared against radius of gyration for all structures
x2file.txt: chi squared for all structures
0_x2_8_20_rg_40.txt: weight file

/spectra

spec_*.ciq*: scattering curves scaled to correct I(0) for each structure

test/run_3/chi_square_filter/neutron_D2Op_20

averagefile.txt: average scattering curve for all structures
bestworstfile.txt: best and worst scattering curves selected from all structures
sas_spectra_plot.txt: goal, best, worst and average scattering curves
x2_vs_rg_plot.txt: chi squared against radius of gyration for all structures
x2file.txt: chi squared for all structures
x2_lt_50.txt: weight file

/spectra

spec_*.ciq*: scattering curves scaled to correct I(0) for each structure

test/run_3/chi_square_filter/neutron_D2Op_0

averagefile.txt: average scattering curve for all structures
bestworstfile.txt: best and worst scattering curves selected from all structures
sas_spectra_plot.txt: goal, best, worst and average scattering curves
x2_vs_rg_plot.txt: chi squared against radius of gyration for all structures
x2file.txt: chi squared for all structures
x2_lt_40.txt: weight file

/spectra

spec_*.ciq*: scattering curves scaled to correct I(0) for each structure

Combined Weight File

Now, we want to create a single weight file that flags only those structures that are flagged in ALL 3 weight files that we just created. Thus, only those structures that are in the lowest x2 region for all 3 contrasts will be chosen.

An easy way to create this file is to take the average value of the weight flag for each structure from the 3 weight files. If the average value is 1, then the flag is 1 at all 3 contrasts. Otherwise, the average will be a value between 0 and 1. A Python program to perform this average can be downloaded below or you can use your favorite program to perform the average and write out a new weight file.

Input Files:

Python Program:

compare_weight_files.py

Output File:

new_weight_file.txt

Trajectory Filtering - All Contrasts

Now we can filter out the best fit structures for all 3 contrasts and vizualize them using the Extract Utilities. More information can be found in the Extract Utilities documentation.

We will extract the frames from 0_x2_8_20_rg_40.dcd that are flagged in new_weight_file.txt.

The inputs are shown in the figure below.

Input for the Extract Utilities module

output filename: best_all.dcd
Choose 'weight file' from the 'select option' listbox.
- input name of weight file: Select the 'new_weight_file.txt' file generated in the last step that selects only the frames we want.

When the process is finished your output should look like the one below.

Output for the Extract Utilities module

NOTE that there are only 8 frames that satisfy our criteria for this small sampling of structures. In a real study, several hundred to several thousand structures would be needed to adequately define the best fit ensemble for all 3 contrasts.

What have we generated:

test/run_3/extract_utilities

best_all.dcd: DCD containing only the frames for which the theoretical scattering curve is a good match to experiment.

Analysis

The molecular (contrast-independent) Rg was calculated for each subunit, as well as the CM distance between the subunits, for each of the 8 best fit structures.

Analysis of Best Fit Structures

The Rg values for VN are in good agreement with the value found from the contrast analysis. The Rg value for PAI-1 is about 2 Å smaller than the value found from the contrast analysis as previously discussed. The CM distances in these structures are larger than the value of 18 Å that was found from the contrast analysis. This further suggests that our sampling hasn't fully covered configuration space since there should be structures which can satisfy this criterion.

Visualization

The structures in any of the generated DCD files can be visualized by loading the DCD files along with a suitable PDB file (pai_vn_start.pdb) into VMD. The figure on the left below shows the 8 structures in this example that best fit the data at all 3 contrasts. The figure on the right shows the calculated scattering curves for these 8 structures compared with the 85% D2O SANS data.

Best Structures All Contrasts SANS Curves for Best Fit Structures

Another way to visualize the structures is to use the Density Plot module. The density plot below shows two views of the envelope sampled by all of the accepted structures as well as that sampled by only the best fit structures for all 3 contrasts. The blue and red regions are the envelopes represented by PAI-1 (blue) and the fixed portion of VN (red). The black and yellow regions represent the envelope sampled by the flexible portion of VN for all accepted structures (yellow) and for the best fit structures at all 3 contrasts (black). It is clear from the view on the right that there are portions of configuration space that have not been covered in this limited sampling since there are no yellow or black surfaces covering the blue PAI-1 region.

Density Plot

More information on how to use the Density Plot module can be found in the Density Plot documentation.

Finally, the figure below shows the calculated data for the single structure from our small sampling that best fits the data at all 3 contrasts (structure #4) along with the interpolated SANS data at those contrasts.

Best Fit SANS Curve

Reminder: This example has only 698 accepted structures and only 8 structures that are in the lowest x2 region for all 3 contrasts. For a real study, 10,000 to 50,000 accepted structures would be needed to better sample configuration space. This would hopefully allow us to find a wider range of structures that best fit the data at all 3 contrasts.

References

SASSIE: A program to study intrinsically disordered biological molecules and macromolecular ensembles using experimental scattering restraints J. E. Curtis, S. Raghunandan, H. Nanda, S. Krueger, Comp. Phys. Comm. 183, 382-389 (2012). BIBTeX, EndNote, Plain Text
Small-angle scattering contrast calculator for protein and nucleic acid complexes in solution K.L. Sarachan, J.E. Curtis, S. Krueger, J. Appl. Cryst. 46, 1889-1893 (2013). BIBTeX, EndNote, Plain Text
MULCh: modules for the analysis of small-angle neutron contrast variation data from biomolecular assemblies A. E. Whitten, S. Cai, J. Trewhella, J. Appl. Cryst. 41, 222-226 (2008). BIBTeX, EndNote, Plain Text
Letitia N. Puster thesis.

Return to Main Documents Page

Go to top