Week 2: How are there so many proteins!

Aditya L - February 20, 2025 11:01 am

Hey everyone, welcome to my week 2 post!

Last week, I outlined my first task: starting from a sample Rhodopsin sequence with no mutations, find all similar sequences in the Protein Data Bank (PDB). This week I actually began the process, and the results were surprising! As a quick refresher, the sample structure I used is 1L9H. First, I searched for this PDB ID in the Protein Data Bank and opened the corresponding entry. Then I clicked on the 'Display Files' menu and selected 'FASTA Sequence'. FASTA is a simple text format that stores the amino acid sequence of each chain in the structure, one letter per residue. Next, I pasted that sequence into the PDB search bar. When I hit enter, 800+ entries came up! Not all 800 of those are actually Rhodopsin, and many share only 10-20% of their sequence with it, so I added a filter that keeps only sequences with 95% or higher similarity to the 1L9H sequence. The final result was 75 sequences.
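The 95% filter can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how the PDB search works internally. It parses FASTA text like the file from the 'FASTA Sequence' download, and it assumes the two sequences being compared are already aligned to the same length, whereas the PDB search performs its own alignment. The function names are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header line (without '>'): sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical residues (a gap '-' never counts as a match)."""
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty sequences aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def filter_by_identity(query: str, hits: dict[str, str], cutoff: float = 95.0) -> list[str]:
    """Names of hits whose identity to the query meets the cutoff (hypothetical helper)."""
    return [name for name, seq in hits.items() if percent_identity(query, seq) >= cutoff]
```

Percent identity is only one way to measure "similarity"; substitution-aware scores would rank distant relatives differently, but for a near-identical cutoff like 95% the difference is small.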

Over the weekend and the past couple of days, I began going through each of the 75 sequences to pre-screen for issues: (1) I checked whether all 75 entries actually contained Rhodopsin, because Rhodopsin belongs to the broader family of GPCRs (G protein-coupled receptors) and its sequence resembles those of other receptors; (2) I made sure that I could access full structures. Rhodopsin has seven transmembrane helices, but some of the PDB IDs (1FDF, 1EDX, 1EDW, 1EDV, 1EDS) contained only one helix, which isn't enough for a successful docking simulation.
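The helix check in step (2) amounts to a simple split into "keep" and "drop" lists. The sketch below assumes the helix counts have already been read off each entry by hand. It uses the five single-helix IDs named above, and it assumes the full-length reference structure 1L9H has all seven. The `prescreen` helper is hypothetical.

```python
# Rhodopsin has seven transmembrane helices; entries with fewer are set aside.
REQUIRED_HELICES = 7


def prescreen(helix_counts: dict[str, int], required: int = REQUIRED_HELICES) -> tuple[list[str], list[str]]:
    """Split PDB IDs into (keep, drop) by how many helices each structure contains."""
    keep = sorted(pid for pid, n in helix_counts.items() if n >= required)
    drop = sorted(pid for pid, n in helix_counts.items() if n < required)
    return keep, drop


# The single-helix entries named in the post; counts are entered by hand here.
single_helix = {pid: 1 for pid in ["1FDF", "1EDX", "1EDW", "1EDV", "1EDS"]}
keep, drop = prescreen({"1L9H": 7, **single_helix})
print("keep:", keep)
print("drop:", drop)
```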

Then, I went through the data and recorded each entry's experimental method (e.g., X-ray diffraction), crystallization protocol, resolution (in Angstroms), and other metrics.
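One way to keep these metrics organized is a small record per entry plus a summary function. This is a sketch with hypothetical field names. It assumes resolution is stored as a number in Angstroms, and that entries without a reported resolution (some NMR entries, for example) are stored as `None`. Smaller resolution values mean finer detail, so sorting ascending lists the best structures first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntryMetadata:
    pdb_id: str
    method: str                    # experimental method as listed on the entry page
    resolution_a: Optional[float]  # resolution in Angstroms; None if the entry reports none
    crystallization: str = ""      # free-text crystallization protocol


def summarize(entries: list[EntryMetadata]) -> tuple[Counter, list[str]]:
    """Count entries per method and list IDs from best (smallest) to worst resolution."""
    methods = Counter(e.method for e in entries)
    resolved = [e for e in entries if e.resolution_a is not None]
    resolved.sort(key=lambda e: e.resolution_a)
    return methods, [e.pdb_id for e in resolved]
```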

I have also decided to use the DiffDock molecular docking software to run simulations with benzo[a]pyrene, because of its high accuracy in predicting how a ligand binds to a protein when no experimental structure of that complex exists yet. There is no data or structure for Rhodopsin bound to the environmental ligand benzo[a]pyrene, so I will be using DiffDock to simulate that complex. Over the coming week, I will begin some preliminary simulations, so stay tuned!
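DiffDock can take a batch of protein-ligand jobs from a CSV file, with one row per complex. The sketch below writes such a file for benzo[a]pyrene. The column names are assumed from the DiffDock README and may differ between versions, so check them against the copy you install. The SMILES string is a standard aromatic form of benzo[a]pyrene (C20H12) and is worth verifying against PubChem. The helper name, the `_BaP` naming scheme, and the file path are hypothetical.

```python
import csv
import io

# Assumed column layout of DiffDock's batch-input CSV (from its README); check your version.
DIFFDOCK_COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]

# Benzo[a]pyrene (C20H12) as an aromatic SMILES string; verify against PubChem before use.
BENZO_A_PYRENE_SMILES = "c1ccc2c(c1)cc1ccc3cccc4ccc2c1c34"


def build_diffdock_csv(structure_paths: dict[str, str], ligand_smiles: str = BENZO_A_PYRENE_SMILES) -> str:
    """Return CSV text with one protein-ligand job per PDB structure file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DIFFDOCK_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for pdb_id, path in sorted(structure_paths.items()):
        writer.writerow({
            "complex_name": f"{pdb_id}_BaP",   # hypothetical naming scheme
            "protein_path": path,              # local .pdb file for this entry
            "ligand_description": ligand_smiles,
            "protein_sequence": "",            # left blank because a structure file is given
        })
    return buf.getvalue()


if __name__ == "__main__":
    # "structures/1L9H.pdb" is a hypothetical local path.
    print(build_diffdock_csv({"1L9H": "structures/1L9H.pdb"}))
```

The exact command for running the batch file is described in the DiffDock README.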

Thanks,

Aditya

Comments:


alisha_j

Hello Aditya! What exactly do you plan to take away from the simulations throughout this next week?

February 22, 2025 at 9:42 pm - Reply

aditya_l

Hey Alisha, because the simulations can generate binding affinities that are specific to each protein conformation and each retinal ligand, I hope to use those binding affinities to 'score' which Rhodopsin states (e.g., the dark state) are better than others. For example, I will be asking questions such as 'Does dark-state Rhodopsin + 11-cis-retinal bind better than Meta-II Rhodopsin + 11-cis-retinal, as measured by binding affinity?'

February 24, 2025 at 11:50 am - Reply
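The comparison Aditya describes, ranking each conformation + ligand pair by its score, can be sketched as a sort. One detail matters here: for binding affinities in kcal/mol, more negative means tighter binding, so lower is better. DiffDock itself reports a confidence score for each predicted pose, where higher is better, rather than a binding affinity, so the direction has to match whichever score is used. The function name and labels are hypothetical.

```python
def rank_states(scores: dict[str, float], lower_is_better: bool = True) -> list[tuple[str, float]]:
    """Order (conformation + ligand) labels from best to worst score.

    lower_is_better=True suits affinities in kcal/mol, where more negative means tighter binding;
    set it to False for scores where higher is better, such as a docking confidence score.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=not lower_is_better)
```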

evangeline_c

Hi Aditya! How did you come up with the filter that only looks for sequences that have a 95% or higher similarity to the 1L9H sequence? Is there a reason in the literature or prior experiments for choosing 95% as your starting point?

February 24, 2025 at 11:40 am - Reply

aditya_l

Hey Evie, the 95% cutoff was chosen so that fewer than 20 amino acids could differ. Rhodopsin has 348 amino acids in total, so we wanted a reasonable limit on how many residues could change, and 5% of 348 residues is about 17 (348 × 0.05 = 17.4). Because a 95% cutoff allows at most 17 differing amino acids, we decided to go with this threshold.

February 24, 2025 at 11:56 am - Reply
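The arithmetic behind the 95% cutoff can be checked directly. This small sketch assumes identity is measured over the full 348-residue chain. If the search measures identity over only the aligned region, the allowed number of differences for a partial structure will be smaller. The helper name is hypothetical.

```python
def max_differences(length: int, identity_cutoff_pct: int) -> int:
    """Largest number of differing residues still allowed at an integer percent-identity cutoff.

    Integer arithmetic gives floor(length * (100 - cutoff) / 100) without float rounding.
    """
    return length * (100 - identity_cutoff_pct) // 100


print(max_differences(348, 95))  # 17, since 348 * 0.05 = 17.4
```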

mikyle_h

Hi Aditya! How do you plan to handle variations in data quality when conducting your simulations with DiffDock? Do you think any of these factors could impact the accuracy of your results?

February 24, 2025 at 11:44 pm - Reply

aditya_l

Hey Mikyle, most entries in the Protein Data Bank have low variance (i.e., their resolutions fall within a narrow range). However, there are outliers, and these can show up in the DiffDock results. For example, a low-resolution structure may produce binding affinities that differ noticeably from the rest, which would flag it for a closer look.

February 27, 2025 at 11:04 am - Reply
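Resolution outliers can also be caught before any docking is run, using the resolutions collected during pre-screening. The sketch below applies Tukey's interquartile-range rule: a value is flagged if it falls below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. This is a swapped-in technique, not something described in the post. The 1.5 multiplier is the conventional default rather than a tuned value, and the function name is hypothetical.

```python
import statistics


def flag_resolution_outliers(resolutions: dict[str, float], k: float = 1.5) -> list[str]:
    """Return PDB IDs whose resolution (Angstroms) lies outside Tukey's fences."""
    q1, _, q3 = statistics.quantiles(resolutions.values(), n=4)  # needs at least two values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(pid for pid, r in resolutions.items() if r < low or r > high)
```

Flagged entries would not be discarded automatically. They would be the ones to inspect, or to dock separately, before comparing their scores with the rest.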
