Preface

In May 2012, as a rising junior at the University of Illinois, I started as a summer student in Bil Clemons’ lab. I had asked Bil if I could spend the summer in his lab because I wanted to learn something about membrane protein expression for my undergraduate research project with Claudio Grosman at UIUC, which involved expressing newly discovered bacterial homologs of the acetylcholine receptor. Given that Bil was one of the world’s experts on the topic, I figured the lab would have some special know-how to share. Alas, I returned with only the conviction that everyone struggles to express membrane proteins.

Yet I loved my “research internship”: my mentor in the lab, Axel Müller, seemed happy to have me do “whatever I wanted” on the computer, so long as it was vaguely related to what we were discussing. When I wasn’t in “The Matrix” (the computer room that eventually became my de facto office), I could be found programming on my Acer Aspire One netbook while basking in the Southern California sun. I realized that, as a dry-lab person, I enjoyed spending time with wet-lab folks. While I had a grounding in benchtop work (like most life scientists), my stamina and abilities are much greater on the command line. Indeed, when I applied to graduate school in 2013, I wrote the following to each graduate program (edited for brevity, emphasis added):

My research experiences lie with the process of applying computational methods to solve biological questions. I have learned that I enjoy solving the complex, fundamental problems that face biology, and I have enjoyed applying computational methods because complex biological questions transform into fruitful benchtop experiments when fueled by quantitative frameworks, i.e., big data. I would like to continue building my scientific toolbox by acquiring familiarity with other computational methods as well as refining those that I already [have]. It is very important for me to work closely with wet lab experimentalists because each of us holds a different toolbox which we use to further research challenges.

Membrane Protein Expression Prediction

Flowery language aside (still a penchant of mine), looking back over the course of my graduate work, it would seem that I succeeded in realizing this goal. At first, I was motivated by a problem that I had faced as an undergraduate and gotten a further taste of as a summer student: obtaining the large quantities of heterologously expressed protein needed to solve the atomic structure of a membrane protein. That summer, in fact, Axel made an off-the-cuff remark that it might be a decade before methods for predicting expression came into being. Perhaps implicitly, I took it as a challenge, and some years later I developed such a model (the topic of Chapter 1, “A statistical model for improved membrane protein expression using sequence-derived features”) with the support of Nauman Javed, a tremendous undergraduate student.

Then, with the help of several students (Sam Schulte, Alex Chu, Nadine Bradbury, Kate Zator), I expended significant effort extending the expression prediction model to other systems (e.g., yeast) and improving prediction in E. coli. While we had a series of suggestive results, none developed far enough to come to fruition before I became disheartened by the years-long effort to publish our initial E. coli model, and perhaps also “distracted” by basic biochemical questions closer to the lab’s heart.

Tail-anchored Protein Biogenesis

While some in the lab worked on, and thought about, improving membrane protein expression, others studied the mechanism by which a certain class of membrane proteins, tail-anchored proteins (i.e., those whose signal for targeting to the membrane lies not at the beginning but at the end), come to life. In Spring 2013, a brilliant, extremely hard-working biochemist, Geoffrey (Ku-Feng) Lin, approached me for input on structure prediction for a protein domain he was trying to characterize structurally (he was obviously beaten down; otherwise he’d never have considered modelling!). Familiar with the structure prediction literature and still naïve enough to think this might be a short pursuit, I agreed to help. Several years later, this work (Chapter 2, “Molecular basis of tail-anchored integral membrane protein recognition by the co-chaperone Sgt2”) and its extensions (Chapter 3, “The STI1-domain is a flexible alpha-helical fold with a hydrophobic groove”, and Chapter 4, “Sequence-based features that are determinant for tail-anchored membrane protein sorting in eukaryotes”) had become my primary focus, carried out side-by-side with Michelle Fry. My collaboration with Michelle was, I think, what I had envisioned years earlier when applying to graduate school, and it became one of the highlights of my work in the Clemons Lab. Along the way, I was able to design and execute studies that flexed my phylogenetics chops (Chapter 5, “Structures of Get3d reveal a distinct architecture associated with the emergence of photosynthesis”).

I thought about a number of other problems while in Bil’s lab, such as structural repeats in the ClC membrane protein family and ultra-fast sequence searching, but unfortunately there was not enough time to bring these projects to completion. One day, I hope to think about basic questions of biology again and perhaps return to a careful treatment of these and others.

Thinking About Colors

Motivated, at least in part, by numerous experiences teaching and TA’ing computation-focused courses with Justin Bois, I started thinking a lot about data visualization and color usage, even daydreaming about it as a TA sitting in the back of lecture halls. We expended so much time and energy explaining to students why they should think carefully about their data, how to plot it, what colors to use, etc. In particular, we would stress that it was important to pay attention to the colormap they used (i.e., the mechanism by which numerical data is painted onto a figure as colors) because the most commonly used colormap (called “Jet” or “rainbow”) is perceptually nonuniform. That is, equal steps in the data are not perceived as equal steps in color, because the colormap does not respect how our eyes perceive colors and the differences between them; it can therefore introduce visual artefacts, such as apparent sharp boundaries where the data change smoothly.

In the classroom, students readily took to the idea, but the question for me was “How do we encourage broader awareness and adoption?” To this end, I created a few pieces of software aimed at the literature: one to repaint already-published figures that used a “Jet” colormap (with help from Alex Guerra before his first year at Caltech; Chapter 6, “Don’t be stuck with rainbow: Fixthejet”), and another to screen bioRxiv preprints and notify authors if they were using a “bad” colormap (Chapter 7, “JetFighter: Towards figure accuracy and accessibility”). At the same time, I was elected to eLife’s Early Career Advisory Group (ECAG) and benefited from conversations about publishing with researchers across many fields, which helped drive the adoption and dissemination of these tools.

Finally, as the last work of my thesis, I set out to bring perceptually uniform colormaps to structural biology. While writing the code itself took just a few hours, composing a story that would convince other structural biologists the change is worth it has been on our minds for several years now. It is the topic of Chapter 8, “Structural biologists, let’s mind our colors.”

Community Work

As a member of eLife’s ECAG, I was also grateful for the opportunity to advocate for leveling the playing field at the highest levels of eLife, which culminated in a few publications proposing specific strategies to support underrepresented and/or marginalized groups in science. In addition, I contributed to a series of open-source software packages and was included as an author on several associated publications. I have not included these in my thesis, trusting that those who seek them will best find them on GitHub and elsewhere.