I am a researcher working in machine learning and computational & applied mathematics.
The purpose of this page is to collate some of my research-related activities and work. I am interested broadly in the theory and practice of machine
learning. My work frequently intersects with mathematical areas such as applied probability, statistical mechanics, combinatorics, harmonic analysis, and representation theory. I often
draw inspiration from (and work on) applications of machine learning in computational chemistry/physics, science automation, and healthcare.
For the past few years my research has focused on developing rigorous theoretical and engineering tools for data-efficient machine learning (e.g. via
equivariant and geometric deep learning) and enabling their safe and reliable deployment in real-world applications and decision-making pipelines (e.g. via
conformal prediction and provable uncertainty quantification). A recent area of interest is using tools from statistical physics to analyze neural
networks. I also maintain an interest in spectral graph theory and extremal combinatorics from a past life. A somewhat verbose description of topics
that I have worked on for extended periods can be found here. Publications and patents can be
found here.
I have extensive experience in both academic and industrial research and engineering, including in the deployment of large-scale machine learning
systems. More information on my training can be found here. In industrial contexts,
I have worked, at various points, in technology research, in consulting (technology/pharmaceuticals and management), and in semiconductors. I have also been involved with a semiconductor startup
in the past and have advised multiple startups in the healthcare space. If you'd like a CV, please email me.
Collaborations: If you would like to collaborate on a topic of mutual interest (research or non-research; for non-research interests you might have to poke around),
please email me and we can set up a time. Some questions of current research interest can be found here. I am also keenly interested in teaching and
mentoring students in some of my free time, especially those coming from community colleges and rural areas. Please
see this for subjects of interest, and don't hesitate to contact me if I might fit the bill.
Contact:
Some Background
Research: Past and Present
Equivariant Neural Networks:
I have worked on equivariant networks intermittently since 2016, expending a significant amount of energy on them during my PhD. I initiated the project and focus on equivariant networks in Risi Kondor's lab at the University of Chicago during that period. Some of the background effort to build interest in the project included presenting a full course on deep learning — first internally and then as a proper graduate course in spring 2017 (the first at the University of Chicago). Broadly, such work involves the design and implementation of neural architectures that either have task-pertinent symmetries baked into them using the machinery of group and representation theory (group-equivariant neural networks) or attempt to learn those symmetries from data. Such networks provide a rational and attractive precept for the principled design of neural networks, while also affording significant data efficiency. I have been involved in work that gives a general prescriptive theory, elucidating necessary and sufficient conditions for neural networks to be equivariant when their inputs transform in a certain manner. The theory provides a tangible path for the practical construction of such networks, and was implemented in a highly optimized manner for the case of spherical inputs. This work has an additional thrust towards a "fully Fourier" methodology that is applicable to general compact groups (not just the rotations of the specific application). Complementary to the theoretical work above, with Mircea Petrache I have also worked on elucidating very general quantitative bounds that show the generalization benefits of equivariance. These results do not require the underlying set of transformations to be a group, and also address model misspecification, i.e. when the model and data symmetries do not match, which necessitates an analysis of the approximation error in addition to the generalization error.
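As a minimal illustration (a generic numpy sketch, not code from any of the papers mentioned here), the defining property of an equivariant map — f(g·x) = g·f(x) — can be verified numerically for a simple permutation-equivariant layer of the DeepSets/graph-network type, where the symmetry group is the permutations of set elements:

```python
import numpy as np

def equivariant_layer(x, w_self=0.7, w_agg=0.3):
    """A simple permutation-equivariant map on a set of n feature vectors.

    Each element is updated from itself plus a symmetric aggregate:
        f(x)_i = w_self * x_i + w_agg * mean_j x_j
    Because the mean is invariant to reordering, f commutes with permutations.
    """
    return w_self * x + w_agg * x.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))      # a "set" of 5 elements with 3 features each
perm = rng.permutation(5)        # a group element g: a permutation of elements

# Equivariance: permuting the input and then applying the layer
# gives the same result as applying the layer and then permuting.
lhs = equivariant_layer(x[perm])     # f(g . x)
rhs = equivariant_layer(x)[perm]     # g . f(x)
assert np.allclose(lhs, rhs)
```

The same commutation test applies, with the appropriate group action, to rotation- or translation-equivariant layers.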
Together, these results are the most general of their type in the literature. I have also been involved in some of the earliest (if not the first) works on equivariant graph networks, which were also applied to molecular property prediction. I continue working on related problems and applications, especially in the physical sciences. For some questions of current interest please see the list below.
Conformal Prediction and Uncertainty Quantification:
As machine learning-based decision-making pipelines become increasingly ubiquitous in critical applications, their safe and trustworthy deployment is becoming exceedingly important. To be trustable, such pipelines should support proactive assessment and continuous monitoring at each stage. To enable proactive assessment, we need provable and easy-to-interpret quantification of uncertainty at every step, allowing human decision makers to intervene when required. Indeed, theoretically grounded quantification of predictive uncertainty can serve as an additional layer of security, permitting more honest decision-making. Conformal prediction provides an attractive general framework for provable uncertainty quantification with minimal assumptions on the data distribution and the model. With Zhen Lin and Jimeng Sun, I have worked on the development of conformal methods that are scalable and efficient and provably work in general settings, all without reducing the accuracy of the base deep learning model. We have developed approaches to construct valid and efficient prediction intervals (PIs) — a band of possible outputs rather than a point prediction — for general deep neural networks. Validity means that the PI contains the true output with high probability; efficiency means the interval is narrow. We have also developed conformal methods for the difficult problem of cross-sectional time-series forecasting, handling validity both along the longitudinal dimension (across points) and the temporal dimension. We have also produced methods for provable full calibration of the probabilistic predictions of NNs (not just for the predicted class), which reduces the over- or under-confidence typically seen in large neural networks. Much of this work has been directly inspired by real-world scenarios in healthcare, such as differential diagnosis and prediction of patients' vital statistics, as well as by work on automating scientific experiments.
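For concreteness, the basic split-conformal recipe that underlies such prediction intervals can be sketched in a few lines (a generic textbook illustration under exchangeability, not the specific methods from the papers above):

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Split conformal prediction intervals for regression.

    Given absolute residuals |y - f(x)| on a held-out calibration set,
    return intervals around the test predictions that contain the true
    output with probability >= 1 - alpha (assuming exchangeable data).
    """
    n = len(residuals_cal)
    # Finite-sample-corrected quantile level, clipped to 1.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals_cal, level)
    return y_pred_test - q, y_pred_test + q

# Toy usage: the "model" f(x) = 2x is the true mean, with noisy labels.
rng = np.random.default_rng(0)
x_cal, x_test = rng.normal(size=500), rng.normal(size=500)
y_cal = 2 * x_cal + rng.normal(scale=0.5, size=500)
y_test = 2 * x_test + rng.normal(scale=0.5, size=500)

lo, hi = split_conformal_interval(np.abs(y_cal - 2 * x_cal), 2 * x_test, alpha=0.1)
coverage = np.mean((lo <= y_test) & (y_test <= hi))  # approximately 1 - alpha
```

Note that the guarantee holds regardless of how good the base model is; a worse model simply yields wider (less efficient) intervals, which is why efficiency is the second axis of evaluation.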
Some current problems and themes of interest in this space are listed in the bullet points below.
Discriminative Learning of Similarity and Distance:
I worked on similarity learning from 2013 to 2015 under the supervision of Gregory Shakhnarovich (also collaborating with David McAllester and Samory Kpotufe), and it constituted a fair chunk of my 2018 PhD thesis (the rest of which was on group-equivariant neural networks). Some of this work was in the old "metric learning" mould, while some of it had more of a classical nonparametric and semiparametric statistics flavour. However, the ideas and formulational insights remain relevant in the deep learning era. We presented a formulation for metric learning that made a more direct attempt to optimize for kNN accuracy: it treated the choice of k neighbours as a discrete-valued latent variable and cast the metric learning problem as a large-margin structured prediction problem. We also worked on extensions of this formulation to metric learning for kNN regression, discriminative learning of Hamming distance, and kernel regression. We also considered situations with a limited computational budget, which made optimization over a space of possible metrics infeasible; nevertheless, a label-aware and well-motivated metric was desirable. We presented an approach based only on gradient estimates, with connections to work on sliced inverse regression and sufficient dimension reduction. Some of this work could be seen as a precursor to more recent work on the empirical Neural Tangent Kernel (NTK). Apart from these more direct contributions, I have often developed pipelines using similarity learning methods in industrial settings.
Machine Learning on Graph-Structured and Combinatorial Data:
I have worked with graph-structured data in different academic research and applied industrial contexts since 2010. Graph-structured data and graph-like structures occur naturally under many guises. Machine learning on such data comes with unique challenges compared to the usual real vector-valued data due to its inherently combinatorial nature, requiring a more careful consideration of (sub)structure and symmetry. During my master's I worked on graph mining from data originating in an intelligent tutoring context, modeled as bipartite graphs. A central contribution of my master's thesis was the use of the Szemerédi Regularity Lemma for graph compression, which we used to propose a fast clustering procedure with performance similar to spectral clustering on the original graph. As mentioned in the section on equivariance, I was also involved in a project that identified that message passing neural networks (a popular graph neural network formalism) in their basic form lack an analog of steerability, limiting their expressive power. An equivariant graph neural network was proposed in response, which we also used for molecular property prediction. I have also been involved in a long project on using graph neural networks to understand the glass transition. In industry, I have used graph learning methods for link prediction, early adopter prediction, and modeling temporally evolving graphs. More generally, I am interested in methods for operating on combinatorial data beyond graphs, such as sets, posets, and multisets, which require a different set of considerations. My current interests in this space include partial differential equation-based formalisms for graph neural networks, hierarchical representations for partially ordered data, and the connections between learning and the expressive power of graph neural networks.
Artificial Intelligence in Education:
Between late 2010 and 2012 I did core data mining and graph mining (what would now be called data science) work on research problems originating from a large intelligent tutoring project called ASSISTments, a free public service operated by WPI and the ASSISTments Foundation, working with Neil Heffernan and Gábor Sárközy. This personalized tutoring system, used by thousands of students every day, provides immediate feedback to students as they do homework while simultaneously assessing them. It also gives teachers actionable per-student data, and is built around the instructional philosophy of Bloom's mastery learning. In this period I worked on modeling student knowledge; a very simple bootstrap aggregation strategy using clustering; prediction of students' future test scores; and improving the classical knowledge tracing method. Some of this work directly inspired methods that have been incorporated into the system. Owing to this experience, I still maintain a residual interest in item response theory, the design of randomized controlled trials (which I briefly explored in 2018), and the use of machine learning in adaptive learning systems more generally.
Industrial Research Work:
Some of my industrial research work has sought to integrate my work on equivariant modeling and uncertainty quantification in the healthcare domain. This includes drug repurposing, disease phenotyping, and the identification of rare diseases from massive-scale electronic health records (EHR) or insurance claims data. I have also worked on problems in knowledge graph engineering and knowledge representation, dialogue systems, and the use and adaptation of large language models to very specific, niche use cases. Some of my interest in uncertainty quantification comes directly from applications in healthcare. In older engagements, I worked on problems in robust optimization, portfolio optimization in an operations research context, and Application-Specific Integrated Circuits (ASICs) and signal processing in the semiconductor industry.
Machine Learning for the Physical Sciences:
Much of the inspiration for my work in machine learning comes from applications in physics and chemistry. I have been involved in a multi-year, multi-institution collaboration (led by Brian D. Nord and Camille Avestruz) working on the frontiers of the use of deep learning techniques in astrophysics and cosmology. We have produced work on using deep learning to understand observational data about the Cosmic Microwave Background and to identify Sunyaev-Zel'dovich galaxy clusters. We have also worked on quantifying the uncertainty of predictions in these contexts and its implications for different cosmologies. The long-term goal of this work is to integrate work on equivariant networks, uncertainty quantification, and N-body simulations to theorize about different cosmological parameters and ask basic questions about the underlying physics. I have also worked on the uses of equivariant models in soft matter, chemical physics, molecular synthesis, and molecular property prediction, and this remains a chief application of interest to me. Some of my recent efforts, primarily with Brian D. Nord, have been in designing systems for automating the scientific discovery process in certain contexts (in effect "closing the loop" from data collection to conducting an experiment and generating results). Some of this work leverages techniques from simulation-based inference, equivariant modeling, and provable uncertainty quantification. We are currently working towards a vision paper on this broader area.
Some questions/connections/themes of continual or intermittent recent research interest:
Research reports
A more complete list is on Google Scholar

Mircea Petrache, Shubhendu Trivedi, Advances in Neural Information Processing Systems 36 (NeurIPS), 2023 arXiv preprint arXiv:2305.17592 

Zhen Lin, Shubhendu Trivedi, Jimeng Sun, Preprint, 2023 arXiv preprint arXiv:2305.19187 [Code] 

Zhen Lin, Shubhendu Trivedi, Cao Xiao, Jimeng Sun, International Conference on Machine Learning (ICML), 2023 arXiv preprint arXiv:2302.00839 [Code] 

Zhen Lin, Shubhendu Trivedi, Jimeng Sun, International Conference on Learning Representations (ICLR), 2023 arXiv preprint arXiv:2202.07679 [Code] 

Zhen Lin, Shubhendu Trivedi, Jimeng Sun, Advances in Neural Information Processing Systems 35 (NeurIPS), 2022 arXiv preprint arXiv:2205.09940 [Code] 

Zhen Lin, Shubhendu Trivedi, Jimeng Sun, Transactions on Machine Learning Research (TMLR), 2022 arXiv preprint arXiv:2205.12940 [Code] 

Matthew Farrell, Blake Bordelon, Shubhendu Trivedi, Cengiz Pehlevan, International Conference on Learning Representations (ICLR), 2022 arXiv preprint arXiv:2110.07472 [Code] Oral presentation at NeuReps 2022 (link). 

Zhen Lin, Nicholas D. Huang, Camille Avestruz, W. L. Kimmy Wu, Shubhendu Trivedi, João Caldeira, Brian D. Nord, Monthly Notices of the Royal Astronomical Society (MNRAS) 507 (3), 2021 arXiv preprint arXiv:2102.13123 [Code] 

Zhen Lin, Shubhendu Trivedi, Jimeng Sun, Advances in Neural Information Processing Systems 34 (NeurIPS), 2021 arXiv preprint arXiv:2106.00225 [Code] 

Suhas Lohit, Shubhendu Trivedi, Technical Report., 2020 arXiv preprint arXiv:2012.04474 

Shubhendu Trivedi, Technical Report., 2020 arXiv preprint arXiv:2006.03550 

Kirk Swanson, Shubhendu Trivedi, Joshua Lequieu, Kyle Swanson, Risi Kondor, Soft Matter, The Royal Society of Chemistry, 2020 arXiv preprint arXiv:1909.04648 [Code] 

J. Amundson et al., White Paper for NSF, 2019 arXiv preprint arXiv:1911.05796 

Pramod Kaushik Mudrakarta, Shubhendu Trivedi, Risi Kondor, Technical Report., 2019 arXiv preprint arXiv:1910.05132 

João Caldeira, W. L. Kimmy Wu, Brian D. Nord, Camille Avestruz, Shubhendu Trivedi, Kyle T. Story, Astronomy and Computing 28, 100307, 2019 arXiv preprint arXiv:1810.01483 [Code] 

Hy Truong Son, Shubhendu Trivedi, Horace Pan, Brandon M. Anderson, Risi Kondor, 15th International Workshop on Learning and Mining with Graphs, 2019 

Shubhendu Trivedi, PhD Thesis, 2018 arXiv preprint arXiv:1808.10078 

Hy Truong Son, Shubhendu Trivedi, Horace Pan, Brandon M. Anderson, Risi Kondor, The Journal of Chemical Physics (JCP) 148 (24), 241745, 2018 Editor's Pick in JCP's Special Issue on Data-Enabled Theoretical Chemistry 

Risi Kondor^{†}, Zhen Lin^{†}, Shubhendu Trivedi^{†}, Advances in Neural Information Processing Systems 31 (NeurIPS), 2018 arXiv preprint arXiv:1806.09231 [Code] ^{†} denotes alphabetical author ordering 

Risi Kondor, Shubhendu Trivedi^{†}, International Conference on Machine Learning (ICML), 2018 arXiv preprint arXiv:1802.03690 

Risi Kondor^{†}, Hy Truong Son^{†}, Horace Pan^{†}, Brandon M. Anderson^{†}, Shubhendu Trivedi^{†}, International Conference on Learning Representations (ICLR) Workshop, 2018 arXiv preprint arXiv:1801.02144 [Code] ^{†} Author ordering is entirely arbitrary 

Shubhendu Trivedi, Zachary A. Pardos, Neil T. Heffernan, Technical Report., 2015 arXiv preprint arXiv:1509.06163 

Shubhendu Trivedi, David McAllester, Gregory Shakhnarovich, Advances in Neural Information Processing Systems 27 (NeurIPS), 2014 

Shubhendu Trivedi, Jialei Wang, Samory Kpotufe, Gregory Shakhnarovich, Uncertainty in Artificial Intelligence (UAI), 2014 

Fei Song, Shubhendu Trivedi, Yutao Wang, Gábor N. Sárközy, Neil T. Heffernan, AAAI FLAIRS, 2013 

Shubhendu Trivedi, M.S. Thesis, 2012 WPI ETD043012104639 

Gábor N. Sárközy^{†}, Fei Song^{†}, Endre Szemerédi^{†}, Shubhendu Trivedi^{†}, Technical Report WPICSTR1205, 2012 arXiv preprint arXiv:1209.6540 ^{†} denotes alphabetical author ordering 

Zachary A. Pardos, Shubhendu Trivedi, Neil T. Heffernan, Gábor N. Sárközy, Intelligent Tutoring Systems (ITS), 2012 

Shubhendu Trivedi, Zachary A. Pardos, Gábor N. Sárközy, Neil T. Heffernan, Educational Data Mining (EDM), 2012 International Educational Data Mining Society 

Zachary A. Pardos, Qingyang Wang, Shubhendu Trivedi, Educational Data Mining (EDM), 2012 International Educational Data Mining Society 

Shubhendu Trivedi, Zachary A. Pardos, Neil T. Heffernan, Educational Data Mining (EDM), 2011 International Educational Data Mining Society 

Shubhendu Trivedi, Zachary A. Pardos, Neil T. Heffernan, Artificial Intelligence in Education (AIED), 2011 
Patents
Theses & etc.
Teaching
Service and other activities
Refereeing activities: I referee roughly 50 papers a year for various venues in machine learning, computational physics, computational and applied mathematics, experimental mathematics, information theory, and applied statistics. These venues include the following:
Invited Talks