A central challenge in data-driven model discovery is the presence of hidden, or latent, variables that are not directly measured but are dynamically important. Takens' theorem provides conditions for when it is possible to augment these partial measurements with time-delayed information, resulting in an attractor that is diffeomorphic to that of the original full-state system. However, the coordinate transformation back to the original attractor is typically unknown, and learning the dynamics in the embedding space has remained an open challenge for decades. Here, we design a custom deep autoencoder network to learn a coordinate transformation from the delay-embedded space into a new space where it is possible to represent the dynamics in a sparse, closed form. We demonstrate this approach on the Lorenz, Rössler, and Lotka-Volterra systems, learning dynamics from a single measurement variable. As a challenging example, we learn a Lorenz analogue from a single scalar variable extracted from a video of a chaotic waterwheel experiment. The resulting modeling framework combines deep learning, which uncovers effective coordinates, with the sparse identification of nonlinear dynamics (SINDy), which yields interpretable models. Thus, we show that it is possible to simultaneously learn a closed-form model and the associated coordinate system for partially observed dynamics.
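The delay embedding that Takens' theorem refers to can be built by stacking lagged copies of the single measured variable into a matrix. Each row is one point on the embedded attractor, which is the kind of input the autoencoder would receive. The following minimal NumPy sketch illustrates this. The function name `delay_embed`, the number of delays, the lag, and the forward-Euler Lorenz trajectory used as data are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def delay_embed(x, n_delays, lag=1):
    """Return the delay-embedding matrix of a scalar time series (Hankel when lag == 1).

    Row i is [x[i], x[i + lag], ..., x[i + (n_delays - 1) * lag]],
    so each row is one point in the n_delays-dimensional embedding space.
    """
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - (n_delays - 1) * lag
    if n_rows <= 0:
        raise ValueError("time series too short for the requested embedding")
    # Column j holds the series shifted forward by j * lag samples.
    return np.column_stack([x[j * lag : j * lag + n_rows] for j in range(n_delays)])

def lorenz_x(n_steps=5000, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Only the x-component of a Lorenz trajectory (forward Euler, illustrative only)."""
    state = np.array([1.0, 1.0, 1.0])
    xs = np.empty(n_steps)
    for k in range(n_steps):
        x, y, z = state
        state = state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
        xs[k] = state[0]
    return xs

# Embed a single measured variable into three delay coordinates.
H = delay_embed(lorenz_x(), n_delays=3, lag=10)
print(H.shape)  # (4980, 3)
```

The lag and embedding dimension trade off redundancy between coordinates against loss of correlation. Here they are fixed by hand purely for illustration.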
In the absence of governing equations, dimensional analysis is a robust technique for extracting insights and finding symmetries in physical systems. Given measurement variables and parameters, the Buckingham Pi theorem provides a procedure for finding a set of dimensionless groups that spans the solution space, although this set is not unique. We propose an automated approach that uses the symmetric and self-similar structure of the available measurement data to discover the dimensionless groups that best collapse the data onto a lower-dimensional space according to an optimal fit. We develop three data-driven techniques that use the Buckingham Pi theorem as a constraint: (i) a constrained optimization problem with a non-parametric input-output fitting function, (ii) a deep learning algorithm (BuckiNet) that projects the input parameter space to a lower dimension in the first layer, and (iii) a technique based on sparse identification of nonlinear dynamics (SINDy) to discover dimensionless equations whose coefficients parameterize the dynamics. We explore the accuracy, robustness, and computational complexity of these methods as applied to three example problems: a bead on a rotating hoop, a laminar boundary layer, and Rayleigh-Bénard convection.
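The Buckingham Pi constraint that all three techniques share can be stated concretely. The exponent vector of every dimensionless group lies in the nullspace of the dimension matrix, whose rows are base units and whose columns are variables. The sketch below computes an exact rational nullspace basis by Gaussian elimination. The variable set chosen for the bead on a rotating hoop (mass, hoop radius, rotation rate, gravity; damping omitted) is an assumption for illustration and may differ from the paper's. Any nonzero combination or rescaling of basis vectors is also dimensionless, which is why the set is not unique and why a data-driven criterion is needed to pick among them.

```python
from fractions import Fraction

def nullspace(matrix):
    """Exact rational basis for the nullspace of `matrix` (given as a list of rows)."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    pivot_cols, r = [], 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        # Scale the pivot row to 1 and eliminate column c from every other row.
        pv = a[r][c]
        a[r] = [v / pv for v in a[r]]
        for i in range(n_rows):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [vi - f * vr for vi, vr in zip(a[i], a[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        # Each pivot variable is fixed by the free variable through its reduced row.
        for row, pc in enumerate(pivot_cols):
            vec[pc] = -a[row][free]
        basis.append(vec)
    return basis

# Hypothetical variable set: mass m [M], radius r [L], rotation rate w [1/T], gravity g [L/T^2].
names = ["m", "r", "w", "g"]
D = [
    [1, 0, 0, 0],    # exponents of mass
    [0, 1, 0, 1],    # exponents of length
    [0, 0, -1, -2],  # exponents of time
]
for vec in nullspace(D):
    print(" * ".join(f"{n}^{e}" for n, e in zip(names, vec) if e != 0))  # r^-1 * w^-2 * g^1
```

The single group found, g/(r w^2), is one valid choice. Its reciprocal r w^2/g is equally valid.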
Statistical (machine learning) tools for equation discovery require large amounts of data that are typically computer-generated rather than experimentally observed. Multiscale modeling and stochastic simulations are two areas where learning on simulated data can lead to such discovery. In both, the data are generated with a reliable but impractical model, e.g., molecular dynamics simulations, while a model on the scale of interest is uncertain, requiring phenomenological constitutive relations and ad hoc approximations. We replace the human discovery of such models, which typically involves spatial/stochastic averaging or coarse-graining, with a machine-learning strategy based on sparse regression that can be executed in two modes. The first, direct equation-learning, discovers a differential operator from the whole dictionary. The second, constrained equation-learning, discovers only those terms in the differential operator that need to be discovered, i.e., learns closure approximations. We illustrate our approach by learning a deterministic equation that governs the spatiotemporal evolution of the probability density function of a system state whose dynamics are described by a nonlinear partial differential equation with random inputs. A series of examples demonstrates the accuracy, robustness, and limitations of our approach to equation discovery.
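The two modes can be contrasted with a small sparse-regression sketch. Sequentially thresholded least squares (STLSQ), the solver used in the original SINDy work, is assumed here as a stand-in, and the paper's solver may differ. In the paper the unknown field is a PDF. This sketch instead uses a synthetic advection-diffusion field u_t = -c u_x + D u_xx with analytic derivatives, a hypothetical five-term dictionary, and arbitrary values of c, D, and the threshold. Direct mode searches the whole dictionary. Constrained mode treats the advection term as known and learns only the remaining (closure) terms from the residual.

```python
import numpy as np

def stlsq(theta, target, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit target ~ theta @ xi with sparse xi."""
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving dictionary terms.
        xi[big] = np.linalg.lstsq(theta[:, big], target, rcond=None)[0]
    return xi

# Two-mode exact solution of u_t = -c u_x + D u_xx (parameter values are arbitrary).
c, D, ks = 0.5, 0.1, (1.0, 3.0)
x, t = np.meshgrid(np.linspace(0, 2 * np.pi, 64), np.linspace(0, 1, 32))
u = sum(np.exp(-D * k**2 * t) * np.sin(k * (x - c * t)) for k in ks)
u_x = sum(k * np.exp(-D * k**2 * t) * np.cos(k * (x - c * t)) for k in ks)
u_xx = sum(-k**2 * np.exp(-D * k**2 * t) * np.sin(k * (x - c * t)) for k in ks)
u_t = -c * u_x + D * u_xx

names = ["u", "u_x", "u_xx", "u*u_x", "u^2"]
theta = np.column_stack([f.ravel() for f in (u, u_x, u_xx, u * u_x, u**2)])
target = u_t.ravel()

# Direct mode: discover the operator from the whole dictionary.
xi = stlsq(theta, target, threshold=0.05)
print({n: round(float(v), 3) for n, v in zip(names, xi)})

# Constrained mode: the advection term is known; learn only the closure terms.
known = -c * u_x.ravel()
closure_cols = [0, 2, 3, 4]
xi_closure = stlsq(theta[:, closure_cols], target - known, threshold=0.05)
print({names[i]: round(float(v), 3) for i, v in zip(closure_cols, xi_closure)})
```

With exact derivatives the recovery is trivial. In practice the derivatives would come from noisy or sampled data, which is where the threshold and dictionary choice matter.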
Multiscale modeling and simulation of complex materials remains a formidable task. The challenge arises from the lack of scale separation in systems that exhibit nonlinear and chaotic dynamics. This thesis develops stochastic, multiscale, and data-driven methods for modeling complex systems, with a focus on the dynamic compaction of heterogeneous granular materials. Hybrid and statistical techniques are shown to successfully bridge space and time scales, discrete and continuum descriptions, known and uncertain laws, as well as experimental/simulation data and analytical models. In particular, we tackle the problem of quantifying the effect of heterogeneous microstructures on thermal localization. We first develop an energy-conserving multiscale hybrid model that explicitly solves for macropore collapse dynamics to study the compaction evolution. The results motivate a generalization that models the initial microstructure as a random field. Monte Carlo simulations of a multiscale system of hyperbolic conservation laws show that the heavy-tailed probability density function (PDF) of the temperature explains the formation of hotspots, where reactions are most likely to occur. The significance of this finding motivates the development of an equation for the joint PDF of the relevant quantities of interest, both to accelerate simulations and to uncover the analytical characteristics of the probability distribution. For that purpose, we use the method of distributions to derive an unclosed PDE for the joint PDF and motivate the use of machine learning to close and marginalize the equation. Accordingly, a data-driven method is developed to discover the marginal PDF equation from Monte Carlo simulation data. This method has been shown to be a promising direction for data-driven coarse-graining in a wide range of physics and engineering applications.
Furthermore, we use the formalism of stochastic differential equations (SDEs) to introduce subscale granular fluctuations that arise from chaotic force chains. At the mesoscale, a stochastic Carroll-Holt model is developed to predict the probabilistic evolution of macropores by relaxing the assumptions of an axisymmetric and homogeneous stress distribution. Our results demonstrate that, due to nonlinearity, the mean of the probability distribution of the temperature at the pore surface is higher in the presence of subscale fluctuations. Finally, the relationship between stochastic microscale (discrete) and deterministic macroscale (continuum) models is explored in the context of advection-diffusion systems. Reverse Brownian motion is used to solve stochastic differential equations locally in space and time to bridge the various discrete and continuum descriptions developed in this dissertation.
Thermal localization leads to reaction initiation in granular materials. Observations show that it occurs in the vicinity of large pores and, thus, depends on a material's microstructure. Since the spatial variability of the latter cannot be ascertained in all its relevant details, we treat the material's initial porosity as a random field. The so-called “hotspots” are then modeled as rare events in a complex nonlinear dynamical system. Their probability of occurrence is quantified by the tails of the distributions of the temperature and the corresponding reaction rate. These are computed via Monte Carlo simulations of a two-phase five-equation dynamic compaction model, which are supplemented with a mesoscale model of the thermal localization at the solid-gas interface. Our results demonstrate a strong nonlinear dependence of the probability of hotspot initiation on the variance of the initial porosity.
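Treating hotspots as rare events means estimating a tail probability P(T > T*) by Monte Carlo. This is the fraction of samples whose temperature exceeds the threshold, with standard error sqrt(p(1 - p)/N). The sketch below shows only that estimator. The map `toy_temperature` from porosity to temperature is a made-up placeholder, not the paper's two-phase compaction model. The Gaussian porosity distribution, its mean, and the 550 K threshold are likewise illustrative assumptions.

```python
import math
import random

def toy_temperature(porosity):
    """Hypothetical stand-in for the compaction solver: a convex map from initial
    porosity to peak temperature in K (NOT the model from the paper)."""
    return 300.0 + 2000.0 * porosity**3

def porosity_sampler(std, mean=0.3):
    """Porosity ~ Normal(mean, std), clipped to [0, 1]; an illustrative distribution."""
    return lambda rng: min(max(rng.gauss(mean, std), 0.0), 1.0)

def exceedance_probability(sampler, model, threshold, n_samples, seed=0):
    """Monte Carlo estimate of P[model(X) > threshold] and its standard error."""
    rng = random.Random(seed)
    hits = sum(model(sampler(rng)) > threshold for _ in range(n_samples))
    p = hits / n_samples
    return p, math.sqrt(p * (1.0 - p) / n_samples)

# Sweep the spread of the initial porosity and watch the tail probability respond.
for std in (0.05, 0.10, 0.15):
    p, se = exceedance_probability(porosity_sampler(std), toy_temperature, 550.0, 100_000)
    print(f"std = {std:.2f}: P(T > 550 K) = {p:.2e} +/- {se:.1e}")
```

When only a handful of samples exceed the threshold, the relative error of this plain estimator is large. Genuinely rare events therefore need many more samples or a variance-reduction technique.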
Hybrid socio-hydrological modeling has become indispensable for managing water resources in an increasingly unstable ecology caused by human activity. Most work on the subject has focused on either qualitative socio-political recommendations with an unbounded list of vague factors or complex sociological and hydrological models that rely on many assumptions and require specialized expertise to use. In this paper, we propose a simple agent-based socio-hydrological decision modeling framework for coupling dynamics associated with social behavior and groundwater contamination. The study shows that using social health risk, instead of contaminant concentration, as an optimization variable improves water management decisions aimed at maximizing social well-being. The social models and computational framework are designed with enough flexibility and simplicity to encourage extensions to more general socio-hydrological dynamics without compromising either computability or complexity for better data-/model-driven environmental management.
Multiscale and multiphysics simulations are two rapidly developing fields of scientific computing. Efficient coupling of continuum (deterministic or stochastic) constitutive solvers with their discrete (stochastic, particle-based) counterparts is a common challenge in both kinds of simulations. We focus on interfacial, tightly coupled simulations of diffusion that combine continuum and particle-based solvers. The latter employs reverse Brownian motion (rBm), a Monte Carlo approach that allows one to enforce inhomogeneous Dirichlet, Neumann, or Robin boundary conditions and is trivially parallelizable. We discuss numerical approaches for improving the accuracy of rBm in the presence of inhomogeneous Neumann boundary conditions and alternative strategies for coupling the rBm solver with its continuum counterpart. Numerical experiments are used to investigate the convergence, stability, and computational efficiency of the proposed hybrid algorithm.
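Brownian-motion solvers for diffusion evaluate the solution at a point as an average over random walkers traced backward in time from that point. The sketch below is a generic Feynman-Kac-style estimator of this kind, not necessarily the paper's rBm formulation. It treats the 1D problem u_t = D u_xx on x > 0 with Dirichlet data u(0, s) = g(s) and initial data u0. A walker that crosses x = 0 returns the boundary value at the crossing time. A walker that survives back to t = 0 returns u0 at its final position. Neumann and Robin handling, which the abstract highlights, is omitted. The function name and all parameter values are hypothetical. Because each walker is independent, the outer loop is the part that parallelizes trivially.

```python
import math
import random

def backward_walk_estimate(x, t, u0, g, diff_coef, dt, n_walkers, seed=0):
    """Monte Carlo estimate of u(x, t) for u_t = D u_xx on x > 0 with
    Dirichlet data u(0, s) = g(s) and initial data u(y, 0) = u0(y)."""
    rng = random.Random(seed)
    step_std = math.sqrt(2.0 * diff_coef * dt)
    n_steps = int(round(t / dt))
    total = 0.0
    for _ in range(n_walkers):
        pos, value = x, None
        for k in range(1, n_steps + 1):
            pos += rng.gauss(0.0, step_std)
            if pos <= 0.0:
                # Walker exits through the boundary at physical time t - k*dt.
                value = g(t - k * dt)
                break
        if value is None:
            # Walker survived back to t = 0: read the initial condition.
            value = u0(pos)
        total += value
    return total / n_walkers

# Check against the exact solution for u0 = 0, g = 1: u = erfc(x / (2 sqrt(D t))).
x, t, D = 0.5, 1.0, 0.1
estimate = backward_walk_estimate(x, t, u0=lambda y: 0.0, g=lambda s: 1.0,
                                  diff_coef=D, dt=2e-3, n_walkers=2000)
exact = math.erfc(x / (2.0 * math.sqrt(D * t)))
print(f"Monte Carlo: {estimate:.3f}   exact: {exact:.3f}")
```

Checking for boundary crossings only at discrete steps misses walkers that cross and return within a step. This biases the estimate slightly low, an effect that shrinks as dt decreases.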
A reduced-order model of a microfluidic transistor is presented. The transistor is essentially a long microchannel between a substrate and a pressure-actuated membrane. The proposed model captures the steady (DC) and small-signal (AC) behavior of the device in a manner analogous to standard semiconductor transistor models. The model is based on steady and perturbed unsteady solutions of the conservation of mass and momentum, coupled with an elastic model for the membrane. To improve its accuracy and extend its range of validity, the model is supplemented with numerical simulations of the coupled fluid-structure problem. The model predicts the dependence of the transconductance on the pressure differentials across the membrane and along the channel. The model is also used to investigate the impact of flow inertia, among other effects, on the dynamic behavior of the transistor.
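The following derivation shows how a transconductance can emerge that depends on both pressure differentials. It rests on a deliberately simplified model that is an assumption, not the paper's. The flow is lubrication flow in a channel of width w and length L. The local gap responds linearly to the local pressure difference across the membrane, h(p) = h0 + (p - p_c)/k, where p_c is the control pressure and k is an assumed stiffness per unit area. Because Q = -(w h^3 / (12 mu)) dp/dx is constant along the channel and dp = k dh, integration gives Q = w k (h_in^4 - h_out^4) / (48 mu L). Differentiating yields g_m = -dQ/dp_c = w (h_in^3 - h_out^3) / (12 mu L). The sketch checks the analytic g_m against a finite difference. All parameter values are hypothetical.

```python
def gap(p, p_c, h0, k):
    """Local gap: the membrane deflects linearly with the pressure difference across it (assumed)."""
    return h0 + (p - p_c) / k

def flow_rate(p_in, p_out, p_c, h0, k, w, L, mu):
    """Steady flow rate from integrating the lubrication equation along the channel."""
    h_in, h_out = gap(p_in, p_c, h0, k), gap(p_out, p_c, h0, k)
    if min(h_in, h_out) <= 0.0:
        raise ValueError("channel pinched off; simplified model not valid")
    return w * k * (h_in**4 - h_out**4) / (48.0 * mu * L)

def transconductance(p_in, p_out, p_c, h0, k, w, L, mu):
    """Analytic g_m = -dQ/dp_c for the simplified model."""
    h_in, h_out = gap(p_in, p_c, h0, k), gap(p_out, p_c, h0, k)
    return w * (h_in**3 - h_out**3) / (12.0 * mu * L)

# Hypothetical SI parameters (not device data from the paper).
params = dict(h0=20e-6, k=1e9, w=100e-6, L=5e-3, mu=1e-3)
p_in, p_out, p_c = 2e3, 0.0, 1e3
g_exact = transconductance(p_in, p_out, p_c, **params)
dp = 1.0
g_fd = -(flow_rate(p_in, p_out, p_c + dp, **params)
         - flow_rate(p_in, p_out, p_c - dp, **params)) / (2 * dp)
print(f"analytic g_m = {g_exact:.4e}, finite difference = {g_fd:.4e}")
```

In this toy model g_m vanishes when p_in equals p_out. It grows with the along-channel differential and depends on the membrane differential through h_in and h_out, loosely mirroring the role of gate and drain voltages in a semiconductor transistor.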
Stochastic models for pore collapse in granular materials are developed. First, a general fluctuating stress-strain relation for a plastic flow rule is derived. The fluctuations account for non-associativity in plastic deformations typically observed in heterogeneous materials. Second, an axisymmetric spherical shell compaction model is extended to account for fluctuations in the material microstructure due to granular interactions at the pore scale. This changes the stress-strain constitutive equation determining the dynamics of pore collapse. Results show that stochastic differential equations can account for multiscale interactions in a statistical sense.
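An SDE augments a deterministic constitutive law with a fluctuation term, dX = a(X) dt + b(X) dW. Ensemble statistics are then read off many simulated paths. The sketch below uses the Euler-Maruyama scheme, the simplest integrator for such equations. The relaxation drift, the state-proportional noise, and all parameter values are hypothetical illustrations, not the stress-strain relation derived in the paper.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, t_end, n_steps, n_paths, seed=0):
    """Simulate dX = drift(X) dt + diffusion(X) dW for an ensemble of paths."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        # Brownian increments have mean 0 and variance dt.
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + drift(x) * dt + diffusion(x) * dw
    return x

# Hypothetical state relaxing toward s_eq, with noise proportional to the state.
lam, s_eq, sigma = 5.0, 1.0, 0.2
final = euler_maruyama(lambda s: lam * (s_eq - s), lambda s: sigma * s,
                       x0=0.0, t_end=2.0, n_steps=400, n_paths=10_000)
print(f"ensemble mean {final.mean():.3f}, std {final.std():.3f}")
```

Setting the diffusion to zero recovers the forward-Euler solution of the deterministic law. The spread of the ensemble is what the stochastic term adds in a statistical sense.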
A music glove instrument equipped with force-sensitive, flex, and IMU sensors is trained on an electric piano to learn note sequences from a time series of sensor inputs. Once trained, the glove can be used on any surface to generate the sequence of notes most closely related to the hand motion. The data are collected manually by a performer wearing the glove and playing an electric keyboard. The feature space is designed to capture key hand motions, such as the thumb-under movement. Logistic regression and Bayesian belief networks are used to learn the transition probabilities from one note to another. This work demonstrates a data-driven approach to digital musical instruments in general.
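One simple way to see how note-to-note transition probabilities help decode ambiguous hand motion is a first-order Markov chain. Its transitions are estimated by counting with add-alpha smoothing and then combined with a sensor likelihood as prior times likelihood. This is a simpler stand-in than the logistic regression and Bayesian belief networks named above. The training sequences, the five-note alphabet, and the `ambiguous` sensor likelihoods are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probs(sequences, alphabet, alpha=1.0):
    """Estimate P(next | current) from note sequences with add-alpha smoothing."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur in alphabet:
        total = sum(counts[cur].values()) + alpha * len(alphabet)
        probs[cur] = {nxt: (counts[cur][nxt] + alpha) / total for nxt in alphabet}
    return probs

def next_note(prev, sensor_likelihood, probs):
    """Pick the note maximizing P(next | prev) * P(sensor reading | next)."""
    return max(sensor_likelihood, key=lambda n: probs[prev][n] * sensor_likelihood[n])

notes = ["C", "D", "E", "F", "G"]
training = [["C", "D", "E", "F", "G"], ["C", "D", "E", "D", "C"], ["E", "F", "G", "F", "E"]]
P = transition_probs(training, notes)
# Hypothetical sensor output that cannot distinguish D from F.
ambiguous = {"C": 0.05, "D": 0.4, "E": 0.1, "F": 0.4, "G": 0.05}
print(next_note("E", ambiguous, P))  # F
```

Here E is followed by F twice and by D once in the training data, so the transition prior breaks the tie that the sensor likelihood leaves open.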
We develop a supervised learning algorithm that predicts the publication date of news articles from their content. This study is an exploratory analysis of the relationship between semantic meaning and time. We collect New York Times articles spanning 30 years and develop heuristic feature selection measures for choosing the most time-predictive words. We analyze the performance of various supervised learning algorithms, identify the issues faced in NLP data filtering and learning, and propose solutions. The algorithm correctly classifies around a third of the test set, which is much better than human performance.
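The heuristic measures themselves are not specified here. The following assumed example shows one plausible score for how time-predictive a word is: one minus the normalized entropy of the word's document frequency across time bins, after dividing by the number of documents in each bin. A score of 1 means the word appears in only one period. A score near 0 means it is spread evenly, like a stop word. The function name, the minimum-count filter, and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def time_predictiveness(docs, min_docs=2):
    """Score words by how concentrated their usage is across time bins.

    docs: iterable of (time_bin, text). Returns {word: score in [0, 1]}.
    """
    bin_sizes = Counter()
    word_bin_counts = defaultdict(Counter)
    for time_bin, text in docs:
        bin_sizes[time_bin] += 1
        for word in set(text.lower().split()):
            word_bin_counts[word][time_bin] += 1
    n_bins = len(bin_sizes)
    scores = {}
    for word, counts in word_bin_counts.items():
        if sum(counts.values()) < min_docs or n_bins < 2:
            continue
        # Document frequency per bin, normalized by the number of documents in that bin.
        rates = [counts[b] / bin_sizes[b] for b in bin_sizes]
        total = sum(rates)
        entropy = -sum(r / total * math.log(r / total) for r in rates if r > 0)
        scores[word] = 1.0 - entropy / math.log(n_bins)
    return scores

corpus = [
    (1990, "the soviet union summit talks"),
    (1990, "the cold war ends with the soviet union"),
    (2010, "the smartphone market grows"),
    (2010, "the new smartphone and social media"),
]
scores = time_predictiveness(corpus)
for word, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} {s:.2f}")
```

A real pipeline would also need stemming, stop-word handling, and a much larger minimum count so that rare words do not receive spuriously high scores.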