Synthetic Data Generation on the Hype Train
By Stefan Keller, Reik Leiterer, Nicolas Lenz
When it comes to predictions, we should always be careful. But synthetic data generation is certainly one of the trending topics right now. Gartner is not the only one to predict that this technology will become established in the next few years.
On April 13, 2022, the Expert Group “Spatial Data Analytics” met in Zurich, both in person and virtually, to discuss “Geospatial Synthetic Data”. In the modern premises of the Gleisarena, provided by the FFHS, the host, Aldo Lamberti of Syntheticus.ai, presented three top-class talks to the 20 participants. The following is a brief summary.
The Spatial Data Analytics Expert Group is a nice place to share ideas. It is part of the data innovation alliance, which is instrumental in making Switzerland a recognized hub for data-driven value creation.
Aldo Lamberti began with a presentation on “How to securely collaborate and compute on synthetic geo data”. Syntheticus envisions a world in which the full potential of data is realized while fundamental privacy rights are preserved. Its answer is synthetic data: data that mimics real data, retaining its utility while protecting privacy. According to Syntheticus, public and private entities around the world trust the company to unlock and monetize untapped data without violating compliance requirements, and they are setting new standards by securely collaborating on and processing Syntheticus data across the entire data value chain. Syntheticus provides a SaaS platform for enterprises to generate synthetic data at scale while maintaining privacy.
Jakob Dambon of SwissRe spoke on “Spatial and Spatio-Temporal Statistics, Using Both Frequency and Bayesian Approaches.” He explained that one of the best-known regression methods is Ordinary Least Squares (OLS), which is easy to model and interpret. However, when dealing with spatial data, its model assumptions are usually no longer valid, in particular the assumption that the errors are independent: observations that are close together are more strongly dependent than observations that are far apart. This is where geostatistical methods come into play. These methods attempt to explain the remaining dependencies using, for example, a Gaussian process, whose covariance function captures how the dependence between observations changes with distance. Finally, using geostatistical methods, he modeled the covariates as a fixed external trend while allowing the intercept to vary over space.
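To make the idea concrete, here is a minimal sketch on simulated data; it is not the model presented in the talk. It assumes an exponential covariance function, arbitrarily chosen parameters (variance, length scale, nugget), and a known covariance matrix, and it uses generalized least squares (GLS) as a simple stand-in for a full geostatistical fit. The function names `exponential_covariance` and `simulate_and_fit` are hypothetical and exist only for this illustration.

```python
import numpy as np


def exponential_covariance(coords, variance=1.0, length_scale=0.3):
    """Covariance that decays with distance: C(d) = variance * exp(-d / length_scale)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return variance * np.exp(-dist / length_scale)


def simulate_and_fit(n=200, beta=(1.0, 2.0), nugget=0.1, seed=0):
    """Simulate spatial data with a spatially varying intercept, then fit OLS and GLS."""
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0.0, 1.0, size=(n, 2))           # random locations in the unit square
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept column + one covariate

    # Draw the spatially varying part of the intercept from a Gaussian process.
    K = exponential_covariance(coords)
    chol = np.linalg.cholesky(K + 1e-8 * np.eye(n))        # small jitter for numerical stability
    eta = chol @ rng.normal(size=n)

    # Fixed trend + spatial effect + independent noise (the "nugget").
    y = X @ np.asarray(beta) + eta + rng.normal(scale=np.sqrt(nugget), size=n)

    # OLS ignores the spatial dependence between observations.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    # GLS accounts for it through the covariance matrix (assumed known here).
    sigma = K + nugget * np.eye(n)
    sigma_inv_X = np.linalg.solve(sigma, X)
    sigma_inv_y = np.linalg.solve(sigma, y)
    beta_gls = np.linalg.solve(X.T @ sigma_inv_X, X.T @ sigma_inv_y)
    return beta_ols, beta_gls, K


if __name__ == "__main__":
    beta_ols, beta_gls, _ = simulate_and_fit()
    print("OLS estimate:", beta_ols)
    print("GLS estimate:", beta_gls)
```

In practice the covariance parameters are unknown and have to be estimated, for example by maximum likelihood or in a Bayesian framework; the sketch skips that step to keep the link between distance and dependence visible.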
Josef Boesze of itopia ag spoke about “Developing and Testing without any Risks or Side Effects using iSynth”. itopia, a boutique IT consulting firm for the financial world, has long suspected that testing based on production data, even when anonymized, leads to risks and undesirable side effects. Moreover, machine learning and Big Data analytics have become the natural enemies of solutions based on anonymized data. In his opinion, it is time for a change, and the alternative is synthetic data. Until now, however, generating synthetic data was too costly, the results were not satisfactory, or it was simply not practical. Thanks to consistent synthetic test data, efficient and risk-free development, testing and training are now possible. itopia offers an agile and object-oriented approach as well as suitable test data factory tools for projects and DevOps.
After a lively discussion, the on-site participants went out for pizza together at a nearby casual, industrial-style venue. The food and drinks were kindly sponsored by ExoLabs. During the networking, the next host was also decided, so we can look forward to more interesting meetings!