
ML model and dataset licensing

In many applications of machine learning (ML), foundation models are becoming the starting point for downstream analyses and tasks. For example, almost any text-based task will begin with a large language model that has been pre-trained on vast text corpora scraped from the web. These models are subsequently tuned on a few annotated samples for the specific task at hand. The unsupervised pre-training ensures that the model has learned good representations, so that the concrete task can be solved with only a few annotated samples.
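
As a minimal, illustrative sketch of this pre-train/fine-tune pattern (assuming the Hugging Face transformers library; the model name, texts, and labels are placeholders, not part of this call):

```python
# Sketch: fine-tune a pre-trained language model on a handful of annotated
# samples. Model, texts, and labels are illustrative placeholders.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# A few annotated samples stand in for the task-specific data.
texts = ["great product", "terrible service", "works as expected", "broken on arrival"]
labels = [1, 0, 1, 0]
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class TinyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        return {"input_ids": enc["input_ids"][i],
                "attention_mask": enc["attention_mask"][i],
                "labels": torch.tensor(labels[i])}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=TinyDataset(),
)
trainer.train()
```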

Recently, we have observed a trend of big tech companies limiting the licensing of foundation models to non-commercial uses. While this gatekeeping is understandable from a business perspective, it poses a significant challenge for innovative small to mid-sized companies, given that they rely heavily on the results of state-of-the-art research, which is dominated by these very companies.

At the same time, it is unclear how the licensing of public datasets affects the usability of models trained on them: for example, is a model that was trained on restrictively licensed data still subject to those license restrictions, even if it was also trained on other, more permissively licensed data? Given these observations, we see a need to clarify the impact of ML model and dataset license restrictions, to determine the extent to which companies within the data innovation alliance face similar challenges, and to learn how they are addressing them.

Objectives

    1. To determine the extent to which small to mid-sized companies are encountering the
      challenge of restrictive licenses on ML models and datasets.
    2. To understand the implications of combining various license texts in a given ML project,
      e.g., training an MIT-licensed model on data under a non-commercial Creative Commons
      license (such as CC BY-NC-SA 4.0) – see the sketch after this list.
    3. To identify amendments to and/or modifications of the original license texts after, e.g.,
      fine-tuning pre-trained models on internal data, making architectural changes, etc.
    4. To explore the possibility of forming a collaboration to curate data and/or pre-train models
      on public datasets, if there is sufficient interest among the participants.
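
To make objective 2 concrete, here is a deliberately simplified sketch of how pairwise license-compatibility questions might be tracked. The verdict strings are illustrative placeholders, not legal conclusions – producing correct entries is precisely the open question of this call:

```python
# Sketch: a lookup of license-pair questions. Verdicts are placeholders,
# not legal advice; filling them in correctly is the challenge itself.
COMPATIBILITY = {
    # (model license, dataset license): may the resulting model be used commercially?
    ("MIT", "CC-BY-4.0"): "likely yes",
    ("MIT", "CC-BY-NC-SA-4.0"): "unclear -- NC clause may propagate to the model",
    ("OpenRAIL-M", "CC-BY-NC-SA-4.0"): "unclear -- use restrictions stack",
}

def commercial_use_verdict(model_license: str, dataset_license: str) -> str:
    """Look up a (placeholder) verdict for a model/dataset license pair."""
    return COMPATIBILITY.get((model_license, dataset_license),
                             "unknown -- needs legal review")

print(commercial_use_verdict("MIT", "CC-BY-NC-SA-4.0"))
```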

Conclusion:
The ML model and data landscape is becoming less and less open, posing grave challenges to small and mid-sized companies with limited computational resources. We consider this a timely question to address and hope that, by leveraging the data booster program, a future bottleneck due to restrictive licensing can be avoided.

Requested support
Companies might find themselves in a similar situation, which could pave the way for a collaborative effort in the future.
Research partners could provide the necessary legal expertise.

Call for Participation description (pdf)

Deadline: March 03, 2023

Unleashing the full potential of precision medicine – equitable access to cutting-edge data for research at every scale

Requester: Psephisma Health

Even though medicine has always been personal in a sense, it hasn’t been very personalized. The steady development of diagnostic and research technologies has made it abundantly clear that individuals differ in many respects. It is the appreciation of molecular differences (genetic, transcriptomic, and proteomic ones) that gave rise to the term “personalized” or “precision” medicine as something worth striving for. Its promise – “the right drug for the right patient at the right time” – still remains elusive, the main reason being our lack of understanding of what some of the molecular features really mean and which ones to observe when prescribing treatments. The key to making further progress is to study greater volumes of such molecular profiling data in the context of health and disease. That is, however, easier said than done, as such data is very scarce due to the costs and effort associated with its generation. In addition, legal regulations in the domain of patient privacy and research on humans make extending existing data attributes challenging, if not impossible, which negatively impacts data reuse. Consequently, simply scaling up current efforts and ways of working is inefficient and unable to produce the desired outcome. We need qualitatively different approaches.

Goals
To address existing shortcomings and create a robust precision medicine framework that scales easily in the future, we have conceived an innovative health data management platform – Psephisma Health. The platform holds molecular profiling data generated in the course of clinical research, directly associated with its owner – the patient. With patients directly involved, molecular data attributes can easily be extended and the data itself augmented by matching data from other sources, such as regular diagnostics. For customers – external entities that require molecular real-world data for research (biotechnology and pharmaceutical companies, and others) – our platform functions as a virtual cohort builder and data marketplace. It provides simple access and powerful search tools that allow users to narrow down on the data of interest, which can then be leased for research according to the available consent. The lease proceeds are automatically distributed among the patient, the care-providing institution, and the entity that funded the initial data generation. With the above, a whole new model of precision medicine takes shape, one which offers numerous additional opportunities for patients to take part in and profit from clinical research.
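
As a toy illustration of the automatic distribution of lease proceeds described above (the split percentages are invented for the example; the actual shares would be contractually defined by the platform):

```python
# Sketch: split a lease payment among the three parties named in the text.
# The percentages are illustrative placeholders, not the platform's terms.
DEFAULT_SHARES = {"patient": 0.40, "care_provider": 0.30, "data_funder": 0.30}

def distribute_lease_proceeds(amount_chf: float, shares=DEFAULT_SHARES) -> dict:
    """Return the per-party payout for one data-lease transaction."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 100%"
    return {party: round(amount_chf * frac, 2) for party, frac in shares.items()}

print(distribute_lease_proceeds(1000.0))
# {'patient': 400.0, 'care_provider': 300.0, 'data_funder': 300.0}
```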

Requested support
1. Ethics
Our proposal assumes a redistribution of existing roles between patients and physicians, aspects of which require ethics-focused attention.
In general, there appear to be three main areas in the Psephisma Health business model which need such attention:
a. New roles for patients/individuals, as they are now able to manage secondary research
uses of their protected health information (PHI) directly.
b. The implications of the transactions associated with the data lease (lease proceeds for
patients, healthcare institutions, and entities that funded the data generation). What is
the most ethically appropriate way to distribute the monetary proceeds associated with the data lease? What is a meaningful way to reimburse the funders of data generation when those are public institutions?
c. How to best address possible caveats stemming from patients’ ability to “learn” from their data and to interact indirectly with research entities without the involvement of their healthcare institution?

2. Business and financial projections
Assistance in formulating the metrics expected by potential investors – exit strategies, margins, returns.

3. Investor outreach
Assistance in setting the scope and depth of investor materials best suited for recommended/prospective funding sources.

Measuring and managing value and risk for AI projects in an organization

It is a well-known fact that most data (AI) projects don’t make it to production. This is due to various risks associated with the people, process, and technology aspects of use-case development and productization. In this project, our goal is to create an advanced business process automation and optimization tool that helps organizations quantify, measure, and manage the value and risks associated with data (AI) projects, track the progress of data (AI) projects through the different stages of development (ideation, experimentation, MVP, production, maintenance, and retirement or decommissioning), and recommend actions organizations can take to move forward on their data (AI) journey, i.e., from a low data maturity organization to a high data maturity organization. To the best of our knowledge such a tool does not exist, although similar tools are commonly used in other business processes within organizations, e.g., financial risk estimation tools, business process automation tools, etc.
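
As an illustrative sketch of the core data structure such a tool might build on (the lifecycle stages are taken from the text above; the scoring rules are invented placeholders):

```python
# Sketch: a project record tracked through the lifecycle stages named in the
# text, with naive value/risk estimates. The scoring rules are placeholders.
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    IDEATION = 1
    EXPERIMENTATION = 2
    MVP = 3
    PRODUCTION = 4
    MAINTENANCE = 5
    DECOMMISSIONING = 6

@dataclass
class AIProject:
    name: str
    stage: Stage
    expected_value: float   # e.g., estimated annual benefit in CHF
    people_risk: float      # each risk dimension in [0, 1]
    process_risk: float
    technology_risk: float

    def risk_score(self) -> float:
        """Naive aggregate: average of the three risk dimensions."""
        return (self.people_risk + self.process_risk + self.technology_risk) / 3

    def risk_adjusted_value(self) -> float:
        """Discount expected value by aggregate risk (placeholder rule)."""
        return self.expected_value * (1 - self.risk_score())

p = AIProject("churn model", Stage.EXPERIMENTATION, 250_000, 0.3, 0.5, 0.2)
print(p.risk_score(), p.risk_adjusted_value())
```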

The goal of the project is to develop a software solution that

  1. Follows a research-backed approach to quantify value and risks associated with data (AI)
    projects
  2. Allows organizations to manage the value and risk of data (AI) initiatives along the AI project
    lifecycle

The goal of this call for workshop participation is to determine the desirability and viability of such a
software solution.

We are looking for participants for the 1/2-day databooster shaping workshops to confirm
desirability and viability.

To confirm desirability, we want to understand whether this question is currently being raised in
organizations. If it is, how are they solving the problem of estimating risk and
measuring the value of data and AI initiatives? We want to understand whether the timing is right, as well as
to identify further pain points in organizations that are not yet on our radar.

To confirm viability, we will explore what the economic value would be for organizations to be able to
measure and manage value and risks of AI initiatives, from which we aim to derive a potential business
model.

The initial involvement will be half a day; however, we will invite workshop participants based on
their ability and desire to become industry partners for the research, or a development partner for an MVP
of the product.

We are looking for industry partners in mature organizations that have a significant workforce working
on AI (ca. 5-10 or more FTEs focusing on data and AI) and have brought at least one data (AI) project
into production. A consulting company in the data space could also be such a partner. We are interested
in participants coming from both a business background and the data science technical side.
Participants should be (or be in direct contact with) decision makers. We are looking for participants
who are regularly involved either in their organization’s AI strategy or in meetings that decide on
the continuation of specific projects.

Depending on availability, the workshop will take place either in December 2022 or January 2023.

Virtual Traffic Counter

Replace traditional physical traffic counters (for motorized individual traffic) with a solution that compiles traffic information from various sources available on the internet, using both free and paid-for data.

The aim is to give users the possibility to perform a traffic count at the exact location they desire, without the need to set up a physical counter, thereby eliminating all the effort of handling physical counters. The collected data shall be stored in the cloud, where it can be accessed to derive the desired information (generally, answering a question regarding traffic patterns).
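
A minimal sketch of the underlying fusion step – combining per-source count estimates for one location into a single figure (source names, weights, and numbers are invented for the example):

```python
# Sketch: weighted fusion of vehicle-count estimates from several online
# data sources. All source names, weights, and values are placeholders.
def fuse_counts(estimates: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-source counts (vehicles/hour)."""
    total_w = sum(weights[s] for s in estimates)
    return sum(estimates[s] * weights[s] for s in estimates) / total_w

estimates = {"floating_car_data": 420.0, "public_webcam": 380.0, "map_api": 450.0}
weights = {"floating_car_data": 0.5, "public_webcam": 0.2, "map_api": 0.3}
print(round(fuse_counts(estimates, weights)))  # -> 421
```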

I see the service in two distinct variations:

  • Traffic counting in a street
  • Traffic counting at an intersection, with the possibility to track source/destination traffic

See the attachment for detailed information about the data sources. Academic or industry support is needed to validate the idea and lay the groundwork for the solution.

Web Crawl – Data Collection & Analysis

Describe your challenge or idea in a few sentences.
We struggle to automate the identification of privately owned / family-owned businesses whose owners can be considered Ultra High Net Worth Individuals (UHNWI). We currently undertake manual research to create a database of private investors, wealth owners, business owners, entrepreneurs, and family office principals with a net worth of at least EUR 50 million. Apart from identifying those individuals, we need to understand what their specific investment / business activities are and which struggles they might face – e.g. wealth transfer and legacy building, development of an investment strategy, tax optimization, deal flow, peer-to-peer network building, use of data, AI, industry 4.0 etc.
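
As a minimal sketch of the kind of automation sought (the URL and keywords are hypothetical placeholders; reliably identifying UHNWI-owned businesses would require far richer signals such as registries, news, and ownership databases):

```python
# Sketch: crawl a candidate company page and flag likely family ownership by
# naive keyword heuristics. URL and keyword list are illustrative only.
import re
import urllib.request

FAMILY_SIGNALS = re.compile(
    r"family[- ]owned|founded by|in family hands|privately held", re.I)

def looks_family_owned(url: str) -> bool:
    """Fetch a page and check for naive family-ownership keywords."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="ignore")
    return bool(FAMILY_SIGNALS.search(html))

# Example (hypothetical URL):
# print(looks_family_owned("https://example.com/about-us"))
```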

Describe the goal which you want to achieve.
Our goal is to create the most relevant agenda for the CEE and Africa Wealth Summits, respectively. We want to address very specific topics that are educational, offer solutions to concrete needs, and add value to the participants in an unprecedented way. Grammig Advisory provides a completely new conferencing experience, moving away from sponsored content such as sales pitches and fundraising, and towards relevant insights, solutions, and expert views.

Describe the requested support. What contribution are you looking for?
Grammig Advisory (organizer of the CEE and Africa Wealth Summits) is looking for a research and data analysis partner that can provide us with unprecedented insights and access to structured data about Eastern European and African private wealth owners. We would provide search criteria and the type of information we are looking for.

Whom are you looking for?
A research partner to help us identify family-owned businesses from CEE and Africa, to categorize the identified companies, and to understand their specific needs and interests in order to build agendas for the CEE Wealth Summit and Africa Wealth Summit that are relevant to those businesses. Also companies that can educate family businesses at the respective geographically focused summit on the matters identified above.

Call for Participation description (pdf)

Determination of rooftop use

Rooftops are features that can play a role in many cantons’ and cities’ strategies to answer the challenges of tomorrow. They can host solar panels to produce green energy, or vegetation to decrease the heat island effect and favor biodiversity. However, these surfaces are limited, and the installed elements cannot overlap. Roofs are already occupied by air conditioner outlets or antennas, and are sometimes also protected as heritage.
Better knowledge of rooftop coverage and use would allow a large variety of indicators to be assessed. A better estimation of the solar energy potential could be made. With this knowledge, the administration could decide on an adapted strategy and better guide property owners toward solar energy or roof vegetation, depending on the available spaces that are left, their neighborhood, and the type of roof.
The goal of this workshop is, firstly, to define the scope of this project in terms of objects to detect in order to best serve the public interest. Secondly, we want to define the best method to ensure trustworthy results based on aerial or satellite imagery. Finally, we want to bring together domain experts from different fields in order to create a project that serves as many actors as possible.
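
As an illustration of one downstream computation such a project could enable – estimating the roof area still free for solar panels or vegetation from a per-pixel classification of aerial imagery (class codes, ground resolution, and the toy mask are invented for the example):

```python
# Sketch: given a per-pixel classification of a roof, compute the area not
# occupied by detected objects. Class codes and resolution are placeholders.
import numpy as np

FREE, AC_UNIT, ANTENNA, EXISTING_PANEL = 0, 1, 2, 3
PIXEL_AREA_M2 = 0.25 * 0.25  # e.g., 25 cm ground resolution

def free_roof_area(mask: np.ndarray) -> float:
    """Area (m^2) of roof pixels not occupied by detected objects."""
    return float(np.count_nonzero(mask == FREE)) * PIXEL_AREA_M2

toy_mask = np.zeros((100, 100), dtype=np.uint8)   # 100x100 px roof, all free
toy_mask[10:20, 10:20] = AC_UNIT                  # one air-conditioner outlet
print(free_roof_area(toy_mask))  # 618.75 m^2 out of 625.0 m^2 total
```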

Call for Participation description (pdf)

Component Recognition Tool to Display Technical Information in the User’s Field of View – Contextual View and Mixed Reality Technical Information

In the context of Industry 4.0 and digitalization, the application of mixed reality technologies such as augmented reality is gaining importance within the field of technical communication. These technologies provide user-friendly and context-sensitive technical information, i.e., intelligent information, for users. The aim is to develop an application that can be used with mobile phones, tablets, and augmented reality glasses, enabling users to obtain technical information in the real device context, enriched with virtual AR elements. The solution and outcome shall be developed in a transdisciplinary project in cooperation with one or more industry partner(s) and as an Innosuisse project.

2022_data_innovation_alliance_CfP_challenge_description

Call for Collaboration: Multimodal AI – combining text, images, video, audio, and structured as well as graph-based data

Opportunity

An exciting frontier in current AI research and application is building systems which can combine multiple modalities such as text, images, video, audio, and structured as well as graph-based data – so-called Multimodal AI. Such systems gain a deeper understanding of the task at hand and can thus achieve solutions of much higher accuracy, or solve much more complex problems, than was previously possible.

Two research groups at the recently founded Centre for Artificial Intelligence (CAI) at the Zurich University of Applied Sciences ZHAW, the Computer Vision, Perception and Cognition (CVPC) as well as the Natural Language Processing (NLP) groups, would like to join forces in the domain of multimodal AI. Both teams have many years of experience in their respective domains, evidenced by successful applied research projects with industrial and academic partners, as well as realized products and scientific publications.

Call for collaboration

We are looking for project partners and research collaborations that provide real-world challenges to solve with multimodal deep learning approaches, digesting data from multiple sources such as text, audio, images, video, and tabular data. In particular, we would like to partner with a company that has a suitable use case, and potentially with other researchers, to apply for joint third-party funding for finding and implementing a solution for the company’s case.

Application areas

Example industrial use cases include, but are not limited to, the medical domain (e.g., combination of medical imaging with textual or tabular data related to patient histories, treatments, drugs, etc.), news, TV, and journalism (combining newspaper articles, news videos, press images), insurance (case data with text forms and images), engineering (descriptions, part lists, drawings, videos), and domain-specific visual question answering or web search tasks. Other emerging applications are in the areas of conversational AI, image and video search using language, autonomous robots and drones, as well as multimodal assistants. All these tasks require systems which can interact with the world using all available modalities.

Background

In recent years, deep-learning based AI solutions have reached unprecedented and often better-than-human performance in many natural language processing (NLP) and computer vision (CV) tasks using highly specialized neural network architectures trained on single modality data (either text or images/video). Examples include machine translation and text generation (e.g., SuperGLUE, SQuAD), as well as image classification, object detection (e.g., ImageNet) and segmentation (e.g., PASCAL VOC, COCO) tasks.

While the achievements on these single-modality learning tasks are impressive, the next important step towards more powerful AI in real-world use case scenarios is creating systems with more cognitive abilities, able to digest inputs and fuse knowledge from multiple modalities. Ultimately, such systems would be able to accumulate world knowledge or a “common sense” understanding of the world.

Current tasks and architectures in multimodal AI research are mainly combining image and text data. Prominent examples include image description or “caption” generation, as well as text-to-image generation systems (e.g., OpenAI’s DALL-E [1] and GLIDE [2]) and visual question answering systems (see e.g., METER [3]).

A related research avenue is to develop more general deep learning architectures, which are not specifically optimized for a single modality, but are able to process inputs from different modalities without making domain-specific assumptions (inductive bias), e.g., Deepmind’s Perceiver networks [4] for classification tasks, as well as PerceiverIO, which is able to produce arbitrary outputs in addition to digesting arbitrary inputs [5].
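
As a minimal sketch of the simplest fusion idea behind such architectures – two modality-specific encoders whose embeddings are concatenated and classified jointly (all dimensions and the random inputs are illustrative; real systems would use pre-trained encoders):

```python
# Sketch: late fusion of image and text embeddings in PyTorch. Feature
# dimensions, the head, and the inputs are illustrative placeholders.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, num_classes=10):
        super().__init__()
        # Stand-ins for pre-trained encoders (e.g., a CNN and a language model).
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(2 * hidden, num_classes))

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([self.img_proj(img_feat), self.txt_proj(txt_feat)], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))  # batch of 4
print(logits.shape)  # torch.Size([4, 10])
```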

 

Contacts

In case you would like to know more or want to discuss how your use case would fit into this applied research, do not hesitate to contact us.

Prof. Dr. Thilo Stadelmann
Director & head CVPC group @ CAI
thilo.stadelmann@zhaw.ch

Prof. Dr. Mark Cieliebak
Head NLP group @ CAI
mark.cieliebak@zhaw.ch

References

[1] DALL-E: https://openai.com/blog/dall-e/
[2] GLIDE: https://arxiv.org/abs/2112.10741
[3] METER: https://arxiv.org/abs/2111.02387
[4] Perceiver: https://deepmind.com/research/publications/2021/Perceiver-General-Perception-with-Iterative-Attention
[5] Perceiver IO: https://deepmind.com/research/open-source/perceiver-IO

SILVA – Satellite-based Inspection of Large Vegetated Areas

Requester: ExoLabs GmbH

ExoLabs is a spin-off of the University of Zurich, and its expertise is based on over 10 years of research in the field of applied Earth Observation (EO) and software development – with the mission to provide state-of-the-art and user-friendly EO products and services across various spatial and temporal scales. ExoLabs’ track record includes (1) active monitoring and quantification of natural resources from local to global scale, (2) the build-up of EO-based in-house capacities for insurances and the energy sector, and (3) big data processing and management for cantonal and federal offices. ExoLabs works closely with leading research groups from Swiss universities in their respective fields (e.g., UZH, ETHZ, FHNW, HLU), and is or has been involved in projects with national and international institutions in the EO sector.

Description

Almost a third of Switzerland is covered with forests. These forests are exposed to severe climatic changes. For example, the atmosphere near the ground has warmed by around 2.1°C over the last 170 years – almost twice the rise in mean global temperature. This puts increased pressure on forest ecosystems, resulting in changing tree species compositions and more risk in general, especially of biotic and abiotic damage.

To develop adaptation and coping strategies and to provide stakeholders and decision-makers with the necessary information, a continuous and objective forest monitoring approach is essential. Within the SILVA project (Satellite-based Inspection of Large Vegetated Areas – https://www.exolabs.ch/silva), a satellite-based observation system that provides continuous and comprehensive information on current forest conditions in Switzerland is being developed together with ETHZ and WSL. The main objective of SILVA is to detect forest disturbances in near real-time, with daily updates at a high spatial resolution of 20 m, for the entire area of Switzerland and its surroundings.
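
A minimal sketch of the per-pixel change-detection idea behind such a near real-time service (the arrays, baseline, and threshold are synthetic placeholders; the actual SILVA system fuses multiple optical and SAR sensors):

```python
# Sketch: flag pixels whose vegetation index dropped well below a per-pixel
# baseline as possible disturbances. All data here is synthetic.
import numpy as np

def disturbance_mask(ndvi_today: np.ndarray,
                     ndvi_baseline: np.ndarray,
                     drop_threshold: float = 0.2) -> np.ndarray:
    """Boolean mask: True where NDVI fell well below its baseline."""
    return (ndvi_baseline - ndvi_today) > drop_threshold

rng = np.random.default_rng(0)
baseline = rng.uniform(0.6, 0.9, size=(50, 50))   # healthy forest NDVI
today = baseline.copy()
today[20:30, 20:30] -= 0.4                        # simulated storm damage
print(int(disturbance_mask(today, baseline).sum()))  # 100 flagged pixels
```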

Requested Support

For a successful implementation, the consideration of user requirements is of particular importance. This is a challenge because the product is intended to address very different target groups – from government authorities to environmental organizations to insurance companies. With this call, we are looking for partners who are interested in a pilot project or who want to make themselves available as beta users. The evaluation of the functionalities, the related feedback, and the validation of the reliability are in the foreground. Based on these findings, a follow-up application is planned to transfer the methodology to Europe and to further develop it to TRL 8/9. In this context, expansions of the existing consortium are also possible.

Additional information

Several federal policies advocate sustainable forest management to achieve the objectives set out in the Kyoto Protocol, the Paris Agreement, the Swiss Energy Strategy 2050, the Swiss Nuclear Energy Act, the Sustainable Development Strategy, the Forest Policy 2020, the Wood Resource Policy, and the 2000-Watt Society, to name just a few. Within the Forest Policy 2020, forest conservation is a primary objective, aimed at preserving the forest’s wide-ranging services in terms of protection against natural hazards and as a supplier of wood, recreational space, habitats for animals and plants, and drinking water under changing climate conditions. To ensure these conservation efforts, several measures and regular forest monitoring shall provide an up-to-date and meaningful inventory.

The importance of forests can also be assessed by their economic value. In 2016, the wood market in Switzerland provided employment for more than 100’000 persons in the forestry economy and the wood processing industry. The organization “Economie Forestière Suisse” (EFS) represents 3’500 public entities and 250’000 private forest owners. About 90% of the wood production capacity comes from the members of the association “Holzindustrie Schweiz”. Every year, between 7 and 8 million cubic meters of wood could be harvested in Switzerland without overexploitation; of those, only about 5 million cubic meters, with a value of more than 400 million CHF, are actually collected. The gross added value of the Swiss forest economy and wood industry amounts to CHF 4.5 billion per year.

The main innovations central to this call for participation include the following thematic and technical aspects:
• Current forest monitoring efforts in Switzerland can be significantly improved by including daily satellite observations to quantify changes over vast areas at a high spatial resolution
• Our near real-time monitoring service is based on a multi-sensor approach, which fuses data from optical (Sentinel-2 & 3, Landsat 7/8) and SAR (Sentinel-1) sensors with different spatial, spectral and temporal characteristics to maximize the spatiotemporal resolution and observational coverage
• Our consortium includes deep machine learning experts from the ETHZ/EcoVision Lab, federal forestry experts from WSL and EO-data processing experts from ExoLabs to ensure the optimal accuracy, thematic relevance, and highest reliability of our services.
• Next to novel data products on near real-time changes and annual assessments in forests, we additionally focus on different distribution services to provide actionable information to pilot users.


Figure: Satellite-based monitoring of different types of forest disturbances

ESG Model Validation

Ascentys operates in the ESG (Environmental, Social and Governance) space.

We have defined a model of 360 KPIs covering each aspect of ESG and use technology to systematically extract the relevant data from customers’ documents. We then use the data to inform and calculate each KPI.
As a result, we are able to automate ESG assessments, finally making it possible for companies of all sizes to measure their ESG performance in a way that is easy, cost-effective, and accurate.

For more information see also the Introduction to Ascentys V4 (short).

Context:

Until recently, a company’s performance was measured primarily by how well it did financially. Over the past decades, stakeholders (shareholders, employees, customers, regulators, etc.) have pushed for a more holistic approach to determining how well a company performs.
This approach has grown into what is now commonly referred to as ESG (Environmental, Social and Governance). ESG considers how well a company is run and its role in, and impact on, the environment and society.

Idea: We are building a tool to automatically assess the ESG (Environmental, Social and Governance) performance of companies. We ingest data and documents (HR files, policies, corporate presentations, etc.) from customers and use AI to extract ESG-relevant information. We then feed that information into our model of 360 ESG KPIs (120 for each of the Environmental, Social and Governance domains). The model processes this data and produces an assessment with a rank and a score for each KPI.
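
As an illustrative sketch of the aggregation step described above (the simple means and the rating bands are invented placeholders; the actual weighting of the 360 KPIs is part of what the requested validation should scrutinize):

```python
# Sketch: aggregate per-domain KPI scores into domain scores, an overall
# score, and a rank. Weighting and rating bands are placeholders.
import statistics

def assess(kpi_scores: dict[str, list[float]]) -> dict:
    """kpi_scores maps each domain ('E', 'S', 'G') to its KPI scores in [0, 100]."""
    domain_scores = {d: statistics.mean(v) for d, v in kpi_scores.items()}
    overall = statistics.mean(domain_scores.values())
    rank = "A" if overall >= 80 else "B" if overall >= 60 else "C"
    return {"domains": domain_scores, "overall": round(overall, 1), "rank": rank}

# Toy input with 3 KPIs per domain instead of 120:
print(assess({"E": [70, 80, 90], "S": [60, 65, 70], "G": [85, 90, 95]}))
```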

Challenge:

Since we are assessing companies across such a broad scope, we want to make sure that
we carry out this assessment in a way which:
– is ethical and free of bias
– is fair and adequate in company assessment and rating
– includes the latest thinking and best practices from academia


Requested Support:

We are looking for research partners in academia to review, validate, examine, and expand
the KPIs, as well as the method used to assess and rate each KPI, especially in the following
areas:
– “Diversity, Equity & Inclusion”. E.g., support from someone involved in social studies
in the workplace regarding topics such as gender equality, disability, and inclusion.
– “Environmental Impact”, especially about carbon footprint measurement, considering
relatively new topics such as impact of Working from Home or measuring indirect
emissions (Scope 2 & 3).
– “Tax Transparency”, especially to devise a formula to assess a company’s potential
lack of tax transparency or, on the other hand, a potential excess of “tax optimization”.