

1. Plan & Design


Set the foundation for open & reliable research

Steps: Explore & Reuse → Check Legal Frameworks → Write Data Management Plan → Design Study
Checkpoints: Study Plan Presentation & Preregistration Submission

1.1 Explore & Reuse

Any resource that inspires you or that you want to reuse and/or adapt must minimally be cited using its persistent identifier, e.g. a DOI (or otherwise a URL with author, date, and time of access), and you must follow the license and/or usage agreement provided by the authors. A research output (e.g. data, code) without a license or a statement granting permission for reuse cannot legally be reused, even if it appears publicly online.

  • 1.1.1. Articles
  • 1.1.2. Preregistrations
  • 1.1.3. Data
  • 1.1.4. Code
  • Review existing literature to come up with a well-founded research question. We recommend using open source discipline-agnostic registries like OpenAlex, which contains published articles, theses, and preprints (scholarly works that are not (yet) peer-reviewed) from all disciplines, or discipline-specific open source registries such as Europe PubMed Central for life sciences preprints and published articles.
  • Use a reference manager to keep track of your bibliography. Zotero is open source software that formats your bibliography in any desired style and can be integrated with e.g. Microsoft Word, Google Docs, or RStudio for writing reproducible manuscripts (see 3. Analyze & Collaborate).

LEARN MORE

OSC Tutorial

Introduction to Zotero

Use an open source reference manager. (1h)

TOOLS & RESOURCES


OpenAlex

All the world's research, connected and open.


Europe PubMed Central

Comprehensive access to life sciences literature.

A preregistration typically consists of hypotheses and predictions, a plan for data collection (when relevant), and a plan for data analysis, which researchers upload before starting their project, often in order to increase the rigor of confirmatory research (see 1.4.1. Pre-analysis planning).


  • Get insight into projects that are not (yet) published, either currently ongoing or abandoned, by looking for projects that were preregistered. Projects that are left unpublished typically have a note attached to their preregistration. Some registries are discipline-specific while others are discipline-agnostic (see below).

TOOLS & RESOURCES


Open Science Framework

Registry of preregistrations. Widely used across fields.


AsPredicted

Registry of simple preregistrations.


PreclinicalTrials.eu

Registry of preclinical animal study protocols.


AnimalStudyRegistry.org

Registry of animal studies.


ClinicalTrials.gov

Registry of clinical trial protocols.

How to find an existing dataset?
  • Search for discipline-specific repositories on re3data, a central registry of research data repositories.
  • Explore subject-agnostic repositories such as DataCite, FigShare, Open Science Framework (OSF), or Zenodo (datasets are oftentimes deposited with the corresponding analysis code).

These platforms either give you access to existing data or provide metadata and explanations on how to request access to the data.

Note: Definition

Metadata are data about your data, such as author, date, measurement device, unit of measurement, context of data collection, etc.


How to reuse a dataset?
  • Review the license and data use agreement. Make sure you understand what you are allowed to do with the data and under what conditions. Even if the license does not require attribution, you must cite the source of the data in any scholarly work based on it.
  • Review metadata and documentation. Make sure you know where the data comes from, how it was collected and processed, and reflect on whether any of this poses problems for your research question.
  • Check what additional requirements the data sources have. For example, some sources require you to submit a preregistration before granting access to the data (see 1.4. Study Design & Analysis Plan).
  • Use the metadata to plan your analysis. A data dictionary (or “codebook”) and other documentation describing the variables, range of values, etc., may be available and should be reviewed. To minimize confirmation or hindsight bias, do not plot the data immediately; instead, prepare a pre-analysis plan by examining the documentation (see 1.4. Study Design & Analysis Plan).
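
A data dictionary needs no specialized tooling; a plain table is enough. A hypothetical excerpt (all variable names and ranges invented for illustration):

```
variable    type     values / unit        description
age         integer  18-99 (years)        participant age at first session
condition   factor   control, treatment   experimental group assignment
rt_ms       numeric  > 0 (milliseconds)   response time, one row per trial
```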

TOOLS & RESOURCES


re3data

Registry of research data repositories.


DataCite Commons

Discovery tool connecting works, people, and organizations.


FigShare

General-purpose repository for data, software, reports.


Open Science Framework

General-purpose repository for data, materials, reports.


Zenodo

General-purpose repository for data, software, reports.

Find code available for reuse archived on Zenodo or actively developed on GitHub. Learn how to work with GitHub in more detail in 3. Analyze & Collaborate, or start learning Git version control now!


Important

Code publicly visible on GitHub without a license or equivalent text explicitly stating permission for reuse cannot be legally reused. It is best to ask the authors to add an open license to their repository to explicitly allow reuse (to do this, they can, for instance, add a file called LICENSE.txt with the Apache 2.0 license text - see our code publishing tutorial to learn more about licenses).

LEARN MORE

OSC Tutorial

Git Tutorial

Use the Git version control system from within RStudio. (2h)

OSC Tutorial

GitHub Tutorial

Collaborative coding with GitHub and RStudio. (1h)

OSC Tutorial

Code Publishing

Add README and license to a reproducible project.

TOOLS & RESOURCES


Zenodo

General-purpose repository for data, software, reports.


GitHub

Cloud-based platform to collaborate on code.

1.2 Legal Requirements

  • 1.2.1. LMU guidelines
  • 1.2.2. Funders
  • 1.2.3. Ethics

The LMU Guidelines for Safeguarding Good Scientific Practice are legally binding for all academics, researchers, research support staff, teachers, and students at LMU Munich. Only the original German text prevails, but we provide an English summary of the aspects relevant to this guide:


Appropriate level of documentation and standards to allow reproduction:

  • Reproducible methods must be used. (§11)
  • When research software is developed, its source code must be documented. (§12)
Appropriate level of documentation and standards to allow replication:
  • All information relevant to the production of a research result must be documented comprehensively to enable replication. (§7 and §12)
  • If specific professional recommendations exist for review and evaluation, the results must be documented in accordance with these respective specifications. (§12)
  • Individual results that do not support the hypothesis must also be documented; a selection of results is not permitted. (§12)
Public access to research results:
  • Apart from specific exceptions, all findings should be made public. For this, they must be described in a detailed and comprehensible manner which includes making available the research data, materials and information on which the results are based, as well as the methods used and the software employed (including appropriately licensed self-written software) according to the FAIR principles. (§13)
  • Data, materials, and software made publicly accessible must be appropriately archived, usually for a period of 10 years. (§17)

In later sections, you will acquire skills in FAIR data management and reproducible workflows that will enable you to comply with these guidelines.


Note: Definitions

The FAIR principles are defined as:

  • Findable: metadata should be deposited in a searchable repository and be assigned a permanent identifier
  • Accessible: the data is either open, or accessible upon some authentication process, or closed, but with open metadata.
  • Interoperable: the data is described with a standard terminology (so the dataset can be merged with other ones) and saved in a stable file format
  • Reusable: the data is richly documented (e.g. with a data dictionary) and is accompanied by a data usage license

Metadata are data about your data, such as author, date, measurement device, unit of measurement, context of data collection, etc.


See https://www.go-fair.org/fair-principles/ for more information and section 2. Collect & Manage to learn how to implement the FAIR principles in your research.

TOOLS & RESOURCES


LMU Guidelines for Safeguarding Good Scientific Practice

Implementation of the German Research Foundation's (DFG) Code of Conduct

  • Check all funders’ open science requirements in the call information sheets. Funders may have additional requirements on top of those in the LMU guidelines. For instance, some calls request a Research Data Management plan before the second payment is made; others specify the extent and timing of data sharing and provide funds for such activities.

  • Contact the LMU Research Funding Unit to review your grant proposal and assess whether it meets your funders’ open science requirements.

Data collection and analyses involving human participants or animal subjects typically require approval from faculty ethics committees to ensure responsible conduct and the protection of data.


Your ethics proposal will typically include information on:
  • Data storage and retention – outlining how data will be securely stored, backed up, and retained over time. This information can be extracted from a more detailed Research Data Management plan (see 1.3. Research Data Management Plans).
  • Risks if the data were leaked – identifying potential consequences for participants or the research project if confidentiality is breached.
  • Data anonymization – describing procedures to remove or obscure personally identifiable information to protect participant privacy (see 2.3.2. Anonymization for options, from simple techniques of anonymization to the creation of synthetic data).
  • Informed consent forms language – ensuring that participants clearly understand the purpose, procedures, and any potential risks of the study. Conditions for sharing their data should be clearly explained here (see 2.3.1. Informed Consent).
  • Power analysis to justify sample size – providing a statistical rationale for the number of participants, which supports the validity and ethical justification of the study. This, and more detailed information on the statistical plan, can be extracted from your pre-analysis plan (see 1.4.1. Pre-analysis planning and 1.4.3. Power analyses).

For data protection guidance, contact the LMU Data Protection Officer or the Research Data Management team of the University Library.


Tip for research groups to streamline this process
  • Share templates and example resources amongst team members, e.g. previously approved ethics proposal language, approved Data Protection Impact Assessment forms, and Data Management Plans, on a common server space.
  • Create Standard Operating Procedures for the team for processes such as the appropriate anonymization technique for a specific data type or power analysis scripts for common analyses; define when a data management plan must be updated, who is responsible, and how updates are reviewed, approved, and communicated.

LEARN MORE

OSC Tutorial

Data Management Plans

Overview of components, tips, and tools. (30 min)

OSC Tutorial

TBA: Data Anonymization

Implement data anonymization techniques in R. (X h)

OSC Tutorial

Power Analysis

Data simulations for GLMs, LMEs, and SEMs in R. (6h)

1.3 Research Data Management Plans

A Data Management Plan (DMP) documents how you will handle research data throughout your project. Writing a DMP prompts you to think through and document decisions you might otherwise leave implicit.

  • Decide before data collection whether you will eventually share your data publicly (and where), in order to (i) get ethics approval on the right plan, (ii) design the consent forms for participants, (iii) collect appropriate metadata for the target repository, etc.
  • Start with what you know, and refine the details as your project develops. Your DMP is a living document that you will revise to match the reality of your project while ensuring data protection and streamlining collaborations (see 2.2. Data Management, 3.1. Data Processing & Analysis, and 4.1. FAIR Data Sharing).
Your DMP will ask:
  • What data will you collect or generate (types, formats, volume, sources)? See 2.1. Data Collection.
  • How will you describe it (metadata standards, documentation practices)? See 2.2. Data Management for these and the next questions.
  • How will you organize files (naming conventions, folder structure, versioning)?
  • Where will you store it (locations, backups, access controls)?
  • How will you ensure quality (validation checks, error-handling)?
  • How will you share outputs (repositories, licenses, embargo periods)? See our lecture “Why share data openly?” and 4.1. FAIR Data Sharing
  • What constraints apply (consent, anonymization, GDPR, data use agreements)? See our lecture “Maintaining privacy with open data”, 1.2.3. Ethics and 2.3. Ethics & Privacy.


The specific questions vary by discipline, data type, and funder requirements. DMP tools like RDMO guide you through the relevant questions with funder-specific templates.


Tip for research groups to streamline this process
  • Share templates and example DMPs amongst team members on a common server space.
  • Create Standard Operating Procedures for the team. Define when a data management plan must be updated, who is responsible, and how updates are reviewed, approved, and communicated.

LEARN MORE

OSC Lecture

Why share data openly?

An introduction to the what, why, and how to make data open (30 min)

OSC Lecture

Maintaining Privacy with Open Data

How to make data open without revealing sensitive information (1h)

OSC Tutorial

Data Management Plans

Overview of components, tips, and tools. (30 min)

TOOLS & RESOURCES

Supported at LMU

RDMO

Funder-compliant DMP templates (e.g. DFG, ERC).


RIOjournal

Examples of DMPs by discipline.

1.4 Study Design & Analysis Plan

  • 1.4.1. Pre-analysis planning
  • 1.4.2. Simulation of Data
  • 1.4.3. Power analyses
Why should you specify your statistical analysis plan before collecting data?

Humans are prone to cognitive biases such as confirmation bias (seeking information that supports existing beliefs) and hindsight bias (believing outcomes were predictable after the fact). In research, these biases can distort findings, especially when researchers make analytic decisions after seeing results. Although statistical testing typically accepts a 5% false positive rate, “researcher degrees of freedom” — choices about data collection, exclusions, transformations, sample size, covariates, etc. — can dramatically inflate false positives when decisions are made post hoc. Practices like increasing sample size until reaching statistical significance, selectively removing outliers, or trying multiple analytic strategies increase the likelihood of false-positive results. See how easy it is to find false “significant” results by using our p-hacking tool.
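
The inflation caused by optional stopping is easy to verify yourself by simulation. The sketch below (written in Python for illustration; our tutorials use R, but the logic is language-agnostic) repeatedly generates pure noise, runs a z-test after every batch of participants, and stops at the first "significant" result. The batch size and number of runs are arbitrary assumptions:

```python
import random
import statistics
from statistics import NormalDist

def peeking_false_positive_rate(runs=2000, batch=10, max_n=100, alpha=0.05, seed=1):
    """Fraction of pure-noise simulations that ever look 'significant'
    when a z-test is run after every batch of participants."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value (~1.96)
    hits = 0
    for _ in range(runs):
        sample = []
        for _ in range(max_n // batch):
            # collect another batch; the true effect is exactly zero
            sample += [rng.gauss(0.0, 1.0) for _ in range(batch)]
            z = statistics.mean(sample) * len(sample) ** 0.5  # z-test, known sigma = 1
            if abs(z) > z_crit:  # stop as soon as the result looks significant
                hits += 1
                break
    return hits / runs

# Testing only once at n = 100 would keep the rate near the nominal 5%;
# peeking after every batch of 10 inflates it severalfold.
print(peeking_false_positive_rate())
```

Declaring the stopping rule in advance (see the preregistration contents below) is precisely what removes this degree of freedom.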


The core problem is that analyses guided by observed outcomes allow biases to influence decisions, making many reported effects unreliable. A key remedy is transparency and preregistration.


Benefits of preregistration

Preregistration, that is, specifying hypotheses, methods, and analysis plans before data collection or analysis, limits bias in confirmatory testing while still allowing exploratory analyses, clearly distinguishing robust hypothesis tests from hypothesis-generating work. This improves credibility, limits false positives, and often leads to better study design through early methodological feedback.


Preregistration can be beneficial for various types of studies, including:

  • experimental studies (i.e. studies with a manipulated variable): it defines your confirmatory analysis and strengthens your claims
  • observational or exploratory studies: it helps you move along the exploratory-confirmatory continuum
  • qualitative studies: it provides a way to document e.g. your positionality towards a subject in the course of a project.


What is included in a preregistration?

Several preregistration templates exist. While the standard Open Science Framework (OSF) preregistration template is most commonly used, some are tailored to specific fields or methods (e.g. systematic reviews, qualitative work, secondary data analysis).


Your preregistration will define your study’s:

  • Hypothesis and predictions
  • Data collection procedures
  • Sample size and stopping rule
  • Variables (manipulated, measured, indices)
  • Statistical method (model, dependent and independent variables, covariables, transformations)
  • Data exclusion criteria
  • How to deal with missing data

A great tool for creating your statistical plan, especially for early career researchers who are still learning statistics and need feedback from supervisors, collaborators, or statisticians on their design, is to simulate data and write the statistical tests you would use to analyze those data (see 1.4.2. Simulation of Data and 1.4.3. Power analyses). Including an analysis script (developed on simulated data) with your preregistration is optional but recommended.


To get support with pre-analysis planning, you can book a consultation with the LMU statistical consulting unit (StaBLab).


Publishing process

Once your study plan is finalized:


  • Submit your preregistration before collecting new data or analyzing existing data. You can do so on discipline-specific registries (see 1.1.2. Preregistrations) or discipline-agnostic repositories such as the OSF.
  • Embargo your plan if you are concerned about scooping. On the OSF, your preregistration can be kept private for a predetermined amount of time, up to a maximum of 4 years.
  • Include your preregistration’s DOI in your manuscript. Make your registration public upon the publication of your manuscript.

Creating a preregistration improves transparency and allows for valuable early feedback from collaborators. An even stronger approach is submitting preregistrations directly to journals (then called “Registered Reports”), enabling peer review at a stage where methodological adjustments are still possible.


Registered Reports

Registered Reports are a publication format, now adopted by over 300 journals (see participating journals), where preregistrations are peer-reviewed before data collection. Reviewers evaluate the hypotheses, methods, and planned analyses, allowing methodological improvements. If the plan is approved, the journal grants in-principle acceptance, meaning publication is guaranteed provided researchers follow the protocol.


After completing the study, authors add results and discussion sections, clearly separating preregistered confirmatory analyses from exploratory ones. Final review focuses on adherence to the approved plan and the validity of conclusions, not on whether results are significant. This model shifts incentives toward asking important questions and using rigorous methods rather than chasing striking outcomes.

LEARN MORE

OSC Tutorial

TBA: Preregistration tutorial

Step-by-step guide to creating preregistration. (Xh)

TOOLS & RESOURCES

OSC Tool

P-hacking tool

Interactive app showing how easy it is to find false "significant" results.


Center for Open Science

List of journals offering Registered Reports.


Open Science Framework

Preregistration templates, embargoes, file storage.

In our context, a computer simulation is the generation of artificial data to build up an understanding of real data and the statistical models we use to analyze them. You can simulate data to:


  • Test your statistical intuition or demonstrate mathematical properties you cannot easily anticipate.
    Example: Check whether there are more than 5% significant effects for a variable in a model when supposedly random data are generated.

  • Understand sampling theory and probability distributions or test whether you understand the underlying processes of your system.
    Example: See whether simulated data drawn from specific distributions is comparable to real data.

  • Perform power analyses.
    Example: Assess whether the sample size (within a simulation repetition) is high enough to detect a simulated effect in more than 80% of the cases. (see 1.4.3. Power analyses)

  • Prepare a pre-analysis plan.
    Example: To strengthen your planned confirmatory analyses before collecting data, consider sharing a simulated dataset with a statistician or mentor. This allows for specific feedback on suitable statistical tests. The resulting analysis code can accompany your preregistration or registered report (see 1.4.1. Pre-analysis planning) so reviewers can clearly see your intended approach. When real data are collected, they can be directly substituted into the code to generate results.


Generating an artificial dataset in R (see our simulation tutorial) is much easier than researchers often believe, and it is often very helpful, even when you need to make assumptions about variable distributions or when the parameter space is not well known.
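
As a flavor of how little code this takes, here is a minimal sketch (in Python for illustration; the tutorial above uses R) that simulates a two-group dataset under an assumed standardized effect size. Every number in it is a placeholder assumption, to be replaced with values justified for your own study:

```python
import random
from statistics import mean

def simulate_two_groups(n_per_group=50, effect_d=0.5, seed=42):
    """Simulate control and treatment scores with an assumed
    standardized effect size d (all numbers are illustrative)."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treatment = [rng.gauss(effect_d, 1.0) for _ in range(n_per_group)]
    return control, treatment

control, treatment = simulate_two_groups()
# The summary and test code written against this fake dataset can later
# be run on the real data unchanged.
print(f"group means: {mean(control):.2f} vs {mean(treatment):.2f}")
```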

LEARN MORE

OSC Tutorial

R Tutorial

Learn R programming. (3h)

OSC Tutorial

Data simulation in R

Easy data simulations in R. (2h)

Power analysis is relevant whether you are designing a project from scratch or running an analysis on already existing data. There are two main types of power analyses:


A priori power analysis

Simulate data to calculate the smallest sample size required to detect the smallest effect of interest. See our advanced power analyses tutorial.


For a very basic power calculation, you can use simple R functions if you know 3 out of 4 of these parameters:

  • required sample size n (usually the one missing)
  • desired power (default 0.80)
  • the alpha level (default 0.05)
  • the expected effect size (has to be estimated or extracted from the literature in the form of d, f, etc.)
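
In R, for instance, stats::power.t.test() or the pwr package solves for whichever of these parameters is missing. As a language-agnostic sketch of the underlying calculation, here is the normal approximation for a two-sided, two-sample comparison in Python (d = 0.5 is an assumed effect size, not a recommendation):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sided, two-sample
    comparison, using the normal approximation to the t-test (d = Cohen's d)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# With an assumed medium effect, the defaults above (power 0.80, alpha 0.05) give:
print(n_per_group(0.5))  # 63 per group (the exact t-based answer is slightly higher)
```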

To get support with pre-analysis planning, you can book a consultation with the LMU statistical consulting unit (StaBLab).


Post-hoc power analysis

Compute post-hoc power when you are not able to control the sample size for your project. Beware: this power computation comes in two flavors: one is legitimate, and one is flawed and not defensible.


The legitimate post-hoc power is computed with your actual n and the same effect size that you plugged into your a priori power analysis. This analysis gives you the achieved power to detect your assumed effect.


The flawed version of post-hoc power is called “observed power”: if an analysis yields a non-significant result, some researchers calculate the post-hoc power but plug in the observed effect size. “Observed power”, however, is just a one-to-one function of the p-value (a non-significant p-value returns a low power below 50%, and a just-significant p-value of .05 always yields a power of exactly 50%). Observed power adds no information beyond the p-value and is essentially meaningless. Do not compute this type of post-hoc power!
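
That one-to-one relationship can be demonstrated in a few lines. The sketch below (Python for illustration) recovers the "observed" effect implied by the p-value of a two-sided z-test and converts it back into power, confirming that p = .05 maps to exactly 50%:

```python
from statistics import NormalDist

def observed_power(p, alpha=0.05):
    """'Observed power' of a two-sided z-test, reconstructed from the
    p-value alone -- showing it carries no information beyond p."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p / 2)       # effect size implied by the observed p-value
    z_crit = nd.inv_cdf(1 - alpha / 2)  # significance threshold
    return 1 - nd.cdf(z_crit - z_obs)   # ignoring the negligible opposite tail

print(observed_power(0.05))  # 0.5: a just-significant result has exactly 50% power
print(observed_power(0.20))  # a non-significant p always maps to power below 50%
```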

LEARN MORE

OSC Tutorial

R Tutorial

Learn R programming. (3h)

OSC Tutorial

Power Analyses

Data simulations for GLMs, LMEs, and SEMs in R. (6h)

Plan & Design Checklist

Complete this checklist before presenting your final study plan to your research group and, if applicable, before submitting your ethics proposal and/or preregistration. Not all items are relevant for all fields of research or study types.

  • Background Information
  • Study Design
  • Data Management Planning
  • Project Management
  • Before Data Collection

Download checklist

Ludwig-Maximilians-Universität
LMU Open Science Center

Leopoldstr. 13
80802 München

Contact

  • Prof. Dr. Felix Schönbrodt (Managing Director)
  • Dr. Malika Ihle (Coordinator)
  • OSC team

Join Us

  • Subscribe to our announcement list
  • Become a member
  • Chat with us on Matrix
