GDPR Data Discovery: The Complete Guide for Microsoft 365 Organisations

GDPR data discovery is the process of finding, classifying, and managing personal data that lives in your organisation's systems - so you can protect it, delete what is no longer needed, and demonstrate compliance with the General Data Protection Regulation. For most organisations today, that data hides in Microsoft 365: in email attachments, OneDrive folders, SharePoint sites, and Teams conversations. This guide explains what GDPR data discovery actually involves, where personal data tends to accumulate in Microsoft 365, and the practical steps for finding and cleaning it up.

Key takeaways

  • GDPR data discovery is a continuous process, not a one-time project. New personal data is created every day, so discovery must be ongoing to keep your data inventory accurate.
  • Personal data hides in unexpected places. Across most Microsoft 365 environments, personal data is scattered across email attachments, personal OneDrive folders, shared SharePoint sites, and Teams chats - often without anyone realising it.
  • Finding personal data is harder than it sounds. Much of the data hides in scanned documents, images, and attachments that simple search tools cannot read. Effective data discovery requires technology that can look beyond plain text.
  • Cleaning up takes more than technology. Even once you can see where personal data lives, deleting and restructuring it requires employee involvement and clear management support.
  • Simplicity beats sophistication. A good GDPR data discovery tool stays focused on the core task: finding the personal data that actually matters, presenting it without legal jargon, and giving employees a clear, intuitive way to act on it. Tools with too many options, too much configuration, or too much complexity end up unused.
  • Low false positives matter more than feature lists. A tool that flags everything as "potentially sensitive" creates noise that employees learn to ignore. Discovery tools should be precise enough that employees trust the results.
  • GDPR data discovery is not just a compliance exercise. Done well, it reduces security risk, lowers storage costs, and gives leadership a clear picture of where personal data lives across the organisation.

What is GDPR data discovery?

GDPR data discovery is the process of identifying, classifying, and managing personal data in your organisation's systems. It exists because most organisations - even those with good policies on paper - have no clear view of where personal data actually sits across their files, emails, and shared drives.

In practical terms, GDPR data discovery answers three questions:

1. Where is personal data stored? Personal data is rarely kept in one neat place. It moves into email attachments, ends up in shared folders, sits in old documents, and follows employees across systems.
2. What kind of personal data is it? Not all personal data carries the same risk. A CPR number (the Danish national identity number), a passport copy, or a piece of health information carries far more risk than an internal phone list. Classification by type and sensitivity is what makes the data actionable.
3. What should happen to it now? Once you know what you have and where, you can decide what to keep, what to delete, and what to restructure.

This last point is what separates GDPR data discovery from general data mapping or pure scanning. A data discovery exercise that only identifies data without leading to action is incomplete. The point of finding personal data is to do something useful with it - typically deleting what is no longer needed, moving what belongs somewhere else, or documenting what serves a legitimate purpose.

Data discovery is different from data mapping

The terms are often used interchangeably, but there is a useful distinction:

- Data mapping focuses on understanding flows of personal data - where it comes from, where it goes, who has access, and which legal bases apply. It is largely a documentation exercise and forms part of your Records of Processing Activities (RoPA) under Article 30 GDPR.
- Data discovery focuses on locating personal data that *already exists* in your systems - particularly the data that you may not know about. It is largely an operational exercise.

Most organisations need both. Data mapping documents what should be happening; data discovery shows what actually is.

Data discovery is different from DLP

Data Loss Prevention (DLP) tools are designed to prevent personal data from leaving your organisation - blocking outbound emails containing credit card numbers, for example. Data discovery, by contrast, focuses on personal data that is already inside your systems and has often been there for years. Both have a role to play, but they solve different problems. A DLP tool will not help you find a folder full of old passport copies; a data discovery tool will not stop someone from emailing one out today.

Why does GDPR data discovery matter?

Under the General Data Protection Regulation, organisations are responsible for protecting personal data throughout its entire lifecycle - from collection to deletion. That responsibility does not disappear because data has been forgotten in an old email or buried in a shared folder. If the data exists, it is your obligation to handle it correctly.

GDPR data discovery matters because most organisations cannot meet that obligation without it. You cannot protect, delete, or document data that you do not know exists.

The core GDPR principles that depend on data discovery

Several core principles of the GDPR are difficult or impossible to honour without ongoing data discovery:

- Data minimisation (Article 5(1)(c)). You should keep only personal data necessary for a clear, defined purpose. In practice, that requires knowing what data you actually hold - and being able to identify the data that no longer serves a purpose.
- Storage limitation (Article 5(1)(e)). Personal data should not be kept longer than needed. Without visibility into where data lives and how old it is, retention policies stay on paper rather than being enforced in practice.
- Accountability (Article 5(2)). You must be able to demonstrate compliance. That means producing documentation of the personal data you process, where it is stored, and the decisions made about it.
- Integrity and confidentiality (Article 5(1)(f), with the security measures detailed in Article 32). Personal data must be protected with appropriate security measures. You cannot secure data effectively if you do not know where it is.
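
To make the storage limitation point concrete, here is a minimal sketch of a retention check. The retention periods, the category names, and the `FileRecord` type are illustrative assumptions, not values taken from the GDPR or from any particular tool; your own retention policy defines the real numbers.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods in days, per category (assumed values).
RETENTION_DAYS = {"job_application": 180, "sick_note": 365}


@dataclass
class FileRecord:
    path: str
    category: str
    last_modified: date


def overdue(records, today):
    """Return the records held longer than the retention period for their category."""
    result = []
    for record in records:
        limit = RETENTION_DAYS.get(record.category)
        if limit is None:
            continue  # no retention rule defined for this category
        if today - record.last_modified > timedelta(days=limit):
            result.append(record)
    return result
```

A check like this only works once discovery has told you which files exist, what category they fall into, and how old they are.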

Bridging the gap between privacy policy and reality

Most organisations have a privacy policy that defines how personal data should be handled, how long it can be kept, and where it can be stored. The challenge is making that policy live in practice.

This is the gap that GDPR data discovery closes. By continuously identifying where personal data actually sits in your systems, you can spot the cases where reality has drifted away from policy - and act on them. Over time, this turns GDPR compliance from a one-time documentation exercise into an ongoing operational practice.

GDPR data discovery and the move to AI

A newer reason data discovery has become important is the rise of AI inside Microsoft 365. Tools like Microsoft Copilot can search across the emails, documents, and shared drives a user has access to - which means any personal data sitting in those systems can now surface in AI-generated outputs.

For most organisations, that raises a question worth answering before AI is rolled out broadly: do we know what personal data is in the environment where AI is about to operate? Cleaning up unnecessary personal data first is the safer path - it reduces the risk of AI surfacing sensitive information unexpectedly, and removes a category of data that should not be feeding AI in the first place.

GDPR data discovery is therefore both a compliance exercise and a preparation step for the next generation of workplace tools.

Where personal data hides in Microsoft 365

For most organisations, the vast majority of personal data is not stored in a structured database or a dedicated CRM. It lives in the everyday workspace: emails, documents, shared folders, and chats. Microsoft 365 is where most of that everyday work happens - and consequently where most of the personal data hides.

Understanding where the data tends to accumulate is the first step toward actually finding it. Below are the places we most often see personal data hiding across Microsoft 365 environments.

Email attachments in Exchange

Email is the single biggest source of forgotten personal data in most organisations. People send and receive CVs, signed contracts, passport copies, sick notes, and financial documents - and once the email is read, the attachment sits in the inbox indefinitely.

A few patterns we see repeatedly:

- HR inboxes containing years of applicant data. CVs, references, and identity documents from candidates who were never hired.
- Manager inboxes with sensitive employee data. Sick notes, performance documents, and salary information that should ideally never have been sent by email in the first place.
- Finance inboxes with payment and identity data. Credit card details, bank statements, and ID documents sent for verification.
- Sent items. Often overlooked, but the folder holds every sensitive attachment the employee has sent and never deleted.

Exchange In-Place Archive is another common hiding place. Where archiving is enabled, older emails are moved out of the primary mailbox automatically, typically by a retention policy, and end up out of sight. The personal data is still there, still your responsibility, but rarely reviewed.

Personal OneDrive folders

OneDrive is positioned as personal storage, but in practice it accumulates a mix of personal and professional documents. Employees save downloaded attachments, draft documents, scanned copies, and personal notes that often contain personal data.

The challenge with OneDrive is that the data is by design private to each employee. As an administrator, you cannot simply browse through it. That is why employee-led discovery and clean-up matters - the people who know what is in their OneDrive are the only ones who can decide what to do with it.

SharePoint sites and shared drives

SharePoint is where cross-organisational sharing happens, and consequently where data tends to spread. A site set up for a specific project ends up containing personal data long after the project has ended. Documents are uploaded once and then forgotten as people move on to other things.

Common patterns:

- Old project sites with personal data from clients, partners, or candidates.
- Department drives with HR records, employee onboarding documents, and reviews.
- Team sites with broad access where personal data sits visible to many more people than it should.
- Files from Teams. When someone shares a file in a Teams channel or chat, it is stored in SharePoint or OneDrive in the background. That means Teams-shared files are also part of the data discovery picture - even though Teams chat messages themselves are not.

Scanned documents, images, and screenshots

This is the category that traditional search tools miss. A scanned passport, a screenshot of a credit card, a photo of a driving licence - all of these contain personal data, but the data is not text. It is pixels.

Most standard search and DLP tools do not read images by default. That means an entire category of high-risk personal data is effectively invisible to most data clean-up efforts. A modern data discovery tool needs to look beyond plain text - reading the content of images and scanned documents, not just file names and text content.
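
As a rough illustration of the first decision a scanner has to make, the sketch below flags files where plain-text extraction alone is unlikely to be enough and image reading (OCR) would be needed. The extension list, the 20-character threshold, and the `needs_ocr` helper are assumptions for illustration, not the behaviour of any specific product.

```python
from pathlib import PurePath

# Assumed list of image formats; extend it to match your environment.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"}


def needs_ocr(filename: str, extracted_text: str) -> bool:
    """Return True when text extraction alone probably misses the content."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return True
    # A PDF with almost no extractable text is probably a scan (assumed threshold).
    if suffix == ".pdf" and len(extracted_text.strip()) < 20:
        return True
    return False
```

A scanned passport saved as `passport.JPG` or as a PDF with no text layer would be routed to OCR, while an ordinary text PDF would not.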

The data you would not think to look for

The pattern we see in most data discovery projects is that the obvious places contain less personal data than expected, and the unexpected places contain more. Examples:

- A meeting notes document that quotes someone's CPR number for context.
- A scanned ID document attached to an email thread about something else entirely.
- A "draft" email saved years ago that was never sent but still contains a copy of someone's passport.
- A signed PDF stored in a folder named "Old stuff - probably delete", containing financial account details.

These are the cases that matter most for GDPR. They are the data points that no policy explicitly authorises - and the ones most likely to surface uncomfortably during an audit or a breach investigation.

What kinds of personal data should you look for?

GDPR applies to all personal data - but in practice, not all personal data carries the same risk. A name in an email signature is technically personal data; so is a passport copy stored on a shared drive. The difference in real-world risk is enormous.

An effective GDPR data discovery effort focuses on the categories of personal data that pose the greatest risk if they are mishandled, exposed, or stored too long. This is the risk-based approach that the GDPR itself encourages, and it is the most practical way to direct your clean-up efforts where they matter most.

High-risk categories worth prioritising

The categories below are typically present in large volumes across most organisations and pose elevated risk under the GDPR:

- Personal identification numbers. National identity numbers such as CPR numbers (Denmark), personnummer (Sweden), fødselsnummer (Norway), or equivalent identifiers in other countries. These are widely used for authentication and identity verification, making them high-value targets in case of a breach.
- Credit card numbers. Often attached to emails for payment verification, refund requests, or financial documentation. Particularly sensitive due to direct financial risk and PCI DSS implications.
- Official identification documents. Passport copies, driving licences, and similar documents - often attached to emails for verification purposes and then forgotten.
- Sensitive health information. Diagnoses, sick notes, or other health-related documents that fall under GDPR's special category data (Article 9), requiring stronger protection.
- Criminal records. Background checks and criminal record extracts. These are not special category data under Article 9, but they are governed separately by Article 10 and subject to stricter handling requirements.
- HR-related documents. CVs and similar documents that typically combine multiple categories of personal data in a single file.

These are the categories where a small number of files can carry significant risk - and where finding and acting on them quickly delivers the most compliance value.
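
For structured identifiers such as credit card numbers, a checksum can separate real numbers from look-alikes such as order numbers. The sketch below is a minimal illustration using the Luhn checksum that payment card numbers carry; the regular expression and the function names are our own assumptions, not the detection logic of any particular tool.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in the text that look like cards and pass Luhn."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

The widely used test number `4111 1111 1111 1111` passes the check, while a 13-digit order number such as `1234567890123` does not, which is exactly the kind of noise a checksum removes.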

Why basic identifiers are not always the right focus

Names, email addresses, phone numbers, and home addresses are technically personal data under GDPR, but treating them as the primary focus of data discovery often does more harm than good. Three reasons:

- Volume. Basic identifiers appear in virtually every email, document, and system in your organisation. Flagging all of them creates so much noise that meaningful patterns disappear.
- Context. A name in an email signature is not the same as a name on a list of sick employees. The risk lives in the combination of data, not the data point itself.
- Action. Most basic identifiers cannot be deleted without breaking legitimate business operations. Focusing on them creates work without reducing risk.

This is why a risk-based approach starts with the categories that actually carry meaningful risk - and where action can be taken without disrupting day-to-day business.

Local variations matter

Personal data does not look the same in every country. A CPR number in Denmark, a personnummer in Sweden, a National Insurance number in the UK, a Personalausweis number in Germany, and a PESEL number in Poland all identify individuals at national level - but the formats are completely different.

For organisations operating across multiple Northern European markets, effective GDPR data discovery means recognising the right formats for each country. A tool built only for one country's data formats will systematically miss high-risk personal data in the others.
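
To show what "recognising the right formats" can mean in practice, here is a small sketch with one validator per country. It is an illustration under assumptions, not a complete detector: the function names are ours, the CPR check only tests that the day and month are plausible (since 2007, CPR numbers are issued that do not satisfy the older modulus-11 check), and century handling is ignored.

```python
import re


def _digits(value: str) -> list[int]:
    return [int(c) for c in value if c.isdigit()]


def valid_cpr(candidate: str) -> bool:
    """Danish CPR, DDMMYY-SSSS: plausibility check on day and month only."""
    m = re.fullmatch(r"(\d{2})(\d{2})\d{2}-?\d{4}", candidate)
    if not m:
        return False
    day, month = int(m.group(1)), int(m.group(2))
    return 1 <= day <= 31 and 1 <= month <= 12


def valid_personnummer(candidate: str) -> bool:
    """Swedish personnummer, YYMMDD-XXXX: the last digit is a Luhn check digit."""
    if not re.fullmatch(r"\d{6}[-+]?\d{4}", candidate):
        return False
    total = 0
    for i, digit in enumerate(_digits(candidate)):
        value = digit * (2 if i % 2 == 0 else 1)
        total += value - 9 if value > 9 else value
    return total % 10 == 0


def valid_pesel(candidate: str) -> bool:
    """Polish PESEL, 11 digits: weighted checksum over the first ten digits."""
    if not re.fullmatch(r"\d{11}", candidate):
        return False
    d = _digits(candidate)
    weights = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
    total = sum(w * x for w, x in zip(weights, d))
    return (10 - total % 10) % 10 == d[10]
```

Each country needs its own rule: a string that is a valid PESEL tells you nothing about whether it could be a CPR number, and a ten-digit number such as `1234567890` fails the CPR date check even though it has the right length.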

The 4 steps of GDPR data discovery

A well-run GDPR data discovery effort follows a clear sequence. You connect to your systems, you scan and classify, you review and act, and then you keep going. The steps are simple in principle - the difference between a project that works and one that stalls comes down to how each step is set up.

Step 1: Connect to your systems

The first step is to connect your data discovery tool to the systems where personal data actually resides. For most organisations today, that means Microsoft 365 - Exchange (including In-Place Archive), OneDrive, SharePoint, and the files shared through Teams.

A few things matter at this stage:

- Coverage. All relevant Microsoft 365 services should be included from the start. Leaving out OneDrive or Exchange In-Place Archive creates blind spots that undermine the entire effort.
- Permissions. The tool needs read access to scan content, and write access to act on deletions when employees decide to remove a file. Permissions should be scoped strictly to what is needed and nothing more - both for privacy and for security.
- Setup time. A modern tool should connect to Microsoft 365 in minutes, not days. Complex setup is one of the most common reasons data discovery projects get delayed or abandoned.
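
As an illustration of scoping permissions strictly, the sketch below picks read-only or read-write permissions per service. It assumes the tool connects through Microsoft Graph application permissions; the mapping and the `required_scopes` helper are hypothetical, and the permissions an actual tool requests may differ.

```python
# Assumed mapping of services to Microsoft Graph application permissions.
READ_SCOPES = {
    "exchange": "Mail.Read",
    "onedrive": "Files.Read.All",
    "sharepoint": "Sites.Read.All",
}
WRITE_SCOPES = {
    "exchange": "Mail.ReadWrite",
    "onedrive": "Files.ReadWrite.All",
    "sharepoint": "Sites.ReadWrite.All",
}


def required_scopes(services, allow_delete: bool) -> list[str]:
    """Return the smallest set of permissions for the chosen services."""
    table = WRITE_SCOPES if allow_delete else READ_SCOPES
    unknown = set(services) - set(table)
    if unknown:
        raise ValueError(f"unknown services: {sorted(unknown)}")
    return sorted({table[s] for s in services})
```

The design choice here is that write permissions are requested only when deletion through the tool is actually switched on, and only for the services in scope.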

Step 2: Scan and classify

Once connected, the tool scans your environment for personal data and classifies what it finds by type and sensitivity. This is the technical heart of data discovery.

What makes this step succeed or fail:

- Reading beyond text. Much of the highest-risk personal data lives in scanned documents, images, and image-only PDFs. A tool that only reads plain text will miss the categories that matter most.
- Format awareness. National identity numbers, passport numbers, and other regulated formats vary by country. Detection logic needs to recognise the formats relevant to where your organisation operates.
- Low false positives. A tool that flags everything as potentially sensitive teaches employees to ignore it. Precision matters more than catching every possible match - a smaller list of accurate findings is far more useful than a long list of noise.

The output of this step is a clear picture of where personal data sits across your environment, broken down by type, sensitivity, and location.
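
A breakdown by type, sensitivity, and location can be as simple as a counted summary. The sketch below is a minimal illustration; the `Finding` type, the data type names, and the sensitivity labels are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical sensitivity labels per data type.
SENSITIVITY = {
    "cpr_number": "high",
    "passport_copy": "high",
    "health_info": "high",
    "cv": "medium",
}


@dataclass(frozen=True)
class Finding:
    location: str   # e.g. "exchange", "onedrive", "sharepoint"
    data_type: str  # e.g. "cpr_number"


def summarise(findings) -> Counter:
    """Count findings per (location, data type, sensitivity)."""
    counts = Counter()
    for f in findings:
        level = SENSITIVITY.get(f.data_type, "unknown")
        counts[(f.location, f.data_type, level)] += 1
    return counts
```

A summary like this is what lets you say "most high-sensitivity findings sit in Exchange" instead of looking at a flat list of files.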

Step 3: Review and act

This is the step where most data discovery projects either deliver value or quietly stall. Identifying personal data is only useful if something happens with it afterwards.

Two patterns work well in practice:

- Employee-driven decisions. The people who created or work with a file are usually best placed to decide whether it should be kept, deleted, or moved. They know the context that no scanning tool can see. Routing findings to the relevant employee - rather than asking a single administrator to review thousands of files - is what makes clean-up actually happen.
- Clear, simple choices. Employees should not need to read legal text to act. The best discovery tools present findings in plain language and offer a small number of clear actions: delete, keep, decide later. Too many options or too much complexity is the fastest way to make people disengage.

Behind the scenes, administrators set the policies that determine what counts as a violation, what data should be excluded, and how often employees should be prompted to act. The combination of administrator policy and employee action is what turns data discovery from a one-off scan into a working process.
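
The combination of administrator policy and employee action can be sketched as follows. This is a simplified illustration: the `Finding` type, the path-prefix exclusions, and the `route` and `record_decision` helpers are hypothetical names, not part of any real tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DELETE = "delete"
    KEEP = "keep"
    DECIDE_LATER = "decide_later"


@dataclass(frozen=True)
class Finding:
    path: str
    owner: str
    data_type: str


def route(findings, excluded_prefixes):
    """Group findings per owner, skipping locations the administrator excluded."""
    queues = defaultdict(list)
    for f in findings:
        if any(f.path.startswith(prefix) for prefix in excluded_prefixes):
            continue
        queues[f.owner].append(f)
    return dict(queues)


def record_decision(finding: Finding, action: Action) -> dict:
    """Return a log entry documenting what the employee decided."""
    return {"path": finding.path, "owner": finding.owner, "action": action.value}
```

The administrator controls the exclusions; each employee only ever sees their own short queue and picks one of three actions.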

Step 4: Keep going

GDPR data discovery is not something you finish. New emails arrive, new documents are created, new shared sites are spun up. Personal data accumulates continuously, and so the discovery process needs to be continuous too.

In practice, this means:

- Initial full scan. The first scan covers everything in your Microsoft 365 environment. This establishes the baseline and surfaces the historical data that has accumulated over the years.
- Daily updates. After that, the tool only needs to look at what has changed - new files, modified files, new shared sites. This keeps the picture current without rescanning the entire environment every day.
- Periodic campaigns. Continuous scanning works best when paired with periodic employee engagement campaigns. Sending a reminder to relevant employees once a month or once a quarter keeps clean-up moving without becoming intrusive.

Done this way, GDPR compliance shifts from a one-off project into something that runs quietly in the background of normal work.
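
The difference between the initial full scan and the daily updates can be sketched in a few lines. The example assumes each file carries a last-modified timestamp; the `files_to_scan` helper and its input shape are illustrative assumptions, and a real tool may track changes differently.

```python
from datetime import datetime


def files_to_scan(files, last_scan):
    """Select files for the next scan.

    files: iterable of (path, modified_at) pairs.
    last_scan: datetime of the previous scan, or None for the initial full scan.
    """
    if last_scan is None:
        return [path for path, _ in files]  # baseline: scan everything
    return [path for path, modified_at in files if modified_at > last_scan]
```

After the baseline, only files created or changed since the previous run are rescanned, which is what keeps daily updates cheap.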

Manual vs automated data discovery

There are two ways to approach GDPR data discovery: manually or with a dedicated tool. Manual discovery has its place - some organisations run dedicated "deletion days" where teams sit down to find and clean up personal data - but the results are typically disappointing. People find some personal data, but rarely anything close to all of it, and the time invested often does not match the value returned.

What manual data discovery looks like

A manual data discovery effort typically means asking every employee in the organisation to go through their own mailbox and drives looking for personal data. In a Microsoft 365 environment, this could mean:

- Asking employees to search their own mailboxes for keywords like "passport", "CPR", or "credit card"
- Asking them to review their own folders in OneDrive
- Running organisation-wide "deletion days" where everyone is asked to spend hours cleaning up
- Compiling the results into spreadsheets or reports for documentation

The approach produces results, but the results are uneven. The fundamental problem is not effort - it is that manual methods cannot see the personal data that matters most:

- Search misses most of the highest-risk data. Standard search cannot read scanned documents, images, or image-only PDFs - which is exactly where high-risk personal data often hides. You can spend a full day searching mailboxes and still miss the passport copies sitting as image attachments.
- It is slow. A thorough manual review of a mid-sized Microsoft 365 environment can take weeks or months of dedicated work - and even then, large gaps remain.
- It goes out of date immediately. Even if you complete a manual review, new personal data is being created every day. Without continuous scanning, the picture you built is already outdated by the time you finish.
- It puts the burden in the wrong place. Asking each employee to manually find personal data in their own mailbox and drives is reasonable in principle - employees know their own data best. But without a tool to surface what is worth reviewing, employees end up either reviewing nothing or trying to review everything. Neither works.

What automated data discovery does differently

An automated data discovery tool scans your systems continuously, identifies personal data including the data hidden in images and scanned documents, and presents findings in a structured way that lets employees and administrators act on them.

The shift is not just about speed. It is about making something that used to be impossible practical:

- Comprehensive coverage of all relevant Microsoft 365 services, including the formats that manual search cannot see.
- Continuous updates that keep the picture current without rerunning the whole exercise from scratch.
- A clean workflow that routes findings to the right people with the right information to act on them.
- Documented decisions, so you have evidence of what was found, what was kept, and what was deleted - the records that matter when a regulator asks.

The role of simplicity

The argument for automated data discovery is straightforward. The harder question is which tool to choose - and that comes down to whether the tool is actually simple enough to use.

Plenty of data discovery tools exist on paper. In practice, many of them are so complex that they require dedicated specialists to operate, generate so much noise that the results are ignored, or produce reports that no employee will ever read.

A tool that is hard to use ends up unused. The most valuable data discovery tools are the ones that hide their complexity behind a clear, intuitive interface - both for administrators setting policies and for employees acting on findings. Simplicity is not a feature; it is what determines whether the effort produces results or not.

What to look for in a GDPR data discovery tool

If you are evaluating GDPR data discovery tools, the feature lists tend to look similar. The differences that actually matter often only become visible when the tool is in use. Below are the criteria worth weighing carefully - based on what we see make the difference between projects that deliver value and those that stall.

Depth of Microsoft 365 integration

If most of your personal data lives in Microsoft 365, your discovery tool needs to cover all of it. That means Exchange (including In-Place Archive), OneDrive, SharePoint, and the files shared through Teams. Anything less leaves blind spots.

Generic data discovery tools that connect to "all major cloud services" often turn out to have shallow integration with Microsoft 365 specifically. A tool built around Microsoft 365 will typically deliver better results than one that treats it as one option among many.

Ability to read scanned documents and images

This is the criterion that separates basic search tools from genuine data discovery tools. A meaningful share of the highest-risk personal data - passport copies, scanned ID documents, photographs of credit cards - lives in image form rather than as plain text.

A discovery tool that cannot read images will systematically miss this category. When evaluating tools, ask specifically whether they can detect personal data inside scanned PDFs, images, and screenshots - and how reliably.

Low false positive rate

Detection accuracy is the criterion most prospective buyers underestimate - until they start using a tool that generates thousands of false positives.

A tool that flags every number that looks vaguely like a CPR number is technically thorough, but practically useless. Employees stop trusting the system, ignore the results, and disengage from the clean-up effort. Precision matters more than the size of the catch.

When evaluating a tool, ask for benchmarks on false positive rates and, if possible, run it on a sample of your own data to see how it performs.
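
When you run a tool on a sample of your own data, the simplest number to compute is precision: the share of flagged items that a reviewer confirms as real personal data. What buyers call the "false positive rate" is usually one minus this figure. The sketch below is a minimal illustration; the function names are our own.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged items that were confirmed as real findings."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged items in the sample")
    return true_positives / flagged


def review_sample(labels) -> float:
    """labels: list of booleans, True when a reviewer confirmed the flagged item."""
    confirmed = sum(1 for ok in labels if ok)
    return precision(confirmed, len(labels) - confirmed)
```

For example, if a reviewer checks 50 flagged files and confirms 45 of them, precision is 45 / 50 = 0.9, meaning one in ten flags was noise.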

Employee-led workflow

Personal data clean-up only happens when the people who own the data are involved. A tool designed only for administrators - where one or two people review thousands of files - rarely produces meaningful clean-up.

Look for tools that:

- Route findings to the relevant employee, not to a central queue
- Present findings in plain language without legal jargon
- Offer a small number of clear action choices
- Make it easy to take action without leaving the tool

The goal is to make the right action the easy action.

Multi-country format support

Personal data formats vary by country. A CPR number in Denmark is structurally different from a personnummer in Sweden, a National Insurance number in the UK, a Personalausweis number in Germany, or a PESEL number in Poland.

For organisations operating across Northern Europe, a discovery tool needs to recognise the personal data formats of each country it operates in. A tool built only for one country's formats will systematically miss high-risk data in the others.

Speed of setup

The first few hours with a data discovery tool tell you a lot. A modern tool should connect to Microsoft 365 and start producing results quickly - typically around ten minutes from connection to first findings.

Long setup processes are a warning sign. They typically reflect a tool that requires expert configuration to work at all, which often correlates with ongoing complexity in daily use. If a tool takes a specialist to set up, it probably takes a specialist to run.

Data handling and security

Your data discovery tool will, by definition, have access to your most sensitive content. The way the tool handles that data matters as much as what it finds.

Specific things to look for:

- Where data is processed. Some tools store copies of file content in their own systems for ongoing analysis - which means your personal data now lives in two places instead of one.
- What is retained. Tools that retain only metadata, not file content, are typically lower risk than tools that retain the underlying files.
- Security certifications. ISO 27001 and similar certifications are a reasonable baseline expectation for any tool that handles sensitive personal data.
- Use of third-party AI. Some tools send your data to third-party large language models for classification. This may or may not be acceptable depending on your privacy posture - but it is worth knowing.
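
To illustrate the difference between retaining metadata and retaining content, the sketch below builds a finding record that stores where a file is, what was detected, and a hash of the content, but never the content itself. The `FindingRecord` type and `make_record` helper are hypothetical; how a given tool stores its findings is something to ask the vendor.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FindingRecord:
    path: str            # where the file lives
    data_type: str       # what kind of personal data was detected
    content_sha256: str  # fingerprint used to notice when the file changes


def make_record(path: str, data_type: str, content: bytes) -> FindingRecord:
    """Build a metadata-only record; the file content is hashed and then discarded."""
    fingerprint = hashlib.sha256(content).hexdigest()
    return FindingRecord(path=path, data_type=data_type, content_sha256=fingerprint)
```

With a record like this, the personal data itself stays in one place, and the discovery tool only keeps a pointer to it.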

Simplicity, end to end

The most important criterion is also the hardest to evaluate from a feature list. A good GDPR data discovery tool is simple enough that:

- An administrator can configure it without dedicated training
- An employee can take action on a finding without reading a manual
- A privacy officer can demonstrate compliance without compiling reports manually

Tools that are simple to use get used. Tools that are not simple end up as another expensive piece of software that sits in the corner of the compliance budget without delivering results.

Common challenges and how to overcome them

The hardest part of GDPR data discovery is not the technical side. Modern tools handle the scanning and classification reliably. The harder part is what happens around the technology - how the work is prioritised, how employees engage, and how the findings translate into actual change.

Below are the challenges we see most often, and what tends to work in practice.

The effort is not prioritised highly enough

The most common reason GDPR data discovery projects underdeliver is not technical - it is organisational. When data discovery is run as a side project owned by a single person in compliance or IT, without clear backing from leadership, employees treat it as optional. The result is low engagement, slow clean-up, and a sense that the effort is more bureaucratic than useful.

The pattern we see consistently: data discovery delivers strong results when leadership clearly communicates that it matters - not as a one-off announcement, but as an ongoing message tied to broader goals around data protection, security, and trust. When employees understand that their leadership genuinely cares about the outcome, engagement follows.

In practice, this looks like:

- A short message from the CEO or relevant leader when the effort starts
- Regular check-ins where progress is mentioned as part of normal business updates
- Recognising departments or teams that make meaningful progress
- Connecting data discovery to other strategic priorities the organisation already cares about

The technical setup of a discovery tool can be done in a day. The organisational setup is what determines whether the effort produces results over time.

Exceptions are used to avoid restructuring

A second common pattern is what we sometimes call "exception creep". When data discovery surfaces a problem - say, a SharePoint site with personal data that should not be there - the easy answer is to add an exception that excludes it from future scanning. The harder answer is to restructure how the data is stored so the underlying problem is resolved.

Over time, organisations that default to exceptions end up with a long list of "areas we have decided to ignore", which slowly undermines the value of the discovery effort. The reports look cleaner, but the actual data landscape has not improved.

The better approach is to treat each significant finding as an opportunity to ask: should this data be here at all? Is there a clearer place for it? Can we restructure so the policy applies automatically instead of through an exception? Restructuring is more work upfront, but it produces lasting improvements rather than a growing list of carve-outs.

Exceptions should be used deliberately - for cases where the data genuinely belongs where it is, or where restructuring is not realistic. They should not be the default response to a problem the tool surfaced.
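
One way to keep exception creep visible is to report what exceptions hide alongside what is still being reported. The following is a minimal sketch of that idea, not the behaviour of any particular tool: the `summarise_findings` function, the finding records, and the site names are all hypothetical.

```python
def summarise_findings(findings, excepted_locations):
    """Count findings that are reported versus hidden by scan exceptions.

    findings: list of dicts with a "location" key (hypothetical structure).
    excepted_locations: set of locations excluded from reporting.
    """
    hidden = [f for f in findings if f["location"] in excepted_locations]
    return {
        "reported": len(findings) - len(hidden),
        "hidden_by_exceptions": len(hidden),
    }


# Illustrative data only: three findings across two SharePoint sites.
findings = [
    {"id": "f1", "location": "sites/hr-archive"},
    {"id": "f2", "location": "sites/hr-archive"},
    {"id": "f3", "location": "sites/sales"},
]

before = summarise_findings(findings, excepted_locations=set())
after = summarise_findings(findings, excepted_locations={"sites/hr-archive"})

print(before)  # {'reported': 3, 'hidden_by_exceptions': 0}
print(after)   # {'reported': 1, 'hidden_by_exceptions': 2}
```

The reported count drops from three to one, but the total number of findings is unchanged. Tracking the hidden count next to the reported count makes it harder to mistake a cleaner report for a cleaner environment.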

Employees disengage when the workflow is unclear

If employees do not understand what they are being asked to do, or if the tool presents findings in legal language that takes effort to interpret, engagement drops quickly. People do not refuse to help with data clean-up out of bad faith - they refuse because the request feels confusing, time-consuming, or unimportant.

What works better:

- Plain language in every employee-facing interaction
- A small number of clear action choices, not a long list of options
- Clear context for each finding - what was found, why it matters, and what to do
- Short, focused requests rather than overwhelming lists

The principle is the same as for any internal communication: if you make the right action the easy action, people take it.

The dashboard is watched, but no one acts on it

Modern data discovery tools give administrators a live picture of personal data across the environment - updated continuously, broken down by type, sensitivity, and location. The risk is treating that picture as the end goal.

A dashboard that shows you the state of your data is valuable, but only if findings translate into actual decisions and clean-up. We see organisations where administrators log in regularly, see the same findings month after month, and conclude that "we have it under control because we can see it". The findings are not new because the underlying data has not changed - and the underlying data has not changed because no one has acted on it.

The shift that matters is treating discovery findings as a continuous trigger for action, not as a report card. Each finding is a small decision that needs to be made by someone: keep, delete, restructure. When that loop runs - when employees actually engage with their own data and act on it - the dashboard changes over time. When it does not, the dashboard becomes a static snapshot of a problem that no one is solving.

Visibility is the prerequisite. Action is the product.
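
To check whether the loop is actually running, it can help to list the findings that have sat without a decision for too long. The sketch below assumes a hypothetical `Finding` record and an arbitrary 30-day threshold; neither comes from a specific product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record of one discovery finding; field names are illustrative.
@dataclass
class Finding:
    finding_id: str
    first_seen: date
    decision: Optional[str] = None  # "keep", "delete", "restructure", or None


def stale_findings(findings, today, max_age_days=30):
    """Return findings with no decision that are older than max_age_days."""
    return [
        f for f in findings
        if f.decision is None and (today - f.first_seen).days > max_age_days
    ]


# Illustrative data only.
findings = [
    Finding("passport-scan-hr", date(2024, 1, 10)),
    Finding("payroll-export", date(2024, 1, 12), decision="delete"),
    Finding("cv-archive", date(2024, 3, 1)),
]

stale = stale_findings(findings, today=date(2024, 3, 15))
for f in stale:
    print(f.finding_id)  # prints "passport-scan-hr"
```

A growing list of stale findings is an early sign that the dashboard is being watched rather than acted on.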

Frequently asked questions about GDPR data discovery

What is the difference between GDPR data discovery and data mapping?

Data discovery focuses on locating personal data that already exists in your systems - particularly the data you may not know is there. Data mapping focuses on documenting how personal data flows through your organisation: where it comes from, where it goes, who has access, and what legal bases apply. Most organisations need both. Data mapping describes what should be happening; data discovery shows what is actually happening.

Is GDPR data discovery required by law?

The GDPR does not require data discovery as a named activity, but it does require organisations to know what personal data they process and where it is stored, and to demonstrate that the data minimisation and storage limitation principles (Article 5) are being followed. In practice, meeting these obligations is not realistic without some form of ongoing data discovery - manual or automated.

How often should GDPR data discovery be performed?

GDPR data discovery should be continuous rather than periodic. New personal data is created every day, and a one-off scan goes out of date almost immediately. A reasonable pattern is an initial full scan to establish a baseline, followed by daily updates that pick up new and changed files. Periodic employee engagement campaigns - monthly or quarterly - keep clean-up moving without overwhelming people.
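
The baseline-plus-daily-updates pattern can be sketched as follows. This is a simplified illustration under stated assumptions: `files_to_scan` and the file inventory are hypothetical, and a real tool may detect changes in a different way than comparing modification times.

```python
from datetime import datetime, timezone


def files_to_scan(files, last_scan=None):
    """Select the files to include in the next scan run.

    files: iterable of (path, last_modified) tuples (hypothetical inventory).
    last_scan: time of the previous completed scan, or None for the baseline.
    """
    if last_scan is None:
        # Baseline: no earlier scan exists, so everything is scanned.
        return [path for path, _ in files]
    # Daily update: only files changed since the previous scan.
    return [path for path, modified in files if modified > last_scan]


# Illustrative data only.
inventory = [
    ("hr/contracts.docx", datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)),
    ("sales/leads.xlsx", datetime(2024, 3, 14, 16, 30, tzinfo=timezone.utc)),
]

baseline = files_to_scan(inventory)
daily = files_to_scan(
    inventory, last_scan=datetime(2024, 3, 14, 0, 0, tzinfo=timezone.utc)
)

print(baseline)  # ['hr/contracts.docx', 'sales/leads.xlsx']
print(daily)     # ['sales/leads.xlsx']
```

The baseline run covers the whole inventory once. Later runs only touch what changed, which is what makes daily updates practical.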

How long does GDPR data discovery take?

The initial scan of a Microsoft 365 environment typically takes one to two weeks, depending on how much data has accumulated over the years. This does not take up your team's time - the scanning runs in the background while you carry on with everyday business. Connecting the tool to Microsoft 365 itself takes around ten minutes. The clean-up that follows is the longer part: reviewing, deleting, and restructuring personal data takes weeks or months, and it requires ongoing employee involvement.

Can Microsoft Purview do GDPR data discovery?

Microsoft Purview includes data classification and discovery capabilities, but it is built for centralised IT governance rather than for ongoing GDPR-focused personal data clean-up. Findings sit with administrators, not with the employees who own the data and can make informed decisions about it. Purview is also stronger as a DLP tool - preventing personal data from leaving the organisation - than as a tool for cleaning up the personal data that has already accumulated. Whether Purview is enough depends on your organisation's compliance approach and what kind of clean-up workflow you need.

What is the difference between data discovery and DLP?

Data Loss Prevention (DLP) tools work in real time to stop personal data from leaving your organisation - for example, by blocking outbound emails containing credit card numbers. Data discovery focuses on personal data that is already inside your systems and has often been there for years. The two solve different problems. A DLP tool will not help you find a folder of old passport copies; a data discovery tool will not stop someone from emailing one out today.

Does GDPR data discovery require employee involvement?

In practice, yes. While a discovery tool can locate personal data automatically, the decisions about what to keep, delete, or restructure depend on context that only the people who own the data can provide. A scanning tool cannot tell whether a particular file still serves a legitimate business purpose - the employee who created it can. The most effective data discovery efforts combine automated scanning with employee-driven decisions.

What about AI and data discovery?

The rise of AI inside Microsoft 365 has made data discovery more important, not less. Tools like Microsoft Copilot can search across emails, documents, and shared drives, which means any personal data sitting in those systems is potentially within reach of AI-generated outputs. Cleaning up unnecessary personal data before broad AI adoption reduces the risk of AI surfacing sensitive information unexpectedly, and removes a category of data that should not be feeding AI in the first place.

Bringing it all together

GDPR data discovery is not a single project with a beginning and an end. It is a continuous practice - finding personal data, deciding what to do with it, and keeping the picture current as new data is created every day.

The technical side has matured significantly. Modern tools can scan Microsoft 365 thoroughly, read content beyond plain text, and classify findings by risk with low false positive rates. The harder part remains organisational: making sure leadership backs the effort, employees understand what they are being asked to do, and the findings translate into action rather than just dashboards.

The organisations that succeed with GDPR data discovery tend to share a few things in common:

- They treat it as ongoing, not one-off. Continuous scanning paired with regular employee engagement produces lasting change.
- They keep it simple. Tools that are intuitive get used. Tools that require specialists end up unused.
- They involve the people who own the data. Employees know context that no scanning tool can see - and they are the ones who can make the right decisions.
- They focus on the data that matters. A risk-based approach, anchored in the categories that carry real GDPR risk, is more effective than trying to catalogue every name and email address in the organisation.

Done this way, GDPR data discovery becomes something that runs quietly in the background of normal work - reducing risk, supporting compliance, and preparing the Microsoft 365 environment for the next generation of AI-powered tools.

If you would like to see what GDPR data discovery looks like in practice, you can explore Sheltr Data Discovery - or browse our Help Center for practical answers to specific questions about the solution.
