PowerShell Script to Find Duplicate Files and Delete Them


Thanks. I entered c:\users\marti\desktop as the directory, but the script returned:

Get-ChildItem : Cannot find path 'C:\Users\GARLIN\Downloads\' because it does not exist.
 

My Computers

System One System Two

  • OS
    Windows 11 Pro 24H2 26100.2894
    Computer type
    Laptop
    Manufacturer/Model
    Acer Swift SF114-34
    CPU
    Pentium Silver N6000 1.10GHz
    Memory
    4GB
    Screen Resolution
    1920 x 1080
    Hard Drives
    SSD
    Cooling
    fanless
    Internet Speed
    150 Mbps
    Browser
    Brave
    Antivirus
    Webroot Secure Anywhere
    Other Info
    System 3

    ASUS T100TA Transformer
    Processor Intel Atom Z3740 @ 1.33GHz
    Installed RAM 2.00 GB (1.89 GB usable)
    System type 32-bit operating system, x64-based processor

    Edition Windows 10 Home
    Version 22H2 build 19045.3570
  • Operating System
    Windows 11 Pro 23H2 22631.2506
    Computer type
    Laptop
    Manufacturer/Model
    HP Mini 210-1090NR PC (bought in late 2009!)
    CPU
    Atom N450 1.66GHz
    Memory
    2GB
    Browser
    Brave
    Antivirus
    Webroot
Run it on garlin’s computer. It should work. 🤣
 

My Computers

System One System Two

  • OS
    Windows 11 Pro 24H2
    Computer type
    PC/Desktop
    Manufacturer/Model
    Intel NUC12WSHi7
    CPU
    12th Gen Intel Core i7-1260P, 2100 MHz
    Motherboard
    NUC12WSBi7
    Memory
    64 GB
    Graphics Card(s)
    Intel Iris Xe
    Sound Card
    built-in Realtek HD audio
    Monitor(s) Displays
    Dell U3219Q
    Screen Resolution
    3840x2160 @ 60Hz
    Hard Drives
    Samsung SSD 990 PRO 1TB
    Keyboard
    CODE 104-Key Mechanical with Cherry MX Clears
    Antivirus
    Microsoft Defender
  • Operating System
    Linux Mint 21.2 (Cinnamon)
    Computer type
    PC/Desktop
    Manufacturer/Model
    Intel NUC8i5BEH
    CPU
    Intel Core i5-8259U CPU @ 2.30GHz
    Memory
    32 GB
    Graphics card(s)
    Iris Plus 655
    Keyboard
    CODE 104-Key Mechanical with Cherry MX Clears

Here's my updated version.
Code:
while (1) {
    $Path = Read-Host "Please Enter a Directory Path to Scan"

    if (-not (Test-Path -Type Container $Path)) {
        Write-Host "Invalid directory path, please try again.`n"
    }
    else {
        break
    }
}

# Create array as type [System.Collections.ArrayList], so we can delete items from the list.

$FilesList = [System.Collections.ArrayList]@()
$Skipped = $false

Get-ChildItem $Path -Recurse -File | select FullName,LinkType,Target | foreach {
    $File = $_
    $FullName = $File.FullName

    switch($_.LinkType) {
        HardLink {
            if ($FilesList -notcontains $File.Target) {
                # .Add() keeps $FilesList an ArrayList; += would rebuild it as a plain array
                [void]$FilesList.Add($FullName)
            }
            else {
                Write-Host "Skipping Hard Link: `"$FullName`""
                $Skipped = $true
            }
        }
        SymbolicLink {
            Write-Host "Skipping Symbolic Link: `"$FullName`""
            $Skipped = $true
        }
        default {
            [void]$FilesList.Add($FullName)
        }
    }
}

if ($Skipped) {
    Write-Host ""
}

$HashList = [System.Collections.ArrayList]@(
    $FilesList | foreach {
        Get-FileHash -LiteralPath $_ -Algorithm MD5 | select Hash,Path
    }
)

if (($HashList | Group-Object -Property Hash | Where-Object { $_.Count -gt 1 }).Count -eq 0) {
    Write-Host "No duplicate files found."
    exit 0
}

while (1) {
    $FilenameList = @{}
    $Index = 1

    foreach ($Hash in ($HashList | Group-Object -Property Hash | Where-Object { $_.Count -gt 1 })) {
        Write-Host "MD5: $($Hash.Name)"
        foreach ($File in $Hash.Group.Path) {
            Write-Host "[$Index] `"$File`""

            #  Build list of duplicated files, in numbered order
            $FilenameList[$Index] = $File
            $Index++
        }
        Write-Host ""
    }

    $Selection = Read-Host "Pick one of the files to delete, 'q' to quit"

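    if ($Selection -match 'q') {
        break
    }

    # Accept a single whole number only; anything else (such as "1,2") is rejected
    # before the [int] cast, which would throw on non-numeric text
    $Selection = $Selection.Trim()
    if ($Selection -notmatch '^\d{1,9}$' -or [int]$Selection -lt 1 -or [int]$Selection -ge $Index) {
        Write-Host "$Selection is not valid, or out of range"
    }
    else {
        # Recast $Selection as integer so it matches the integer keys in $FilenameList
        $Selection = [int]$Selection
        $DeletedFile = $FilenameList[$Selection]
        Write-Host "Deleting `"$DeletedFile`"`n"
        Remove-Item -LiteralPath $DeletedFile -Force

        # Remove matching file from $HashList & $FilenameList arrays
        # (-ne compares the whole path; -notmatch would also drop paths that merely contain it)
        $HashList = ($HashList | where { $_.Path -ne $DeletedFile })
        $FilenameList.Remove($Selection)
    }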

    if ($FilenameList.Count -eq 1) {
        break
    }
}
Thanks. It works! Can I select multiple duplicates to delete if I separate them by commas? It did not seem to work.

But I found two more empty files which, obviously, generated the same hash, so I deleted them.
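On the comma question: the script as posted reads one number per prompt. A minimal sketch of how a comma-separated answer could be split into valid menu numbers is below. `ConvertTo-SelectionList` and its parameters are hypothetical names made up for this illustration; they are not part of the posted script.

```powershell
# Hypothetical helper: turn text such as "2, 5,7" into a list of valid menu numbers
function ConvertTo-SelectionList {
    param (
        [string]$Text,   # raw text typed at the prompt
        [int]$MaxIndex   # highest number shown in the menu
    )

    $Result = [System.Collections.Generic.List[int]]::new()
    foreach ($Part in ($Text -split ',')) {
        $Part = $Part.Trim()
        # Accept whole numbers only; skip repeats and anything outside the menu range
        if ($Part -match '^\d{1,9}$') {
            $Number = [int]$Part
            if ($Number -ge 1 -and $Number -le $MaxIndex -and -not $Result.Contains($Number)) {
                $Result.Add($Number)
            }
        }
    }
    # Emit the accepted numbers one by one; the caller wraps them in @()
    $Result
}

$Picked = @(ConvertTo-SelectionList -Text '1, 3,3, x, 99' -MaxIndex 5)
Write-Host "Selected: $($Picked -join ' ')"   # prints "Selected: 1 3"
```

In the script, each number in `$Picked` could then be looked up in `$FilenameList` and passed to `Remove-Item` in a loop, assuming the menu is not renumbered until all picks are processed.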
 

Do you write these scripts entirely manually or is there a tool to continually check for typos and syntax?
 

Visual Studio Code is quite nice for writing PowerShell. It color codes the brackets so you can see if you have more “lefts” than “rights.” It warns you if you have unused, unnecessary variables, and so on. Lots of helpful advice from a free product.
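If you want a quick syntax check without an editor, PowerShell's own parser can be called from a script. This is only a sketch: `Test-ScriptSyntax` is a hypothetical helper name, and it reports syntax errors only, not misspelled variable names or logic mistakes.

```powershell
# Hypothetical helper: report syntax errors in a piece of PowerShell text
function Test-ScriptSyntax {
    param ([string]$Code)

    $Tokens = $null
    $Errors = $null
    # ParseInput only parses the text; it does not run it
    [void][System.Management.Automation.Language.Parser]::ParseInput($Code, [ref]$Tokens, [ref]$Errors)

    foreach ($Err in $Errors) {
        Write-Host "Line $($Err.Extent.StartLineNumber): $($Err.Message)"
    }
    # True when the parser found nothing to complain about
    return ($Errors.Count -eq 0)
}

Test-ScriptSyntax 'if ($var -eq 123) { Write-Host "123" }'   # balanced braces
Test-ScriptSyntax 'if ($var -eq 123) { Write-Host "123"'     # missing closing brace
```

The first call should return True and the second False, along with the parser's message about the missing brace.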
 

As a manager, I had a young dev on my team, who was fairly intelligent, but was a terrible speller. So bad, the other devs and QA privately complained to me and thought he wasn't very smart. But I defended him.

One day I suggested he take more time reviewing his code, because it would make him look more professional. After a minute, he looked at me, smiled, and said, "But you know what? I'm making the same spelling errors all the time, otherwise my code wouldn't work!"
 

My Computer

System One

  • OS
    Windows 7
Visual Studio Code is quite nice for writing PowerShell. It color codes the brackets so you can see if you have more “lefts” than “rights.” It warns you if you have unused, unnecessary variables, and so on. Lots of helpful advice from a free product.
Do the indents and outdents have to agree, or is it just for elegance?
 

For almost every programming language, extra tabs and spaces are optional. They're encouraged to make it easier to review your code for obvious errors while you're writing it.

This is the same code.
Code:
if ($var -eq 123) {
    Write-Host "123"
}
elseif ($var2 -lt 3) {
    Write-Host "less than 3"
}
else {
    Write-Host "something else"
}
Code:
if ($var -eq 123) { Write-Host "123"
                }
                elseif ($var2 -lt 3) {
Write-Host "less than 3"
}
         else {
    Write-Host "something else"
}
 

Using the "hash everything" method, on an internal SSD, takes me about 19.8 seconds, because it's hashing that big WIM file. There is no reason to hash this file, because looking at its size, it cannot possibly be a duplicate of the other files. We only need to compare the two text files. Doing that takes 0.013 seconds.

Using a real-world example, I have a folder of 36,977 files, 80.9 GB in size, on an external spinny disk attached via USB 3. Using the "hash everything" method takes just over 15 minutes to run through this folder, and that's using MD5 as the hashing algorithm.

Using the method I outlined in #21 takes less than 2 seconds. Using @abactuon's method in #43, after I fixed it, reports similar times. Both of our methods default to SHA-2 algorithms, SHA-256 specifically. So, we are hashing more slowly, but it matters little because we're hashing only when needed.
I agree with your points here. I am scanning a network folder with 1,200 PDF files over Wi-Fi (much slower than USB 3), and it has already taken over 15 minutes. So you definitely made some good suggestions for improvement.
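To see why the size-first approach saves so much time, it can help to count how many files would need hashing at all. The sketch below is only an illustration: `Get-HashCandidate` is a hypothetical helper name, and `$Path` is an assumption set to the current folder.

```powershell
# Hypothetical helper: list only the files that share their size with at least one other file
function Get-HashCandidate {
    param ([string]$Path)

    Get-ChildItem -LiteralPath $Path -File -Recurse |
        Group-Object -Property Length |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group }
}

# $Path is an assumption; point it at the folder you want to check
$Path = '.'

$All        = @(Get-ChildItem -LiteralPath $Path -File -Recurse)
$Candidates = @(Get-HashCandidate -Path $Path)
Write-Host "$($Candidates.Count) of $($All.Count) files would need hashing"
```

A matching size only makes a file a candidate. The hash comparison afterwards still decides whether two files are really duplicates.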
 

My Computer

System One

  • OS
    Windows 11
    Computer type
    PC/Desktop
@FreeBooter and @pseymour, I merged your two scripts. I'm only wondering whether I should have kept the hardlink/symbolic link checks.
Anyway, the first scan took about 2 hours to complete with the code from FreeBooter.
With this code it takes about 1 minute on a NAS over wireless.

Powershell:
function Get-DuplicateFile {
    param (
        [Parameter (Mandatory = $true)]
        [string]$Path,
        [Parameter (Mandatory = $false, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
        [string]$Filter = '*.*'
    )
   
    begin { }
   
    process {
        if (Test-Path -Path $Path) {
            Get-ChildItem -Path $Path -Filter $Filter -Force -Recurse -File -ErrorAction SilentlyContinue |
            Group-Object -Property @('Length') | Where-Object { $_.Count -gt 1 } |
            ForEach-Object {
                $_.Group | Get-FileHash | Group-Object -Property @('Hash') | Where-Object { $_.Count -gt 1 }
            }
        }
    }
   
    end { }
}

# PowerShell Script to Find Duplicate Files and Delete Them in Folder
while (1) {
    $Path = Read-Host "Please Enter a Directory Path to Scan"

    if (-not (Test-Path -Type Container $Path)) {
        Write-Host "Invalid directory path, please try again.`n"
    } else {
        break
    }
}

$DuplicateFiles = Get-DuplicateFile -Path $Path

if ($DuplicateFiles.Count -eq 0) {
    Write-Host "No duplicate files found."
    exit 0
}

while (1) {
    $FilenameList = @{}
    $Index = 1

    foreach ($Hash in $DuplicateFiles) {
        # Get-FileHash was called without -Algorithm, so these are SHA256 hashes
        Write-Host "SHA256: $($Hash.Name)"
        foreach ($File in $Hash.Group.Path) {
            Write-Host "[$Index] `"$File`""

            # Build list of duplicated files, in numbered order
            $FilenameList[$Index] = $File
            $Index++
        }
        Write-Host ""
    }

    $Selection = Read-Host "Pick one of the files to delete, 'q' to quit"

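    if ($Selection -match 'q') {
        break
    }

    # Accept a single whole number only; anything else (such as "1,2") is rejected
    # before the [int] cast, which would throw on non-numeric text
    $Selection = $Selection.Trim()
    if ($Selection -notmatch '^\d{1,9}$' -or [int]$Selection -lt 1 -or [int]$Selection -ge $Index) {
        Write-Host "$Selection is not valid, or out of range"
    } else {
        $Selection = [int]$Selection
        $DeletedFile = $FilenameList[$Selection]
        Write-Host "Deleting `"$DeletedFile`"`n"
        Remove-Item -LiteralPath $DeletedFile -Force

        # Remove matching file from $DuplicateFiles and $FilenameList arrays
        # Note: this drops the whole hash group, so with three or more copies the remaining ones leave the list too
        $DuplicateFiles = $DuplicateFiles | Where-Object { $_.Group.Path -notcontains $DeletedFile }
        $FilenameList.Remove($Selection)
    }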

    if ($FilenameList.Count -eq 1) {
        break
    }
}
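A side note on labelling the hashes: when `Get-FileHash` is called without `-Algorithm`, it uses SHA256, and the object it returns carries the algorithm name. So the label can be read from the result instead of being typed by hand. A small sketch with a throwaway file (the file name is made up for the example):

```powershell
# Write a small throwaway file so the example is self-contained
$Sample = Join-Path ([System.IO.Path]::GetTempPath()) 'hash-label-sample.txt'
Set-Content -LiteralPath $Sample -Value 'sample text'

# No -Algorithm given, so Get-FileHash falls back to its default
$Result = Get-FileHash -LiteralPath $Sample

# The result object carries the algorithm name alongside the hash
Write-Host "$($Result.Algorithm): $($Result.Hash)"

Remove-Item -LiteralPath $Sample
```

Inside the display loop of a script like the one above, the same property should be reachable through the grouped results, for example `$Hash.Group[0].Algorithm`.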
 

The three 'duplicates' that FB's script found in my Desktop folder were unique text files with only a few words.
According to File Explorer and the DIR command, they were all zero bytes. But they are all different and should not have had identical hashes.

While scanning my e-books, and this time really checking each possible duplicate by hand, I also found duplicate 0-byte files: files that were downloaded incorrectly, copied as zero bytes, or whatever happened to them. I also found files that are 0 KB and carry their information only in the filename. Sometimes I put .txt files in a folder that describe the folder in their names, with nothing inside the files. That way I can read the information about the folder from the filenames without opening the files.
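For what it's worth, zero-byte files will always hash the same, because the hash is computed over the file contents only and the file name plays no part in it. A small sketch with two throwaway empty files (the folder and file names are made up for the example):

```powershell
# Create two empty files with different names in a throwaway folder
$Dir = Join-Path ([System.IO.Path]::GetTempPath()) 'empty-hash-demo'
New-Item -ItemType Directory -Path $Dir -Force | Out-Null
$First  = New-Item -ItemType File -Path (Join-Path $Dir 'notes about folder A.txt') -Force
$Second = New-Item -ItemType File -Path (Join-Path $Dir 'notes about folder B.txt') -Force

# Both files are 0 bytes, so both hashes are computed over the same (empty) input
$HashA = (Get-FileHash -LiteralPath $First.FullName -Algorithm MD5).Hash
$HashB = (Get-FileHash -LiteralPath $Second.FullName -Algorithm MD5).Hash
Write-Host "A: $HashA"
Write-Host "B: $HashB"
Write-Host "Same hash: $($HashA -eq $HashB)"

Remove-Item -LiteralPath $Dir -Recurse -Force
```

So an identical hash on 0-byte files says nothing about whether the files are meant to be duplicates, which is a reason to list them separately for manual inspection.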

To make that clearer to the user, I have adjusted the script so the user can check those files manually.

Powershell:
function Get-DuplicateFile {
    param (
        [Parameter (Mandatory = $true)]
        [string]$Path,
        [Parameter (Mandatory = $false, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
        [string]$Filter = '*.*'
    )

    begin { }

    process {
        if (Test-Path -Path $Path) {
            $allFiles = Get-ChildItem -Path $Path -Filter $Filter -Force -Recurse -File -ErrorAction SilentlyContinue
            $zeroSizeFiles = @()
            $fileGroupsBySize = @{}

            # Collect the zero-size files
            foreach ($file in $allFiles) {
                if ($file.Length -eq 0) {
                    $zeroSizeFiles += $file  # Collect zero-size files
                } else {
                    # Group files by size
                    if (-not $fileGroupsBySize.ContainsKey($file.Length)) {
                        $fileGroupsBySize[$file.Length] = @()
                    }
                    $fileGroupsBySize[$file.Length] += $file
                }
            }
      
            # Collect duplicate files based on size and hash
            $duplicateFiles = $fileGroupsBySize.Values | Where-Object { $_.Count -gt 1 } |
                ForEach-Object {
                    $_ | Get-FileHash | Group-Object -Property Hash | Where-Object { $_.Count -gt 1 }
                }

            # Return both duplicate files and zero size files
            [PSCustomObject]@{
                Duplicates = $duplicateFiles
                ZeroSize   = $zeroSizeFiles
            }
        }
    }

    end { }
}

# PowerShell Script to Find Duplicate Files and Delete Them in Folder
while ($true) {
    $Path = Read-Host "Please Enter a Directory Path to Scan"

    if (-not (Test-Path -Type Container $Path)) {
        Write-Host "Invalid directory path, please try again.`n"
    } else {
        break
    }
}

$DuplicateResult = Get-DuplicateFile -Path $Path

$DuplicateFiles = $DuplicateResult.Duplicates
$ZeroSizeFiles = $DuplicateResult.ZeroSize

if ($DuplicateFiles.Count -eq 0 -and $ZeroSizeFiles.Count -eq 0) {
    Write-Host "No duplicate or zero size files found."
    exit 0
}

if ($DuplicateFiles.Count -gt 0) {
    while ($true) {
        $FilenameList = @{}
        $Index = 1

        foreach ($Hash in $DuplicateFiles) {
            # Get-FileHash was called without -Algorithm, so these are SHA256 hashes
            Write-Host "SHA256: $($Hash.Name)"
            foreach ($File in $Hash.Group.Path) {
                Write-Host "[$Index] `"$File`""
                $FilenameList[$Index] = $File
                $Index++
            }
            Write-Host ""
        }

        if ($ZeroSizeFiles.Count -gt 0) {
            Write-Host "The following files are 0 bytes. They may be corrupt, may carry information only in the file name, or may be placeholders. Copy the full path to inspect these files manually."
            Write-Host ""
            foreach ($File in $ZeroSizeFiles) {
                Write-Host "`"$($File.FullName)`" - Size: $($File.Length) bytes"
            }
        }

        $Selection = Read-Host "Pick one of the files to delete, 'q' to quit"

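        if ($Selection -match 'q') {
            break
        }

        # Accept a single whole number only; anything else (such as "1,2") is rejected
        # before the [int] cast, which would throw on non-numeric text
        $Selection = $Selection.Trim()
        if ($Selection -notmatch '^\d{1,9}$' -or [int]$Selection -lt 1 -or [int]$Selection -ge $Index) {
            Write-Host "$Selection is not valid, or out of range"
        } else {
            $Selection = [int]$Selection
            $DeletedFile = $FilenameList[$Selection]
            Write-Host "Deleting `"$DeletedFile`"`n"
            Remove-Item -LiteralPath $DeletedFile -Force

            # Remove matching file from $DuplicateFiles and $FilenameList arrays
            # Note: this drops the whole hash group, so with three or more copies the remaining ones leave the list too
            $DuplicateFiles = $DuplicateFiles | Where-Object { $_.Group.Path -notcontains $DeletedFile }
            $FilenameList.Remove($Selection)
        }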

        if ($FilenameList.Count -eq 1) {
            break
        }
    }
}
 

Only wondering if I should have kept the hardlink/symbolic link checks.
It's a good idea. Not something I have a lot of on my systems, so I don't usually run into it. I'd just use the -Attributes parameter to filter them out.

Get-ChildItem -Attributes !ReparsePoint -File -Recurse ... etc etc
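Spelled out a little more, the filter could look like the sketch below. `$Path` is an assumption set to the current folder. One caveat: the ReparsePoint attribute is carried by symbolic links and junctions, while hard links are ordinary directory entries, so this filter alone should not be expected to skip hard links.

```powershell
# $Path is an assumption here; point it at the folder to scan
$Path = '.'

# Skip anything flagged as a reparse point (symbolic links and junctions carry this attribute)
$Files = Get-ChildItem -LiteralPath $Path -Attributes !ReparsePoint -File -Recurse

# LinkType shows what kind of link, if any, PowerShell reports for the remaining files
$Files | Select-Object FullName, LinkType
```

If hard links matter on your system, a LinkType check like the one in the earlier script would still be needed on top of this.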
 
