Reliability of hashes in file comparison


Sigurd

I am aware that, generally, if two files have the same hash they are identical, but collisions do occur, hence the question.

I have been monitoring village weather for the parish council with a Davis weather station for many years. It stores the data in .wlk files. Needless to say, I have backups - too many! When I run a duplicate-checking app (Duplicate Cleaner Pro), it finds a variable number of duplicates for each file. The hashes are the same (about 48 characters in each case).

I'd like to simplify my backups to just a couple and delete the rest.

There are several other reasons why I am seeking reassurance, but they are irrelevant to the query as they relate to some issues with the Davis weather app.
 

My Computer

System One

  • OS
    Windows 11 - Updated automatically
    Computer type
    PC/Desktop
    Manufacturer/Model
    Updated Chillblast
    CPU
    Intel i7 12700K Twelve Core 3.6Ghz
    Motherboard
    MSI PRO Z690-A DDR4 Motherboard
    Memory
    Corsair 32Gb Vengeance RAM
    Cooling
    Air cooled
    Internet Speed
    72Mb down, 18Mb up
    Browser
    Chrome
    Antivirus
    Avast
Hi, Ian.

Are the backup extensions the same? Some programs create two files (don’t ask me why, I am not that clever).
I assume you would have to know where they are stored to find out.
 

My Computers

System One System Two

  • OS
    Windows 11 Pro 23H2 Build 22631.5039
    Computer type
    PC/Desktop
    Manufacturer/Model
    Sin-built
    CPU
    Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (4th Gen?)
    Motherboard
    ASUS ROG Maximus VI Formula
    Memory
    32.0 GB of I forget and the box is in storage.
    Graphics Card(s)
    Gigabyte nVidia GeForce GTX 1660 Super OC 6GB
    Sound Card
    Onboard
    Monitor(s) Displays
    4 x LG 23MP75 - 2 x 24MK430H-B - 1 x Wacom Pro 22" Tablet
    Screen Resolution
    All over the place
    Hard Drives
    Too many to list.
    OS on Samsung 1TB 870 QVO SATA
    PSU
    Silverstone 1500
    Case
    NZXT Phantom 820 Full-Tower Case
    Cooling
    Noctua NH-D15 Elite Class Dual Tower CPU Cooler / 6 x EziDIY 120mm / 2 x Corsair 140mm somethings / 1 x 140mm Thermaltake something / 2 x 200mm Corsair.
    Keyboard
    Corsair K95 / Logitech diNovo Edge Wireless
    Mouse
    Logitech G402 / G502 / Mx Masters / MX Air Cordless
    Internet Speed
    100/40Mbps
    Browser
    All sorts
    Antivirus
    Kaspersky Premium
    Other Info
    I’m on a horse.
  • Operating System
    Windows 11 Pro 23H2 Build: 22631.4249
    Computer type
    Laptop
    Manufacturer/Model
    LENOVO Yoga 7i EVO OLED 14" Touchscreen i5 12 Core 16GB/512GB
    CPU
    Intel Core 12th Gen i5-1240P Processor (1.7 - 4.4GHz)
    Memory
    16GB LPDDR5 RAM
    Graphics card(s)
    Intel Iris Xe Graphics Processor
    Sound Card
    Optimized with Dolby Atmos®
    Screen Resolution
    QHD 2880 x 1800 OLED
    Hard Drives
    M.2 512GB
    Antivirus
    Defender / Malwarebytes
    Other Info
    …still on a horse.
Names can vary slightly (e.g. 2008-01.wlk and 2008-01 (2).wlk).
The file created and file modified dates differ. The extension is always .wlk.
The hash is always the same within each group of duplicates.
Folders can vary: e.g. 2008-01.wlk and 2008-01 (2).wlk can be in the same folder or in different folders.
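For anyone wondering why files with different names, folders and dates still match: a content hash is computed from the file's bytes only, so names, locations and created/modified timestamps don't affect it. Below is a minimal Python sketch of the general idea behind a duplicate finder, assuming SHA-256 and a single backup root folder. It is not how Duplicate Cleaner Pro is actually implemented, and the function names are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(root: Path, pattern: str = "*.wlk") -> dict[str, list[Path]]:
    """Group matching files under `root` (recursively) by content hash.

    Only groups with two or more members are returned. File names, folders
    and timestamps play no part in the grouping; only the bytes do.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob(pattern):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

For example, `find_duplicate_groups(Path(r"D:\WeatherBackups"))` (a hypothetical folder) would return a dict mapping each shared hash to the list of matching .wlk paths.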

Frankly I think I am good to go - keep two and zap the rest. I need to hand the data to someone else and it's a bit of a mess at present. I understand my mess; the person taking oven over might not! An oven wouldn't help!
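If you do go with "keep two and zap the rest", a dry run first is safer than deleting straight away. This hypothetical sketch takes duplicate groups (each hash mapped to a list of paths) and splits them into files to keep and files that could be deleted. The rule of preferring the shortest name, so that 2008-01.wlk beats 2008-01 (2).wlk, is an arbitrary assumption, and nothing is actually deleted.

```python
from pathlib import Path


def plan_cleanup(groups: dict[str, list[Path]], keep: int = 2) -> tuple[list[Path], list[Path]]:
    """Split each duplicate group into files to keep and files that could be deleted.

    Dry run only: nothing is touched on disk. Within a group, shorter file names
    are kept first (so '2008-01.wlk' wins over '2008-01 (2).wlk'); ties are
    broken by name and then by full path.
    """
    keepers: list[Path] = []
    extras: list[Path] = []
    for paths in groups.values():
        ordered = sorted(paths, key=lambda p: (len(p.name), p.name, str(p)))
        keepers.extend(ordered[:keep])
        extras.extend(ordered[keep:])
    return keepers, extras
```

You could print `extras`, check the list by eye, and only then delete those files (or move them to a holding folder first).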
 

TLDR: I think it's perfectly fine to use the hashes in question to remove duplicates.

Since you said "about 48" characters, I'm going to assume you were using SHA-1, which produces hashes of 40 hexadecimal characters. Duplicate Cleaner Pro also supports SHA-256, but that produces 64-character strings.
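On the lengths: hash tools usually display digests in hexadecimal, where each character encodes 4 bits, so a 160-bit SHA-1 digest is always 40 characters and a 256-bit SHA-256 digest is always 64, whatever the file size. A quick check with Python's standard `hashlib` module (the helper name is made up):

```python
import hashlib


def hex_digest_lengths(data: bytes, algorithms=("sha1", "sha256")) -> dict[str, int]:
    """Return how many hex characters each algorithm produces for `data`."""
    return {name: len(hashlib.new(name, data).hexdigest()) for name in algorithms}


print(hex_digest_lengths(b"any file contents"))  # {'sha1': 40, 'sha256': 64}
```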

There are certainly collisions with SHA-1. It has been theoretically broken for about 20 years now (since 2005), and the first practical collision was publicly demonstrated in 2017. That being said, the real danger from hash collisions lies with encrypted communications; think certificates, HTTPS and what-not.

I think using SHA-1 today to verify the contents of private, local backups is perfectly fine. It's not as collision-resistant as SHA-256, but it is generally faster.
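If you want to rule out even the theoretical collision risk before deleting anything, you can follow the hash match with a byte-by-byte comparison. A minimal sketch using Python's standard `filecmp` module; `group_is_identical` is a hypothetical helper, and `shallow=False` makes it compare contents rather than trusting matching size and modification time.

```python
import filecmp
from pathlib import Path


def group_is_identical(paths: list[Path]) -> bool:
    """Return True if every file in `paths` (non-empty) has the same bytes as the first."""
    first, *rest = paths
    # shallow=False: compare actual contents instead of trusting matching
    # size and modification time from os.stat().
    return all(filecmp.cmp(first, other, shallow=False) for other in rest)
```

A hash match followed by this check leaves no room for a collision, at the cost of reading each file once more.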

A quick Google search finds a chosen-prefix collision attack from 2020, which took about 107 GPU years and cost about $45,000 (USD). Attacks would certainly be faster and cheaper now, but is someone going to spend the money to alter your files in such a way as to trick you into thinking you have a certain data backup when you really don't? Probably not.
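The attacks above all involve deliberately crafted files. For accidental collisions between genuinely different files, the usual birthday-bound estimate applies: if the hash behaves like a random function, the chance that any two of n files share a b-bit hash is at most n(n-1)/2 divided by 2^b. A rough sketch; the million-file count is a hypothetical, far larger than any weather archive.

```python
def accidental_collision_probability(num_files: int, hash_bits: int) -> float:
    """Union-bound estimate that ANY two of `num_files` different files share a hash.

    Assumes the hash behaves like a random function (the accidental case, not
    an attacker crafting inputs): P(any collision) <= C(n, 2) / 2**b.
    """
    pairs = num_files * (num_files - 1) // 2  # number of distinct file pairs
    return pairs / 2**hash_bits


print(f"{accidental_collision_probability(1_000_000, 160):.1e}")  # 3.4e-37 (SHA-1)
print(f"{accidental_collision_probability(1_000_000, 256):.1e}")  # 4.3e-66 (SHA-256)
```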
 

My Computers

System One System Two

  • OS
    Windows 11 Pro 24H2
    Computer type
    PC/Desktop
    Manufacturer/Model
    Intel NUC12WSHi7
    CPU
    12th Gen Intel Core i7-1260P, 2100 MHz
    Motherboard
    NUC12WSBi7
    Memory
    64 GB
    Graphics Card(s)
    Intel Iris Xe
    Sound Card
    built-in Realtek HD audio
    Monitor(s) Displays
    Dell U3219Q
    Screen Resolution
    3840x2160 @ 60Hz
    Hard Drives
    Samsung SSD 990 PRO 1TB
    Keyboard
    CODE 104-Key Mechanical with Cherry MX Clears
    Antivirus
    Microsoft Defender
  • Operating System
    Linux Mint 21.2 (Cinnamon)
    Computer type
    PC/Desktop
    Manufacturer/Model
    Intel NUC8i5BEH
    CPU
    Intel Core i5-8259U CPU @ 2.30GHz
    Memory
    32 GB
    Graphics card(s)
    Iris Plus 655
    Keyboard
    CODE 104-Key Mechanical with Cherry MX Clears
Thanks. A knowledgeable confirmation of a gut feeling. :D
 

Yeah, anything with the same extension and hash, and the same file name apart from a (number) suffix, would be a duplicate.
I wonder why it’s making duplicate backups?
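That rule of thumb can be written down as a small check. This Python sketch assumes the only renaming is a Windows-style " (n)" suffix before the extension, and both function names are hypothetical. The hash comparison is what actually proves the contents match; the name check only confirms the files are copies of the same original.

```python
import re
from pathlib import PurePath

# Matches a trailing " (2)", " (13)" etc. at the end of the file stem.
_COPY_SUFFIX = re.compile(r" \(\d+\)$")


def base_name(filename: str) -> str:
    """Strip a ' (n)' copy suffix: '2008-01 (2).wlk' -> '2008-01.wlk'."""
    p = PurePath(filename)
    return _COPY_SUFFIX.sub("", p.stem) + p.suffix


def is_numbered_duplicate(name_a: str, hash_a: str, name_b: str, hash_b: str) -> bool:
    """Same hash AND same name once any ' (n)' suffix is removed (case-insensitive, as on Windows)."""
    return hash_a == hash_b and base_name(name_a).lower() == base_name(name_b).lower()
```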
 

I have too much respect and affection for my computer to falsely accuse it of such a crime.
It's not; I am the guilty party!
 

Ha! If it were you accidentally copying and pasting into the same folder, they would be renamed “filename - Copy.extension”.
Unless you’ve been running two consecutive backups, I don’t know, Ian.
 
