Organize, Search, and Back Up Files with Python’s Pathlib

Image by Author

 

Python’s built-in pathlib module makes working with filesystem paths super simple. In How To Navigate the Filesystem with Python’s Pathlib, we looked at the basics of working with path objects and navigating the filesystem. It’s time to go further.

In this tutorial, we’ll go over three specific file management tasks using the capabilities of the pathlib module:

  • Organizing files by extension
  • Searching for specific files
  • Backing up important files

By the end of this tutorial, you’ll have learned how to use pathlib for file management tasks. Let’s get started!

 

1. Organize Files by Extension

 

When you’re researching for and working on a project, you’ll often create ad hoc files and download related documents into your working directory until it’s cluttered and you need to organize it.

Let’s take a simple example where the project directory contains requirements.txt, config files, and Python scripts. We’d like to sort the files into subdirectories, one for each extension. For convenience, let’s use the extensions as the names of the subdirectories.

 

Organize Files by Extension | Image by Author

 

Here’s a Python script that scans a directory, identifies files by their extensions, and moves them into respective subdirectories:

# organize.py

from pathlib import Path

def organize_files_by_extension(path_to_dir):
    path = Path(path_to_dir).expanduser().resolve()
    print(f"Resolved path: {path}")

    if path.exists() and path.is_dir():
        print(f"The directory {path} exists. Proceeding with file organization...")

        for item in path.iterdir():
            print(f"Found item: {item}")
            if item.is_file():
                extension = item.suffix.lower()
                target_dir = path / extension[1:]  # Remove the leading dot

                # Ensure the target directory exists
                target_dir.mkdir(exist_ok=True)
                new_path = target_dir / item.name

                # Move the file
                item.rename(new_path)

                # Check if the file has been moved
                if new_path.exists():
                    print(f"Successfully moved {item} to {new_path}")
                else:
                    print(f"Failed to move {item} to {new_path}")

    else:
        print(f"Error: {path} does not exist or is not a directory.")

organize_files_by_extension('new_project')

 

The organize_files_by_extension() function takes a directory path as input, resolves it to an absolute path, and organizes the files within that directory by their file extensions. It first ensures that the specified path exists and is a directory.

Then, it iterates over all items in the directory. For each file, it retrieves the file extension, creates a new directory named after the extension (if it doesn’t already exist), and moves the file into this new directory.

After moving each file, it confirms the success of the operation by checking the existence of the file in the new location. If the specified path does not exist or is not a directory, it prints an error message.
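
To make the extension-to-directory step concrete, here is a minimal sketch of how suffix, name, and the / operator combine. It uses PurePosixPath so nothing on disk is touched and the results are the same on every platform. The helper target_for() and the file names are hypothetical, chosen only for illustration.

```python
from pathlib import PurePosixPath

def target_for(project_dir, filename):
    """Return the path a file would be moved to (hypothetical helper)."""
    item = PurePosixPath(project_dir) / filename
    extension = item.suffix.lower()  # e.g. '.TXT' becomes '.txt'
    # Drop the leading dot and use the rest as the subdirectory name
    return item.parent / extension[1:] / item.name

print(target_for("new_project", "requirements.txt"))  # new_project/txt/requirements.txt
print(target_for("new_project", "Notes.TXT"))         # new_project/txt/Notes.TXT
print(target_for("new_project", "data.tar.gz"))       # new_project/gz/data.tar.gz
print(target_for("new_project", "Makefile"))          # new_project/Makefile
```

Two details are worth noticing: suffix returns only the last extension, so data.tar.gz lands in gz, and a file with no extension maps back to its current location, so it may be worth skipping such files explicitly.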

Here’s the output for the example function call (organizing files in the new_project directory):

 
[Image: output of organize.py]
 

Now try this on a project directory in your working environment. I’ve used if-else to account for errors, but you can also use try-except blocks to make this version more robust.
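
As one possible take on the try-except suggestion, here is a minimal sketch. The names move_file_safely() and organize_files_safely() are hypothetical, and two choices are assumptions on my part rather than part of the original script: existing files are never overwritten, and files without an extension are skipped.

```python
from pathlib import Path

def move_file_safely(item, target_dir):
    """Move item into target_dir; return the new path, or None on failure."""
    try:
        target_dir.mkdir(parents=True, exist_ok=True)
        new_path = target_dir / item.name
        if new_path.exists():
            # Assumption: never overwrite a file that is already there
            raise FileExistsError(f"{new_path} already exists")
        item.rename(new_path)
    except OSError as err:
        # OSError covers FileNotFoundError, PermissionError, FileExistsError
        print(f"Could not move {item}: {err}")
        return None
    return new_path

def organize_files_safely(path_to_dir):
    path = Path(path_to_dir).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"{path} does not exist or is not a directory")
    moved = []
    # Snapshot the listing first, since the loop creates new subdirectories
    for item in list(path.iterdir()):
        if item.is_file() and item.suffix:
            new_path = move_file_safely(item, path / item.suffix.lower()[1:])
            if new_path is not None:
                moved.append(new_path)
    return moved
```

Calling organize_files_safely('new_project') returns the list of new paths, so the caller can decide how to report the moves instead of relying on print statements.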

 

2. Search for Specific Files

 

Sometimes you may not want to organize the files by extension into different subdirectories, as in the previous example. Instead, you may only want to find all files with a specific extension (like all image files), and for this you can use globbing.

Say we want to find the requirements.txt file to look at the project’s dependencies. Let’s use the same example, but after grouping the files into subdirectories by extension.

If you use the glob() method on the path object as shown to find all text files (defined by the pattern ‘*.txt’), you’ll see that it doesn’t find the text file:

# search.py
from pathlib import Path

def search_and_process_text_files(directory):
    path = Path(directory)
    path = path.resolve()
    for text_file in path.glob('*.txt'):
        # process text files as needed
        print(f'Processing {text_file}...')
        print(text_file.read_text())

search_and_process_text_files('new_project')

 

This is because glob() with this pattern only searches the directory itself, which does not contain the requirements.txt file. The requirements.txt file is in the txt subdirectory. So you have to use recursive globbing with the rglob() method instead.
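
The difference is easy to check in isolation. The following minimal sketch builds a throwaway copy of the layout in a temporary directory (the file contents are placeholders) and compares glob(), rglob(), and glob() with an explicit '**/' prefix, which behaves like rglob().

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout: new_project/txt/requirements.txt
    project = Path(tmp) / "new_project"
    (project / "txt").mkdir(parents=True)
    (project / "txt" / "requirements.txt").write_text("psycopg2==2.9.0\n")

    top_level = sorted(p.name for p in project.glob("*.txt"))     # this directory only
    recursive = sorted(p.name for p in project.rglob("*.txt"))    # all subdirectories too
    explicit = sorted(p.name for p in project.glob("**/*.txt"))   # same idea, spelled out

print(top_level)  # []
print(recursive)  # ['requirements.txt']
print(explicit)   # ['requirements.txt']
```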

So here’s the code to find the text files and print out their contents:

from pathlib import Path

def search_and_process_text_files(directory):
    path = Path(directory)
    path = path.resolve()
    for text_file in path.rglob('*.txt'):
        # process text files as needed
        print(f'Processing {text_file}...')
        print(text_file.read_text())

search_and_process_text_files('new_project')

 

The search_and_process_text_files() function takes a directory path as input, resolves it to an absolute path, and searches for all .txt files within that directory and its subdirectories using the rglob() method.

For each text file found, it prints the file’s path and then reads and prints out the file’s contents. This function is useful for recursively locating and processing all text files within a specified directory.

Because requirements.txt is the only text file in our example, we get the following output:

Output >>>
Processing /home/balapriya/new_project/txt/requirements.txt...
psycopg2==2.9.0
scikit-learn==1.5.0

 

Now that you know how to use globbing and recursive globbing, try to redo the first task (organizing files by extension) using globbing to find and group the files and then move them to the target subdirectories.
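
If you’d like something to compare your attempt against, here is one possible approach, offered as a sketch rather than the definitive solution. The function names are hypothetical, and it assumes that files without an extension should stay where they are.

```python
from collections import defaultdict
from pathlib import Path

def group_files_by_extension(directory):
    """Group top-level files by lowercase extension (without the dot)."""
    groups = defaultdict(list)
    for item in Path(directory).glob("*.*"):
        extension = item.suffix.lower().lstrip(".")
        if item.is_file() and extension:
            groups[extension].append(item)
    return groups

def organize_with_glob(directory):
    path = Path(directory).expanduser().resolve()
    # Grouping finishes before any file is moved, so the glob never
    # sees the subdirectories created below
    for extension, files in group_files_by_extension(path).items():
        target_dir = path / extension
        target_dir.mkdir(exist_ok=True)
        for item in files:
            item.rename(target_dir / item.name)
```

Separating the grouping step from the moving step also makes it easy to inspect what would be moved before touching any files.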

 

3. Back Up Important Files

 

Organizing files by extension and searching for specific files are the examples we’ve seen so far. But how about backing up certain important files?

Here we’d like to copy files from the project directory into a backup directory rather than move them to another location. In addition to pathlib, we’ll also use the shutil module’s copy function.

Let’s create a function that copies all files with a specific extension (all .py files) to a backup directory:

# back_up.py
import shutil
from pathlib import Path

def back_up_files(directory, backup_directory):
    path = Path(directory)
    backup_path = Path(backup_directory)
    backup_path.mkdir(parents=True, exist_ok=True)

    for important_file in path.rglob('*.py'):
        shutil.copy(important_file, backup_path / important_file.name)
        print(f'Backed up {important_file} to {backup_path}')


back_up_files('new_project', 'backup')

 

The back_up_files() function takes in an existing directory path and a backup directory path, and backs up all Python files from the specified directory and its subdirectories into the designated backup directory.

It creates path objects for both the source directory and the backup directory, and ensures that the backup directory exists by creating it and any necessary parent directories if they do not already exist.

The function then iterates through all .py files in the source directory using the rglob() method. For each Python file found, it copies the file to the backup directory while retaining the original filename. Essentially, this function helps in creating a backup of all Python files within a project directory.
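
One thing to keep in mind: because every file is copied into a single flat directory, two files with the same name in different subdirectories would overwrite each other in the backup. The sketch below is a hypothetical variant (the function name and the pattern parameter are my own additions) that recreates each file’s relative path inside the backup directory, and uses shutil.copy2() so that file metadata is preserved where possible.

```python
import shutil
from pathlib import Path

def back_up_files_keep_structure(directory, backup_directory, pattern="*.py"):
    """Copy matching files, recreating their relative paths in the backup."""
    source = Path(directory).resolve()
    backup = Path(backup_directory).resolve()
    copied = []
    # Snapshot the matches first so copies made during the run are not revisited
    for src in list(source.rglob(pattern)):
        if not src.is_file() or backup in src.parents:
            continue  # skip directories and anything already inside the backup
        dest = backup / src.relative_to(source)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # like copy(), but also tries to keep metadata
        copied.append(dest)
    return copied
```

With this variant, new_project/py/utils.py would end up at backup/py/utils.py rather than backup/utils.py.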

After running the script and verifying the output, you can always check the contents of the backup directory:

 
[Image: contents of the backup directory]
 

For your own directory, you can use back_up_files('/path/to/directory', '/path/to/backup/directory') to back up files of interest.

 

Wrapping Up

 

In this tutorial, we’ve explored practical examples of using Python’s pathlib module to organize files by extension, search for specific files, and back up important files. You can find all the code used in this tutorial on GitHub.

As you can see, the pathlib module makes working with file paths and file management tasks easier. Now, go ahead and apply these concepts in your own projects to handle your file management tasks better. Happy coding!

 

 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.