scistag.gitstag.git_scanner.GitScanner

class GitScanner[source]

Bases: object

Scans a git repository and creates a list of non-ignored files and their size.

This is used in the unit test to verify no garbage is committed into the repo.

Note that this class is a deliberately minimal approach for a coarse validity check and does not implement all of git's ignore-mask rules. So feel free to use this class, but don’t complain ;).

Initializer

Methods

add_git_ignore

Creates an updated gitignore mask list for the current directory.

compute_total_size

Computes the total size of the non-ignored files.

find_valid_repo_files

Finds a list of valid files in the current directory.

get_is_ignored

Checks if a given name is ignored.

get_large_files

Returns all files larger than a given threshold which are not on the ignore list.

scan

Executes a scan on given base directory.

Attributes

__dict__

__doc__

__module__

__weakref__

list of weak references to the object (if defined)

total_size

The total size of all non-ignored files in bytes

file_count

The count of valid files

dir_count

The count of directories

dir_list

The list of all directories parsed.

file_list

The list of all non ignored files.

file_list_by_size

The list of all non ignored files by size.

classmethod add_git_ignore(base_path, org_list, filename)[source]

Creates an updated gitignore mask list for the current directory.

Parameters

base_path – The base path we are starting in

org_list (list[str]) – The previous ignore list

filename (str) – The path of the .gitignore file

Return type

list[str]
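To illustrate the idea of extending an ignore list from a `.gitignore` file, here is a minimal sketch. It is not the library's implementation: the helper name `extend_ignore_list` is hypothetical, the `base_path` argument is left out because its exact role is not documented here, and the handling of blank lines and `#` comments is an assumption.

```python
import os
import tempfile


def extend_ignore_list(org_list: list[str], filename: str) -> list[str]:
    """Return a copy of org_list extended by the masks found in filename."""
    new_list = list(org_list)  # leave the caller's list untouched
    with open(filename, encoding="utf-8") as f:
        for line in f:
            mask = line.strip()
            # Assumption: blank lines and comment lines carry no mask
            if mask and not mask.startswith("#"):
                new_list.append(mask)
    return new_list


# Small demonstration using a temporary .gitignore file
with tempfile.TemporaryDirectory() as root:
    gitignore = os.path.join(root, ".gitignore")
    with open(gitignore, "w", encoding="utf-8") as f:
        f.write("# build output\n*.pyc\n\ndist/\n")
    masks = extend_ignore_list([".git"], gitignore)

print(masks)
```

Returning a new list rather than modifying `org_list` keeps the parent directory's masks intact when the scan moves back up the tree.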

classmethod compute_total_size(filelist)[source]

Computes the total size of the non-ignored files.

Parameters

filelist (List) – The list of files

Return type

int

Returns

The size in bytes
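A rough sketch of such a size computation, assuming the file list uses the `{"filename": name, "size": size_in_bytes}` format documented for `file_list`. The function name `total_size_of` is hypothetical and the code is not taken from the library.

```python
def total_size_of(filelist: list[dict]) -> int:
    """Sum the "size" field (in bytes) over all file entries."""
    return sum(entry["size"] for entry in filelist)


files = [
    {"filename": "README.md", "size": 1200},
    {"filename": "setup.py", "size": 800},
]
print(total_size_of(files))  # prints 2000
```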

classmethod find_valid_repo_files(path, file_list, ignore_list, dir_list)[source]

Finds a list of valid files in the current directory and continues the search in subdirectories.

Parameters

path (str) – The base path

file_list (List) – The current list of files

ignore_list (List) – The ignore list

dir_list (List) – The list of parsed directories

Return type

None
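Since the method returns `None`, the results are presumably collected in the lists passed in. The following sketch shows one way such an in-place, recursive collection could work. The function name `collect_files`, the use of `fnmatch` for mask matching, and the matching against the bare entry name are all assumptions, not the library's actual behaviour.

```python
import os
import tempfile
from fnmatch import fnmatch


def collect_files(path, file_list, ignore_list, dir_list):
    """Recursively append non-ignored files and visited directories in place."""
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        ignored = any(fnmatch(entry.name, mask) for mask in ignore_list)
        if entry.is_dir(follow_symlinks=False):
            dir_list.append({"path": entry.path, "ignored": ignored})
            # Ignored directories are recorded but not descended into
            if not ignored:
                collect_files(entry.path, file_list, ignore_list, dir_list)
        elif entry.is_file(follow_symlinks=False) and not ignored:
            file_list.append(
                {"filename": entry.path, "size": entry.stat().st_size}
            )


# Small demonstration on a temporary directory tree
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "src"))
    os.mkdir(os.path.join(root, "__pycache__"))
    with open(os.path.join(root, "src", "main.py"), "wb") as f:
        f.write(b"print('hi')\n")
    with open(os.path.join(root, "__pycache__", "main.pyc"), "wb") as f:
        f.write(b"\0" * 64)
    files, dirs = [], []
    collect_files(root, files, ["__pycache__", "*.pyc"], dirs)
    names = [os.path.relpath(f["filename"], root) for f in files]

print(names)
```

Because ignored directories are not descended into, only the topmost ignored directory of a subtree ends up in `dirs`, which matches the note on `dir_list`.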

classmethod get_is_ignored(name, is_dir, ignore_list)[source]

Checks if a given name is ignored.

Parameters

name (str) – The file or directory name

is_dir (bool) – Is the element a directory?

ignore_list (list[str]) – The ignore list

Returns

True if the element shall be ignored
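A simplified check of this kind could be built on the standard library's `fnmatch`, as sketched below. The function name `is_ignored` and the convention that a mask ending in `/` applies to directories only are assumptions made for illustration. The actual class may match masks differently.

```python
from fnmatch import fnmatch


def is_ignored(name: str, is_dir: bool, ignore_list: list[str]) -> bool:
    """Return True if name matches any mask in ignore_list."""
    for mask in ignore_list:
        if mask.endswith("/"):
            # Assumed convention: trailing slash means "directories only"
            if is_dir and fnmatch(name, mask.rstrip("/")):
                return True
        elif fnmatch(name, mask):
            return True
    return False


print(is_ignored("module.pyc", False, ["*.pyc", "build/"]))
print(is_ignored("build", True, ["*.pyc", "build/"]))
print(is_ignored("build", False, ["*.pyc", "build/"]))
```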

get_large_files(min_size, ignore_list, hard_limit_size=-1)[source]

Returns all files larger than a given threshold which are not on the ignore list.

Parameters

min_size (int) – The minimum size in bytes

ignore_list (list[str]) – Masks of the files to ignore

hard_limit_size (int) – If this size is exceeded even files on the ignore list will not be ignored. -1 if there is no hard limit.

Return type

list[str]

Returns

The list of all remaining files
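The interplay of `min_size`, `ignore_list` and `hard_limit_size` can be sketched as follows. This is a standalone illustration, not the method itself: the function name `large_files` is hypothetical, it takes the file list as an explicit argument instead of reading it from the object, it assumes "larger" means strictly greater than the threshold, and it matches masks with `fnmatch`.

```python
from fnmatch import fnmatch


def large_files(file_list, min_size, ignore_list, hard_limit_size=-1):
    """Return names of files above min_size, honouring the ignore masks."""
    result = []
    for entry in file_list:
        name, size = entry["filename"], entry["size"]
        if size <= min_size:
            continue
        ignored = any(fnmatch(name, mask) for mask in ignore_list)
        # Above the hard limit, the ignore list no longer protects a file
        over_hard_limit = hard_limit_size >= 0 and size > hard_limit_size
        if ignored and not over_hard_limit:
            continue
        result.append(name)
    return result


files = [
    {"filename": "small.txt", "size": 10},
    {"filename": "big.bin", "size": 5000},
    {"filename": "huge.bin", "size": 50000},
    {"filename": "big.dat", "size": 6000},
]
print(large_files(files, 1000, ["*.bin"]))
print(large_files(files, 1000, ["*.bin"], hard_limit_size=10000))
```

With no hard limit, both `.bin` files are skipped. With a hard limit of 10000 bytes, `huge.bin` is reported even though it matches an ignore mask.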

scan(path)[source]

Executes a scan on the given base directory. The results are stored in the member variables of this object (.file_list, .total_size etc.).

Parameters

path (str) – The repository base path

dir_count

The count of directories

dir_list: list[dict]

The list of all directories parsed. Format: {“path”: path, “ignored”: false}. Note that only the highest-level ignored directories will be listed, not the nested ones.

file_count

The count of valid files

file_list: list[dict]

The list of all non ignored files. Format: {“filename”: name, “size”: size_in_bytes}

file_list_by_size: list[dict]

The list of all non ignored files by size. Format: {“filename”: name, “size”: size_in_bytes}

total_size

The total size of all non-ignored files in bytes