Bash Projects
Moving Machine Learning Model Results
As the data scientist in charge of analyzing machine learning model results, I work with a production environment that moves files into a folder called model_out/ and names them model_RXX.csv, where XX is a random number related to which experiment was run. Each file has the following structure (example):
Model Name, Accuracy, CV, Model Duration (s)
Logistic,42,4,48
Following recent work in the organization, the manager wants tree-based models kept in one folder and everything else deleted. This script moves the tree-based models (Random Forest, GBM, and XGBoost) to the tree_models/ folder and deletes all other models (KNN and Logistic).
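The sorting step could be sketched roughly as follows. This is a minimal illustration, not necessarily the project's actual script: the function name sort_models and its optional directory arguments are assumptions, and it decides each file's fate by looking for the model name anywhere in the file's contents.

```bash
#!/usr/bin/env bash

# sort_models [SRC_DIR] [DEST_DIR]
# Hypothetical helper: moves tree-based model results to DEST_DIR and
# deletes KNN/Logistic results. Defaults follow the folder names above.
sort_models() {
  local src="${1:-model_out}"
  local dest="${2:-tree_models}"
  local file

  mkdir -p "$dest"

  for file in "$src"/model_*.csv; do
    # Skip the literal pattern when no files match.
    [ -f "$file" ] || continue

    case "$(cat "$file")" in
      *"Random Forest"*|*GBM*|*XGBoost*)
        mv "$file" "$dest"/ ;;
      *KNN*|*Logistic*)
        rm "$file" ;;
      *)
        # Leave anything unrecognized in place and report it.
        echo "Unknown model in $file" >&2 ;;
    esac
  done
}
```

Calling sort_models with no arguments works on model_out/ and tree_models/ in the current directory. Files whose model name is not recognized are left untouched, so nothing is deleted by accident.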
Schedule Script to Run
As a data scientist managing an end-to-end machine learning environment in the cloud, I have created some useful Bash scripts, but running them by hand every morning and afternoon is becoming tedious. To automate this, I have created cron jobs for the following schedules:
- Creating a schedule for 30 minutes past 2am every day.
- Creating a schedule for every 15, 30 and 45 minutes past the hour.
- Creating a schedule for 11.30pm on Sunday evening, every week.
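The three schedules above map onto crontab entries along these lines. This is a sketch for illustration: /path/to/script.sh is a placeholder path and print_schedules is a hypothetical helper that only prints the entries for review, without installing them.

```bash
#!/usr/bin/env bash

# Placeholder path (an assumption); replace with the script to schedule.
SCRIPT_PATH="/path/to/script.sh"

# Print the crontab entries.
# Field order: minute hour day-of-month month day-of-week command
print_schedules() {
  cat <<EOF
# 30 minutes past 2am every day
30 2 * * * bash ${SCRIPT_PATH}
# 15, 30 and 45 minutes past every hour
15,30,45 * * * * bash ${SCRIPT_PATH}
# 11:30pm every Sunday (day-of-week 0 is Sunday)
30 23 * * 0 bash ${SCRIPT_PATH}
EOF
}

print_schedules
```

The printed lines can be pasted into the editor opened by crontab -e.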
Array Summation Script
A common programming task is obtaining the sum of an array of numbers. I've created a Bash function that takes in an array of numbers and returns its sum. A test array for the function is this week's daily sales in an organization (in thousands):
- 14 12 23.5 16 19.34 which should sum to 84.84
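Bash arithmetic only handles integers, so a sum that includes decimals such as 23.5 needs an external tool. The sketch below delegates the addition to awk; that choice and the function name sum_array are assumptions made for illustration, and bc would work as well.

```bash
#!/usr/bin/env bash

# sum_array NUM...
# Prints the sum of its arguments. awk does the arithmetic because
# Bash's built-in $(( )) cannot handle decimal values.
sum_array() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { print total }'
}

# Daily sales for the week, in thousands.
sales=(14 12 23.5 16 19.34)
sum_array "${sales[@]}"   # prints 84.84
```

The array is expanded with "${sales[@]}" so that each element reaches the function as a separate argument.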
Extracting Data From Files
As a data scientist for a climate research organization, I want to update some models, so I need to extract temperature data for 3 regions being monitored. Unfortunately, the temperature reading devices are quite old and can only be configured to dump data each day into a folder called /temps on the server. Each file contains the daily temperature for each region. My script stores these values in an array, calculates the average temperature of the regions, and appends it to the array.
- For example, for temperatures of 60 and 70, the array should have 60, 70, and 65 as its elements.
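A minimal sketch of the array-and-average step is shown below. It assumes one file per region, each holding a single numeric reading; the function name build_temps_array and the file names in the demo are hypothetical.

```bash
#!/usr/bin/env bash

# build_temps_array DIR
# Fills the global array region_temps with one reading per file in DIR
# (assumed: each file holds a single number), then appends their average.
build_temps_array() {
  local dir="$1"
  local file avg
  region_temps=()

  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    # Strip whitespace so only the number is stored.
    region_temps+=("$(tr -d '[:space:]' < "$file")")
  done

  # Nothing to average if the folder had no files.
  [ "${#region_temps[@]}" -gt 0 ] || return 1

  avg=$(printf '%s\n' "${region_temps[@]}" | awk '{ s += $1 } END { print s / NR }')
  region_temps+=("$avg")
}

# Demo with the example values from the text, in a throwaway folder.
demo_dir=$(mktemp -d)
echo 60 > "$demo_dir/region_A"
echo 70 > "$demo_dir/region_B"
build_temps_array "$demo_dir"
echo "${region_temps[@]}"   # prints 60 70 65
rm -rf "$demo_dir"
```

On the server the function would be pointed at the /temps folder instead of the demo directory.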
Extracting HR Data from Files
As a data scientist in the HR department of a large IT company, I need to extract salary figures for recent hires. However, the HR IT system simply spits out hundreds of files into the folder /hire_data. Each file is comma-delimited in the format COUNTRY,CITY,JOBTITLE,SALARY, such as Estonia,Tallinn,Javascript Developer,118286
- This script extracts the information needed. Depending on the task at hand, I may need to go back and extract data for a different city. The script therefore takes a city as an argument, filters all the files by this city, and writes the matching rows to a new CSV named after the city. This file can then form part of my analytics work.
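The filtering step might look something like the sketch below. It is an illustration under assumptions: the function name extract_city and the optional second argument for the data folder are hypothetical, and it matches the city exactly against the second comma-separated field rather than searching the whole line.

```bash
#!/usr/bin/env bash

# extract_city CITY [DATA_DIR]
# Writes every row whose CITY field (column 2) equals CITY to CITY.csv
# in the current directory. DATA_DIR defaults to /hire_data.
extract_city() {
  local city="$1"
  local data_dir="${2:-/hire_data}"

  awk -F',' -v city="$city" '$2 == city' "$data_dir"/* > "${city}.csv"
}

# Run only when the script is given a city on the command line.
if [ "$#" -gt 0 ]; then
  extract_city "$@"
fi
```

Matching on the city column avoids picking up rows where the city name merely appears in another field, which a plain grep over the whole line could do. It is best to run the script from a directory other than the data folder so the output file is not read back in on later runs.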
My Build Directory Automation
- One common use of Bash scripts is releasing a "build" of source code. Sometimes the private source code may contain developer resources or private information that the developers don't want to release in the published version. In this project, I've created a release script to automate copying certain files from a source directory into a build directory.
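As a rough sketch of the copy step, the function below copies every regular file from a source directory into a build directory except those named in an exclusion list. The function name release_build, its argument order, and the idea of passing excluded file names as arguments are all assumptions made for illustration; the actual release script may choose files differently.

```bash
#!/usr/bin/env bash

# release_build SRC_DIR BUILD_DIR [EXCLUDED_NAME...]
# Copies regular files from SRC_DIR to BUILD_DIR, skipping any file
# whose name matches one of the EXCLUDED_NAME arguments.
release_build() {
  local src="$1"
  local build="$2"
  shift 2
  local excluded=("$@")
  local path name skip ex

  mkdir -p "$build"

  for path in "$src"/*; do
    [ -f "$path" ] || continue
    name=$(basename "$path")

    skip=0
    for ex in "${excluded[@]}"; do
      if [ "$name" = "$ex" ]; then
        skip=1
      fi
    done

    if [ "$skip" -eq 0 ]; then
      cp "$path" "$build"/
    fi
  done
}
```

A call would list the two directories first, followed by the names of any private files to leave out of the build. The function expects both directory arguments to be present.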