Basic Git Workflow

# Local work:
        git checkout -b branch_name # create and switch to branch_name
        git push --set-upstream origin branch_name # create the branch on the remote repo (GitHub, etc.)
        # do work on the new branch, test work on the new branch
        git add . # stage your new work for commit
        git commit -m 'commit message here'
        git push # now the remote branch_name has all of your changes

# Merge in work and push to the master branch on the remote repo (GitHub, etc.):
        git checkout master # switch to the master branch so you can merge in your new changes
        git pull # sync up any changes that have been pushed to master by others
        git merge branch_name # merge your changes
        git push # push to the remote repo

# All done, now clean up:
        git branch -d branch_name # delete local branch
        git push origin --delete branch_name # delete remote repo branch

Python + Virtualenv + Autoenv = A Dream Come True

Managing virtual environments for Python development isn't impossible, but it also isn't fun when you're constantly switching between projects.

Installing virtualenv (via pip) and autoenv (via Homebrew) makes this process much more streamlined and productive. Install both, hook autoenv into your shell, and then create the .env files below to activate and deactivate your virtual environments automatically.
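
The setup itself is just two installs and one line in your .bash_profile (the activate.sh path assumes a Homebrew install; yours may differ):

pip install virtualenv
brew install autoenv
echo "source /usr/local/opt/autoenv/activate.sh" >> ~/.bash_profile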

In your base directory:
touch .env && echo "deactivate &>/dev/null" > .env

In each project directory (repeat for every project):
virtualenv venv # creates a virtual environment in the current folder
touch .env && echo "source venv/bin/activate &>/dev/null" > .env

Running S3 Uploads in Parallel

I've been trying to shave off a few seconds when uploading large batches to AWS S3. GNU Parallel does a wonderful job in this situation!

The test: 164 PDF files of varying sizes on a dual-core i7.
25 seconds: s3cmd put *.pdf s3://bucket-name/goes/here/
25 seconds: parallel --jobs 4 s3cmd put {} s3://bucket-name/goes/here/ ::: *.pdf
18 seconds: parallel --jobs 8 s3cmd put {} s3://bucket-name/goes/here/ ::: *.pdf
17 seconds: parallel --jobs 16 s3cmd put {} s3://bucket-name/goes/here/ ::: *.pdf
15 seconds: parallel --jobs 20 s3cmd put {} s3://bucket-name/goes/here/ ::: *.pdf
16 seconds: parallel --jobs 24 s3cmd put {} s3://bucket-name/goes/here/ ::: *.pdf
16 seconds: parallel --jobs 28 s3cmd put {} s3://bucket-name/goes/here/ ::: *.pdf 
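
For reference, a sketch of how a sweep like this can be reproduced (the bucket path is a placeholder; assumes GNU Parallel and a configured s3cmd):

for j in 4 8 16 20 24 28; do
    echo "jobs: $j"
    time parallel --jobs $j s3cmd put {} s3://bucket-name/goes/here/ ::: *.pdf
done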

Remove embedded canvas from first page of PDF

Sometimes PDFs store too much information, and sometimes that information causes them not to play nicely with Ghostscript when bursting and transforming pages. I've been left with awkward white space on too many occasions and finally set out to remove that typically unseen space.

mkdir fixed; \
convert -trim +repage infile.pdf[0] cover.pdf; \
pdftk infile.pdf cat 2-end output outfile.pdf; \
pdftk "cover.pdf" "outfile.pdf" cat output "fixed/infile.pdf"; \
rm cover.pdf outfile.pdf
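
A sketch of the same pipeline wrapped in a loop for a whole directory (assumes every PDF has at least two pages):

mkdir -p fixed
for f in *.pdf; do
    convert -trim +repage "$f[0]" cover.pdf
    pdftk "$f" cat 2-end output rest.pdf
    pdftk cover.pdf rest.pdf cat output "fixed/$f"
    rm cover.pdf rest.pdf
done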

Launch Bash Commands From Text Files

My version of multithreading may not be the most effective or efficient, but it's easy to remember in the middle of a task, and the overhead is very light.

while read -r cmd; do $cmd & done < list.txt
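
For example, with a hypothetical list.txt of independent commands, a trailing wait blocks until every background job finishes:

printf '%s\n' 'sleep 2' 'sleep 3' 'sleep 1' > list.txt
while read -r cmd; do $cmd & done < list.txt
wait # returns once all three jobs complete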

Embedded Canvas Size

The more I work with PDFs, the more amazed I am that they ever display correctly. When you burst out pages with ImageMagick, you are often left with awkward amounts of white space.

The best way I've found to fix this issue is with Ghostscript, then running the output PDFs through ImageMagick for conversion to other formats.

gs -o output.pdf -sDEVICE=pdfwrite -dUseCropBox input.pdf
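
The follow-up conversion step then looks something like this (the density value is illustrative):

convert -density 150 output.pdf page_%03d.png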

Download and rename a file

I often concatenate these wget/curl commands in Excel or SQL for a quick download-and-rename pass over large sets of assets.

wget http://www.url.com/filename.ext -O new_filename.ext

-or- 

curl http://www.url.com/filename.ext -o new_filename.ext
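
The same concatenation idea works in plain bash when the list lives in a CSV (a hypothetical two-column file: url,new_name):

while IFS=, read -r url name; do
    curl "$url" -o "$name"
done < assets.csv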

Google Cloud Storage - Size Usage

For my personal photo archive, I use Google Cloud Nearline. My main editing rig at home is a PC, so I'm using PowerShell and gsutil. The du command stands for disk usage. Here's how I check how many gigabytes or terabytes of storage I'm currently using:

gsutil du -s gs://bucket-name-goes-here/ | %{ $_.Split(' ')[0]/1GB }
gsutil du -s gs://bucket-name-goes-here/ | %{ $_.Split(' ')[0]/1TB }
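
On Linux/macOS, a rough awk equivalent (1073741824 bytes per GB, matching PowerShell's 1GB constant):

gsutil du -s gs://bucket-name-goes-here/ | awk '{ printf "%.2f GB\n", $1 / 1073741824 }'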

Amazon S3 Transfers

Most of the time I use Python and boto to move files to and from S3, but sometimes I have smaller one-off jobs to transfer.

Here is the fastest and easiest way I've found to move an entire directory of assets.

s3cmd put *.pdf s3://s3-bucket-name/path/to/remote/dir/
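
If you need a whole directory tree rather than one glob, s3cmd's sync mode handles the recursion (the local path is a placeholder):

s3cmd sync local_dir/ s3://s3-bucket-name/path/to/remote/dir/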

Simple Way to Rename File Using Excel

I often need to modify a large number of filenames, and I've found it easiest to turn to Excel and a simple bash script.

Combine this formula (old filename in column A, new filename in column B) with a sequence number and you have a very easy way to rename a massive number of files while keeping the filenames unique. A terminal walkthrough follows the steps below.

 =concatenate("cp ",char(34),A1,char(34)," ",char(34),"renamed/", B1,char(34))

cp "oldfilename" "renamed/newfilename" 

  1. Drag that formula down the entire column.
  2. Copy the column to a text editor and save as rename.sh. 
  3. Open a terminal/console and change to the directory containing the files you wish to rename.
  4. If you haven't made a directory called renamed, type mkdir renamed. 
  5. Run bash rename.sh 
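
Steps 3 through 5 in terminal form (the path is illustrative):

cd /path/to/files
mkdir -p renamed
bash rename.sh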

Generate Blank Media

Sometimes I need to generate truly blank/empty media on demand. Here are some solutions I found using ffmpeg and ImageMagick:

-30 seconds of blank/empty audio:
    ffmpeg -ar 48000 -t 30 -f s16le -acodec pcm_s16le -ac 2 -i /dev/zero -acodec libmp3lame -aq 4 blank.mp3

-30 seconds of blank/empty video:
    touch 1.txt && convert 1.txt -page Letter 1.png && ffmpeg -loop 1 -i 1.png -c:v libx264 -t 30 -pix_fmt yuv420p -aspect 16:9 blank.mp4 && rm 1.*

-5 page blank/empty pdf:
    for f in {1..5}; do touch temp.txt && convert temp.txt -page Letter temp.$f.png; done && convert temp.*.png blank.pdf && rm temp.*

-Blank/empty png file:
    touch 1.txt && convert 1.txt -page Letter blank.png && rm 1.txt
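
A quick sanity check on the generated durations (ffprobe ships with ffmpeg):

    ffprobe -v error -show_entries format=duration -of default=nw=1 blank.mp3
    ffprobe -v error -show_entries format=duration -of default=nw=1 blank.mp4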

Remove Solid/Blank Images

In my work role, I need to programmatically clean up blank images mixed in with millions of usable images. For this, I turned to Python and PIL:

from PIL import Image
from sys import argv

script, filename = argv

img = Image.open(filename)
colors = img.getcolors() # returns None if the image has more than 256 distinct colors

if colors and len(colors) == 1:
    print(filename + ": solid color, remove")
else:
    print(filename + ": has multiple colors, keep.")
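
To sweep a whole directory tree, the script drops into a find pipeline (the script name solid_check.py is assumed):

find . -name '*.png' -print0 | xargs -0 -n 1 python solid_check.py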

Corrupt PDFs

As a digital/media asset manager, corrupt files are the bane of my existence.

To programmatically find corrupt PDF files, I landed on the following, using pdftotext from Xpdf.

for f in *.pdf; do
    pdftotext -q -f 1 -l 2 "$f" "$f.txt"
    err=$?
    if [ $err -eq 3 ]; then mv "$f" "_locked_$f" # exit code 3: PDF permissions error
    elif [ $err -ne 0 ]; then mv "$f" "_failed_$f" # any other nonzero code: corrupt/unreadable
    else rm "$f.txt"; fi
done