HackersAreUs

It's legit

Thursday, September 27, 2012

A Walk Down Memory Lane... Through Commercials


So back in March I was asked if I would donate my childhood VHS tapes to a kid who had absolutely NO DISNEY MOVIES. My parents were well-off back then, and I had every Disney video imaginable. I couldn't fathom being a kid with no Disney. So before I gave away my childhood, I wanted to preserve the content of the VHS tapes in digital form using my ADS Tech DVD Xpress DX2 box.

Most of the tapes were in good condition, but some had severely degraded magnetic media. Bummer. But one thing I noticed while ripping these VHS tapes was that some of them had been recorded over with the nightly news or some TV show...

I was pissed at first that my timeless childhood memories had been replaced by the time-sensitive events of the day. But then I noticed a more interesting aspect to this: the commercial breaks.

It was as if I went back in time and sampled the pop culture. Seeing things like: Beanie Babies, Builder's Square, 84 Lumber, Y2K, Oldsmobile, Plymouth, Datsun, and other things in the commercials really brought back memories of days gone by. The haircuts and clothes were interesting too.

When I wax nostalgic like that, there can only be one solution: preserve these ads. So how to go about that...

Step one was to get the analog information from the VHS tapes to a digital format I could work with. Simple enough. I used the DVD Xpress box and the software that came with it to rip the videos in real time to MPEG files.

Step two was a bit more complicated. How would I segment these MPEG files, separating the commercials from the show segments? There is plenty of software out there that will edit the commercials out of recorded video, but it mostly just discards them. If you know anything about this blog, you know there is a BASH script involved and there are questions to be asked.

Question: How can you characterize a video segment or commercial?
Answer: Each video segment is typically delimited by a few solid black frames.

Question: How can I tell which frames are solid black?
Answer: I tried several methods, including analyzing every single frame of a ripped VHS tape. The solid-colored frames tended to have a lower bitrate and a smaller frame size. This method took FOREVER! A 6-hour scratch tape took just about 8 hours to analyze frame by frame. Then I discovered that ffmpeg has a video filter for exactly that (blackframe), which reports the frames that are black, given a threshold for how many of their pixels must be dark.

Question: OK, you said there was a script involved. Where is it?
Answer: Here, oh impatient one:

The Script


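#!/bin/bash
# get_segments.sh - cut a recording into segments at its black frames
# Usage: ./get_segments.sh My_Little_Pony.mpg

input=$1
[ -f "$input" ] || { echo "usage: $0 FILE" >&2; exit 1; }

# Some housekeeping
filename=${input%.*}    # name without the extension
ext=${input##*.}        # extension only
case $filename in
    ""|"$input") echo "$0: $input has no extension" >&2; exit 1 ;;
esac
rm -rf -- "$filename"
mkdir -p -- "$filename/segments"

# Log ALL THE THINGS! blackframe logs each black frame to stderr; the
# decoded video itself is thrown away (-f null -) instead of re-encoded
ffmpeg -threads 8 -i "$input" -vf blackframe -f null - 2> "$filename/$filename.log"

# Get just the black frames: keep the "t:" timestamp (seconds) of each one
sed -n 's/.*blackframe.* t:\([0-9.]*\).*/\1/p' "$filename/$filename.log" > "$filename/black_frames.log"

start_pos=""
while read -r timestamp
do
    if [ -z "$start_pos" ]
    then
        # The first black frame opens the first segment
        start_pos=$timestamp
        continue
    fi
    # Seconds between the segment start and this black frame
    t=$(awk -v a="$timestamp" -v b="$start_pos" 'BEGIN { print a - b }')
    # Only cut segments longer than 10 seconds; closer black frames are
    # treated as part of the same break
    if awk -v t="$t" 'BEGIN { exit !(t > 10) }'
    then
        echo "$start_pos $timestamp $t"
        # -nostdin keeps ffmpeg from eating the rest of the timestamp list
        ffmpeg -nostdin -ss "$start_pos" -t "$t" -i "$input" -c copy "$filename/segments/$start_pos.$ext"
        start_pos=$timestamp
    fi
done < "$filename/black_frames.log"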


How It Works


First, the script takes a file name as its argument. So if I had a file named “My_Little_Pony.mpg”, I would execute the script with “./get_segments.sh My_Little_Pony.mpg”.

Then the script splits off the file name and extension of the input file and creates a directory structure under the current directory. This is just to keep things nice and tidy. I've seen a 6-hour scratch tape produce hundreds of segments, so it's best to keep them off in their own folder along with the other files explained below.
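Splitting the name with `cut -d.` cuts a file like "Episode.1.mpg" down to "Episode". A minimal sketch of the same split using POSIX parameter expansion, which only strips the last extension (the function names `stem_of` and `ext_of` are hypothetical):

```shell
#!/bin/bash
# Name without its last extension: "Episode.1.mpg" -> "Episode.1"
stem_of() {
    printf '%s\n' "${1%.*}"
}

# Last extension only: "Episode.1.mpg" -> "mpg"
ext_of() {
    printf '%s\n' "${1##*.}"
}
```

For example, `mkdir "$(stem_of My_Little_Pony.mpg)"` creates a folder named My_Little_Pony.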

The script will then go through the file frame by frame and determine whether or not each frame is solid black. It logs all output in the directory created in the previous step.

When that's done and over with, after what seems like forever (really about half an hour for two hours of video), the detected black frames are parsed out of the log file into another log file called black_frames.log.
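The exact fields in blackframe's log lines may differ between ffmpeg versions (some builds also print a `pos:` field, for example), so picking out the `t:` timestamp by name is a bit more robust than counting columns. A minimal sketch, assuming log lines shaped like the made-up samples in the test below; the function name `black_frame_times` is hypothetical:

```shell
#!/bin/bash
# Read an ffmpeg log on stdin and print the timestamp (in seconds) of every
# blackframe line. Progress lines and other output are skipped because
# they don't contain "blackframe".
black_frame_times() {
    sed -n 's/.*blackframe.* t:\([0-9.]*\).*/\1/p'
}
```

Usage: `black_frame_times < My_Little_Pony/My_Little_Pony.log > My_Little_Pony/black_frames.log`.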

It then goes through each line of black_frames.log and uses each line as a stop or start point for extracting video segments and commercials into the “segments” folder.
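The start/stop logic can be tried out without running ffmpeg at all. A sketch, assuming black_frames.log holds one timestamp in seconds per line (the function name `segment_bounds` is hypothetical), that prints a start time and duration for each cut, using the same 10-second minimum as the script:

```shell
#!/bin/bash
# Read black-frame timestamps (seconds, one per line) on stdin and print
# "start duration" for each segment. A segment runs from its first black
# frame to the first later black frame more than MIN seconds away; black
# frames closer than that are treated as part of the same break.
segment_bounds() {
    min=${1:-10}
    awk -v min="$min" '
        NR == 1 { start = $1; next }      # first black frame opens a segment
        $1 - start > min {                # far enough away: close it here
            printf "%s %.3f\n", start, $1 - start
            start = $1                    # the next segment starts here
        }'
}
```

Each output line could then be handed to something like `ffmpeg -nostdin -ss START -t DURATION -i input.mpg -c copy ...`. The `-nostdin` flag matters when ffmpeg runs inside a `while read` loop, since otherwise it reads from the loop's input.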

When it's done with that, watch your commercials!

Issues


False Positives
With shows that have an abundance of dark frames, or that were actually shot at night, there will be a ton of false positives. Star Trek was a wealth of solid black frames. This leads to a lot of small segments and not a whole lot of large ones. There really is no way to correct this behavior.

Levels
If the whole video is too bright, even frames that should be solid black will appear to ffmpeg as solid gray. This doesn't jibe. You will have to adjust the levels during the ripping process, or by some other means, to get the darkest black frames possible.
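Another knob worth trying is the blackframe filter itself, which takes an `amount` (the percentage of pixels that must be dark, default 98) and a `threshold` (the pixel value below which a pixel counts as dark, default 32). Raising the threshold should let washed-out "gray" blacks count as black, at the cost of more false positives. A sketch with hypothetical function names; the values to use depend on your tapes and are not tested here:

```shell
#!/bin/bash
# Build a blackframe filter string. amount = percentage of pixels that
# must be below threshold; threshold = pixel value (0-255 for 8-bit video)
# below which a pixel counts as black. ffmpeg's defaults are 98 and 32.
blackframe_filter() {
    printf 'blackframe=amount=%s:threshold=%s\n' "${1:-98}" "${2:-32}"
}

# Scan a file with custom settings and print only the blackframe log
# lines (ffmpeg writes them to stderr, so it is redirected to the pipe).
scan_black_frames() {
    ffmpeg -hide_banner -nostdin -i "$1" -vf "$(blackframe_filter "$2" "$3")" \
        -f null - 2>&1 | grep blackframe
}
```

For a bright tape, something like `scan_black_frames tape.mpg 95 50` would be a starting point to experiment from.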

Final Thoughts


After I get through all of the many scratch tapes that are lying around, I'm going to start posting the interesting ads on YouTube for those who experience a similar nostalgia. I'm not too concerned with copyright in this case. It's advertising. It's something the company put out there for others to look at. Why would they come after me for giving their old products some sunlight?

I'm wondering if Jason Scott over at archiveteam has thought about this or if it's even within the mission of that organization. Maybe it's just me.

Friday, May 11, 2012

How to Make BlogTalkRadio.com Live Shows Embeddable

Background

I've been doing some work for some fellow conservatives at two different media outlets lately: Own The Narrative and All Fired Up Media. Both have live talk radio as a central part of their operations, and there are multiple hosts using different platforms to accomplish this. Of those services, Ustream and LiveStream are the most forthcoming and the easiest to work with. This post isn't about them. It's about the pain in the ass that is BlogTalk.

Research

There is absolutely no API for BlogTalk. They make it very difficult to embed user content outside of their website.
Any claims you read that there is a podcasting API for BTR are horsecrap.

The Solution

This is far from an elegant solution like you would get from an API or even a friendly livecasting site, but it works. Here's how:
First, it scrapes the HTML of a designated channel page on BTR.
Then the contents of the HTML are read line by line until...
it finds the magic words "On Air Now:".
Then it parses out the link to the event on BTR and formats the URL so it is friendly to other sites, then...
it returns an iframe with the embed.

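function btr_show($channel) {
    // Fetch the channel page; return nothing if the request fails
    $whole_btr = @file_get_contents("http://www.blogtalkradio.com/$channel");
    if ($whole_btr === false) {
        return '';
    }
    $lines = explode("\n", $whole_btr);
    foreach ($lines as $line) {
        if (stristr($line, 'On Air Now:')) {
            // The show link is the first double-quoted value on that line
            $parts = explode('"', $line);
            if (!isset($parts[1])) {
                continue;
            }
            $embed = "http://blogtalkradio.com" . $parts[1] . "/scrub/0";
            return "<iframe src='" . htmlspecialchars($embed, ENT_QUOTES) . "' class='btr' scrolling='no'></iframe>";
        }
    }
    // Nobody is on air right now
    return '';
}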

Note: my .btr CSS class looks something like this, but use whatever you want.

.btr{
    width:230px; 
    height: 200px;
    border:0;
}
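To check what the scraper will pick up without touching a PHP page, the same parsing step can be sketched in bash: find the first line containing "On Air Now:" (case-insensitively, like `stristr`), take the text between its first pair of double quotes (what `explode('"', $line)` puts in element 1), and build the embed URL. The function name is hypothetical, and the HTML in the test is made up; BTR's real markup has surely changed since this was written.

```shell
#!/bin/bash
# Read channel-page HTML on stdin and print the embed URL for the show
# that is on air, or nothing if no line mentions "On Air Now:".
btr_embed_url() {
    awk -F'"' 'tolower($0) ~ /on air now:/ {
        print "http://blogtalkradio.com" $2 "/scrub/0"
        exit
    }'
}
```

Usage: `curl -s "http://www.blogtalkradio.com/$channel" | btr_embed_url`.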

Monday, February 27, 2012

How Your Browser Makes You A Criminal

It has been said that you commit at least three felonies a day, mostly without realizing it. Technically, this isn't really felonious; it's more of a civil matter, but what do I know? I'm not an attorney.

I'm not going to get into the mess that is copyright law or even discuss my stance on copyright (either way would be flamebaiting). There is a good write-up here on what you need to know regarding copyright laws as they are written today. For the sake of this article, the only relevant part is with regard to "innocent infringement".

It's been well known and documented over the years that the things you view on the internets can and will be cached locally. This article is for the benefit of those who are a little new to this.

Some Background


I was poking around in my browser's cache (Chromium on LMDE) while troubleshooting an issue. There were several large media files in there. Puzzled as to what they were, I opened them and, lo and behold, they were every YouTube video I had watched since the last time I dumped the browser cache. (Oddly, the sound was better in VLC than it was with the Flash player.)

The "So-What"


Some of these videos were music videos from either the original artist or the record labels. The issue is that they were being saved to my computer. Should a copyright owner get a wild hare up their ass, I could be sued for infringing their copyright regardless of my intent. On a side note, I've been sued before; it's not fun.

Digging Deeper

For the sake of keeping things scientific, I used the same video for every test and urged the others involved to use it in their tests with different browsers on different platforms. The video is "Shadows" by Lindsey Stirling. If you haven't seen her work, I urge you to give her some hits on YouTube and buy her music. It is definitely worth dropping a few bucks on. Do that now before you forget. This blog will still be here when you get back.

None of the experiments used browser extensions for downloading Flash content, or any plugins beyond what is normally needed to load the video in the browser being tested. So, in essence, every file was downloaded passively, and all that needed to be done was to find the needle (video) in the haystack (browser cache/temp files).

Quality
Only a few quality levels would cache the entire video locally: every quality level below 720p did.

At 720p and above, I noticed a strange behavior in the cache: it would start caching for about 2 seconds, and then several 1.7MB binary "data_#" files would appear. I would only get about 3 or 4 at any one time. So it was time to go a bit further and see what those data files contained. Some of the binary data was human-readable, and a quick "head data_1" revealed this little nugget of info:

http://o-o.preferred.buckeyecablesystem-tol1.v2.lscache5.c.youtube.com/videoplayback?key=yt1&sver=3&ptchn=lindseystomp&cp=U0hSRVNUVl9KUkNOMl9LTlpEOlpvVVZjMDVNTU9Z&id=2460accac85453e0&expire=1330398358&cm2=1&source=youtube&algorithm=throttle-factor&ptk=lindseystomp%2Buser&ip=72.0.0.0&keepalive=yes&burst=40&itag=34&signature=48A4583BA1054728BE655C535B24DDAC72345786.718BF285965054FCB8391932B1E51D5D4ECEE89F&ipbits=8&sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Calgorithm%2Cburst%2Cfactor%2Ccp&factor=1.25&fexp=914026%2C910019&range=13-1781759

Yep, it seems the higher bitrate versions of YouTube videos, at least this one, are being cached by my ISP.
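`head` on a binary file dumps a lot of noise along with the readable parts. A minimal sketch that pulls out only the URLs, roughly what `strings` plus `grep` would do: turn every non-printable byte into a newline, then keep the http(s) URLs. The function name is hypothetical, and the test uses a made-up file rather than a real cache entry.

```shell
#!/bin/bash
# Print every http(s) URL embedded in a binary file. The C locale keeps
# tr and grep working byte by byte on data that isn't valid text.
cache_file_urls() {
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" |
        LC_ALL=C grep -oE 'https?://[[:alnum:][:punct:]]+'
}
```

Usage: `cache_file_urls data_1`.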

...But when I go to a local school with a different ISP, the results are a little different. Note the video dimensions.


Browsers/Platforms
Windows
Chrome/Firefox/Internets Exploder


Linux
Chrome/Firefox

Chrome and IE both keep their cache in a single folder, which made it very easy to find the cached video. All that needed to be done was to look for the file that was growing in size.

Firefox, on the other hand, uses a somewhat complicated and non-intuitive directory structure for its cache. It took a little more work to find the video with that browser.
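"Look for the file that is growing" can be automated. A sketch that lists the largest recently modified files under a cache directory, biggest first. The function name and the defaults (10 minutes, 5 files) are assumptions, and you would point it at your own browser's cache folder, whose location varies by browser, version and platform. `-mmin` is supported by GNU, BSD and BusyBox find, though it is not in POSIX.

```shell
#!/bin/bash
# Print "<size in bytes> <path>" for the COUNT largest files under DIR
# that were modified within the last MINUTES minutes, largest first.
largest_recent_files() {
    dir=$1
    minutes=${2:-10}
    count=${3:-5}
    find "$dir" -type f -mmin "-$minutes" -exec sh -c '
        for f in "$@"; do
            printf "%s %s\n" "$(( $(wc -c < "$f") ))" "$f"
        done' sh {} + |
        sort -rn | head -n "$count"
}
```

While a video is playing, running it a few times in a row shows which file keeps growing.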

It's worth noting that all three browsers tested exhibited the same behavior. Several others and I were able to passively download videos from YouTube and play them locally in VLC. Everyone involved is now a criminal...

How to Load Everything Live

This might not be a good idea, since your page load times will increase. But if you are paranoid enough and willing to trade page load time for law-abiding-citizen status, here are a few things you can do:

In Chrome: Wrench->Tools->Dev Tools->Bottom Right Corner Gear->Disable Cache

In Firefox: go to about:config->Right-click browser.cache.disk.enable->Toggle

IE: Download one of the browsers above and use the instructions for that browser.
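The Firefox setting can also be made from a script. Firefox reads a user.js file from the profile directory at startup and applies each `user_pref()` line in it, so appending the preference there has the same effect as the about:config toggle. A sketch with a hypothetical function name; finding your profile directory is left to you. This only turns off the disk cache; Firefox still keeps things in its memory cache while it runs.

```shell
#!/bin/bash
# Append the pref that disables Firefox's disk cache to PROFILE_DIR/user.js,
# unless that exact line is already there.
disable_firefox_disk_cache() {
    profile_dir=$1
    pref='user_pref("browser.cache.disk.enable", false);'
    grep -qxF "$pref" "$profile_dir/user.js" 2>/dev/null ||
        printf '%s\n' "$pref" >> "$profile_dir/user.js"
}
```

Restart Firefox afterwards so it picks up the file.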

In closing, I have to pose a question. Does browser cache actually constitute copyright infringement? Let me know in the comments below.

Special thanks to Ray for the sanity checks and insight.

UPDATE: The law is still vague about client-side caching. http://en.wikipedia.org/wiki/Online_Copyright_Infringement_Liability_Limitation_Act#.C2.A7_512.28b.29_System_Caching_Safe_Harbor

Monday, January 2, 2012

Automated MP3 Duplicate Removal with BASH

Since I'm bored, and unemployed for the time being, I decided to attack my music collection, more specifically the duplicates in it.

I first must put in a plug for MusicBrainz Picard. It is a wonderful free application that identifies your tracks based on signatures. Whether you are using Windows, Mac or Linux, you should use this. You will not be disappointed.

Here are a few of the methods I used to weed out the dups:

Remove Direct Copies (Slow)

This method removed fewer dups, but I did have the satisfaction of knowing that I didn't have identical data lying around. I did that by computing the MD5 hash of every MP3, finding hashes that appear more than once, and deleting the copies named like "filename ([0-9]).mp3".

The script:

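#!/bin/bash
# find_hash_matches.sh
# Hash every MP3, find hashes that occur more than once, and delete the
# numbered copies (e.g. "Song (1).mp3") of byte-identical files.
IFS=$'\n'    # split command output on newlines only; paths contain spaces
root=/path/to/the/rock

# One top-level prefix at a time keeps each hash list small
for h in + 0 1 2 3 4 5 6 7 8 9 _ a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
do
    echo "$h"
    # md5sum prints "<hash>  <path>" (two spaces)
    find "$root"/"$h"* -name "*.mp3" -exec md5sum {} + > whole 2> /dev/null

    # Hashes that appear more than once
    awk -F'  ' '{print $1}' whole | sort | uniq -d > dups

    for i in $(cat dups)
    do
        # Skip the group unless an unnumbered copy exists, so one file survives
        grep "^$i" whole | grep -vq "([0-9])" || continue
        for j in $(grep "^$i" whole)
        do
            the_file=${j#*  }    # drop the "<hash>  " prefix
            if echo "$the_file" | grep -q "([0-9])" && [ -f "$the_file" ]
            then
                rm "$the_file"
                echo "$the_file" | tee -a dup_log
            fi
        done
    done
    rm -f whole dups
done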
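To see what the hash comparison finds before anything is deleted, a dry-run sketch that lists groups of byte-identical MP3s under a directory and deletes nothing. The function name is hypothetical; it relies on md5sum printing the 32-character hash at the start of each line.

```shell
#!/bin/bash
# Print "hash  path" for every MP3 under DIR whose MD5 hash is shared with
# at least one other MP3. Sorting puts identical files next to each other.
identical_mp3s() {
    find "$1" -type f -name '*.mp3' -exec md5sum {} + |
        sort |
        awk '{ key[NR] = substr($0, 1, 32); line[NR] = $0; count[key[NR]]++ }
             END { for (i = 1; i <= NR; i++) if (count[key[i]] > 1) print line[i] }'
}
```

Usage: `identical_mp3s /path/to/the/rock > identical.txt`, then look it over before running the real thing.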

Remove Low Quality and Truncated Tracks (Quick)

This method compares only tracks with the same base name. It keeps a file only if it has the highest bitrate and the longest duration. It depends on the program "mp3info"; check your distro's repos for availability.

#!/bin/bash
old_IFS=$IFS
export IFS="
"
rm more_dups_log quality_comparison 2> /dev/null


deleted=0
basedir=/path/to/the/rock


for i in `seq 1 9`
do
find $basedir -name "*($i)*" >> more_dups_log 2> /dev/null
done


for i in `cat more_dups_log | sed 's/ ([0-9])\.mp3//g' | sort | uniq`
do
# Look only in this track's own directory, and only at "Title.mp3" and
# "Title (N).mp3" (a bare "Title*" would also match "Title Remix.mp3")
dir=`dirname "$i"`
title=`basename "$i"`
for j in $(find "$dir" -maxdepth 1 \( -name "$title.mp3" -o -name "$title ([0-9]).mp3" \))
do
info=`mp3info -r m -p "%r   %S"  "$j"`
echo "$j   $info">>quality_comparison
done

best_br=0
best_length=0
for j in `grep $i quality_comparison`
do
br=`echo $j | awk -F'   ' '{print $2}'`
length=`echo $j | awk -F'   ' '{print $3}'`
if [ $br -gt $best_br ]
then
let best_br=$br
fi

if [ $length -gt $best_length ]
then
let best_length=$length
fi
done



# Exactly one file is best on both counts (zero matches goes to the else branch)
if [ `grep "$best_br   $best_length" quality_comparison | wc -l` -eq 1 ]
then
for j in `grep $i quality_comparison`
do
if [[ "$j" != `grep "$best_br   $best_length" quality_comparison` ]]
then
file_to_remove=`echo $j |  awk -F'   ' '{print $1}'`
rm "$file_to_remove" 2> /dev/null
let deleted=$deleted+1
fi
done
else
for j in `grep $i quality_comparison`
do
the_br=`echo $j | awk -F'   ' '{print $2}'`
the_length=`echo $j | awk -F'   ' '{print $3}'`
if [[ $the_br != $best_br && $the_length != $best_length ]]
then
file_to_remove=`echo $j |  awk -F'   ' '{print $1}'`
rm "$file_to_remove" 2> /dev/null
let deleted=$deleted+1
fi
done
# Several files tie for best: delete all but the last of the tied files
lines=`grep "$best_br   $best_length" quality_comparison | wc -l`
the_count=1
for j in `grep "$best_br   $best_length" quality_comparison`
do
file_to_remove=`echo $j |  awk -F'   ' '{print $1}'`

if [ $the_count -lt $lines ]
then
rm "$file_to_remove" 2> /dev/null
let deleted=$deleted+1
let the_count=$the_count+1
fi
done
fi


if [[ `grep "([1-9])" quality_comparison` ]]
then
for j in `grep "([1-9])" quality_comparison`
do
file_to_move=`echo $j |  awk -F'   ' '{print $1}'`
if [ -f "$file_to_move" ]
then
# -n: never overwrite an unnumbered file that is still there
mv -n "$file_to_move" "`echo $file_to_move | sed 's/ ([1-9])//g'`"
fi
done
fi
rm quality_comparison 2> /dev/null
echo $i
done
chmod 777 -R "$basedir"
echo "$deleted files have been deleted."
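The keep-or-delete decision at the heart of the script above can be tested on its own. A simplified sketch, not the script's exact rule: given one "path, bitrate, seconds" line per copy (tab-separated), it keeps the copy with the highest bitrate and uses the longer duration only to break ties. The function name and input format are assumptions; you would fill the lines with the bitrate and length mp3info reports for each file.

```shell
#!/bin/bash
# Read "path<TAB>bitrate<TAB>seconds" lines on stdin and print the path
# of the copy to keep: highest bitrate first, then longest duration.
best_copy() {
    tab=$(printf '\t')
    sort -t "$tab" -k2,2nr -k3,3nr | head -n 1 | cut -f1
}
```

Every other path in the group would be a candidate for deletion.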

You will still have some dups left, but the script will delete A LOT of them. A simple `find -name "*([1-9])*"` will find all that remain. Protip: if you don't really care about the tracks that last command finds, amend it to `find -name "*([1-9])*" -exec rm {} \;` to remove every match.
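A slightly more cautious middle ground before reaching for `rm`: list only the numbered copies whose unnumbered original still sits next to them, since those are the ones least likely to be your only copy. A sketch with a hypothetical function name; it only prints paths.

```shell
#!/bin/bash
# Print every "Name (N).mp3" under DIR for which "Name.mp3" also exists
# in the same directory.
numbered_copies_with_original() {
    find "$1" -type f -name '* ([1-9]).mp3' | while IFS= read -r f; do
        orig=${f%" ("[1-9]").mp3"}.mp3    # "Song (1).mp3" -> "Song.mp3"
        if [ -f "$orig" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Review the list first; only then remove what it prints.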

If you have any other automated methods, please share.

Thursday, October 27, 2011

PHPDB - My own take on NoSQL

So a few weeks ago, I got a bit frustrated with MySQL. It just wasn't performing as it should with a small amount of data. It didn't help that I was just storing text that referenced several different data types stored on the server.

So I thought to myself, "Self, wouldn't it be cool if there were a database out there that did [fill in the blank]?" This is what I came up with.

What it is

This is a data storage system that shuns traditional SQL syntax and setup frustration. This is ideal for small websites with not a whole lot of data.

It's a PHP library that stores data in flat files. But more on that later.

What it isn't

It isn't meant for massive amounts of data. As of yet, all features are experimental.

How it works

Instead of being a relational database, where tables are related to each other through keys and other relationships, PHPDB stores data in flat files, in a directory structure determined by the data type being stored. The data is indexed in a file called mft.pdb (Master File Table), which also stores metadata about the various files. How each MIME type is handled and stored in the MFT, and what its attributes are named, is defined in a file called data_defs.pdb.

An example of data_defs.pdb:
image::source::type::thumb::title::description::gallery::
text::source::type::title::

Each MIME type has its own line. The only fields that must stay the same on every line are: [data_type]::source::type. Why double the data type? I don't know. It's a work-around for a problem I found. Anything after the first three fields is fair game to be changed.

An example of mft.pdb:
1234.txt::text/1234.txt::text::Blog Lady 1234
1234.txt::text/1234.txt::text::Blog Post 1234
1234.txt::text/1234.txt::text::name
blue.jpg::image/blue.jpg::image::image/blue_thumb.jpg::Blue::This is a lady and a bridge.::example
chad.jpg::image/chad.jpg::image::image/chad_thumb.jpg::Chad::Chad is about to be in a world of hurt::example
lady.jpg::image/lady.jpg::image::image/lady_thumb.jpeg::Lady and the Tree::This is a lady and a tree.::example
wind.jpg::image/wind.jpg::image::image/wind_thumb.jpg::Wind is Hilarious::The wind told a joke. The lady laughed::example
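Because both files are plain "::"-separated text, they are easy to inspect from a shell. A sketch that prints one MFT record as named fields, assuming (from the samples above) that the third field of an MFT line names the data_defs line that describes it, and that field N of the record lines up with field N of that definition. The function name is hypothetical and this is not part of PHPDB itself.

```shell
#!/bin/bash
# Print the MFT record for NAME as "field=value" lines, using DEFS
# (data_defs.pdb) to name the fields of the record found in MFT (mft.pdb).
mft_record() {
    name=$1 defs=$2 mft=$3
    awk -F'::' -v name="$name" '
        NR == FNR { for (i = 2; i <= NF; i++) field[$1, i] = $i; next }
        $1 == name {
            for (i = 2; i <= NF; i++) print field[$3, i] "=" $i
            exit
        }' "$defs" "$mft"
}
```

Usage: `mft_record blue.jpg data_defs.pdb mft.pdb` prints source=image/blue.jpg, title=Blue, gallery=example, and so on.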

Getting Started

Copy everything from the zip file below to the root of your site directory, then add "include('phpdb_functions.php');" to the pages where you wish to use PHPDB. Setup complete.

Adding A Text File

Many data types are handled the same as plain text. HTML, plain text, and files whose MIME type can't be determined are a few examples.

An example of adding a simple text file:

db_add('text/1234.txt', 'Blog Post 1234');
echo $debug;


"1234.txt::text/1234.txt::text::Blog Post 1234" is added to mft.pdb. The $debug echo is optional, but in case things go awry, it's there.

Adding an Image File

There is an optional $params argument to db_add where you can pass, as an array, values for the MFT fields defined in data_defs.pdb.
For example:

$params['title'] = 'Blue';
$params['gallery'] = 'example';
$params['description'] = 'This is a lady and a bridge.';
$params['thumb'] = 'image/blue_thumb.jpg';
db_add('image/blue.jpg', 'Blue', $params);
echo $debug;

This adds a line "blue.jpg::image/blue.jpg::image::image/blue_thumb.jpg::Blue::This is a lady and a bridge.::example" to the MFT.

Selection

Select Single:

$image = db_select_one('blue', 'image');
echo "<a href='".$image['source']."'><img src='".$image['thumb']."'></a><br>".$image['title'];

Select Multi:

$images = db_select_many("example", 'image');

foreach($images as $image){
echo "<div style='display:block; float:left;text-align:center;'>
        <a href='".$image['source']."'><img src='".$image['thumb']."'></a><br>
            ".$image['title']."
        </div>\n";
}


Example

Download