Checking All API Variables Are Included in a PHP REST API

I've recently been working on a weather recording project, and as part of this endeavour I wanted to check that all of the variables posted to my API were set. It can be rather a lot of work to specify each if (!isset($_GET[...])) check individually, so I decided to write a function that can take any number of arguments using a ‘variadic’ function in PHP.

// Returns true if all of the given $_GET variables are set, else returns false.
function getVariablesSet(string ...$getlinks) {
    foreach ($getlinks as $link) {
        if (!isset($_GET[$link])) {
            return false;
        }
    }
    return true;
}

Now, when I want to add a new API ‘command’, I can simply do the following:

$browser_response = new stdClass();
$browser_response->message = "Command not specified.";

if (isset($_GET['command']) && $_GET['command'] == "new-temperature")
{
    if (getVariablesSet("datetime", "temperature", "humidity"))
    {
        $browser_response->message = "All GET variables set.";
    }
    else
    {
        $browser_response->message = "Not all GET variables are set.";
    }
}

echo json_encode($browser_response);

This way, so long as the command and the required variables have been specified, we can enter the scope; otherwise, we kick the request back until the query is formatted correctly. Having lots of variables may become problematic, so you may want to use POST, or break the checks into subsections to give users a better understanding of their error. I should add that variadic functions only work in PHP 5.6 and above.

In any case, good luck. Aidan.

Making a Web Scraper to Download Images off the Internet

One afternoon I read on a popular website that http://prnt.sc/ uses sequential six-character codes to host user images, which made me wonder what was on there.

The next day I made a small bot to scrape the website and collect all the images through a range; the bot could then run multiple times to collect more images if necessary. I left it running for a couple of hours, and here's what I managed to find. I'm sure I cannot re-host the images, but the range I scraped through started at gmmlaq and covered 1,287 images before the bot was IP banned through Cloudflare; fair enough. I took the time to view each image individually.

Here’s What I Saw

  • A driver's licence and a matching, expired passport.
  • A WordPress username and password combination for a web-host reseller, which I did not test.
  • Many, many out-of-context conversations, half of which were in Cyrillic.
  • A teacher seemingly contacting students and recording the fact that they did not pick up on Skype.
  • Ominous pictures of a tree posted multiple times.
  • Screenshots of video games, mainly Minecraft, RuneScape, Team Fortress 2 and League of Legends.
  • A lot of backend databases of usernames and email addresses for customers and users; in fact, these make up a large proportion of the screenshots.
  • A lot of SEO spam.
  • A conversation between two users on Skype debating whether to ban an influencer from their platform for fake referrals.
  • About 2 lewd photos.
  • A few hotel confirmations.
  • Complete credit card information, including the CVV and the full 16-digit number.
  • A spamvertising campaign CMS platform.
  • A gambling backend database disabling access to games for specific users.
  • One 4×4-pixel image and one 1×47-pixel image.

What Did We Learn?

  • Identifiers like this, particularly URLs, should not be sequential (see the sketch after this list).
  • A lot of users on the platform see the randomness of the URL as sufficient security; however, it's undermined by the fact that the website can be scraped sequentially.
  • They did eventually ban the bot after 1,287 images (probably closer to 1,500 if you include my earlier testing); however, Cloudflare seems to be the one preventing access, so it may be a service they offer.
  • A lot of users on the platform are web developers and use every trick in the book to boost their numbers.
  • A lot of users are Eastern European and American.
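
On that first point, the usual fix is to issue codes from a cryptographically secure random source instead of a counter. A minimal Python sketch, assuming an 8-byte token is acceptable (the size and function name are my choices, not anything prnt.sc actually uses):

import secrets

# Issue a non-sequential, hard-to-enumerate image code.
# token_urlsafe(8) yields roughly 11 URL-safe characters.
def new_image_code():
    return secrets.token_urlsafe(8)

Codes like these cannot feasibly be enumerated by brute force, which is exactly the property the sequential scheme lacks.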

How I Made the Scraper

I made this bot using Python 3.7, though it may work on older versions. The URL code is treated as a base-26 number to match the alphabet, incremented, and then converted back to a string for scraping. Images are saved with their code as the filename. I do not condone running the scraper yourself.

import requests
import configparser
import string
from bs4 import BeautifulSoup
from functools import reduce

# Scraper for https://prnt.sc/


# Headers from a chrome web browser used to circumvent bot detection.
headers = {
    "ACCEPT" : "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "ACCEPT-LANGUAGE": "en-US,en;q=0.9",
    "DEVICE-MEMORY": "8",
    "DOWNLINK": "10",
    "DPR": "1",
    "ECT": "4g",
    "HOST": "prnt.sc",
    "REFERER": "https://www.google.com/",
    "RTT": "50",
    "SEC-FETCH-DEST": "document",
    "SEC-FETCH-MODE": "navigate",
    "SEC-FETCH-SITE": "cross-site",
    "SEC-FETCH-USER": "?1",
    "UPGRADE-INSECURE-REQUESTS": "1",
    "USER-AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    "VIEWPORT-WIDTH": "1920",
}

# https://stackoverflow.com/a/48984697/2697955
def divmod_excel(n):
    a, b = divmod(n, 26)
    if b == 0:
        return a - 1, b + 26
    return a, b


# Converts our '89346963' -> 'gmmlaq'
# https://stackoverflow.com/a/48984697/2697955
def to_excel(num):
    chars = []
    while num > 0:
        num, d = divmod_excel(num)
        chars.append(string.ascii_lowercase[d - 1])
    return ''.join(reversed(chars))

# Converts our 'gmmlaq' -> '89346963'
# https://stackoverflow.com/a/48984697/2697955
def from_excel(chars):
    return reduce(lambda r, x: r * 26 + x + 1, map(string.ascii_lowercase.index, chars), 0)

# Load the config, or create a new one if it doesn't exist yet.
# 'imagestart' is an arbitrary code to start (or resume) from.
def get_config():
    config = configparser.ConfigParser()
    try:
        with open('config.cfg') as f:
            config.read_file(f)
    except FileNotFoundError:
        config['Screenshots'] = {'imagestart': 'gmmlaq', 'url': 'https://prnt.sc/', 'iterations': '20'}
        with open('config.cfg', 'w') as configfile:
            config.write(configfile)
    return config

# Fetch the page for one image code, extract the screenshot URL and save it.
def get_image_and_save(website_url, image_url):
    try:
        html_content = requests.get(website_url + image_url, headers=headers).content
        soup = BeautifulSoup(html_content, "lxml")
        ourimageurl = soup.find(id='screenshot-image')['src']
        image = requests.get(ourimageurl).content
        with open(image_url + '.png', 'wb') as handler:
            handler.write(image)
    except (requests.RequestException, TypeError, KeyError):
        print(image_url + " was probably removed.")

def increment_image(image_url):
    return to_excel(from_excel(image_url) + 1)

config = get_config()
print ("Starting at '" + config["Screenshots"]["imagestart"] + "'.")

website_url = config["Screenshots"]["url"]
current_image_url = config["Screenshots"]["imagestart"]
for x in range(0, int(config["Screenshots"]["iterations"])):
    print("Currently downloading image " + current_image_url)
    get_image_and_save(website_url, current_image_url)
    current_image_url = increment_image(current_image_url)

# Set new config code to current location for next run.
config.set('Screenshots', 'imagestart', current_image_url)
with open('config.cfg', 'w') as configfile:
    config.write(configfile)
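
As a quick sanity check, the conversion helpers round-trip the value from the comments above:

# 'gmmlaq' <-> 89346963, and incrementing moves to the next code.
print(from_excel('gmmlaq'))    # 89346963
print(to_excel(89346963 + 1))  # gmmlar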

The bot requires Python 3 with requests, BeautifulSoup4 and lxml (configparser ships with the standard library). The scraper cannot handle numbers in the URL, so please remove them and replace them with letters before picking a starting point; this was an oversight on my part.
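
If you wanted to lift that limitation, here is a minimal sketch of a base-36 variant, assuming the codes only ever draw from [0-9a-z] (which I haven't verified against prnt.sc):

import string
from functools import reduce

# Hypothetical base-36 versions of the helpers, covering digits as well
# as letters. Codes are zero-padded back to a fixed width.
ALPHABET = string.digits + string.ascii_lowercase

def from_code(code):
    return reduce(lambda r, c: r * 36 + ALPHABET.index(c), code, 0)

def to_code(num, width=6):
    chars = []
    while num > 0:
        num, d = divmod(num, 36)
        chars.append(ALPHABET[d])
    return ''.join(reversed(chars)).rjust(width, '0')

Unlike the bijective base-26 scheme in the script, this is plain positional notation, which works because the codes are fixed-width six characters.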

Don’t do anything against their terms of service, Aidan.

Scraping Canvas (LMS)

Because my time at university is ending, I thought it best to archive the Canvas pages available to me for later reference, in case I cannot access Canvas later should they change platforms or disable my account. I should probably add that this is for archival purposes and I will not be able to share the data I collected. Thankfully I was able to get the whole thing going in a few minutes; the downloading took a lot longer.

The first snippet, which I got from here, didn't complete the first time; it seemed some image was causing issues, so I moved to another gist. At this rate we could be done in half an hour 😊.

Unfortunately it also borked out at a similar place:

FileNotFoundError

I think it is because something is missing or I don't have access to it. But the real problem was that it downloaded content for a course I didn't care about; I was enrolled in it, but it's full of junk I'm not interested in. We can exclude it by using the second scraper's code and specifying the course IDs, which I had to collect manually; there were about 15 of them, but it didn't take too long. That gave me the full command:

F:\Downloads\canvas>python canvas.py https://canvas.hull.ac.uk/ 4738~DUI9Nha9weSuemu1M2qsmhljoBcQtR0zghXTs3QA7ECHDHQkpsgBQ9RllbaEwySf output 52497,56148,56149,52493,54499,54452,54456,53441,52496,22257,22274,22276,22277,22278,22279,22280,50664,50656,22275,50652

The access token you can see above should be expired by now. You can do this yourself by downloading the same file and installing Python 3, pathvalidate and pycanvas. You need to generate a security token from /profile/settings, and you can get a course ID by clicking on the course, like this: /courses/56149. When you generate a new token you should receive an email about it.

Canvas online with our starred modules displayed.

I decided to make a small adaptation to catch the FileNotFoundError and was off to the races. It took over an hour, so I decided it was best to leave it running overnight; when I returned in the morning I had 116 errors (failed downloads), and the rest was the course content!
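
The adaptation was along these lines; download_file() here is a placeholder for the gist's actual download step, not its real API:

# Skip files that raise FileNotFoundError instead of aborting the run,
# counting the failures so we can report them at the end.
errors = 0
for item in items:
    try:
        download_file(item)
    except FileNotFoundError as e:
        errors += 1
        print("Failed to download", item, "-", e)
print(errors, "failed downloads.")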

Our Canvas Modules saved to Windows File Explorer.

Unfortunately I don't seem to have the submissions for each of these courses, so I needed to download them manually as well, and then our archive was complete.

Thanks for reading.

Inside a Western Digital Blue Hard Drive

I thought I'd share the pictures I took while taking apart a dead 250GB hard drive.

Rest in pieces my WD2500AAKX

I got this hard drive as part of a Dell Optiplex 780 and used it as a server for my internal network. It worked great until it wouldn't boot. I checked on it and, sure enough, it was stuck in Ubuntu Server boot recovery. I tried to recover it, but I think I did more damage than good. I moved to a Windows computer and tried to recover the data with Recuva, which didn't do anything because it couldn't pick up the disk. I then moved to TestDisk, which was able to see the drive and partitions but never got past profiling the disk. So I decided to take it apart.

The hard drive in the Dell Optiplex 780, covered in dust.

First I unscrewed all the screws; there is another screw holding the read/write head hidden under the label.

Front of the hard drive and the hard drive mainboard.

After that I took it apart a little more; it has a single platter internally and one big old magnet, which I kept.

WD2500AAKX internals with the platter and read/write head exposed.
Well, I’ve let the magic smoke out now.

Interestingly, there seems to be a metal piece on the bottom and side of the hard drive, which I think is for easy destruction. CrystalDiskInfo said it had 29,202 hours on it and 2,875 power-ons, almost exactly the same as my ST2000DM001-1CH164 T2B hard drive. The SMART data also had warnings for its Reallocated Sectors Count.

It’s in the bin now. Thanks for reading.