I didn't really write any code for this one, but I still thought it was fun. I've been keeping track of the number of times that I sneeze and trying to correlate it with pollen counts from Pollen.com and Weather.com. Here are my (normalized) results so far.
It seems to track pretty well with Pollen.com. I'll see how it goes...
Thursday, October 16, 2014
Friday, September 19, 2014
Visualizing a Processing Queue
As many of you know, I have a thing for fancy charts. It's been a while since I've posted anything, so I thought I'd take some time to share a new fancy chart I've come up with to view the processing time of queues.
The Gist
- Every point in the graph represents a job processed by some queue.
- The y axis shows the total time the job was in the queue, including processing time
- The x axis shows the time that the job completed
- The red line trailing on the back of the point represents the time that job waited before being processed
- The green line trailing on the back of the point represents the processing time for that job
Why
We have many queues where I work. Some application will throw a job into a queue, and another application will pick up that job and process it. Usually these queues are implemented as tables in a SQL database, and they almost always have these columns in common:
- Insertion Time (What time did we add the job to the queue)
- Start Time (What time did the application start the job?)
- End Time (What time did the application finish the job?)
- Success (Did the application finish successfully?)
Whenever there's a problem, the question "Why is it going slow" inevitably ends up on someone's shoulders. Just looking at the times in our logs doesn't always give you a good answer. Ultimately, there are two things that need to be considered when jobs start to slow down.
- How long are jobs taking to process? (EndTime-StartTime)
- How long are jobs waiting before processing starts? (StartTime-InsertionTime)
My new graph answers these questions perfectly!
An extremely small how to
Let's say I've got 3 different jobs that were processed. Jobs 1, 2, and 3 were inserted at times 1, 2, and 3 respectively. For simplicity, I'm going to show times as integers. Here's the log:
Job | InsertionTime | StartTime | StopTime | Success | Duration |
1 | 1 | 1 | 3 | 1 | 2 |
2 | 2 | 3 | 5 | 0 | 3 |
3 | 3 | 5 | 7 | 1 | 4 |
Here, Duration is the total time in the processing queue (StopTime - InsertionTime). StopTime is the End Time column described earlier.
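To make the arithmetic concrete, here's a minimal Python sketch that derives the wait time, processing time, and total duration for the three jobs in the log. The `Job` class and its field names are just for illustration; they aren't from any of our real queue tables.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record mirroring the columns in the log above
    job_id: int
    insertion_time: int
    start_time: int
    stop_time: int
    success: bool

    @property
    def wait_time(self) -> int:
        # How long the job sat in the queue before work began
        return self.start_time - self.insertion_time

    @property
    def processing_time(self) -> int:
        # How long the application spent working on the job
        return self.stop_time - self.start_time

    @property
    def duration(self) -> int:
        # Total time in the queue: waiting plus processing
        return self.stop_time - self.insertion_time


jobs = [
    Job(1, 1, 1, 3, True),
    Job(2, 2, 3, 5, False),
    Job(3, 3, 5, 7, True),
]

for job in jobs:
    print(job.job_id, job.wait_time, job.processing_time, job.duration)
```

Every job here takes 2 time units to process, so the growing Duration comes entirely from jobs waiting longer and longer before they start.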
Let's plot the Duration and StopTime together:
This gives me this:
Now let's add on the line segments:
This gives me my final output:
And that's it!
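For anyone who wants to try this outside of R, here's a rough Python sketch of one way to compute the segment endpoints. It assumes a particular geometry that may not match the chart exactly: each job's trail is a diagonal that starts at (InsertionTime, 0) and climbs one unit per unit of time, red while the job waits and green while it's processed, ending at the point (StopTime, Duration). The function name `trail_segments` is made up for this example.

```python
def trail_segments(insertion, start, stop):
    """Return (red, green) segments, each as ((x0, y0), (x1, y1)).

    Assumed geometry: the trail rises from y = 0 at insertion time to
    y = Duration at stop time, so the green segment ends on the job's point.
    """
    wait = start - insertion
    duration = stop - insertion
    red = ((insertion, 0), (start, wait))        # time spent waiting
    green = ((start, wait), (stop, duration))    # time spent processing
    return red, green


# (InsertionTime, StartTime, StopTime) for the three jobs in the log
log = [(1, 1, 3), (2, 3, 5), (3, 5, 7)]

for insertion, start, stop in log:
    red, green = trail_segments(insertion, start, stop)
    print("point:", (stop, stop - insertion), "red:", red, "green:", green)
```

The coordinates can be handed to whatever plotting library you like. A job that never waited (like job 1) gets a zero-length red segment.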
Conclusion
This has proven to be extremely valuable. With this graph, I can quickly determine whether an application is genuinely taking too long, or if we just got hit with a bunch of input, and we need to start up new instances of said application. Ultimately, I can now run a quick R script and have a great picture of what's happening to our system as a whole.
Wednesday, August 13, 2014
Data Mining News and the Stock Market Part 5 - Opinion Mining
If it's not obvious by now, this is the fifth part of my "Data Mining News and the Stock Market" post, but if you don't want to go back and read my previous posts, here's where I'm at. I have a sqlite database that's filled with news articles about companies on the NYSE.
I don't have all the data that I'm going to need, but I have enough to start getting some sentiment. While doing some searching on how to do opinion mining in python, I found this repo: https://github.com/kjahan/opinion-mining.
In the description, the author explains how they used a sentiment analysis API.
I don't mean to sound lazy, but that's just about the best news I've heard in a long time.
Let's be honest. For a one-man project, this was going to be a bit of a nightmare. I could have spent my time annotating articles, POS tagging sentences, and building a dictionary of positive/negative tokens. But it would have either taken me a really long time, or cost me a lot of money. In the end, it might not have even worked. I was really excited to find a simple API.
Also, classifications tend to work better in aggregate. Keeping that in mind, I can architect my code to use several different sentiment analysis APIs. My opinion mining will really just aggregate other people's opinion mining software! I only need to keep each API's pricing model and data restrictions in mind, and that can be controlled by software. To make things even easier, I found a list of APIs here: http://blog.mashape.com/list-of-20-sentiment-analysis-apis/
The API that the opinion-mining repo used is on Mashape, and will allow me 45,000 API calls per month for free (and .01 after that). I've stopped my scraping app after about 12,000 articles, and was about 40% through Reuters. I'm really only going to need to get any article's sentiment once, so this should work well for me as a test.
Adding Sentiment to my System
I've added some models to my database to keep track of all the APIs I'm going to be using. Each API will have a unique URL and key, but they're all going to be called the same way. I've also added an API response object. This will contain the response that I get back, and a score. These APIs seem to have different kinds of results, so the score will have to be calculated differently depending on which one I'm using. Since these APIs are going to be used to get opinions, I'm calling them OpinionAPIs.
Now my machine learning is done by someone else! I can get the sentiment of an article with a script like this:
After running this, I did a quick sanity check. Good articles seemed to have a high score, and bad articles had a low score! Done!
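Since each API reports sentiment differently, the per-API score calculation and the aggregation idea can be sketched roughly like this. Everything here is hypothetical: the API names (`api_a`, `api_b`), the response shapes, and the choice of a plain average are assumptions for illustration, not what any real sentiment service returns.

```python
from statistics import mean


def normalize_probability(response):
    # Hypothetical API A returns {"positive": p} with p in [0, 1];
    # rescale it to the range [-1, 1]
    return 2.0 * response["positive"] - 1.0


def normalize_label(response):
    # Hypothetical API B returns {"label": "pos" | "neutral" | "neg"}
    return {"pos": 1.0, "neutral": 0.0, "neg": -1.0}[response["label"]]


# One scoring function per API, all mapping onto the same [-1, 1] scale
NORMALIZERS = {
    "api_a": normalize_probability,
    "api_b": normalize_label,
}


def aggregate_score(responses):
    """Average the normalized scores from every API that answered."""
    scores = [NORMALIZERS[name](resp) for name, resp in responses.items()]
    return mean(scores)


example = {"api_a": {"positive": 0.75}, "api_b": {"label": "pos"}}
print(aggregate_score(example))
```

Putting every API on a common scale first is what makes the aggregate meaningful. A weighted average or a majority vote would slot into `aggregate_score` just as easily.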
Now what?
So now I've got some work to do. I've got a little bit of data, and I have a good framework for adding classifiers.
- Build more web scrapers
- Get more data
- Add more APIs
- Get prices into database (Simple, but still haven't done it)
Some Concerns
Some of the assumptions I made throughout this project haven't really panned out. For instance, the number of articles I have per company is way off. I was expecting around 1000 articles per company, but I only have a handful of articles for several companies. It ranges anywhere from 0 to a little over 1000. I'll need to take a closer look at my Reuters scraper.
Also, the ranges of dates seem to be all over the place. Based on some of my initial testing, I was expecting to only have things from around this year, but I have articles all the way back from 2009. The pricing isn't really an issue because downloading price data is super cheap. I'm just worried about the inconsistency of my data. News and prices might have had a much different relationship 5 years ago. I can't be sure.
I may be getting a lot of articles that have nothing to do with anything. While browsing my database, I stumbled on an article about the Olympics. It had come up while searching for "Sprint".
"with players forced to sprint and stretch behind the goal-lines in order to preserve the surface."
This is a huge concern for me. The whole article had a really negative score, and I have no way of knowing how many articles like this are in my database. I may have to search Reuters by stock ticker instead of by company name. I know I said earlier that I was going to look for the company name in the title, but that could just as easily have the same problem.
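One cheap partial fix for the "Sprint" versus "sprint" problem is a case-sensitive, whole-word match on the company name. Here's a small sketch of that idea. The function name `title_mentions` is made up, and this is only a filter I could bolt on, not something that's in the scraper today.

```python
import re


def title_mentions(title, company_name):
    """True if the title contains the company name as a whole word.

    re.search is case-sensitive by default, so the verb "sprint"
    does not match the company "Sprint".
    """
    pattern = r"\b" + re.escape(company_name) + r"\b"
    return re.search(pattern, title) is not None


print(title_mentions("Sprint shares rise after earnings", "Sprint"))
print(title_mentions("Players forced to sprint behind the goal-lines", "Sprint"))
```

It's not bulletproof: a headline that happens to start with the word "Sprint" would still match, so searching by ticker is probably still worth trying.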
Monday, August 11, 2014
Data Mining News and the Stock Market Part 4 - Collecting News Articles
This is the fourth part of my "Data Mining News and the Stock Market" post.
First thing I did was set up a repo on github. I needed a good name. "Newtsocks" is an anagram for "news" and "stock", so I went with that.
Next, I set up an isolated Python virtual environment using virtualenv. It's pretty handy, and keeps me from cluttering up my computer. At first, I made the mistake of including virtualenv directories in my repo. For future reference, don't do that.
I also needed a database to store all this info I'm going to be grabbing. After a bit too much research, I decided to stick with a relational database. I'm just more comfortable with them, and decided I could build this out a little faster if I stuck to a traditional RDBMS. Also, it's not important (right now) that my application be able to scale out.
Next, I wanted a good lightweight ORM. I know I'm doing basic web scraping here, and this probably seems like overkill, but I feel like the upfront work will save me some time later on. I decided to go with peewee. Seemed like it would be pretty simple to get running. And it was! I created my database, and added my list of companies.
Then I needed to do the actual scraping. I wrote up a simple script. You can check out the process I went through by looking at the history on the repo, but I ended up with something like this:
Now I've got some data! It's just a prototype for the scraper. It's nothing novel, but it's working. I started running it about 30 minutes ago, and I've got a little less than 5000 news articles. I can start work on the scraper for the Washington Post, but I'm more interested in getting to work on the sentiment analysis aspect of it.
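To give a feel for the storage side, here's a minimal sketch of a company/article schema. It uses Python's built-in `sqlite3` module instead of peewee so that it runs with no extra installs, and the table names, column names, and the `save_article` helper are assumptions for illustration rather than the actual Newtsocks models.

```python
import sqlite3

# In-memory database for the example; a real run would use a file path
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    symbol TEXT NOT NULL UNIQUE
);
CREATE TABLE article (
    id INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES company(id),
    source TEXT NOT NULL,
    title TEXT NOT NULL,
    url TEXT NOT NULL UNIQUE,
    published TEXT
);
""")


def save_article(conn, company_id, source, title, url, published):
    """Insert an article; return False if its URL was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO article "
        "(company_id, source, title, url, published) VALUES (?, ?, ?, ?, ?)",
        (company_id, source, title, url, published),
    )
    conn.commit()
    return cur.rowcount == 1


conn.execute("INSERT INTO company (name, symbol) VALUES (?, ?)",
             ("Bank of America", "BAC"))
company_id = conn.execute(
    "SELECT id FROM company WHERE symbol = ?", ("BAC",)).fetchone()[0]

# Placeholder article data, not a real scraped row
first = save_article(conn, company_id, "Reuters", "Example headline",
                     "http://example.com/article-1", "2014-08-11")
again = save_article(conn, company_id, "Reuters", "Example headline",
                     "http://example.com/article-1", "2014-08-11")
print(first, again)
```

The `UNIQUE` constraint on `url` means re-running a scraper over the same search pages won't pile up duplicate rows.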
I would like to add, I spent a bit too much time trying to make my code pretty. I added my repo on landscape.io because I wanted to see how my code was ranked. Landscape seems to have some trouble loading peewee. My code came up with a lot of errors on their site, even though it runs fine. I may be doing something wrong, but based on Landscape's issue tracker, this may be fixed soon. I'll keep an eye on it.
For now, I'm pushing forward. Time for some opinion mining!
Wednesday, August 6, 2014
Data Mining News and the Stock Market Part 3 - Collecting Some Prices
This is the third part of my "Data Mining News and the Stock Market" post.
This one turned out to be super easy.
I found this website: http://www.eoddata.com/.
I had an account in a few minutes, and purchased all of the NYSE's end of day data from the past 5 years for $12.50. So far, that's the only money I've spent on this project, and it's well worth it. Since I can't get news data older than April, I really only need the last year. For $12, who cares? I'll take it!
I go to their download page, and this is what I see:
That's right. Less than 10 MB.
The zip files contain a list of text files for every day of 2014. The text files look like this:
I can definitely work with that.
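Loading files like these into Python shouldn't take much. Here's a rough sketch of a parser. The column order in `ASSUMED_COLUMNS` (symbol, date as YYYYMMDD, open, high, low, close, volume, with no header row) is an assumption that should be checked against the actual download, and the sample line is made up.

```python
import csv
import io
from datetime import datetime

# ASSUMPTION: one comma-separated record per line, in this column order
ASSUMED_COLUMNS = ("symbol", "date", "open", "high", "low", "close", "volume")


def parse_eod(text, columns=ASSUMED_COLUMNS):
    """Parse end-of-day price text into a list of dicts with typed values."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        row = dict(zip(columns, record))
        row["date"] = datetime.strptime(row["date"], "%Y%m%d").date()
        for field in ("open", "high", "low", "close"):
            row[field] = float(row[field])
        row["volume"] = int(row["volume"])
        rows.append(row)
    return rows


# Made-up sample line, not real price data
sample = "XYZ,20140102,10.00,10.50,9.75,10.25,123400\n"
rows = parse_eod(sample)
print(rows[0]["symbol"], rows[0]["date"], rows[0]["close"])
```

If the real files use a different layout, passing a different `columns` tuple is the only change needed, as long as the date format matches.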
Next up, I'm going to start work on my web scrapers. They're going to need a place to put all the articles I download. I haven't messed with any non-relational databases yet. This might be a good project to try one out.
Tuesday, August 5, 2014
Data Mining News and the Stock Market Part 2 - Picking Some Companies
This is the second part of my "Data Mining News and the Stock Market" post. I apologize for my stream of consciousness approach. I was writing this as I was doing it.
A quick Google search of NYSE top companies led me to this Wall Street Journal article: NYSE Most Active Stocks. Seems like the perfect place to start. Now, I said before that the number of companies that I pick is largely going to be determined by how easily I can get data, and how much data I can get. So I need to do a little bit of work now before I pick my companies. There are two things I need for each company: news articles and historical price data.
Collecting the News
While I was looking for a place to download the news, I stumbled across this academic paper from Columbia University: Snowball: Extracting Relations from Large Plain-Text Collections. I didn't read the paper, but I did see this:
"Our experiments use large collections of real newspapers from the North American News Text Corpus, available from LDC. This corpus includes articles from the Los Angeles Times, The Wall Street Journal, and The New York Times for 1994 to 1997."
The LDC is the Linguistic Data Consortium. They're a membership organization that basically gives data to research labs. As a student at a university, I had access to all of this data. As a guy sitting on his laptop, this is going to cost me some money, and I doubt I could get it anyway. I decided to look around their site to see if they had an updated version of the corpus. They don't. The latest news corpus I could see was from 2008, and it was actually just a parsing of the same corpus from 1997.
This may just be conjecture, but I feel, given the rise of the Internet, the relationship between news and the market has probably changed drastically over the past 10 years. So information from the 90's probably won't help me. Generally, it's best not to ignore data, but in some circumstances it can be justified. Reading through the LDC's site gave me a great list of news sources to look at:
- Washington Post
- New York Times
- Wall Street Journal
- Reuters News Service
Kimono
There's this pretty cool tool I heard about a while ago called Kimono. It lets you easily build out APIs for websites. Basically, it does all your web scraping for you. I'm not going to use it for my final project, but it should give me a pretty good starting point quickly.
First, I'm going to go through my news sources and search for one of the company names: "Bank of America". Then I'll see if I can build out an API for that company.
Washington Post
Had some trouble getting this to work. The individual pages are decent, but Kimono doesn't seem to be able to handle the pagination. So I can't get too much news. This will need a scraper.
Reuters News Service
Kimono worked great with Reuters. The site works exactly as advertised. In about 10 seconds, I built out my API. I even made a nice little mobile app for it here!
Actually, the mobile app stopped working after I let it crawl for a while. Too much data I guess. But still, pretty sweet.
I let the crawler go through 101 search pages and I got back 914 articles that go as far back as Apr 23, 2014. Actually, it got the title, URL, and date of publication for 914 news articles.
That's not too great. Also, the pagination gets a little weird on Reuters when you get around page 100. It keeps breaking Kimono.
It may be difficult to get data that goes too far back. I'll only be able to get a few months' worth of news on any given company based on my Reuters search. I was hoping to get at least the last few years. Given that there are almost 1000 articles about Bank of America in the past few months, maybe the last few years would be a bit much.
So, I'm going to stick with my 100 companies, and try to collect news as far back as I can. My guess is about 1000 articles per company per news source. With 2 news sources, 100 companies, and 1000 articles, that's 200,000 news articles. Shouldn't be too bad. Since I can't get news that goes too far back, getting price data shouldn't be too difficult.
Saturday, August 2, 2014
Data Mining News and the Stock Market
I've always wondered if there was a causal relationship between news and the stock market. If Apple's stock price drops 50%, you'll probably hear about it on the news, but if an article pops up online saying that Apple is using slave labor in China, what will happen to the stock price? It's a modern day chicken and egg problem. Which comes first? Granted, I may be oversimplifying things. The stock market is a very dynamic system, but I thought I'd do some good old-fashioned data mining and see how far I can get.
I want to make some things clear up front. I don't know that much about the stock market. I'm not a data mining, machine learning, natural language processing expert. I studied these things when I was in school, but I've slept since then. I have a pretty good foundation, but I've been doing .NET web development for the past few years. I'm nowhere near up to date on the latest tools, techniques, and practices.
I'm also well aware that this isn't necessarily a novel idea. As I've said before, I have a bad habit of testing things out before I google them. Besides, this is a great opportunity to test my limits, to see if I've still got it, and to finish a project.
My general hypothesis is that internet news articles about a company have an effect on that company's stock price. My slightly more testable hypothesis is that a company's stock price will fall soon after the publication of a negative internet news article related to that company. I'll also be testing the inverse: that the price rises when good news articles come out. You get the idea. Here's my basic plan:
Step 1: Pick some companies
I'm going to pick some big name companies on the NYSE. I'm limiting myself to just the one market for simplification. I'm not sure whether I'm going to stick to one particular market area. Should I only pick big tech companies, or should I pick big companies from several different market areas? The number of companies and areas that I pick will be determined by how easily I can collect data on them.
Step 2: Collect Prices
I'm going to need to collect as much pricing data as I can about these companies from the past 5-10 years. I'm not sure about the granularity of the data yet. I may just collect closing prices for a given day, or I may collect a lot more. I'm going to have to see what's out there.
Step 3: Collect News Articles
Collect news articles from a few different news sources about these companies. Not sure where I'm going to get the news. I imagine that this will have a large impact on my results. Hopefully there are some websites with APIs for this sort of thing, but a web scraper shouldn't be too difficult to script out. In the end, I need a dataset with the article's title, text, source, and date/time of publication. Also, to simplify things, I'm only going to look at news articles whose titles contain the name or stock symbol of the company. I should be able to work from there.
Step 4: Sentiment Analysis
This should be the fun part. I'm going to need to do some analysis on these news articles. Basically, I want to be able to measure public opinion of each of these companies over time. I'm going to need to see what's out there in terms of tools. I have some exposure to data/opinion mining, but it's been a few years, so I'm betting the tools that are out there now are a lot better than the ones I was working with. Hopefully, there are some tools out there that won't require any sort of annotation. It's really going to slow me down if I have to start reading these articles.
My main goal is to give each article a numerical score on how positive or negative it makes a company look. A news article titled "RadioShack Under $1: The Clock Is Ticking" would have a negative score. Let's say -10. What I'll do is keep a running total of this score. This way, I can measure public opinion of a company over time.
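The running total is simple enough to sketch now. Here's a minimal Python version using made-up dates and scores for a single company. The `(date, score)` tuple layout is just an assumption about how I might store things.

```python
from itertools import accumulate

# Made-up (publication date, sentiment score) pairs for one company
articles = [
    ("2014-08-01", -10),
    ("2014-08-03", 4),
    ("2014-08-02", -3),
]

# ISO-formatted dates sort correctly as plain strings
articles.sort(key=lambda article: article[0])

# Running total of sentiment: a crude "public opinion over time" series
running_total = list(accumulate(score for _, score in articles))

for (date, score), total in zip(articles, running_total):
    print(date, score, total)
```

One thing this makes obvious: the articles have to be in date order before accumulating, or the series is meaningless.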
Step 5: Compare Price to Public Opinion?
This is where I start to get a little hazy. I'm not quite sure how I'm going to be able to tell if public opinion is actually having an effect. Sure, I might be able to look at some graphs and say so, but there's probably some fancy math that I can't remember that can prove that there's a correlation. I'm going to need to do more research before I can actually say what I'm going to do here, but hopefully I get some fancy graphs out of it.
That's about it. My hope is that blogging will be enough motivation for me to finish the project. I'll try to keep this as up to date as possible, and put out as much code and data as I can.
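As a starting point for the "fancy math" in Step 5, one candidate is the Pearson correlation coefficient between a day's sentiment and the price change some number of days later. The sketch below is only one possible approach, not a settled plan. The helper names (`pearson`, `lagged_pairs`) and the idea of a one-day lag are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def lagged_pairs(sentiment, price_changes, lag=1):
    """Pair sentiment on day t with the price change on day t + lag."""
    return sentiment[:len(sentiment) - lag], price_changes[lag:]


# Toy check: a perfectly linear relationship gives a coefficient of 1
print(pearson([1, 2, 3], [2, 4, 6]))
```

A coefficient near 1 or -1 would suggest the two series move together, but it wouldn't prove that the news caused the price move, which is the chicken and egg problem all over again.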
Sunday, July 27, 2014
How Simulations Saved Me From Vegas
A few weeks ago, I came up with the same seemingly brilliant but foolish idea that a lot of hopeful gamblers have thought of. I had just returned from a mildly successful weekend Vegas trip, and I had gambling on my mind. In particular, I was thinking about games of pure luck. Games like Roulette or Casino War.
My first trip to a casino as an 18 year old was unsuccessful. My roommates and I had been practicing counting cards for weeks, and decided to try our luck. Turned out, it was a lot easier to remember card counts and strategy sitting in our living room. I've always enjoyed that small adrenaline rush that comes with making a bet, but it doesn't really help you when you're trying to do math in your head. We shifted our focus to poker, but that's a blog post for another time.
I wasn't planning on making money on my most recent trip. I was there for a friend's birthday. I didn't want to spend 8 hours sweating at a blackjack table. Also, I lost $100 on Blackjack during the first few hours I was there, so I was kind of done with that game. Instead, I wanted to play games of pure luck. It turned out to be strangely relaxing. When I got back, I started doing some math. Questions like "What's the expected value of a Casino War bet?" started popping up in my head. I've got a bad habit of trying to figure stuff out before I google it.
Then, I had an epiphany. What if I double my bet every time I lose? Here's what I came up with.
- Go to a table with $1270 ($10 + $20 + $40 + $80 + $160 + $320 + $640)
- If I win a bet, put that $10 aside.
- If I lose a bet, double it for the next round.
- Stop when my original $1270 is gone.
- Otherwise... profit.
I would have to lose 7 bets in a row to lose all my money. In roulette, if I bet on red, I've got about a 53% chance of losing. On any given betting round, that's about a 99% chance I win $10. It was foolproof. I wrote up some code to simulate it, and to see if I had finally beaten the casinos. Surprisingly, my simulations lost money. A lot of it.
I ran it by someone at work, who quickly pointed out that it was still a horrible bet, and that I wasn't the first to think of it. It's called a Martingale Betting System, and it doesn't work. Well, it does if you have an infinite amount of money, which is a problem for me.
First off, that 1% chance that I lose is still a big hit over the long run, because I'm losing $1270. That 99% chance that I win $10 doesn't cover me enough to give me a positive expected value.
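Here's the arithmetic spelled out as a short Python sketch. It assumes an American wheel (18 red, 18 black, 2 green pockets), which is where the roughly 53% chance of losing a bet on red comes from.

```python
# ASSUMPTION: American roulette, 38 pockets, 18 of them red
P_LOSE = 20 / 38

# The doubling sequence: $10, $20, ... up to $640
bets = [10 * 2 ** i for i in range(7)]
bankroll = sum(bets)

# Probability of losing all seven bets in a row and going bust
p_bust = P_LOSE ** len(bets)

# Any win along the way nets exactly the base bet of $10
expected_value = (1 - p_bust) * bets[0] - p_bust * bankroll

print(f"bankroll: ${bankroll}")
print(f"chance of busting in one round: {p_bust:.2%}")
print(f"expected value per round: ${expected_value:.2f}")
```

The chance of busting works out to a little over 1%, and the expected value per round comes out to a loss of a bit more than $4. So on average I'm paying the casino for the privilege of usually winning $10.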
Still, I had some code that made some fancy charts, and I wanted to show it off.
The results of the simulations were still kinda cool. I made 3 gamblers. They would all win and lose at the same times.
- A flat better who always bet the same amount
- A martingale better with no betting restrictions
- A martingale better with a maximum bet
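The three gamblers can be sketched in a few lines of Python. This is a simplified stand-in rather than the code that produced my charts, and it makes some assumptions: a $10 base bet, a $500 table maximum, an American wheel, and a limited gambler who keeps betting the maximum until he wins and then resets.

```python
import random


def simulate(n_bets, base_bet=10, max_bet=500, p_win=18 / 38, seed=0):
    """Run three gamblers through the same sequence of wins and losses.

    Returns a list with one dict of running totals per bet.
    """
    rng = random.Random(seed)
    totals = {"flat": 0, "martingale": 0, "limited": 0}
    bet = {"martingale": base_bet, "limited": base_bet}
    history = []
    for _ in range(n_bets):
        won = rng.random() < p_win  # one spin shared by all three gamblers
        totals["flat"] += base_bet if won else -base_bet
        for name in ("martingale", "limited"):
            totals[name] += bet[name] if won else -bet[name]
            # Reset to the base bet after a win, double after a loss
            bet[name] = base_bet if won else bet[name] * 2
        # The limited gambler can never bet more than the table maximum
        bet["limited"] = min(bet["limited"], max_bet)
        history.append(dict(totals))
    return history


history = simulate(10000)
print(history[-1])
```

The two martingale gamblers stay identical until a losing streak pushes the doubled bet past the maximum. After that the limited one can no longer win back his whole loss in a single bet.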
After 15 bets, the martingale better was looking pretty good. I hadn't come anywhere close to the $500 limit, so the 2 martingale betters were doing the same. So, let's see what happens after 100 bets.
Still not too bad. There are a few spots where the martingale betters have to make some pretty big bets, and it takes a bit for the limited one to catch back up, but they still did way better than the flat better. Let's do this with 10000 bets...
The unlimited martingale better did great! But the more realistic limited martingale better did crazy bad. He can't make up for his losses fast enough. He just loses money faster than the flat better.
So... What did I learn? Casinos are smart. Maximum bets exist for a reason. Next time, I'm going to do some googling before I get too excited about something.
I posted the code I used to make the graphs on Github here. Feel free to take a look. Don't judge me too harshly, I usually don't do too much in Python.
Friday, July 25, 2014
How I Won the Company Costume Contest: Using CURL to Submit HTTP Forms
The Problem
It was Halloween, and there was a costume contest at the office. I had a beard, and I love Bill Murray, so I threw on an orange ski hat, cheap aviators, and a blue shirt. I was Steve Zissou. It was pretty glorious, but my coworkers failed to recognize me. Everyone thought I was supposed to be the Unabomber. I was not amused.
The people who dressed in costumes all got together for pictures. Someone made up a quick Google survey and sent out a link to vote on your favorite costume. After I made my decision, I noticed I could submit more than one vote. Classic mistake.
The survey was a simple form. All I needed to do was submit a bunch of these forms, with my name selected.
The Solution
After a bit of googling, I came across a program called Curl. It’s a bit like wget, if you’re familiar. Curl is a client that can send data to (or receive data from) a server. The command is designed to work without user interaction. Curl can do a lot of stuff, but all I needed to do was make some HTTP POSTs.
I opened up the developer tools in Chrome (F12), clicked the network tab, started logging, and submitted the form.
This gave me all the information I needed.
I opened up an editor, and wrote something along the lines of this:
curl -v --cookie "COOKIE_COPIED_FROM_CHROME" --data "FORM_DATA_COPIED_FROM_CHROME" https://docs.google.com/url/copied/from/chrome
I saved that script in a file called “vote”. I wanted to repeat my form submission a lot, so I ran something like this in a Linux shell:
for i in {1..1337}; do ./vote; sleep 1; done
i.e. vote for me once a second, 1337 times.
The Results
Needless to say, I dominated the competition. Unfortunately, our company has fewer than 200 employees, so my landslide victory was a bit suspicious. If I had known there was going to be a $100 prize, I might have been a bit more subtle.
All in all, Curl was a fast, free, easy-to-use scripting tool. I don't think I've ever had that much entertainment from a ten minute script.