
Ben McCann

Co-founder at Connectifier.
ex-Googler. CMU alum.


Setting up the RockMongo GUI on Ubuntu

04/17/2012

The easiest way to get started is to install Apache and PHP:

$ sudo apt-get install apache2 php5 php-pear

If another server is already running on port 80, change the port Apache listens on by editing /etc/apache2/ports.conf.

You’ll need to install the PHP Mongo connector:

sudo pecl install mongo

Add “extension=mongo.so” to the “Dynamic Extensions” section of /etc/php5/apache2/php.ini and restart Apache with sudo service apache2 restart.
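If you'd rather script the php.ini change, here is a minimal Python sketch. The helper name enable_extension is made up for illustration, and for simplicity it appends the line at the end of the text rather than placing it in the “Dynamic Extensions” section, so treat it as a starting point rather than a drop-in tool.

```python
def enable_extension(ini_text, extension="mongo.so"):
    """Return ini_text with an 'extension=<name>' line, adding it if absent.

    Hypothetical helper: commented-out lines (starting with ';') do not
    count as enabled, and the new line is appended at the end.
    """
    wanted = "extension=" + extension
    for line in ini_text.splitlines():
        if line.strip() == wanted:
            return ini_text  # already enabled, nothing to do
    if ini_text and not ini_text.endswith("\n"):
        ini_text += "\n"
    return ini_text + wanted + "\n"
```

You would read /etc/php5/apache2/php.ini, pass its contents through the function, write the result back as root, and then restart Apache as described above.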

Download the latest RockMongo and unzip it under /var/www. You should now be able to log in with the default username and password of admin/admin.

Brewer’s CAP Theorem Explained

03/24/2012

When dealing with distributed systems, Brewer’s CAP theorem is often brought up when discussing how a system will behave in certain error conditions. The CAP theorem means that you can only have two of: consistency, availability, and partition tolerance.

Here’s what you’ll be giving up for each of the three that you may sacrifice:

  • C: If you give up consistency, two different machines may return different responses for the same query.
  • A: If you give up availability, some requests will not be answered if there’s a network problem.
  • P: If you give up partition tolerance, some requests will not return as long as there’s a network problem.

Web developers never want to give up P since having a request hang when there’s a network problem is worse than having it fail.  Thus the real choice in this context is between C and A.  As a web developer, CAP means you must choose between a site that never goes down but may return stale data, and a site that never returns stale data but goes down if there’s a problem.  A bank website would choose consistency over availability: getting the balance in someone’s account wrong is worse than having the site be down.  Google chose availability, which is why you almost never see it go down.  The tradeoff is that it may be looking at a slightly stale version of the index when ranking some queries.
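To make the C-versus-A choice concrete, here is a toy Python sketch. It is not a real replication protocol: the Cluster class, its "CP"/"AP" modes, and the Unavailable exception are all invented for illustration. Writes go to one replica and reads come from the other, and a flag simulates the network partition.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk inconsistency."""


class Cluster:
    """Two replicas of one key/value store; writes go to 'a', reads to 'b'."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = {}
        self.b = {}
        self.partitioned = False  # True while a and b cannot talk

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # Cannot replicate, so refuse the write to stay consistent.
            raise Unavailable("write rejected during partition")
        self.a[key] = value
        if not self.partitioned:
            self.b[key] = value  # replication only succeeds when connected

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("read rejected during partition")
        return self.b.get(key)  # may be stale in AP mode


def demo(mode):
    """Write, partition the network, write again, then read from replica b."""
    cluster = Cluster(mode)
    cluster.write("balance", 100)
    cluster.partitioned = True
    try:
        cluster.write("balance", 50)
        return cluster.read("balance")
    except Unavailable:
        return "unavailable"
```

In this sketch demo("AP") answers with the old balance (the site stays up but serves stale data), while demo("CP") reports "unavailable" (the bank-style choice).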

Installing Oracle Java JDK on Ubuntu

03/23/2012

Due to licensing restrictions, Ubuntu no longer ships Oracle’s JDK. You can install it by running:

sudo mkdir -p /opt/java/64
cd /opt/java/64
sudo wget http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-x64.bin
sudo chmod 755 ./jdk-6u31-linux-x64.bin
sudo ./jdk-6u31-linux-x64.bin
sudo rm ./jdk-6u31-linux-x64.bin
sudo update-alternatives --install /usr/bin/java java /opt/java/64/jdk1.6.0_31/bin/java 2000 \
    --slave /usr/bin/javac javac /opt/java/64/jdk1.6.0_31/bin/javac \
    --slave /usr/bin/javadoc javadoc /opt/java/64/jdk1.6.0_31/bin/javadoc \
    --slave /usr/bin/javah javah /opt/java/64/jdk1.6.0_31/bin/javah \
    --slave /usr/bin/javap javap /opt/java/64/jdk1.6.0_31/bin/javap
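The update-alternatives call above repeats the JDK path once per tool, which is easy to get wrong when the version changes. As a sketch, the same command can be assembled in Python. The function name and its defaults are mine, and it only builds the argument list rather than running anything.

```python
import shlex


def update_alternatives_command(jdk_home,
                                tools=("javac", "javadoc", "javah", "javap"),
                                priority=2000):
    """Build the update-alternatives argv registering java plus slave tools."""
    argv = ["sudo", "update-alternatives", "--install",
            "/usr/bin/java", "java", jdk_home + "/bin/java", str(priority)]
    for tool in tools:
        # Each slave link follows the master 'java' alternative.
        argv += ["--slave", "/usr/bin/" + tool, tool,
                 jdk_home + "/bin/" + tool]
    return argv


command = update_alternatives_command("/opt/java/64/jdk1.6.0_31")
print(" ".join(shlex.quote(arg) for arg in command))
```

Printing the command lets you check it before pasting it into a shell.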

To verify:

$ java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

Using Python’s Pandas inside IPython Notebook

02/14/2012

IPython is a cool shell to run Python from and Pandas is a Python library for holding tabular data similar to R’s data frame.

To install the software run:

sudo apt-get install libzmq-dev python-dev g++ libfreetype6-dev libpng12-dev libblas-dev liblapack-dev gfortran cython libhdf5-serial-dev
sudo pip install ipython
sudo pip install tornado
sudo pip install pyzmq
sudo pip install pygments
sudo pip install numpy
sudo pip install matplotlib
sudo pip install scipy
sudo pip install patsy
sudo pip install statsmodels
sudo pip install pandas
sudo pip install pytz
sudo pip install numexpr
sudo pip install tables

To start the IPython notebook, run:

ipython notebook --pylab inline

As an example, you can run the following code in the IPython web notebook to draw a chart of the S&P 500:

import datetime
import matplotlib.pyplot as plt
from pandas.io.data import DataReader

sp500 = DataReader("^GSPC", "yahoo", start=datetime.datetime(2000, 1, 1)) # returns a DataFrame
top = plt.subplot2grid((3,1), (0, 0), rowspan=2)
top.plot(sp500.index, sp500["Adj Close"])
bottom = plt.subplot2grid((3,1), (2,0))
bottom.bar(sp500.index, sp500.Volume)
plt.gcf().set_size_inches(18,8)

Migrating from MySQL to Percona Server

12/11/2011

Percona Server is just MySQL with a few extra options added in by Percona. It’s backwards compatible and based off the same code base. If you’re not familiar with Percona, they are the world’s leading MySQL consultants. The main reason I switched is because Ubuntu uses an old version of MySQL. Ubuntu is about a year behind in packaging MySQL. Something to do with checking the copyright after Oracle got ahold of it. This seemed to be the easiest way to update. A few other reasons follow.

Everyone and their mom says xtraBackup is the way to go for MySQL backups. Even Facebook uses it. xtraBackup is an open source project made by Percona, and it’s available in the Percona apt repositories. mysqldump is fine for small projects, but it doesn’t scale well once you have any real amount of data.

By default, older versions of MySQL use the MyISAM storage engine, which has fallen out of favor. The default in newer MySQL installs is InnoDB. Percona also makes a storage engine called XtraDB, which is backwards compatible with InnoDB and supposedly a bit more performant. MariaDB (a MySQL fork maintained by the MySQL creator) uses it as its default as well. It sounds like most people don’t notice a huge difference between XtraDB and InnoDB, but both are much favored over MyISAM, which caused lots of problems for people.

Finally, there’s also HandlerSocket, which is a plugin for MySQL. It allows you to do primary key lookups directly against the storage engine, bypassing MySQL’s SQL layer. It’s supposed to be 5-10x faster because it doesn’t have to parse the SQL and do table locking. It turns MySQL into a key/value store as good as any of the NoSQL solutions. It’s actually much better because you can still run SQL queries on your data, which you can’t do with most of the NoSQL solutions, and you get MySQL’s replication etc., which is all very well documented. As long as your DB can fit in RAM on a single machine it makes MySQL much faster. Perhaps even faster and easier to use than memcached.

To migrate, first create a backup:

mysqldump -uroot -p --all-databases > dump.sql

Then do the upgrade:

gpg --keyserver  hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
gpg -a --export CD2EFD2A | sudo apt-key add -
sudo emacs /etc/apt/sources.list
Add:
    ## Percona repository
    deb http://repo.percona.com/apt oneiric main
    deb-src http://repo.percona.com/apt oneiric main
sudo apt-get update
sudo apt-get install percona-server-server-5.5
sudo apt-get autoremove

Running Ubuntu on VirtualBox

11/30/2011

I had to figure out a few things to get Ubuntu installed and working well on VirtualBox.

I had to enable virtualization technologies in my BIOS. I have a Lenovo T520 and did this by pressing F1 during startup and then going to Security > Virtualization. If I did not do this then I would receive the error “VT-x features locked or unavailable in MSR” when trying to run with more than 1 CPU or 3584 MB of RAM. Don’t forget to increase the VirtualBox settings to use more RAM and CPUs after updating this.

The default disk is an 8GB dynamically expanding VDI. You may want to consider changing the default from 8GB to something more like 100GB. This is the max size only and will not be used unless needed.

I had to check “Enable 3D Acceleration” under the “Display” settings in order to get Ubuntu Unity to work.

I had to run “sudo apt-get install dkms” before installing the VirtualBox Guest Additions to get them to work.

To get USB 2.0 devices to pass through (necessary for Android development), you’ll need to download and install the extension pack. Make sure you’re on the latest version of VirtualBox, then right click the icon and choose “Run as administrator”, followed by “Preferences” -> “Extensions”, and choose and install the downloaded extension pack.

Finally, I remapped the host key. By default all kinds of weird things happen when you use the right Ctrl button. This can be fixed by going to File > Preferences… > Input and then setting Host Key to something you never use like Pause.

SSL on localhost with nginx

11/14/2011

Install nginx if it’s not already installed:

sudo apt-get install nginx

You must have the SSL module installed. The nginx docs say this is not standard. However, it does come installed on Ubuntu. You can verify by running nginx -V and looking for --with-http_ssl_module.

Next up is generating the SSL certs. Follow the Slicehost docs for this step.
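The Slicehost docs aren’t reproduced here. As one possible alternative for local testing (my assumption, not necessarily what those docs describe), a self-signed certificate can be generated with openssl req. The Python sketch below only builds the command; the function name is invented, and the paths match the ones used in the nginx config in this post.

```python
import shlex


def self_signed_cert_command(common_name, cert_path, key_path, days=365):
    """Build an 'openssl req' argv for a self-signed cert (local use only)."""
    return ["sudo", "openssl", "req", "-x509", "-nodes",
            "-days", str(days),
            "-newkey", "rsa:2048",
            "-keyout", key_path,
            "-out", cert_path,
            "-subj", "/CN=" + common_name]


cert_command = self_signed_cert_command(
    "local.yourdomain.com",
    "/etc/ssl/certs/myssl.crt",
    "/etc/ssl/private/myssl.key")
print(" ".join(shlex.quote(arg) for arg in cert_command))
```

Browsers will warn about a self-signed certificate, which is usually acceptable on localhost.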

Now you’ll need to update your /etc/nginx/nginx.conf file:

  upstream backend {
    server 127.0.0.1:9000;
  }

  server {
    server_name www.yourdomain.com yourdomain.com;
    rewrite ^(.*) https://www.yourdomain.com$1 permanent;
  }

  server {
    server_name local.yourdomain.com;
    rewrite ^(.*) https://local.yourdomain.com$1 permanent;
  }

  server {
    listen               443;
    ssl                  on;
    ssl_certificate      /etc/ssl/certs/myssl.crt;
    ssl_certificate_key  /etc/ssl/private/myssl.key;
    keepalive_timeout    70;
    server_name www.yourdomain.com local.yourdomain.com;
    location / {
      proxy_pass  http://backend;
    }
  }

Then reload the nginx configuration:

sudo nginx -s reload

Finally, in /etc/hosts put:

127.0.0.1   local.yourdomain.com

This will allow you to visit https://local.yourdomain.com/ which will be served up by the server that you have running on port 9000 (the upstream backend in the config above).
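If you don’t have an application server handy, a throwaway backend is enough to check that the proxying works. This is a minimal sketch using only Python’s standard library; the handler and function names are mine, and it listens on 127.0.0.1:9000 by default, the upstream address from the nginx config above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"hello from the backend\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_backend(port=9000):
    """Start the stand-in backend on a daemon thread and return the server."""
    server = HTTPServer(("127.0.0.1", port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call start_backend() from an interactive Python session, then load https://local.yourdomain.com/ in a browser; call shutdown() on the returned server when you’re done.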

Embedded Tomcat

08/28/2011

Earlier in the year, I posted a quick writeup on how to run an embedded Jetty instance. Today, I’m posting basically the same code showing how to run an embedded Tomcat instance. The embedded Tomcat API is much nicer since it closely matches the web.xml syntax. However, the embedded Tomcat instance takes much longer to start up.

package com.benmccann.webtemplate.frontend.server;

import java.net.URL;

import org.apache.catalina.Context;
import org.apache.catalina.core.AprLifecycleListener;
import org.apache.catalina.core.StandardServer;
import org.apache.catalina.deploy.FilterDef;
import org.apache.catalina.deploy.FilterMap;
import org.apache.catalina.startup.Tomcat;
import org.apache.struts2.dispatcher.ng.filter.StrutsPrepareAndExecuteFilter;

import com.beust.jcommander.JCommander;
import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceFilter;

/**
 * @author Ben McCann (benmccann.com)
 */
public class WebServer {

  private final FrontendSettings webServerSettings;
  private final GuiceListener guiceListener;
  private final Tomcat tomcat;

  @Inject
  public WebServer(
      FrontendSettings webServerSettings,
      GuiceListener guiceListener) {
    this.webServerSettings = webServerSettings;
    this.guiceListener = guiceListener;
    this.tomcat = new Tomcat();
  }

  private FilterDef createFilterDef(String filterName, String filterClass) {
    FilterDef filterDef = new FilterDef();
    filterDef.setFilterName(filterName);
    filterDef.setFilterClass(filterClass);
    return filterDef;
  }
  
  private FilterMap createFilterMap(String filterName, String urlPattern) {
    FilterMap filterMap = new FilterMap();
    filterMap.setFilterName(filterName);
    filterMap.addURLPattern(urlPattern);
    return filterMap;
  }
  
  public void run() throws Exception {
    String appBase = ".";
    tomcat.setPort(webServerSettings.getPort());

    tomcat.setBaseDir("webapp");
    tomcat.getHost().setAppBase(appBase);

    String contextPath = "/";

    // Add AprLifecycleListener to give native speed boost
    // sudo apt-get install libtcnative-1
    StandardServer server = (StandardServer)tomcat.getServer();
    AprLifecycleListener listener = new AprLifecycleListener();
    server.addLifecycleListener(listener);

    Context context = tomcat.addWebapp(contextPath, appBase);
    context.addFilterDef(createFilterDef("guice", GuiceFilter.class.getName()));
    FilterDef struts2FilterDef = createFilterDef("struts2",
        StrutsPrepareAndExecuteFilter.class.getName());
    struts2FilterDef.addInitParameter("struts.devMode",
        Boolean.toString(webServerSettings.isDevModeEnabled()));
    context.addFilterDef(struts2FilterDef);
    context.addFilterMap(createFilterMap("guice", "/*"));
    context.addFilterMap(createFilterMap("struts2", "/*"));
    
    tomcat.start();
    tomcat.getServer().await();
  }

  public static void main(String[] args) throws Exception {
    FrontendSettings webServerSettings = new FrontendSettings();
    new JCommander(webServerSettings, args);
    
    // Build the injector from the module so its bindings are actually used.
    Injector injector = Guice.createInjector(
        new FrontendModule(webServerSettings));
    
    WebServer server = injector.getInstance(WebServer.class);
    server.run();
  }

}

Installing CUDA 5.0 and Theano on Ubuntu 12.04 Precise

07/09/2011

Theano is a very interesting Python library developed mainly for deep learning, which can run calculations on some NVIDIA GPUs by using the CUDA library.  Setting up Theano to use the GPU can be a little tricky and take a bit of work.

Install the pre-reqs

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Next, create a symlink to libglut, which will allow you to install the CUDA samples as described on Utkarsh Jaiswal’s blog:

sudo ln -s /usr/lib/x86_64-linux-gnu/libglut.so.3 /usr/lib/libglut.so

Install CUDA
Download CUDA from the NVIDIA site and then install it:

sudo apt-get remove --purge nvidia*
chmod +x cuda_5.0.35_linux_64_ubuntu11.10-1.run
sudo service lightdm stop
sudo ./cuda_5.0.35_linux_64_ubuntu11.10-1.run

Install Theano

Get the latest released version of Theano:

sudo apt-get install python-dev libopenblas-dev liblapack-dev gfortran
sudo pip install --upgrade Theano

Create a ~/.theanorc file to enable the GPU:

[global]
floatX = float32
device = gpu
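The file is plain INI syntax, so you can sanity-check it before starting Theano. Here is a small sketch using Python’s standard configparser; the function name is mine, and it checks only that the text parses, not that Theano accepts the values.

```python
import configparser

THEANORC = """\
[global]
floatX = float32
device = gpu
"""


def read_theano_settings(text):
    """Parse .theanorc-style text and return the [global] section as a dict."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names like floatX case-sensitive
    parser.read_string(text)
    return dict(parser["global"])


settings = read_theano_settings(THEANORC)
print(settings)
```

To check your real file, pass in the contents of ~/.theanorc instead of the THEANORC string.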

Test it out

Now run the sample program under “Testing Theano with GPU” in the Theano tutorial. It will hopefully tell you that it used your GPU.

A good benchmark to test out the speed of your setup is to run /usr/local/lib/python2.7/dist-packages/theano/misc/check_blas.py

Credits

Thanks to the Theano developers for providing this awesome library and to Andrew Ng, Samy Bengio, and the other Googlers who have been taking their time to teach the rest of us more machine learning concepts.

Getting started with Git

04/11/2011

I’ve recently started using Git, which I’ve found I much prefer to Subversion for two reasons. The first is that it’s really fast since almost all commands are run locally. The second reason is that Subversion litters your source code with .svn directories and should you accidentally delete or move one then you’re in for a world of hurt. Git also handles ignored files in a much easier manner.

There are two downsides with Git. The first is that there’s no central server to store the code base. GitHub or BitBucket can fulfill this role if you don’t mind someone else hosting your source code. If you want to set up a central server yourself it seems the best solution is gitolite. The documentation isn’t for beginners, but I found a decent tutorial on setting up gitolite.

The other downside with git is that the commands can be a bit bizarre.

git aliases

You can set aliases using git config --global.  E.g. git config --global alias.dt "difftool --no-prompt" makes git dt act the same as git difftool --no-prompt. These aliases are saved in ~/.gitconfig. My ~/.gitconfig looks like:

[user]
	name = Ben McCann
	email = ben@benmccann.com
[alias]
	cam = commit -am
	dt = difftool --no-prompt
	dtm = !meld .
	pending = !clear && git status
	rev = checkout --
	revall = reset --hard HEAD
[push]
	default = current
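If you want to recreate aliases like these on another machine without copying the file, the matching git config --global commands can be generated. This is a sketch in Python; the function name is mine, and it only prints the commands, with a subset of the aliases above as sample input.

```python
import shlex

ALIASES = {
    "cam": "commit -am",
    "dt": "difftool --no-prompt",
    "rev": "checkout --",
    "revall": "reset --hard HEAD",
}


def alias_commands(aliases):
    """Return one 'git config --global alias.NAME EXPANSION' line per alias."""
    return [
        "git config --global alias.{} {}".format(name, shlex.quote(expansion))
        for name, expansion in aliases.items()
    ]


for line in alias_commands(ALIASES):
    print(line)
```

Each printed line can be pasted into a shell; shlex.quote handles the quoting of expansions that contain spaces.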

Reverting to a previous version

$ git reset --hard YOUR_CHANGESET_HERE
$ git reset --soft @{1}
$ git commit -a