Ben McCann

Co-founder at Connectifier.
ex-Googler. CMU alum.

Embedded Tomcat

08/28/2011

Earlier in the year, I posted a quick writeup on how to run an embedded Jetty instance. Today, I’m posting basically the same code showing how to run an embedded Tomcat instance. The embedded Tomcat API is much nicer since it closely matches the web.xml syntax. However, the embedded Tomcat instance takes much longer to start up.

package com.benmccann.webtemplate.frontend.server;

import java.net.URL;

import org.apache.catalina.Context;
import org.apache.catalina.core.AprLifecycleListener;
import org.apache.catalina.core.StandardServer;
import org.apache.catalina.deploy.FilterDef;
import org.apache.catalina.deploy.FilterMap;
import org.apache.catalina.startup.Tomcat;
import org.apache.struts2.dispatcher.ng.filter.StrutsPrepareAndExecuteFilter;

import com.beust.jcommander.JCommander;
import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceFilter;

/**
 * @author Ben McCann (benmccann.com)
 */
public class WebServer {

  private final FrontendSettings webServerSettings;
  private final GuiceListener guiceListener;
  private final Tomcat tomcat;

  @Inject
  public WebServer(
      FrontendSettings webServerSettings,
      GuiceListener guiceListener) {
    this.webServerSettings = webServerSettings;
    this.guiceListener = guiceListener;
    this.tomcat = new Tomcat();
  }

  private FilterDef createFilterDef(String filterName, String filterClass) {
    FilterDef filterDef = new FilterDef();
    filterDef.setFilterName(filterName);
    filterDef.setFilterClass(filterClass);
    return filterDef;
  }
  
  private FilterMap createFilterMap(String filterName, String urlPattern) {
    FilterMap filterMap = new FilterMap();
    filterMap.setFilterName(filterName);
    filterMap.addURLPattern(urlPattern);
    return filterMap;
  }
  
  public void run() throws Exception {
    String appBase = ".";
    tomcat.setPort(webServerSettings.getPort());

    tomcat.setBaseDir("webapp");
    tomcat.getHost().setAppBase(appBase);

    String contextPath = "/";

    // Add AprLifecycleListener to give native speed boost
    // sudo apt-get install libtcnative-1
    StandardServer server = (StandardServer)tomcat.getServer();
    AprLifecycleListener listener = new AprLifecycleListener();
    server.addLifecycleListener(listener);

    Context context = tomcat.addWebapp(contextPath, appBase);
    context.addFilterDef(createFilterDef("guice", GuiceFilter.class.getName()));
    FilterDef struts2FilterDef = createFilterDef("struts2",
        StrutsPrepareAndExecuteFilter.class.getName());
    struts2FilterDef.addInitParameter("struts.devMode",
        Boolean.toString(webServerSettings.isDevModeEnabled()));
    context.addFilterDef(struts2FilterDef);
    context.addFilterMap(createFilterMap("guice", "/*"));
    context.addFilterMap(createFilterMap("struts2", "/*"));
    
    tomcat.start();
    tomcat.getServer().await();
  }

  public static void main(String[] args) throws Exception {
    FrontendSettings webServerSettings = new FrontendSettings();
    new JCommander(webServerSettings, args);
    
    // Build the injector from the module that holds the parsed settings.
    Injector injector =
        Guice.createInjector(new FrontendModule(webServerSettings));
    
    WebServer server = injector.getInstance(WebServer.class);
    server.run();
  }

}

Installing CUDA 5.0 and Theano on Ubuntu 12.04 Precise

07/09/2011

Theano is a very interesting Python library developed mainly for deep learning, which can run calculations on some NVIDIA GPUs by using the CUDA library. Setting up Theano to use the GPU can be a little tricky.

Install the pre-reqs

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Next, create a symlink to libglut, which will allow you to install the CUDA samples as described on Utkarsh Jaiswal’s blog.

sudo ln -s /usr/lib/x86_64-linux-gnu/libglut.so.3 /usr/lib/libglut.so

Install CUDA
Download CUDA from the NVIDIA site and then install it:

sudo apt-get remove --purge 'nvidia*'
chmod +x cuda_5.0.35_linux_64_ubuntu11.10-1.run
sudo service lightdm stop
sudo ./cuda_5.0.35_linux_64_ubuntu11.10-1.run

Install Theano

Get the latest released version of Theano:

sudo apt-get install python-dev libopenblas-dev liblapack-dev gfortran
sudo pip install --upgrade Theano

Create a ~/.theanorc file to enable the GPU:

[global]
floatX = float32
device = gpu

Test it out

Now run the sample program under “Testing Theano with GPU” in the Theano tutorial. It will hopefully tell you that it used your GPU.

A good benchmark to test out the speed of your setup is to run /usr/local/lib/python2.7/dist-packages/theano/misc/check_blas.py

Credits

Thanks to the Theano developers for providing this awesome library and to Andrew Ng, Samy Bengio, and the other Googlers who have been taking their time to teach the rest of us more machine learning concepts.

Getting started with Git

04/11/2011

I’ve recently started using Git, which I’ve found I much prefer to Subversion for two reasons. The first is that it’s really fast since almost all commands are run locally. The second reason is that Subversion litters your source code with .svn directories and should you accidentally delete or move one then you’re in for a world of hurt. Git also handles ignored files in a much easier manner.

There are two downsides with Git. The first is that there’s no central server to store the code base. GitHub or BitBucket can fulfill this role if you don’t mind someone else hosting your source code. If you want to set up a central server yourself it seems the best solution is gitolite. The documentation isn’t for beginners, but I found a decent tutorial on setting up gitolite.

The other downside with git is that the commands can be a bit bizarre.

git aliases

You can set aliases using git config --global.  E.g. git config --global alias.dt "difftool --no-prompt" makes git dt act the same as git difftool --no-prompt. These aliases are saved in ~/.gitconfig. My ~/.gitconfig looks like:

[user]
	name = Ben McCann
	email = ben@benmccann.com
[alias]
	cam = commit -am
	dt = difftool --no-prompt
	dtm = !meld .
	pending = !clear && git status
	rev = checkout --
	revall = reset --hard HEAD
[push]
	default = current

Reverting to a previous version

$ git reset --hard YOUR_CHANGESET_HERE
$ git reset --soft @{1}
$ git commit -a

Sed Cookbook

03/31/2011

The Linux sed command is a stream editor. What that means is basically that you can do a regex operation on each line of a file or a piped stream. You can also use Perl in a sed-like way.

By default, sed uses basic regular expressions rather than the extended regex syntax (with GNU sed, the -r or -E option switches to extended syntax). Sed regex reminders for the default syntax:

  • You need a backslash before parens in a regex grouping
  • You refer to matched regex groups using \1, \2, etc.
  • The + regex operator does not work unescaped (GNU sed accepts \+ instead)
  • Non-greedy quantifiers don’t work.  For example, .*? will not work
  • The output is printed to standard out by default.  You need the -i option if you want to edit a file with sed.

Remove all but the first column in a .tsv stream

sed 's/\([^\t]*\).*/\1/'
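
For comparison, the same substitution can be written in Java's regex flavor, which may help when moving between the two. This is only an illustrative sketch and not part of the original cookbook: the class and method names (FirstColumn, firstColumn) are made up. In Java the grouping parens take no backslash, and the replacement refers to the first group as $1 rather than \1.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not from the original post.
class FirstColumn {

  // Java's regex flavor: unescaped parens form a group, and the
  // replacement string refers to the first group as $1 (sed uses \1).
  private static final Pattern FIRST_COLUMN = Pattern.compile("([^\\t]*).*");

  // Keeps everything before the first tab and drops the rest of the line.
  static String firstColumn(String line) {
    return FIRST_COLUMN.matcher(line).replaceFirst("$1");
  }

  public static void main(String[] args) {
    System.out.println(firstColumn("alpha\tbeta\tgamma"));
  }
}
```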

Edit a .tsv file by removing all but the first column

sed -i 's/\([^\t]*\).*/\1/'

Remove the first line of a stream

sed '1d'

Strip trailing whitespace from a file

sed -i -e 's/[[:space:]]*$//'

Recursively replace tabs with spaces

grep -Plr '\t' src/ | xargs sed -i 's/\t/  /g'

Replace @inheritDoc with @override after marking for edit

grep -l -r @inheritDoc java/com/benmccann | xargs p4 edit
grep -l -r @inheritDoc java/com/benmccann | xargs sed -i 's/\(.*\)@inheritDoc/\1@override/'

Replace @inheritDoc with @override in JS files after marking for edit

find java/com/benmccann -name '*.js' -print0 | xargs -0 grep -l @inheritDoc | xargs p4 edit
find java/com/benmccann -name '*.js' -print0 | xargs -0 grep -l @inheritDoc | xargs sed -i 's/\(.*\)@inheritDoc/\1@override/'

Using the Guice Struts 2 plugin

03/29/2011

Guice 3.0 was released a few days ago!  One of the easiest ways to use it in your web server is to use Struts 2 with the Struts 2 plugin, which is available in the central Maven repository.

This tutorial assumes familiarity with Guice and Struts 2.

In order to use the plugin, your injector must be created with a Struts2GuicePluginModule:

Injector injector = Guice.createInjector(
    new com.google.inject.servlet.ServletModule(),
    new com.google.inject.struts2.Struts2GuicePluginModule(),
    new MyModule());

You must then define a GuiceServletContextListener to provide the injector to the Struts 2 plugin. I injected the Injector because I’m using embedded Jetty. However, if you’re using a standard servlet container, you’d probably just create the injector in the class itself.

package com.benmccann.example;

import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceServletContextListener;

/**
 * @author benmccann.com
 */
public class GuiceListener extends GuiceServletContextListener {

  private final Injector injector;

  @Inject
  public GuiceListener(Injector injector) {
    this.injector = injector;
  }

  @Override
  public Injector getInjector() {
    return injector;
  }

}

You must then wire it up in your web.xml:

  <listener>
    <listener-class>com.benmccann.example.GuiceListener</listener-class>
  </listener>  

  <filter>
    <filter-name>guice</filter-name>
    <filter-class>com.google.inject.servlet.GuiceFilter</filter-class>
  </filter>

  <filter-mapping>
    <filter-name>guice</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

There’s also an example in the Guice source code repository.

Enjoy!

Latent Dirichlet Allocation with Mallet

03/10/2011

We recently had a PhD candidate from UCI come in and speak to the AI club at Google Irvine about her research on Latent Dirichlet Allocation (LDA). LDA is a topic model that groups words into topics, where each article is comprised of a mixture of topics. I was interested to play around with this a bit, so I downloaded Mallet and wrote up some quick code to try making my own LDA model.

package com.benmccann.topicmodel;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.pipe.TokenSequenceLowercase;
import cc.mallet.pipe.TokenSequenceRemoveStopwords;
import cc.mallet.pipe.iterator.ArrayIterator;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.Alphabet;
import cc.mallet.types.IDSorter;
import cc.mallet.types.InstanceList;

import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;

public class Lda {

  @Inject private com.benmccann.topicmodel.TextProvider textProvider;

  InstanceList createInstanceList(List<String> texts) throws IOException {
    ArrayList<Pipe> pipes = new ArrayList<Pipe>();
    pipes.add(new CharSequence2TokenSequence());
    pipes.add(new TokenSequenceLowercase());
    pipes.add(new TokenSequenceRemoveStopwords());
    pipes.add(new TokenSequence2FeatureSequence());
    InstanceList instanceList = new InstanceList(new SerialPipes(pipes));
    instanceList.addThruPipe(new ArrayIterator(texts));
    return instanceList;
  }

  private ParallelTopicModel createNewModel() throws IOException {
    List<String> texts = textProvider.getTexts();
    InstanceList instanceList = createInstanceList(texts);
    int numTopics = instanceList.size() / 5;
    ParallelTopicModel model = new ParallelTopicModel(numTopics);
    model.addInstances(instanceList);
    model.estimate();
    return model;
  }

  ParallelTopicModel getOrCreateModel() throws Exception {
    return getOrCreateModel("model");
  }

  private ParallelTopicModel getOrCreateModel(String directoryPath)
      throws Exception {
    File directory = new File(directoryPath);
    if (!directory.exists()) {
      directory.mkdir();
    }
    File file = new File(directory, "mallet-lda.model");
    ParallelTopicModel model = null;
    if (!file.exists()) {
      model = createNewModel();
      model.write(file);
    } else {
      model = ParallelTopicModel.read(file);
    }
    return model;
  }

  public void printTopics() throws Exception {
    ParallelTopicModel model = getOrCreateModel();
    Alphabet alphabet = model.getAlphabet();
    for (TreeSet<IDSorter> set : model.getSortedWords()) {
      System.out.print("TOPIC: ");
      for (IDSorter s : set) {
        System.out.print(alphabet.lookupObject(s.getID()) + ", ");
      }
      System.out.println();
    }
  }

  public static void main(String[] args) throws Exception {
    Injector injector = Guice.createInjector();
    Lda lda = injector.getInstance(Lda.class);
    lda.printTopics();
  }

}

One of the things I found interesting was that you have to specify a number of topics. This is where the ‘art’ of machine learning comes in. With some training data this parameter could be tuned to perform better than my random guesses.
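
One small thing to watch with the guess in the code above: instanceList.size() / 5 is integer division, so a corpus of fewer than five documents gives zero topics. A minimal sketch of a guarded version follows. It is an illustrative addition, and the class name, method name and parameters (TopicCount, choose, docsPerTopic, minTopics) are hypothetical, not part of Mallet.

```java
// Hypothetical helper for illustration; not part of Mallet or the original post.
class TopicCount {

  // Roughly one topic per docsPerTopic documents, but never fewer than
  // minTopics, since integer division yields 0 for very small corpora.
  static int choose(int numDocuments, int docsPerTopic, int minTopics) {
    if (docsPerTopic <= 0) {
      throw new IllegalArgumentException("docsPerTopic must be positive");
    }
    return Math.max(minTopics, numDocuments / docsPerTopic);
  }

  public static void main(String[] args) {
    System.out.println(choose(3, 5, 2));
    System.out.println(choose(100, 5, 2));
  }
}
```

The floor only prevents a degenerate value; it still has to be tuned like any other guess.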

Remote Java debugging in Eclipse

03/08/2011

To use Eclipse to debug a Java program that is run from the command line, start the Java program in remote debugging mode:

java -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y -jar myProgram.jar

The program will wait for you to attach the Eclipse debugger to it. Open Eclipse and choose:

Run > Debug Configurations... > Remote Java Application > New

Make sure to enter the same port that you chose on the command line. The default is port 8000. Now hit “Debug” and you’re off!
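
If the debugger will not attach, it can help to confirm that the JVM was really started with the debug agent. The sketch below is an illustrative addition with a hypothetical class name (DebugCheck). It reads the JVM's own startup arguments through the standard RuntimeMXBean and looks for either the -Xrunjdwp: form used above or the newer -agentlib:jdwp= spelling of the same option.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration; not from the original post.
class DebugCheck {

  // True if any JVM argument turns on the JDWP debug agent, whether via
  // the older -Xrunjdwp: form or the newer -agentlib:jdwp= form.
  static boolean hasJdwpArgument(List<String> jvmArgs) {
    for (String arg : jvmArgs) {
      if (arg.startsWith("-Xrunjdwp:") || arg.startsWith("-agentlib:jdwp")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Arguments passed to the JVM itself, not to the program's main method.
    List<String> jvmArgs =
        ManagementFactory.getRuntimeMXBean().getInputArguments();
    System.out.println("JDWP agent requested: " + hasJdwpArgument(jvmArgs));
  }
}
```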

Security Lockdown for Linux

02/11/2011

Automatic updates

If you’re using Ubuntu, you can configure automatic updates by editing /etc/apt/apt.conf.d/50unattended-upgrades. Running out-of-date packages with security holes is a good way to get your machine pwnd.

Remove unused software

Every piece of software installed on your system provides one more attack point for malicious users. You should inventory your system and remove anything you don’t need. E.g. to remove Ubuntu One from your system:

sudo apt-get purge 'ubuntuone*'

Secure SSH

Edit /etc/ssh/sshd_config:

PermitRootLogin no
AllowUsers bmccann nx gitolite

You may also disable password authentication and replace it with public key authentication:

PasswordAuthentication no
PubkeyAuthentication yes

Restart the SSH daemon:

sudo service ssh restart

or

sudo /etc/init.d/ssh restart

Setting PasswordAuthentication to no disallows login via password and replaces it with login via a public/private key pair. To set up public key authentication, run ssh-keygen on the client and append ~/.ssh/id_rsa.pub from the client to ~/.ssh/authorized_keys on the server.

Sometimes while messing around with SSH settings, you’ll lock yourself out. In this case it’s nice to use the -v option with the ssh client to see where the connection fails.

You can also set up shortcuts in ~/.ssh/config. E.g. the shortcut below turns ssh gitolite into an alias for ssh -l gitolite -p 77777 bensdynamicdns.getmyip.com.

Host gitolite
   User gitolite
   Hostname bensdynamicdns.getmyip.com
   Port 77777
   IdentityFile ~/.ssh/id_rsa

Secure NX

If you’d like to setup NX in a secure manner, you can follow these instructions.

Secure MySQL

Run mysql_secure_installation

Install fail2ban

  • Install fail2ban, which will lock out users who repeatedly try to access your system by guessing passwords: sudo apt-get install fail2ban
  • Make your own copy of the configuration file: sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
  • Check if fail2ban is running properly: sudo fail2ban-client status

More
Andrew Ault and CyberCiti wrote good articles as well.
The NSA has a comprehensive guide to securing a Linux system.

Google GXP Struts 2 Plugin

02/02/2011
Google GXP is a replacement for JSP that provides compile-time type safety.  This article is a quick introduction on how to use GXP with Struts 2.
 
1. Download the jar.  It’s not in Maven yet because it’s still unreleased.
 
2. Install the jar in Maven or otherwise put it on your classpath.  You’ll also need the Google GXP jar and the Google Collections jar:
  <dependency>
    <groupId>com.google.gxp</groupId>
    <artifactId>gxp-plugin</artifactId>
    <version>2.2.2-SNAPSHOT</version>
    <scope>system</scope>
    <systemPath>${basedir}/lib/struts2-gxp-plugin-2.2.2-SNAPSHOT.jar</systemPath>
  </dependency>
  <dependency>
    <groupId>com.google.gxp</groupId>
    <artifactId>google-gxp</artifactId>
    <version>0.2.4-beta</version>
  </dependency>
  <dependency>
    <groupId>com.google.collections</groupId>
    <artifactId>google-collections</artifactId>
    <version>1.0</version>
  </dependency>

3. Call the GXP compiler. E.g.

java -cp lib/gxp-0.2.4-beta.jar com.google.gxp.compiler.cli.Gxpc --output_language java com/benmccann/example/web/gxp/*.gxp

4. Add a result type of gxp to your struts.xml:

  <package name="test" extends="gxp-default">
    <action name="TestAction" class="com.benmccann.example.web.action.TestAction">
      <result type="gxp">com/benmccann/example/web/gxp/Index.gxp</result>
    </action>
  </package>

Commons Math vs. ojAlgo

01/23/2011

There are numerous math libraries for Java. This is frustrating as a user because it’s hard to decide which to use. Sometimes an algorithm is implemented in one library, but not another, which means you must marshal your data between proprietary formats. I was working on solving linear programming problems and there was no good Java-only solution available, so I had to write my own. I decided to contribute the SimplexSolver to Commons Math because it used a friendly license and because it already had significant mindshare. After doing so, I was informed that ojAlgo has a LinearSolver as well. Today I decided to test it out to see whether I’d wasted my time by writing my own implementation. It turns out that the ojAlgo implementation is buggy, as shown by the unit test below, which I created.

package com.benmccann.test;

import static org.ojalgo.constant.BigMath.EIGHT;
import static org.ojalgo.constant.BigMath.FIVE;
import static org.ojalgo.constant.BigMath.FOUR;
import static org.ojalgo.constant.BigMath.ONE;
import static org.ojalgo.constant.BigMath.SEVEN;
import static org.ojalgo.constant.BigMath.SIX;
import static org.ojalgo.constant.BigMath.TEN;
import static org.ojalgo.constant.BigMath.TENTH;
import static org.ojalgo.constant.BigMath.THREE;
import static org.ojalgo.constant.BigMath.TWO;
import static org.ojalgo.constant.BigMath.ZERO;

import java.math.BigDecimal;
import java.util.List;

import org.junit.Assert;
import org.junit.Test;
import org.ojalgo.matrix.BasicMatrix;
import org.ojalgo.matrix.store.PhysicalStore;
import org.ojalgo.matrix.store.PrimitiveDenseStore;
import org.ojalgo.optimisation.OptimisationSolver;
import org.ojalgo.optimisation.Variable;
import org.ojalgo.optimisation.OptimisationSolver.Result;
import org.ojalgo.optimisation.linear.LinearExpressionsModel;

public class SolverTest {

  @Test
  public void testMath286() {

    Variable[] objective = new Variable[] {
        new Variable("X1").weight(TENTH.multiply(EIGHT)),
        new Variable("X2").weight(TENTH.multiply(TWO)),
        new Variable("X3").weight(TENTH.multiply(SEVEN)),
        new Variable("X4").weight(TENTH.multiply(THREE)),
        new Variable("X5").weight(TENTH.multiply(SIX)),
        new Variable("X6").weight(TENTH.multiply(FOUR))};

    LinearExpressionsModel model = new LinearExpressionsModel(objective);
    model.setMaximisation(true);

    model.addWeightExpression("C1",
            new BigDecimal[] { ONE, ZERO, ONE, ZERO, ONE, ZERO }
        ).level(new BigDecimal(23));
    model.addWeightExpression("C2",
            new BigDecimal[] { ZERO, ONE, ZERO, ONE, ZERO, ONE }
        ).level(new BigDecimal(23));
    model.addWeightExpression("C3",
            new BigDecimal[] { ONE, ZERO, ZERO, ZERO, ZERO, ZERO }
        ).lower(TEN);
    model.addWeightExpression("C4",
            new BigDecimal[] { ZERO, ZERO, ONE, ZERO, ZERO, ZERO }
        ).lower(EIGHT);
    model.addWeightExpression("C5",
            new BigDecimal[] { ZERO, ZERO, ZERO, ZERO, ONE, ZERO }
        ).lower(FIVE);

    Result result = model.getDefaultSolver().solve();
    List<Double> solution = result.getSolution()
        .getRows(new int[] { 0, 1, 2, 3, 4, 5 })
        .toPrimitiveStore().asList();

    // A valid solution of 25.8 can be produced with:
    //     X1=10, X2=0, X3=8, X4=0, X5=5, X6=23
    // However, ojAlgo returns 21.7
    Assert.assertEquals(25.8, solution.get(0) * .8 + solution.get(1) * .2
        + solution.get(2) * .7 + solution.get(3) * .3
        + solution.get(4) * .6 + solution.get(5) * .4, .1);
  }

}
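
The solution quoted in the test's comment can be checked by hand, independently of either library. The sketch below is an illustrative addition (the class name Math286Check and its methods are hypothetical, not from ojAlgo or Commons Math): it encodes the five constraints and the objective weights from the test and evaluates them at X1=10, X2=0, X3=8, X4=0, X5=5, X6=23. Note that X1, X3 and X5 are forced, because their lower bounds (10, 8 and 5) already sum to the 23 that C1 requires.

```java
// Hypothetical checker for illustration; not from the original post.
class Math286Check {

  // Objective coefficients for X1..X6, as in the test above.
  static final double[] WEIGHTS = {0.8, 0.2, 0.7, 0.3, 0.6, 0.4};

  // Weighted sum of the variables, i.e. the value being maximised.
  static double objective(double[] x) {
    double total = 0;
    for (int i = 0; i < x.length; i++) {
      total += WEIGHTS[i] * x[i];
    }
    return total;
  }

  // Constraints C1..C5 from the test, checked with a small tolerance.
  static boolean isFeasible(double[] x) {
    double eps = 1e-9;
    boolean c1 = Math.abs(x[0] + x[2] + x[4] - 23) < eps; // X1 + X3 + X5 = 23
    boolean c2 = Math.abs(x[1] + x[3] + x[5] - 23) < eps; // X2 + X4 + X6 = 23
    boolean c3 = x[0] >= 10 - eps;                        // X1 >= 10
    boolean c4 = x[2] >= 8 - eps;                         // X3 >= 8
    boolean c5 = x[4] >= 5 - eps;                         // X5 >= 5
    return c1 && c2 && c3 && c4 && c5;
  }

  public static void main(String[] args) {
    double[] claimed = {10, 0, 8, 0, 5, 23};
    System.out.println("feasible: " + isFeasible(claimed));
    System.out.println("objective: " + objective(claimed));
  }
}
```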

After releasing the Commons Math SimplexSolver, I received numerous bug reports, which has been the main benefit of open sourcing it. The code is now more robust as a result. The test above was just one of many cases that Commons Math initially had trouble with, so I don’t fault the ojAlgo developers for getting it wrong the first time around – I did too. However, I’m glad I chose to contribute to a well-recognized project because it led to flushing out many of these problems. A lesser known project such as ojAlgo doesn’t have that advantage. I’m not sure why so many people are still writing their own smaller libraries instead of contributing to make the larger players better. Hopefully ojAlgo will consider recommitting its efforts towards Commons Math or another of the larger projects at some point. It would be good for the community to see some consolidation. At one point, matrix-toolkits-java was talking about combining with Commons Math, which would be a great move towards consolidated APIs.
