Using Perl-compatible regular expressions in Emacs

I’ve been thinking about Emacs’s regular expressions lately, and how I rarely use them. I tend to reach for keyboard macros first, and only occasionally use query-replace-regexp when I need to do a complex search-and-replace. I use M-x rgrep almost every day, but I rarely pass it regular expressions. I think I would probably use regular expressions more, but I’m much more familiar with the so-called Perl-compatible regular expressions such as those supported by PCRE, and I think the unfamiliar Emacs and POSIX style is a subtle reason why I’ve been avoiding them. So, I decided as an experiment to start teaching Emacs to use grep with the -P option, which is a years-old “experimental” feature available in all the greps I use that enables Perl-style regular expressions. I started with the following settings:

(setq grep-command "grep -nHP -e "
      grep-find-template "find . <X> -type f <F> -exec grep <C> -nHP -e <R> {} /dev/null \\;")

These are the same as the defaults for me, except with the added ‘P’ flag. With these two settings, M-x grep and M-x rgrep will use Perl-compatible regular expressions. It’s particularly helpful with M-x rgrep, since the regular expression is automatically shell-escaped, so it provides a way to avoid backslash-itis due to the combination of this shell escaping and the more humane (in my opinion) escaping rules that PCRE uses.
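As a rough illustration of the escaping point, here is a small Python sketch of what shell-quoting a Perl-style pattern looks like. It is not part of the Emacs setup above: the helper name build_grep_command and the file name are made up for the example, and shlex.quote from the Python standard library stands in for the quoting that rgrep does on its own.

```python
import shlex


def build_grep_command(pattern, path):
    """Return a shell command string that runs grep -P on one file.

    Hypothetical helper for illustration only; the pattern is quoted so
    that the shell hands it to grep unchanged.
    """
    return "grep -nHP -e {} {}".format(shlex.quote(pattern), shlex.quote(path))


# A Perl-style pattern: word boundary, a group with alternation, digits.
# None of the grouping or alternation characters need a backslash.
command = build_grep_command(r"\bfoo(bar|baz)\d+", "notes.txt")
print(command)
```

The pattern is written once, in Perl style, and the quoting layer takes care of the shell, which is the division of labor that makes M-x rgrep pleasant to use with -P.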

I wanted a replacement for M-x occur that would use PCRE as well, so I wrapped M-x grep and taught it to use occur’s conventions, including the optional numeric argument that causes it to include some extra lines of context in the output.

(defun grep-occur (regexp &optional nlines)
  (interactive (occur-read-primary-args))
  (save-some-buffers)
  (grep (concat grep-command
                (shell-quote-argument regexp)
                (if nlines
                    (if (> nlines 0)
                        (format " -C %d " nlines)
                      (format " -B %d " (abs nlines)))
                  " ")
                (shell-quote-argument (buffer-file-name)))))

Since this uses the same grep-command variable, it supports PCRE as well, and I can now show all matches for a PCRE in the current buffer (as long as it is saved to a file first), and jump between them with next-error and previous-error (C-x ` or M-g n / M-g p by default). I added a (save-some-buffers) call to give the option of saving unsaved buffers before running the grep command, similar to how M-x compile works.

I’m just starting to incorporate this into my workflow, but I think it’s pretty neat. I’d love to know of any other places where I can integrate PCRE effectively, if you can think of any.

Teaching Magit new tricks

If you’re using Magit 1.0 or newer to work with Git from Emacs, you have probably noticed that when you press keys it now shows a little menu where you can enable flags and pass options. It’s easy to add more. Here’s how I teach Magit about “push -u”, which I use to push local branches to a remote repository and configure them to track the remote branch automatically:

(eval-after-load 'magit
  '(magit-key-mode-insert-switch 'pushing "-u" "Set upstream" "-u"))

You can find all the groups of switches and arguments by typing C-h v magit-key-mode-groups.

How to quack like a QuerySet

The main database I work on has a bit of a quirk where there are two sets of tables for something that is conceptually one entity. For the sake of this example, let’s call them cogs and sprockets. So, I have a “cog” table, a “sprocket” table, and then a bunch of related tables like “cog_tooth”, “sprocket_tooth”, and so on. As it turns out, the split between cogs and sprockets wasn’t really necessary, since they’re really both just gears, but enough infrastructure is in place to deal with cogs and sprockets as separate groups of entities that it’s too much work to change that now.

The basic problem, which I’ve been trying to solve for months now and have only just recently figured out, is how to just show a list of gears, with cogs and sprockets mixed together, while still allowing searching, sorting, and pagination. In other words, I don’t want one page for browsing cogs and another page for browsing sprockets; I just want a gear list. I want to bury the detail of whether a gear is a cog or a sprocket and let the user page through them all sorted together. And I want to do this, somehow, with the Django ORM.

What didn’t work

Before I realized I needed to quack like a QuerySet, I tried many other things. Each was a near-solution, but wasn’t practical for one reason or another.

Model inheritance

The obvious solution was to create a Gear model, and let Cog and Sprocket inherit from that model. Then, I could paginate through Gears and follow an association to a Cog or Sprocket if I needed further information. The problem with this approach is that it introduces a new table for gears. Outside of Django, there is a lot of existing infrastructure for importing, indexing, and managing cogs and sprockets. All of that code would have to be updated to create and update the gear table whenever working with cogs and sprockets. A lot of fields would need to be moved to the gear table to prevent having to do lots of joins during pagination. This would be far too much work just to accommodate Django’s particular inheritance technique, which is otherwise unnecessary for these systems that would need to change.

Database views

If I were to solve this problem from a strictly SQL perspective, I would use a UNION query, like:

SELECT name, num_teeth, 'cog' AS type
FROM   cog
UNION
SELECT name, num_teeth, 'sprocket' AS type
FROM   sprocket

So, it seemed like I could wrap that in a database view and create a Gear model with managed=False using that view as its “db_table”. I tried this, and it actually worked, but it was dreadfully slow. Perhaps this means I shouldn’t be using MySQL; I imagine Postgres’s query optimizer is sufficiently intelligent to run this type of query faster, but these are the cards I’ve been dealt. Unfortunately, performance is a show-stopper with this approach.
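The shape of the view approach can be sketched in a few lines of Python. This uses an in-memory SQLite database purely so the example is self-contained; the sample rows are made up, and it says nothing about how MySQL plans or executes the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cog (name TEXT, num_teeth INTEGER);
    CREATE TABLE sprocket (name TEXT, num_teeth INTEGER);
    INSERT INTO cog VALUES ('alpha', 12), ('gamma', 30);
    INSERT INTO sprocket VALUES ('beta', 18);

    -- The view wraps the UNION query shown above.
    CREATE VIEW gear AS
        SELECT name, num_teeth, 'cog' AS type FROM cog
        UNION
        SELECT name, num_teeth, 'sprocket' AS type FROM sprocket;
""")

# Sorting and pagination are applied to the combined rows of the view.
page = conn.execute(
    "SELECT name, num_teeth, type FROM gear ORDER BY name LIMIT 2 OFFSET 0"
).fetchall()
print(page)
```

A model with managed=False would then simply point at the gear view as if it were a table.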

RawQuerySet

At this point, I thought maybe it would be best to drop to raw SQL so that I could optimize the queries by hand. I started studying the Manager.raw() method and RawQuerySet class, and built a custom QuerySet that ran a UNION query with a subselect and dynamically generated LIMITs to allow for pagination. This actually worked, and it performed just fine, but it had several issues. One was that I was generating SQL from strings in order to make pagination work, and this was dirty and error-prone. The larger issue is that I had no obvious way to implement .filter() for these RawQuerySets. Faking .order_by() is easy enough, but .filter() is very complex and pulls in a ton of Django internals. It would have taken forever to reinvent this wheel, and I really didn’t want to roll with a half-assed .filter() implementation, especially since my main use case was to be able to provide Django Admin ChangeList-style list views with lots of filtering options.

Looking at RawQuerySet was very inspiring, however. It made me realize that maybe what I wanted wasn’t so much a Gear model at all, but rather a Gear QuerySet – that Django Models were a bit too low-level for what I was trying to accomplish, and that QuerySets were a much better abstraction to code against. Django’s generic list view doesn’t require a model, for example; you can feed it any QuerySet, or really, any object that supports .count() and slicing. This led me to the big a-ha! moment:

Quack like a QuerySet

I mean “quack” in the “duck-typing” sense: create a class that behaves like a QuerySet, but don’t bother inheriting from the QuerySet class. It doesn’t have to provide the full suite of QuerySet features (which is huge), but just enough to perform the operations that the views require. This was the winning solution, and everything fell into place once I made this realization. Here’s what I ended up with:

from django.db.models import Q
from django.db.models.query import REPR_OUTPUT_SIZE
 
from myproject.myapp.models import Cog, Sprocket
 
ITER_HARD_LIMIT = 10000
 
class GearQuerySet(object):
    def __init__(self):
        self.ordering = ('description',)
        self.cog_query = Cog.objects.order_by(*self.ordering)
        self.sprocket_query = Sprocket.objects.order_by(*self.ordering)
 
    def __iter__(self):
        for row in self[:ITER_HARD_LIMIT]:
            yield row
 
    def __repr__(self):
        data = list(self[:REPR_OUTPUT_SIZE + 1])
        if len(data) > REPR_OUTPUT_SIZE:
            data[-1] = "...(remaining elements truncated)..."
        return repr(data)
 
    def __getitem__(self, k):
        if not isinstance(k, (slice, int, long)):
            raise TypeError
        assert ((not isinstance(k, slice) and (k >= 0))
                or (isinstance(k, slice) and (k.start is None or k.start >= 0)
                    and (k.stop is None or k.stop >= 0))), \
                "Negative indexing is not supported."
 
        if isinstance(k, slice):
            ordering = tuple(field.lstrip('-') for field in self.ordering)
            reverse = (ordering != self.ordering)
            if reverse:
                assert (sum(1 for field in self.ordering
                            if field.startswith('-')) == len(ordering)), \
                        "Mixed sort directions not supported."
 
            cq = self.cog_query
            sq = self.sprocket_query
 
            if k.stop is not None:
                cq = cq[:k.stop]
                sq = sq[:k.stop]
 
            rows = ([row + (Cog,)
                     for row in cq.values_list(*(ordering + ('pk',)))] +
                    [row + (Sprocket,)
                     for row in sq.values_list(*(ordering + ('pk',)))])
 
            rows.sort()
            if reverse:
                rows.reverse()
            rows = rows[k]
 
            pk_idx = len(ordering)
            klass_idx = pk_idx + 1
            cog_pks = [row[pk_idx] for row in rows
                            if row[klass_idx] is Cog]
            sprocket_pks = [row[pk_idx] for row in rows
                           if row[klass_idx] is Sprocket]
            cogs = Cog.objects.in_bulk(cog_pks)
            sprockets = Sprocket.objects.in_bulk(sprocket_pks)
 
            results = []
            for row in rows:
                pk = row[-2]
                klass = row[-1]
                if klass is Cog:
                    cogs[pk].type = 'cog'
                    results.append(cogs[pk])
                elif klass is Sprocket:
                    sprockets[pk].type = 'sprocket'
                    results.append(sprockets[pk])
            return results
        else:
            return self[k:k+1][0]
 
    def count(self):
        return self.cog_query.count() + self.sprocket_query.count()
 
    def all(self):
        return self._clone()
 
    def filter(self, *args, **kwargs):
        qs = self._clone()
        qs.cog_query = qs.cog_query.filter(*args, **kwargs)
        qs.sprocket_query = qs.sprocket_query.filter(*args, **kwargs)
        return qs
 
    def exclude(self, *args, **kwargs):
        qs = self._clone()
        qs.cog_query = qs.cog_query.exclude(*args, **kwargs)
        qs.sprocket_query = qs.sprocket_query.exclude(*args, **kwargs)
        return qs
 
    def order_by(self, *ordering):
        qs = self._clone()
        qs.cog_query = qs.cog_query.order_by(*ordering)
        qs.sprocket_query = qs.sprocket_query.order_by(*ordering)
        qs.ordering = ordering
        return qs
 
    def _clone(self):
        qs = GearQuerySet()
        qs.cog_query = self.cog_query._clone()
        qs.sprocket_query = self.sprocket_query._clone()
        qs.ordering = self.ordering
        return qs

The above QuerySet-like class implements the parts of the QuerySet interface that Django’s generic list view depends on: .count() and slicing with .__getitem__(). It also provides a ._clone() method which, despite the private-looking name, is generally expected to exist for a QuerySet. As usual, .filter(), .exclude(), and .order_by() call ._clone() to make a copy of the QuerySet before making modifications so that immutability is preserved.
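The slicing logic is easier to see without the ORM in the way. Here is a minimal sketch of the same idea using plain sorted lists in place of the two QuerySets; merged_page and the sample names are hypothetical, and only ascending order on a single field is handled.

```python
def merged_page(cogs, sprockets, start, stop):
    """Return items [start:stop] of two sorted lists merged together.

    Only the first `stop` items of each list can possibly land on the
    page, so nothing beyond that needs to be fetched from either source.
    """
    candidates = ([(name, "cog") for name in cogs[:stop]] +
                  [(name, "sprocket") for name in sprockets[:stop]])
    candidates.sort()
    return candidates[start:stop]


cogs = ["alpha", "delta", "eta", "theta"]       # sorted, like cog_query
sprockets = ["beta", "gamma", "iota", "zeta"]   # sorted, like sprocket_query

page = merged_page(cogs, sprockets, 1, 3)
print(page)
```

The cost grows with the page's stop offset rather than with the size of the tables, which is acceptable for the first few pages of a list view and gets worse the deeper the user pages.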

The .__iter__() method is defined in terms of .__getitem__() and uses a constant, ITER_HARD_LIMIT, to define the maximum number of results that will be returned if the QuerySet is used as an iterator. This is to prevent accidentally loading the entire table into memory with a for-loop, since my .__getitem__() returns a list for slices, not an iterator. It is rather difficult to support lazy iteration through multiple tables in parallel while keeping them sorted together, so I elected not to solve this particular problem. Django’s RawQuerySet, on the other hand, defines an iterator directly and builds slicing in terms of that, so the right way to structure these methods really depends on your use case.
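For what it's worth, lazy merging is straightforward when both inputs are plain iterables already sorted ascending by the same key. The sketch below shows that case with heapq.merge from the standard library. The name iter_gears is made up, and the sketch does not address descending or mixed orderings, or how well this maps onto real database cursors.

```python
import heapq
from itertools import islice


def iter_gears(cogs, sprockets):
    """Lazily yield (name, kind) pairs from two sorted iterables, in order."""
    tagged_cogs = ((name, "cog") for name in cogs)
    tagged_sprockets = ((name, "sprocket") for name in sprockets)
    # heapq.merge pulls one item at a time from each input, so neither
    # input has to be loaded into memory in full.
    return heapq.merge(tagged_cogs, tagged_sprockets)


gears = iter_gears(iter(["alpha", "delta"]), iter(["beta", "gamma"]))
first_three = list(islice(gears, 3))
print(first_three)
```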

Quack like a Manager

I could stop right here and create instances of this custom QuerySet class directly, but I decided to carry the façade further and create duck-typed Manager and Model classes as well:

class GearManager(object):
    def count(self):
        return self.get_query_set().count()
 
    def all(self):
        return self.get_query_set()
 
    def filter(self, *args, **kwargs):
        return self.get_query_set().filter(*args, **kwargs)
 
    def exclude(self, *args, **kwargs):
        return self.get_query_set().exclude(*args, **kwargs)
 
    def order_by(self, *args, **kwargs):
        return self.get_query_set().order_by(*args, **kwargs)
 
    def get_query_set(self):
        return GearQuerySet()
 
class Gear(object):
    objects = GearManager()

With the above code, I can write expressions like Gear.objects.filter(…), Gear.objects.all().order_by(…), and so on, maintaining consistency with the way model objects are usually queried. This seems a bit silly at first glance, but in the future I will be adding methods to create new instances, so it’s good to have a place to put them (GearManager).

More curious is the fake model class (Gear), since it is currently just acting as a namespace for the “objects” attribute. I wonder if it would be worthwhile to make this class actually do something, perhaps making it a superclass of the Cog and Sprocket model classes (though I dislike the idea of adding a circular dependency between the modules these classes reside in). The idea of using Python’s new Abstract Base Class support might be worthwhile exploring, since it provides an advisory way to make isinstance() return True.
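A minimal sketch of the Abstract Base Class idea, with empty placeholder classes standing in for the real Cog and Sprocket models. It is written for current Python, where abc.ABC is available; older versions would set ABCMeta as the metaclass instead. Whether this is worth doing in the real project is still an open question.

```python
from abc import ABC


class Gear(ABC):
    """Abstract stand-in for 'either a Cog or a Sprocket'."""


class Cog(object):        # placeholder for the real Cog model
    pass


class Sprocket(object):   # placeholder for the real Sprocket model
    pass


# register() makes isinstance() and issubclass() succeed without changing
# either class's actual base classes.
Gear.register(Cog)
Gear.register(Sprocket)

print(isinstance(Cog(), Gear))
print(Gear in Cog.__mro__)
```

Because registration happens from the Gear side, the modules that define Cog and Sprocket never have to import Gear, which sidesteps the circular dependency.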

Has anyone else tried to do this sort of thing before? What do you think of this solution?

Edit: I updated the GearQuerySet example to include the algorithm for searching for cogs and sprockets together.

Trying Out PyCharm, Part 2

In which I actually use the editor to write code

Previously, I attempted to write up my experiences with the new beta of the PyCharm editor for Python, from JetBrains, makers of IntelliJ IDEA. By the time I got a basic installation working and started a Django project, I ran out of time. This time around, I actually wrote a little bit of code using PyCharm, so I have a few more things to say.

Things I like

It defaults to spaces, not tabs. This is correct.

Ctrl-space does menu completion, and this works very well. If you’re used to Intellisense, this does what you expect. Any IDE worth its salt should be able to accomplish this basic task.

Overriding methods get a little icon in the left gutter that takes you to the source code for the methods they override, and similarly, overridden methods allow you to navigate up and down the hierarchy. This is neat, and not something I’ve seen any Python editor do before.

The Structure panel is great. It updates as you code, and uses icons to distinguish between variables, functions, and methods.

You can duplicate a line with Ctrl+D, comment a line with Ctrl+/, and indent/dedent a line with Tab/Shift-Tab. If there is an active selection, these commands work on the selection instead. These are super-handy and very intuitive features.

Entire blocks can be easily rearranged by the Move Statement Up / Move Statement Down (Ctrl+Shift+Up/Down) commands. This seems like it would be really handy for code reorganization.

Ctrl+B will jump to the definition of the symbol under the cursor, and if it’s ambiguous it pops up a menu where you can pick which definition you mean. Likewise, Ctrl+N lets you type a class name to jump to, searches as you type, and lets you refine your choice with a pop-up menu. I can see this being a huge time saver.

The basic refactorings are all there: Rename, Change Signature, Move, Copy, Safe Delete, Inline, Pull Up, Push Down, Extract Interface, Extract Superclass. I’m not sure what Extract Interface does, since Python doesn’t really have interfaces (third-party implementations aside), but Extract Superclass has a nice interface where you can check which methods you want to move into the new base class. If the current class already extends a class, it will use multiple inheritance. This isn’t Java!

Out of the box, PyCharm supports CVS, Subversion, Git, and Mercurial. Not too shabby!

Editor windows can be split horizontally and vertically, and each split pane can have its own tabs. The tabs don’t currently support drag’n’drop, which is a bummer, but I bet they’ll have that working soon, since everyone expects that to work with web browsers nowadays.

Things I’m on the fence about

The real-time syntax checking is slightly distracting, just as I recall it being with IDEA. As I type, various things get underlined with red squigglies to notify me that my syntax is invalid. It’s invalid, usually, because I haven’t finished typing yet.

Code folding works great, if you’re into that sort of thing. I seem to be the odd one out – I can’t stand it. It just creates a bunch of noise, and I feel like if your program is so big and deeply nested that you need to collapse sections of it, you should fix your program. But anyway, PyCharm does folding just fine.

Things I don’t like

Introspection doesn’t always work too well. When I tried to create a models.CharField for a model, it popped up with the hint that CharField takes the arguments (self, *args, **kwargs). I’m sure there’s some fancy footwork going on in Django that makes this difficult, but it would be really helpful to have argument hints about basic stuff like this if PyCharm is to claim any sort of Django support.

One of the more amazing refactorings that IDEA can do, which I think is called Replace Inheritance With Delegation, is not available in PyCharm. I think this is a great tool, and hopefully it will get integrated with PyCharm someday. Python’s support for multiple inheritance makes it less necessary, perhaps (since the big benefit with Java is to be able to pick a different base class without breaking anything, since Java only allows single-inheritance).

By default, you can click anywhere and it will put the cursor at that point, even if it’s 50 characters to the right of the text. This really bothers me, but fortunately it’s easy to disable: File->Settings->IDE Settings->Editor->Virtual Space->Allow placement of caret after end of line.

PyCharm follows the convention that many Windows-style programs do these days, where Ctrl-Left/Right moves between words but Alt-Up/Down moves between paragraphs. I don’t know who came up with this, but I really prefer Ctrl-Up/Down for symmetry, and so that I can keep Ctrl held down and easily move around. Ctrl-Up/Down works like a mouse wheel, which is a nice feature but I wish it had a different shortcut. Alt-Left/Right switches between tabs, which is handy but also weird to me. But I’m really nitpicking here.

Indexing takes a long time. After opening the editor, you’ll want to grab some coffee or what-have-you.

Thumbs Up

This is really just a superficial evaluation of PyCharm, since I would have to use it for much longer to really critique it properly. It appears to be well-implemented, full-featured, and artfully designed. I think I could be happy using PyCharm. I don’t know if it’s compelling enough to break my Emacs addiction, but it’s certainly a worthy contender. I recommend giving it a try!

Trying out PyCharm, Part 1

One of the things in our goodie bags at DjangoCon 2010 was a quick reference card for PyCharm, a new (still beta) IDE for Python designed by JetBrains, the makers of IntelliJ IDEA. I’m not a big fan of IDEs in general, but I did use IDEA for a little over a year when I was writing a lot of Java, and I was very impressed with the quality of that product. So, when I found out about PyCharm, I figured I should at least try it out and see how well it works. I’m pretty much an Emacs die-hard at this point, so I don’t expect to be wooed, but I’m trying to keep an open mind about it anyway. Here is a summary of my first experiences with it.

When I first tried to run PyCharm, I got an error about how no JDK was installed, which made me remember that I had completely purged Java from my system. I do this about every six months to save disk space, I think.

I was unsure about whether I actually needed a JDK or just a JRE, so I decided to try just installing the JRE. PyCharm still complains about wanting environment variables to be set, but I just symlinked /usr/bin/java to /bin/java, and it starts up fine despite the error message.

The initial interface is colorful and welcoming. I couldn’t help but notice that the fonts and toolbar buttons all look very Java. I guess this is to be expected, but it feels a bit foreign. It doesn’t look bad, however; it looks professional. It’s just a tiny reminder that this is a Java program.

I read something about Django integration on the PyCharm website, so I figured a good test would be to try creating a new Django hello-world project. The first option under “Quick Start” is “Create New Project”, so that’s where I went. It popped up a simple dialog asking for a project name and location, and I could choose between an “Empty project”, “Django project”, and “Google App Engine” project. I picked “Django project” and a new project was created – for Python 3.1. Whaaaaaaaat?

I then saw a second popup titled “Django Project Settings”. The only available choice for “Python Interpreter” was the “Python 3.1.2” it picked. This definitely was not going to work, so I hit the “…” button, where I could add and remove Python Interpreters. It was able to locate the Python 2.6 I had installed, and after a lengthy scan of libraries for that version, my project settings were ready to go.

Also on the “Django Project Settings” panel was an “Application name” field, which was a little bit confusing because projects normally consist of multiple applications. I went ahead and entered an application name anyway, even though it appeared to be optional. There was also a templates directory, which was already filled in and located at the project level.

Once I confirmed the project settings, a progress bar dialog titled “Loading Project” popped up, with messages reading “Generating skeletons of binary libs for interpreter /usr/bin/python2.6”. I thought it already scanned all the libraries once, but I guess it wasn’t building “skeletons” that time. I assume this is for metadata to enable completion in the editor. I think their Java bias is showing in the way they repeatedly refer to Python installations as “interpreters”. Do they not know that Python runs on a bytecode compiler too?

The scan finished after a few minutes, and I was dropped into the main editor window. A progress bar at the bottom of the window indicated that there was still more indexing to be done, with a tooltip reading “Updating project indices… Refactorings, usage search, and some other features will become available after indexing is complete.” A red icon with an exclamation mark blinked sporadically, and mousing over this icon caused a message to briefly display about an internal error that should be reported to their beta program. Once the indexing stopped, the red icon went away, so I guess there isn’t much I can do here.

Oh wait, it’s back. It says something about an internal error: a null pointer from com.intellij.ide.projectView.BaseProjectTreeBuilder.expandNodeChildren. I was wondering how long it would take before I saw a Java stack trace.

There were initially two frames to the main window, a project panel and a gray area where the editor goes. The project panel appeared to be empty, which may be a result of the exception that occurred earlier. When I clicked “View as” in the project sidebar, I could change the view from “Project” to… well, just “Project”. So I did that, and it reloaded and this time I could see all the files in the directory tree it created.

The initial project looked more or less like the results of a standard Django “startproject” and “startapp”.

I clicked the green “Run” button in the toolbar to see what would happen. It opened a Run panel at the bottom and executed “/usr/bin/python2.6 manage.py runserver 8000”, which succeeded in getting a Django development server instance up and running. I popped open a web browser and went to localhost:8000, hoping to see Django’s welcoming first page, but instead I got an error about how “/” could not be found, since the only defined url pattern is “^admin/”. I guess this is to be expected since I asked for an admin site, though it was kind of a bummer to not get the usual pretty blue “it works!” page.

I next tried “/admin/” and got an ugly Python stack trace with the message “ImproperlyConfigured: You haven’t set the database ENGINE setting yet.” I guess this is my cue to set up a database of some sort. Unfortunately, I ran out of time to play with PyCharm, so I didn’t get a chance to kick the tires of the editor itself. I’ll try to write a “Part 2” about the editor experience soon.

Database model objectives

I was flipping through Chris Date’s SQL And Relational Theory and came across this little gem, which paraphrases “Codd’s own stated objectives in introducing his relational model.” I think this bears repeating today because I have felt for a while that the NoSQL movement has a significantly different set of goals–which is fine–but seems to be ignoring some of the things that make the relational model nice to work with. I wonder if it is necessarily either-or, or if perhaps some of these NoSQL systems can work toward satisfying more of the needs that relational database systems satisfy, without sacrificing the speed and ease of distribution that has made the NoSQL concept popular.

Here are the stated objectives, quoting Date:

  1. To provide a high degree of data independence
  2. To provide a community view of the data of spartan simplicity, so that a wide variety of users in an enterprise, ranging from the most computer naive to the most computer sophisticated, can interact with a common model (while not prohibiting superimposed user views for specialized purposes)
  3. To simplify the potentially formidable job of the database administrator
  4. To introduce a theoretical foundation, albeit modest, into database management (a field sadly lacking in solid principles and guidelines)
  5. To merge the fact retrieval and file management fields in preparation for the addition at a later time of inferential services in the commercial world
  6. To lift database application programming to a new level–a level in which sets (and more specifically relations) are treated as operands instead of being processed element by element

I want to ponder on these objectives for a bit before drawing too many conclusions, but a few things seem starkly obvious.

The need to build indexes by hand in NoSQL systems in order to search (efficiently or not) by different criteria is a step away from the relational model’s goals of data independence because these indexes are likely to be built with a particular application in mind, sometimes (often?) to the disadvantage of other applications requiring a different view of the data.

To further that point, if the indexes designed into the database are insufficient, it will probably be the case that applications will have to drop back to the level of processing one record at a time, rather than working with data sets as units, unless all application developers have enough control over the database system to be able to make the needed changes.
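Both points can be sketched with a toy key-value store in Python. Everything here is made up for illustration (the record fields, the index name, the put helper), and it is not modeled on any particular NoSQL system.

```python
store = {}            # primary key -> record (the key-value store)
index_by_city = {}    # hand-maintained secondary index: city -> set of keys


def put(key, record):
    """Insert a new record and keep the hand-built index in step.

    Updates and deletes are left out to keep the sketch short.
    """
    store[key] = record
    index_by_city.setdefault(record["city"], set()).add(key)


put(1, {"name": "Ann", "city": "Oslo", "age": 31})
put(2, {"name": "Bob", "city": "Lima", "age": 45})
put(3, {"name": "Cy", "city": "Oslo", "age": 45})

# The application the index was built for gets a direct lookup...
in_oslo = sorted(index_by_city.get("Oslo", ()))

# ...but an application asking a different question walks every record.
aged_45 = sorted(key for key, record in store.items() if record["age"] == 45)
print(in_oslo, aged_45)
```

The second query is exactly the record-at-a-time processing that Codd's sixth objective was meant to lift applications out of.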

The job of the database administrator is no doubt at a disadvantage today with NoSQL systems, though this is more of a tools issue than a fundamental design issue. The “how do I query the database?” comic sums up the current situation amusingly.

A theoretical foundation of NoSQL systems is hard to find. Most of the theory seems to be in regard to eventual consistency and other issues related more to distributed systems than data modeling in the abstract. This will surely come with time, though as soon as you get into the details of data modeling in NoSQL systems, you really have to specify which one, as they are more different than they are similar. A theory of data management with key-value stores seems, to me, unenlightening at first glance.

Whatever the base model is, if NoSQL databases are here to stay, I think we are going to see a need for some theoretical foundations to manage the growing complexity of our data models given the new strengths and limitations of NoSQL systems.

Ruby’s surprising handling of local variables

I was trying to read some Rails code today and came across the definition of the link_to view helper, which looks like this:

def link_to(*args, &block)
  if block_given?
    options      = args.first || {}
    html_options = args.second
    concat(link_to(capture(&block), options, html_options))
  else
    name         = args.first
    options      = args.second || {}
    html_options = args.third
 
    url = url_for(options)
 
    if html_options
      html_options = html_options.stringify_keys
      href = html_options['href']
      convert_options_to_javascript!(html_options, url)
      tag_options = tag_options(html_options)
    else
      tag_options = nil
    end
 
    href_attr = "href=\"#{url}\"" unless href
    "<a #{href_attr}#{tag_options}>#{name || url}</a>"
  end
end

At first glance, I thought for sure I had found a bug! The variable, href, is only initialized if html_options is specified. It seems like the “href_attr = … unless href” line would blow up otherwise, since it’s testing a variable that may not have been set. Or, so I thought. It turns out that my understanding of Ruby’s local variable semantics was wrong, as demonstrated by this simple test:

irb(main):001:0> x
NameError: undefined local variable or method 'x' for main:Object
        from (irb):1
        from :0
irb(main):002:0> if false
irb(main):003:1>   x = 0
irb(main):004:1> end
=> nil
irb(main):005:0> x
=> nil

It turns out that Ruby decides which names are local variables when it parses the code, not when it runs it. As soon as the parser sees an assignment to x, x exists as a local variable for the rest of that scope, and it holds nil until an assignment actually executes. This is in contrast to Python, which doesn’t define new variables unless the code that sets them actually executes:

>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> if False:
...   x = 0
...
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

Neither does JavaScript:

js> x
typein:1: ReferenceError: x is not defined
js> if (false) x = 0;
js> x
typein:3: ReferenceError: x is not defined

Is it just me, or is Ruby’s behavior completely bizarre here?
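For comparison, Python does make one related decision ahead of time: inside a function, an assignment anywhere in the body marks the name as local when the function is compiled. The variable is still not created until the assignment runs, so reading it early raises an error (UnboundLocalError, a subclass of NameError) instead of quietly yielding a default value. A small sketch, with a made-up function name:

```python
def read_unassigned(flag):
    if flag:
        x = 0   # runs only when flag is true
    return x    # x is local to the function either way


try:
    read_unassigned(False)
except UnboundLocalError as exc:
    outcome = type(exc).__name__
else:
    outcome = "no error"

print(outcome)
print(issubclass(UnboundLocalError, NameError))
```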

Taming the Unicode terminal

For the past six months or so, I’ve been on a quest for the perfect graphical Linux terminal, and it’s led me in some odd directions. The thing is, I already found the perfect terminal (perfect for me, that is): mrxvt. I’ve been using mrxvt for about five years now, and it’s been my favorite for several reasons:

  • It’s insanely fast
  • It’s easy to customize
  • It has tabs, and you can put them at the bottom

About my only gripe has been the lack of Zmodem support. Call me crazy, but I still like to use Zmodem from time to time. The only terminals I know of that still do Zmodem are SecureCRT and old versions of Konsole; KDE 4’s Konsole regressed and dropped the feature.

I had little reason to switch to another terminal until the day I realized that UTF-8 is here to stay, and it’s something a terminal needs to do. I fought this for a long time, but ultimately I got tired of setting LANG=C in all my profiles and jumping through hoops to turn off what is essentially progress. The problem is that mrxvt does not support Unicode, and it probably never will.

What are my other options? I tried many:

  • xterm and rxvt have Unicode options, but no tabs.
  • xfce4-terminal is actually quite close to what I want, but it won’t let me put the tabs at the bottom.
  • gnome-terminal is slow, feature-poor, hard to configure, and also won’t let me put the tabs at the bottom.
  • urxvt looks promising, but the tab support is text, not graphical, and needs some UI work. (I hear there’s a GTK version, but it doesn’t work very well and lacks the keyboard support I depend on.)
  • Konsole is probably the best terminal available for Linux, but it’s slow, clunky, and I can’t customize the keyboard shortcuts the way I want. I like how it remembers what sessions I had open last, but I hate not being able to assign hotkeys – with mrxvt, I type Ctrl-Shift-Enter, type a host name, and get an instant ssh connection with the host name as the tab title. I can’t seem to get this effect with Konsole.

I’ve tried others, but I can’t remember them anymore. I tried everything available through Debian and a few others, and wasn’t happy with any of them. I even tried making my own terminal using Python and libvte, and at one point considered that a viable option, but ultimately gave up because I saw how much work I had to do to get decent terminal emulation (I use a lot of keyboard shortcuts in emacs), and besides, VTE just feels kind of slow (which is why gnome-terminal also feels slow).

After flirting with Konsole for a few months (hey, at least it motivated me to try out KDE 4, which is awesome), I got tired of the slowness and lack of keyboard customizations and I switched back to mrxvt. I started looking for other ways to get Unicode to work, and somehow I stumbled upon GNU Screen, which I had been ignoring up to that point, since mrxvt’s tabs left me little reason to add another layer of virtual screen support.

However, GNU Screen had, this whole time, been a workable solution, and I was just completely unaware of what it could do. I don’t think many people know that this is even possible, since it’s hard to find any conversations on the web about it, but screen can emulate UTF-8 on a latin-1 terminal, and I find this completely amazing. Here’s my simple .screenrc configuration:

defbce on
defutf8 on
escape ^]^]
markkeys "h=^B:l=^F:$=^E"
setenv LANG en_US.UTF-8
startup_message off
term $TERM
termcapinfo xterm|xterms|xs|rxvt ti@:te@
zmodem catch

Here’s what it does:

  • “defutf8 on” turns on UTF-8 support, including translation to the parent terminal’s encoding!
  • “defbce on” turns on “background color erase”, which fixes a problem where status-line background colors don’t go all the way across the screen in various full-screen console programs
  • “escape ^]^]” moves the escape key from the default of Ctrl-A to Ctrl-], which is the best compromise I could come up with since my brain is completely wired to use Ctrl-A to go to the beginning of the line (emacs, zsh, etc.)
  • The “markkeys” line I don’t really use much anymore, now that I got mrxvt’s scrollback buffer working again, but it lets me use the other emacs keys to navigate screen’s scroll/copy buffer
  • “setenv LANG en_US.UTF-8” sets the LANG environment variable to indicate that I want UTF-8 encoding. This is essential because, from mrxvt, I have LANG=C so that screen knows my terminal doesn’t support UTF-8.
  • “startup_message off” skips the startup screen, since I use screen for every single tab, and this would be totally redundant.
  • “term $TERM” uses the TERM setting from the parent terminal rather than letting screen overwrite it as TERM=screen or TERM=screen-bce. I actually use “xterm” as my TERM type, even though I use an rxvt-derived terminal, because this has given me the best compatibility with the various console programs I use. I’ve spent many hours fixing keyboard escape codes so that mrxvt is xtermy enough.
  • The “termcapinfo …” line tells screen not to use the “alternate screen” for anything, which is necessary for mrxvt’s scrollback buffer and corresponding keyboard and mouse support to work.
  • Last but not least, “zmodem catch” enables–holy shit–Zmodem support! I had pretty much already given up on Zmodem, after having bad experiences with zssh, but screen’s Zmodem support really works! Supposedly it’s experimental and crashes sometimes, but so far I haven’t had any major problems with it.

That’s it for my screen setup so far. I made the following change to my .mrxvtrc to use screen for every tab and make sure that LANG=C before screen runs (which screen will later set to en_US.UTF-8):

Mrxvt.command: \!LANG=C exec screen
Mrxvt.macro.Ctrl+Shift+Return: NewTab "ssh" \!read -p"Host: " host; echo -ne "\e]0;$host\a"; LANG=C exec screen ssh $host

The first line is the only one necessary to start using screen for all new tabs. The second one is my handy shortcut for opening up ssh sessions quickly. It really helps when I need to log onto a bunch of boxes at once.
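
In case the echo in that macro looks like line noise: it emits the xterm-style “set window title” escape sequence, which is how the host name ends up as the tab title. Here is the same sequence spelled out as a small Python sketch; the function name and the example host are placeholders of mine, not anything from mrxvt or screen.

```python
import sys


def title_sequence(title: str) -> str:
    """Build ESC ] 0 ; <title> BEL, the same bytes as echo -ne "\\e]0;$host\\a"."""
    return f"\x1b]0;{title}\x07"


if __name__ == "__main__":
    # "example-host" is a hypothetical host name, for illustration only.
    sys.stdout.write(title_sequence("example-host"))
    sys.stdout.flush()
```

Whether a given terminal applies the sequence to a tab, the window, or both depends on the terminal.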

I got all of this working and ran with it for a couple of weeks. Everything was great except for one minor issue, which wasn’t really a big deal, but I came up with a fix for that, too. The issue was that certain Unicode characters were showing up as ‘?’. After some investigation I learned that these characters have no equivalents in latin-1. They included bullets, various dashes/hyphens, and curly single and double quotes. I don’t really care if my quotes are curly, but it is distracting when they show up as question marks, so I started digging through the screen manual trying to find a solution. I couldn’t find any way to customize the replacement character it chooses, so I finally downloaded the screen source and started hacking on it. I wrote a little patch that does the replacements I want, and figured I might as well fork it on GitHub so I can keep up with future changes to screen. Here is the commit.
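
The idea behind the patch is easy to illustrate outside of C. What follows is not the patch itself, just a rough Python sketch of the same approach: swap the characters latin-1 can’t represent for ASCII stand-ins first, and only fall back to ‘?’ for whatever is left. The substitution table and the function name are my own guesses for illustration, not what the commit actually contains.

```python
# Hypothetical substitution table: Unicode characters with no latin-1
# equivalent, mapped to plain ASCII stand-ins.
ASCII_FALLBACKS = {
    "\u2022": "*",   # bullet
    "\u2010": "-",   # hyphen
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
}

_TABLE = str.maketrans(ASCII_FALLBACKS)


def to_latin1(text: str) -> bytes:
    """Encode text as latin-1, substituting known characters before giving up."""
    substituted = text.translate(_TABLE)
    # Anything still unrepresentable becomes '?', as it did before the fix.
    return substituted.encode("latin-1", errors="replace")


print(to_latin1("\u201chi\u201d \u2022 caf\u00e9"))
```

Characters that latin-1 already has, like é, pass through untouched, so only the troublemakers get rewritten.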

As it turns out, the latest version from GitHub, which I think is a mirror of the CVS trunk, once again kills mrxvt’s scrollback buffer support, and after messing around with it for a while I gave up and went back to the latest screen source package from Debian unstable, applying my patch to encoding.c and running “debian/rules binary” to build a custom .deb package. That is what I am using now, and it’s working just fine!

All this, and I still haven’t really learned how to use screen for its intended purpose. I still don’t really need the virtual screens, but I’m intrigued by the bind, bindkey, and stuff commands, which seem to be a general-purpose keyboard macro tool. I’m looking forward to learning more about what screen can do.

Redesigning my blog, for reals this time

Okay, so this is about the 4th theme I’ve picked for my blog, though I doubt anyone reads it anyway. In any case, this time I finally realized that I’ll never be happy with the coding style of another WordPress theme developer, so I started from scratch. Plus, I really wanted to try using 960.gs, and converting an existing theme from one framework (or no framework) to another is pretty hard and time-consuming. So here’s my brand new theme, with not many features or anything. Hope you like it. =)

Synchronet runs on Linux

I just found this out. I followed the directions and got Synchronet BBS running on my Debian box. It runs great under daemontools, and out of the box it sets up telnet, ssh, ftp, finger (yes, finger), mail, and irc servers. It has multiple sets of menu keys including one that emulates Renegade, which is more nostalgic for me than emacs bindings.

Now, the question is, what would I do with it? If I made a BBS, would you log into it? If so, would you log in more than 20 times? What if it had ANSI art? What if it was UTF-8?!

I doubt it, right? Internet BBSes are weird.

I miss BBSes a lot, but I never stay on an internet one for more than a couple of weeks. I have to say, though, Synchronet is a really nice BBS, and I’d totally use it if I still knew how to do ANSI art and more than one person would call it. Maybe I can gate it into Twitter somehow.