Programming

Twitget Improvement Addendum

A while back, I posted a modification to the Twitget Twitter widget I’m now using to display my tweets over there on the sidebar. I’ve now made some further improvements, since my original changes made an erroneous assumption about processing the tweet information.

First, hashtag links were losing the leading space when being displayed in the sidebar. The fix here was trivial: it simply requires adding a leading space to the replacement strings of the two preg_replace calls in the process_links function that generate the hashtag links.

The second fix is slightly more significant. Basically, if there are no URL entities in the tweet metadata, then the code needs to find link text within the tweet and turn it into a link. Here’s the new batch of code:

function process_links($text, $new, $urls) {
        if($new) {
                $linkmarkup = '<a rel="nofollow" target="_blank" href="';
                $text = preg_replace('/@(\w+)/', '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);
                $text = preg_replace('/\s#(\w+)/', ' <a href="http://twitter.com/search?q=%23$1&src=hash" target="_blank">#$1</a>', $text);
        }
        else {
                $linkmarkup = '<a rel="nofollow" href="';
                $text = preg_replace('/@(\w+)/', '<a href="http://twitter.com/$1">@$1</a>', $text);
                $text = preg_replace('/\s#(\w+)/', ' <a href="http://twitter.com/search?q=%23$1&src=hash">#$1</a>', $text);
        }

        if (!empty($urls)) {
                // URL entities were supplied: keep the t.co href, show the expanded URL.
                foreach($urls as $url) {
                        $find = $url['url'];
                        $replace = $linkmarkup.$find.'">'.$url['expanded_url'].'</a>';
                        $text = str_replace($find, $replace, $text);
                }
        }
        else {
                // No URL entities: find link text in the tweet with a regex instead.
                if ($new) {
                        $text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1" target="_blank">$1</a>',  $text);
                }
                else {
                        $text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>',  $text);
                }
        }

        return $text;
}

The framework here is pretty much identical to before. The main addition is the else clause attached to the if(!empty($urls)) test. The code in that clause is actually the previous link code: regexes like that are too persnickety to reinvent.

So this will suffice until the next problem surfaces.
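For anyone who reads Python more easily than PHP, here is a rough sketch of what that fallback branch does. It is not part of Twitget: the function name link_plain_urls and the sample text are made up for illustration, and the pattern is a slightly simplified adaptation of the one in the PHP above.

```python
import re

# Simplified adaptation of the PHP pattern: scheme, host, optional port,
# optional path, optional query string.
URL_RE = re.compile(r'(https?://[-\w.]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)')

def link_plain_urls(text, new_window=False):
    """Wrap every bare http(s) URL found in text in an anchor tag."""
    target = ' target="_blank"' if new_window else ''
    return URL_RE.sub(r'<a href="\1"%s>\1</a>' % target, text)

print(link_plain_urls("Reading http://example.com/docs?page=2 today"))
```

Like the original, this is a best-effort match. A URL followed directly by punctuation will likely pick up the stray character, which is part of why the entity data is preferred whenever it is available.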

Minor Twitget Improvement

I noticed today that the Twitter feed over there was not displaying my tweets properly. Specifically, any links are displayed using the t.co URL structure that Twitter uses. I’d fixed this once before for the old feed, so I figured it was worth investigating to see if I could fix it in the new one.

As it happens, the modification is pretty trivial, with only a few lines of code added in one source file.

The file to modify is twitget.php. Start by changing the function process_links to look like the following:

function process_links($text, $new, $urls) {
    if($new) {
        $linkmarkup = '<a rel="nofollow" target="_blank" href="';
        $text = preg_replace('/@(\w+)/', '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);
        $text = preg_replace('/\s#(\w+)/', '<a href="http://twitter.com/search?q=%23$1&src=hash" target="_blank">#$1</a>', $text);
    }
    else {
        $linkmarkup = '<a rel="nofollow" href="';
        $text = preg_replace('/@(\w+)/', '<a href="http://twitter.com/$1">@$1</a>', $text);
        $text = preg_replace('/\s#(\w+)/', '<a href="http://twitter.com/search?q=%23$1&src=hash">#$1</a>', $text);              
    }
    if (!empty($urls))
        foreach($urls as $url){
            $find = $url['url'];
            $replace = $linkmarkup.$find.'">'.$url['expanded_url'].'</a>';
            $text = str_replace($find, $replace, $text);
        }
    return $text;
}

Here, we’ve added the argument $urls, which will come from the entities field of the tweet data. The foreach loop uses this data to create the appropriate anchor markup. The actual link URL is maintained, while the displayed text is changed to the expanded_url field supplied by the entities information. Note I’ve also modified the replacement string for hashtag searches, adding &src=hash to the href attribute in the anchor tag.

Now we need to add the entity data to the function calls. Search for calls to the process_links function within the file. There were only two instances of it used in my version. Add the third parameter to the function calls as follows:
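To make the data flow a little more concrete, here is a small Python sketch of the same substitution. The names expand_links and tweet are hypothetical, and the sample tweet is invented; the only fields assumed are the url and expanded_url keys that the PHP above reads from each entity.

```python
def expand_links(text, urls, new_window=False):
    """Replace each t.co URL in text with an anchor that shows the expanded URL."""
    target = ' target="_blank"' if new_window else ''
    for entity in urls or []:
        short = entity['url']            # the t.co link as it appears in the tweet text
        shown = entity['expanded_url']   # the original URL, used as the visible link text
        anchor = '<a rel="nofollow"%s href="%s">%s</a>' % (target, short, shown)
        text = text.replace(short, anchor)
    return text

# Invented sample data shaped like the entities portion of a tweet.
tweet = {
    'text': 'New post up: http://t.co/abc123',
    'entities': {'urls': [{'url': 'http://t.co/abc123',
                           'expanded_url': 'http://example.com/new-post'}]},
}
html = expand_links(tweet['text'], tweet['entities']['urls'])
print(html)
```

The href still points at the t.co address; only the text the reader sees changes.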

$link_processed = process_links($whole_tweet, $options['links_new_window'], $tweet['entities']['urls']);

That third parameter should be added to every invocation of process_links. That provides the URL information to make our earlier changes work.

That’s it. Save the file and tweets should now display the proper link text, while still linking to the t.co URLs as specified by Twitter’s guidelines.

Custom More Text for WordPress Posts

A note to the less programming savvy readers out there, this one is full of programming jargon and can likely be safely ignored. In fact, unless you’re writing a blog client, you’re likely to find this one pretty uninteresting.

For those who are interested, the rest is after the link with the custom text…

Click Here to Read More

Document Code the First Time Around

Lesson learned: for any coding project much beyond a module or two, make sure you figure out a documentation method and stick to it across all modules. I’ve just spent the past several hours going through my blogtool code and fixing all the documentation inconsistencies. Tedium doesn’t begin to describe the process. I can’t imagine having needed to do that for a more significant project.

Definitely a case where it pays to get it right the first time.

Fun With Numbers

Periodically, I try to take a look at our home finances to see if there’s something that can be done to find some hidden stash of money. So far, my efforts have been for naught.

One expense I always investigate is our mortgage payment. I’ve always tried to pay ahead on the mortgage to save on future interest payments. So yesterday I got curious about when the best time to pay the curtailment is: at the same time as the payment, halfway through the month, or some other day of the month? I could have resorted to a web page that calculates amortization tables, but what fun is that?

So I wrote some Python code that can be used to generate a repayment table.

Here’s the meat of it:

Months = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31] 
Month = 0

def setPaymentParameters(payment, rate, day = 0):

    monthlyrate = rate / 100.0 / 12

    def _calcMonth(principal, curtailment = 0):

        def _calcADB(principal, curtailment, day):
            global Month, Months

            dim = Months[Month]
            Month += 1
            if Month == 12:
                Month = 0
            return ((day * (principal + curtailment)) + ((dim - day) * principal))/dim

        # _calcMonth code starts here...
        adb = _calcADB(principal, curtailment, day)
        interest = adb * monthlyrate
        return (principal,
                adb,
                interest, 
                payment-interest,  # principal payment
                curtailment,
                (payment-interest)+curtailment,  # total principal payment
                principal-((payment-interest)+curtailment))

    return _calcMonth

So the setPaymentParameters function returns a function that will calculate the monthly interest, principal payment and so forth for a single month. The function returned is a closure over the set monthly payment, the interest rate and the day of month a theoretical curtailment payment is made. No curtailment is necessary for the function to work.

In order to determine the effect of curtailments separate from the normal payment, the calculation uses an average daily balance method. For instance, a normal payment is typically made on the 1st of the month and a separate curtailment payment is made on the 15th. The average is calculated by multiplying the number of days the principal sits at the post-payment level by that balance, adding the number of days it sits at the post-curtailment level multiplied by that balance, and then dividing by the total number of days in the month. In the absence of a curtailment, the calculation simplifies to the principal balance at the beginning of the period.
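As a worked example of that average, here is a standalone sketch of just the ADB step. The function name average_daily_balance and the figures are illustrative assumptions: a balance of 149,156.25 after a 500.00 curtailment, with the curtailment landing on day 15 of a 28-day month.

```python
def average_daily_balance(principal, curtailment, day, days_in_month):
    """principal is the balance with the curtailment already subtracted."""
    before = day * (principal + curtailment)     # days before the curtailment lands
    after = (days_in_month - day) * principal    # days after it lands
    return (before + after) / float(days_in_month)

adb = average_daily_balance(149156.25, 500.00, 15, 28)
print(format(adb, ".2f"))   # 149424.11
```

With a curtailment of zero, both terms use the same balance and the average collapses back to the starting principal.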

Following is an example of how to use the function:

Rate = float(5.25)
Payment = float(1000.00)

Amortization = []
CalcMonth = setPaymentParameters(Payment, Rate, 15)
Principal = float(150000.00)
while (Principal > 0):
    t = CalcMonth(Principal)
    Amortization.append(t)
    Principal = t[6]

# print() with a single argument behaves the same under Python 2 and Python 3
print(len(Amortization))
interestTotal = float(0.0)
for row in Amortization:
    print([format(x, ".2f") for x in row])
    interestTotal += row[2]   # index 2 holds the interest for the month

print(format(interestTotal, ".2f"))

The output won’t be particularly pretty, but it will list the total number of payments made to pay off the loan, followed by a breakdown of the effect of each monthly payment, followed by a calculation of the total interest paid. A monthly payment line will look like this:

['150000.00', '150000.00', '656.25', '343.75', '0.00', '343.75', '149656.25']

From left to right, we have the beginning principal, the average daily balance, the interest for the month, the principal paydown, the principal curtailment, the total principal paydown and finally the principal balance after the payment is applied. Each subsequent month uses this final principal balance number as the beginning balance.

The above snippet doesn’t use a curtailment payment to accelerate the paydown of the mortgage. To do that, the while loop needs to be modified slightly:

Curtailment = float(500.00)
while (Principal > 0):
    if len(Amortization) == 0:
        temp = CalcMonth(Principal)
        t = (temp[0],
             temp[1], 
             temp[2],
             temp[3], 
             Curtailment, 
             Payment-temp[2]+Curtailment,
             Principal-(Payment-temp[2]+Curtailment))
    else:
        t = CalcMonth(Principal, Curtailment)
    Amortization.append(t)
    Principal = t[6]

The modification is needed for the first payment. In the first month there is no earlier curtailment to account for, so the interest is calculated on the entire loan amount. The returned payment info then needs to be modified by manually inserting the curtailment payment. Thereafter, all calculations use the curtailment.

Here are the first couple of payment output lines:

['150000.00', '150000.00', '656.25', '343.75', '500.00', '843.75', '149156.25']
['149156.25', '149424.11', '653.73', '346.27', '500.00', '846.27', '148309.98']

The curtailment payment is included and the ending principal balance includes the extra payment. Notice the second line’s average daily balance number, which is higher than the starting principal balance. To fully understand that, first notice that setPaymentParameters was called with the day set to 15, meaning the curtailment payment is applied on the 15th of the month, not the same day as the normal payment. Therefore, there are 15 days where the principal sits without the curtailment payment applied. Then the payment is applied for the remainder of the month. The end result is that the ADB, which is used to calculate interest, is slightly higher than the principal balance after the curtailment.

The final answer to my question about the optimal day to apply the curtailment turned out to be that it saves the most money if the curtailment is paid on the same day as the normal payment. This makes sense since, in general, paying earlier means the outstanding principal is reduced quicker and therefore interest is minimized.

But that’s not the whole picture. Sometimes, for monthly household cash flow purposes, it is preferable to make multiple smaller payments. Will that result in a big difference in total interest paid? The answer there turns out to be no, it won’t. Depending on the amount owed and repayment length, the difference is only a few hundred dollars.
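For anyone wanting to try the comparison themselves, here is a compact sketch that totals the interest for a given curtailment day. It follows the same average daily balance approach, but the function name total_interest and the pending variable are my own assumptions for this illustration, and like the code above it ignores leap years and lets the final payment overshoot.

```python
DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def total_interest(principal, payment, rate, curtailment, day):
    """Return (total interest paid, number of payments) for a curtailment made on `day`."""
    monthly_rate = rate / 100.0 / 12
    interest_paid = 0.0
    month = 0
    pending = 0.0   # curtailment already deducted but not yet landed; zero in month one
    while principal > 0:
        dim = DAYS[month % 12]
        adb = (day * (principal + pending) + (dim - day) * principal) / float(dim)
        interest = adb * monthly_rate
        interest_paid += interest
        principal -= (payment - interest) + curtailment
        pending = curtailment
        month += 1
    return interest_paid, month

same_day, n_same = total_interest(150000.0, 1000.0, 5.25, 500.0, 0)
mid_month, n_mid = total_interest(150000.0, 1000.0, 5.25, 500.0, 15)
print(format(mid_month - same_day, ".2f"))   # extra interest from paying on the 15th
```

Varying the day argument from 0 up through the end of the month shows how the total interest creeps upward the later the curtailment is applied.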

Design Is Not a Straight Line

I’ve recently regained an interest in my blog client blogtool. A big part of that renewal is due to unfinished business: I’d always meant to release it into the wild but had never taken the time to learn how to package it. I finally took that plunge a few weeks ago. Ever since, I’ve come up with a series of improvements, fine tunings and new ideas to make it a more capable tool and a better piece of software in general.

(more…)

Release Announce- blogtool v1.1.0

I’ve just uploaded blogtool v1.1.0 to pypi.

The minor release number bump is due to switching the option parser library as well as adding the ability to process information from the standard input. The comment option has also been modified to take a couple of arguments.

I’ve added some spiffy, new web based documentation to help with getting up and running with blogtool. The documentation stuff was generated with the help of sphinx, a very cool tool that uses a different plain-text markup format that I’ll be exploring adding support for in blogtool.

Announce- blogtool v1.0.1

I’ve released blogtool version 1.0.1 into the wild.

This is a bug fix version. It fixes an error in HTML output where tags like <img> were not being properly closed. It also takes care of stray ‘&’ characters that need to be escaped.

It also fixes some bugs in the getpost option related to converting the post HTML into its markdown equivalent. Nested inline elements were not properly accounted for, and escaping of a number of characters was also added.

Release Announcement- blogtool

I wrote a blog client a couple years ago and have been developing it on and off ever since. One of the reasons I hadn’t done anything public with it is I needed to take the time to organize it appropriately for something like pypi.

I’ve finally taken those steps and have put it out into the wild. The source code is on github, here. I’ve also used python’s setuptools to publish it on pypi, here.

It works with my self-hosted WordPress blog and I’ve used it for all but a handful of the posts I’ve written on the blog, so I consider it reasonably well tested for those purposes. It won’t support all of WordPress’s features, but I plan on changing that as I migrate some of the functionality over to using more of the WordPress API. When I originally wrote blogtool, WordPress didn’t have its own API for posting, so that’s why that shortcoming exists.

There are a couple of nice features to blogtool that I thought I’d mention here. One, it uses python-markdown to mark-up post text. It’s proven very capable for my style of blogging, which is 90% text. It handles pictures as well, and I’ve added a little wrinkle for that purpose. Rather than supply a URL or some such for markdown's syntax, simply supply a file path to the picture. Then, blogtool will take care of the rest.

The other nice feature is that posts can be retrieved and edited from a blog. When retrieving, it will reformat the HTML into markdown style format. This is useful for editing comments as well as posts.

So, there it is. My first published code project.

Dealing with Unicode in Python

I haven’t touched the code for the blog client I’d written in quite a while. This is largely because it works well for my purposes and I haven’t had the need to add further support for other features.

There has been one major shortcoming for it, however, that I hadn’t taken the time to investigate and correct. Oftentimes, when quoting text from an article on the web, I would get a unicode decode error related to the blob of text I’d copied from the browser.

Now, I understood in general terms what the problem was: stray characters within the copied text were not ASCII characters and markdown chokes on those characters. I had an inelegant workaround that kept me from properly dealing with the problem: I’d scan the text for offending characters, typically punctuation, and replace them with reasonable ASCII equivalents. It was a pain, but it worked.

Like all workarounds, this method had limitations. Specifically, certain special letter characters like letters with umlauts, tildes, accent graves or accent aigus over them cannot be duplicated. The fact that I didn’t run into that problem a lot kept me from dealing with it sooner. Also, scanning a block of text for unicode violators is tedious.

What I failed to understand at the time was that the characters on a web page are encoded in some format, UTF-8 for example. For the plain ASCII characters (letters without umlauts and the like), the UTF-8 byte values and the unicode code points are identical. The problem comes in when characters don’t line up so neatly. What I finally came to understand was that the encoded web page text needed to be decoded into unicode prior to processing. The concept seems so blisteringly obvious, now, that I’m actually perplexed as to how I never grasped it originally.

So I finally fixed the problem. Or, perhaps better put, I came up with a solution with a better set of trade-offs. Because in order to actually “fix” the problem, it would be necessary to always know how text had been encoded. Unfortunately, from the program’s perspective, it can’t be done.

But it can make some educated guesses.

Here’s the basic code that fixes the problem:

for encoding in ['ascii', 'utf-8', 'utf-16', 'iso-8859-1']:
    try:
        xhtml = markdown.convert(text.decode(encoding))
    except (UnicodeDecodeError, UnicodeError):
        continue
    except:
        print "Unexpected Error: %s\n" % sys.exc_info()[0]
        sys.exit(1)
    else:
        return helperfunc(xhtml)

In this case, markdown is an object for marking up markdown formatted text. Prior to passing the text to the markdown object, I decode it using encodings that represent the most likely ones I’ll run into. If decoding fails, a UnicodeDecodeError is raised, which is caught by the first except clause. That clause merely passes control back to the for loop, where the next encoding is selected and tried. Rinse, repeat. When no exception is raised, control passes to the else clause, where normal program flow continues on the returned xhtml from markdown.

This section of code eliminates, in my case, almost all occurrences of my aforementioned unicode problems. But that’s because the vast majority of webpages I use are encoded using UTF-8. I’ve since added a command line option to specify the encoding to use for decoding purposes. This should provide a means to cover all other situations that arise. In this instance, when the user specifies the encoding on the command line, the user specification supersedes all other encodings and is used. The presumption is the user knows what they are doing.

The code to support that looks like this:

if charset:
    encodings = [charset]
else:
    encodings = ['ascii', 'utf-8', 'utf-16', 'iso-8859-1']

for encoding in encodings:
 .
 .
 .

The rest of the code looks identical to the above snippet.

It was a good exercise for me to muddle through, as I now fully comprehend the unicode problems that can arise and how to deal with them. The basic rules are:

  1. Decode text going into the program.
  2. Encode text coming out of the program.
  3. Use unicode for the string literals within the program.

These should help keep me out of unicode trouble in the future.
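Here is a small sketch of those three rules in action. It is written for Python 3, where decoded text is simply str, rather than the Python 2 used in the snippets above. The function name decode_guess and the sample string are made up for illustration, and the encoding list is the same one tried earlier.

```python
def decode_guess(raw, charset=None):
    """Decode raw bytes, trying a user-supplied charset or a list of likely encodings."""
    encodings = [charset] if charset else ['ascii', 'utf-8', 'utf-16', 'iso-8859-1']
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeError:
            continue   # wrong guess, try the next encoding
    raise ValueError("could not decode input with any of: %s" % ", ".join(encodings))

raw = "naïve café".encode("utf-8")     # bytes as they might arrive from a browser
text, used = decode_guess(raw)         # rule 1: decode text going into the program
shout = text.upper()                   # rule 3: work with unicode strings internally
out = shout.encode("utf-8")            # rule 2: encode text coming out of the program
print(used)
```

Passing a charset skips the guessing entirely, which mirrors the command line override described above.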

WP Mystique Twitter Widget

The standard Twitter widget that comes with the Mystique theme has a couple of shortcomings, in my opinion. One, any shortened URLs are displayed using Twitter’s t.co link instead of whatever shortener the user may actually be using. I’ve installed a YOURLS site for just this purpose, so I’d like to see my site displayed. Two, hashtags are not linked back to a Twitter search.

Below the fold are a few lines of PHP that will address these problems. A brief word to the wise: these mods only work with Twitter’s API V1. API V1.1 will require OAuth to perform this task.

(more…)

lua and require

A cool little feature I just came across with lua is that the require function can be used as part of the actual working code. For instance, an optional piece of functionality can be required based on an option flag. The benefit is that optional code is only loaded when it’s wanted.

I just wrote an IMAP4rev1 parser to help with making sure imap commands generated by luaimap are syntactically correct. The idea was to eliminate a potential point of confusion when working with the library. Of course, the checker is pretty green at this stage, so it may give a false negative. But at least it gives a starting point to the would-be developer.

Typically, I’d just require the module at the top of my code, like so:

local chksyntax = require("parser")

And then the parsing function can be invoked in the usual way.

But in this case, I figured the checker should really be part of the IMAP object that luaimap creates. Then I thought, wouldn’t it be nice if I only loaded the module when syntax checking was desired? That way, after a debug and proving out phase, syntax checking could just as easily be turned off, resulting in faster execution. So I added an options table to the new method and then inserted the following lines of code:

       .
       .
       .
if options then
    if options.syntaxchecking then
        o.__checksyntax = require("parser").parse
    end
end

The other cool thing here is that lua allows for grabbing the parse function reference inside the module. In other words, since require returns a table, I can immediately add the parse element for the assignment without generating a syntax error. This is also because lua treats functions as first class values, so they can be assigned just like other variables.

Now, elsewhere in the IMAP object, I can use the parser like so:

if self.__checksyntax and not self.__checksyntax(cmd) then
    error("I can't in good conscience send this command to the server: "..cmd)
end

Anyway, I thought this a somewhat novel (for me) insight into lua usage, and figured I’d pass it along.

bogotrain.lua

My spam filter of choice has been bogofilter for many a year now. For the mail I receive, it got to be very accurate quickly and it has remained so ever since. It is one of the Bayesian variety of spam filters and requires “training” to keep it properly classifying email.

I use an IMAP server for working with my mail, so integrating bogofilter with the server is less than ideal; the ideal would be to hit a keystroke and immediately reclassify the mail. Instead, I’ve assigned a couple of training folders that I then farm out to a script run as a cron job. Specifically, for misclassified spam (i.e. mail that’s actually good but was misclassified as spam) I created a spam2mail folder and for misclassified good mail (i.e. mail that’s actually spam but is classified as good) I use the Junk folder. The script, using IMAP, interrogates the mail folders, retrains bogofilter on the mail, and then places the mail in the appropriate final destination, either my spam folder or my INBOX.

Originally, I wrote the script in question using perl and IMAPtalk. Since I wrote an IMAP library in lua, I figured it appropriate to rewrite the script in lua using my library.

After the break is the code.

(more…)

git fast-export

I had a personal project I wanted to put onto Gitorious, but I didn’t want all of the history put up there because I had some username/passwords in the history as part of some test code. These pieces were gone in the more recent versions of the code. But git makes it so easy to recall and search history, it’s the sort of thing best not to risk.

After mucking with filter-branch and rebase, neither of which really gave me a repository I wanted to put out there, I came across the fast-export and fast-import commands. The export command, particularly, was quite flexible and I was able to arrive at a workable solution. I used fast-export to dump the last 10 commits of my master branch up to HEAD into a file, and then imported it into a new repository, finally pushing the result out to Gitorious.

Here’s the fast-export command I used:

git fast-export master~10..master >> export

I still have the original repository, but I’ve mothballed it and will work from the new one. Seems like a reasonable compromise and ended up being a pretty straight-forward application of the available tools. No worrying about unintended side effects or subtle forms of data loss.

Lua Packages

I’m starting to look into how to add pipelined or asynchronous support to my luaimap4 project. It had been a while since I’d looked at the code, so I started by refreshing my understanding of it. In the course of doing so, I opted to take the original source file and break out some of the functionality into separate support modules. After doing so, I didn’t like that I now had multiple source files directly in my install directory, so I opted to create an imap4 subdirectory and put all the related modules under that directory.

And that’s when the fun began.

(more…)

An Awesome Mail Widget

The name really does lend itself to abuse. Regardless, I leveraged some previous lua code to create a nice little email widget that checks my email account for new mail and, if there is any, creates a menu of the mailboxes that have new mail. Selecting one launches mutt with that folder open.

Code and explanation after the jump.

(more…)

CSS and PNG File Icons

I’m sure this is basic web developer stuff, but I have zero web developer training. Anything I know, I’ve gleaned on my own by reading and modifying source code. Yesterday, I figured out a little (but important!) detail regarding the use of PNG image files for displaying icons on a web page. I’m jotting it down here as a reference for myself.

(more…)

git rebase and merge strategy ours

I created an interesting problem for myself last night while working on some code for personal use. I’ll state upfront that the code is not public so I didn’t have to worry about screwing things up for other people. Just myself, though the goal was not to. Following is a description of the circumstances and how I used git to rectify the situation.

(more…)

luaimap4

I just created a repository on Gitorious for a client-side IMAP4 library. The project is luaimap and I’ve published it under the MIT license. The project consists of two files: imap4.lua is the actual library and checker.lua is a sample program that checks an IMAP account for new mail using the library. The library minimally requires luasocket to establish a basic connection. To establish a TLS connection luasec is also required.

The library implements all IMAP commands except ‘AUTHENTICATE.’ I’ve only tested it against a Dovecot server, so consider it very green. For now, it is a synchronous implementation: commands are sent and return a response from the server for the command. Going forward, I intend to add support for the AUTHENTICATE command and look at trying to take advantage of command pipelining.

Anyone intending to use it should read RFC3501, the document on which the library is based. The intention of the library is to handle the protocol related details of IMAP4rev1, not to enforce IMAP4 client side design practices.

To use it, simply install the file in a project directory and use a line like:

local imap = require("imap4")

To make it available on the system, copy the file to a directory listed in lua’s package.path variable. On my system, a debian/testing setup, I’ve installed it to /usr/local/lib/lua/5.1/.

Case Insensitive lua Methods

I’ve been working on a lua library to support IMAP4Rev1 command exchanges with an IMAP server. Towards that end, I’ve created a lua object. Now, I’d known a little about using lua and the flexibility of lua tables since my window manager uses it as a configuration language. Until this project, however, there has been no reason for me to delve into the deeper depths of lua tables.

Now that I have, I’m not sorry, as I’ve learned quite a few interesting techniques. Below the fold is a rather simple trick to make lua methods case insensitive.

(more…)
