I have been playing around with Clojure and after familiarising myself with the core, I decided to start writing little programs that forces me to choose the right idioms. I am starting off with Levenshtein edit distance of two strings.
Levenshtein edit distance
Levenshtein edit distance between two string is a way of quantifying how similar or dissimilar they are, lower distance implying higher similarity. The algorithm to calculate Levenshtein distance is fairly simple - the algorithm cares about three operations that can be performed at a particular position in a string to move it towards the other - insertion, deletion and substitution.
This can be recursively solved for two strings A0:AS
and B0:BS
by comparing AS
, B0:BS
; A0:AS
, BS
and AS
, BS
.
Code
(ns clojure_dojo.core)
(defn edit-distance
"Return the Levenshtein edit distance between two strings"
[first second]
(cond
(empty? first) (count second)
(empty? second) (count first)
:else (min (+ 1 (edit-distance (drop 1 first) second))
(+ 1 (edit-distance first (drop 1 second)))
(+ (cond
(= (take 1 first) (take 1 second)) 0
:else 1)
(edit-distance (drop 1 first) (drop 1 second))))))
Test
(ns clojure_dojo.t-core
(:use midje.sweet)
(:use [clojure_dojo.core]))
(facts "about `edit-distance`"
(fact "it returns length of other string if one string is empty"
(edit-distance "foo" "") => 3
(edit-distance "" "foo") => 3)
(fact "detects deletions"
(edit-distance "foo" "oo") => 1)
(fact "detects substitutions"
(edit-distance "ab", "cd") => 2)
(fact "detects additions needed"
(edit-distance "oo", "foo") => 1
(edit-distance "kit", "sitting") => 5
(edit-distance "intention", "execution") => 5
(edit-distance "sittin", "sitting") => 1))
The problem with this approach
This solution looks fine - it passes the tests. However, if we carefully observe the recursion tree, we notice that there are sub-problems that are solved multiple times and this makes this algorithm’s order of complexity Θ(3^min(m, n)). This can be observed from the call tree of this recursion.
Improvement
Recursive algorithms of this nature can be improved in two ways - memoization during recursion or applying Dynamic Programming in a bottom up manner. The next blog post in this series will deal with how these two approaches can be done in Clojure.
The code for the above solution is on GitHub.
If you have questions or comments about this blog post, you can get in touch with me on Twitter @sdqali.
If you liked this post, you'll also like...
- Implementing feature toggles for a Spring Boot application - Part 4
- Implementing feature toggles for a Spring Boot application - Part 3
- Implementing feature toggles for a Spring Boot application - Part 2
- Implementing feature toggles for a Spring Boot application - Part 1
- Setting up a secure etcd cluster behind a proxy
- Handling Deserialization errors in Spring Redis Sessions
- CSRF Protection with Spring Security and Angular JS
- Controlling Redis auto-configuration for Spring Boot Session
- JWT authentication with Spring Web - Part 5
- JWT authentication with Spring Web - Part 4
- JWT authentication with Spring Web - Part 3
- JWT authentication with Spring Web - Part 2
- JWT authentication with Spring Web - Part 1
- JSON logging for Spring applications
- Injecting dependencies into a Spring @Configuration
- Filtering responses in Spring MVC
- Deprecating domain events in Axon
- Programmable exit codes for spring command line applications - 2
- Programmable exit codes for Spring command line applications
- Using custom arguments in Spring MVC controllers
- Authentication for Apache Camel HTTP components
- Thoughts on Open Graph tags
- Integration testing Spring command line applications
- Integration testing challenges for non-web Spring applications
- How thinking of Documentation as Legislation helped me become a better programmer
- Implementing custom annotations for Spring MVC
- Validating RequestParams and PathVariables in Spring MVC
- Testing async responses using MockMvc
- Running multiple applications in the same Tomcat installation
- Making sense of Cloud Foundry security group declarations
- Configuring Cloud Foundry Java Memory Parameters
- Importing the Yelp dataset into MongoDB
- A simple JMeter test with login
- Implementing Rate Limiting in Rails - Part 2
- Implementing Rate Limiting in Rails - Part 1
- Python Hack - Dynamically override an object's attribute
- Fitting an Image in to a Canvas object
- Accessing Environment Variables in Gradle
- Reading user input in Gradle scripts
- Ruby, Named Capture Groups and Local Variables
- Named Capture Groups in Regular Expressions
- Decomposing URLs in Python
- Shared history in Bash
- Managing Gemsets in Rbenv
- Looking up Compiler params used to compile a Ruby version
- Navigating Stacktraces in Emacs
- Python's bool type
- Graph databases 1 - Modeling
- Validating JSON in Emacs
- Emacs hack: Viewing Git logs while composing commit messages
- Configure Git's comment character
- My experience working remotely
- Oh I can build it in...
- JavaScript, clipboard access and hidden flash widgets
- Reducing Emacs startup time while committing
- My first Firefox plugin: GetCache - View cached version of the current page
- GetCache - A Chrome plugin to view cached version of the current page
- On REST, Content-Type, Google Chrome and Caching
- How Browsers Detect If You Are Offline
- D3.js Workshop
- Visualisation - How European clubs dominate their leagues
- Understanding Python's "with" statement
- Heredocs in Ruby and Python
- Micro Journal - simple Git-backed journal in Python
- VodQA NCR: Maintaining Large Test Suites
- Know Your Tools - Don't Shoot Yourself in the Foot
- Managing security certificates from the console - on Windows, Mac OS X and Linux
- Indian and Pakistani cricketers - who make better debuts?
- Fixing Flyspell for Emacs in Mac OS X
- Finding un-merged commits with git cherry
- Bullet proof Jenkins setup
- Why your project should have a Getting Started guide.
- Debugging: C Sharp's HttpWebRequest, 100-Continue and nginx
- Wikipedia Page Hopping
- Empathy Log Parser
- Binary Signature Art
- Java Arrays in JRuby
- Autorun.py - Execute stuff on file change