A very common problem in computer programming is finding the longest increasing (decreasing) subsequence in a sequence of numbers (usually integers). Actually this is a typical dynamic programming problem.
Dynamic programming can be described as a huge area of computer science problems that can be categorized by the way they can be solved. Unlike divide and conquer, where we were able to merge the fairly equal sub-solutions in order to receive one single solution of the problem, in dynamic programming we usually try to find an optimal sub-solution and then grow it.
Once we have an optimal sub-solution on each step we try to upgrade it in order to cover the whole problem. Thus a typical member of the dynamic programming class is finding the longest subsequence.
Prefix encoding, sometimes called front encoding, is yet another algorithm that tries to remove duplicated data in order to reduce its size. Its principles are simple, however this algorithm tend to be difficult to implement. To understand why, first let’s take a look of its nature.
Please, have a look on the following dictionary.
Instead of keeping all these words in plain text or transferring all them over a network, we can compress (encode) them with prefix encoding.
Relative encoding is another data compression algorithm. While run-length encoding, bitmap encoding and diagram and pattern substitution were trying to reduce repeating data, with relative encoding the goal is a bit different. Indeed run-length encoding was searching for long runs of repeating elements, while pattern substitution and bitmap encoding were trying to “map” where the repetitions happen to occur.
The only problem with these algorithms is that not always the input stream of data is constructed out of repeating elements. It is clear that if the input stream contains many repeating elements there must be some way of reducing them. However that doesn’t mean that we cannot compress data if there are no repetitions. It all depends on the data. Let’s say we have the following stream to compress.
We can hardly imagine how this stream of data can be compressed. The same problem may occur when trying to compress the alphabet. Indeed the alphabet letters the very base of the words so it is the minimal part for word construction and it’s hard to compress them.
Fortunately this isn’t true always. An algorithm that tryies to deal with non repeating data is relative encoding. Let’s see the following input stream – years from a given decade (the 90’s).
Here we have 39 characters and we can reduce them. A natural approach is to remove the leading “19” as we humans often do.
Now we have a shorter string, but we can go even further with keeping only the first year. All other years will as relative to this year.
Now the volume of transferred data is reduced a lot (from 39 to 16 – more than 50%). However there are some questions we need to answer first, because the stream wont be always formatted in such pretty way. How about the next character stream?
We see that the value 100 is somehow in the middle of the interval and it is handy to use it as a base value for the relative encoding. Thus the stream above will become:
The problem is that we can’t decide which value will be the base value so easily. What if the data was dispersed in a different way.
Now the value of “100” isn’t useful, because compressing the stream will get something like this:
To group the relative values around “some” base values will be far more handy.
No matter how fast today’s computers and networks are, the users will constantly need faster and faster services. To reduce the volume of the transferred data we usually use some sort of compression. That is why this computer sciences area will be always interesting to research and develop.
There are many data compression algorithms, some of them lossless, others lossy, but their main goal aways will be to spare storage space and traffic. These algorithms are very useful when talking about data transfer between two distant places. Perhaps the best example is the transfer between a web server and a browser.
Actually when a file is executed by the client’s virtual machine, it doesn’t matter how “beautifully” it is formatted from a programmer’s point of view. Thus the spaces, tabs and the new lines don’t bring any significant information for the environment. That is why such compressing tools like YUI Compressor, Google Closure Compiler, etc. remove those symbols. Well, they can achieve even more in order to improve the compression rate. In this post I won’t cover this, but this shows how important data compression algorithms are.
It would be great if we could just compress data with some tool. Unfortunately this is not the case and usually the compression rate depends on the data itself. It is obvious that the choice of data compression algorithm depends mainly on the data and first of all we must explore the data.
Here I’ll cover one very simple lossless data compression algorithm called “run-length encoding” that can be very useful in some cases.
This algorithm consists of replacing large sequences of repeating data with only one item of this data followed by a counter showing how many times this item is repeated. To become clearer let’s see a string example.
This string’s length is 24 and as we can see there are lots of repetitions. Using the run-length algorithm, we replace any run with shorter string followed by a counter.
First thing to say TinyMCE is a very popular WYSIWYG online based editor. It’s very widely used in the web, as may already know it it’s part of the default WordPress installation. Out of the web, of course, there are some other editors as well. The most used and well developed projects are the Yahoo!’s YUI 2 Rich Text Editor and CKEditor also known with his past name – FCKEditor.