Wednesday, December 24, 2008

Regular Expression (RE) as PL ?

Just a duplicate of my ol' blog posting in http://www.regexadvice.com :

Regular expression as a programming language. Is it possible?

I began to learn about regular expression about 3 years ago. I was at that time, never heard about, or even expect to know such a thing. I was asked to learn Perl to solve some Bioinformatics work, and in my mind i don't have any regex knowledge, except mathematical statements like x = { y| y is subset of z}, or y = {1,2,3}, and some basic knowledge of Z-language. Although the math statements did not closely resembles the regular expression statements that are used in programming language, its foundation is still a 'regular expression' (since regular expression is about stating the regular behavior of the item that we want to specify). We assure y = {1,2,3} by /[1-3]/. Anyway, from that point of time, i began to learn Perl, and slowly i was introduced to m// operator, s/// and tr/// (text processing requires a massive use of these operators).

What i like about regular expression is its compactness. Techniques for simplifying codes have been explored long time before. One approach is by using function (which is eventually another concept that come from math). Using function (or some use the word subroutine), we manage to reduce codes, and simplify them just by calling their name instead of rewriting the same codes. Almost with a similar purpose in mind, we use regex to simplify complex requirements, which is by representing a set of rules within a simple statement, i.e. /a-z/. One should realize that we are representing many lines of codes within a single statement. Just imagine, using regular expression as programming language, a million lines of codes can be turned into just several lines of codes (or symbols).

I'm also believe that regular expression can possibly be a language that is easier to remember, and can be written faster. This because in regular expression we use simple symbols to represent (possibly) complex rules. It have been proved that our brains can (easily) remember things that we see visually compared to the things that are written or touched. Also human brains will capture things in graphical form. Based on this fact, isn't it possible that we can remember some simple symbols more faster than to remember huge amount of text? Also since we only need to write symbols, it will not take us a long time to write the codes in regular expression (unless for complex rules).

Regular expression is specified using a finite set of symbols, such as '?' to represent existence, '+' to represent repetition etc, make it looks more encrypted. Programming language was created to bring computer language (machine language) more closer to the natural language, so that it will become easier for people to write codes to be computer programs. Based on this fact, it seem impossible for encrypted code like regular expression to be accepted as one of high programming language.However, it is not an excuse. Even most of programming language today require some comments to clarify its purpose, or explain what the code does. People might claims that some high level language is already self-explained (the codes explains its purpose). However many of us will found that this statement is not true for all cases. When a section of codes becomes so complex, even the most proclaimed self-explanatory language require at least few comments to describe the codes. Some of todays implemented regular expressions allows comments to be included in the regular expression statement. So it is not encrypted at all when the regular expression are combined with some extra comments. In implementation, no different in code size since comments will be ignored.

I'm just writing the general ideas of how regular expression can possibly be a programming language here. There's still a lot of things that need to be considered, studied and experimented with. But I'm still hoping for this idea to become true.

Guess what? Randal Schwartz (the Perl hacker) posted a comment on it! :)
He said:
Abigail uses regex to compute prime numbers. Ovid created a prolog-like problem solver using the regex engine. The regex engine is quite versatile, especially with built-in back-tracking.
And then later i added more updates:

I'm thinking about the advantage of natural (written) language over mathematical notations (and vice versa). Since (most) natural language statements can be simplified using mathematical notation e.g. like 'one plus two is equal to three' and '1+2=3'. But somehow there's a situation where mathematical notations gets longer than natural language. For example (unfortunately i can't show the mathematical expression), when we need to describe the relations of elements between sets, i found it is more convenient to describe it using simple sentence. But i arrive at a conclusion that, this 'advantages' over another is simply because there are no simple notation in either languages (neither natural nor mathematical) to describe the semantics of another language. If we need to describe certain aspect of a language (X), in another language (Y), we must define a notation (in Y) which describe the semantic described by other language (X) in similar fashion, which is as precise and understandable (same complexity in interpretation) as the original (X). So when we convert into the target language (Y), we gets a 'similar' complexity with the original (X), but with a new notation.

This idea basically was applied in regex to natural language, except with a different complexity. This complexity increased because regex tends to describe a pattern (many characteristics and semantics in a minimum notation) of natural text, not directly one-to-one interpretation (i mean the whole expression, not the symbol). If we want to allow regex to be able to describe a semantic of program (or as programming language), there should be many (not all) one-to-one correspondence between regex statement and computer program command.

Well, that's all. I haven't found anymore idea than this (yet!)

Yeah, that's all (since i never updated it). The last update is on June 12th, 2008.

No comments:

Post a Comment