A New Programming Language
Basics

A new programming language would hopefully incorporate many useful features from other programming languages. The selection would, to a large degree, reflect the biases and presumptions of the writer. This section outlines what these are for this writer.

First, let me introduce the cultural heritage behind my choices. I started programming in Algol 60 and then migrated to FORTRAN, C, and finally C++. I also had a strong interest at one time in FORTH. FORTRAN had little to commend it. Algol, C, and C++, by introducing the notion of structure, clearly have many benefits. The later additions in C++ take these benefits a lot further.

FORTH is an interesting language. With its lack of structure and type checking it ranks as one of the poorest programming languages. Some features, however, are outstanding. Two of these are its extensibility and the simplicity of its compiler.

The use of RPN (reverse Polish notation) seems an anachronism today, yet RPN does seem to fit the model of data processing in some way. The main argument against using it is that it conflicts with established mathematical notation. To counter this, I would say firstly that data processing is only rarely concerned with mathematics, and secondly that few users are all that familiar with mathematical notation. In favour of RPN, there is the ability to handle multiple arguments, and return multiple results, in a systematic way.

With this background, I will propose a new programming language that will be an extension of FORTH, but will include type testing. This will allow overloaded operators to be used.

Notation

The language is to be used with a computer, and so all notations are limited to the standard keyboard. Combinations of symbols may be used to create tokens. Where necessary, sequences of tokens are separated by commas.

Numbers

Numbers are represented by sequences of digits in the range 0 to 9. These may be preceded by a negative sign.
To clearly separate numbers from tokens, the negative sign will be the caret, ^, which will not be used for words. Floating point numbers will also use the caret for exponentiation, so the number 23×10³ becomes 23^3 and -23×10⁻³ becomes ^23^^3. The decimal point is used for decimal fractions, but must be surrounded by digits.

Tokens

Tokens, or words, are made up of sequences of letters or of symbols, but not both. Letters are the characters a to z or A to Z. Capitals are distinct from lower case letters. Symbols are all the remaining printable characters excluding the semi-colon and the caret. Thus Help, now, and, print are all words, as are =, //, *=.

A letter word may include one or more spaces, but when several spaces occur together in a word, only the first is significant. Thus rate of tax is the same token whether it is written with single spaces or with runs of several spaces between its parts, but rateoftax is a different token.

Words made up of symbols are normally used for operators, but words made up of letters may be operators or variables. This is not enforced by the language. Words are separated from each other by commas, but letter based words do not need a separator from symbol based words.

Lines

Lines will be significant. This is a departure from both structured languages and from FORTH. A line is ended by a carriage return, or by a semi-colon. Comments can appear after a semi-colon, and will be ignored. The following is a valid program line, but will do nothing:

    ; This is just a comment.

Paragraphs

Paragraphs will also be significant. Certain operations will be completed at the end of a paragraph. For instance, if a file is opened in a paragraph, it will be closed at the end of the paragraph. Consequently, an explicit open becomes redundant, as a file read can check whether the file is open and open it if necessary. The file will then be closed at the end of the paragraph. Other candidate operations are database open and close, and memory allocation and free.

Stack

Part of the processing will use the stack, as in FORTH.
In this new language the stack will always be emptied after the completion of a line. This ensures that the effect of a line will not depend on surrounding lines. The return stack concept is also used. The return stack is not emptied.

Simple arithmetic

For simple arithmetic statements, we do not differ from FORTH.

    3, 4+ print; Add 3 to 4 and print the result

Note that white space in a line is not significant when it is not a part of a word, but words cannot be broken up. The comma separator is used to separate 3 and 4 but is not needed to separate 4 from + or + from print. Each number may be copied to the stack when it is executed. Anything left on the stack is dropped at the line end.

Variables

Variables may be created and given values; for now, just consider their use in arithmetic.

    X, Y+print; Add X to Y and print the result

Note that the + word is overloaded to add the variables that are referenced by X and Y. To create a variable, use the format

    int, X,Y; create variables X and Y

To assign them values, use

    23, 7 int, X, Y; create X with value 7 and Y with value 23

The word int knows it has to create an integer with a value because one exists on the stack. Consider the order of execution.

    7 int, X, Y; creates X with a value 7 and Y with an undefined value

Undefined

All variables that are not assigned a value are assigned the value undefined. This is a specific value that will be recognised by other words, so that they behave consistently. So X, Y+ will be undefined if X and/or Y are undefined, or if the sum X+Y is outside the integer range. This is not enforced by the language, but will be provided for by all basic words, and programmers are expected to comply with this philosophy in any words that they write. Where it is convenient, I will use the symbol # to represent an undefined value.

Mathematical symbols

A rich set of mathematical symbols is provided. First, there are the usual operators:

    +, -, *, /; plus, minus, times, divide
This is a valid program line: with undefined arguments, each operator returns an undefined result, and any undefined value remaining on the stack is dropped.

Next there are subsidiary symbols, <, >, <=, >=, =, ! (not), & (and), | (or), % (remainder). These depend on the arguments, and may have different meanings with different arguments. For instance, & is a logical and with Boolean arguments, and a bitwise and with integer arguments. We also have >> and <<, which are used to rotate integers, but are also used to pipe streams. I will leave it to a dictionary to define all of these, but will introduce any that are useful as we go along.

Defining words

These come from the FORTH origins and will use a similar notation. A definition is started with a colon and ended with a semicolon. In this language this restricts definitions to one line. This has advantages in readability. More complex definitions will need to be made up of previously defined words. Let us start with an example:

    : %, 100 */; Calculates percentages, as in value, 5 % print, where value is any number

Once this line has been executed, % is a defined word just like any other. With type checking we will need to include code to check the types of the operands. The word can then be extended to work on other variables such as floats, money, etc.

Examples

Let us get to work with some examples.

    A, B + print; Adds A to B and prints the result, as in FORTH

A and B can be integers, floating point numbers, or even strings, because of the type identification and overloading.

    A, B+C:= ; Add A to B and store in C

Note that := is the assignment operator; the colon is only special when it stands alone.

Program Structures

All programs consist of a number of sections that contain sequences, selections and iterations.

Sections

A section is a piece of code that has only one entry, at the beginning, and only one exit, at the end. A section is guaranteed to terminate.

Sequences

A sequence consists of consecutive words, executed in turn.
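The stack rules described so far (numbers pushed as they are executed, operators returning undefined when an argument is missing or undefined, and the stack dropped at the end of each line) can be sketched in Python. This is only an illustration under stated assumptions, not an implementation of the language: run_line, WORDS and UNDEFINED are hypothetical names, only integers, the four arithmetic words and print are handled, division is taken to be integer division, and dividing by zero is assumed to give undefined.

```python
import operator
import re

UNDEFINED = None  # hypothetical stand-in for the undefined value written # in the text

# Numbers use the caret as the negative sign; words are runs of letters or of symbols.
# Commas, white space and anything after a semi-colon never reach the token list.
TOKEN = re.compile(r"\^?\d+|[A-Za-z]+|[^\sA-Za-z0-9,;^]+")


def pop(stack):
    """Take the top of the stack, or undefined if the stack is empty."""
    return stack.pop() if stack else UNDEFINED


def binary(op):
    """Build a two-argument word that propagates undefined values."""
    def word(stack, output):
        b, a = pop(stack), pop(stack)  # top of stack is the second argument
        if a is UNDEFINED or b is UNDEFINED:
            stack.append(UNDEFINED)
            return
        try:
            stack.append(op(a, b))
        except ZeroDivisionError:
            stack.append(UNDEFINED)  # assumption: dividing by zero yields undefined
    return word


def print_word(stack, output):
    """Record the top of the stack instead of writing to a terminal."""
    output.append(pop(stack))


WORDS = {
    "+": binary(operator.add),
    "-": binary(operator.sub),
    "*": binary(operator.mul),
    "/": binary(operator.floordiv),  # assumption: integer division
    "print": print_word,
}


def run_line(line):
    """Execute one line and return the list of printed values."""
    code = line.split(";", 1)[0]  # a semi-colon ends the line; the rest is comment
    stack, output = [], []        # a fresh stack per line: nothing carries over
    for token in TOKEN.findall(code):
        if token[-1].isdigit():
            stack.append(int(token.replace("^", "-")))
        else:
            WORDS[token](stack, output)
    return output                 # whatever is left on the stack is dropped here


print(run_line("3, 4+ print; Add 3 to 4 and print the result"))  # [7]
print(run_line("+, -, *, /; plus, minus, times, divide"))        # []
```

Because the stack is created inside run_line, no line can depend on values left behind by another, which is the property the text asks for.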

Selections

In this new language, there is no simple if statement; instead there is a multi-way selection using the words ?, ?!, ?# and !? (if, ifnot, ifundefined and notif). Note that !? is equivalent to ?! or ?#.

Iterations

An iteration acts on a section and continues until a test is true. Since all sections are guaranteed to terminate, something extra is required: a time limit is imposed on every iteration. If the time limit expires before the iteration terminates, the section promptly terminates in a defined manner. It is for the programmer to define the manner of this termination.

Co-routines

Co-routines are an important part of this new language. In a normal word definition, control is passed back to the caller when an isolated semicolon is encountered in the definition. In a co-routine, control is passed back prematurely each time another colon is encountered. For example:

    : routine blah blah: blah blah blah: blah blah; a co-routine example

When routine is called the first time it executes blah blah and then returns. When it is called a second time, it executes blah blah blah. On the third call, it executes blah blah. Subsequently it will start again at the beginning. This is not sufficient, as we need a mechanism to start afresh on command. (Yet to be decided.)

Integrated Development Environment

The IDE is an important part of any implementation of a language. Some of the restrictions that have been specified are enforced by the IDE and not by the compiler. Restrictions that are part of the human interface, such as a maximum line length and a maximum paragraph length, are enforced by the IDE. The IDE features the creation of a multi-column layout of selections.

Style

The programming language will be known as Style. A correctly formed program, written in Style and obeying the rules of Style, will be called stylistic. A program that is not correctly styled will be called ugly.
It is legal to write ugly programs, but the first line should make this clear, thus:

    ;ugly

This lets other readers know that the rules of Style have been broken.
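
The calling behaviour described under Co-routines (each call runs the next colon-separated segment, then wraps round to the first) can be sketched in Python. This is a minimal illustration, not the language's mechanism: the class name CoRoutine and the use of Python callables as segments are assumptions made for the example, and no restart facility is shown because the text leaves that undecided.

```python
class CoRoutine:
    """Hypothetical model of a word whose definition holds several segments."""

    def __init__(self, *segments):
        if not segments:
            raise ValueError("a co-routine needs at least one segment")
        self.segments = segments
        self.position = 0  # index of the segment the next call will run

    def __call__(self):
        result = self.segments[self.position]()
        # After the last segment, the next call starts again at the beginning.
        self.position = (self.position + 1) % len(self.segments)
        return result


log = []
routine = CoRoutine(
    lambda: log.append("blah blah"),
    lambda: log.append("blah blah blah"),
    lambda: log.append("blah blah"),
)
for _ in range(4):
    routine()
print(log)  # ['blah blah', 'blah blah blah', 'blah blah', 'blah blah']
```

The fourth call repeats the first segment, matching the statement that the routine subsequently starts again at the beginning.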
Thursday, 6 September, 2007