BASICwebmaster
Subject: Optimizing code parsers
I've been re-visiting my BASIC interpreter the past few days, and I'm starting to realize just how bulky it is.

One area of concern is the number of passes being made over the code. The first examines every single character to denote line endings, and check for logical line extenders/combiners. The second pass is used to check syntax and look for functions, branches, DATA statements, and so on. The third would be another check of the syntax (whilst gathering the various parameters), and then executing it or generating output code.

I've devised a way to eliminate the second pass, by merging the syntax check for each statement with the storage of the statement into an array. However, I'd like to add a preprocessor, and I can't really see any way to do that without adding a pass. Any ideas?

Subject: Re: Optimizing code parsers
Hi Bill,

I haven't seen the source code for your language, so I'll have to make some generalized recommendations.

Of the languages written in LB that I've seen so far, the largest amount of their time is spent manipulating small (often single character) strings. Strings are expensive in terms of CPU cycles. Whenever possible, your code should use numeric variables and numeric literals, preferably whole numbers.

Reducing the number of passes is not always easy and may not be possible, depending on the language being parsed. There are a few true one-pass compilers out there. Look at traditional Pascal for such an example.

Preprocessors can be tricky to optimize. It mostly depends on the type of preprocessor. If you mean defining constants and conditional compilation, you can probably make the preprocessing concurrent with normal parsing. If you're going for something like the C macro preprocessor, it requires at least two passes by itself. Each pass checks for expandable symbols. However, macros can be nested quite deeply, so your C compiler may need to make tens or hundreds of passes, before it even goes to the actual compiler.

If you haven't taken a close look at my VFScript compiler, it may be worthwhile. VFScript doesn't support true user-defined subroutines yet, but it does have comments and logical line extensions, and even comments after _'s, and it does it all in a single pass. Much of the "magic" is done inside one SUB: "VFS.GetToken". Eventually I'll add a second pass where the first pass collects the names of global symbols. However, if I wanted to, I could force the user to declare everything beforehand and I'd be able to keep my parsing down to a single pass, just like Pascal.

By the way, please do not think that VFScript of a highly optimized program. Since I wrote it, I've discovered things about LB that make VFScript's current implementation 2 ot 3 times slower than it could be.

Subject: Re: Optimizing code parsers
Brent wrote:
By the way, please do not think that VFScript of a highly optimized program. Since I wrote it, I've discovered things about LB that make VFScript's current implementation 2 ot 3 times slower than it could be.

I would be very interested in what you've discovered. :)

-Carl

Subject: Re: Optimizing code parsers
Welcome back to the forum, Carl!

Not wanting to threadjack, I posted my reply as a new topic. See here.


Page 1 of 1


Jump to:  
You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum

Full Version
Powered by phpBB © phpBB Group
Design by phpBBXS.Com | Lo-Fi Mod.