How to turn conditional chaining into faster ugly code?
I have 9 different grammars. One of them will be loaded depending on what the first txt line is in the file it is parsing.
I was thinking about outputting the lexer / parser replicator in sep. classes and then creating them as soon as I get the match - not sure if this will slow me down or not. I think some benchmarking is ok.
Indeed, speed is definitely my goal here, but I know this is ugly code.
The code now looks something like this:
sin.mark(0)
site = findsite(txt)
sin.reset()
if ( site == "site1") {
loadlexer1;
loadparser1;
} else if (site == "site2") {
loadlexer2;
loadparser2;
}
.................
} else if (site == "site8") {
loadparser8;
loadparser8;
}
findsite(txt) {
...................
if line.indexOf("site1-identifier") {
site = site1;
} else if(line.indexOf("site2-identifier") {
site = site2;
} else if(line.indexOf("site3-identifier") {
site = site3;
}
.........................
} else if(line.indexOf("site8-identifier") {
site = site8;
}
}
some clarifications
1) yes, I do have 9 different grammars that I built with antlr so that they ALL have their own lexer / parser objs.
2) yes, at the moment we are comparing strings and obivously which will be replaced with some kind of whole display. I have also considered attaching site IDs to a single regex, however I don't believe that would speed things up.
3) yes, it's pseudocode, so I wouldn't get too picky about the semantics here.
4) kdgregory correctly notes that I cannot create a single instance of a lexer / parser pair
I like the idea of the hash to make the code look a little better, but I don't think it will speed me up to anyone.
a source to share
The standard approach is to use a map to connect the key strings to a lexer that will handle them:
Map<String,Lexer> lexerMap = new HashMap<String,Lexer>();
lexerMap.put("source1", new Lexer01());
lexerMap.put("source2", new Lexer02());
// and so on
Once you get the string that identifies the lexer to use, you should get it from the map like this:
String grammarId = // read it from a file, whatever
Lexer myLexer = lexerMap.get(grammarId);
However, there are a few quirks in your example code. First, the calls to indexOf () indicate that you don't have a separate line and Map won't look inside the line. So you need to somehow extract the actual key from whatever string you read.
Second, lexers and parsers generally maintain state, so you can't create one instance and reuse it. This means you need to create a factory class and store it in the map (this is the Abstract factory pattern).
If you expect to have many different lexers / parsers, then it makes sense to use a map based approach. For a small number, the if-else chaining is probably best suited, properly encapsulated (that's the factory method pattern).
a source to share
Using polymorphism is almost guaranteed to be faster than string manipulation and will be checked for correctness at compile time. Is the site
string valid ? If so, FindSite should be called GetSiteName. I would expect FindSite to return an object site
that knows the corresponding lexer and parser.
Another speed issue is encoding speed. It would be better to have your different lexers and parsers in separate classes (perhaps with common functionality in another). This will make your code a little smaller and make it much easier for someone to understand.
a source to share
Replace conditional with polymorphism
For half a measure, for findsite (), you can simply tweak the HashMap to get from the site id to the site. An alternative cleanup would be to simply return the site string, thus:
String findsite(txt) {
...................
if line.indexOf("site1-identifier")
return site1;
if(line.indexOf("site2-identifier")
return site2;
if(line.indexOf("site3-identifier")
return site3;
...
}
Using indexOf () in this way is not very expressive; I would use equals () or contains ().
a source to share
I was thinking about outputting the lexer / parser replicator in sep. classes and then creating them as soon as I get a match
It looks like you already have an answer. This would create more flexible, but not necessary faster code.
I assume some benchmarking is ok
Yes, measure both approaches and make an informed decision. My guess is how you've had enough of it already.
Maybe if you need kilometric you can refactor it in different functions using the extract method .
The most important thing is to first create a solution that gets the job done, even if it is slow, and once you get it working, analyze it and identify the points at which performance can be improved. Remember the "Optimization Rules"
a source to share
Can have an ids map against sites and then just iterate over the map entries.
// define this as a static somewhere ... build from a properties file
Map<String,String> m = new HashMap<String,String>(){{
put("site1-identifier","site2");
put("site2-identifier","site2");
}}
// in your method
for(Map.Entry<String,String> entry : m.entries()){
if( line.contains(entry.getKey())){
return line.getValue();
}
}
cleaner: yes faster: dunno ... should be fast enough
a source to share
Can use reflection maybe
char site = line.charAt(4);
Method lexerMethod = this.getClass().getMethod( "loadLexer" + site, *parameters types here*)
Method parserMethod = this.getClass().getMethod( "loadparser" + site, *parameters types here*)
lexerMethod.invoke(this, *parameters here*);
parserMethod.invoke(this, *parameters here*);
a source to share
I don't know about Java, but some languages allow switching to strings.
switch(site)
{
case "site1": loadlexer1; loadparser1; break;
case "site2": loadlexer2; loadparser2; break;
...
}
For the second bit, use a regex to extract the identifier and include it. You might be better off using enum
.
a source to share