How to turn conditional chaining into faster ugly code?

I have 9 different grammars. One of them will be loaded depending on what the first txt line is in the file it is parsing.

I was thinking about outputting the lexer / parser replicator in sep. classes and then creating them as soon as I get the match - not sure if this will slow me down or not. I think some benchmarking is ok.

Indeed, speed is definitely my goal here, but I know this is ugly code.

The code now looks something like this:

sin.mark(0)
site = findsite(txt)
sin.reset()

if ( site == "site1") {
   loadlexer1;
   loadparser1;
} else if (site == "site2") {
   loadlexer2;
   loadparser2;
}
.................
} else if (site == "site8") {
   loadparser8;
   loadparser8;
}

findsite(txt) {
  ...................
  if line.indexOf("site1-identifier") {
    site = site1;
  } else if(line.indexOf("site2-identifier") {
    site = site2;
  } else if(line.indexOf("site3-identifier") {
    site = site3;
  }
  .........................
  } else if(line.indexOf("site8-identifier") {
    site = site8;
  }
}

      

some clarifications

1) yes, I do have 9 different grammars that I built with antlr so that they ALL have their own lexer / parser objs.

2) yes, at the moment we are comparing strings and obivously which will be replaced with some kind of whole display. I have also considered attaching site IDs to a single regex, however I don't believe that would speed things up.

3) yes, it's pseudocode, so I wouldn't get too picky about the semantics here.

4) kdgregory correctly notes that I cannot create a single instance of a lexer / parser pair

I like the idea of ​​the hash to make the code look a little better, but I don't think it will speed me up to anyone.

0


a source to share


11 replies


The standard approach is to use a map to connect the key strings to a lexer that will handle them:

Map<String,Lexer> lexerMap = new HashMap<String,Lexer>();
lexerMap.put("source1", new Lexer01());
lexerMap.put("source2", new Lexer02());
// and so on

      

Once you get the string that identifies the lexer to use, you should get it from the map like this:

String grammarId = // read it from a file, whatever
Lexer myLexer = lexerMap.get(grammarId);

      



However, there are a few quirks in your example code. First, the calls to indexOf () indicate that you don't have a separate line and Map won't look inside the line. So you need to somehow extract the actual key from whatever string you read.

Second, lexers and parsers generally maintain state, so you can't create one instance and reuse it. This means you need to create a factory class and store it in the map (this is the Abstract factory pattern).

If you expect to have many different lexers / parsers, then it makes sense to use a map based approach. For a small number, the if-else chaining is probably best suited, properly encapsulated (that's the factory method pattern).

+7


a source


Using polymorphism is almost guaranteed to be faster than string manipulation and will be checked for correctness at compile time. Is the site

string valid ? If so, FindSite should be called GetSiteName. I would expect FindSite to return an object site

that knows the corresponding lexer and parser.



Another speed issue is encoding speed. It would be better to have your different lexers and parsers in separate classes (perhaps with common functionality in another). This will make your code a little smaller and make it much easier for someone to understand.

+2


a source


Sort of:

Map <String, LexerParserTuple> lptmap = new HashMap <String, LexerParserTuple> ();
lpt = lptmap.get (site)
lpt.loadlexer ()
lpt.loadparser ()

coupled with some regex magic rather than string.indexOf () to grab site names should clean up your code significantly.

+1


a source


Replace conditional with polymorphism

For half a measure, for findsite (), you can simply tweak the HashMap to get from the site id to the site. An alternative cleanup would be to simply return the site string, thus:

String findsite(txt) {
  ...................
  if line.indexOf("site1-identifier") 
    return site1;
  if(line.indexOf("site2-identifier")
    return  site2;
  if(line.indexOf("site3-identifier")
    return  site3;
...
}

      

Using indexOf () in this way is not very expressive; I would use equals () or contains ().

+1


a source


I was thinking about outputting the lexer / parser replicator in sep. classes and then creating them as soon as I get a match

It looks like you already have an answer. This would create more flexible, but not necessary faster code.

I assume some benchmarking is ok

Yes, measure both approaches and make an informed decision. My guess is how you've had enough of it already.

Maybe if you need kilometric you can refactor it in different functions using the extract method .

The most important thing is to first create a solution that gets the job done, even if it is slow, and once you get it working, analyze it and identify the points at which performance can be improved. Remember the "Optimization Rules"

+1


a source


Let's assume your code is inefficient.

Will it take longer than (say) 1% of the time to actually parse the input?

If not, you have a great "fish to fry".

+1


a source


I would change the type of findsite to return the site type (superclass) and then use polymorphism ... This should be faster than string manipulation ...

Need separate lexers?

0


a source


Use the sitemap to customize your site to load your strategy structure. Then a simple search based on "site" is required and you follow the appropriate strategy. The same can be done for findSite ().

0


a source


Can have an ids map against sites and then just iterate over the map entries.

// define this as a static somewhere ... build from a properties file
Map<String,String> m = new HashMap<String,String>(){{
    put("site1-identifier","site2");
    put("site2-identifier","site2");
}}

// in your method
for(Map.Entry<String,String> entry : m.entries()){
    if( line.contains(entry.getKey())){
        return line.getValue();
    }
}

      

cleaner: yes faster: dunno ... should be fast enough

0


a source


Can use reflection maybe

char site = line.charAt(4);
Method lexerMethod = this.getClass().getMethod( "loadLexer" + site, *parameters types here*)
Method parserMethod = this.getClass().getMethod( "loadparser" + site, *parameters types here*)

lexerMethod.invoke(this, *parameters here*);
parserMethod.invoke(this, *parameters here*);

      

0


a source


I don't know about Java, but some languages ​​allow switching to strings.

switch(site)
{
    case "site1": loadlexer1; loadparser1; break;
    case "site2": loadlexer2; loadparser2; break;
    ...
}

      

For the second bit, use a regex to extract the identifier and include it. You might be better off using enum

.

0


a source







All Articles