A clearer way to parse a token from a ruby string

Question

I'm trying to clean up some code and am looking for ways to do it. The idea is that instead of using regexes in my rules to parse a string, I'd like to use something closer to route syntax, such as "/something/:searchitem/somethingelse", so that a string like "/something/FOUNDIT/somethingelse" gives the result "FOUNDIT".

Here's the example I'm refactoring. Given an input string, say "http://claimid.com/myusername", I want to be able to run it against multiple possible matches and return "myusername" from whichever one matches.

The data driving it might look like this:

PROVIDERS = [
  "http://openid.aol.com/:username",
  "http://:username.myopenid.com",
  "http://claimid.com/:username",
  "http://:username.livejournal.com"]

something_here("http://claimid.com/myusername") # => "myusername"

Is there a good way to match a string like http://claimid.com/myusername against this list and extract the result? Or are there any techniques that make something like this easier? I've been looking through the Rails routing code, since it does something similar, but it's not the easiest code to follow.

Right now I just do it with regexes, but the approach above looks like it would be much easier to read:

PROVIDERS = [
  /http:\/\/openid\.aol\.com\/(\w+)/,
  /http:\/\/(\w+)\.myopenid\.com/,
  /http:\/\/(\w+)\.livejournal\.com/,
  /http:\/\/flickr\.com\/photos\/(\w+)/,
  /http:\/\/technorati\.com\/people\/technorati\/(\w+)/,
  /http:\/\/(\w+)\.wordpress\.com/,
  /http:\/\/(\w+)\.blogspot\.com/,
  /http:\/\/(\w+)\.pip\.verisignlabs\.com/,
  /http:\/\/(\w+)\.myvidoop\.com/,
  /http:\/\/(\w+)\.pip\.verisignlabs\.com/,
  /http:\/\/claimid\.com\/(\w+)/]

url = "http://claimid.com/myusername"
username = PROVIDERS.collect { |provider|
  url[provider, 1]
}.compact.first

+1

ruby regex openid

AdamFortuna May 21 '09 at 4:09 am

4 answers

How about String's include? or index?

url.include? "myuserid"

Or do you need something positional? If so, you could split the url.

OK, third thought: using your input form with :username, build and compile a Regexp for each such string and use Regexp#match to return the MatchData. If you keep pairs of the Regexp and the index of the :username field, you can do it directly.
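
As a rough sketch of that third idea (not code from this answer): each pattern string is compiled once into a Regexp, and Regexp#match returns the MatchData. A named group stands in for the "field index" here. PATTERNS, compile_pattern, and find_username are made-up names, and limiting the username to \w+ is an assumption.

```ruby
# Hypothetical pattern list in the question's ":username" form.
PATTERNS = [
  "http://openid.aol.com/:username",
  "http://:username.myopenid.com",
  "http://claimid.com/:username"
]

# Compile one pattern string into an anchored Regexp with a named group.
def compile_pattern(pattern)
  before, after = pattern.split(":username", 2)
  Regexp.new('\A' + Regexp.escape(before) + '(?<username>\w+)' +
             Regexp.escape(after.to_s) + '\z')
end

# Compile everything once, up front.
COMPILED = PATTERNS.map { |pattern| compile_pattern(pattern) }

# Return the captured username from the first pattern that matches, or nil.
def find_username(url)
  COMPILED.each do |rx|
    m = rx.match(url)
    return m[:username] if m
  end
  nil
end

puts find_username("http://claimid.com/myusername")
```

Because the patterns are anchored, a URL with extra path segments or a trailing slash would not match in this sketch; whether that is desirable depends on the inputs.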

+2

Charlie Martin May 21 '09 at 4:21 am

I still think regex is the solution here. However, you need to write code that generates a regex from a route-like string. Sample code:

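class Router
    def initialize(routing_word)
        # Placeholder names such as ":username", in order of appearance
        @routes = routing_word.scan(/:\w+/)
        # Escape regex metacharacters without mutating the argument, then
        # turn each placeholder into a capture group. "/" needs no escaping
        # when the regex is built from a string.
        pattern = Regexp.escape(routing_word).gsub(/:\w+/) { '(\w+)' }
        @regex = Regexp.new('^' + pattern + '$')
    end

    # Returns a hash of placeholder => value, or nil if the url does not match
    def match(url)
        matches = url.match(@regex)
        return nil unless matches
        h = {}
        @routes.zip(matches.captures).each {|k,v| h[k] = v}
        return h
    end
end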

r = Router.new('|:as|:sa')
puts r.match('|a|b').map {|k,v| "#{k} => #{v}\n"}

Use one Router per route string. It returns a hash that maps the placeholder names in the route (such as :username) to the actual URL components.

To recognize a given URL, go through all the routers and find the one that accepts it.

class OpenIDRoutes
    def initialize()
        # Instance variable, so that route() can see the list
        @routes = [
           "http://openid.aol.com/:username/",
           "http://:username.myopenid.com/",
           "http://:username.livejournal.com/",
           "http://flickr.com/photos/:username/",
           "http://technorati.com/people/technorati/:username/",
           "http://:username.wordpress.com/",
           "http://:username.blogspot.com/",
           "http://:username.pip.verisignlabs.com/",
           "http://:username.myvidoop.com/",
           "http://:username.pip.verisignlabs.com/",
           "http://claimid.com/:username/"
        ].map {|x| Router.new x}
    end

    # Given a URL, find the first route it fits; returns nil if none match
    def route(url)
        # The routes above all end in "/", so add one if the URL lacks it
        url += '/' unless url.end_with?('/')
        for r in @routes
            res = r.match url
            return res if res
        end
        nil
    end
end

r = OpenIDRoutes.new
puts r.route("http://claimid.com/myusername")

I think this is a good, simple implementation of most of the routing behavior.

+1

Elazar Leibovich May 21 '09 at 4:53

It's a bit URI-specific, but the standard library has URI.split():

require 'uri'

URI.split("http://claimid.com/myusername")[5] # => "/myusername"

You might be able to make use of that somehow.
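
One way this might be put to use, as a sketch only: URI.parse exposes the host and path separately, so a provider could be described by its host plus whether the username sits in the path or in the subdomain. PATH_HOSTS, SUBDOMAIN_HOSTS, and username_from are hypothetical names for illustration, not part of the URI library, and the host lists are shortened.

```ruby
require 'uri'

# Hypothetical tables: hosts where the username is the first path segment,
# and host suffixes where the username is the subdomain.
PATH_HOSTS = ["claimid.com", "openid.aol.com"]
SUBDOMAIN_HOSTS = ["myopenid.com", "livejournal.com"]

def username_from(url)
  uri = URI.parse(url)
  host = uri.host.to_s
  if PATH_HOSTS.include?(host)
    # "/myusername" or "/myusername/" -> "myusername"
    uri.path.split("/").reject(&:empty?).first
  else
    suffix = SUBDOMAIN_HOSTS.find { |h| host.end_with?("." + h) }
    # "bob.myopenid.com" -> "bob"; nil when no known suffix applies
    suffix && host[0...-(suffix.length + 1)]
  end
end

puts username_from("http://claimid.com/myusername")
```

This trades the single pattern list for two lookup tables, so it only fits providers that follow one of those two shapes.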

CJ

+1

CJ. May 21 '09 at 14:06

tomafro · Accepted Answer · 2009-05-21T10:31:06+0000

I think the best approach is to generate the regexes, as Elazar suggested earlier. If you only need to match one field (:username), then something like this will work:

PROVIDERS = [
   "http://openid.aol.com/:username/",
   "http://:username.myopenid.com/",
   "http://:username.livejournal.com/",
   "http://flickr.com/photos/:username/",
   "http://technorati.com/people/technorati/:username/",
   "http://:username.wordpress.com/",
   "http://:username.blogspot.com/",
   "http://:username.pip.verisignlabs.com/",
   "http://:username.myvidoop.com/",
   "http://:username.pip.verisignlabs.com/",
   "http://claimid.com/:username/"
]

MATCHERS = PROVIDERS.collect do |provider|
  parts = provider.split(":username")
  Regexp.new(Regexp.escape(parts[0]) + '(.*)' + Regexp.escape(parts[1] || ""))
end

def extract_username(url)
  MATCHERS.collect {|rx| url[rx, 1]}.compact.first
end

It's very similar to your own code, only the list of providers is much cleaner, which makes it easier to maintain and to add new providers as needed.
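
One caveat worth checking: every entry in PROVIDERS above ends in "/", so a URL without a trailing slash, such as the "http://claimid.com/myusername" from the question, does not appear to match any of the generated regexes. A possible variant is sketched below. It assumes usernames consist only of \w characters, and SAMPLE_PROVIDERS, build_matcher, and username_for are illustrative names rather than part of the answer.

```ruby
# Shortened, hypothetical provider list in the same form as above.
SAMPLE_PROVIDERS = [
  "http://:username.myopenid.com/",
  "http://flickr.com/photos/:username/",
  "http://claimid.com/:username/"
]

# Build an anchored regex that accepts the URL with or without a trailing "/".
def build_matcher(provider)
  before, after = provider.chomp("/").split(":username", 2)
  Regexp.new('\A' + Regexp.escape(before) + '(\w+)' +
             Regexp.escape(after.to_s) + '/?\z')
end

SAMPLE_MATCHERS = SAMPLE_PROVIDERS.map { |provider| build_matcher(provider) }

# Return the username captured by the first matching provider, or nil.
def username_for(url)
  SAMPLE_MATCHERS.each do |rx|
    name = url[rx, 1]
    return name if name
  end
  nil
end

puts username_for("http://claimid.com/myusername")
```

Using (\w+) with anchors instead of a greedy (.*) also keeps the capture from swallowing extra path segments, at the cost of rejecting usernames that contain characters such as "-" or ".".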
