What's the easiest way to parse a string in C?
I need to parse this line in C:
XFR 3 NS 207.46.106.118:1863 0 207.46.104.20:1863\r\n
and be able to extract the part 207.46.106.118 and the part 1863 (the first IP address and its port).
I know I can go character by character and eventually find my way through it, but what is the easiest way to get this information, given that the IP address in the string might have a different length (fewer digits)?
You can use sscanf() from the C standard library. Here's an example of how to get the IP address and port as strings, assuming the part before the address is constant:
#include <stdio.h>

int main(void)
{
    const char *input = "XFR 3 NS 207.46.106.118:1863 0 207.46.104.20:1863\r\n";
    const char *format = "XFR 3 NS %15[0-9.]:%5[0-9]";
    char ip[16] = { 0 };  // IPv4 addresses have max length 15
    char port[6] = { 0 }; // port numbers are 16-bit, i.e. 5 digits max

    if (sscanf(input, format, ip, port) != 2)
        puts("parsing failed");
    else
        printf("ip = %s\nport = %s\n", ip, port);

    return 0;
}
The important parts of the format string are the scan patterns %15[0-9.] and %5[0-9], which match a string of at most 15 characters consisting of digits or dots (so the IP address is not checked for validity) and a string of at most 5 digits, respectively (which means invalid port numbers above 2^16 - 1 will slip through).
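If the range checks matter, one possible variation is to read the fields as numbers and validate them afterwards. This is only a sketch under the same assumption that the line starts with "XFR 3 NS"; the helper name parse_xfr_checked is made up for illustration. It is still lenient in places: %u accepts leading whitespace and a sign, and nothing after the port is inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses "XFR 3 NS a.b.c.d:port" and range-checks
   each numeric field. Returns 1 on success, 0 on failure. */
int parse_xfr_checked(const char *line, unsigned octets[4], unsigned *port)
{
    assert(line != NULL && octets != NULL && port != NULL);

    /* %3u reads at most 3 characters per octet, %5u at most 5 for the port */
    if (sscanf(line, "XFR 3 NS %3u.%3u.%3u.%3u:%5u",
               &octets[0], &octets[1], &octets[2], &octets[3], port) != 5)
        return 0;

    /* each octet must fit in 8 bits */
    for (int i = 0; i < 4; i++)
        if (octets[i] > 255)
            return 0;

    /* the port must fit in 16 bits */
    return *port <= 65535;
}
```

Calling it with the line from the question should fill octets with 207, 46, 106, 118 and set the port to 1863, while an octet such as 999 or a port such as 70000 makes it return 0.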
It depends on what defines the format of the line. In this case, it may be as simple as tokenizing the string and looking through the tokens for the one you want. Just use strtok and split on spaces to get 207.46.106.118:1863, then tokenize that again (or just scan for the : manually) to get the components you want.
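A rough sketch of the strtok approach might look like this. The helper name parse_with_strtok, the 256-byte scratch buffer, and the choice to take the first token containing a ':' are all assumptions made for illustration. Note that strtok modifies its input and keeps hidden state, so the sketch works on a copy and is not safe to call from several threads at once.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the first space-separated token containing ':'
   and copies the two halves into ip and port.
   Returns 1 on success, 0 on failure. */
int parse_with_strtok(const char *line, char *ip, size_t ip_size,
                      char *port, size_t port_size)
{
    char buf[256]; /* assumed upper bound on line length */

    assert(line != NULL && ip != NULL && port != NULL);

    /* strtok writes into its input, so work on a private copy */
    if (strlen(line) >= sizeof buf)
        return 0;
    strcpy(buf, line);

    for (char *tok = strtok(buf, " \r\n"); tok != NULL;
         tok = strtok(NULL, " \r\n")) {
        char *colon = strchr(tok, ':');
        if (colon == NULL)
            continue;          /* not an address:port token */

        *colon = '\0';         /* split the token in place */
        if (strlen(tok) >= ip_size || strlen(colon + 1) >= port_size)
            return 0;          /* would not fit in the output buffers */

        strcpy(ip, tok);
        strcpy(port, colon + 1);
        return 1;
    }
    return 0;                  /* no token with ':' found */
}
```

With char ip[16] and char port[6] as output buffers, the line from the question should yield "207.46.106.118" and "1863".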
Loop forward until you reach the first '.', then loop backward until you find a ' '. From there, loop forward until you find ':', building substrings every time you meet a '.' or ':'. You can check the number of substrings and their lengths as a simple error check. Then loop until you find the next ' ' and you have the 1863 part.
This will be reliable as long as the beginning of the line does not vary much. It is also very easy. You can make it even easier if the string always starts with "XFR 3 NS".
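The scan described above could be sketched roughly as follows. The helper name parse_by_scanning is hypothetical, and instead of building a substring per octet this version simply counts the dots as its error check, which is a simplification of the idea rather than a literal transcription.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locates the first '.', walks back to the start of the
   address, then forward to ':' and on to the end of the port.
   Returns 1 on success, 0 on failure. */
int parse_by_scanning(const char *line, char *ip, size_t ip_size,
                      char *port, size_t port_size)
{
    assert(line != NULL && ip != NULL && port != NULL);

    const char *dot = strchr(line, '.');
    if (dot == NULL)
        return 0;

    /* walk backward to the character after the preceding space */
    const char *start = dot;
    while (start > line && start[-1] != ' ')
        start--;

    /* walk forward to ':' and count the dots on the way */
    const char *p = start;
    int dots = 0;
    while (*p != '\0' && *p != ':' && *p != ' ') {
        if (*p == '.')
            dots++;
        p++;
    }
    if (*p != ':' || dots != 3)
        return 0;              /* simple sanity check on the address shape */

    size_t ip_len = (size_t)(p - start);
    const char *port_start = p + 1;
    size_t port_len = strcspn(port_start, " \r\n"); /* port ends at whitespace */

    if (ip_len == 0 || ip_len >= ip_size ||
        port_len == 0 || port_len >= port_size)
        return 0;

    memcpy(ip, start, ip_len);
    ip[ip_len] = '\0';
    memcpy(port, port_start, port_len);
    port[port_len] = '\0';
    return 1;
}
```

As with the other approaches, this only checks the shape of the address (three dots before the colon), not whether each part is a valid number.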
This may be overkill, since you said you didn't want to use a regex library, but the re2c program will give you regex matching without a library: it generates the DFA (deterministic finite-state machine) for the regex as C code. The regexes are specified in comments embedded in the C code.
And what seems like overkill now may turn out to be a relief later if you have to parse the rest of the line; it's much easier to change a few regexes to tweak or add new syntax than it is to change a bunch of custom tokenization code. It also makes the structure of what you are parsing much clearer in your code.