Inspecting data streaming through a C program: boundary cases

I am reading from socket A and writing to socket B on the fly (a proxy, for example). I would like to inspect and possibly modify the data passing through. My question is how to handle edge cases, i.e. where the regex I'm looking for would match across the boundary between two consecutive iterations of reading socket A and writing socket B.

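char buffer[4096];
int socket_A, socket_B;

/* Setting up the connection goes here */

for (;;) {

    /* recv may return fewer bytes than requested, so keep the count */
    ssize_t n = recv(socket_A, buffer, sizeof buffer, 0);
    if (n <= 0)
        break;  /* peer closed the connection, or an error occurred */

    /* Inspect, and possibly modify, the first n bytes of buffer */

    send(socket_B, buffer, (size_t)n, 0);

    /* Oops, the match I was looking for started at the end of buffer,
     * and will continue at the beginning of buffer next iteration :( */

}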

      

0




6 answers


My suggestion is to keep two buffers and rotate between them:

  1. Recv into buffer 1.
  2. Recv into buffer 2.
  3. Process both, with buffer 1 before buffer 2.
  4. Send buffer 1.
  5. Recv into buffer 1.
  6. Process both, but with buffer 2 before buffer 1.
  7. Send buffer 2.
  8. Go to step 2.


Or something like that?
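To make the "process both buffers" step a bit more concrete, here is a rough sketch that only checks the seam between the older and the newer buffer. It assumes a fixed string rather than a real regex, and the function name find_straddling and the 64-byte pattern limit are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Look for pat (length m) straddling the seam between prev and cur.
 * Returns the offset inside prev where such a match starts, or -1. */
long find_straddling(const char *prev, size_t prev_len,
                     const char *cur, size_t cur_len,
                     const char *pat, size_t m)
{
    assert(m >= 2 && m <= 64);        /* arbitrary limit for this sketch */
    char seam[2 * 63];                /* at most m-1 bytes from each side */
    size_t left  = prev_len < m - 1 ? prev_len : m - 1;
    size_t right = cur_len  < m - 1 ? cur_len  : m - 1;

    /* Glue the tail of the older buffer to the head of the newer one. */
    memcpy(seam, prev + prev_len - left, left);
    memcpy(seam + left, cur, right);

    /* The window is shorter than 2*m-1 bytes, so any match found in it
     * must start in prev and end in cur. */
    for (size_t i = 0; i + m <= left + right; i++)
        if (memcmp(seam + i, pat, m) == 0)
            return (long)(prev_len - left + i);
    return -1;
}
```

Matches that lie entirely inside one buffer would still be found by the normal per-buffer scan; this only covers the case the question is worried about.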

+1




Assuming the maximum length M of a possible regex match is bounded (or you can live with an arbitrary value, or just use the entire buffer size), you can handle this by not forwarding the full buffer but holding back the last M-1 bytes. In the next iteration, append the newly received data after those M-1 bytes and apply the regular expression to the combined data.
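A minimal sketch of that hold-back idea, under some loud assumptions: the "regex" is a fixed string, every match is simply overwritten with '*' so the length does not change, and the names carry_filter, carry_feed, carry_flush, CHUNK and MAX_PAT are all invented here.

```c
#include <assert.h>
#include <string.h>

#define CHUNK   4096   /* largest recv() size fed in one call (assumed) */
#define MAX_PAT 64     /* largest supported pattern length M (assumed) */

struct carry_filter {
    const char *pat;                  /* fixed string to look for */
    size_t m;                         /* its length, the M of the answer */
    char work[MAX_PAT - 1 + CHUNK];   /* held-back tail followed by new data */
    size_t held;                      /* bytes held back from earlier calls */
};

/* Append n new bytes, overwrite every complete match with '*', and copy the
 * bytes that are now safe to forward into out (which must have room for
 * MAX_PAT - 1 + CHUNK bytes). Returns how many bytes were copied. */
size_t carry_feed(struct carry_filter *f, const char *in, size_t n, char *out)
{
    assert(f->m >= 1 && f->m <= MAX_PAT && n <= CHUNK);
    memcpy(f->work + f->held, in, n);
    size_t total = f->held + n;
    size_t done = 0;                  /* end of the last match found */

    for (size_t i = 0; i + f->m <= total; ) {
        if (memcmp(f->work + i, f->pat, f->m) == 0) {
            memset(f->work + i, '*', f->m);
            i += f->m;
            done = i;
        } else {
            i++;
        }
    }

    /* Hold back at most M-1 trailing bytes, and never bytes already matched. */
    size_t ready = total > f->m - 1 ? total - (f->m - 1) : 0;
    if (ready < done)
        ready = done;
    memcpy(out, f->work, ready);
    memmove(f->work, f->work + ready, total - ready);
    f->held = total - ready;
    return ready;
}

/* At end of stream (or on a timeout), release whatever is still held. */
size_t carry_flush(struct carry_filter *f, char *out)
{
    size_t n = f->held;
    memcpy(out, f->work, n);
    f->held = 0;
    return n;
}
```

In the proxy loop you would call carry_feed after each recv and send only the bytes it returns. A real regex engine would replace the memcmp scan, but the hold-back arithmetic stays the same.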



If you know the format of the data being transferred (HTTP, for example), you should be able to parse the content to know when you have reached the end of a message, at which point you should send the remaining bytes that you have held back. If you don't know the format, then you will need to implement a timeout on recv so that you don't hold onto the end of a message for too long. What counts as too long is something you'll have to decide for yourself.
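One possible way to get such a timeout on a POSIX system is the SO_RCVTIMEO socket option. The sketch below assumes that approach; the helper names set_recv_timeout and recv_or_timeout and the RECV_TIMED_OUT return code are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

#define RECV_TIMED_OUT (-2)   /* hypothetical code meaning "nothing arrived in time" */

/* Make blocking recv() calls on fd give up after ms milliseconds.
 * A value of 0 means "no timeout" for this option. */
int set_recv_timeout(int fd, long ms)
{
    assert(ms >= 0);
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Returns bytes read, 0 on orderly shutdown, -1 on error,
 * or RECV_TIMED_OUT if the timeout expired with no data. */
ssize_t recv_or_timeout(int fd, void *buf, size_t cap)
{
    ssize_t n = recv(fd, buf, cap, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return RECV_TIMED_OUT;
    return n;
}
```

When recv_or_timeout reports a timeout, the proxy would forward any held-back bytes as they are and carry on.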

+1




In the sense you describe (and in every sense, for TCP), sockets are streams. Your question implies that your data has some structure. Therefore, you should do something like the following:

  • Buffer (hold) incoming data until a boundary is reached. The boundary can be an end of line, an end of block, or whatever you know your regex will not match across.
  • When the "record" is ready, process it and place the results in the output buffer.
  • Write anything accumulated in the output buffer.

This works for most cases. If you have one of the rare cases where there is really no "record", you need to build some kind of state machine (DFA). By this I mean that you must be able to accumulate data until either a) it can no longer match your regex, or b) it is a complete match.
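For a fixed string, one way to get such a state machine is the Knuth-Morris-Pratt automaton. That is a technique swapped in here for illustration, not something the answer names, and the names kmp, kmp_init, kmp_feed and KMP_MAX are invented. The point of the sketch is that the matcher state survives between chunks, so it does not matter where recv happens to split the data.

```c
#include <assert.h>
#include <stddef.h>

#define KMP_MAX 64   /* arbitrary pattern length limit for this sketch */

struct kmp {
    const char *pat;
    size_t m;
    size_t fail[KMP_MAX];   /* classic KMP failure table */
    size_t state;           /* pattern bytes matched so far; persists across chunks */
};

void kmp_init(struct kmp *k, const char *pat, size_t m)
{
    assert(m >= 1 && m <= KMP_MAX);
    k->pat = pat;
    k->m = m;
    k->state = 0;
    k->fail[0] = 0;
    for (size_t i = 1, j = 0; i < m; i++) {
        while (j > 0 && pat[i] != pat[j])
            j = k->fail[j - 1];
        if (pat[i] == pat[j])
            j++;
        k->fail[i] = j;
    }
}

/* Feed one chunk; returns how many matches end inside this chunk. */
size_t kmp_feed(struct kmp *k, const char *data, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        while (k->state > 0 && data[i] != k->pat[k->state])
            k->state = k->fail[k->state - 1];
        if (data[i] == k->pat[k->state])
            k->state++;
        if (k->state == k->m) {           /* case b): a complete match */
            hits++;
            k->state = k->fail[k->m - 1];
        }
    }
    return hits;
}
```

After each call, k->state is the number of trailing bytes that could still turn into a match, so those are the only bytes that need to be held back. Everything before them falls under case a) and can be forwarded.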

EDIT: If you are matching fixed strings instead of a true regex, you should be able to use the Boyer-Moore algorithm, which can actually run in sub-linear time (by skipping characters). If you do it right, as you advance over the input you can move previously seen data to the output buffer as you go, reducing latency and significantly increasing throughput.
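As a rough illustration of the skipping, here is the Horspool simplification of Boyer-Moore (bad-character table only, not the full algorithm) for a single in-memory chunk. The function name horspool_find is made up.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool search. Returns the offset of the first match
 * of pat (length m) in hay (length n), or -1 if there is none. */
long horspool_find(const char *hay, size_t n, const char *pat, size_t m)
{
    assert(m >= 1);
    size_t skip[UCHAR_MAX + 1];

    /* A byte that does not occur in the pattern lets us jump m positions. */
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        skip[c] = m;
    /* Otherwise jump so its last occurrence (excluding the final pattern
     * byte) lines up with the text. */
    for (size_t i = 0; i + 1 < m; i++)
        skip[(unsigned char)pat[i]] = m - 1 - i;

    size_t pos = 0;
    while (pos + m <= n) {
        size_t j = m;
        while (j > 0 && hay[pos + j - 1] == pat[j - 1])
            j--;                       /* compare right to left */
        if (j == 0)
            return (long)pos;
        pos += skip[(unsigned char)hay[pos + m - 1]];
    }
    return -1;
}
```

On a stream, the last m-1 bytes of each chunk would still have to be kept and searched again together with the next chunk, since a match may begin there.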

+1




You need to know and/or tell us something about your regex.

Depending on the regex, you may need to buffer more than you are buffering now.

The worst case could be something like a regex that says "find everything from the beginning up to the first occurrence of the word 'dog', and replace it with something else". With a regex like that, you need to buffer (without forwarding) everything from the beginning until the first occurrence of the word "dog". That may never happen, i.e. the amount you have to buffer can be unbounded.

+1




Basically, the problem with your code is that the recv/send loop operates at a lower network level than your modifications. How you solve this depends on what modifications you are making, but it probably involves buffering data until all local modifications can be made.

EDIT: I don't know of any regex library that can filter a stream like this. How hard it is will depend on your regex and the protocol it is filtering.

0




One option is to use a poll(2) strategy with non-blocking sockets. On a read event, grab a buffer from the socket, push it onto the incoming queue, and call the lexer/parser/matcher, which assembles the buffers into a stream and then pushes chunks onto the output queue. On a write event, take a chunk from the output queue, if there is one, and write it to the socket. It sounds pretty complicated, but it really isn't once you get used to the inverted control model.
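A stripped-down sketch of one turn of such a loop, with a single byte queue standing in for both the incoming and outgoing queues and a comment marking where the matcher would sit. The names outq, pump_once and QCAP are invented for illustration, and the sockets are assumed to have been set non-blocking during setup.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define QCAP 8192   /* arbitrary queue capacity for this sketch */

struct outq { char data[QCAP]; size_t len; };

/* One turn of the event loop. Returns 0 to keep going, -1 on EOF or error. */
int pump_once(int in_fd, int out_fd, struct outq *q, int timeout_ms)
{
    assert(q->len <= QCAP);
    struct pollfd fds[2] = {
        /* only ask for read events while there is room to store data */
        { .fd = in_fd,  .events = q->len < QCAP ? POLLIN : 0 },
        /* only ask for write events while there is something to send */
        { .fd = out_fd, .events = q->len > 0 ? POLLOUT : 0 },
    };
    if (poll(fds, 2, timeout_ms) < 0)
        return -1;

    if (fds[0].revents & POLLIN) {
        ssize_t n = recv(in_fd, q->data + q->len, QCAP - q->len, 0);
        if (n == 0)
            return -1;
        if (n < 0)
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
        /* the lexer/parser/matcher would inspect the new bytes here */
        q->len += (size_t)n;
    }
    if (fds[1].revents & POLLOUT) {
        ssize_t n = send(out_fd, q->data, q->len, 0);
        if (n < 0)
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
        /* drop what was written, keep the rest for the next write event */
        memmove(q->data, q->data + n, q->len - (size_t)n);
        q->len -= (size_t)n;
    }
    return 0;
}
```

The caller would simply run pump_once in a loop until it returns -1. A real proxy would keep held-back bytes out of the part of the queue that is eligible for sending.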

0

