A new (NAT) firewall appliance was recently installed at $WORK. Since then, I’m getting many network timeouts and interruptions, especially for operations that require the server to work for a while without sending a response (svn update, rsync, etc.). Inbound SSH sessions over VPN also time out frequently.
That suggests I need to lower the TCP (and SSH) keepalive time on the servers in question in order to reduce these errors.
But what is the appropriate value I should use?
Assuming I have machines on both sides of the firewall between which I can make a connection, is there a way to measure what the time limit on TCP connections might be for this firewall?
In theory, I would send a packet with gradually increasing intervals until the connection is lost. Any tools that might help (free or open source would be best, but I’m open to other suggestions)?
The appliance is not under my control, so I can’t just get the value, though I am attempting to ask what it currently is and if I can get it increased.
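To make the idea concrete, here is a rough Python sketch of the probe I have in mind. It assumes I can run a small echo service on the far side of the firewall. The function names, the port, and the interval list are all made up for illustration, and OS-level keepalive is left off (the default) so that nothing but the probe itself touches the connection.

```python
import socket
import threading
import time


def start_echo_server(bind_addr="0.0.0.0", port=0):
    """Run on the far side: echo back whatever each client sends.

    Returns the port actually bound (port=0 lets the OS pick one).
    """
    listener = socket.create_server((bind_addr, port))

    def handle(conn):
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:
                    break  # client closed the connection
                conn.sendall(data)

    def accept_loop():
        while True:
            conn, _ = listener.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()[1]


def find_idle_limit(host, port, idle_seconds, reply_timeout=10.0):
    """Probe one connection with increasingly long silences.

    Returns the first idle interval (in seconds) after which the echo
    no longer came back, or None if every interval survived.
    """
    with socket.create_connection((host, port), timeout=reply_timeout) as sock:
        for idle in idle_seconds:
            time.sleep(idle)  # stay completely silent for this interval
            try:
                sock.sendall(b"x")
                if sock.recv(1) != b"x":
                    return idle  # peer closed, or reply was not our byte
            except OSError:
                # Timeout, reset, or broken pipe: the connection is gone.
                return idle
    return None
```

Something like find_idle_limit("server.example", 9000, [60, 120, 300, 600, 1800, 3600]) would then only bracket the limit: it lies somewhere between the last interval that survived and the first one that failed. The echo server runs in daemon threads, so the script that starts it has to stay alive by some other means.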
I’m thinking that you just need to connect from one machine to the other while running a packet capture on one of the machines. Make an FTP, HTTP, SSH, etc. session and just let it sit there until it times out.
I’m not sure what you mean when you say “In theory, I would send a packet with gradually increasing intervals until the connection is lost”, but I don’t think you need to do anything other than make a connection, capture the traffic, and let it sit until it times out. Timeouts occur on idle sessions, and if you send data to the other end it will probably reset the timer, since the session is no longer idle.
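When it does time out, compare the timestamps in the capture: the time from the first packet (the beginning of the three-way handshake) until the connection is terminated (you may or may not see a RST). If the application exchanged any data after the handshake, such as an SSH or FTP banner, measure from the last of those packets instead, since that is when the idle timer restarted.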
Barring any application-layer timeouts (depending on what type of connection you make), this should give you an idea of what the timeout is set to.
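If you would rather not read the timestamps by eye, the comparison can be sketched in a few lines of Python. This assumes the capture was filtered down to the one test connection and printed as text with tcpdump -tt, which puts the epoch timestamp in the first field of each packet line. The function name is hypothetical, and the longest silence it reports is only an estimate of the firewall's idle timeout.

```python
def longest_silence(lines):
    """Find the longest gap between consecutive packets.

    `lines` is an iterable of text lines whose first whitespace-separated
    field is a timestamp in seconds (as printed by `tcpdump -tt`).
    Returns (gap_seconds, timestamp_where_gap_started), or None if fewer
    than two packet lines were found.
    """
    stamps = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            stamps.append(float(fields[0]))
        except ValueError:
            continue  # not a packet line (e.g. a tcpdump status message)

    if len(stamps) < 2:
        return None

    # Pair each packet with the one before it and keep the widest gap.
    gap, start = max(
        (later - earlier, earlier)
        for earlier, later in zip(stamps, stamps[1:])
    )
    return gap, start
```

Feeding it the lines of a saved capture (for example, open("capture.txt") where capture.txt is whatever file you wrote the tcpdump output to) gives the length of the idle period and when it began.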