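While working on a customer escalation recently, I spent a fair amount of time fixing a server-side problem and added more detailed logging. Once the original problem was fixed, a new one appeared in the customer's environment: no response was returned to the client, and the server threw the following error: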

java.io.IOException:
java.net.SocketException: Connection reset
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
	at weblogic.utils.io.ChunkedOutputStream.writeTo(ChunkedOutputStream.java:284)
	at weblogic.servlet.internal.ServletOutputStreamImpl.writeHeader(ServletOutputStreamImpl.java:170)
	at weblogic.servlet.internal.ResponseHeaders.writeHeaders(ResponseHeaders.java:498)
	at weblogic.servlet.internal.ServletResponseImpl.writeHeaders(ServletResponseImpl.java:1315)
	at weblogic.servlet.internal.ServletOutputStreamImpl.sendHeaders(ServletOutputStreamImpl.java:284)
	at weblogic.servlet.internal.ChunkOutput.flush(ChunkOutput.java:433)
	at weblogic.servlet.internal.CharsetChunkOutput.flush(CharsetChunkOutput.java:298)
	at weblogic.servlet.internal.ChunkOutput$2.checkForFlush(ChunkOutput.java:657)
	at weblogic.servlet.internal.CharsetChunkOutput.write(CharsetChunkOutput.java:200)
	at weblogic.servlet.internal.ChunkOutputWrapper.write(ChunkOutputWrapper.java:148)
	at weblogic.servlet.internal.ServletOutputStreamImpl.write(ServletOutputStreamImpl.java:151)
	at org.apache.axis.utils.ByteArray.writeTo(ByteArray.java:375)
	at org.apache.axis.SOAPPart.writeTo(SOAPPart.java:265)
	at org.apache.axis.Message.writeTo(Message.java:539)
	at org.apache.axis.transport.http.AxisServlet.sendResponse(AxisServlet.java:902)
	at org.apache.axis.transport.http.AxisServlet.doPost(AxisServlet.java:777)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:751)
	at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:374)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:844)
	at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:242)
	at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:216)
	at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:132)
	at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:338)
	at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:221)
	at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3284)
	at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3254)
	at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
	at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
	at weblogic.servlet.provider.WlsSubjectHandle.run(WlsSubjectHandle.java:57)
	at weblogic.servlet.internal.WebAppServletContext.doSecuredExecute(WebAppServletContext.java:2163)
	at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2089)
	at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2074)
	at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1513)
	at weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:254)
	at weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
	at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
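
When sifting through server logs for this kind of failure, it can help to flag it programmatically. The following is only a minimal sketch: the class `ResetDetector` and the method `isConnectionReset` are hypothetical names, not part of WebLogic or Axis. It assumes the `IOException` carries the `SocketException` as its cause, as the trace above suggests.

```java
import java.io.IOException;
import java.net.SocketException;

// Hypothetical helper (not a WebLogic or Axis API): walks the cause chain
// looking for a SocketException whose message mentions "Connection reset".
class ResetDetector {

    static boolean isConnectionReset(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SocketException
                    && c.getMessage() != null
                    && c.getMessage().contains("Connection reset")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed shape of the failure: an IOException wrapping the SocketException.
        IOException wrapped = new IOException(new SocketException("Connection reset"));
        System.out.println(isConnectionReset(wrapped));                      // prints "true"
        System.out.println(isConnectionReset(new IOException("disk full"))); // prints "false"
    }
}
```

A check like this could sit in the servlet's error handling so that resets are logged together with the elapsed request time, which makes a time-out pattern easier to spot.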

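From Fiddler and the information logged on the server, we could confirm that the failure occurred as the server was returning the response: the connection was reset/closed while the response was being written. I went through some reference material: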

http://stackoverflow.com/questions/62929/java-net-socketexception-connection-reset

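In this situation we could conclude that the connection had been killed by something external, so I wondered whether the customer was using a cluster with a load-balancer configuration. A test against a single server showed no problem. With a single node behind the load balancer, the problem kept reproducing. That confirmed the problem lay in the load-balancer settings. The customer uses BIG-IP, so I looked up the relevant material online: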

https://support.f5.com/kb/en-us/solutions/public/7000/600/sol7606.html

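This article explains the relevant time-out settings well. The default value is 300 seconds, which explains why any request that takes longer than 5 minutes fails.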

So I asked the customer to change the TCP Protocol time-out to 1 hour and tested again. The problem was resolved. Note that the HTTP time-out setting is based on TCP.
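
The arithmetic behind the failure and the fix can be sketched in a few lines. This is a toy model only, not F5 code: the class `IdleTimeoutModel` and the method `survives` are hypothetical names, and the model assumes the connection carries no traffic at all while the server is processing the request. How BIG-IP treats a quiet period exactly equal to the time-out is also an assumption here.

```java
import java.time.Duration;

// Toy model of an idle time-out on a proxied connection (hypothetical, not F5 code).
class IdleTimeoutModel {

    // Returns true if a connection that stays silent for quietPeriod is still
    // open afterwards. Assumption: the connection is dropped once the quiet
    // period reaches the idle time-out.
    static boolean survives(Duration quietPeriod, Duration idleTimeout) {
        return quietPeriod.compareTo(idleTimeout) < 0;
    }

    public static void main(String[] args) {
        Duration defaultTimeout = Duration.ofSeconds(300); // default from the article
        Duration raisedTimeout = Duration.ofHours(1);      // value the customer switched to
        Duration slowRequest = Duration.ofMinutes(6);      // illustrative request duration

        System.out.println(survives(slowRequest, defaultTimeout)); // prints "false"
        System.out.println(survives(slowRequest, raisedTimeout));  // prints "true"
    }
}
```

Under these assumptions, a 6-minute request outlives the 300-second default, so the server ends up writing to a connection that has already been reset, while the 1-hour setting leaves enough room.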

The following BIG-IP objects have idle time-out values:

Protocol profiles
OneConnect profile
SNATs
NATs
