A. Determine whether the system is using DNS or a local /etc/hosts file for hostname resolution.
To do this, look at the uncommented section of the /etc/netsvc.conf file, and check to see if the 'NSORDER' environment variable has been set:
Examples
A1. The last 5 lines of an /etc/netsvc.conf file from a system using DNS only:
# tail -6 /etc/netsvc.conf
# Example:
# aliases = nis, files
#
#
hosts=bind4
A2. The last 5 lines of an /etc/netsvc.conf file from a system using local /etc/hosts only:
# tail -6 /etc/netsvc.conf
# Example:
# aliases = nis, files
#
#
hosts=local4
A3. The last 5 lines of an /etc/netsvc.conf file from a system that will look at /etc/hosts first, and then query the DNS server(s) if the hostname or IP is not found in the /etc/hosts file:
# tail -6 /etc/netsvc.conf
# Example:
# aliases = nis, files
#
#
hosts=local4,bind4
A4. The last 5 lines of an /etc/netsvc.conf file from a system that will look at the DNS server(s) first, and then query the local /etc/hosts file if the hostname or IP is not found in the DNS server(s):
# tail -6 /etc/netsvc.conf
# Example:
# aliases = nis, files
#
#
hosts=bind4,local4
A5. Use the 'env' and 'echo' commands to verify the 'NSORDER' environment variable has not been set. If it has been set, this will override the /etc/netsvc.conf file settings:
# env | grep NSORDER
NSORDER=hosts=local4,bind4
# echo $NSORDER
hosts=local4,bind4
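The checks in step A can be combined into one small helper. The following is a minimal sketch, not an IBM-supplied tool: the function name effective_hosts_order is hypothetical, and it assumes only what is described above, namely that a non-empty NSORDER overrides the file and that the active setting is an uncommented line beginning with "hosts".

```shell
#!/bin/sh
# Hypothetical helper: report which name-resolution setting appears to apply.
# Assumption (from the text above): a non-empty NSORDER overrides the file.
# The file path defaults to /etc/netsvc.conf; pass another path to test a copy.
effective_hosts_order() {
    netsvc_file="${1:-/etc/netsvc.conf}"

    if [ -n "${NSORDER:-}" ]; then
        echo "NSORDER (environment): $NSORDER"
        return 0
    fi

    # Drop commented lines, then keep lines that begin with "hosts",
    # allowing optional whitespace before the name and around the "=".
    entries=$(grep -v '^[[:space:]]*#' "$netsvc_file" 2>/dev/null |
              grep '^[[:space:]]*hosts[[:space:]]*=')

    if [ -n "$entries" ]; then
        printf '%s\n' "$entries" | sed 's/^[[:space:]]*/netsvc.conf: /'
    else
        echo "no uncommented hosts entry found in $netsvc_file"
    fi
}
```

After the function is defined in the current shell, running "effective_hosts_order" with no argument reads /etc/netsvc.conf; passing a path lets you check a saved copy of the file from another system.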
B. Setting the RES_OPTIONS=debug environment variable makes the resolver print details of each hostname lookup that a command performs.
The output can be examined to see what nameservers are being called, when they are called, and in what order.
NOTES:
A1. Depending on whether you are using IPv4, IPv6, or both, you may see 'hosts=bind', or 'hosts=bind6'.
A2. Depending on whether you are using IPv4, IPv6, or both, you may see 'hosts=local', or 'hosts=local6'.
A3. Depending on whether you are using IPv4, IPv6, or both, you may see 'hosts=local,bind', or 'hosts=local6,bind6'.
A4. Depending on whether you are using IPv4, IPv6, or both, you may see 'hosts=bind,local', or 'hosts=bind6,local6'.
B. You will need the name or IP of the system the JVM is attempting to communicate with, and a knowledge of how your internal network is configured.
Additional notes:
- With the "options rotate" entry in your /etc/resolv.conf file, all of your nameservers will be queried.
This could slow down lookups, unless DNS caching has been set up.
Example:
# cat /etc/resolv.conf
nameserver 192.168.2.200
nameserver 192.168.130.50
domain domain.ibm.com
domain ibm.com
options rotate <--------------
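To see at a glance which nameservers are configured and whether "options rotate" is present, the file can be summarized with awk. This is a minimal sketch; the function name resolv_summary is hypothetical, and it only reads the "nameserver" and "options" keywords shown in the example above.

```shell
#!/bin/sh
# Hypothetical helper: list nameservers in file order and report whether
# "rotate" appears on an options line. Defaults to /etc/resolv.conf.
resolv_summary() {
    resolv_file="${1:-/etc/resolv.conf}"
    awk '
        # Number each nameserver entry in the order it appears.
        $1 == "nameserver" { n++; printf "nameserver %d: %s\n", n, $2 }
        # Look for the word "rotate" among the options on the line.
        $1 == "options" {
            for (i = 2; i <= NF; i++) if ($i == "rotate") rotate = 1
        }
        END { print "options rotate: " (rotate ? "yes" : "no") }
    ' "$resolv_file"
}
```

For the example file above, this would list 192.168.2.200 as nameserver 1 and 192.168.130.50 as nameserver 2, followed by a line stating that "options rotate" is present.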
*** For the examples in this technote, the application server that is the target of our network packets will be named 'chris.domain.ibm.com' ***
The syntax when using the RES_OPTIONS=debug variable with a command is:
# RES_OPTIONS=debug command command_options
For example:
# RES_OPTIONS=debug ssh chris.domain.ibm.com
Alternatively, set the RES_OPTIONS=debug environment variable when running your own code:
# RES_OPTIONS=debug ./res_nsearch chris.domain.ibm.com
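Because the debug output can be long, it may help to save it to a file for later review. The wrapper below is a minimal sketch: the name run_with_res_debug and the log path are hypothetical, and it assumes the debug lines may be written to either stdout or stderr, so both are redirected. It is best suited to non-interactive commands, since nothing is shown on the terminal while the command runs.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with RES_OPTIONS=debug set for that
# command only, and save everything it prints to a log file.
# Usage: run_with_res_debug LOGFILE COMMAND [ARGS...]
run_with_res_debug() {
    logfile="$1"
    shift
    # The assignment prefix applies to this one command invocation only.
    # stdout and stderr are both captured (assumption: either may carry
    # the resolver debug lines).
    RES_OPTIONS=debug "$@" > "$logfile" 2>&1
}
```

For example, "run_with_res_debug /tmp/res_debug.log ./res_nsearch chris.domain.ibm.com" would leave the resolver trace in /tmp/res_debug.log and return the exit status of the command itself.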
Output when successfully resolving a hostname:
# RES_OPTIONS=debug ssh chris.domain.ibm.com
;; res_setoptions("debug", "env")..
;; debug
;; calling process id = 7995714
;; res_nquerydomain(chris.domain.ibm.com,
;; res_query(chris.domain.ibm.com, 1, 1)
;; res_nmkquery(QUERY, chris.domain.ibm.com, IN, A)
;; res_send()
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50316
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; chris.domain.ibm.com, type = A, class = IN
;; Querying server (# 1) address = 192.168.2.200
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50316
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 8, ADDITIONAL: 8
;; chris.domain.ibm.com, type = A, class = IN
chris.domain.ibm.com. 15M IN A 192.168.235.38
ibm.com. 20h17m46s IN NS xxxx.akam.net.
ibm.com. 20h17m46s IN NS xxxx-206.akam.net.
ibm.com. 20h17m46s IN NS xxxx.akam.net.
ibm.com. 20h17m46s IN NS xxxx.akam.net.
ibm.com. 20h17m46s IN NS xxxx.akam.net.
ibm.com. 20h17m46s IN NS xxxx.akam.net.
ibm.com. 20h17m46s IN NS xxxx-99.akam.net.
ibm.com. 20h17m46s IN NS xxxx.akam.net.
xxxx.akam.net. 2h54m19s IN A 192.168.160.64
xxxx.akam.net. 18h23m45s IN A 192.168.61.64
xxxx-99.akam.net. 46m8s IN AAAA 2600:1401:2::63
xxxx-206.akam.net. 4h23m12s IN AAAA 2600:1401:2::ce
xxxx.akam.net. 10h35s IN A 192.168.50.64
xxxx.akam.net. 2h54m19s IN A 192.168.25.64
xxxx.akam.net. 10h36s IN A 192.168.161.64
xxxx.akam.net. 10h39s IN A 192.168.173.64
chris.domain.ibm.com is 192.168.235.38 <---- Successfully resolved
NOTES:
- The lines to focus on are those that display the name you are querying for (the res_nmkquery line) and the name server(s) your system is set up to query (the "Querying server" lines).
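The lines of interest can be pulled out of a saved trace with grep. This is a minimal sketch; the function name res_debug_keylines is hypothetical, and the patterns are taken only from the sample output shown in this technote, so other resolver versions may word these lines differently.

```shell
#!/bin/sh
# Hypothetical filter: read resolver debug output on stdin and keep only the
# lines showing the question asked, the server contacted, and the outcome.
# Patterns are based on the sample output in this technote.
res_debug_keylines() {
    grep -E 'res_nmkquery\(|Querying server|got answer|timeout| is [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
}
```

Used as "res_debug_keylines < saved_output_file", this reduces the successful example above to the res_nmkquery line, the "Querying server" line, the "got answer" line, and the final "chris.domain.ibm.com is 192.168.235.38" line.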
Output when a timeout occurs:
# RES_OPTIONS=debug telnet peters1.domain.ibm.com
;; res_setoptions("debug", "env")..
;; debug
;; calling process id = 7274858
;; res_nquerydomain(peters1.domain.ibm.com,
;; res_query(peters1.domain.ibm.com, 1, 1)
;; res_nmkquery(QUERY, peters1.domain.ibm.com, IN, A)
;; res_send()
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2349
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; peters1.domain.ibm.com, type = A, class = IN
;; Querying server (# 1) address = 192.168.130.50
;; timeout
;; Querying server (# 2) address = 192.168.2.200 <-------- nameserver queried
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2349
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 8, ADDITIONAL: 8
;; peters1.domain.ibm.com, type = A, class = IN
peters1.domain.ibm.com. 7m15s IN A 192.168.98.21
ibm.com. 2h33m12s IN NS eur2.akam.net.
ibm.com. 2h33m12s IN NS ns1-206.akam.net.
ibm.com. 2h33m12s IN NS usc3.akam.net.
ibm.com. 2h33m12s IN NS usw2.akam.net.
ibm.com. 2h33m12s IN NS ns1-99.akam.net.
ibm.com. 2h33m12s IN NS eur5.akam.net.
ibm.com. 2h33m12s IN NS asia3.akam.net.
ibm.com. 2h33m12s IN NS usc2.akam.net.
usc2.akam.net. 5h41m37s IN A 184.26.160.64
asia3.akam.net. 39m11s IN A 23.211.61.64
ns1-99.akam.net. 8h1m34s IN AAAA 2600:1401:2::63
ns1-206.akam.net. 11h38m42s IN AAAA 2600:1401:2::ce
usc3.akam.net. 17h16m1s IN A 96.7.50.64
eur5.akam.net. 17h16m2s IN A 23.74.25.64
usw2.akam.net. 17h16m2s IN A 184.26.161.64
eur2.akam.net. 17h16m5s IN A 95.100.173.64
Trying...
;; res_nquerydomain(peters1.domain.ibm.com,
;; res_query(peters1.domain.ibm.com, 1, 1)
;; res_nmkquery(QUERY, peters1.domain.ibm.com, IN, A)
;; res_send()
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45176
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; peters1.domain.ibm.com, type = A, class = IN
;; Querying server (# 1) address =
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45176
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 8, ADDITIONAL: 8
;; peters1.domain.ibm.com, type = A, class = IN
peters1.domain.ibm.com. 10m6s IN A 192.168.98.21
ibm.com. 2h36m3s IN NS ns1-99.akam.net.
ibm.com. 2h36m3s IN NS usw2.akam.net.
ibm.com. 2h36m3s IN NS ns1-206.akam.net.
ibm.com. 2h36m3s IN NS asia3.akam.net.
ibm.com. 2h36m3s IN NS eur2.akam.net.
ibm.com. 2h36m3s IN NS usc3.akam.net.
ibm.com. 2h36m3s IN NS eur5.akam.net.
ibm.com. 2h36m3s IN NS usc2.akam.net.
usc2.akam.net. 5h44m28s IN A 184.26.160.64
asia3.akam.net. 42m2s IN A 23.211.61.64
ns1-99.akam.net. 8h4m25s IN AAAA 2600:1401:2::63
ns1-206.akam.net. 11h41m33s IN AAAA 2600:1401:2::ce
usc3.akam.net. 17h18m52s IN A 96.7.50.64
eur5.akam.net. 17h18m53s IN A 23.74.25.64
usw2.akam.net. 17h18m53s IN A 184.26.161.64
eur2.akam.net. 17h18m56s IN A 95.100.173.64
telnet: connect: Connection timed out <----- hostname was resolved, but the connection attempt timed out
NOTES:
- The lines to focus on are those that display the name you are querying for (the res_nmkquery line) and the name server(s) your system is set up to query (the "Querying server" lines).
- Notice two nameservers are queried. This is the result of having two name servers, as well as "options rotate" in the /etc/resolv.conf file.
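When a trace is long, the nameservers that timed out can be picked out with awk. This is a minimal sketch; the function name res_debug_timeouts is hypothetical, and it assumes the ";; Querying server ... address = X" and ";; timeout" line formats shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical filter: read resolver debug output on stdin, report each
# nameserver whose query was followed by a timeout, and print totals.
res_debug_timeouts() {
    awk '
        /Querying server/ {
            # The address is the field after the "=" sign; some traces
            # leave it blank, so start from a placeholder.
            server = "(address not shown)"
            for (i = 1; i < NF; i++) if ($i == "=") server = $(i + 1)
            asked++
            next
        }
        # A timeout line refers to the most recently queried server.
        /^;; timeout/ { timeouts++; print "timeout from " server }
        END { printf "server queries: %d, timeouts: %d\n", asked, timeouts }
    '
}
```

Feeding the first lookup of the timeout example above into this filter would report a timeout from 192.168.130.50, which is the nameserver to investigate first.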
A. If the connection times out, then we know the issue is somewhere outside of the JVM and application logic. The following areas will need to be investigated further:
1. AIX socket
2. AIX kernel
3. Network layer on AIX
4. The network itself
Ensure that there are no issues with the nameserver that your server was not able to contact, or with the network itself (this may require physically inspecting connections, cables, etc.).
If Java and the application logic are not the root cause, and neither are any components of the network or the nameserver, please open a PMR with the AIX networking team to further troubleshoot the AIX components (AIX socket, AIX kernel, etc.).
B. On the other hand, if the RES_OPTIONS testing shows a successful network connection, check whether the problem is in the JVM or the application logic.
If the problem persists, collect data as described at the URL below and upload it:
http://www-01.ibm.com/support/docview.wss?uid=isg3T1022750
Document Type: | Instruction |
Content Type: | Troubleshooting |
Hardware: | all Power |
Operating System: | AIX 6 | AIX 7 |
IBM Java: | all Java Versions |
Author(s): | Christopher C.D. Peters |
Reviewer(s): | Rama Tenjarla |