2006-09-02 weird DNS problem

Original Problem
I can't tell if this means I'm being hacked somehow or if it's just a network glitch at EarthLink, but at essentially random times Samba will start returning weird IP addresses for local machines on the network. The addresses are consistently within a particular range, though the exact assignments seem to change.

My main worry is that this means my network traffic is being routed via some hacker's machine (which would be consistent with some odd delays and errors loading web pages) which might then allow passwords and such to be picked up.

Some quick pastes:

From gonzo (Kubuntu Dapper):
 * net lookup gonzo: 192.168.0.103
 * net lookup bunsen: 209.86.66.92
 * net lookup beaker: 209.86.66.91
 * net lookup mokey: 209.86.66.93
 * net lookup floyd: 192.168.0.110
 * net lookup melorr: 209.86.66.90

Similar (but not identical) results doing "ping" to various machines from Beaker (Win98), but everything went back to normal when I rebooted it.
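The gonzo results above can be sorted mechanically into "on the LAN" and "somewhere else". This is only a sketch: the addresses are copied from the paste, and the 192.168.0.0/24 network is an assumption based on the router sitting at 192.168.0.1. The names `lookups`, `LAN` and `off_lan` are made up for illustration.

```python
import ipaddress

# "net lookup" results pasted from gonzo (see the list above).
lookups = {
    "gonzo": "192.168.0.103",
    "bunsen": "209.86.66.92",
    "beaker": "209.86.66.91",
    "mokey": "209.86.66.93",
    "floyd": "192.168.0.110",
    "melorr": "209.86.66.90",
}

# Assumption: the home LAN is a single /24 behind the router at 192.168.0.1.
LAN = ipaddress.ip_network("192.168.0.0/24")


def off_lan(results, lan=LAN):
    """Return the machine names whose looked-up address is outside the LAN."""
    return sorted(
        name
        for name, addr in results.items()
        if ipaddress.ip_address(addr) not in lan
    )


if __name__ == "__main__":
    for name in off_lan(lookups):
        print(f"{name}: {lookups[name]} is not a local address")
```

Anything this flags for a machine that is physically on the LAN is a bogus answer, whatever its cause turns out to be.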


 * traceroute 209.86.66.92
 * traceroute to 209.86.66.92 (209.86.66.92), 30 hops max, 40 byte packets
 * 1 192.168.0.1 (192.168.0.1)  0.798 ms  0.365 ms  0.237 ms
 * 2 10.40.64.1 (10.40.64.1)  7.091 ms  6.291 ms  11.775 ms
 * 3 srp8-0.rlghnca-rtr1.nc.rr.com (24.25.2.161)  8.231 ms  6.120 ms  7.064 ms
 * 4 pos14-0.rlghncrdc-rtr1.nc.rr.com (24.25.0.5)  8.592 ms  7.250 ms  7.695 ms
 * 5 pos12-0.rlghncrdc-rtr2.nc.rr.com (24.93.64.37)  8.393 ms  7.100 ms  6.096 ms
 * 6 tenge-1-4.car1.Raleigh1.Level3.net (4.71.160.1)  23.241 ms  21.624 ms  22.056 ms
 * 7 ae-11-11.car2.Raleigh1.Level3.net (4.69.132.174)  79.548 ms  21.611 ms  21.724 ms
 * 8 ae-6-6.ebr2.Washington1.Level3.net (4.69.132.178)  27.750 ms *  31.726 ms
 * 9 ae-24-56.car4.Washington1.Level3.net (4.68.121.177)  85.176 ms ae-24-52.car4.Washington1.Level3.net (4.68.121.49)  24.588 ms ae-24-54.car4.Washington1.Level3.net (4.68.121.113)  28.596 ms
 * 10 unknown.Level3.net (64.158.57.50)  19.989 ms  18.623 ms  24.089 ms
 * 11 bor02-so-3-1.ga-atlanta0.ne.earthlink.net (209.165.110.73)  24.023 ms  21.807 ms  28.283 ms
 * 12 bor01-ge-1-0-0.ga-atlanta1.ne.earthlink.net (209.165.110.106)  24.091 ms  22.520 ms  24.604 ms
 * 13 * * *
 * 14 * * *
 * 15 * * *
 * 16 * * *
 * 17 * * *
 * 18 * * *
 * 19 * * *
 * 20 * * *
 * 21 * * *
 * 22 * * *
 * 23 * * *
 * 24 * * *
 * 25 * * *
 * 26 * * *
 * 27 * * *
 * 28 * * *
 * 29 * * *
 * 30 * * *

 * nslookup 209.86.66.92
 * Server:        192.168.0.1
 * Address:       192.168.0.1#53
 * Non-authoritative answer:
 * 92.66.86.209.in-addr.arpa      name = elydm.03.am.barefruit.com.

 * woozle@gonzo:~$ nslookup 209.86.66.91
 * Server:        192.168.0.1
 * Address:       192.168.0.1#53
 * ** server can't find 91.66.86.209.in-addr.arpa: SERVFAIL

 * woozle@gonzo:~$ nslookup 209.86.66.91
 * Server:        192.168.0.1
 * Address:       192.168.0.1#53
 * Non-authoritative answer:
 * 91.66.86.209.in-addr.arpa      name = elydm.02.am.barefruit.com.


 * 209.86.66.90: elydm.01.am.barefruit.com
 * 209.86.66.91: elydm.02.am.barefruit.com
 * 209.86.66.92: elydm.03.am.barefruit.com
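For reference, the "in-addr.arpa" names in the nslookup output are just the IP address with its octets reversed; that is the name a reverse (PTR) lookup asks about. A small sketch using Python's standard ipaddress module shows the mapping. The helper name `reverse_name` is made up, and nothing here touches the network.

```python
import ipaddress


def reverse_name(addr):
    """Return the in-addr.arpa name that a reverse DNS lookup would query."""
    return ipaddress.ip_address(addr).reverse_pointer


if __name__ == "__main__":
    # The three addresses from the list above.
    for addr in ("209.86.66.90", "209.86.66.91", "209.86.66.92"):
        print(addr, "->", reverse_name(addr))
```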

2006-09-04 More information
C:\WINDOWS>tracert floyd

Tracing route to floyd.earthlink.net [209.86.66.94] over a maximum of 30 hops:

  1  <10 ms   <10 ms   <10 ms  192.168.0.1
  2    7 ms    12 ms     6 ms  10.40.64.1
  3    7 ms     8 ms    11 ms  srp8-0.rlghnca-rtr2.nc.rr.com [24.25.2.163]
  4    9 ms     7 ms     7 ms  pos14-0.rlghncrdc-rtr2.nc.rr.com [24.25.0.9]
  5   12 ms    13 ms    13 ms  son1-0-1.chrlncsa-rtr6.carolina.rr.com [24.93.64.81]
  6   12 ms    13 ms    11 ms  pop1-cha-P4-0.atdn.net [66.185.132.45]
  7   12 ms    12 ms    12 ms  bb1-cha-P3-0.atdn.net [66.185.138.64]
  8   17 ms    17 ms    17 ms  bb1-atm-P6-0.atdn.net [66.185.152.182]
  9   17 ms    17 ms    18 ms  pop1-atm-P0-0.atdn.net [66.185.147.193]
 10   17 ms    17 ms    18 ms  Earthlink.atdn.net [66.185.150.6]
 11   16 ms    17 ms    17 ms  floyd.earthlink.net [209.86.66.94]

2006-09-08 More logs
A series of "net lookup"s, all done within less than a minute:
 * woozle@gonzo:~$ net lookup floyd -l
 * 209.86.66.90
 * woozle@gonzo:~$ net lookup floyd -l
 * 192.168.0.101
 * woozle@gonzo:~$ net lookup floyd -l
 * 209.86.66.91
 * woozle@gonzo:~$ net lookup floyd -l
 * 209.86.66.91
 * woozle@gonzo:~$ net lookup floyd -l
 * 192.168.0.101
 * woozle@gonzo:~$ net lookup floyd -l
 * 209.86.66.92
 * woozle@gonzo:~$ net lookup floyd -l
 * 192.168.0.101
 * woozle@gonzo:~$ net lookup floyd -l
 * 209.86.66.92
 * woozle@gonzo:~$ net lookup floyd -l
 * 192.168.0.101
 * woozle@gonzo:~$ net lookup floyd -l
 * 209.86.66.93
 * woozle@gonzo:~$ net lookup floyd -l
 * 192.168.0.101
 * woozle@gonzo:~$ net lookup floyd -l
 * 192.168.0.101

Also, phealy says:

i'm betting your problem is that, say, floyd hasn't advertised itself recently and samba can't find it ... is there something in there (referring to resolv.conf) like 'search earthlink.net'?


 * woozle@gonzo:~$ cat /etc/resolv.conf
 * search earthlink.net
 * nameserver 192.168.0.1

yea ... so when it can't find it, it looks to dns ... that search line means "if you can't find 'floyd', try 'floyd.earthlink.net' ... the delays are, respectively, WINS timing out, and then 'floyd' timing out


 * <TheWoozle> Ok, that makes sense... but why did it start so abruptly? ... We've certainly had machines come and go on the LAN, and never had weird addresses pop up. If it couldn't find a machine, it would just say so.
 * <phealy> a setting change, the machine acting as the WINS server went, etc. ... actually, did you change your router? ... or your ISP could have changed their DNS settings
 * <TheWoozle> Hmm... the router did get a firmware upgrade not long ago.
 * <phealy> probably that search line wasn't there before and now is
 * <TheWoozle> And if that search line was on the Samba master browser, it might cause other machines to act similarly?
 * <phealy> no ... but all machines work that way ... and they're all getting it from the router's DHCP ... if it can't find it via netbios, it looks for DNS
 * <TheWoozle> The router's DHCP listed the correct address for Floyd.
 * <phealy> right
 * <TheWoozle> I sshed to Floyd using the address given by the router; that worked. ssh to Floyd using the net lookup address failed.
 * <phealy> but the router's dhcp gives them all the searchpath, and they're looking at earthlink.net when they can't find the machine via SMB
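To make phealy's explanation concrete, here is a rough sketch of how a stub resolver turns a bare name like "floyd" into the names it actually asks DNS about, given the resolv.conf shown above. It is a simplification, not the real resolver code: it assumes the usual default of ndots=1 and ignores every other resolv.conf option. The function names are made up for illustration.

```python
# Contents of /etc/resolv.conf on gonzo, copied from the paste above.
RESOLV_CONF = """\
search earthlink.net
nameserver 192.168.0.1
"""


def parse_resolv_conf(text):
    """Return (search_domains, nameservers) from resolv.conf-style text."""
    search, servers = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith(("#", ";")):
            continue  # skip blanks, comments and keywords without a value
        if fields[0] == "search":
            search = fields[1:]
        elif fields[0] == "nameserver":
            servers.append(fields[1])
    return search, servers


def candidates(name, search, ndots=1):
    """List the names tried, in order, for one lookup (simplified)."""
    if name.endswith("."):
        return [name]  # already fully qualified; no search list applied
    suffixed = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + suffixed  # "dotted enough": try it as-is first
    return suffixed + [name]  # bare name: search domains come first


if __name__ == "__main__":
    search, servers = parse_resolv_conf(RESOLV_CONF)
    print(candidates("floyd", search))
```

With the search line present, "floyd" becomes "floyd.earthlink.net" before anything else is tried, and if the ISP answers for every nonexistent name under its domain, that first try always "succeeds".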

Well, that would explain why saying "ping netbiosname" now produces something other than "unknown host netbiosname":
 * woozle@gonzo:~$ ping floyd
 * PING floyd.earthlink.net (209.86.66.93) 56(84) bytes of data.
 * 64 bytes from elydm.04.am.barefruit.com (209.86.66.93): icmp_seq=1 ttl=54 time=20.5 ms
 * 64 bytes from elydm.04.am.barefruit.com (209.86.66.93): icmp_seq=2 ttl=54 time=23.4 ms
 * 64 bytes from elydm.04.am.barefruit.com (209.86.66.93): icmp_seq=3 ttl=54 time=17.2 ms


 * --- floyd.earthlink.net ping statistics ---
 * 3 packets transmitted, 3 received, 0% packet loss, time 2009ms
 * rtt min/avg/max/mdev = 17.234/20.401/23.441/2.538 ms
 * woozle@gonzo:~$ ping bunsen
 * PING bunsen.earthlink.net (209.86.66.95) 56(84) bytes of data.
 * 64 bytes from elydm.06.am.barefruit.com (209.86.66.95): icmp_seq=1 ttl=54 time=27.4 ms
 * 64 bytes from elydm.06.am.barefruit.com (209.86.66.95): icmp_seq=2 ttl=54 time=31.1 ms


 * --- bunsen.earthlink.net ping statistics ---
 * 2 packets transmitted, 2 received, 0% packet loss, time 1004ms
 * rtt min/avg/max/mdev = 27.408/29.278/31.148/1.870 ms
 * woozle@gonzo:~$ ping beaker
 * ping: unknown host beaker
 * woozle@gonzo:~$ net lookup beaker
 * 209.86.66.92
 * woozle@gonzo:~$ ping beaker
 * PING beaker.earthlink.net (209.86.66.91) 56(84) bytes of data.
 * 64 bytes from elydm.02.am.barefruit.com (209.86.66.91): icmp_seq=1 ttl=54 time=23.3 ms
 * 64 bytes from elydm.02.am.barefruit.com (209.86.66.91): icmp_seq=2 ttl=54 time=23.0 ms


 * --- beaker.earthlink.net ping statistics ---
 * 2 packets transmitted, 2 received, 0% packet loss, time 1007ms
 * rtt min/avg/max/mdev = 23.046/23.184/23.323/0.205 ms

Next: how to verify that this is happening because of the router, and then how to make it stop.

2006-09-09 analysis

 * DHCP seems to be fetching any locally-unresolved addresses (i.e. addresses which the router can't resolve) from Earthlink's servers.
 * Query: why are they unresolved? Check to see what addresses those actual machines think they have.
 * Answer: ifconfig on Bunsen reveals correct IP (192.168.0.106) even though "net lookup bunsen" on gonzo returns 209.86.66.94
 * Note: DHCP listing on router has 192.168.0.106 listed as "unknown"...
 * Therefore: seems likely router is not recognizing the name "bunsen", and replies that it does not know bunsen's address
 * Query: why isn't the router receiving Bunsen's name in the DHCP request? Or, if it is receiving it, why isn't it storing it?
 * Subquery: how can I tell if Bunsen is correctly transmitting its name when it does a DHCP?
 * Flaw in theory: router's DHCP list shows 192.168.0.105 for Mokey, but "net lookup mokey" returns 209.86.66.92 on bunsen and 209.86.66.94 on gonzo. (Refreshed view of router's DHCP list just to be sure; no apparent changes.) At the moment, these addresses are being persistent.
 * Query: how to get a debug/trace log of "net lookup"'s activity? Do we have to use ethereal?
 * Answer: no. "net -d 2 lookup machinename" returns more info:
 * woozle@gonzo:~$ net -d 2 lookup mokey
 * [2006/09/09 10:27:45, 2] lib/interface.c:add_interface(81)
 * added interface ip=192.168.0.103 bcast=192.168.0.255 nmask=255.255.255.0
 * [2006/09/09 10:27:45, 2] libsmb/namequery.c:name_query(492)
 * Got a positive name query response from 127.0.0.1 ( 209.86.66.94 )
 * 209.86.66.94
 * [2006/09/09 10:27:45, 2] utils/net.c:main(878)
 * return code = 0
 * More: a net lookup on bunsen was less illuminating at first; I had to go up to -d 5 before I got this bit at the end:
 * Netbios name list:-
 * my_netbios_names[0]="BUNSEN"
 * [2006/09/09 10:31:13, 2] lib/interface.c:add_interface(81)
 * added interface ip=192.168.0.106 bcast=192.168.0.255 nmask=255.255.255.0
 * [2006/09/09 10:31:13, 5] lib/gencache.c:gencache_init(60)
 * Opening cache file at /var/cache/samba/gencache.tdb
 * [2006/09/09 10:31:13, 5] libsmb/namecache.c:namecache_fetch(201)
 * name mokey#20 found.
 * 209.86.66.92
 * [2006/09/09 10:31:13, 2] utils/net.c:main(988)
 * return code = 0
 * It looks like the incorrect address for Mokey has been cached on Bunsen (the log above shows it being fetched from Samba's name cache, /var/cache/samba/gencache.tdb). Query: how to clear/refresh the cache?
 * Discovery: "nmblookup machinename" apparently refreshes the cache some of the time. (It will also return a non-127.0.0.x address for localhost.) However, it doesn't seem to be solving the problem on gonzo:
 * woozle@gonzo:~$ nmblookup mokey
 * querying mokey on 192.168.0.255
 * 192.168.0.105 mokey<00>
 * woozle@gonzo:~$ net lookup mokey
 * 209.86.66.94
 * woozle@gonzo:~$ nmblookup mokey
 * querying mokey on 192.168.0.255
 * 192.168.0.105 mokey<00>
 * woozle@gonzo:~$ net lookup mokey
 * 209.86.66.94
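The disagreement between the two commands above can be checked automatically from their pasted output. This is only a sketch: it assumes nmblookup's answer lines look like the ones in the paste ("address name<type>") and that "net lookup" prints a bare address. The function names are made up.

```python
import re

# Output pasted from gonzo (see above).
NMBLOOKUP_OUTPUT = """\
querying mokey on 192.168.0.255
192.168.0.105 mokey<00>
"""
NET_LOOKUP_OUTPUT = "209.86.66.94\n"

# Matches answer lines such as "192.168.0.105 mokey<00>".
ANSWER_LINE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s+\S+<[0-9a-fA-F]{2}>$")


def nmblookup_addresses(output):
    """Extract the addresses from nmblookup's answer lines."""
    found = []
    for line in output.splitlines():
        match = ANSWER_LINE.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


def disagrees(nmb_output, net_lookup_output):
    """True if 'net lookup' returned an address the broadcast query did not."""
    return net_lookup_output.strip() not in nmblookup_addresses(nmb_output)


if __name__ == "__main__":
    print(nmblookup_addresses(NMBLOOKUP_OUTPUT))
    print(disagrees(NMBLOOKUP_OUTPUT, NET_LOOKUP_OUTPUT))
```

Since the broadcast query gets the right answer directly from Mokey, a mismatch points at whatever "net lookup" consults before or instead of the broadcast.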

2006-09-10 Resolution
The problem appears to have been caused in large part by EarthLink's new "dead domain handling" scheme, possibly exacerbated by my DLink DI-604's sloppy handling of NetBIOS names, which in turn may have been exacerbated by a recent firmware upgrade I did to it (although that should have improved things, of course).

We have turned off the DLink router's DHCP and are now using dnsmasq, using non-EarthLink upstream DNS servers (Xmission and IODynamics (Tenebram's employer)). Tene also made a number of other little tweaks, including one which now resolves NetBIOS names as effectively as regular Internet domain names, so (e.g.) I can now "ping bunsen".

2006-09-24 related problem?
For some reason, Beaker suddenly wasn't picking up DHCP from Gonzo; dnsmasq (on Gonzo) might have gotten turned off accidentally when I restarted Samba, and that may have gotten Beaker all confused. ipconfig/renew_all didn't work: it reported that no DHCP server was found, and then reverted to using the hardware router (192.168.0.1) for DNS rather than coming up with an autoconfig IP address. (I checked, and the router's DHCP was still turned off, so I don't know how it worked that one out.)

What eventually worked was:
 * ipconfig/release_all on Beaker
 * shut down Beaker
 * wait 5 seconds, then press Beaker's power button to turn him back on

The network started appearing normally after that. ipconfig/all confirms that Gonzo (192.168.0.254) is now the DHCP server.