Topic: [FIXED!] Nearly mad... From time to time a few machines can't ping Zentyal! :\

cyberstudio

  • Zen Monk
  • **
  • Posts: 61
  • Karma: +1/-0
Hi guys.

I have a very strange problem: from time to time, some of the computers on my network (nearly 50) can't ping my Zentyal box.
It works... then bang! That machine can't ping Zentyal. Then, a few minutes later, it can ping it again. It's so strange because Zentyal CAN ping the machine... it's just the machine that can't.

It happens with a lot of machines. While one machine can't see Zentyal, another can. Then that one can't, and later it can again.

It's random.

This is the ifconfig output of the Zentyal box:

Quote
root@gateway:/home/testnetwork# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:a4:47:84 
          inet addr:10.0.0.249  Bcast:10.0.0.255  Mask:255.255.255.0 <-------- THIS ONE IS INTERNAL
          inet6 addr: fe80::250:56ff:fea4:4784/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:32345 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19288 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3793058 (3.7 MB)  TX bytes:5827075 (5.8 MB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:a4:47:85   <----- THIS IS EXTERNAL
          inet addr:10.0.1.3  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fea4:4785/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:30854 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13011 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:22446208 (22.4 MB)  TX bytes:1047058 (1.0 MB)

eth2      Link encap:Ethernet  HWaddr 00:50:56:a4:47:86   <--------------- EXTERNAL
          inet addr:190.8.44.13  Bcast:190.8.44.15  Mask:255.255.255.248
          inet6 addr: fe80::250:56ff:fea4:4786/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:124756 errors:0 dropped:0 overruns:0 frame:0
          TX packets:76646 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:161530175 (161.5 MB)  TX bytes:5448906 (5.4 MB)

eth3      Link encap:Ethernet  HWaddr 00:50:56:a4:47:87  <--------------- EXTERNAL
          inet addr:192.168.14.4  Bcast:192.168.14.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fea4:4787/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14970 errors:0 dropped:0 overruns:0 frame:0
          TX packets:138 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1219177 (1.2 MB)  TX bytes:6748 (6.7 KB)

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1196431 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1196431 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:120292327 (120.2 MB)  TX bytes:120292327 (120.2 MB)

And this is the ipconfig output of a machine that can't ping Zentyal:

Quote
Windows IP Configuration

   Host Name . . . . . . . . . . . . : srv-infra
   Primary Dns Suffix  . . . . . . . : test.local
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : test.local

Ethernet adapter Local:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Intel(R) PRO/1000 MT Network Connection
   Physical Address. . . . . . . . . : 00-0C-29-4B-0D-61
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 10.0.0.250(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.0.0.249
   DNS Servers . . . . . . . . . . . : 10.0.0.254
   NetBIOS over Tcpip. . . . . . . . : Enabled

Tunnel adapter Local Area Connection* 9:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft Teredo Tunneling Adapter
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Tunnel adapter isatap.{2282D9A7-BAA2-4CD2-B880-B24848D0B242}:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft ISATAP Adapter #2
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes


Please help me :\ This is driving me and my users crazy.
« Last Edit: August 08, 2012, 02:34:56 am by cyberstudio »

cyberstudio

« Reply #1 on: August 01, 2012, 03:46:34 am »
The current server load is like this:

0.20, 0.11, 0.03

So I don't think server load is the problem.

I disabled eth1 and eth3, but the problem persists. By the way, this is a fresh install (from a few hours ago).

cyberstudio

« Reply #2 on: August 01, 2012, 03:49:57 am »
This is the route print output for the Windows Server box that currently can't see the Zentyal box (it was working just a few minutes ago, without ANY change... then suddenly it stopped working):

Quote
===========================================================================
Interface List
 12...00 0c 29 4b 0d 61 ......Intel(R) PRO/1000 MT Network Connection
  1...........................Software Loopback Interface 1
 11...00 00 00 00 00 00 00 e0 Microsoft Teredo Tunneling Adapter
 13...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter #2
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0       10.0.0.249      10.0.0.250    266
         10.0.0.0    255.255.255.0          On-link      10.0.0.250    266
       10.0.0.250  255.255.255.255          On-link      10.0.0.250    266
       10.0.0.255  255.255.255.255          On-link      10.0.0.250    266
        127.0.0.0        255.0.0.0          On-link       127.0.0.1    306
        127.0.0.1  255.255.255.255          On-link       127.0.0.1    306
  127.255.255.255  255.255.255.255          On-link       127.0.0.1    306
        224.0.0.0        240.0.0.0          On-link       127.0.0.1    306
        224.0.0.0        240.0.0.0          On-link      10.0.0.250    266
  255.255.255.255  255.255.255.255          On-link       127.0.0.1    306
  255.255.255.255  255.255.255.255          On-link      10.0.0.250    266
===========================================================================
Persistent Routes:
  Network Address          Netmask       Gateway Address          Metric
          0.0.0.0              0.0.0.0               10.0.0.249              Default
===========================================================================

IPv6 Route Table
===========================================================================
Active Routes:
 If Metric Network Destination      Gateway
  1    306 ::1/128                       On-link
  1    306 ff00::/8                       On-link
===========================================================================
Persistent Routes:
  None
« Last Edit: August 01, 2012, 03:54:48 am by cyberstudio »

cyberstudio

« Reply #3 on: August 01, 2012, 04:05:16 am »
If I disable the firewall, proxy and traffic shaping, the problem goes away.

I don't know which one is to blame... still testing...

christian

  • Guest
« Reply #4 on: August 01, 2012, 06:32:50 am »
Can you confirm (this is not clear to me from your various posts) that Zentyal is running on a VMware virtual machine?

cyberstudio

« Reply #5 on: August 01, 2012, 01:31:58 pm »
Yes, that's true.

It's running on an ESXi 5 host. That host also has 2 Windows Server 2008 R2 installations.

By the way... the problem is not with the proxy or traffic shaping modules. I have discovered that if I disable the firewall module and enable it again, everything starts working, at least for 10 or 20 minutes. After that, the problem happens again, and I have to disable and enable the firewall module once more.

Zentyal was installed using the .iso. These are the specs for that Zentyal box:


Here you can see the problem in action. At first, I get "Request timed out". Then I go to Zentyal: disable the firewall -> save changes -> enable the firewall -> save changes, and it starts working again.

« Last Edit: August 01, 2012, 02:10:53 pm by cyberstudio »

cyberstudio

UPDATE

If I run "tracert www.google.com" in cmd on the affected machine, the internet starts working again, just like magic, and I can ping the gateway.

But after a while the problem happens again, and I need to run "tracert" again.

This is a big mystery to me...

Any tips?

cyberstudio

ANOTHER UPDATE
Looking at the ARP table on one of the affected machines, I noticed something VERY strange...

I have a test machine. This is the ARP table of that machine when the internet is working:

Quote
Interface: 10.0.0.123 --- 0xa
  Internet Address      Physical Address      Type
  10.0.0.28             a4-ba-db-ed-07-83     dynamic
  10.0.0.249            00-50-56-a4-47-8c     dynamic   <--------------
  10.0.0.252            00-24-e8-53-db-f5     dynamic
  10.0.0.254            00-0c-29-f7-e0-c7     dynamic
  10.0.0.255            ff-ff-ff-ff-ff-ff     static
  224.0.0.252           01-00-5e-00-00-fc     static
  255.255.255.255       ff-ff-ff-ff-ff-ff     static

This is the ARP table of that machine when the internet is not working:

Quote
Interface: 10.0.0.123 --- 0xa
  Internet Address      Physical Address      Type
  10.0.0.28             a4-ba-db-ed-07-83     dynamic
  10.0.0.249            00-50-56-a4-47-8d     dynamic   <---------------
  10.0.0.252            00-24-e8-53-db-f5     dynamic
  10.0.0.254            00-0c-29-f7-e0-c7     dynamic
  10.0.0.255            ff-ff-ff-ff-ff-ff     static
  224.0.0.252           01-00-5e-00-00-fc     static
  255.255.255.255       ff-ff-ff-ff-ff-ff     static

Can you spot the difference? 10.0.0.249 is my Zentyal gateway.
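
For anyone who prefers to compare two ARP snapshots mechanically rather than by eye, here is a minimal sketch in Python. It assumes the text layout printed by the Windows "arp -a" command as quoted above; the function names parse_arp_table and changed_entries are made up for this example, and the sample data is a shortened copy of the two tables.

```python
import re

# One entry line of Windows `arp -a` output: an IPv4 address followed by a
# MAC address written as aa-bb-cc-dd-ee-ff.
ENTRY = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\b"
)


def parse_arp_table(text):
    """Return {ip: mac} for every entry line found in `arp -a` output."""
    table = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            table[match.group(1)] = match.group(2).lower()
    return table


def changed_entries(before, after):
    """Return {ip: (old_mac, new_mac)} for IPs whose MAC differs between snapshots."""
    old, new = parse_arp_table(before), parse_arp_table(after)
    return {ip: (old[ip], new[ip]) for ip in old if ip in new and old[ip] != new[ip]}


working = """
  10.0.0.28              a4-ba-db-ed-07-83          dynamic
  10.0.0.249            00-50-56-a4-47-8c          dynamic
  10.0.0.254            00-0c-29-f7-e0-c7           dynamic
"""
broken = """
  10.0.0.28              a4-ba-db-ed-07-83          dynamic
  10.0.0.249            00-50-56-a4-47-8d          dynamic
  10.0.0.254            00-0c-29-f7-e0-c7           dynamic
"""

print(changed_entries(working, broken))
# {'10.0.0.249': ('00-50-56-a4-47-8c', '00-50-56-a4-47-8d')}
```

Only 10.0.0.249 is reported, which matches what the two quoted tables show.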

That gateway has 4 Ethernet adapters: 1 internal, 3 external. "00-50-56-a4-47-8c" is the MAC address of the internal card. "00-50-56-a4-47-8d" is the MAC address of the first external card.

For some reason unknown to me, something is making the affected machine point to the wrong Zentyal network card.

Yes... I have checked that the card with "00-50-56-a4-47-8d" is marked as external, and "00-50-56-a4-47-8c" is internal.

What can cause that?  :o
« Last Edit: August 07, 2012, 02:13:43 pm by cyberstudio »

cyberstudio

FOUND IT!!

the problem is described here:
http://www.embedded-bits.co.uk/2008/multiple-network-gotcha/
http://linux-ip.net/html/ether-arp.html

The problem is not Zentyal's fault; it comes from how Linux answers ARP requests by default.

The solution was to add these lines:
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2

to this file:
/etc/sysctl.conf

and restart.
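
As a quick sanity check, here is a small Python sketch that tells whether a sysctl.conf-style text contains both settings. It only parses text and changes nothing on the system. The names REQUIRED, parse_sysctl_conf and missing_settings are hypothetical, and the parser only handles simple "key = value" lines plus "#" and ";" comments.

```python
# The two settings from the fix above, with the values that were used.
REQUIRED = {
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.all.arp_announce": "2",
}


def parse_sysctl_conf(text):
    """Parse simple `key = value` lines, skipping blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, required=REQUIRED):
    """Return the required keys that are absent or set to a different value."""
    found = parse_sysctl_conf(text)
    return {key: want for key, want in required.items() if found.get(key) != want}


sample = """
# ARP fix for several NICs on one switch
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce = 2
"""

print(missing_settings(sample))                   # empty dict: nothing missing
print(missing_settings("net.ipv4.ip_forward=1"))  # both settings reported
```

To check the real file, pass in the contents of /etc/sysctl.conf. Running "sysctl -p" as root reloads that file, so a full restart should not be strictly necessary.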

christian

  • Guest
Excellent!
What I discover, reading your post, is that all, or at least some, of your external interfaces, although they have IPs in different ranges, are connected to the same physical network. Is my understanding correct?

If so, could you explain the purpose of such a design?

cyberstudio

Hi

Yes, you're right. Here we have 3 internet connections, and they're connected to the same switch as the rest of the network (in different IP ranges). Why? Well, Zentyal is running on a virtual machine, and the physical machine that hosts the Zentyal VM only has one network card. So, in order to connect Zentyal to all 4 networks (local, internet-1, internet-2, internet-3), I have to connect them to the same switch, on different subnets.

Then, on Zentyal, everything goes as usual: one card is marked as internal, and the other 3 are external. The problem occurred because, since all the cards are on the same switch, all of them received the ARP request and all of them replied to it, generating 4 answers, each with a different MAC address. Windows only uses the first answer to arrive and discards the other 3, and sometimes that answer contained the MAC address of one of the external cards.
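
To illustrate, here is a toy Python model of that behaviour. It is a simplification, not kernel code: it assumes every card sees the ARP request (same switch) and models only arp_ignore values 0 and 1 as the kernel's ip-sysctl documentation describes them. Interface names, MACs and IPs are copied from the ifconfig output in the first post; arp_replies is a made-up function name.

```python
# (name, MAC, IPv4) for each Zentyal card, as listed by ifconfig in the first post.
INTERFACES = [
    ("eth0", "00:50:56:a4:47:84", "10.0.0.249"),    # internal
    ("eth1", "00:50:56:a4:47:85", "10.0.1.3"),      # external
    ("eth2", "00:50:56:a4:47:86", "190.8.44.13"),   # external
    ("eth3", "00:50:56:a4:47:87", "192.168.14.4"),  # external
]


def arp_replies(target_ip, arp_ignore, interfaces=INTERFACES):
    """Return the MACs that answer an ARP request seen by every interface.

    arp_ignore=0: an interface answers for any IP configured on the host.
    arp_ignore=1: an interface answers only for an IP configured on itself.
    """
    local_ips = {ip for _name, _mac, ip in interfaces}
    replies = []
    for _name, mac, ip in interfaces:
        if arp_ignore == 0:
            answers = target_ip in local_ips
        else:
            answers = target_ip == ip
        if answers:
            replies.append(mac)
    return replies


print(arp_replies("10.0.0.249", arp_ignore=0))  # all four MACs answer
print(arp_replies("10.0.0.249", arp_ignore=1))  # only eth0's MAC answers
```

With arp_ignore=0 the client gets four candidate MACs for 10.0.0.249 and keeps whichever arrives first; with arp_ignore=1 only the internal card answers.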

Hope that makes sense to you xD

Anyway, it's solved now, thank God  ;D