Author Topic: ARP corruption and network errors  (Read 5347 times)

check-ict

  • Zen Apprentice
  • *
  • Posts: 30
  • Karma: +0/-0
ARP corruption and network errors
« on: September 29, 2011, 11:49:09 pm »
Hello,

I have a big server with a lot of VMs. The gateway server is Zentyal 2.2 (I also tried 2.0) and has 2 virtual network cards.

Both eth0 and eth1 are on the same physical network card of the virtual host server. eth0 has the external (WAN) IP and eth1 has the internal (LAN) IP.

Everything works great, but sometimes servers can't reach the network anymore (they can't ping the Zentyal gateway).

I resolve this every time by SSHing to the Zentyal gateway, removing the ARP entries with arp -d and pinging the server from the gateway. When the ping starts from Zentyal to the server with network problems, a new ARP entry gets created. After the first ping reply, the server has a network connection again.
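
The manual fix described above could be scripted roughly as follows. This is only a sketch: it assumes the net-tools `arp` command and a Linux `ping` are available on the gateway and that the script runs as root. The function names are hypothetical, and any address passed in is a placeholder for the affected server.

```python
import subprocess


def arp_reset_commands(ip):
    """Return the two commands from the post: drop the ARP entry, then ping once."""
    return [
        ["arp", "-d", ip],        # delete the cached ARP entry for this host
        ["ping", "-c", "1", ip],  # a single echo request forces a fresh ARP lookup
    ]


def reset_arp_entry(ip, run=subprocess.run):
    """Run both commands in order and return their exit codes."""
    return [run(cmd).returncode for cmd in arp_reset_commands(ip)]
```

Calling `reset_arp_entry("10.10.1.22")` on the gateway would repeat the workaround for that one host. It treats the symptom only and does not explain why the entry went bad.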

So in short, how can I avoid ARP corruption? Why are my servers getting disconnected?

Here is my ARP situation when a server can't reach the network:
hostname.domain.nl         ether   5e:19:32:fe:bd:fb   C                     eth1
hostname.domain.nl         ether   5e:19:32:fe:bd:fb   C                     eth0

After delete + ping I get the same, but this time with ns. in front.
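
The listing above shows the same host and MAC address cached on both eth0 and eth1. A small check for that condition might look like the sketch below. It assumes the column layout printed by the net-tools `arp` command (Address, HWtype, HWaddress, Flags, Iface), and the function name is hypothetical.

```python
from collections import defaultdict


def hosts_on_multiple_interfaces(arp_output):
    """Return hosts that appear in the ARP table on more than one interface."""
    seen = defaultdict(set)
    for line in arp_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines and incomplete entries.
        if len(fields) < 5 or fields[1] != "ether":
            continue
        seen[fields[0]].add(fields[-1])  # last column is the interface name
    return {host: sorted(ifaces) for host, ifaces in seen.items() if len(ifaces) > 1}
```

Fed the two lines quoted above, it would report hostname.domain.nl on both eth0 and eth1.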

I added the hostname.domain.nl in my dns (created a zone and entered IP) so I can resolve domain names within the LAN network.

Can anyone help?

nicolasdiogo

  • Forum Moderator
  • Zen Samurai
  • *****
  • Posts: 263
  • Karma: +3/-0
  • a pessimist, but trying out optimism
Re: ARP corruption and network errors
« Reply #1 on: September 30, 2011, 09:38:27 am »
hi,

i am not sure if this is related but..

if you use the domain such as
Quote
hostname.domain.nl

you are then saying that your machines are on a par with domains such as:
blogs.bbc.co.uk

which is not strictly correct.

so your gateway might be asking external DNS'es when you think that they should be using your internal DNS.
unless that is what you want.

try using something like:
intranet.domain.nl

as your domain, so that your machines will be:
hostname.intranet.domain.nl

see if that helps
my opinions and suggestion expressed on this forum are my own as a user.
please note that i am not part of the Zentyal Development Team

www.brainpowered.net - supporting open-source Business Intelligence in Europe

jsalamero

  • Zentyal Staff
  • Zen Hero
  • *****
  • Posts: 1419
  • Karma: +45/-1
Re: ARP corruption and network errors
« Reply #2 on: October 03, 2011, 11:41:12 am »
Check for failing network hardware or network loops :)

check-ict

  • Zen Apprentice
  • *
  • Posts: 30
  • Karma: +0/-0
Re: ARP corruption and network errors
« Reply #3 on: October 03, 2011, 04:00:15 pm »
Could it be related to my physical network adapter?

I have all adapters on the same hardware NIC. So the Zentyal external (WAN) network card and the internal (LAN) network card are connected to the same NIC on the hardware. It should be OK since it's a virtual switch, right?




check-ict

  • Zen Apprentice
  • *
  • Posts: 30
  • Karma: +0/-0
Re: ARP corruption and network errors
« Reply #4 on: October 03, 2011, 04:03:13 pm »
My LAN DNS knows about host.domain.com, so it doesn't ask external DNS servers for this. It will redirect me to the LAN IP of the webserver.

So internally host.domain.com resolves to 10.10.1.22.
Externally it resolves to the WAN IP address, via the external name servers.

modti

  • Zen Apprentice
  • *
  • Posts: 6
  • Karma: +0/-0
Re: ARP corruption and network errors
« Reply #5 on: January 09, 2013, 02:18:50 pm »
Hi,

This post is getting old but I am facing the same problem: sometimes Zentyal cannot be reached. I do not have a solution, but I have made a thorough analysis that may lead to one. The problem appears at layer 2 of the network.

I am using Zentyal as a gateway in a very simple configuration:
LAN <--> eth0 (192.168.2.1) ZENTYAL eth1 (192.168.1.1) <--> WAN

eth0 MAC: 08:00:27:e4:fe:72
eth1 MAC: 08:00:27:e8:2a:b0

Zentyal is a virtual server, and both interfaces are on the same physical network (the same layer 2 segment, actually). It is Zentyal 3.0.10, freshly updated.

I analyzed with Wireshark the exact communication when a host is discovering Zentyal for the first time:
1 - Who has 192.168.2.1? Tell 192.168.2.2 (Network host broadcasting)
2 - 192.168.2.1 is at 08:00:27:e4:fe:72
3 - 192.168.2.1 is at 08:00:27:e8:2a:b0

The host will randomly add one (or the first one) of the MAC addresses to the ARP table.

So the problem is the fact that Zentyal replies to the ARP broadcast on both interfaces (2 replies). This behavior causes connection issues as a LAN client may access Zentyal (or the WAN) with the MAC address of eth1 and the network IP address of eth0.
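
The condition described above (one IP address answered by more than one MAC address) can be checked mechanically once the ARP replies have been extracted from a capture. The sketch below assumes the replies are already available as (IP, MAC) pairs; how they are exported from Wireshark is left out, and the function name is hypothetical.

```python
from collections import defaultdict


def find_arp_flux(replies):
    """Given (ip, mac) pairs from ARP replies, return IPs claimed by several MACs."""
    macs_by_ip = defaultdict(set)
    for ip, mac in replies:
        macs_by_ip[ip].add(mac.lower())  # normalise case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

With the two replies listed in the capture, `find_arp_flux` would flag 192.168.2.1 as being claimed by both the eth0 and the eth1 MAC address.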

Interface eth1 should not reply to an ARP broadcast requesting the MAC address of the internal interface. I do not think Zentyal behaves correctly here (am I wrong?). I can confirm that the problem persists on VirtualBox and Xen (XenServer), when testing fresh and updated Zentyal installs.

Besides I tested other distros in the same environment, such as ZeroShell and BrazilFW. It works perfectly, just one reply is made to the ARP broadcast.

Is there any solution/workaround to that problem or should it be considered as a bug? Thank you for any help!

Modti

christian

  • Guest
Re: ARP corruption and network errors
« Reply #6 on: January 09, 2013, 02:40:29 pm »
Search this forum, as I remember that a similar question was posted some months ago (a mix-up with MAC addresses and IPs).
Still, I don't understand the added value of a set-up that has 2 different networks on the same physical interface when one is supposed to be internal and the other external.
This is not what I would call a "simple" Zentyal installation, as it means that you have both the internal and external sides on the same layer.

Your configuration looks simple but it's not, plus you have a potential weakness in case a workstation or server on this network is directly accessed from the internet.
Or... there is something I don't understand  :)

modti

  • Zen Apprentice
  • *
  • Posts: 6
  • Karma: +0/-0
Re: ARP corruption and network errors
« Reply #7 on: January 09, 2013, 03:18:28 pm »
Hi Christian, thank you for the quick reply.

I just found the thread: http://forum.zentyal.org/index.php/topic,8476.msg39779.html (is that the right one? Actually I found 2 other threads with the same problem).

They found a workaround on the client side (manually and permanently add Zentyal server entry to the ARP table), but I am looking for a solution on the server side (in the production environment, I cannot go over 50 workstations to fix a problem which actually stands on the server).

About "simple configuration", I would rather say typical configuration. And you are right about the security aspect of having both interfaces on the same network: it is just unsafe. But we don't rely on Zentyal for security. The point of that configuration is to get load balancing without modifying any single piece of hardware. And if the Zentyal VM goes down, we still have the physical network available. About VLANs (another workaround), the switch unfortunately does not support 802.1Q.

Is there any slight modification in Zentyal itself to avoid sending two replies to an ARP broadcast? That would fix the problem! Thank you.

Modti

christian

  • Guest
Re: ARP corruption and network errors
« Reply #8 on: January 09, 2013, 03:29:43 pm »
"typical configuration"   :o :o
hopefully not  ;D

- if you don't use Zentyal as an internet gateway device, why do you set it up with internal and external interfaces? (perhaps I just don't understand the "load balancing" aspect  :-[)
- I thought you had at least implemented VLANs. If there are none, then such a config looks even stranger and not "typical". (Am I getting so old that I would look at real servers first rather than virtual ones when virtualization brings (almost) nothing?)  ???

modti

  • Zen Apprentice
  • *
  • Posts: 6
  • Karma: +0/-0
Re: ARP corruption and network errors
« Reply #9 on: January 09, 2013, 03:45:01 pm »
Sorry about the confusion, we do use Zentyal as a gateway, the point is to have the "Balance traffic" feature which does "WAN load balancing" over several gateways.

Yes, virtualization is getting quite subversive...  :D

christian

  • Guest
Re: ARP corruption and network errors
« Reply #10 on: January 09, 2013, 04:53:20 pm »
Quote
we do use Zentyal as a gateway, the point is to have the "Balance traffic" feature which does "WAN load balancing" over several gateways.

Then you had better use it in "real" traversing mode with two NICs, one external and one internal. This will de facto solve your issue  :P

half_life

  • Bug Hunter
  • Zen Hero
  • *****
  • Posts: 867
  • Karma: +59/-0
Re: ARP corruption and network errors
« Reply #11 on: January 10, 2013, 01:23:11 am »
I find this a weird use of a virtual machine that is not a gateway.  I assume that you have Xen bridging setup?  I would be most curious to see your bridge script.  I have setup Xen here.  While I migrated away from it, the reason was not networking issues.  Load balancing might better be performed at the Dom0 level than performed at the DomU level.  Just my two cents.

modti

  • Zen Apprentice
  • *
  • Posts: 6
  • Karma: +0/-0
Re: ARP corruption and network errors
« Reply #12 on: January 10, 2013, 03:10:29 pm »
@Christian

For sure the solution of having another NIC works, but once again I would like to stress one of the constraints which is avoiding any hardware change.

To sum up a little, to me there are 3 solutions:
1 - the first one is to have a switch (and hypervisor) that supports VLANs, this is an expensive solution for small business in Brazil.
2 - the second one is to get a new NIC to the physical server and obviously to get a second switch to connect that NIC to the WAN gateways. This is still an expensive solution as the NIC has to be supported by the hypervisor and such a modification on the production server must be done by official staff (e.g. from Dell, HP, etc.).
3 - the third solution is to directly modify the behavior of Zentyal (which is not right to me), in order to solve the real root problem.

About IT consulting for (very) small businesses in Brazil: the money that is not spent on useless and expensive hardware could actually be used to get Zentyal support (I am talking about 2000 potential customers). In short, the designed solution (using one single switch) is based on a real use case and has a huge commercial impact.

From that viewpoint, what is your opinion ?

@half_life

The VM I am talking about is definitely a gateway for the LAN, which is balancing the Internet traffic over other WAN gateways. In Zentyal I am using Network -> Gateways -> Balance traffic -> Traffic Balancing "enable" and some "Multigateway rules".
The network interface of the host is bridged to let the VMs access the LAN. This is done by XenCenter actually and it works fine so far. The distro I am using is XCP (bare-metal solution), which is a clone of XenServer, that uses Xen as hypervisor.
To perform "WAN load balancing" I've always used a specific software or distro (Zentyal, BrazilFW, ZeroShell, ClearOS, SME Server, etc.), I do not really expect to install anything on XCP (which is actually CentOS). Overall the virtualization does not affect significantly the performance of the WAN load balancing.

Modti

christian

  • Guest
Re: ARP corruption and network errors
« Reply #13 on: January 10, 2013, 04:06:12 pm »
Quote
From that viewpoint, what is your opinion ?

I can't comment on this case and the support-related aspect as I'm not part of Zentyal staff. As I will never provide official support, I'm definitely not the one to comment on this.
As a forum member, Zentyal user and IT professional, I would never suggest that anyone deploy an internet gateway relying on a VM that uses the same physical NIC for external and internal connections. To me this is totally weird  :-[
but if you find local IT support willing to support it, or even convince the Zentyal team that they should officially support it, I'm fine with this  ;)

Here in Europe, the cost of a single-port 10/100Mb network card is about 15€. From Dell or HP, you will get a dual-port card for about $250.

Add to this 100 to 200€ to get a manageable switch. Even in the official HP catalogue, you can get a switch for less than $100 (look at the HP 1405-5G v2 Switch).
So, if cost does matter (which I do understand), adding hardware is still cheaper than buying support for an unsafe design.

I also do not share your comment about "useless and expensive hardware" because you make it as a general statement meaning (as far as I understand): "it is always better to buy support than hardware".
I do agree on the principle of not wasting money on useless hardware, but this doesn't mean that you should stack everything on one single machine with one single NIC, hoping that strong and very well managed virtualization will solve all the potential issues. This is just not true, at least from my standpoint.
If you want to spend money on service rather than hardware, start by buying a design service and ask the one designing such a solution to support it  ;)

renato.diogo

  • Zen Apprentice
  • *
  • Posts: 1
  • Karma: +0/-0
Re: ARP corruption and network errors
« Reply #14 on: March 05, 2013, 03:13:04 pm »
Hi...

I had the same problem as above.

Searching the internet, I found a document that explains the "problem", which is actually the default behaviour of Linux:

http://linux-ip.net/html/ether-arp.html#ether-arp-flux

Try the arp_filter kernel parameter.
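
As a rough illustration of the arp_filter suggestion, the sketch below only formats the corresponding settings in the style of /etc/sysctl.conf. It assumes the standard Linux keys net.ipv4.conf.<interface>.arp_filter and uses the interface names eth0 and eth1 from earlier in this thread. The function name is hypothetical, and nobody in this thread has confirmed that the setting fixes the issue on Zentyal.

```python
def arp_filter_sysctl_lines(interfaces, value=1):
    """Build sysctl.conf-style lines that set arp_filter for 'all' and each interface."""
    keys = ["all"] + list(interfaces)
    return [f"net.ipv4.conf.{name}.arp_filter = {value}" for name in keys]


if __name__ == "__main__":
    # Print the lines so they can be reviewed before being added to /etc/sysctl.conf.
    for line in arp_filter_sysctl_lines(["eth0", "eth1"]):
        print(line)
```

The lines are only printed, not applied, so they can be tested on a non-production gateway first.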

[]s

Renato