An embedded Linux (OpenWRT) troubleshooting example. It has nothing to do with big data; I am just recording a fault I helped a friend track down.
A former colleague I get along with very well now works in enterprise Wi-Fi and has purchased our big data services, so I have been building and debugging a platform for him. He is the CEO there, and over the past few days he ran into problems while debugging a router. While working on the big data side, I also gave him a hand with that.
OpenWRT is an embedded Linux distribution, mainly used on MIPS or ARM devices. Many routers and Wi-Fi devices run it, and so do some optical modems.
Coova-Chilli is an access controller that runs on OpenWRT and provides an authentication gateway. It can use RADIUS or HTTP for access control, billing, and related tasks.
Normally, chilli brings up four virtual tun tunnel interfaces after it starts. The fault was sporadic: from time to time, two tun devices would come up with the same IP address, like this:
tun0 10.1.0.1
tun1 10.1.0.1
tun2 10.2.0.1
tun3 10.3.0.1
tun4 10.4.0.1
Under normal circumstances the device should only have tun0 through tun3, but startup kept producing one or two extra tun devices, and not in a fixed pattern: sometimes tun0 and tun1 shared an IP address, sometimes tun2 and tun3. By default OpenWRT does not write syslog to a file, which makes troubleshooting harder. The log can still be read with logread, but in this case it recorded nothing useful.
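A quick way to confirm this symptom is to look for an IPv4 address that appears on more than one tun device. The following is only a sketch and was not part of the original troubleshooting: the function name find_duplicate_tun_ips is made up for illustration, and it assumes input lines of the form "device address", as in the listing above.

```shell
#!/bin/sh
# Report IPv4 addresses that are assigned to more than one tun device.
# Input (stdin): one "<device> <address>" pair per line, e.g. "tun0 10.1.0.1".
# Output: "<address> <dev1>,<dev2>,..." for every address seen more than once.
find_duplicate_tun_ips() {
    awk '$1 ~ /^tun[0-9]+$/ {
             count[$2]++
             devs[$2] = (devs[$2] == "" ? $1 : devs[$2] "," $1)
         }
         END {
             for (ip in count)
                 if (count[ip] > 1)
                     print ip " " devs[ip]
         }' | sort
}

# Possible use on the router, assuming the "ip" tool is installed:
#   ip -o -4 addr show | awk '{ split($4, a, "/"); print $2, a[1] }' \
#       | find_duplicate_tun_ips
```

The function prints nothing when every tun device has its own address, so it can be run after each boot while hunting an intermittent fault.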
My friend used to write code himself. He worked on this for three nights without finding the cause, adding all kinds of logging, wait, and sleep calls to the chilli startup script, to no avail. One afternoon, after we had settled the current requirements for the big data platform, I had some idle time and looked at the script for him. The chilli script itself should not have had many problems, since he had set everything up following the official deployment documentation.
At first I could not see anything wrong. The default chilli script lives in the /etc/init.d directory, and ordinarily that works fine. Then came the fun part: he told me he had also added a command to rc.local to start things. I looked inside rc.local and saw that it called a startup script he had written under /root. I opened that script in vi, and it contained the line /etc/init.d/chilli restart. I asked him why it was there, and he said the official OpenWRT documentation said to add it, to be on the safe side. I commented out the restart line and restarted 10 times; the tun tunnels came up correctly every time. Solved in 20 minutes.
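The search that found the stray restart can be sketched as a small helper that lists every line invoking the chilli init script. This is an illustrative sketch only: the function name find_chilli_invocations is hypothetical, and which files to scan (rc.local, scripts under /root) is an assumption based on the story above.

```shell
#!/bin/sh
# List every line in the given files that invokes the chilli init script.
# Output format is grep's "file:line:text". Missing files are skipped.
find_chilli_invocations() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # "|| true" keeps a file without matches from counting as an error.
            grep -Hn 'init\.d/chilli' "$f" || true
        fi
    done
}

# Possible use on the router (paths are assumptions):
#   find_chilli_invocations /etc/rc.local /root/*
```

Any hit outside /etc/init.d itself is a candidate for a second, competing start of the daemon.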
Problem analysis
The original chilli script is as follows:
#!/bin/sh
### BEGIN INIT INFO
# Provides:          chilli
# Required-Start:    $remote_fs $syslog $network
# Required-Stop:     $remote_fs $syslog $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start CoovaChilli daemon at boot time
# Description:       Enable CoovaChilli service provided by daemon.
### END INIT INFO
PATH=/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/chilli
NAME=chilli
DESC=chilli
START_CHILLI=0
if [ -f /etc/default/chilli ]; then
    . /etc/default/chilli
fi
if [ "$START_CHILLI" != "1" ]; then
    echo "Chilli default off. Look at /etc/default/chilli"
    exit 0
fi
test -f $DAEMON || exit 0
. /etc/chilli/functions
MULTI=$(ls /etc/chilli/*/chilli.conf 2>/dev/null)
[ -z "$DHCPIF" ] && [ -n "$MULTI" ] && {
    for c in $MULTI;
    do
        echo "Found configuration $c"
        DHCPIF=$(basename $(echo $c | sed 's#/chilli.conf##'))
        export DHCPIF
        echo "Running DHCPIF=$DHCPIF $0 $*"
        sh $0 $*
    done
    exit
}
if [ -n "$DHCPIF" ]; then
    CONFIG=/etc/chilli/$DHCPIF/chilli.conf
else
    CONFIG=/etc/chilli.conf
fi
[ -f $CONFIG ] || {
    echo "$CONFIG Not found"
    exit 0
}
check_required
RETVAL=0
prog="chilli"
case "$1" in
    start)
        echo -n "Starting $DESC: "
        /sbin/modprobe tun >/dev/null 2>&1
        echo 1 > /proc/sys/net/ipv4/ip_forward
        writeconfig
        radiusconfig
        test ${HS_ADMINTERVAL:-0} -gt 0 && {
            (crontab -l 2>&- | grep -v $0
                echo "*/$HS_ADMINTERVAL * * * * $0 radconfig"
            ) | crontab - 2>&-
        }
        ifconfig $HS_LANIF 0.0.0.0
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.$HS_LANIF.pid \
            --exec $DAEMON -- -c $CONFIG
        RETVAL=$?
        echo "$NAME."
        ;;
    checkrunning)
        check=`start-stop-daemon --start --exec $DAEMON --test`
        if [ x"$check" != x"$DAEMON already running." ]; then
            $0 start
        fi
        ;;
    radconfig)
        [ -e $MAIN_CONF ] || writeconfig
        radiusconfig
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        RETVAL=$?
        ;;
    stop)
        echo -n "Stopping $DESC: "
        crontab -l 2>&- | grep -v $0 | crontab -
        start-stop-daemon --oknodo --stop --quiet --pidfile /var/run/$NAME.$HS_LANIF.pid \
            --exec $DAEMON
        echo "$NAME."
        ;;
    reload)
        echo "Reloading $DESC."
        start-stop-daemon --stop --signal 1 --quiet --pidfile \
            /var/run/$NAME.$HS_LANIF.pid --exec $DAEMON
        ;;
    condrestart)
        check=`start-stop-daemon --start --exec $DAEMON --test`
        if [ x"$check" != x"$DAEMON already running." ]; then
            $0 restart
            RETVAL=$?
        fi
        ;;
    status)
        status chilli
        RETVAL=$?
        ;;
    *)
        N=/etc/init.d/$NAME
        echo "Usage: $N {start|stop|restart|condrestart|status|reload|radconfig}" >&2
        exit 1
        ;;
esac
exit 0
The problem was this. While debugging, he had added a wait inside the for c in $MULTI loop to make sure each child process started successfully, and later added a few sleep calls to give the tun tunnels time to come up. Following the official documentation, he had also put a restart into rc.local. At boot, the script in /etc/init.d automatically runs chilli start, now slowed down by the added wait and sleep. While that init.d script is still waiting, rc.local, running in a different Linux tty, launches the chilli restart command. The two runs overlap, and two or three tun devices with identical IP addresses end up coexisting.
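The fix in this case was simply to remove the extra restart. As a general guard against two overlapping start or restart runs, a lock can serialize them. The sketch below is an assumption-laden illustration, not something from the original script: the name run_exclusive and the lock path /tmp/chilli-init.lock are hypothetical, and it relies only on mkdir being atomic.

```shell
#!/bin/sh
# Hypothetical lock location. mkdir either creates the directory or fails,
# so only one caller at a time can hold the lock.
LOCKDIR="${LOCKDIR:-/tmp/chilli-init.lock}"

# Run the given command only if no other invocation holds the lock.
# Returns 1 without running the command when the lock is already held;
# otherwise returns the exit status of the command.
run_exclusive() {
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "another chilli start/restart is in progress, skipping" >&2
        return 1
    fi
    rc=0
    "$@" || rc=$?
    rmdir "$LOCKDIR"
    return $rc
}

# Possible use (an assumption, not the original rc.local):
#   run_exclusive /etc/init.d/chilli restart
```

One caveat of this simple form: if the process is killed while holding the lock, the directory stays behind and has to be removed by hand.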
Anyway, the problem is solved. Given that he had stayed up three nights over this, I got to lecture this debugging CEO in the tone of a sage: "Better to have no books at all than to believe everything in them." Official documentation for open source systems tends to lag behind. A newer release may long since have fixed whatever once required the restart, but the documentation was never updated, and that leads to problems like this one.
In summary: it really matters to understand how the systems you work with actually operate.