Broken NFS server (or client?)

ME dugan at passwall.com
Sun Aug 12 18:33:45 PDT 2001


In /usr/src/linux-2.4.8/Documentation/kernel-docs.txt (search for
"NFSD")
Check out:
http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/nfsd.html

The doc was last edited in 1999, but it claims this:
"There are two version of the NFS protocol that are commonly in use today,
version 2 and version 3. The NFS server implementation in Linux currently
only supports version 2."

That is probably just out of date, since it references 2.2.7 patches, and
the kernel Configure.help entries are often more up-to-date than
Documentation/*

It does, however, mention exportfs's ability to change the actual exports
in place when told to, instead of requiring a reboot. (Likely meaning the
usual routine of having nfsd re-read /etc/exports via exportfs should work
with "knfsd" too, without any reboots. :-)

Also, http://sourceforge.net/projects/nfs/
You can try downloading the latest tree and compiling it for your server.

I skipped the dhiggen stuff (kernel patches for 2.2.x) and went straight
for nfs-utils version 0.3.1.

When I ran configure, I added this option:
--enable-nfsv3
I will play with the other options later.
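Spelled out, the build went roughly like this (a sketch; the exact tarball
name is an assumption based on the 0.3.1 version mentioned further down):

```shell
# Sketch of the nfs-utils build -- the tarball name is assumed:
tar xzf nfs-utils-0.3.1.tar.gz
cd nfs-utils-0.3.1
./configure --enable-nfsv3   # compile in NFS version 3 support
make
make install                 # the rpc.* binaries landed in /usr/sbin for me
```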

What else I did...

OK: after making a new 2.4.8 kernel with knfsd support (v3 included),
uninstalling the userland nfsd daemon, upgrading my nfs-utils, making sure
the /etc/init.d/nfs* script for the old (userland) nfsd was gone and
replaced with the new one from nfs-utils (with appropriate mods for my
Debian system), making sure the new (/usr/sbin) rpc services for lockd and
statd were started in the init script instead of the "normal"
/sbin/rpc.statd and /sbin/rpc.lockd (I just added "/usr" to the PREFIX in
the script :-), and opening up my /etc/hosts.deny (just a
little)... (inhale)

I was able to mount nfs exports from the new kernel based nfsd to another
station.

I can tell that nfsd is not a "normal process": when I look at all of my
processes, the [nfsd] entries show zero memory in use and have other "odd"
properties you would not expect of a userland daemon/process =-)

So this appears to work from client to server in my test. (I have not
finished the netboot root setup, or created the client kernels and the
root export yet, but the first part is done.) (I was also able to
duplicate the "permission denied" mount error with a very strict
/etc/hosts.deny, and by turning off some of the rpc services an nfs client
needs.)

Is there anything in what I did above that you notice you may have
overlooked or skipped?

More comments below...

On Sat, 11 Aug 2001, Lincoln Peters wrote:
> On the diskless client, all networking features, including support for NFS, 
> are compiled straight into the kernel (not as modules).  On the non-diskless 
> client, NFS support is compiled as modules.

Then they should show up in a list when you do
# cat /proc/filesystems

I feel uneasy asking you this question, because you seem to have done so
many other things right, but I'll ask it anyway: are you sure that you are
using the new kernel you compiled with knfsd support *and* client-based
nfs support too? (modified lilo.conf and rebooted, with the new kernel
actually showing up, etc.) (Please don't take offense; I just want to
eliminate it from the realm of possible problems.)

Your disk-based client station should at least have nfs listed in
/proc/filesystems, as that is the list of filesystems the kernel can
mount. I am not 100% certain, but I think your server could lack client
nfs support and still export nfs; I just don't see any reason to set it up
that way. If nfs only appears after you have an nfs service point mounted,
then there is a chance your nfs client is built as a module that gets
inserted and removed as you use it and no longer need it, but that seems
unlikely unless you have some sort of crontab job removing it.

> I should add something here: I have a directory of MP3 files shared on the 
> non-diskless client using NFS, and I successfully loopback-mounted it to 
> another path.  Then I saw nfs in /proc/filesystems.

Yes, and after you unmounted it, did you still see nfs in
/proc/filesystems? If so, I bet running
# lsmod
would show nfs as a loaded module on that disk-based client workstation.
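If you want to script that check, here is a sketch. The lsmod listing
below is a made-up sample; on the real client you would pipe the live
lsmod output through the same filter instead:

```shell
# has_module reads lsmod-style output on stdin and checks the first
# column (skipping the header line) for the named module.
has_module() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Fabricated sample of what lsmod might print on the disk-based client:
sample='Module                  Size  Used by
nfs                    73728   2
lockd                  54440   1 nfs
sunrpc                 75712   3 nfs,lockd'

if printf '%s\n' "$sample" | has_module nfs; then
    echo "nfs is loaded as a module"
fi
# On the real client:  lsmod | has_module nfs && echo "nfs is a module"
```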

> Yes, I enabled 'IP: Kernel-level autoconfiguration', 'DHCP Support', 'NFS 
> client support', and 'Root filesystem on NFS'.  I have tried both with and 
> without NFSv3 on both the client and server, but neither worked.

Hmm.

> What kernel version are they using now at the Schultz Information Center?

They mentioned v2.0, v2.2, and now a test server with v2.4.

> >I dont run RH 7.1, but I'll see what I can find in duplicating the
> >problems you have been experiencing.
> 
> Since I'm not using a Red Hat 7.1 kernel for the diskless client, that 
> probably won't be necessary.  I'm only using the Red Hat box to 
> troubleshoot the diskless box.

> 192.168.0.2
> I've tried setting permissions for my non-diskless client based on IP 
> address and based on hostname, but in both cases, I get the same error.

> > > >Now can try one of these:
> > > ># exportfs -ar
> 
> It's a zero.  It worked, but it still doesn't work.  The error message on 
> the non-diskless client was a bit longer, though:
> call_verify: server accept status: 1
> call_verify: server accept status: 1
> call_verify: server accept status: 1
> RPC: garbage, exit EIO
> nfs_get_root: getattr error = 5
> call_verify: server accept status: 1
> call_verify: server accept status: 1
> call_verify: server accept status: 1
> RPC: garbage, exit EIO
> nfs_get_root: getattr error = 5
> nfs_read_super: get root inode failed
> nfs warning: mount version older than kernel
> call_verify: server accept status: 1
> call_verify: server accept status: 1
> call_verify: server accept status: 1
> RPC: garbage, exit EIO
> nfs_get_root: getattr error = 5
> nfs_read_super: get root inode failed
> mount: wrong fs type, bad option, bad superblock on MELAMPUS:/,
>        or too many mounted file systems

Well, I am no expert on rpc, but I would bet the above is just a repeated
list of complaints telling you that the client does not understand the
server's responses.

Something else to try:
Go to sourceforge and get the latest nfs-utils, ./configure them with
--enable-nfsv3, and then make install over your old stuff. Next, look
through your startup scripts. See if they reference lockd and statd in the
old locations. Do a grep for "rpc" in your startup scripts. See if you can
point them at the new rpc services stored in /usr/sbin (or wherever you
installed them from the source tree).
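A sketch of that grep-and-repoint step, demonstrated on a throwaway file
so nothing real is touched (the paths match the /sbin -> /usr/sbin move
described above):

```shell
# Demonstrate finding and repointing old rpc.* paths in an init script.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
DAEMON=/sbin/rpc.statd
/sbin/rpc.lockd
EOF

# What the grep through your startup scripts would show:
grep -n 'rpc\.' "$tmp"

# Repoint /sbin/rpc.* at the freshly installed /usr/sbin/rpc.*:
sed 's|/sbin/rpc\.|/usr/sbin/rpc.|g' "$tmp"
rm -f "$tmp"
```

On a real system you would run the grep over /etc/init.d/* and edit the
scripts it turns up.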

After you do that, run ps -auxw | grep "nfsd" and you should see a bunch
of [nfsd] entries that use zero memory according to ps.

Also do a ps -auxw | grep "rpc" and see what you have running.
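For what it's worth, here is a sketch of what those ps checks are looking
for. The sample below is fabricated ps output; the zero VSZ/RSS on [nfsd]
is the tell:

```shell
# kernel_threads filters ps-aux-style output for the [nfsd] kernel
# threads and prints their memory columns (VSZ is field 5, RSS field 6).
kernel_threads() {
    awk '$11 == "[nfsd]" { print $11, "vsz=" $5, "rss=" $6 }'
}

# Fabricated sample of ps output on a knfsd server:
sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1203 0.0 0.0 0 0 ? SW Aug12 0:00 [nfsd]
root 900 0.0 0.1 1104 480 ? S Aug12 0:00 /usr/sbin/rpc.statd'

printf '%s\n' "$sample" | kernel_threads
# On a live server:  ps -auxw | kernel_threads
```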

> >Even with a kernel based NFS server, I would expect to see the network
> >ports open for service with a netstat. Hmm.

And that is what I see here with knfsd setup.

> > > On the server, nfs-utils-0.3.1-5 (from Red Hat 7.1).  The diskless test
> > > client does not have nfs-utils, but the other test client had the same
> > > version of nfs-utils as the server.

Though it is 0.3.1, they may not have chosen to compile in the support for
v3. You could check, or just download and install a new one for testing.
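One way to check without rebuilding anything: rpcinfo -p against the
server lists which NFS versions are registered with the portmapper. A
sketch, with made-up sample output (run the real command against your
server instead):

```shell
# nfs_versions pulls the version column for the nfs program out of
# rpcinfo -p style output on stdin.
nfs_versions() {
    awk '$5 == "nfs" { print $2 }' | sort -u
}

# Fabricated sample of rpcinfo -p output from a v2+v3 server:
sample='   program vers proto   port  service
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100021    1   udp   1026  nlockmgr'

printf '%s\n' "$sample" | nfs_versions
# Real usage:  rpcinfo -p MELAMPUS | nfs_versions
```

If only "2" comes back, the server end has no v3 support registered.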

> nfs appeared on the non-diskless client.  I can't check the diskless client 
> because it won't start up.  Although I would assume that it's there, since 
> the kernel was explicitly compiled with NFS support.
> 
> >2) the server not showing the service ports for NFS being open bring up a
> >question: is it really available for service?
> 
> I can try recompiling the kernel on the server WITHOUT the NFS server and 
> running a user-space NFS server.  Maybe that would work, or at least provide 
> some clues.

More points to your server: the netstat results, the lack of nfsd
processes, and so on all seem to be pushing toward the same preliminary
conclusion you offered: something is wrong with the server.


Summary:
Get the new nfs-utils, configure with the v3 support, make, and make
install.
Check your startup scripts and make sure the rpc.* services point to the
newly installed rpc.* wherever they landed (for me it was /usr/sbin).
Restart your machine, and after it boots up, do a ps -auxw and see if you
can find the rpc services and [nfsd] (with nfsd using zero memory).

And then we can try to move on from there with the new info. :-/

-ME

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS/CM$/IT$/LS$/S/O$ !d--(++) !s !a+++(-----) C++$(++++) U++++$(+$) P+$>+++ 
L+++$(++) E W+++$(+) N+ o K w+$>++>+++ O-@ M+$ V-$>- !PS !PE Y+ !PGP
t at -(++) 5+@ X@ R- tv- b++ DI+++ D+ G--@ e+>++>++++ h(++)>+ r*>? z?
------END GEEK CODE BLOCK------
decode: http://www.ebb.org/ungeek/ about: http://www.geekcode.com/geek.html
     Systems Department Operating Systems Analyst for the SSU Library


