3 Basic considerations for setting up News at your site
3.2 Deciding which newsgroups to carry
If you are planning to set up a local news database and exchange news with
other sites, you will need to decide what portions of the newsgroup hierarchies
available to you will be carried at your site. If you choose to run News as an
NNTP client only, this is not really an issue, since the NNTP server will
determine which groups it makes available to clients connecting from your site.
In making this decision, you should bear several factors in mind. On the one
hand, in order to best serve your users, you may want to carry a broad spectrum
of newsgroups, covering topics of personal as well as professional interest. On
the other hand, as the size of your feed increases, so does the amount of space
necessary to hold items, the CPU and I/O load associated with maintenance batch
jobs, the cost of transmitting and receiving items, etc. At the time this was
written (July 1994), a full feed of all hierarchies available to a typical
site in the US includes over 5000 newsgroups, requires up to 7 Gbytes of disk
space, and uses over an hour per day of (off hours) CPU time for maintenance.
For example, here are excerpts from two messages one site manager posted to the net:
From: sloane@kuhub.cc.ukans.edu (Bob Sloane)
Newsgroups: news.software.anu-news
Subject: Re: newsskim efficiency
Date: 20 Sep 93 16:12:35 GMT
Organization: University of Kansas Academic Computing Services
Others have commented on various performance enhancements that you
might try. I am running NEWS on a VAX 9210 which is pretty overloaded
with scientific computing. Over the course of a month, there is NEVER
any idle cpu time. NEWS processing accounts for about 8-10 percent of
CPU utilization. I have NEWS_MANAGER, NEWS_ROOT and NEWS_DEVICE all
on separate drives. I have six different incoming feeds of more than
2000 newsgroups, and I provide about ten outgoing full feeds, as well
as twelve partial feeds of varying sizes and NNTP service for our
campus. A normal SKIM command (done every night) takes about 4-6
hours, and a full SKIM (done once a week) take 8-12 hours depending on
other load. I have no problems keeping up with the news flow, so it is
possible.
From: sloane@kuhub.cc.ukans.edu (Bob Sloane)
Subject: Re: disk init params
Date: 25 Jul 94 14:09:20 CDT
Organization: University of Kansas Academic Computing Services
In article <1994Jul20.191110.5023@govonca.gov.on.ca>,
newsmgr@[192.197.192.2] writes:
> 1. If I wanted all groups for about 30 days, what size disk would do the
> job.
I think about 6-7 Gigabytes of total space would be enough. You would
need about 150 Megabytes per day to store the articles, plus 2 blocks
overhead for each article (INDEXF.SYS entry + NEWS.ITEMS entry), plus
whatever was wasted due to the cluster factor of the disk. With a
cluster factor of 3, we are currently getting about 50,000 articles
per day, so about 300,000 for the articles, plus 100,000 for overhead,
plus 75,000 for cluster factor totals 475,000 blocks per day that you
want to keep should be enough, or about 4-5 days per GB. Of course,
news volume is increasing daily, so by the time you get this the
requirements may be higher. :-)
Your decision, then, must be based in part on local resources and policy
regarding access to and priority for usage of those resources.
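The arithmetic in the second quoted message can be reproduced with a short calculation. The following is only a sketch under stated assumptions: it takes a disk block to be 512 bytes, follows the message in treating 1 Mbyte as roughly 2,000 blocks, and reads the 75,000-block cluster figure as an average of half a cluster wasted per article. The function name and parameters are made up for illustration, and the input figures are the July 1994 values from the message, so substitute numbers measured at your own site.

```python
# Rough disk-space estimate following the arithmetic in the quoted message.
# Inputs are the July 1994 figures from that message, not current values.

BLOCK_BYTES = 512  # assumed size of one disk block


def blocks_per_day(mbytes_per_day, articles_per_day, cluster_factor,
                   overhead_blocks_per_article=2):
    """Return the estimated disk blocks used by one day of retained news."""
    # Article text: the message converts 150 Mbytes to 300,000 blocks,
    # i.e. it rounds 1 Mbyte to 2,000 blocks.
    article_blocks = mbytes_per_day * 2000
    # Fixed per-article overhead (INDEXF.SYS entry + NEWS.ITEMS entry).
    overhead = articles_per_day * overhead_blocks_per_article
    # Assumption: on average half a cluster is wasted per article file.
    cluster_waste = articles_per_day * cluster_factor / 2
    return article_blocks + overhead + cluster_waste


daily = blocks_per_day(150, 50_000, 3)
days_per_gbyte = 1_000_000_000 / (daily * BLOCK_BYTES)
gbytes_for_30_days = 30 * daily * BLOCK_BYTES / 1_000_000_000

print(f"{daily:,.0f} blocks per day")
print(f"{days_per_gbyte:.1f} days of news per Gbyte")
print(f"{gbytes_for_30_days:.1f} Gbytes for 30 days")
```

With these inputs the daily total comes to 475,000 blocks, which matches the figure in the message. The result is sensitive to the cluster factor and to the number of articles, so a feed made up of many small articles costs more than its raw size suggests.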
As you may already know from experience, newsgroups are organized by topic into
large collections called 'hierarchies'; the names of all groups within a
particular hierarchy generally begin with the same prefix. Several hierarchies
are commonly available on the net, including:
- Usenet -- this is actually a collection of seven hierarchies, in which
  sites have agreed to adhere to common policies for creation and control of
  newsgroups. If you're interested in the details, you should look in the
  newsgroup news.announce.newusers for a start. The individual hierarchies are:
  - comp.* (516 newsgroups) topics related to computing, including hardware,
    software, operating systems, networks, theory, etc.
  - misc.* (44 newsgroups) topics which don't fall into another hierarchy
  - news.* (22 newsgroups) topics related to news software and administration
  - rec.* (298 newsgroups) topics related to recreational activities
  - sci.* (88 newsgroups) topics of scientific interest
  - soc.* (96 newsgroups) topics related to sociology and culture
  - talk.* (23 newsgroups) open discussions on various issues
  In addition, Usenet includes the small trial.* hierarchy, a place for new
  newsgroups to live temporarily for trial runs before they are submitted for a
  formal vote on their permanent creation.
- 'alt' -- the 'alternative' hierarchy, so called because it began as an
  alternative to Usenet, is a free-wheeling collection of newsgroups on almost
  any topic. The hallmark of this hierarchy is its lack of organization; anyone
  can create a newsgroup, and many of the groups are short-lived responses to
  fads. Some groups are serious discussions with too small or specialized a
  following to support creation of a Usenet group, while others deal with
  'fringe' topics not acceptable to some sites which carry Usenet. Newsgroups
  in this hierarchy begin with 'alt.'; at this writing, there are 990 of them
  (though that will probably change within the hour).
- professional hierarchies -- several smaller newsgroup collections covering
  topics of interest to a particular profession, or centered around a particular
  organization, exist. Some have well-defined policies for creation and control
  of newsgroups, while others are less organized. This category includes, among
  others,
  - bionet.* (46 newsgroups) topics of interest to biologists and
    biomedical scientists. Administrative matters are handled via the newsgroup
    bionet.announce; see the FAQ posted there for more information.
  - biz.* (40 newsgroups) topics related to business
  - gnu.* (28 newsgroups) topics related to the Free Software Foundation's
    GNU project and the software they distribute
  - ieee.* (6 newsgroups) topics related to the IEEE
  - k12.* (39 newsgroups) topics related to elementary and secondary
    education
- bit.* -- A large number of BitNet LISTSERV mailing lists are gatewayed
  into newsgroups for more efficient distribution and reading than is provided
  by individual email. These are grouped into the bit.* hierarchy, which
  presently contains 227 newsgroups. (Many newsgroups in other hierarchies are
  also gatewayed into mailing lists; FAQ postings to those lists and groups will
  usually mention this.)
- VMSnet -- This hierarchy runs under the aegis of the VMSnet Working Group
  of the VMS MIF of DECUS' US chapter, and contains groups of interest to VMSnet
  readers. Not surprisingly, most groups deal with VMS -- technical topics,
  networking, administration, software, jobs, etc. The hierarchy was established
  because most Usenet sites run Unix, and many were reluctant to carry newsgroups
  dealing with VMS-specific issues. While there is much more crossover between
  VMSnet and Usenet now, VMSnet continues to flourish as a community centered
  on VMS, operating under looser rules than the larger Usenet community. There
  are currently 36 groups in this hierarchy, whose names all begin with
  'vmsnet.'; administrative issues are handled in the group vmsnet.admin.
- Commercial hierarchies -- Some businesses sell information services,
  distributed as newsgroup hierarchies, which sites pay to receive. For
  example, ClariNet is a collection of around 250 groups produced by a news (in
  the traditional sense -- wire services, print journalism, etc.) abstracting
  service. Control of groups in these hierarchies usually rests with the
  producer, and arrangements for feeds are made with them directly.
- Regional hierarchies -- in many areas there are collections of newsgroups
  defined by their distribution, carrying topics of interest to all sites within
  a municipality, state or province, nation, system of universities, etc. The
  administrative policies governing these hierarchies may vary from place to
  place, and details are best obtained from a nearby site.
- Institutional hierarchies -- newsgroups provide a convenient conferencing
  tool and medium for disseminating information within an organization, so you
  may find (or want to create) a hierarchy of groups for internal use.
- to.* -- at many sites, it is customary to create a newsgroup, named
  to.site, for each site with which news is exchanged (including your own).
  Each of these groups is distributed only to the local system and the site
  which appears in the newsgroup name. (You do this when configuring News by
  specifying the newsgroup name in the subscriptions field of the News.Sys
  entry for the appropriate site only.) This provides you with a channel for
  test and administrative messages intended only for a specific site: you post
  the message to the appropriate to.site newsgroup, and the News distribution
  mechanism will forward it only to that site. For instance, if your system is
  named plato, and you exchange news with sites socrates and aristotle, you
  would create newsgroups on your system named 'to.plato', 'to.socrates', and
  'to.aristotle'. The group 'to.plato' would not be forwarded to any other
  sites, but would contain incoming messages intended for your site only. The
  group 'to.socrates' would be distributed to site socrates only, and the
  group 'to.aristotle' to site aristotle only, so you could easily send items
  to one site and not the other.
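The to.* arrangement described above can be sketched as a small table of which sites should hold each group. This is an illustration only: the function name is hypothetical, and it does not show News.Sys syntax, which you should take from the News configuration documentation.

```python
def to_groups(local_site, peer_sites):
    """Map each to.<site> group name to the set of sites that should hold it."""
    # The local group receives incoming messages and is never forwarded.
    groups = {"to." + local_site: {local_site}}
    for peer in peer_sites:
        # Each peer's group lives on the local system and on that peer only.
        groups["to." + peer] = {local_site, peer}
    return groups


plan = to_groups("plato", ["socrates", "aristotle"])
for name, sites in sorted(plan.items()):
    print(name, "->", ", ".join(sorted(sites)))
```

Each peer appears only in its own group's set, which is the property that lets you send an item to one site and not the other.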
You'll probably want to choose a few of the possible hierarchies to start with,
and then expand or set up local hierarchies as you become more familiar with
News configuration and usage patterns at your site.
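Because group names within a hierarchy share a prefix, an initial selection can be written as a handful of wildcard patterns. The sketch below uses Python's shell-style matching to show the idea. It is an assumption-laden illustration, not the way News itself evaluates subscriptions, and the function name and sample group list are invented for the example.

```python
from fnmatch import fnmatchcase


def select_groups(available, carried_patterns):
    """Return the groups whose names match any of the carried patterns."""
    return [group for group in available
            if any(fnmatchcase(group, pattern) for pattern in carried_patterns)]


# Sample group names only; a real list would come from your feed site.
available = [
    "comp.os.vms",
    "news.software.anu-news",
    "rec.humor",
    "alt.test",
    "vmsnet.admin",
    "bionet.announce",
]

# A modest starting selection: three hierarchies out of those offered.
carried = ["comp.*", "news.*", "vmsnet.*"]

selected = select_groups(available, carried)
print(selected)
```

Starting from a short pattern list like this makes it easy to add hierarchies one at a time and watch the effect on disk usage and maintenance time.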