Download List

Project Description

Heritrix is the Internet Archive's extensible, Web-scale,
archival-quality Web crawler.

System Requirements

System requirements are not defined.
Information regarding Project Releases and Project Resources. Note that the information here is a quote from Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2009-09-20 07:05
1.14.3

This is a 'micro' release with bugfixes and small requested improvements. The next major release will be 2.2 in 2009, which is planned to include updates to the Heritrix 2 configuration system and checkpointing functionality, and tools easing transition from 1.14.x to Heritrix 2.2.

2005-12-02 08:57
1.6.0

Tags: Major feature enhancements
This release offers improved remote control and
monitoring via JMX, a crawl-checkpointing
facility, experimental support for bloom filter
already-included testing, partitioning a crawl
across multiple independent crawlers, and
per-host/domain/queue-grouping collection quotas.
Performance and stability in large crawls were
improved. 39 requested enhancements were included
and 96 reported bugs were fixed. You will need to
tweak your old order files again to make them work
with the new release.

2005-04-29 08:37
1.4.0

Tags: Major feature enhancements
This release features much-improved memory usage, a new experimental scoping/filter model, and a new revisiting frontier. Over 90 bugs were fixed.

2004-11-17 04:01
1.2.0

Tags: Minor feature enhancements
This release adds IP-based politeness, configurable URI
canonicalization, and mid-fetch abort. There were also many
bugfixes.

2004-09-23 20:53
1.0.4

Tags: Minor bugfixes
Crawl.log and ARC metadata lines could previously have whitespace in URIs and MIME type fields.

Project Resources