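Scheduled Tasks

Learning Objectives

Here you will learn how scheduled tasks are used in practice in MemorySearch (忆搜阁).

We present the material as simply and directly as possible!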

# 🍜 Enabling Scheduled Tasks

Add the @EnableScheduling annotation to the project's startup class, MainApplication, to turn on scheduled tasks:

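```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.data.redis.RedisAutoConfiguration;
import org.springframework.context.annotation.EnableAspectJAutoProxy;
import org.springframework.scheduling.annotation.EnableScheduling;

/**
 * Main class (application entry point)
 */
@SpringBootApplication(exclude = {RedisAutoConfiguration.class})
// Registers the infrastructure that detects and runs @Scheduled methods
@EnableScheduling
@EnableAspectJAutoProxy(proxyTargetClass = true, exposeProxy = true)
public class MainApplication {
    public static void main(String[] args) {
        SpringApplication.run(MainApplication.class, args);
    }
}
```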
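To build intuition for what @EnableScheduling switches on, here is a rough plain-JDK analogue: a single-threaded ScheduledExecutorService invoking a task at a fixed rate. It is not how Spring implements scheduling internally, and PeriodicJobDemo and runPeriodically are hypothetical names. A single thread is used because, as far as we know, Spring Boot's auto-configured task scheduler also defaults to one thread, so a slow @Scheduled method can delay the others; the 50 ms period merely stands in for a real interval such as six hours.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Rough plain-JDK analogue of a periodic @Scheduled method (illustration only).
public class PeriodicJobDemo {

    // Runs `task` on one scheduler thread every `periodMillis` ms until it has run
    // `times` times, then shuts the scheduler down. Returns the number of runs,
    // or -1 if the runs did not complete within five seconds.
    public static int runPeriodically(Runnable task, long periodMillis, int times) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(times);
        AtomicInteger runs = new AtomicInteger();
        scheduler.scheduleAtFixedRate(() -> {
            // Only the single scheduler thread executes this, so check-then-act cannot race
            if (remaining.getCount() > 0) {
                task.run();
                runs.incrementAndGet();
                remaining.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            boolean finished = remaining.await(5, TimeUnit.SECONDS);
            return finished ? runs.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int runs = runPeriodically(() -> System.out.println("scraping trending articles..."), 50, 3);
        System.out.println("runs = " + runs);
    }
}
```

In the Spring version, the framework owns the scheduler and its lifecycle; the application only declares the schedule on the method.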

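# 🥩 Configuring the Scheduled Task

In the getArticleJob class, write a getArticleJob method that configures the scheduled task. It uses a cron expression so that every six hours the job scrapes the latest articles from an external website and persists them to MySQL.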

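```java
import java.util.ArrayList;
import java.util.List;
import javax.annotation.Resource; // jakarta.annotation.Resource on Spring Boot 3
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

/**
 * @author 邓哈哈
 * 2023/12/22 21:55
 * Function: scheduled task that scrapes trending articles
 */
@Component
public class getArticleJob {
    @Resource
    private ArticleService articleService;

    List<Article> articleList = new ArrayList<>();

    List<String> contentIdList = new ArrayList<>();

    // Spring cron fields: second minute hour day-of-month month day-of-week.
    // Fires at 00:00, 06:00, 12:00 and 18:00 every day, i.e. every six hours.
    @Scheduled(cron = "0 0 */6 * * *")
    public void getArticleJob() {
        // Send the request with the Hutool HTTP client to collect content IDs
        getArticles();
        // Fetch each article page and parse the HTML with Jsoup
        getContents();
    }

    // .............................
}
```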
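The job is meant to run every six hours. In Spring's six-field cron syntax (second, minute, hour, day of month, month, day of week) that is usually written `0 0 */6 * * *`, which fires at 00:00, 06:00, 12:00 and 18:00; by contrast, `*/2 * * * * *` fires every two seconds and is only suitable for local debugging. To make the schedule concrete, the hypothetical SixHourCron.nextFire below computes the next fire time by hand with java.time. It ignores time zones and daylight-saving shifts; in a real project, Spring's own CronExpression.parse(...).next(...) (Spring 5.3+) does this work.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hand-written illustration of the Spring cron "0 0 */6 * * *":
// second 0, minute 0, every hour divisible by 6 (00, 06, 12, 18), any day.
public class SixHourCron {

    // Returns the first fire time strictly after `now`.
    public static LocalDateTime nextFire(LocalDateTime now) {
        // Start at the top of the current hour, then step forward hour by hour.
        LocalDateTime candidate = now.truncatedTo(ChronoUnit.HOURS);
        if (!candidate.isAfter(now)) {
            candidate = candidate.plusHours(1);
        }
        while (candidate.getHour() % 6 != 0) {
            candidate = candidate.plusHours(1); // plusHours rolls over into the next day
        }
        return candidate;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2023, 12, 22, 21, 55);
        System.out.println(nextFire(now)); // 2023-12-23T00:00
    }
}
```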

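# 🥣 Writing the Scheduled Task

As shown above, the job calls getArticles() and getContents(): the first sends a request with the Hutool HTTP client to fetch the trending list, and the second parses each article's HTML document with Jsoup. Together they scrape the trending articles from the external website and persist them to MySQL.
Both steps are explained in detail in the Data Scraping (数据抓取) chapter.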

# Sending the Request with the Hutool Client

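```java
// Imports required at the top of the getArticleJob class file
import cn.hutool.http.HttpRequest;
import cn.hutool.http.HttpResponse;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Called by the scheduled job, so it is a plain method rather than a @Test
private void getArticles() {
    // Start each run with an empty list so IDs from earlier runs are not reprocessed
    contentIdList.clear();
    // Trending-articles ranking API
    String url = "https://api.juejin.cn/content_api/v1/content/article_rank?category_id=6809637769959178254&type=hot&aid=2608&uuid=7202969973525005828&spider=0";
    // Send an HTTP GET request
    HttpRequest request = HttpRequest.get(url);
    // Read the response body
    HttpResponse response = request.execute();
    String json = response.body();
    System.out.println(json);

    System.out.println("------------------------");

    // Parse the JSON string
    ObjectMapper objectMapper = new ObjectMapper();
    JsonNode rootNode;
    try {
        rootNode = objectMapper.readTree(json);
    } catch (JsonProcessingException e) {
        throw new RuntimeException(e);
    }

    // "data" holds the ranked entries; each entry's content.content_id identifies one article
    JsonNode dataNode = rootNode.get("data");
    if (dataNode == null || !dataNode.isArray()) {
        return; // unexpected response shape: nothing to scrape in this run
    }

    for (JsonNode jsonNode : dataNode) {
        JsonNode contentNode = jsonNode.get("content");
        String contentId = contentNode.get("content_id").asText();
        contentIdList.add(contentId);
        System.out.println("content_id: " + contentId);
    }
}
```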
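Because contentIdList is a field of a Spring singleton bean, its contents survive from one scheduled run to the next; unless it is cleared at the start of each run, getContents() will revisit IDs collected in earlier runs. Besides clearing the list, another option is to keep per-run data in local variables. The sketch below is hypothetical: StatelessScrapeJob, runOnce, and the Supplier/Consumer stand-ins for getArticles() and the per-article fetch-and-save step are illustrative names, not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: per-run data lives in local variables, not in bean fields,
// so every scheduled run starts from an empty list.
public class StatelessScrapeJob {

    private final Supplier<List<String>> idSource;  // stands in for getArticles()
    private final Consumer<String> articleHandler;  // stands in for fetch + parse + save of one post

    public StatelessScrapeJob(Supplier<List<String>> idSource, Consumer<String> articleHandler) {
        this.idSource = idSource;
        this.articleHandler = articleHandler;
    }

    // One scheduled run: collect this run's IDs, handle each one, report how many were handled.
    public int runOnce() {
        List<String> contentIds = new ArrayList<>(idSource.get()); // fresh list on every run
        for (String contentId : contentIds) {
            articleHandler.accept(contentId);
        }
        return contentIds.size();
    }

    public static void main(String[] args) {
        List<String> handled = new ArrayList<>();
        // Made-up IDs; a real run would take them from the ranking API response
        StatelessScrapeJob job = new StatelessScrapeJob(() -> List.of("111", "222"), handled::add);
        job.runOnce();
        job.runOnce();
        System.out.println(handled.size()); // 4: two IDs per run, nothing carried over
    }
}
```

With a shared field that is never cleared, the second run would instead handle four IDs on its own, two of them repeats.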

# Parsing the HTML Document with Jsoup

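```java
// Imports required at the top of the getArticleJob class file
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Called by the scheduled job, so it is a plain method rather than a @Test
private void getContents() {
    for (String contentId : contentIdList) {
        // Article page URL
        String url = String.format("https://juejin.cn/post/%s", contentId);
        try {
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.81")
                    .get();

            // Article title
            Elements title = doc.select(".article-area .article-title");
            System.out.println("----------------Article title----------------");
            System.out.println(title);
            // Article body
            Element content = doc.getElementById("article-root");
            if (content == null) {
                // Page layout changed or the request was blocked: skip this article
                System.err.println("No #article-root element found at " + url);
                continue;
            }
            System.out.println("---------------Article body------------------");

            // Store the body HTML as UTF-8 bytes
            byte[] contentBytes = content.toString().getBytes(StandardCharsets.UTF_8);

            // Build the article entity
            Article article = new Article();
            // Use the post's own content_id as the ID; a fixed ID would collide on the second insert
            article.setId(Long.valueOf(contentId));
            article.setTitle(title.text());
            article.setContent(contentBytes);
            article.setAuthorId(0L);
            article.setView(0);
            article.setLikes(0);
            article.setComments("");
            article.setCollects(0);
            article.setTags("");

            // Persist to MySQL
            articleService.save(article);

            String decodedContent = new String(contentBytes, StandardCharsets.UTF_8);
            System.out.println("-------------Decoded--------------");
            System.out.println(decodedContent);
        } catch (IOException e) {
            // Log and move on so one failed page does not abort the whole run
            System.err.println("Failed to fetch " + url + ": " + e.getMessage());
        }
    }
}
```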
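getContents() turns the article HTML into a byte[] with UTF-8 before saving it and later decodes it again, presumably because the content column is a binary type (the document does not say). That round trip preserves the text only when the same charset is used on both sides. The hypothetical Utf8RoundTrip class below shows this with the JDK alone; the sample HTML uses Unicode escapes for four CJK characters so the source compiles under any platform encoding.

```java
import java.nio.charset.StandardCharsets;

// Illustration of the encode/decode step applied to the article body:
// the HTML string is stored as UTF-8 bytes and must be decoded with UTF-8 again.
public class Utf8RoundTrip {

    public static byte[] encode(String html) {
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four CJK characters written as Unicode escapes
        String html = "<div id=\"article-root\">\u5b9a\u65f6\u4efb\u52a1</div>";
        byte[] bytes = encode(html);
        System.out.println(html.length() + " chars -> " + bytes.length + " bytes");
        System.out.println(decode(bytes).equals(html)); // true
        // Decoding with a different charset does not give the original text back
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1).equals(html)); // false
    }
}
```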

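With this in place, once the backend is running, the system automatically sends the request every six hours, scrapes the trending articles from the external website, parses the article data out of the responses, and persists it to MySQL.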
