[Help a kid out] Given several paragraphs, how do I extract the characters/words/sentences/passages that appear in every one of them?

    godblessumilk · Aug 30, 2021 · 1528 views
    There are five paragraphs:

    para1 = "this is para one. I am cat. I am 10 years old. I like fish"
    para2 = "this is para two. I am dog. my age is 12. I can swim"
    para3 = "this is para three. I am cat. I am 9 years. I like rat"
    para4 = "this is para four. I am rat. my age is secret. I hate cat"
    para5 = "this is para five. I am dog. I am 10 years old. I like fish"

    I would like to extract the following result:

    this is para
    I am
    I

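    Gurus, how do I do this? Or is there a ready-made, usable diff tool that would let me build a command, run it through a system call, and capture its output? (sobbing)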
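One possible reading of the expected result above: keep a phrase if it occurs in every paragraph and, in each paragraph, at least once outside any longer shared phrase. That would explain why "I" survives next to "I am" while "this is" does not. The sketch below works under that assumption. It splits sentences on `.`, `!`, `?` and words on whitespace, and the function names (`sentences`, `occurrences`, `independent_common_phrases`) are made up for illustration. The brute-force enumeration is only suitable for short texts.

```python
import re

def sentences(text):
    """Split a paragraph into sentences, each a list of words."""
    return [s.split() for s in re.split(r"[.!?]+", text) if s.strip()]

def occurrences(paragraph_sentences):
    """Map each contiguous word sequence to its (sentence, start, end) spans."""
    spans = {}
    for s, words in enumerate(paragraph_sentences):
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                spans.setdefault(tuple(words[i:j]), set()).add((s, i, j))
    return spans

def has_uncovered(phrase, spans, common):
    """True if some occurrence of phrase lies outside every longer common phrase."""
    longer = [span for other in common if len(other) > len(phrase)
              for span in spans[other]]
    for s, i, j in spans[phrase]:
        if not any(s2 == s and i2 <= i and j <= j2 for s2, i2, j2 in longer):
            return True
    return False

def independent_common_phrases(paragraphs):
    """Phrases shared by all paragraphs that also occur on their own in each one."""
    per_para = [occurrences(sentences(p)) for p in paragraphs]
    common = set(per_para[0])
    for spans in per_para[1:]:
        common.intersection_update(spans)  # iterating a dict yields its keys
    kept = [" ".join(phrase) for phrase in common
            if all(has_uncovered(phrase, spans, common) for spans in per_para)]
    # Longest phrases first, ties broken alphabetically.
    return sorted(kept, key=lambda p: (-len(p.split()), p))

paragraphs = [
    "this is para one. I am cat. I am 10 years old. I like fish",
    "this is para two. I am dog. my age is 12. I can swim",
    "this is para three. I am cat. I am 9 years. I like rat",
    "this is para four. I am rat. my age is secret. I hate cat",
    "this is para five. I am dog. I am 10 years old. I like fish",
]
print(independent_common_phrases(paragraphs))
# ['this is para', 'I am', 'I']
```

No external diff tool or system call is needed for input of this size; a plain set intersection over word sequences does most of the work.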
    MorningStar0 · Aug 30, 2021 · #1
    Just use a suffix tree.
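A rough illustration of the idea behind this suggestion, with one simplification stated up front: the sketch builds an uncompressed word-level suffix trie, not a true suffix tree (which compresses single-child chains and can be built in linear time). Every node records which paragraphs pass through it, so a node reached by all paragraphs spells a phrase shared by all of them. The node layout and the names `build_suffix_trie` and `common_paths` are assumptions made for this example. Tokens are split on whitespace only, so punctuation stays attached to the preceding word.

```python
def build_suffix_trie(paragraphs):
    """Insert every word-level suffix of every paragraph into one trie.

    Each node is a dict with 'children' (word -> node) and 'docs'
    (indices of the paragraphs whose suffixes pass through the node).
    """
    root = {"children": {}, "docs": set()}
    for doc_id, text in enumerate(paragraphs):
        words = text.split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node["children"].setdefault(
                    word, {"children": {}, "docs": set()})
                node["docs"].add(doc_id)
    return root

def common_paths(root, n_docs):
    """Yield every word sequence whose node was reached by all paragraphs."""
    stack = [(root, [])]
    while stack:
        node, path = stack.pop()
        for word, child in node["children"].items():
            # A child's paragraph set is a subset of its parent's,
            # so subtrees that miss a paragraph can be skipped entirely.
            if len(child["docs"]) == n_docs:
                stack.append((child, path + [word]))
                yield " ".join(path + [word])

paragraphs = [
    "this is para one. I am cat. I am 10 years old. I like fish",
    "this is para two. I am dog. my age is 12. I can swim",
    "this is para three. I am cat. I am 9 years. I like rat",
    "this is para four. I am rat. my age is secret. I hate cat",
    "this is para five. I am dog. I am 10 years old. I like fish",
]
trie = build_suffix_trie(paragraphs)
for phrase in sorted(common_paths(trie, len(paragraphs))):
    print(phrase)
```

This lists every shared phrase, including fragments such as "this is" and "am", so it still needs a filtering step to get down to the three lines asked for in the question.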
    godblessumilk (OP) · Aug 30, 2021 · #2
    @MorningStar0 And on the suffix tree grows suffix fruit
    Grouie · Aug 31, 2021 via iPhone · #4
    tf-idf
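The reply does not say how tf-idf would be applied, so the following is only one guess. With the unsmoothed definition idf(t) = log(N / df(t)), a term that appears in all N paragraphs has df(t) = N and therefore idf(t) = 0, which means the zero-idf terms are exactly the words shared by every paragraph. The helper names `document_frequency` and `idf` are hypothetical. The sketch works on single words, so it does not recover multi-word phrases by itself.

```python
import math
import re
from collections import Counter

def document_frequency(paragraphs):
    """Count, for each word, how many paragraphs contain it at least once."""
    df = Counter()
    for text in paragraphs:
        df.update(set(re.findall(r"\w+", text)))
    return df

def idf(paragraphs):
    """Unsmoothed inverse document frequency: log(N / df)."""
    n = len(paragraphs)
    return {term: math.log(n / count)
            for term, count in document_frequency(paragraphs).items()}

paragraphs = [
    "this is para one. I am cat. I am 10 years old. I like fish",
    "this is para two. I am dog. my age is 12. I can swim",
    "this is para three. I am cat. I am 9 years. I like rat",
    "this is para four. I am rat. my age is secret. I hate cat",
    "this is para five. I am dog. I am 10 years old. I like fish",
]
shared_words = sorted(t for t, score in idf(paragraphs).items() if score == 0.0)
print(shared_words)
# ['I', 'am', 'is', 'para', 'this']
```

Library implementations often add smoothing to the idf term, in which case shared words get a small positive score instead of exactly zero; comparing the document frequency to N directly avoids that dependency.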