
I'd like to replace strings in Python to unify inconsistent notation.
For example, suppose the text contains "green onion", "long onion", and "Kujo onion". I want to replace every word that contains "onion" with just "onion".
I can't find anything that does what I want, even after searching.
With replace() I can only replace one word at a time, and writing out every pattern takes a long time.
Is there a good way to do this?

Addendum
Is it possible to replace green onion, long onion, Kujo onion, and so on without specifying each one (such as "green onion") individually? When there are many varieties, it is hard to know them all, and hard to write every pattern.

  • Answer # 1

      

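    Replace every word that contains "onion" with just "onion"

    To do this, you first need to run morphological analysis to split the text into words, because you need word boundaries before you can say which "word" contains "onion".

    After that, you only need to know which words to replace.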
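    A minimal sketch of the step after morphological analysis. It assumes the text has already been split into words; the token list below is a hypothetical stand-in for an analyzer's output, and no particular analyzer's API is used. Every token that contains the target string is replaced by the target itself.

```python
def normalize_tokens(tokens, target):
    """Replace every token that contains `target` with `target` itself.

    `tokens` is assumed to come from a morphological analyzer (hypothetical
    here), with compounds such as "green onion" arriving as single tokens.
    """
    return [target if target in token else token for token in tokens]


# Hypothetical analyzer output for "green onion and Kujo onion are onions"
tokens = ["green onion", "and", "Kujo onion", "are", "onions"]
print(normalize_tokens(tokens, "onion"))
# ['onion', 'and', 'onion', 'are', 'onion']
```

    Note that "onions" is also rewritten, because a plain substring test cannot tell a variety name from any other word that happens to contain the target.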

  • Answer # 2

    Carnegie Hall, Negitro, Koganegiku, Magical Teacher Negima, Negishi Systex, Takamine Guitar, etc.

    In simpler terms, are onions and leeks together?

    (The above is obviously a bad example. The onion and the onion are different depending on what you want to make)
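    The false-hit problem shows up with a plain substring test. This sketch uses romanized spellings of some of the names above and "negi" as the search key; that is an assumption for illustration, since the original examples are written in Japanese script. The exception list is hypothetical and hand-written.

```python
def should_normalize(word, target, exceptions):
    """Return True if `word` contains `target` and is not a listed exception."""
    return target in word.lower() and word not in exceptions


# Romanized forms (assumption: used only for illustration).
words = ["Kujo-negi", "Carnegie Hall", "Negitoro", "Koganegiku", "Magical Teacher Negima"]

# A plain substring test flags all five, although only the first is an onion.
naive_hits = [w for w in words if "negi" in w.lower()]
print(naive_hits)

# A hand-maintained exception list (hypothetical) has to filter out the rest.
exceptions = {"Carnegie Hall", "Negitoro", "Koganegiku", "Magical Teacher Negima"}
print([w for w in words if should_normalize(w, "negi", exceptions)])
# ['Kujo-negi']
```

    The exception list has to grow every time a new false hit turns up, which is the kind of manual dictionary upkeep this answer is warning about.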


    Maintaining a dictionary by hand takes a lot of work, and so does finding word-like sequences in large-scale text data by tallying how often characters appear next to each other.
    I also think it is quite hard to decide automatically whether words can be grouped under one representative term just because they share a string. Whether two words stand in a hierarchical relationship between concepts (for example, "Kujo onion" is a kind of "onion") is a judgment that still has to be made by people, or from data that people have organized by hand.
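    A crude sketch of the frequency-counting idea from the first paragraph: collect the word that appears immediately before "onion" across a corpus and rank the candidates by count. It assumes whitespace-separated English text and one-word modifiers; Japanese text would need character-level counting instead. The corpus is made up for illustration.

```python
import re
from collections import Counter


def modifier_counts(texts, head):
    """Count the word that immediately precedes `head` in each text.

    Assumes whitespace-separated text and a single-word modifier.
    The result only proposes candidates; a person still has to vet them.
    """
    pattern = re.compile(r"\b(\w+) " + re.escape(head) + r"\b")
    counts = Counter()
    for text in texts:
        counts.update(m.lower() for m in pattern.findall(text))
    return counts


corpus = [
    "Chop the green onion and the long onion.",
    "Kujo onion is sweeter than green onion.",
    "Add a red onion and an onion ring.",
]
print(modifier_counts(corpus, "onion").most_common())
```

    "an" comes out as a candidate alongside "green" and "Kujo", which is exactly why the answer says the final grouping still needs human judgment.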


      

    Is it possible to replace green onion, long onion, Kujo onion, and so on without specifying each one (such as "green onion") individually? When there are many varieties, it is hard to know them all, and hard to write every pattern.

    If it is really true that there are more varieties than you can grasp, then once you rewrite with patterns you cannot measure how much should have been rewritten, whether anything was rewritten that shouldn't have been, or how much slipped through.
    That means building a system without knowing how well it works.
    If "there are so many varieties that it is hard to grasp them all" is really the case, you are better off not rewriting with patterns.
    Is it really true that there are too many varieties to grasp?

    (If there are many users and errors can be collected gradually after the system goes live, it may be fine to start rough. It would be nice if that were the situation.)
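    If you do use patterns, one way to get the numbers this answer says are missing is to check a sample by hand. A minimal sketch, assuming you have labeled which sampled occurrences should be rewritten (`gold`) and recorded which ones the pattern actually rewrote (`predicted`); the occurrence IDs are made up.

```python
def precision_recall(predicted, gold):
    """Precision: share of rewrites that were correct.
    Recall: share of should-be-rewritten items that were caught."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical occurrence IDs from a hand-checked sample.
gold = {1, 2, 3, 4}    # occurrences that should have been rewritten
predicted = {1, 2, 5}  # occurrences the pattern actually rewrote
print(precision_recall(predicted, gold))
```

    Here item 5 is a wrong rewrite and items 3 and 4 slipped through, so precision is 2/3 and recall is 1/2.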

  • Answer # 3

    Wouldn't regular expressions be a better fit? Use the re module.

    In [1]: import re
    In [2]: target = 'green onion, long onion and Kujo onion are onions'
    In [3]: re.sub('(green|long|Kujo) onion', 'onion', target)
    Out[3]: 'onion, onion and onion are onions'

    '(green|long|Kujo) onion' matches "green", "long", or "Kujo" followed by " onion". The parentheses group the alternatives and | separates them; note there must be no spaces around the |, or the spaces become part of each alternative.
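    If the varieties are kept in a list, the alternation can be generated instead of typed by hand. A minimal sketch; `VARIETIES` and `build_pattern` are hypothetical names. `re.escape` keeps names containing regex metacharacters from being misread as pattern syntax.

```python
import re

# Hypothetical list of modifiers, maintained as data rather than in the regex.
VARIETIES = ["green", "long", "Kujo", "spring"]


def build_pattern(varieties, head):
    """Compile a regex matching any listed modifier followed by ' ' + head."""
    group = "|".join(re.escape(v) for v in varieties)
    return re.compile(r"\b(?:" + group + r") " + re.escape(head) + r"\b")


pattern = build_pattern(VARIETIES, "onion")
text = "green onion, long onion and Kujo onion are onions"
print(pattern.sub("onion", text))
# onion, onion and onion are onions
```

    This still needs the list. A fully generic pattern such as r"\b\w+ onion\b" would avoid it, but would also rewrite "an onion" or "red onion", the false-hit problem raised in the other answers.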

  • Answer # 4

    Please try a Google search for "python replace".
    You can quickly find a wide variety of information from the results.
    You have to judge whether that information is good or bad, but the same goes for answers on a Q&A site.

  • Answer # 5

    If you try to do this in a straightforward way, you will probably need fairly complex natural language processing.

    However, whether you really need it depends on your purpose, so please tell us what you intend to use this for.