
I used GAS to fetch the HTML of a web page into a variable named source, and I want to collect multiple values from it into an array with .match.
However, I don't know how to write the regular expression.


<p class="jss***">10 days</p>
: Omitted
<p class="jss***">18 days</p>
: Omitted
<p class="jss***">20 days</p>


Assuming the HTML of the web page contains the part above, I would like to extract each "** days" part and store them in an array.
How should I write the regular expression in this case?


I tried various things, but I could not extract them from the whole page because the number in jss*** is not known in advance.

The numeric part of jss*** is always 3 digits.

Other

The HTML is assigned to source with the following code.

const URL = 'https://example.com'; // Target URL
var key = '##-#####-#####-#####-#####-#####';
var option = {
  url: URL,
  renderType: "HTML",
  outputAsJson: true
};
var payload = JSON.stringify(option);
payload = encodeURIComponent(payload);
// No spaces are allowed inside the request URL
var url = "https://phantomjscloud.com/api/browser/v2/" + key + "/?request=" + payload;
var response = UrlFetchApp.fetch(url);
var json = JSON.parse(response.getContentText());
var source = json["content"]["data"];

The approach is a little brute-force, but I was able to extract the other parts as follows.

var itemRegexp = new RegExp(/ /g);
var item = source.match(itemRegexp);
for (var i = 0; i < item.length; i++) {
  item[i] = item[i].replace(" ", " ");
  Logger.log(item[i]);
}

Thank you

Addition (2019/11/27)

The execution environment seems to have been unclear, so I am adding it here.

Execution environment
Google Apps Script
This script file is written in ES5, so I would basically like to stick to ES5.
  • Answer #1

    If you can use matchAll, you can write ...

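    // The class numbers in this sample HTML are illustrative.
    const days = [...`
    <p class="jss123">10 days</p>
      : Omitted
    <p class="jss456">18 days</p>
      : Omitted
    <p class="jss789">20 days</p>
    `.matchAll(/jss\d{3}">(.*?)</g)].map(match => match[1]);
    console.log(days); // => ["10 days", "18 days", "20 days"]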


    If matchAll is not available yet, something like the following should work. The explanation of exec should be helpful here.

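    // The class numbers in this sample HTML are illustrative.
    const source = `
    <p class="jss123">10 days</p>
      : Omitted
    <p class="jss456">18 days</p>
      : Omitted
    <p class="jss789">20 days</p>
    `;
    const regex = /jss\d{3}">(.*?)</g;
    const days = [];
    let match;
    // With the g flag, exec returns the next match on each call, then null
    while ((match = regex.exec(source)) !== null) {
      days.push(match[1]);
    }
    console.log(days); // => ["10 days", "18 days", "20 days"]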
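
    Since the question asks for ES5, here is a minimal sketch of the same exec loop written with var and plain string concatenation only. The function name extractDays and the class numbers in the sample HTML are assumptions made for illustration; they do not come from the original page.

```javascript
// Sketch only: `extractDays` is a hypothetical helper name, and the class
// numbers in the sample HTML below are made up for illustration.
function extractDays(source) {
  // Capture the text between `jssNNN">` and the next `<`
  var regex = /jss\d{3}">(.*?)</g;
  var days = [];
  var match;
  // With the g flag, exec() resumes from regex.lastIndex on each call
  // and returns null once there are no more matches.
  while ((match = regex.exec(source)) !== null) {
    days.push(match[1]);
  }
  return days;
}

var sample =
  '<p class="jss123">10 days</p>\n' +
  '<p class="jss456">18 days</p>\n' +
  '<p class="jss789">20 days</p>';
var days = extractDays(sample);
```

    In GAS, the result could be written out with Logger.log(days), as in the question's own loop. Unlike source.match(...), this returns an empty array rather than null when nothing matches.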


    This can also be achieved by passing a replacer function as the second argument of replace, as follows:

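    // The class numbers in this sample HTML are illustrative.
    const days = [];
    `
    <p class="jss123">10 days</p>
    : Omitted
    <p class="jss456">18 days</p>
    : Omitted
    <p class="jss789">20 days</p>
    `.replace(/jss\d{3}">(.*?)</g, (match, sub) => {
      // sub is the text captured by the first group
      days.push(sub);
    });
    console.log(days); // => ["10 days", "18 days", "20 days"]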

  • Answer #2

      

    Tried to assign multiple data to an array

    If the goal is to build an array, you can also pass an attribute selector to querySelectorAll().

    The following three attribute selectors provide matching similar to regular expressions:

    [attribute^=value] (the attribute value starts with value)

    [attribute$=value] (the attribute value ends with value)

    [attribute*=value] (the attribute value contains value)

    To match the jss classes in your question, you can use p[class*=jss].
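
    To make the meaning of the three operators concrete, here is a rough sketch that mimics them on plain strings. The helper attrMatches is hypothetical and written only for illustration; it is not a browser API, and real selector matching is done by the browser itself.

```javascript
// Sketch only: `attrMatches` is a hypothetical helper that mimics the three
// attribute selector operators on a plain string.
function attrMatches(attrValue, operator, value) {
  // In CSS, an empty value never matches with these three operators
  if (value === "") return false;
  switch (operator) {
    case "^=": // starts with
      return attrValue.indexOf(value) === 0;
    case "$=": // ends with
      return attrValue.slice(-value.length) === value;
    case "*=": // contains
      return attrValue.indexOf(value) !== -1;
    default:
      throw new Error("unsupported operator: " + operator);
  }
}

var results = [
  attrMatches("jss123", "^=", "jss"),
  attrMatches("jss123", "$=", "123"),
  attrMatches("title jss123", "*=", "jss"),
  attrMatches("title jss123", "^=", "jss")
];
```

    The last call shows why *= is the safer choice here: if the class attribute holds more than one class name, the value may not start with jss.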

      

    Assuming that there is the above part in the HTML of the web page, I would like to extract the ** day part from this and assign it to the array.
      How should I write the regular expression in this case?

    let paragraphs = document.querySelectorAll("p[class*=jss]");
    let days = Array.from(paragraphs).map(el => el.textContent);
    console.log(days); // ["10 days", "18 days", "20 days"]


    Addendum)
    This is an example of handling the DOM directly, without fetching the HTML source.
