Examples of parsing HTML and XML with PHP's built-in DOMDocument class

Author: Administrator · Posted: 21-05-10 13:04

## Source
https://codingreflections.com/php-parse-html/

## Loading HTML and XML
```
$dom = new DOMDocument();

// Each load call replaces whatever was loaded before; use only the one you need.

// Load HTML from a string or from a file
$dom->loadHTML($html_string);
$dom->loadHTMLFile('path/to/htmlfile.html');

// Load XML from a file or from a string
$dom->load('path/to/xmlfile.xml');
$dom->loadXML($xml_string);

// documentElement is a DOMElement for the root element of the loaded document
$documentElement = $dom->documentElement;
```
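
The examples in the following sections silence parser warnings with the `@` operator. As an alternative, here is a minimal sketch that buffers libxml's messages so they can be inspected afterwards. The sample markup in `$html` is a made-up input, not something from the source article.

```php
<?php
// Hypothetical input: one valid paragraph plus a tag the HTML parser does not know.
$html = '<p>First</p><foo>unknown tag</foo>';

// Buffer libxml errors instead of emitting PHP warnings; remember the old setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

$errors = libxml_get_errors();          // array of LibXMLError objects (may be empty)
libxml_clear_errors();                  // empty the buffer for the next parse
libxml_use_internal_errors($previous);  // restore the earlier setting

foreach ($errors as $error) {
    echo trim($error->message) . ' (line ' . $error->line . ")\n";
}

echo $dom->getElementsByTagName('p')->length . "\n";
```

Whether a given piece of markup produces any messages depends on the libxml2 version PHP was built against, so the loop may print nothing.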

## Selecting by id
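```
$dom = new DOMDocument();
@$dom->loadHTML($res); // $res holds the HTML string; @ suppresses warnings about malformed markup

$table = $dom->getElementById('tablepress-3'); // DOMElement, or null if no such id exists
if ($table !== null) {
    $child_elements = $table->getElementsByTagName('tr'); // DOMNodeList
    $row_count = $child_elements->length - 1; // minus one, presumably to skip the header row

    echo "No. of rows in the table is " . $row_count;
}
```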

## Selecting by tag name
```
$dom = new DOMDocument();
@$dom->loadHTML($res);

// Print the text of every <h2> in the document
$h2s = $dom->getElementsByTagName('h2');
foreach ($h2s as $h2) {
    echo $h2->textContent . "\n";
}
```

## Using XPath
```
$dom = new DOMDocument();
@$dom->loadHTML($res);

// Count the tables whose class attribute contains the substring "tablepress"
$xpath = new DOMXPath($dom);
$tables = $xpath->query("//table[contains(@class,'tablepress')]");
$count = $tables->length;

echo "No. of tables " . $count;
```
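
Note that `contains(@class,'tablepress')` is a substring test, so it also matches classes such as `tablepress-extra`. A common XPath 1.0 idiom for matching a whole class token pads the attribute with spaces first. The sketch below compares the two on a made-up document; the class names other than `tablepress` are invented for illustration.

```php
<?php
// Hypothetical markup: only the first table has the exact class token "tablepress".
$html = '<table class="tablepress wide"></table>'
      . '<table class="tablepress-extra"></table>'
      . '<table class="plain"></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring match: any class attribute containing the text "tablepress".
$loose = $xpath->query("//table[contains(@class,'tablepress')]");

// Token match: pad with spaces so only the whole word "tablepress" matches.
$exact = $xpath->query(
    "//table[contains(concat(' ', normalize-space(@class), ' '), ' tablepress ')]"
);

echo $loose->length . "\n"; // 2
echo $exact->length . "\n"; // 1
```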

## Extracting links from a tags
```
$dom = new DOMDocument();
@$dom->loadHTML($res);

// Collect the href of every <a> that points to wordpress.org
$links = $dom->getElementsByTagName('a');
$urls = [];
foreach ($links as $link) {
    $url = $link->getAttribute('href');
    $parsed_url = parse_url($url);
    if (isset($parsed_url['host']) && $parsed_url['host'] === 'wordpress.org') {
        $urls[] = $url;
    }
}
var_dump($urls);
```

## Inserting a new HTML element into the document
```
$dom = new DOMDocument();
@$dom->loadHTML($html);

$ps = $dom->getElementsByTagName('p');
$first_para = $ps->item(0);

$html_to_add = '<div><a href="#"><img src="image.jpeg"/></a></div>';
$dom_to_add = new DOMDocument();
@$dom_to_add->loadHTML($html_to_add);
// loadHTML() wraps the fragment in <html><body>, so documentElement would be <html>;
// pick out the <div> itself instead
$new_element = $dom_to_add->getElementsByTagName('div')->item(0);

// Copy the node (with its children) into $dom, then insert it right after the first <p>
$imported_element = $dom->importNode($new_element, true);
$first_para->parentNode->insertBefore($imported_element, $first_para->nextSibling);

$output = $dom->saveHTML();
echo $output;
```
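
When the markup to add is small, the nodes can also be built directly on the target document, which avoids the second `DOMDocument` and the `importNode()` step. This is a sketch of that alternative, not the approach from the source article; the two-paragraph input is a made-up example.

```php
<?php
// Hypothetical input document.
$html = '<p>First paragraph</p><p>Second paragraph</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$first_para = $dom->getElementsByTagName('p')->item(0);

// Build <div><a href="#"><img src="image.jpeg"></a></div> node by node.
$div = $dom->createElement('div');
$a = $dom->createElement('a');
$a->setAttribute('href', '#');
$img = $dom->createElement('img');
$img->setAttribute('src', 'image.jpeg');
$a->appendChild($img);
$div->appendChild($a);

// Insert right after the first <p>. If nextSibling were null (the paragraph
// being the last child), insertBefore() would append the node at the end.
$first_para->parentNode->insertBefore($div, $first_para->nextSibling);

echo $dom->saveHTML();
```

Nodes created with `createElement()` already belong to `$dom`, so they can be inserted without importing.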

## Removing elements from the document
```
$html = '<p>This is our first paragraph</p>
<div class="del">Delete this</div>
<p>This is our second paragraph</p>
<p>This is our third paragraph</p>
<div class="del">Delete this too</div>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
echo $dom->saveHTML(); // output before deletion

// Select every <div> whose class attribute is exactly "del"
$xpath = new DOMXPath($dom);
$elems = $xpath->query("//div[@class='del']");

// A node is removed through its parent
foreach ($elems as $elem) {
    $elem->parentNode->removeChild($elem);
}
echo '<br><br>-------after deletion--------<br><br>';
echo $dom->saveHTML();
```
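
The loop above is safe because an XPath query returns a fixed list of nodes. The list returned by `getElementsByTagName()` is live: it shrinks as nodes are removed, so removing inside a `foreach` over it can skip elements. A minimal sketch of one workaround, copying the list into an array first (the sample markup is made up):

```php
<?php
// Hypothetical input: three <div> elements to remove and one <p> to keep.
$html = '<div>one</div><div>two</div><div>three</div><p>keep</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$divs = $dom->getElementsByTagName('div'); // live DOMNodeList

// Snapshot the live list into a plain array so removals cannot shift it mid-loop.
foreach (iterator_to_array($divs) as $div) {
    $div->parentNode->removeChild($div);
}

echo $dom->getElementsByTagName('div')->length . "\n"; // 0
echo $dom->getElementsByTagName('p')->length . "\n";   // 1
```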

## Working with attributes
```
// getAttribute($attribute_name)                    returns the value of an attribute
// setAttribute($attribute_name, $attribute_value)  sets the value of an attribute
// hasAttribute($attribute_name)                    returns true if the element has the attribute, false otherwise

$html = '<span class="myclass" data-action="show">Content</span>';
$dom = new DOMDocument();
@$dom->loadHTML($html);
$elem = $dom->getElementsByTagName('span')->item(0);

if ($elem->hasAttribute('data-action')) {
    echo 'attribute value is "' . $elem->getAttribute('data-action') . '"';
    $elem->setAttribute('data-action', 'hide');
    echo '<br>updated attribute value is "' . $elem->getAttribute('data-action') . '"';
}
```
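
Two related operations are not covered above: removing an attribute with `removeAttribute()` and listing all attributes through the element's `attributes` property. A short sketch using the same sample markup; the `$remaining` array is only an illustrative helper.

```php
<?php
$html = '<span class="myclass" data-action="show">Content</span>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$elem = $dom->getElementsByTagName('span')->item(0);

// Drop one attribute entirely.
$elem->removeAttribute('data-action');

// Walk the remaining attributes; each item is a DOMAttr with name and value.
$remaining = [];
foreach ($elem->attributes as $attr) {
    $remaining[$attr->name] = $attr->value;
}

var_dump($elem->hasAttribute('data-action')); // bool(false)
print_r($remaining);
```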

## Sources
https://codingreflections.com/php-parse-html/
https://stackoverflow.com/questions/14395239/class-domdocument-not-found
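
The second link concerns the "Class 'DOMDocument' not found" error, which generally means PHP's DOM extension is not loaded. As a rough sketch, a script could check for it before running any of the examples above; the function name `dom_extension_available` is a made-up helper, and how the extension gets installed depends on the platform.

```php
<?php
// Hypothetical helper: true when the DOM extension and its main class are usable.
function dom_extension_available(): bool
{
    return extension_loaded('dom') && class_exists('DOMDocument');
}

if (dom_extension_available()) {
    $dom = new DOMDocument();
    $dom->loadXML('<root><item>ok</item></root>');
    echo $dom->documentElement->nodeName . "\n";
} else {
    echo "The DOM extension is not loaded; install or enable it first.\n";
}
```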
