# NAME Web::Query - Yet another scraping library like jQuery # SYNOPSIS use Web::Query; wq('http://www.w3.org/TR/html401/') ->find('div.head dt') ->each(sub { my $i = shift; printf("%d %s\n", $i+1, $_->text); }); # DESCRIPTION Web::Query is a yet another scraping framework, have a jQuery like interface. Yes, I know Ingy's [pQuery](https://metacpan.org/pod/pQuery). But it's just a alpha quality. It doesn't works. Web::Query built at top of the CPAN modules, [HTML::TreeBuilder::XPath](https://metacpan.org/pod/HTML::TreeBuilder::XPath), [LWP::UserAgent](https://metacpan.org/pod/LWP::UserAgent), and [HTML::Selector::XPath](https://metacpan.org/pod/HTML::Selector::XPath). So, this module uses [HTML::Selector::XPath](https://metacpan.org/pod/HTML::Selector::XPath) and only supports the CSS 3 selector supported by that module. Web::Query doesn't support jQuery's extended queries(yet?). __THIS LIBRARY IS UNDER DEVELOPMENT. ANY API MAY CHANGE WITHOUT NOTICE__. # FUNCTIONS - `wq($stuff)` This is a shortcut for `Web::Query->new($stuff)`. This function is exported by default. # METHODS ## CONSTRUCTORS - my $q = Web::Query->new($stuff, \\%options ) Create new instance of Web::Query. You can make the instance from URL(http, https, file scheme), HTML in string, URL in string, [URI](https://metacpan.org/pod/URI) object, and instance of [HTML::Element](https://metacpan.org/pod/HTML::Element). This method throw the exception on unknown $stuff. This method returns undefined value on non-successful response with URL. Currently, the only option valid option is _indent_, which will be used as the indentation string if the object is printed. - my $q = Web::Query->new\_from\_element($element: HTML::Element) Create new instance of Web::Query from instance of [HTML::Element](https://metacpan.org/pod/HTML::Element). - `my $q = Web::Query->new_from_html($html: Str)` Create new instance of Web::Query from HTML. - my $q = Web::Query->new\_from\_url($url: Str) Create new instance of Web::Query from URL. If the response is not success(It means /^20\[0-9\]$/), this method returns undefined value. You can get a last result of response, use the `$Web::Query::RESPONSE`. Here is a best practical code: my $url = 'http://example.com/'; my $q = Web::Query->new_from_url($url) or die "Cannot get a resource from $url: " . Web::Query->last_response()->status_line; - my $q = Web::Query->new\_from\_file($file\_name: Str) Create new instance of Web::Query from file name. ## TRAVERSING ### add Add elements to the set of matched elements. - add($html) An HTML fragment to add to the set of matched elements. - add(@elements) One or more @elements to add to the set of matched elements. - add($wq) An existing Web::Query object to add to the set of matched elements. - add($selector, $context) $selector is a string representing a selector expression to find additional elements to add to the set of matched elements. $context is the point in the document at which the selector should begin matching ### contents Get the immediate children of each element in the set of matched elements, including text and comment nodes. ### each Visit each nodes. `$i` is a counter value, 0 origin. `$elem` is iteration item. `$_` is localized by `$elem`. $q->each(sub { my ($i, $elem) = @_; ... }) ### end Back to the before context like jQuery. ### filter Reduce the elements to those that pass the function's test. $q->filter(sub { my ($i, $elem) = @_; ... }) ### find Get the descendants of each element in the current set of matched elements, filtered by a selector. my $q2 = $q->find($selector); # $selector is a CSS3 selector. __NOTE__ If you want to match the element itself, use ["filter"](#filter). __INCOMPATIBLE CHANGE__ From v0.14 to v0.19 (inclusive) find() also matched the element itself, which is not jQuery compatible. You can achieve that result using `filter()`, `add()` and `find()`: my $wq = wq('<div class="foo"><p class="foo">bar</p></div>'); # needed because we don't have a global document like jQuery does print $wq->filter('.foo')->add($wq->find('.foo'))->as_html; # <div class="foo"><p class="foo">bar</p></div><p class="foo">bar</p> ### first Return the first matching element. This method constructs a new Web::Query object from the first matching element. ### last Return the last matching element. This method constructs a new Web::Query object from the last matching element. ### map Creates a new array with the results of calling a provided function on every element. $q->map(sub { my ($i, $elem) = @_; ... }) ### parent Get the parent of each element in the current set of matched elements. ### prev Get the previous node of each element in the current set of matched elements. my $prev = $q->prev; ### next Get the next node of each element in the current set of matched elements. my $next = $q->next; ## MANIPULATION ### add\_class Adds the specified class(es) to each of the set of matched elements. # add class 'foo' to <p> elements wq('<div><p>foo</p><p>bar</p></div>')->find('p')->add_class('foo'); ### after Insert content, specified by the parameter, after each element in the set of matched elements. wq('<div><p>foo</p></div>')->find('p') ->after('<b>bar</b>') ->end ->as_html; # <div><p>foo</p><b>bar</b></div> The content can be anything accepted by ["new"](#new). ### append Insert content, specified by the parameter, to the end of each element in the set of matched elements. wq('<div></div>')->append('<p>foo</p>')->as_html; # <div><p>foo</p></div> The content can be anything accepted by ["new"](#new). ### as\_html Return the elements associated with the object as strings. If called in a scalar context, only return the string representation of the first element. ### ` attr ` Get/Set the attribute value in element. my $attr = $q->attr($name); $q->attr($name, $val); ### before Insert content, specified by the parameter, before each element in the set of matched elements. wq('<div><p>foo</p></div>')->find('p') ->before('<b>bar</b>') ->end ->as_html; # <div><b>bar</b><p>foo</p></div> The content can be anything accepted by ["new"](#new). ### clone Create a deep copy of the set of matched elements. ### detach Remove the set of matched elements from the DOM. ### has\_class Determine whether any of the matched elements are assigned the given class. ### ` html ` Get/Set the innerHTML. my @html = $q->html(); my $html = $q->html(); # 1st matching element only $q->html('<p>foo</p>'); ### insert\_before Insert every element in the set of matched elements before the target. ### insert\_after Insert every element in the set of matched elements after the target. ### ` prepend ` Insert content, specified by the parameter, to the beginning of each element in the set of matched elements. ### remove Delete the elements associated with the object from the DOM. # remove all <blink> tags from the document $q->find('blink')->remove; ### remove\_class Remove a single class, multiple classes, or all classes from each element in the set of matched elements. ### replace\_with Replace the elements of the object with the provided replacement. The replacement can be a string, a `Web::Query` object or an anonymous function. The anonymous function is passed the index of the current node and the node itself (with is also localized as `$_`). my $q = wq( '<p><b>Abra</b><i>cada</i><u>bra</u></p>' ); $q->find('b')->replace_with('<a>Ocus</a>); # <p><a>Ocus</a><i>cada</i><u>bra</u></p> $q->find('u')->replace_with($q->find('b')); # <p><i>cada</i><b>Abra</b></p> $q->find('i')->replace_with(sub{ my $name = $_->text; return "<$name></$name>"; }); # <p><b>Abra</b><cada></cada><u>bra</u></p> ### size Return the number of elements in the Web::Query object. wq('<div><p>foo</p><p>bar</p></div>')->find('p')->size; # 2 ### text Get/Set the text. my @text = $q->text(); my $text = $q->text(); # 1st matching element only $q->text('text'); If called in a scalar context, only return the string representation of the first element # HOW DO I CUSTOMIZE USER AGENT? You can specify your own instance of [LWP::UserAgent](https://metacpan.org/pod/LWP::UserAgent). $Web::Query::UserAgent = LWP::UserAgent->new( agent => 'Mozilla/5.0' ); # INCOMPATIBLE CHANGES - 0.10 new\_from\_url() is no longer throws exception on bad response from HTTP server. # AUTHOR Tokuhiro Matsuno <tokuhirom AAJKLFJEF@ GMAIL COM> # SEE ALSO [pQuery](https://metacpan.org/pod/pQuery) # LICENSE Copyright (C) Tokuhiro Matsuno This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.