Saturday, June 25, 2005

Re: Ant: Re: Ant: Re: link checker task ?

Hi Alan,
Greetings!

I have managed to write a basic, self-contained outline of the link parser.
For now it prints each tag and its attributes, which is how the links will
be identified: for a link it displays the <a> tag, the href attribute and
the URL. The expat docs and example files, as well as digestp.c, were an
enormous help.

These items are still to be done:
- Check whether the resource being pointed to exists.
- Generate the link report.
I shall be working on them as well.

Please find my linkparser.c below (and as an attached file).
Let me know your comments.

---
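#include <stdio.h>
#include <stdlib.h>   /* calloc, free, exit */
#include <string.h>
#include <expat.h>

typedef char *string_t;
#define XML_BUFFER_BLK 8192

/* Allocate n zero-initialised bytes; report failure on stderr. */
void *
memalloc(size_t n)
{
    void *mp = calloc(n, 1);

    if (mp == NULL)
        fprintf(stderr, "Out of memory\n");
    return mp;
}

/* Called for each start tag. expat passes the attributes as
 * name, value, name, value, ..., terminated by NULL. */
static void
link_start(void *data, const char *el, const char **attr)
{
    int i;

    (void)data;   /* user data is not used yet */
    printf("\n<%s>", el);

    for (i = 0; attr[i] != NULL; i += 2)
        printf("\n%s=\"%s\"", attr[i], attr[i + 1]);
}

/* Called for each end tag. */
static void
link_end(void *data, const char *el)
{
    (void)data;
    printf("\n</%s>\n\n", el);
    /* why do we need this function? */
}

int
link_parser(string_t filename)
{
    XML_Parser parser;
    char *xml_buf;
    int done;
    size_t length;
    FILE *fp;

    if ((fp = fopen(filename, "rb")) == NULL) {
        fprintf(stderr, "Could not open the file %s\n", filename);
        exit(EXIT_FAILURE);
    }

    if ((parser = XML_ParserCreate(NULL)) == NULL) {
        fprintf(stderr, "Could not create the parser\n");
        exit(EXIT_FAILURE);
    }

    XML_SetElementHandler(parser, link_start, link_end);

    if ((xml_buf = memalloc(XML_BUFFER_BLK)) == NULL)
        exit(EXIT_FAILURE);

    /* parse the document one block at a time */
    do {
        length = fread(xml_buf, 1, XML_BUFFER_BLK, fp);
        /* a short read means end of file (or a read error): last block */
        done = length < XML_BUFFER_BLK;

        if (XML_Parse(parser, xml_buf, (int)length, done) == XML_STATUS_ERROR) {
            fprintf(stderr, "Parse error at line %lu\n%s\n",
                    (unsigned long)XML_GetCurrentLineNumber(parser),
                    XML_ErrorString(XML_GetErrorCode(parser)));
            exit(EXIT_FAILURE);
        }
    } while (!done);

    free(xml_buf);
    XML_ParserFree(parser);
    fclose(fp);
    return 0;
}

int
main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "Usage: %s file.xml\n", argv[0]);
        return EXIT_FAILURE;
    }
    return link_parser(argv[1]);
}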

---

Warm Regards,
Senthil
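
The two open items in the message above (checking that a link target exists
and writing a report line when it does not) could start from something like
the sketch below. This is only an illustration under assumptions:
resource_exists and report_if_missing are hypothetical names, fopen stands
in for whatever lookup the rapple datastore actually needs, and resolving a
relative target against the directory of the source file is left out.

```c
#include <stdio.h>

/* Return 1 if path can be opened for reading, 0 otherwise.
 * Assumption: "exists" is approximated by "fopen succeeds". */
static int
resource_exists(const char *path)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Write one report line if target is missing.
 * source is the file that contains the link (hypothetical parameter).
 * Returns 1 if the target was missing, 0 if it was found. */
static int
report_if_missing(FILE *report, const char *source, const char *target)
{
    if (resource_exists(target))
        return 0;
    fprintf(report, "%s: missing link target %s\n", source, target);
    return 1;
}
```

A start-element handler could call report_if_missing(stdout, filename, url)
for every local href or src value it sees, and the return values could be
summed to give a count of broken links per file.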

> Thanks for the update - I realise that it will take time to get up to
> speed (there is a lot to read through ;)
>
> All the best,
> Alan.
>
> senthil@puggy.symonds.net wrote:
> Alan,
> Just to keep you updated: I am still reading the code, understanding it
> and trying to get started.
> I understood the recent updates, which I viewed using cvs diff.
> I shall email you with whatever I can put together and with the questions I
> have.
>
> ~, just to keep you updated...
>
> Thanks a lot!
> Senthil
>
>
>
>
>> Thanks,
>>
>> In that case I will assign you the link task. Don't worry about the
>> other bugs (incl. the -d option) for now, as these are code stability
>> details (rather than features), so I will clean them up as I would like
>> to create a workable distribution tarball reasonably soon.
>>
>> Now that you have installed, configured and got rapple to run, any
>> feedback concerning the documentation on the web would also be useful
>> (or indeed anything you think needs to be added to the FAQ).
>>
>> Regards,
>> Alan.
>>
>>
>> senthil@puggy.symonds.net wrote:
>> Hi Alan,
>>
>> Yeah, I would like to work on this module.
>> I have updated the rapple cvs and saw the digest file as well, but did
>> not check the functionality yet.
>> I shall work on both the parser and handler part. I was also looking
>> into the -d option feature request.
>>
>> I shall start the work on Monday, if that is OK with you. Otherwise, you
>> can assign me any other task as well.
>>
>> I am going to attend my friend's wedding tomorrow and I will be back home
>> only on Monday.
>>
>> Thanks for explaining this to me. I have come across and coded a few
>> parser-related snippets from K&R. With those and the other rapple
>> files, I think I should be able to do this.
>>
>> Regards,
>> Senthil
>>
>>
>>
>>
>>> Senthil,
>>>
>>> Here is a suggestion for a self contained but
>>> challenging task you might be interested in: do you
>>> want to give a try at writing the link checker parser
>>> (Task 115866) ?
>>>
>>> The idea is that there are two files involved: a
>>> parser and a handler. The handler invokes the parser
>>> and passes files to it, e.g., the handler would
>>> recursively traverse a directory tree and invoke the
>>> parser on each transformable file it can find.
>>>
>>> The parser is limited to processing individual files
>>> but would work like this: it reads the input file and
>>> scans it looking for certain elements that have
>>> attributes that link to resources (e.g., "img", "a").
>>> When it finds such an element it checks the
>>> appropriate attribute (e.g., for "img" it is "src" and
>>> for "a" it is "href") and checks to see if the
>>> resource being pointed to exists (e.g., is the "src"
>>> or "href" file present in the datastore). For now I
>>> would not propose you check external links (e.g., if
>>> "href" begins with "http://" then just ignore it and
>>> also ignore "mailto:" links etc.) The parser should
>>> generate a link report as it goes along (perhaps just
>>> naming files that are missing).
>>>
>>> If you have never worked with parsers before (they can
>>> be a bit confusing at first) then take a look at
>>> "digestp.*" files (which are the parsers) and
>>> "catalog.*" (which are handlers) for the digest parser
>>> I wrote last weekend (you will have to update your
>>> local working copies if you have not done so
>>> recently).
>>>
>>> If you like I can write the handler for you so that
>>> you can focus on the parser - other parser examples
>>> can be found in the examples directory of the expat
>>> source code.
>>>
>>> Let me know what you think.
>>>
>>> Regards,
>>> Alan.
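
The lookup described in the message above (pick the right attribute for
"a" and "img", then skip external links) can be sketched in plain C without
touching expat itself. This is a minimal sketch, not the rapple
implementation: link_attr_for, find_attr, is_external and local_link_target
are hypothetical names, and treating "https://" as external is an assumption
beyond the "http://" and "mailto:" cases named in the message. The attribute
array layout (name, value, name, value, ..., NULL) is the one expat passes
to a start-element handler.

```c
#include <stddef.h>
#include <string.h>

/* Return the attribute that carries the link for an element,
 * or NULL if the element is not one we check. */
static const char *
link_attr_for(const char *el)
{
    if (strcmp(el, "a") == 0)
        return "href";
    if (strcmp(el, "img") == 0)
        return "src";
    return NULL;
}

/* Look up an attribute value in a name/value array terminated by NULL. */
static const char *
find_attr(const char **attr, const char *name)
{
    size_t i;

    for (i = 0; attr[i] != NULL && attr[i + 1] != NULL; i += 2)
        if (strcmp(attr[i], name) == 0)
            return attr[i + 1];
    return NULL;
}

/* External links are ignored for now. */
static int
is_external(const char *url)
{
    return strncmp(url, "http://", 7) == 0
        || strncmp(url, "https://", 8) == 0   /* assumption, see above */
        || strncmp(url, "mailto:", 7) == 0;
}

/* Return the local link target of an element, or NULL if the element
 * has no link attribute or the link is external. */
static const char *
local_link_target(const char *el, const char **attr)
{
    const char *name = link_attr_for(el);
    const char *url;

    if (name == NULL)
        return NULL;
    url = find_attr(attr, name);
    if (url == NULL || is_external(url))
        return NULL;
    return url;
}
```

Inside a start-element handler, local_link_target(el, attr) would be called
with the handler's own arguments; a non-NULL result is a target that still
has to be checked against the datastore.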
